A Monte Carlo Study of Recovery of Weak Factor Loadings in Confirmatory Factor Analysis

Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954
Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Structural Equation Modeling: A Multidisciplinary Journal
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/hsem20

A Monte Carlo Study of Recovery of Weak Factor Loadings in Confirmatory Factor Analysis
Carmen Ximénez
Published online: 19 Nov 2009.

To cite this article: Carmen Ximénez (2006) A Monte Carlo Study of Recovery of Weak Factor Loadings in Confirmatory Factor Analysis, Structural Equation Modeling: A Multidisciplinary Journal, 13:4, 587-614, DOI: 10.1207/s15328007sem1304_5

To link to this article: http://dx.doi.org/10.1207/s15328007sem1304_5


A Monte Carlo Study of Recovery of Weak Factor Loadings in Confirmatory Factor Analysis

Carmen Ximénez
Autonoma University of Madrid, Spain

The recovery of weak factors has been extensively studied in the context of exploratory factor analysis. This article presents the results of a Monte Carlo simulation study of recovery of weak factor loadings in confirmatory factor analysis under conditions of estimation method (maximum likelihood vs. unweighted least squares), sample size, loading size, factor correlation, and model specification (correct vs. incorrect). The effects of these variables on goodness of fit and convergence are also examined. Results show that recovery of weak factor loadings, goodness of fit, and convergence are improved when factors are correlated and models are correctly specified. Additionally, unweighted least squares produces more convergent solutions and successfully recovers the weak factor loadings in some instances where maximum likelihood fails. The implications of these findings are discussed and compared to previous research.

The development of confirmatory factor analysis (CFA) has provided considerable means for theory construction and evaluation (Browne, 1984). The CFA model (Jöreskog & Sörbom, 1981) can be given as:

$$x = \Lambda \xi + \delta \tag{1}$$

where $x$ is a vector of $p$ observed variables, $\xi$ is a vector of $q$ factors such that $q < p$, $\Lambda$ is a $p \times q$ matrix of factor loadings, and $\delta$ is a vector of $p$ measurement error variables. It is assumed that $E(x) = E(\xi) = E(\delta) = 0$ and that $E(\xi\delta') = 0$. The covariance matrix for $x$, denoted by $\Sigma$, is:

$$\Sigma = \Lambda \Phi \Lambda' + \Theta_\delta \tag{2}$$

where $\Phi$ is the $q \times q$ covariance matrix of $\xi$ and $\Theta_\delta$ the $p \times p$ covariance matrix of $\delta$. For convenience, it is usually assumed that $\Phi = I$ and that $\Theta_\delta$ is diagonal.
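As a small illustration of the covariance structure in Equation 2, the following Python sketch computes the model-implied matrix for a hypothetical 4-variable, 2-factor model. The function name `implied_covariance` and all numeric values are assumptions chosen for illustration; they are not taken from the article.

```python
import numpy as np

def implied_covariance(loadings, factor_cov, error_cov):
    """Return Sigma = Lambda Phi Lambda' + Theta_delta (Equation 2)."""
    loadings = np.asarray(loadings, dtype=float)
    return loadings @ factor_cov @ loadings.T + error_cov

# Hypothetical loading matrix: two strong indicators and two weak indicators.
Lambda = np.array([[0.8, 0.0],
                   [0.8, 0.0],
                   [0.0, 0.3],
                   [0.0, 0.3]])
Phi = np.eye(2)  # orthogonal factors with unit variances (Phi = I)

# Diagonal error covariance chosen so that every observed variable has
# unit variance (valid here because Phi = I).
Theta = np.diag(1.0 - np.sum(Lambda**2, axis=1))

Sigma = implied_covariance(Lambda, Phi, Theta)
print(np.round(Sigma, 2))
```

With these values the two weak indicators share an implied correlation of only .09, which gives a sense of how little covariance a weak factor contributes.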

STRUCTURAL EQUATION MODELING, 13(4), 587–614
Copyright © 2006, Lawrence Erlbaum Associates, Inc.

Correspondence should be addressed to Carmen Ximénez, Universidad Autonoma de Madrid, Departamento de Psicologia Social y Metodologia, Cantoblanco s/n, 28049 Madrid, Spain. E-mail: [email protected]


When a CFA is conducted, the researcher must decide model specification, identification, parameter estimation method, and assessment of model fit (Bollen, 1989). In practical applications, researchers often face the problem of finding factorial structures containing one or more weak factors. A weak factor is a factor that shows relatively little influence on the set of measured variables or is defined by small loading sizes. The recovery of weak factors has been extensively studied in the context of exploratory factor analysis (EFA). The majority of studies examine the effect of sample size, model error, and estimation method for correctly specified models. This article extends the previous study of variables that affect the recovery of weak factor loadings to the context of CFA. We present the results of a large Monte Carlo simulation study of recovery of weak factor loadings in CFA under conditions of estimation method, sample size, loading size, model specification (correct vs. incorrect), and factor correlations.

PAST RESEARCH

Within the context of EFA, Briggs and MacCallum (2003) examined the performance of maximum likelihood (ML) and unweighted least squares (ULS) estimation methods to recover a known factor structure with relatively weak factors. They introduced two types of error (model and sampling error) separately and in combination. Results of a simulation study indicated that in situations with a moderate amount of error, ML often failed to recover the weak factor, whereas ULS succeeded. With small sample sizes (e.g., N = 100), ML failed more often than did ULS. This failure was associated with the occurrence of Heywood cases.

Other studies have examined the factor pattern recovery in EFA under conditions of sample size, number of factors, number of indicators per factor, and level of communalities. For example, MacCallum, Widaman, Zhang, and Hong (1999) and MacCallum, Widaman, Preacher, and Hong (2001) found that the factor pattern recovery of ML solutions is better as sample size and number of indicators per factor increase, and as number of factors decreases. They also found that as communalities become lower, achieving good recovery is more affected by sample size and factor determination. Velicer and Fava (1998) conducted a study that included similar conditions to the studies by MacCallum et al. (2001; MacCallum et al., 1999) and also examined the loading size effect. They found that recovery improves as level of loading size, sample size, and number of indicators per factor increase.

Within the context of CFA, there are some studies evaluating those same effects for the ML estimation method (Anderson & Gerbing, 1984; Boomsma, 1982; Gerbing & Anderson, 1985). Similar to the studies in the context of EFA, they found that with lower loading sizes the recovery improves as sample size and number of indicators per factor increase. The only studies that compare estimation methods in the context of CFA are by Olsson, Troye, and Howell (1999) and Olsson, Foss, Troye, and Howell (2000), which refer to ML, generalized least squares (GLS), and weighted least squares (WLS) solutions. They evaluated the effect of estimation method, model misspecification, and sample size on the recovery of underlying structure (which they called the theoretical fit) and the goodness of fit (which they called the empirical fit). Their results suggested better theoretical fit by ML but at the cost of lower empirical fit. In addition, they found that misspecification exerted the largest effect on both theoretical and empirical fit and concluded that the larger the degree of misspecification, the higher the discrepancy between the methods. These studies raised the possibility that the type of constraints could affect factor recovery in CFA.

THIS STUDY

This work extends past research examining the recovery of weak factors and factor pattern recovery in the context of CFA using Monte Carlo simulation. Focus is on the recovery of weak factor loadings for a known factor structure with a relatively weak factor. This study extends past research in several ways.

First, the weak factor is defined with loading sizes as those commonly found by researchers in practice. Previous studies define the weak factor with loading sizes between .40 and .60, whereas in practice, as Briggs and MacCallum (2003) noted, “a researcher would most likely consider a weak factor to be one with loadings of .20 and .30” (p. 53).

Second, models with correlated factors are referred to. The majority of previous studies refer to factor structures with orthogonal factors but do not examine factor recovery with correlated factors. It is possible that the recovery of weak factor loadings is affected if factors are correlated. The magnitude of this effect over a range of conditions is one topic of investigation in this study.

Third, we examine the recovery of weak factor loadings under misspecification conditions. The majority of previous studies are based on the assumption that models are correct for the study population. This approach is of limited value as it ignores the fact that, in practice, models are incorrect (MacCallum, 2003). A common misspecification condition found in practice is the specification of an incorrect number of factors. For example, a researcher could specify a larger number of factors in the model or a smaller one. This kind of misspecification implies that the variables load on different factors in the theoretical (or true) model and in the estimated (or fitted) one. That is, if the model is correctly specified, the population model and the estimated one have the same factor loading pattern and the same number of factors, whereas if the model is misspecified by altering the number of factors, the estimated model modifies the pattern of both the $\Lambda$ matrix (as the dimensionality of the matrix and some structural zeros change) and the $\xi$ vector (as the number of factors is modified). It is unclear whether the estimated loadings are similar to the theoretical ones under these circumstances. Previous studies in the context of EFA (e.g., Fava & Velicer, 1992, 1996) indicate that retention of an incorrect number of factors, especially retention of too few factors, can cause major distortion of loading patterns. Therefore, it is expected that the CFA models misspecified by altering the number of factors not only show a poorer goodness of fit but also a poorer recovery of weak factor loadings. The aim in this article is to investigate the magnitude of these effects.

Fourth, the effect of the estimation method on the recovery of weak factor loadings is studied, referring to ML and ULS estimation methods. Given the assumptions that the observations of $x$ are independent and have a multivariate normal distribution, the ML method is the most commonly used estimation procedure, as ML estimators have the desirable asymptotic properties of being unbiased, consistent, efficient, and normally distributed. The ULS method is commonly used when the normality assumption is not met. The main difference between both methods is in the discrepancy functions and in the assumptions regarding error (in ML all error is sampling error and in ULS it is model error). Unless the model holds exactly for the population, they provide estimates that are at least slightly different (see Bollen, 1989, pp. 104–113, for a more detailed description of the ML and ULS estimation methods). Previous research in the context of EFA has found that ULS is superior to ML on the recovery of weak factors. Thus, it is expected that this finding generalizes to the confirmatory case.

Fifth, the goodness of fit of the model and the occurrence of nonconvergent solutions and Heywood cases under the study conditions are also examined.

The following sections describe the main characteristics of the simulation design, present the results, and discuss their implications as compared to previous research.
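The difference between the two discrepancy functions mentioned above can be made concrete with a short Python sketch. The article does not print the functions; the forms used here are the standard textbook ones for a sample matrix S and a model-implied matrix Σ, namely F_ML = log|Σ| + tr(SΣ⁻¹) − log|S| − p and F_ULS = ½ tr[(S − Σ)²]. The function names and the 2 × 2 matrices are illustrative assumptions.

```python
import numpy as np

def f_ml(S, Sigma):
    """ML discrepancy: log|Sigma| + tr(S Sigma^-1) - log|S| - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    # tr(Sigma^-1 S) equals tr(S Sigma^-1); solve() avoids an explicit inverse.
    trace_term = np.trace(np.linalg.solve(Sigma, S))
    return logdet_sigma + trace_term - logdet_s - p

def f_uls(S, Sigma):
    """ULS discrepancy: half the sum of squared residuals, 0.5 * tr[(S - Sigma)^2]."""
    residual = S - Sigma
    return 0.5 * np.trace(residual @ residual)

# Hypothetical sample and model-implied correlation matrices.
S = np.array([[1.0, 0.30],
              [0.30, 1.0]])
Sigma = np.array([[1.0, 0.25],
                  [0.25, 1.0]])

print(f_ml(S, Sigma), f_uls(S, Sigma))
```

Both functions are zero when Σ reproduces S exactly. ULS weights every residual equally, whereas ML weights residuals through Σ⁻¹, which is one reason the two estimators can disagree when the model does not hold exactly.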

MONTE CARLO SIMULATION STUDY

The six-step approach for Monte Carlo simulation designs in structural equation models recommended by Skrondal (2000) and the Paxton, Curran, Bollen, Kirby, and Chen (2001) guidelines are used to present the design of the simulation study.

Step 1: Statement of the Research Problem

This study explores the effects of estimation method, sample size, loading size, factor correlation, and model specification (correct vs. incorrect) on the recovery of weak factor loadings in the context of CFA. The effects of these variables on the goodness of fit of the model and the occurrence of nonconvergent solutions and Heywood cases are also examined.

Step 2: Experimental Plan

The design was developed to address a reasonably diverse set of factor models and model characteristics, so as to represent the range of values typically encountered in practice. The general approach used in this study involved the following steps. First, population correlation matrices were defined as having known factor structures, under the assumption that the factor model holds exactly in the population. The population factor structures (or generating models) were defined on the basis of the MacCallum and Tucker (1991) model, which includes 12 measured normal variables and three factors, the third factor being relatively weak. Two other factor structures were defined with the same 12 measured variables. The first structure contained two factors, the second factor being the weak factor, whereas the second structure contained only one factor, being the weak factor. A single-factor model was included to examine the recovery of a single weak factor, whereas models with two or three factors would be encountered more often in practice and allow study of how the weak factor loadings are recovered in the presence of stronger factors. The weak factors in these structures had loadings of .50 or below, to distinguish them from major factors, which had loadings of .70 or above. Next, sample correlation matrices were generated from those populations, using various levels of sample size, loading size, and factor correlation. The sample correlation matrices were then factor analyzed by ML and ULS estimation methods. Parameters were estimated for models correctly and incorrectly specified. The sample factor solutions were then evaluated to determine how the recovery of the weak factor loadings is affected by estimation method, sample size, factor correlation, loading size, and model specification in each model. A detailed description of the levels of these variables follows.

One hundred was chosen as the smallest sample size. Additionally, Boomsma (1982) stated that it is dangerous to use ML CFA with sample sizes of less than 100, particularly for models with relatively low factor loadings. Sample sizes of 300 and 500 were used to approximate medium and relatively large sample sizes. Two levels of factor correlation were chosen for the study (null, 0, and moderate, .50).

In the generating models (the models in the left column of Figure 1), the loading for every variable assigned to a factor was the same size. If the variable was not assigned to a factor, the loading was 0. The minimum loading size in the weak factor was .25. The values of .35 and .50 were also used to obtain more detailed information on the performance of ML and ULS at the lower and upper level of the weak factor loadings. The theoretical values of the parameters for each factorial structure of the generating models are summarized in Table 1.

Model specification for the fitted models was varied from correct (C: the model is estimated with the correct number of factors) to incorrect (I: the model is estimated with an incorrect number of factors). Two types of misspecifications were introduced: I1, adding one factor to the model (henceforth referred to as overfactoring), and I2, omitting one factor from the model (henceforth referred to as underfactoring). This resulted in five types of model: Model 1 (M1), which has one factor and can be correctly specified (M1_C) or misspecified by overfactoring (M1_I1); Models 2 and 4 (M2 and M4), which have two and three factors, respectively, and can be correctly specified (M2_C and M4_C) or misspecified either by overfactoring (M2_I1 and M4_I1) or by underfactoring (M2_I2 and M4_I2); and finally, Models 3 and 5 (M3 and M5), which are equal to M2 and M4, respectively, but allow correlation between the factors. Figure 1 summarizes all the models defined in the study. Each row of Figure 1 shows the generating model and the fitted models. For example, the second row shows the generating model for the two-factor model with orthogonal factors (M2_C) and the fitted models (the correct one, M2_C, and the incorrect ones, M2_I1 and M2_I2).


FIGURE 1 Model specification. C = correct model; I1 = model misspecified by overfactoring; I2 = model misspecified by underfactoring.


The dependent variables are the recovery of the weak factor loadings, the goodness of fit of the model, and the occurrence of nonconvergent solutions and Heywood cases (all these variables are defined in Step 6). The variables in the overall design are summarized in Table 2.

Step 3: Simulation

The population factor structures defined in Table 1 for generating the models (i.e., the models in Figure 1 denoted by C) were used as the basis to simulate the sample correlation matrices. One thousand sample correlation matrices were simulated with the PRELIS 2 program of Jöreskog and Sörbom (1996b) for each correct model.
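The simulation step can be sketched in Python as follows. This is not the PRELIS 2 procedure used in the study; it is a minimal NumPy analogue that builds the three-factor population correlation matrix from the Table 1 parameters and draws one multivariate normal sample from it. The function names, the seed, and the choice of unique variances (set so that each variable has unit variance) are assumptions made for illustration.

```python
import numpy as np

def three_factor_population(weak_loading, factor_corr=0.0):
    """Population correlation matrix for the three-factor model of Table 1."""
    Lambda = np.zeros((12, 3))
    Lambda[0:5, 0] = 0.95            # X1-X5 load on Factor 1
    Lambda[5:9, 1] = 0.70            # X6-X9 load on Factor 2
    Lambda[9:12, 2] = weak_loading   # X10-X12 load on the weak Factor 3
    Phi = np.full((3, 3), factor_corr)
    np.fill_diagonal(Phi, 1.0)
    common = Lambda @ Phi @ Lambda.T
    # Assumed unique variances: whatever is needed for a unit diagonal.
    Theta = np.diag(1.0 - np.diag(common))
    return common + Theta

def sample_correlation(population, n, rng):
    """Draw n multivariate normal observations and return their correlation matrix."""
    p = population.shape[0]
    data = rng.multivariate_normal(np.zeros(p), population, size=n)
    return np.corrcoef(data, rowvar=False)

rng = np.random.default_rng(2006)  # arbitrary seed, for reproducibility only
R_pop = three_factor_population(weak_loading=0.25, factor_corr=0.5)
R_sample = sample_correlation(R_pop, n=100, rng=rng)
print(np.round(R_sample[9:12, 9:12], 2))  # sample correlations among X10-X12
```

Repeating the last two steps 1,000 times per condition would give the kind of replicated sample matrices described above, each of which is then passed to the estimation step.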

Step 4: Estimation

A CFA was conducted on each simulated sample correlation matrix using ML and ULS estimation. Parameters were estimated for all the models defined in Figure 1; that is, both for the generating and fitted models. The parameter estimates were computed with the LISREL 8.71 program of Jöreskog and Sörbom (1996a).

Step 5: Replication

Steps 3 and 4 were repeated for r = 1 … 252,000 replications.


TABLE 1
True Parameters of Generating Models

            One-Factor  Two-Factor Model    Three-Factor Model
            Factor 1    Factor 1  Factor 2  Factor 1  Factor 2  Factor 3
X1          L           .80       0         .95       0         0
X2          L           .80       0         .95       0         0
X3          L           .80       0         .95       0         0
X4          L           .80       0         .95       0         0
X5          L           .80       0         .95       0         0
X6          L           .80       0         0         .70       0
X7          L           .80       0         0         .70       0
X8          L           0         L         0         .70       0
X9          L           0         L         0         .70       0
X10         L           0         L         0         0         L
X11         L           0         L         0         0         L
X12         L           0         L         0         0         L
If factors are correlated
φ12                     .50                 .50
φ13                                         .50
φ23                                         .50

Note. L = loading size in the weak factor (.25, .35, or .50).


Step 6: Analyses of Output

Nonconvergent solutions were deleted to study the effects of the independent variables on the recovery of the weak factor loadings and the goodness of fit. The operational definition employed was that of the LISREL program: failure to reach convergence after 250 iterations (see Jöreskog, 1967, p. 460). Moreover, Heywood cases were detected in each of the cells of the design but were not deleted for analysis purposes. The nonconvergent solutions were analyzed separately to study the effect of the independent variables on the occurrence of nonconvergent solutions and Heywood cases. Two qualitative variables were created. For convergence (CONVER), nonconvergent solutions were coded 1, whereas convergent solutions were coded 0. For Heywood cases (HEYWOOD), solutions with Heywood cases were coded 1, whereas solutions without them were coded 0. Log-linear/logit models were fit to the data using ML estimation. The proportion of weighted variation explained by each model (Goodman, 1971) was calculated in addition to the usual likelihood ratio chi-square statistic.
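The 0/1 coding of CONVER and HEYWOOD described above might look like the following Python sketch. The article does not state the rule used to flag a Heywood case, so the sketch assumes the common operational definition of a negative estimated error variance. The function `code_solution`, its inputs, and the example values are hypothetical.

```python
import numpy as np

def code_solution(converged, error_variances):
    """Return the CONVER and HEYWOOD codes for one fitted solution.

    converged: True if the estimator converged within the iteration limit
               (250 iterations in the study).
    error_variances: estimated error variances (diagonal of Theta_delta).
    """
    conver = 0 if converged else 1
    # Assumed definition: a Heywood case is any negative error variance.
    has_negative = np.any(np.asarray(error_variances, dtype=float) < 0)
    heywood = 1 if has_negative else 0
    return {"CONVER": conver, "HEYWOOD": heywood}

# Hypothetical batch of three solutions (values invented for illustration).
solutions = [
    (True, [0.36, 0.51, 0.94]),
    (True, [0.40, -0.05, 0.90]),   # negative error variance: Heywood case
    (False, [0.10, 0.20, 0.30]),   # did not converge within the limit
]
codes = [code_solution(conv, ev) for conv, ev in solutions]
prop_nonconvergent = np.mean([c["CONVER"] for c in codes])
prop_heywood = np.mean([c["HEYWOOD"] for c in codes])
print(prop_nonconvergent, prop_heywood)
```

Averaging the codes within a design cell gives proportions of the kind reported for each cell in the results.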


TABLE 2
Variables Considered in the Monte Carlo Study

Code     Variable                                 Levels

Independent variables
M        Method                                   ML (maximum likelihood); ULS (unweighted least squares)
N        Sample size                              100; 300; 500
L        Loading size in the weak factor          .25; .35; .50
C        Correlation between factors              0; .50
S        Model specification                      C: correct; I1: incorrect 1 (model misspecified by overfactoring); I2: incorrect 2 (model misspecified by underfactoring)

Dependent variables
φ        Coefficient of congruence
RMSD     Root mean squared deviation
RMSEA    Root mean squared error of approximation
CONVER   Nonconvergent solutions                  0: no; 1: yes
HEYWOOD  Heywood cases                            0: no; 1: yes

Note. The C variable and the I2 level of the S variable do not appear in the one-factor model. Then, for the one-factor model there is a 2 × 3 × 3 × 2 design, and for the two- and three-factor models a 2 × 3 × 3 × 2 × 3 design.
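The design sizes in the note to Table 2 can be checked by enumerating the cells. The Python sketch below assumes that every cell receives 1,000 replications, which is one reading of the design; under that assumption the enumeration reproduces the 252,000 replications reported in Step 5. Variable names are illustrative.

```python
from itertools import product

methods = ["ML", "ULS"]
sample_sizes = [100, 300, 500]
loadings = [0.25, 0.35, 0.50]
correlations = [0.0, 0.50]

# One-factor model: no factor correlation and no underfactoring (I2) level.
one_factor = list(product(methods, sample_sizes, loadings, ["C", "I1"]))

# Two- and three-factor models: all five design variables are crossed.
multi_factor = list(
    product(methods, sample_sizes, loadings, correlations, ["C", "I1", "I2"])
)

# One one-factor design plus two multi-factor designs (two and three factors).
cells = len(one_factor) + 2 * len(multi_factor)
replications_per_cell = 1000  # assumed to apply to every cell
total_replications = cells * replications_per_cell

print(len(one_factor), len(multi_factor), cells, total_replications)
```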


The recovery of the weak factor loadings was assessed by examination of the correspondence between the theoretical loading and the estimated one. We used the same two measures of correspondence as in the Briggs and MacCallum (2003) study. The first was the coefficient of congruence (Tucker, 1951), $\phi$, calculated by:

$$\phi_k = \frac{\sum_{i=1}^{p} \lambda_{ik(t)} \lambda_{ik(e)}}{\sqrt{\left(\sum_{i=1}^{p} \lambda_{ik(t)}^{2}\right)\left(\sum_{i=1}^{p} \lambda_{ik(e)}^{2}\right)}} \tag{3}$$

where $p$ is the number of variables that define the factor $k$, $\lambda_{ik(t)}$ is the theoretic loading for the observed variable $i$ of the factor $k$, and $\lambda_{ik(e)}$ the corresponding loading obtained from the simulation data. The $\phi_k$ coefficient is computed for each factor in the theoretical model. We adopted the same interpretation guidelines for $\phi$ values as in the MacCallum et al. (1999) study: above .98 = excellent recovery; .92 to .98 = good recovery; .82 to .92 = borderline recovery; .68 to .82 = poor recovery; and below .68 = terrible recovery.

A second measure of correspondence, the root mean square deviation (RMSD; Levine, 1977) was also calculated for each factor in the theoretical model:

$$\mathrm{RMSD}_k = \sqrt{\frac{\sum_{i=1}^{p} \left(\lambda_{ik(t)} - \lambda_{ik(e)}\right)^{2}}{p}} \tag{4}$$

RMSD reaches a minimum of zero for a perfect pattern-magnitude match and a maximum of two, when all loadings are equal to unity but of opposite signs. Intermediate values are difficult to interpret. In practice, most studies consider RMSD values below .20 to be indicative of a satisfactory recovery.

Following Skrondal (2000), a simple metamodel is used to analyze the results, which includes only the main and the double interaction effects of each independent variable on the dependent variable. Given the characteristics of the study (the number of factors in the model is varied but not the number of observed variables), the number of factors is not manipulated as an independent variable. Then, all analyses are conducted separately for the one-, two-, and three-factor models. For the two- and three-factor models, the following model was tested:

RWFL = µ + M + N + L + C + S + M*N + M*L + M*C + M*S + N*L + N*C + N*S + L*C + L*S + C*S (5)

where:
RWFL = recovery of weak factor loadings ($\phi$ and RMSD measures)
M = method (ML vs. ULS)
N = sample size (100, 300, or 500)
L = loading size in the weak factor (.25, .35, or .50)
C = correlation between factors (0 or .50)
S = model specification (C = correct, I1 = incorrect 1, and I2 = incorrect 2)

For the one-factor model the metamodel includes all terms except the ones that refer to C. Additionally, the S variable has only two levels (C and I1). A separate analysis of variance (ANOVA) was conducted to test the effects included in the metamodel for the one-, two-, and three-factor models. As the large sample size (n = 252,000) can cause even negligible effects to be statistically significant, the explained variance associated with each of the effects was also calculated, measured by the $\eta^2$ statistic. A practical significance criterion of accounting for 3% or more of the variance was adopted. Multiple comparisons were also conducted for the effects that were shown to be statistically and practically significant.
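The two recovery measures (Equations 3 and 4), the interpretation bands for φ, and the η² criterion can be sketched in Python as below. The function names are illustrative, the estimated loadings are invented rather than taken from the study, and `eta_squared` assumes the usual definition of η² as the effect sum of squares divided by the total sum of squares.

```python
import numpy as np

def congruence(theoretical, estimated):
    """Tucker's coefficient of congruence for one factor (Equation 3)."""
    t = np.asarray(theoretical, dtype=float)
    e = np.asarray(estimated, dtype=float)
    return float(np.sum(t * e) / np.sqrt(np.sum(t**2) * np.sum(e**2)))

def rmsd(theoretical, estimated):
    """Root mean square deviation for one factor (Equation 4)."""
    t = np.asarray(theoretical, dtype=float)
    e = np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((t - e) ** 2)))

def recovery_label(phi):
    """Interpretation bands for phi quoted from MacCallum et al. (1999)."""
    if phi > 0.98:
        return "excellent"
    if phi >= 0.92:
        return "good"
    if phi >= 0.82:
        return "borderline"
    if phi >= 0.68:
        return "poor"
    return "terrible"

def eta_squared(ss_effect, ss_total):
    """Proportion of total variance attributed to one effect (assumed definition)."""
    return ss_effect / ss_total

# Hypothetical weak factor with three indicators and true loadings of .25.
theoretical = [0.25, 0.25, 0.25]
estimated = [0.31, 0.18, 0.27]  # made-up estimates, not results from the article

phi = congruence(theoretical, estimated)
print(round(phi, 3), recovery_label(phi), round(rmsd(theoretical, estimated), 3))

# An effect counts as practically significant when eta squared is at least .03.
print(eta_squared(ss_effect=4.0, ss_total=100.0) >= 0.03)
```

Note that φ is insensitive to a uniform rescaling of the estimated loadings, whereas RMSD penalizes differences in magnitude, which is why the two measures are reported together.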

The goodness of fit of the model was measured by the root mean squared error of approximation (RMSEA) index of Steiger (1990), defined by:

$$\mathrm{RMSEA} = \sqrt{\frac{\chi^2/(n-1) - df/(n-1)}{df}} \tag{6}$$

where $\chi^2$ is the chi-square statistic, $df$ the degrees of freedom of the model, and $n$ the sample size. RMSEA was chosen because, as stated by Sugawara and MacCallum (1993), it is the most appropriate among the nonincremental fit measures because it behaves consistently across estimation methods for good models and also has an interpretable scale associated with it for determining the degree of fit. Browne and Cudeck (1993) suggested that values of RMSEA below .05 indicate close fit, from .05 to .08 fair fit, from .08 to .10 mediocre fit, and above .10 unacceptable fit. In addition, RMSEA is sensitive to model misspecification (Fan & Sivo, 2005). The same metamodel as in Equation 5 was used to test the effects of the independent variables on the RMSEA index by an ANOVA.

RESULTS

Nonconvergence and Heywood Cases

Of the 252,000 solutions, 23,049 (9.1%) were nonconvergent and 68,319 (27.1%) presented Heywood cases. The proportion of nonconvergent solutions and Heywood cases that occurred in obtaining 1,000 good solutions per cell is summarized in Table 3. The results of the log-linear/logit analyses are summarized in Table 4. The two-way interaction models provided good explanations of the data, accounting for more than 97% of the weighted variation to be explained in CONVER and in HEYWOOD for each model.

Examination of the parameter estimates and the chi-square values for CONVER and HEYWOOD in the one-factor model shows that the proportion of convergent and proper solutions was higher when the loadings in the weak factor approached .50, the model was correctly specified, and the sample size was increased. Furthermore, there were more convergent and proper solutions with the ULS estimation method.

In the two- and three-factor models, results show that the proportion of convergent and proper solutions was higher when the model was correctly specified, the factors were correlated, the loadings in the weak factor approached .50, and the sample size was increased. Furthermore, in these models there were also more convergent and proper solutions with the ULS estimation method. The L × S and N × S interaction effects were of considerable size. Analyses showed that the effect of low loadings was pronounced for misspecified models. In addition, the largest proportion of nonconvergent solutions and Heywood cases occurred with misspecified models and the smallest sample size (N = 100).

Overall, results indicate that when the number of factors was three and factors were uncorrelated, misspecified models had the largest proportion of nonconvergent and improper solutions. In addition, nonconvergent and improper solutions occurred more frequently for smaller sample sizes (e.g., N = 100) and lower loading sizes (e.g., loadings of .25).

Recovery of the Weak Factor Loadings

Table 5 shows the summary statistics for the measures of recovery of weak factor loadings ($\phi$ and RMSD) for all main effects. Table 6 presents results of the ANOVA for the RMSD measure. The ANOVA results for the congruence measure, $\phi$, are not included for brevity and because they are very similar to the RMSD results.

In the one-factor model, 84% of the total variance as measured by $\eta^2$ was accounted for by the model. As shown in Table 6, the largest effects found are due to the sample size ($\eta^2$ = .46), loading size ($\eta^2$ = .33), and specification ($\eta^2$ = .11) main effects, and to the N × L interaction ($\eta^2$ = .10). The recovery of the weak factor loadings improves as sample and loading size increase. In addition, the average values of $\phi$ and RMSD for the one-factor model defined by loadings of .25 are indicative of a borderline weak factor loadings recovery, and those defined by loadings of .35 and .50 of a good recovery (see Table 5). The left upper panel of Figure 2 illustrates the N × L interaction. As can be seen, the weak factor loadings are satisfactorily recovered in all loading sizes except when the factor is defined by loading sizes of .25 and the sample size is too small (N = 100). Finally, the recovery of weak factor loadings improves for correctly specified models. However, it should be noted that the recovery is satisfactory when the model is misspecified by overfactoring. The estimation method also produced a small effect when all variables are entered simultaneously ($\eta^2$ = .04). Overall, the mean values for both $\phi$ and RMSD indicate that the recovery of the weak factor loadings with the ULS estimation method is slightly better than with the ML method (see Table 5). The scatter plots in the first row of Figure 3 illustrate this difference in more detail. These plots show the RMSD coefficient for the weak factor from the ML and ULS solutions for the convergent cases under conditions of specification (to conserve space, the conditions of loading and sample size



TABLE 3
Proportion of Nonconvergent Solutions and Heywood Cases Across the Design

One-factor model (1F)

          CONVER              HEYWOOD
          C         I1        C         I1
L    N    ML   ULS  ML   ULS  ML   ULS  ML   ULS
.25  100  .03  .01  .25  .16  .04  0    .30  .21
.25  300  0    0    .08  .05  0    0    .09  .06
.25  500  0    0    .02  .01  0    0    .02  .01
.35  100  0    0    .06  .03  0    0    .09  .05
.35  300  0    0    0    0    0    0    .02  .01
.35  500  0    0    0    0    0    0    0    0
.50  100  0    0    0    0    0    0    0    0
.50  300  0    0    0    0    0    0    0    0
.50  500  0    0    0    0    0    0    0    0

Two-factor model (2F): CONVER

          CO = 0                        CO = .50
          C         I1        I2        C         I1        I2
L    N    ML   ULS  ML   ULS  ML   ULS  ML   ULS  ML   ULS  ML   ULS
.25  100  .18  .11  .31  .28  0    0    .05  .01  .08  .04  0    0
.25  300  .06  .04  .27  .22  0    0    0    0    .02  .01  0    0
.25  500  .02  .01  .24  .19  0    0    0    0    0    0    0    0
.35  100  .05  .02  .25  .20  0    0    0    0    .03  .01  0    0
.35  300  0    0    .18  .12  0    0    0    0    0    0    0    0
.35  500  0    0    .17  .11  0    0    0    0    0    0    0    0
.50  100  0    0    .18  .11  0    0    0    0    0    0    0    0
.50  300  0    0    .17  .08  0    0    0    0    0    0    0    0
.50  500  0    0    .18  .10  0    0    0    0    0    0    0    0

Two-factor model (2F): HEYWOOD

          CO = 0                        CO = .50
          C         I1        I2        C         I1        I2
L    N    ML   ULS  ML   ULS  ML   ULS  ML   ULS  ML   ULS  ML   ULS
.25  100  .23  .15  .44  .41  0    0    .09  .09  .34  .33  0    0
.25  300  .07  .05  .41  .38  0    0    0    0    .26  .26  0    0
.25  500  .02  .01  .41  .38  0    0    0    0    .25  .25  0    0
.35  100  .04  .01  .42  .38  0    0    .01  .01  .28  .27  0    0
.35  300  0    0    .37  .33  0    0    0    0    .24  .24  0    0
.35  500  0    0    .37  .33  0    0    0    0    .25  .25  0    0
.50  100  0    0    .36  .31  0    0    0    0    .25  .23  0    0
.50  300  0    0    .35  .29  0    0    0    0    .25  .23  0    0
.50  500  0    0    .36  .31  0    0    0    0    .27  .22  0    0

Three-factor model (3F): CONVER

          CO = 0                        CO = .50
          C         I1        I2        C         I1        I2
L    N    ML   ULS  ML   ULS  ML   ULS  ML   ULS  ML   ULS  ML   ULS
.25  100  .20  .20  .27  .25  .19  .11  .02  .01  .03  .01  0    0
.25  300  .15  .15  .24  .21  .18  .11  0    0    0    0    0    0
.25  500  .09  .09  .21  .18  .19  .10  0    0    0    0    0    0
.35  100  .12  .12  .20  .18  .20  .10  0    0    0    0    0    0
.35  300  .02  .02  .14  .11  .18  .10  0    0    0    0    0    0
.35  500  .01  .01  .14  .11  .19  .10  0    0    0    0    0    0
.50  100  .01  .01  .11  .09  .20  .09  0    0    0    0    0    0
.50  300  0    0    .12  .09  .18  .08  0    0    0    0    0    0
.50  500  0    0    .13  .10  .18  .09  0    0    0    0    0    0

Three-factor model (3F): HEYWOOD

          CO = 0                        CO = .50
          C         I1        I2        C         I1        I2
L    N    ML   ULS  ML   ULS  ML   ULS  ML   ULS  ML   ULS  ML   ULS
.25  100  .24  .24  .46  .46  .33  .25  .10  .08  .30  .46  .01  0
.25  300  .18  .18  .46  .46  .33  .26  .01  .01  .25  .50  0    0
.25  500  .11  .12  .44  .44  .32  .26  0    0    .25  .50  0    0
.35  100  .18  .17  .45  .45  .32  .23  .01  .01  .24  .46  0    0
.35  300  .04  .04  .44  .44  .32  .24  0    0    .24  .50  0    0
.35  500  .01  .01  .43  .43  .31  .24  0    0    .25  .50  0    0
.50  100  .03  .03  .43  .43  .29  .18  0    0    .24  .47  0    0
.50  300  0    0    .43  .43  .29  .18  0    0    .24  .50  0    0
.50  500  0    0    .43  .43  .28  .17  0    0    .25  .50  0    0

Note. CONVER = nonconvergent solutions; CO = correlation between factors; C = model specification correct; I1 = model specification incorrect 1; I2 = model specification incorrect 2; ML = maximum likelihood; ULS = unweighted least squares; 1F = one factor; 2F = two factor; 3F = three factor.


TABLE 4
Effect of Independent Variables on the Nonconvergent Solutions and Heywood Cases

       One-factor model                      Two-factor model                      Three-factor model
           CONVER           HEYWOOD              CONVER           HEYWOOD              CONVER           HEYWOOD
Effect df  χ²        Prob.  χ²        Prob.  df  χ²        Prob.  χ²        Prob.  df  χ²        Prob.  χ²        Prob.
M      1   118.736   <.001  115.36    <.001  1   382.28    <.001  166.78    <.001  1   488.42    <.001  503.05    <.001
N      2   1408.89   <.001  1982.22   <.001  2   863.12    <.001  757.67    <.001  2   424.17    <.001  355.72    <.001
L      2   2240.96   <.001  2724.05   <.001  2   1781.65   <.001  1239.14   <.001  2   1846.57   <.001  1414.32   <.001
C      —   —         —      —         —      1   9259.53   <.001  2393.37   <.001  1   19646.40  <.001  15443.26  <.001
S      1   1549.43   <.001  2020.97   <.001  2   13410.33  <.001  56334.61  <.001  2   2057.61   <.001  51540.88  <.001
M × N  2   2.34      .309   1.15      .563   2   1.34      .513   .599      .741   2   .67       .716   39.08     <.001
M × L  2   1.40      .496   1.76      .415   2   37.15     <.001  13.82     .001   2   11.53     .003   6.55      .038
M × C  —   —         —      —         —      1   21.83     <.001  54.60     <.001  1   15.18     <.001  1612.61   <.001
M × S  1   10.32     .001   37.26     <.001  2   16.28     <.001  15.50     <.001  2   259.77    <.001  500.36    <.001
N × L  4   67.86     <.001  77.67     <.001  4   85.88     <.001  68.32     <.001  4   104.85    <.001  20.32     <.001
N × C  2   —         —      —         —      2   302.22    <.001  1.28      .528   2   210.11    <.001  15.07     <.001
N × S  2   21.24     <.001  34.31     <.001  4   402.54    <.001  873.99    <.001  4   369.94    <.001  1156.84   <.001
L × C  —   —         —      —         —      2   185.14    <.001  49.97     <.001  2   98.65     <.001  11.27     .004
L × S  2   9.15      .010   14.64     <.001  4   565.80    <.001  1056.96   <.001  4   1726.28   <.001  1885.50   <.001
C × S  —   —         —      —         —      2   11.31     .004   32.37     <.001  2   40.14     <.001  5765.45   <.001
P          .998             .999                 .997             .996                 .991             .975

Note. CONVER = nonconvergent solutions; HEYWOOD = Heywood cases; M = method; N = sample size; L = loading size in the weak factor; C = correlation between factors; S = model specification; P = proportion of weighted variation explained by each model.


TABLE 5
Summary Statistics on Dependent Variables for Main Effects

              One-factor model                    Two-factor model                    Three-factor model
              Congruence  RMSD        RMSEA       Congruence  RMSD        RMSEA       Congruence  RMSD        RMSEA
              M     SD    M     SD    M     SD    M     SD    M     SD    M     SD    M     SD    M     SD    M     SD
Overall       .940  .149  .094  .065  .035  .033  .730  .451  .196  .128  .047  .044  .838  .418  .156  .135  .087  .062
Method
  ML          .933  .177  .098  .078  .035  .033  .727  .454  .199  .131  .047  .043  .837  .420  .157  .138  .084  .060
  ULS         .946  .138  .091  .058  .035  .033  .734  .456  .192  .127  .048  .044  .839  .430  .156  .138  .091  .063
Sample size
  100         .880  .233  .146  .083  .036  .033  .680  .460  .229  .131  .049  .044  .802  .438  .196  .151  .089  .060
  300         .959  .083  .080  .040  .035  .032  .748  .445  .186  .122  .047  .043  .850  .409  .145  .127  .087  .062
  500         .973  .064  .062  .031  .035  .032  .761  .444  .174  .124  .047  .044  .861  .404  .130  .116  .086  .063
Loading size
  .25         .861  .233  .129  .085  .020  .019  .667  .451  .222  .160  .027  .032  .784  .435  .160  .127  .086  .059
  .35         .959  .079  .093  .053  .034  .028  .742  .439  .189  .111  .042  .036  .847  .399  .156  .127  .087  .061
  .50         .989  .043  .065  .036  .049  .040  .777  .456  .174  .096  .071  .049  .877  .414  .153  .148  .089  .066
Specification
  C           .947  .162  .084  .057  .011  .016  .936  .170  .107  .077  .012  .016  .939  .177  .141  .114  .013  .016
  I1          .932  .133  .105  .072  .062  .025  .821  .182  .202  .102  .074  .045  .941  .177  .140  .107  .119  .029
  I2          —           —           —           .465  .620  .276  .130  .061  .038  .618  .642  .190  .171  .143  .016
Correlation
  .00         —           —           —           .500  .568  .256  .154  .057  .050  .643  .583  .221  .180  .089  .066
  .50         —           —           —           .925  .141  .144  .067  .040  .036  .977  .097  .110  .053  .086  .059

Note. RMSD = root mean squared deviation; RMSEA = root mean squared error of approximation; ML = maximum likelihood; ULS = unweighted least squares.


TABLE 6
Analysis of Variance Results for the Root Mean Squared Deviation Measure in the Monte Carlo Study

       One-factor model               Two-factor model               Three-factor model
Effect df      F         Prob.  η²    df      F         Prob.  η²    df      F         Prob.  η²
M      1       610.18    <.001  .041  1       1554.33   <.001  .031  1       16.56     <.001  .000
N      2       7288.67   <.001  .461  2       3833.48   <.001  .135  2       2187.76   <.001  .087
L      2       4229.55   <.001  .331  2       2451.32   <.001  .091  2       25.96     <.001  .001
C      —       —         —      —     1       42250.34  <.001  .463  1       18472.28  <.001  .286
S      1       2193.16   <.001  .114  2       34315.08  <.001  .583  2       3120.33   <.001  .119
M × N  2       139.50    <.001  .016  2       198.92    <.001  .008  2       3.24      .039   .000
M × L  2       100.46    <.001  .012  2       56.37     <.001  .002  2       12.37     <.001  .001
M × C  —       —         —      —     1       650.96    <.001  .013  1       26.26     <.001  .001
M × S  1       278.53    <.001  .016  2       64.80     <.001  .003  2       35.61     <.001  .002
N × L  4       452.48    <.001  .096  4       276.54    <.001  .022  4       60.78     <.001  .005
N × C  —       —         —      —     2       4.47      .011   .000  2       181.42    <.001  .008
N × S  2       132.50    <.001  .015  4       760.45    <.001  .058  4       88.65     <.001  .008
L × C  —       —         —      —     2       1066.19   <.001  .042  2       24.78     <.001  .001
L × S  2       265.37    <.001  .030  4       3724.69   <.001  .118  4       995.64    <.001  .080
C × S  —       —         —      —     2       10439.42  <.001  .299  2       8228.03   <.001  .263
Error  17,069  (.004)                 49,037  (.007)                 46,064  (.018)
Total  17,088                   .844  49,070                   .816  46,097                   .754

Note. Values in parentheses represent mean squared errors. M = method; N = sample size; L = loading size in the weak factor; C = correlation between factors; S = model specification.


are not included but are available from the author). As can be seen, the majority of the points are concentrated in the lower left corner, representing the replications in which both ML and ULS recover the weak factor loadings adequately. In many other instances, however, ULS recovered the weak factor loadings satisfactorily and ML did not. This is reflected by the points in the plot above .20 on the horizontal axis and below .20 on the vertical axis. These cases are not associated with the occurrence of Heywood cases. There are also cases in which both methods obtained high values in RMSD (these correspond to the models with loading sizes of .25 and N = 100). It should be noted that in no case did ML appreciably outperform ULS in the recovery of the weak factor loadings.
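The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names, the example loadings, and the use of .20 as a cutoff for adequate recovery (taken from the reference lines mentioned for the scatter plots) are assumptions. RMSD and the coefficient of congruence are computed here with their usual textbook formulas, over whichever loadings are passed in.

```python
import math


def rmsd(population, estimate):
    """Root mean squared deviation between two equally long loading vectors."""
    squared = [(e - p) ** 2 for p, e in zip(population, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def congruence(population, estimate):
    """Coefficient of congruence (phi) between two loading vectors."""
    cross = sum(p * e for p, e in zip(population, estimate))
    norm = math.sqrt(sum(p * p for p in population) * sum(e * e for e in estimate))
    return cross / norm


def compare_methods(rmsd_ml, rmsd_uls, cutoff=0.20):
    """Label one replication by which method keeps RMSD at or below the cutoff.

    The cutoff of .20 is an assumption borrowed from the scatter plot description.
    """
    ml_ok = rmsd_ml <= cutoff
    uls_ok = rmsd_uls <= cutoff
    if ml_ok and uls_ok:
        return "both adequate"
    if uls_ok:
        return "ULS only"
    if ml_ok:
        return "ML only"
    return "neither"


# Hypothetical replication: four weak loadings of .25 in the population.
population = [0.25, 0.25, 0.25, 0.25]
ml_estimate = [0.60, -0.10, 0.55, 0.05]   # made-up ML estimates
uls_estimate = [0.35, 0.15, 0.35, 0.15]   # made-up ULS estimates

label = compare_methods(rmsd(population, ml_estimate), rmsd(population, uls_estimate))
print(label)
```

Applying `compare_methods` to every replication and counting the labels would give the kind of summary that the scatter plots convey visually.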

FIGURE 2 Graphical representation of the strongest double interaction effects found on the dependent variables of the study.

FIGURE 3 Scatter plots for the RMSD measure in the one-factor, two-factor, and three-factor models across estimation method, specification, and correlation.

In the two-factor model, 82% of the total variance as measured by η² was accounted for by the model. As shown in Table 6, the largest effects found are due to the specification (η² = .58) and correlation (η² = .46) main effects, and to the C × S interaction (η² = .30). The average values of φ and RMSD for correctly specified models are indicative of a good weak factor loadings recovery, that of overfactored models of a borderline recovery, and that of underfactored models of a terrible recovery (see Table 5). In addition, the presence of factor correlation significantly improves the weak factor loadings recovery. The left middle panel of Figure 2 illustrates the C × S interaction. As can be seen, the weak factor loadings are satisfactorily recovered in all specification types except when the model is misspecified and the factors are orthogonal. The recovery is especially poor for the misspecification by underfactoring condition. However, if factors are correlated, both correct and misspecified models show good recovery. Other smaller effects are attributable to the sample and loading size main effects, and to the L × S, N × S, and L × C interactions. Recovery improves as sample and loading size increase. This is true for all specification types except for the misspecification by underfactoring condition, where the recovery is poor in all levels of sample and loading size. The presence of factor correlation moderates this relation, as when factors are correlated the recovery is satisfactory for all levels of loading size. The estimation method also produced an effect (η² = .03). The mean values for the correspondence measures indicate that the recovery of the weak factor loadings with the ULS estimation method is slightly better than with the ML method (see Table 5). The scatter plots in the second and third rows of Figure 3 illustrate this difference in more detail.

The two-factor model with orthogonal factors for correctly specified models shows a similar pattern to the one explained for the one-factor model. However, for the misspecified models the recovery of the weak factor loadings worsens and both methods provide similar solutions. For overfactored models, the extreme cases where both methods fail (RMSD values above .50) are associated with the occurrence of Heywood cases. In the underfactored models the recovery is poorer and both methods provide identical results. If the two factors are correlated, the recovery of the weak factor loadings is much improved. Both the correct models and the overfactored models show adequate recovery by ML and ULS in the majority of cases, but there are still some cases of failure of ML when ULS succeeds that are not associated with the occurrence of Heywood cases. For the underfactored models, the recovery is adequate with both methods (the RMSD values above .20 correspond to the models with low loading and sample sizes).
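As a rough check on how the effect sizes relate to the F ratios, the η² values reported for the two-factor model in Table 6 appear to be consistent with a partial η² computed from each F statistic and its degrees of freedom. The sketch below makes that assumption explicit; the function name is hypothetical, and the formula is the standard identity for partial η², not something stated in the article.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio.

    Uses SS_effect / SS_error = F * df_effect / df_error, so that
    SS_effect / (SS_effect + SS_error) = F*df_effect / (F*df_effect + df_error).
    """
    scaled = f_value * df_effect
    return scaled / (scaled + df_error)


# F ratios and degrees of freedom for the two-factor model, copied from Table 6.
df_error = 49037
effects = {
    "S": (34315.08, 2),      # specification, reported eta squared = .583
    "C": (42250.34, 1),      # correlation, reported eta squared = .463
    "C x S": (10439.42, 2),  # interaction, reported eta squared = .299
}

for name, (f_value, df_effect) in effects.items():
    print(name, round(partial_eta_squared(f_value, df_effect, df_error), 3))
```

Because partial η² values are each computed against the error term, they need not sum to the proportion of variance explained by the whole model, which is why the individual effects can exceed the 82% total.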

Finally, in the three-factor model, 75% of the total variance as measured by η² was accounted for by the model. As in the two-factor model, the largest effects are attributable to the correlation (η² = .29) and specification (η² = .12) main effects, and to the C × S interaction (η² = .26). The left lower panel of Figure 2 illustrates the C × S interaction. As can be seen, weak factor loadings are satisfactorily recovered in all specification types except when the model is misspecified by underfactoring and the factors are orthogonal. Again, if factors are correlated, both correct and misspecified models show good recovery. In this model, the estimation method does not produce an effect when all variables are entered simultaneously (η² = .00). The scatter plots in the fourth and fifth rows of Figure 3 indicate that the recovery worsens as more orthogonal factors are included in the model. This worsening is especially pronounced in the underfactored models, in which recovery is poor in the majority of cases with either method. However, if factors are correlated, the recovery of the weak factor loadings improves significantly and there are some instances in which ULS succeeds when ML fails.

The ANOVAs were repeated eliminating the Heywood cases (the results are not included for brevity but are available from the author). In the one-factor model, poor recovery for the smallest sample size (N = 100) was found to be associated with the occurrence of Heywood cases. In the two- and three-factor models, eliminating Heywood cases improves the recovery for models with correlated factors. In the models with orthogonal factors, poor recovery for underfactored models is not associated with the occurrence of Heywood cases.

Goodness of Fit

The summary statistics on RMSEA for all main effects appear in Table 5. The ANOVA results for the goodness-of-fit measure (RMSEA) are summarized in Table 7.
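For readers who want to relate RMSEA values to the chi-square statistic, the following sketch computes the usual sample point estimate and attaches a verbal label. It is a minimal illustration under stated assumptions: the N − 1 divisor is one common convention (some programs use N), and the cutoffs of .05, .08, and .10 for close, fair, mediocre, and unacceptable fit are conventional guidelines assumed here, not values quoted from this section. Function names are hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size n."""
    # The estimate is truncated at zero when chi-square falls below its df.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def describe_fit(value):
    """Verbal label for an RMSEA value, using assumed conventional cutoffs."""
    if value <= 0.05:
        return "close"
    if value <= 0.08:
        return "fair"
    if value <= 0.10:
        return "mediocre"
    return "unacceptable"


# Hypothetical fit result: chi-square of 30 on 20 df with N = 101.
estimate = rmsea(30.0, 20, 101)
print(round(estimate, 3), describe_fit(estimate))

# Mean RMSEA values from Table 5 (one-factor C and I1, three-factor I2).
for mean_value in (0.011, 0.062, 0.143):
    print(mean_value, describe_fit(mean_value))
```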

As shown in Table 7, the model accounts for 86% of the total variance for the one-factor model. The largest main effects are attributable to specification (η² = .75) and loading size (η² = .39). The L × S interaction also produces a strong effect (η² = .37). The right upper panel of Figure 2 illustrates this interaction. The average values of RMSEA for correctly specified models are indicative of a close fit for all loading sizes and those for misspecification by overfactoring of a fair or a mediocre fit. Therefore, the RMSEA measure is sensitive to model misspecification. In the two-factor model, the model accounts for 91% of the total variance. The largest effects are attributable to the same variables as in the one-factor model and also to the correlation main effect (η² = .37) and the C × S interaction (η² = .52). The right middle panel of Figure 2 illustrates this interaction. Results indicate that the empirical fit improves for correlated factors. Finally, in the three-factor model, the model accounts for 98% of the total variance. The largest main effects are attributable to specification (η² = .95), estimation method (η² = .66), and correlation (η² = .20). The M × C, C × S, and M × S interactions also produce strong effects. As in the one- and two-factor models, the RMSEA measure is sensitive to model misspecification. In addition, mean values of RMSEA are smaller for ML than for ULS (see Table 5). This effect is moderated by the correlation and specification effects. The M × C interaction (η² = .64) is represented in the right lower panel of Figure 2. Results indicate that the difference between methods does not exist if factors are orthogonal. The M × S interaction (η² = .45) indicates that for correct models both ML and ULS show a close fit. However, for misspecified models, the average values of RMSEA are indicative of a mediocre fit with either method. The C × S interaction also exerts a strong effect (η² = .58). For correct models the average value of RMSEA for orthogonal and correlated factors shows a close fit. For overfactored models the average empirical fit for models with correlated factors is fair, but is unacceptable for the ones with orthogonal factors. For underfactored models the average value of RMSEA indicates an unacceptable fit both for models with orthogonal and for those with correlated factors.

TABLE 7
ANOVA Results for the RMSEA Index in the Monte Carlo Study

       One-factor model               Two-factor model               Three-factor model
Effect df      F         Prob.  η²    df      F         Prob.  η²    df      F         Prob.  η²
M      1       113.04    <.001  .007  1       1468.56   <.001  .029  1       87237.87  <.001  .659
N      2       4.08      .017   .000  2       50.34     <.001  .002  2       45.09     <.001  .002
L      2       5552.56   <.001  .394  2       27257.63  <.001  .526  2       490.87    <.001  .021
C      —       —         —      —     1       29028.25  <.001  .372  1       10914.37  <.001  .195
S      1       50455.54  <.001  .747  2       79108.91  <.001  .763  2       458886.70 <.001  .953
M × N  2       34.59     <.001  .004  2       10.20     <.001  .000  2       26.07     <.001  .001
M × L  2       13.83     <.001  .002  2       664.68    <.001  .026  2       14.22     <.001  .001
M × C  —       —         —      —     1       1041.56   <.001  .021  1       80477.76  <.001  .641
M × S  1       265.33    <.001  .015  2       10.20     <.001  .000  2       18255.84  <.001  .448
N × L  4       10.04     <.001  .002  4       21.08     <.001  .002  4       .19       .945   .000
N × C  —       —         —      —     2       3.61      .027   .000  2       1.89      .151   .000
N × S  2       515.69    <.001  .057  4       309.35    <.001  .025  4       615.78    <.001  .052
L × C  —       —         —      —     2       364.93    <.001  .015  2       9.21      <.001  .000
L × S  2       4994.38   <.001  .369  4       9665.83   <.001  .441  4       444.53    <.001  .038
C × S  —       —         —      —     2       26343.23  <.001  .518  2       31013.78  <.001  .579
Error  17,069  (.001)                 49,037  (.001)                 45,064  (.001)
Total                           .863                           .908                           .980

Note. Values in parentheses represent mean squared errors. M = method; N = sample size; L = loading size in the weak factor; C = correlation between factors; S = model specification.

The role of power was also analyzed. Table 8 shows the empirical proportion of rejections (EPR) of the null hypothesis that the model is correct in the population (test of exact fit) with α = .05 under the study conditions. As can be seen, when the model is correct, in all conditions the ULS method maintains the EPR values closer to the intended alpha level (.05) than the ML method. For the incorrect models, the EPR values are larger for ULS than for ML in the one-factor model and they are very similar in the two- and three-factor models. This suggests that when the number of weak factor loadings in the model is large, ULS not only improves their recovery but also improves power to detect incorrect models. Analyses of power in terms of model misspecification indicate that in the two-factor model, more power is achieved by adding than by omitting a factor when factors are orthogonal, whereas this difference is not so clear if factors are correlated.
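The EPR for a single design cell can be obtained by counting how often the exact-fit test rejects across replications. The sketch below assumes that the p value of the chi-square test is available for each replication; the function name, the example p values, and the added Monte Carlo standard error are illustrative assumptions rather than part of the reported analysis.

```python
import math


def empirical_rejection_rate(p_values, alpha=0.05):
    """Proportion of replications whose exact-fit test rejects at level alpha.

    Returns the proportion and its binomial Monte Carlo standard error.
    """
    rejections = sum(1 for p in p_values if p < alpha)
    rate = rejections / len(p_values)
    standard_error = math.sqrt(rate * (1.0 - rate) / len(p_values))
    return rate, standard_error


# Hypothetical p values from ten replications of one condition.
p_values = [0.01, 0.20, 0.03, 0.60, 0.04, 0.50, 0.70, 0.90, 0.30, 0.049]
rate, standard_error = empirical_rejection_rate(p_values)
print(rate, round(standard_error, 3))
```

For a correctly specified model the rate estimates the Type I error and should lie near α; for a misspecified model it estimates power.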

DISCUSSION

This article presented the results of a simulation study that examined the recovery of weak factor loadings in the context of CFA under conditions of estimation method (ML vs. ULS), sample size, loading size, factor correlation, and model specification (correct vs. incorrect). The effects of the same variables on goodness of fit and the occurrence of nonconvergent solutions and Heywood cases were also examined. This study extends past research in several ways. The majority of previous studies refer to EFA, define weak factors as those with loading sizes of .40 or above, refer to models with orthogonal factors, and are based on the assumption that models are correct for the population, whereas this study refers to CFA, uses lower loading sizes to define the weak factor (from .25–.50), refers to models with orthogonal and correlated factors, and includes misspecification conditions.

Results concerning the nonconvergent solutions and Heywood cases indicate that the ML method produces more nonconvergent and improper solutions as compared to the ULS method. In addition, the number of nonconvergent solutions and Heywood cases increases when factors are orthogonal, the model is incorrectly specified (i.e., if some variables are assigned to the wrong factor), the loading size in the weak factor is lower, and the sample size is small. These results are similar to those obtained by Anderson and Gerbing (1984), MacCallum et al. (1999), and Velicer and Fava (1998) for ML solutions and correct models.
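Proportions of this kind could be tallied from replication output along the following lines. The sketch assumes that a Heywood case is a converged solution with at least one negative unique variance estimate, and that the Heywood proportion is taken over converged replications only; both choices, like the record layout and function names, are assumptions made for illustration and may differ from the procedure used in the study.

```python
def is_heywood(unique_variances):
    """True if any estimated unique variance is negative (assumed definition)."""
    return any(value < 0.0 for value in unique_variances)


def summarize(replications):
    """Return (proportion nonconvergent, proportion of Heywood cases).

    Each replication is a dict with a 'converged' flag and, when converged,
    a list of 'unique_variances'. The Heywood proportion is computed over
    converged replications only, which is an assumption of this sketch.
    """
    converged = [r for r in replications if r["converged"]]
    nonconvergent = 1.0 - len(converged) / len(replications)
    if not converged:
        return nonconvergent, 0.0
    heywood = sum(1 for r in converged if is_heywood(r["unique_variances"]))
    return nonconvergent, heywood / len(converged)


# Hypothetical output from four replications of one design cell.
replications = [
    {"converged": True, "unique_variances": [0.90, 0.80, 0.75]},
    {"converged": True, "unique_variances": [0.90, -0.05, 0.75]},
    {"converged": False, "unique_variances": None},
    {"converged": True, "unique_variances": [0.60, 0.70, 0.90]},
]
print(summarize(replications))
```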

Concerning the recovery of the weak factor loadings, this study found many statistically significant effects. First, the results indicate that factor correlation and model specification are the variables that produce the largest effects, both separately and in interaction. The presence of factor correlation improves the recovery of weak factor loadings significantly. If factors are correlated, recovery is satisfactory even for models misspecified by altering the number of factors. However, if factors are orthogonal, models specified with an incorrect number of factors, especially those misspecified by underfactoring, show a poor recovery and a decrease in power. This result is consistent with previous studies in the context of EFA (e.g., Fava & Velicer, 1992, 1996), which indicate that retention of an incorrect number of factors, especially retention of too few factors, can cause major distortion of loading patterns. Therefore, the consequences of underfactoring generalize from EFA to CFA. However, as this study only considered a small number of factors (one, two, or three), future research should examine these effects for models with a larger number of factors. As previous research has found that the effects of underfactoring are more serious when the correct number of factors is small, it is expected that the effects found here are diminished if the model includes a larger number of factors. Second, the effects of sample and loading size are considerable. Recovery of the weak factor loadings improves as sample size increases. The recovery is especially poor for misspecified models with three orthogonal factors and the smallest sample size (N = 100). Results indicate that the poor recovery for N = 100 is associated with the occurrence of Heywood cases. These data also demonstrate that with increasing sample sizes (N = 300 and 500) ML and ULS estimates for the correct model gave identical parameter estimates and also estimates almost identical to the true values (the estimates were equal to the true parameter value, to three decimals, in 85% of cases). These findings are similar to those found in the studies of Briggs and MacCallum (2003), MacCallum et al. (1999, 2001), and Velicer and Fava (1998). One of the topics of investigation of this study was to examine the recovery of weak factor loadings for lower loading sizes than those considered in previous studies. Results indicate that the recovery is poor for lower loading sizes (e.g., .25) but it improves as loading size increases (e.g., .35 and .50). This holds for correctly specified models and for models where misspecification implies overfactoring. However, for models where misspecification implies underfactoring, the recovery is poor for any level of loading size. Again, this effect does not hold if factors are correlated. Therefore, it appears that the recovery of weak factor loadings may be troublesome if the magnitude of the loadings is too small (e.g., .25) and the factors in the model are orthogonal. Third, the results also indicated that the estimation method had a small effect. Given the assumption of normality, ML sometimes fails to recover the weak factor loadings when ULS succeeds. These cases are not associated with Heywood cases. In addition, in no case is there a clear advantage of ML over ULS. This result was especially pronounced as the number of factors, sample size, and level of loading size decreased. In the conditions of model error, our results indicate that ULS performs at least as well as ML in the recovery of the weak factor loadings. In all cases when ML was successful, ULS also succeeded. However, in the cases where ULS succeeded, ML sometimes failed. Although the differences between ML and ULS solutions in the recovery of the weak factor loadings were small, they do favor the use of ULS over ML.

TABLE 8
Results Concerning Estimates of Power With the Proportion of Rejections of the Null Hypothesis

                CO = 0                              CO = .50
                C           I1          I2          C           I1          I2
Model L    N    ML    ULS   ML    ULS   ML    ULS   ML    ULS   ML    ULS   ML    ULS
1F    .25  100  .068  .057  .213  .235
           300  .046  .051  .623  .830
           500  .047  .049  .929  .987
      .35  100  .086  .056  .675  .885
           300  .056  .050  .998  1.00
           500  .048  .048  1.00  1.00
      .50  100  .090  .052  .993  1.00
           300  .058  .050  1.00  1.00
           500  .050  .047  1.00  1.00
2F    .25  100  .093  .074  1.00  1.00  .181  .190  .088  .062  .125  .129  .179  .180
           300  .062  .050  1.00  1.00  .295  .303  .055  .050  .206  .275  .296  .305
           500  .055  .046  1.00  1.00  .509  .533  .048  .049  1.00  1.00  .510  .514
      .35  100  .094  .060  1.00  1.00  .421  .500  .096  .061  .274  .386  .420  .500
           300  .063  .048  1.00  1.00  .888  .889  .056  .051  .676  .681  .886  .890
           500  .053  .048  1.00  1.00  .995  .997  .055  .050  1.00  1.00  .993  .994
      .50  100  .089  .049  1.00  1.00  .944  .927  .095  .060  .794  .824  .944  .938
           300  .057  .048  1.00  1.00  1.00  1.00  .057  .050  1.00  .975  1.00  1.00
           500  .055  .049  1.00  1.00  1.00  1.00  .054  .050  1.00  1.00  1.00  1.00
3F    .25  100  .107  .096  1.00  1.00  1.00  1.00  .087  .059  .979  1.00  1.00  .979
           300  .072  .071  1.00  1.00  1.00  1.00  .067  .052  1.00  1.00  1.00  1.00
           500  .061  .061  .999  1.00  1.00  1.00  .069  .048  1.00  1.00  1.00  1.00
      .35  100  .097  .095  1.00  1.00  1.00  1.00  .090  .057  .982  1.00  1.00  .975
           300  .071  .071  1.00  1.00  1.00  1.00  .067  .053  1.00  1.00  1.00  1.00
           500  .057  .057  .999  1.00  1.00  1.00  .062  .051  1.00  1.00  1.00  1.00
      .50  100  .089  .085  1.00  1.00  1.00  1.00  .089  .057  .987  1.00  1.00  .993
           300  .070  .070  .999  1.00  1.00  1.00  .067  .052  1.00  1.00  1.00  1.00
           500  .054  .052  1.00  1.00  1.00  1.00  .062  .050  1.00  1.00  1.00  1.00

Note. Cells contain the empirical proportion of rejections of the null hypothesis that the model is exactly correct in the population with α = .05. CO = correlation between factors; C = model specification correct; I1 = over-factored model specification; I2 = under-factored model specification; L = loading size in the weak factor; ML = maximum likelihood; ULS = unweighted least squares; 1F = one factor; 2F = two factor; 3F = three factor.

Finally, results concerning the goodness of fit indicate that RMSEA is sensitive to model misspecification and detects substantive model changes such as adding or omitting one major factor. In all the models considered, when the model is correctly specified, the RMSEA values are indicative of a close fit and there are no differences in empirical fit due to the estimation method. However, if the model is incorrectly specified, the RMSEA values are indicative of a mediocre or unacceptable fit and are smaller for ML than for ULS for the two- and three-factor misspecified models with correlated factors. In addition, a better empirical fit was found for correlated factors. These findings are consistent with the studies of La Du and Tanaka (1989), Sugawara and MacCallum (1993), and Olsson et al. (2000).

Overall, the results of this study indicate that recovery of weak factor loadings, goodness of fit, and convergence are improved if factors in the CFA model are correlated and models are specified with the known correct number of factors. However, for models specified with an incorrect number of factors, especially the ones with few factors, the recovery and the empirical fit are poor, power decreases, and there are more nonconvergent and improper solutions. Results also indicate that, for correctly specified CFA models that include a weak factor and under normality, ML sometimes fails to recover the weak factor whereas ULS succeeds. This difference is too small to be considered practically important but, congruent with previous research, it does favor the use of ULS over ML. However, for models incorrectly specified by altering the number of factors, the recovery of the weak factor loadings and the empirical fit are poorer, especially if misspecification implies underfactoring and factors are orthogonal, and ML and ULS provide similar results but ULS improves power.

At one level, the results of this study supported the general results of previous research investigating recovery of weak factor loadings in the context of EFA, which generalize to the confirmatory case. At another level, some specific results have implications for the practical use of CFA with factorial structures that include weak factor loadings. First, this study has demonstrated that models misspecified by altering the number of factors, especially those with few factors, show a poor recovery of the weak factor loadings. Thus, users of CFA with models that include orthogonal factors, one of these being a weak factor, must be cautious in specifying the number of factors in the model and should err in the direction of overfactoring when the evidence is ambiguous. As previous research has not examined in detail the effect of misspecification by altering the number of factors in the context of CFA, future studies should continue examining these effects using other definitions of incorrect models that imply underfactoring. Second, this study shows how important it is to err on the side of specifying oblique factor solutions when the data come from a study population structure that includes weak factor loadings. The benefits of this for model identification are known but their benefits for recovering the correct factor loadings have not been examined in depth. However, more research should continue investigating the magnitude of this effect under different levels of factor correlation and other misspecification conditions. Finally, this study demonstrates that when the data come from a population structure in which all factors are not equally strong, ML fails to recover the weak factor loadings in many instances in which ULS succeeds. Given these findings, researchers performing a CFA with factorial structures including weak factor loadings should favor the use of ULS estimation, or at least compare the ML and ULS solutions. This is strongly recommended for factorial structures that include a great number of weak factor loadings.

ACKNOWLEDGMENTS

This work was partially supported by Grants 10/EFM/002 from the Comunidad de Madrid (Spain) and SEJ 2004–05872/PSIC from the Ministerio de Educacion y Ciencia of Spain.

REFERENCES

Anderson, J. C., & Gerbing, D. W. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155–173.

Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

Boomsma, A. (1982). The robustness of LISREL against small sample sizes in factor analysis models. In K. G. Jöreskog & H. Wold (Eds.), Systems under indirect observation: Causality, structure, prediction (pp. 148–173). Amsterdam: North-Holland.

Briggs, N. E., & MacCallum, R. C. (2003). Recovery of weak common factors by maximum likelihood and ordinary least squares estimation. Multivariate Behavioral Research, 38, 25–56.

Browne, M. W. (1984). Asymptotically distribution-free methods in the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.

Fan, X., & Sivo, S. A. (2005). Sensitivity of fit indexes to structural or measurement model components: Rationale of two-index strategy revisited. Structural Equation Modeling, 12, 343–367.



Fava, J. L., & Velicer, W. F. (1992). The effects of overextraction in factor and component analysis. Multivariate Behavioral Research, 27, 387–415.

Fava, J. L., & Velicer, W. F. (1996). The effects of underextraction in factor and component analysis. Educational and Psychological Measurement, 56, 907–929.

Gerbing, D. W., & Anderson, J. C. (1985). The effects of sampling error and model characteristics on parameter estimation for maximum likelihood confirmatory factor analysis. Multivariate Behavioral Research, 20, 255–271.

Goodman, L. A. (1971). The analysis of multidimensional contingency tables: Stepwise procedures and direct estimation methods for building models for multiple classifications. Technometrics, 13, 33–61.

Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443–482.

Jöreskog, K. G., & Sörbom, D. (1981). LISREL: Analysis of linear structural relationships by the method of maximum likelihood (version V). Chicago: National Educational Resources.

Jöreskog, K. G., & Sörbom, D. (1996a). LISREL 8: User’s reference guide (2nd ed.). Chicago: Scientific Software International.

Jöreskog, K. G., & Sörbom, D. (1996b). PRELIS 2: User’s reference guide (3rd ed.). Chicago: Scientific Software International.

La Du, T. J., & Tanaka, J. S. (1989). Influence of sample size, estimation method, and model specification on goodness-of-fit assessments in structural equation models. Journal of Applied Psychology, 74, 625–635.

Levine, M. S. (1977). Canonical correlation analysis and factor comparison techniques. Beverly Hills, CA: Sage.

MacCallum, R. C. (2003). Working with imperfect models. Multivariate Behavioral Research, 38, 113–139.

MacCallum, R. C., & Tucker, L. R. (1991). Representing sources of error in the common factor model: Implications for theory and practice. Psychological Bulletin, 109, 501–511.

MacCallum, R. C., Widaman, K. F., Preacher, K. J., & Hong, S. (2001). Sample size in factor analysis: The role of model error. Multivariate Behavioral Research, 36, 611–637.

MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4, 84–99.

Olsson, U. H., Foss, T., Troye, S. V., & Howell, R. D. (2000). The performance of ML, GLS, and WLS estimation in structural equation modeling under conditions of misspecification and nonnormality. Structural Equation Modeling, 7, 557–595.

Olsson, U. H., Troye, S. V., & Howell, R. D. (1999). Theoretical fit and empirical fit: The performance of maximum likelihood versus generalized least squares estimation in structural equation models. Multivariate Behavioral Research, 34, 31–58.

Paxton, P., Curran, P. J., Bollen, K. A., Kirby, J., & Chen, F. (2001). Monte Carlo experiments: Design and implementation. Structural Equation Modeling, 8, 287–312.

Skrondal, A. (2000). Design and analysis of Monte Carlo experiments: Attacking the conventional wisdom. Multivariate Behavioral Research, 35, 137–167.

Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25, 173–180.

Sugawara, H. M., & MacCallum, R. C. (1993). Effect of estimation method on incremental fit indexes for covariance structure models. Applied Psychological Measurement, 17, 365–377.

Tucker, L. R. (1951). A method for synthesis of factor analysis studies (Tech. Rep. No. 984). Washington, DC: U.S. Department of the Army.

Velicer, W. F., & Fava, J. L. (1998). Effects of variable and subject sampling on factor pattern recovery. Psychological Methods, 3, 231–251.
