standard errors for rotated factor loadings

PSYCHOMETRIKA--VOL. 3~, NO. 4 D~CEMSER, 1973

S T A N D A R D E R R O R S F O R R O T A T E D F A C T O R L O A D I N G S *

CLAUDE O. ARCHER

NAVAL MISSILE CENTER

AND

ROBERT I . JENNRICH

UNIVERSITY OF CALIFORNIA AT LOS ANGELES AND EDUCATIONAL TESTING SERVICE

Beginning with the res~dts of Girshick on the asymptotic distribution of principal component loadings and those of Lawley ou the distribution of unrotated maximum likelihood factor loadings, the asymptotic distribution of the corresponding analytically rotated loadings is obtained. The principal difficulty is the fact that the transformation matrix which produces the rotation is usuMly itself a fmmtion of the data. The approach is to use implicit differentiation to find the partial derivatives of an arbitrary orthogona| rotation algorithm. Specific details are given for the orthomax Mgorithms and aa example involving maximum likelihood estimation "tnd varimax rota|ion is presented.

1. Introduction

Whi le i ts p roponen t s have never ques t ioned i ts impor t ance , m a n y s t a t i s t i c i ans a re su rpr i sed to l ea rn t h a t fac tor ana lys i s is one of t he mos t p o p u l a r m e t h o d s of s t a t i s t i ca l inves t iga t ion . A n ex tens ive c o m p u t e r usage su rvey a t U C L A found regress ion analys is , d i s c r im ina n t ana lys is , and fac tor ana lys i s t he t h ree mos t p o p u l a r s t a t i s t i ca l methodologies . I n f o r m a l inquir ies a t o the r i n s t i t u t i ons ind ica t e t h a t more of ten t h a n not , fac tor ana lys i s r anks in t he t op three . C o m m o n l y ava i l ab le fac tor ana lys i s p r o g r a m s differ f rom those in Other areas, however , in t h a t t h e y give no s t a n d a r d errors for t he e s t ima te s t h e y produce. Th is is due in large measure to t he fac t t h a t fo rmulas for t he s t a n d a r d errors of a n a l y t i c a l l y r o t a t e d fac tor load ing es t imates , t he e s t ima te s which cons t i t u t e t he p r i m a r y o u t p u t of s t a n d a r d fac tor ana lys i s p rograms , have no t been ava i lab le . I n two i m p o r t a n t pape r s Lawley [1953, 1967] ident i f ied t he a s y m p t o t i c s t a n d a r d er rors of t he u n r o t a t e d load ings p roduced in m a x i m u m l ike l ihood fac tor analys is . S imi la r resul t s for p r inc ipa l

* This research was supported in part by NIH Grant RR-3. The authors are grateful to Dorothy T. Thayer who implemented the algorithms discussed here as well as those of Lawley and Maxwell. We are particularly indebted to Michael Browne for convincing ~ls of the significance of this work and for helping to guide its development and to Harry H. Harman who many,years ago pointed out the need for standard errors of estimate,

581

582 PSYCHOMETRIKA

components analysis were given some time ago by Girshick [1939]. The difficulty in extending these results to the case of rotated toadings is that the transformation matrix T which produces the rotation is usually derived from the data. It is perhaps not surprising that LawIey and h~Iaxwell [1971] state, "I t would be almost impossible to take sampling errors in the elements of T into account. The only course is, therefore, to ignore them in the hope that they are relatively small." While Lawley and Maxwell may well have been concerned with broader issues including such things as subjective rotation, we shall show that at least in the case of analytic rotation the sampling errors in T can be taken into account. This is important because Wexler [1968] has produced some evidence to indicate they cannot be safely ignored. Sometimes the rotation problem can be avoided entirely by specifying a sufficient number of zeros or other constraints to identify the loading matrix. Standard errors for this case have been given by Jbreskog [1969].

We begin with a p by lc factor loading matrix A = (as,) and an asymptotically normal estimate A = (~,). By this we mean that as the size n of the sample on which fl~ is based approaches infinity, the distribution of %/n (,4 - A) approaches multivariate normal with mean zero. The maximum likelihood estimates in the classical factor analysis model [Anderson and Rubin, 1956] and the principal components estimates in principal components analysis both have this property. We are not concerned at this point with specifically which estimates are being considered, but rather with tile effect of an orthogonal rotation algorithm on the asymptotic distribution of ~. We have in mind algorithms such as quartimax, varimax, and equimax, but for the present let h denote an arbitrary orthogonal rotation algorithm. Specifically h, is a function which maps an arbitrary p by l~ matrix X into a p by lc matrix Y = X T where T is an orthogonal matrix whose value may, and generally will, depend on X. We are interested in the asymptotic distribution of A = h(-~) = (~,). In particular if A = h(A) = (X,) we will conclude that %/n(A -- A) is asymptotically normally distributed and find its asymptotic covariance matrix.

2. The Asymptotic Distribution o] Orthogonally Rotated Loadings

In principle at least our task is quite simple. Let dh be the differential of h at A, i.e., if h is represented by the equation A = h(A) then dh is the linear approximation to h represented by the equation

Ohm, (1) d~,. = :~. ~ d ~ . .

This may also be written as dA = dh(dA). In this notation

(2) V~n (h - A) ~ dh(~c/n (A -- A))

where " ~ " is read "is asymptotically equal to" and means that the difference

CLAUDE O. ARCHER AND ROBERT I. JENNRICH 583

between the left and right sides of (1) approaches zero in probability as n -~ ~ . This is the basis of the d-method as discussed by Rao [1965, p. 321]. Since dh is a linear transformation, and .~ is an asymptotically normal estimator of A, 3, is an asymptotically normal estimator of A whose asymptotic covariance matrix may be obtained from dh and the asymptotic covariance matrix of .~. To be more explicit, the formula:

(3) acov (~i, ~i,) = Z . . . . Oh,, 00~m- ~ acov (~mu ~,,,) Ohi* , , (~Olnv

expresses the asymptotic covariances of the rotated loadings in terms of those for the unrotated loadings.

For rotation algorithms of interest, quartimax, varimax, and equimax, it is difficult to find dh or, equivalently, the partial derivatives Ohir/Oai,

directly. Our approach is to use implicit differentiation. Suppose that

(4) ~(A) = 0

is a k ( k -- 1)/2 dimensional constraint which is satisfied whenever A is an h-rotation of A no matter what the value of A. Constraints of this form for rotation algorithms of interest will be found in Section 5. By differentiating the relations

(5)

one obtains

(6)

(7)

(S)

A = A T , T ' T = I , ~b(A) = 0

d h = d A T + A d T

d T ' T + T ' d T = 0

d~b(dA) = 0

where d~b denotes the differential of ~b at h. I t follows from (7) tha t T ' d T is a skew-symmetric lc by k matrix. Let ~ denote the space of all such matrices. I t has dimension k ( k - 1)/2. Moreover, for each skew-symmetric matrix K in ~ let

(9) L(K) = d ~ ( A g ) .

Then L is a linear transformation from a k (k -- 1)/2 dimensional space into a k ( k - 1)/2 dimensional space which we assume is invertible. This is usually the case for constraint functions ~h of interest. Since T ' d T is skew-symmetric it is in the domain of L and using (9) and (5) in order gives

(10) L ( T ' d T ) = dC~(AT'dT) = d ~ ( A d T ) .

Substituting (6) into (8) and using the linearity of d~b shows that dC~(AdT) =

- -d~b(dAT) so that from (10),

(11) L ( T ' d T ) = dCz(AdT) = - - d ¢ ( d A T ) .

584 PSYCHOMETRIKA

Thus

~12) T ' d T = -- L - I[d~,(dA T) ].

Multiplying on the left by A T and using (5) and (6) gives the basic relation

(13) dA = d A T - AL-I[di~(dAT)]

which expresses dA in terms of dA and defines the differential of h at A. It also defines the required partial derivatives of h. All that is needed for a particular rotation algorithm is to find an appropriate constraint function ~b and to recover the partial derivatives Oh~r/Oa~, from (13). The first task will be relatively easy. The second is a little harder.

8. Constraints ]or an Orthogonal Algorithm

Orthogonal rotation algorithms are designed to optimize a criterion:

(14) Q = Q(A) = Q ( A T )

over all orthogonal k by k matrices T. The resulting A = A T is called the Q-rotation of A. In the case of quartimax, varimax, and equimax rotation, Q is a quartic function of A. In target rotation, on the other hand, Q is quadratic. There is, however, no need to specialize at this point. An arbitrary Q is considered here and in the next section.

Assume that A = A T optimizes Q and let dQ denote the differential of Q at h. I t is necessary that

(15) dQ(A dT) = 0

for all d T which satisfy (7). Since T ' d T is skew-symmetric, it is necessary that

(16) dQ(AK) = 0

for all K in ~. Let K(r, s) be the elementary k by lc skew-symmetric matrix which has the value 1 in row r and column s, the value - 1 in row s and column r, and is zero elsewhere. Replacing K in (16) by K(r , s) and writing the result in coordinate form gives:

(17) ¢',8 = ,~, •'" Oh,----~, - >''~ = 0

for 1 <_ r, s _< k. These are the constraints which are needed. We observe that the matrix ¢ = (~r~) is skew-symmetric for arbitrary A.

One may view (17) slightly differently. Let (dQ/dA) denote the p by k matrix (oQ/Oh~,) of partial derivatives of Q. Then (17) says that A'(dQ/dA) is symmetric. In the case of the quartimax criterion (dQ/dA) = A 3 where A 3 = (h~3), and (17) demands that A'A 3 be symmetric. This furnishes a simple test for the convergence of a quartimax algorithm. Corresponding tests apply to other orthogonal algorithms.

CLAUDE O. ARCHER AND ROBERT I. JENNRICI - I 5 ~ 5

4. The Differential of an Orthogonal Algori thm

We turn now to the problem of finding formulas for the partial derivatives of an orthogonal rotation algorithm h. Note first that the elementary skew- symmetric matrices

(18) K(u , v), 1 < u < v ~ k

form a basis for ~. The k(k - 1)/2 by k(k - 1)/2 matrix (L~i) of L, as defined in (9), relative to this basis, taken in lexieographic order, is given by:

(19) L,( . , , ) , , ~.,~, = d~,,(hK(u, v))

where

(20) l(r, s) = (r -- 1)(2k -- r ) / 2 + s -- r

f o r l _~ r < s_< k a n d l _< u < v_( k. While it is not evident from our derivation and not essential to what

follows, the matrix (L~) is in fact symmetric and nonnegative definite. I t can be shown that (L . ) is a matrix of second partial derivatives evaluated at the minimum of an appropriate function. Under the assumption that L is nonsingular, (L.) is positive definite.

Let (L") be the inverse of the matrix (L. ) and let

(21) ei.~. --- [AL- I (K(u , v))],.

be the (i, r)-th component of the matrix AL-~(K(u , v)). Using the fact that (L ~) is the matrix of L -1 with respect to the basis in (18),

r--I k . l ( u . , ) ~ t L t ( r . t ) . z ( u . ~ ) ( 2 2 ) e,, o = ' ' . ' > - •

t=l t = r + l

for 1 < i < p, 1 < r < k, and 1 < u < v_~ k. Thesums in (22) are zero when the lower limit exceeds the upper limit.

Finally, using (21), the basic relation (13) may be put in the coordinate form:

(23) dX~, = E d a v y . -- E E e,.,,, d ~ . T , s = l u<v ist ~ "

Reading the partial derivatives of h from this gives:

(24) Oa,,Oh'" _ (5~,T. -- ,,<.~-" Z, e,.,. ~ T.,

for 1 < i, j _< p and 1 ~ r, s < It. Here 3 . denotes the Kronecker delta.

586 PSYCHOMETRIKA

In summary the partial derivatives of h may be computed from A, T, and a formula for Q as follows:

(i) Use Q and (17) to obtain formulas for the ~r, • (ii) Compute the values of the partial derivatives 0~b,,/0X, .

(iii) Using (19) form the matrix (L,) and invert it. (iv) Compute the e~,~o using (21). (v) Using (24) compute the partial derivatives of h.

5. Orthomax Algorithms

We turn now to the problem of finding specific formulas for the orthomax algorithms. These algorithms are designed to maximize the orthomax criterion:

1 ~ ( ~ ~ , ( ~ 2) ~) (25) Q = ~:,=, = x,~ ~ - p ~ir •

This becomes the quartimax, varimax, and equimax criterion when ~, = 0, 1, and k/2 respectively. Using (17) the corresponding constraint functions are:

- - X 2

for 1 g r, s _~ k. Alternatively, these constraint functions may be found, in the ~, = 0, 1 cases at least, by setting tile rotation angle ~ equal to zero in the quartimax and varimax algorithms described by Harman [1967, p. 300 and p. 307].

The partial derivatives of the ¢~, follow easily from (26). For 1 < i < p a n d l g r ~ s ~_ lctheyare:

(27) OX~ ~-o, ~o,

0hi, 0h~

For all other values of i, r, s, and t:

(28) 0 ¢ , _ 0. Ohit

In summary the partial derivatives of an orthomax algorithm h may be computed from A and T as follows:

(i) Using (27) and (28) compute the partial derivatives of ¢. (ii) Using (19) form (L~j) and invert it.

(iii) Using (21) compute the e~ . . . . (iv) Use (24) to give the partial derivatives of h.

CLAUDE O. ARCHER AND ROBERT I. JENNRICH 587

As observed earher, we must assume that L as defined by (9) is nonsingular when A is an orthomax rotation of A. This needs to be an assumption for it is not always true. I t is easy to show, for example, that if the orthomax criterion has the same value for every rotation of A, then L ---- 0 and is clearly singular. In the two factor case this is the only way in which L can be singular. More generally, in the course of a simulation study the authors have looked at thousands of randomly selected L transformations arising in quartimax rotation. Not one of these was singular. The same was true for a smaller set of varimax rotations. There is presently, at least, no indication that our nonsingularity assumption will prove to be a practical difficulty. Indeed, in the cases looked at, the matrix (L,) was not only nonsingular but fairly well conditioned.

6. An Example and Discussion

Table 1 contains maximum likelihood estimates of unrotated factor loadings obtained from an analysis of the sample correlations of 9 variables measured on 211 subjects as reported by Lawley and Maxwell [1971, p. 43]. The standard errors of these loadings obtained by evaluating Lawley's formulas [Lawley and Maxwell, 1971, p. 62] for standardized loadings (i.e., loadings computed from correlations) are given in Table 2. The values in

TABLE 1

Unrotated Standardized Maximum Likelihood Loadings

Factor

Variate i Ii I!I Communality

1 .664 .323_ .074- .550

2 .689 .247 - .193 .573

3 .493 .302 -.222 .383

4 .837 - .292 - .035 .788

5 .7o5 - .315 - .153 .619

6 .819 - .377 .105 .823

7 .661 .396 - .O78 .6oO

8 .458 • 296 .491 .538

9 .766 .427 - .012 .769

588 PSYCHOMETRIKA

TABLE 2

Standard errors for the unrotated loadings in Table i

Factor

Variate I II llI

i .o47 .o65 .o91

2 .043 .076 .069

3 .06o .078 .083

4 .030 .069 .054

5 .043 .073 .081

6 .042 .052 .037

7 .050 .067 .080

8 .o66 .lO1 .122

9 .o44 .060 .078

this table differ somewhat from those given by Lawley and Maxwell [1971, p. 64] because a minor modification, confirmed in personal communication, is required in Lawley's formulas. These standard errors are pleasingly small, ranging in value from .030 to .122.

Turning to the rotated case, Table 3 contains a varimax rotation of the loadings in Table i together with the transformation matrix T which produced them [Lawley and Maxwell, 1971, p. 75]. As can be seen from the matrix T, a substantial rotation has been made. The resulting structure, however, is not particularly simple. The standard errors for the rotated loadings in Table 3 computed by using Lawley's formulas and ignoring the fact that T is computed from the data are given in Table 4. We will refer to these as uncorrected standard errors. Again their values are pleasingly small, ranging from .028 to .128. Using the results developed here, Table 5 contains the corresponding standard errors corrected for sample variation in T. The values cover a slightly larger range, from .029 to .191. Before turning to a direct comparison of the uncorrected and corrected standard errors, we note that what has been presented thus far demonstrates the feasibility of computing standard errors for analytically rotated loadings. To our knowledge this is

CLAUDE O. A R C H E R AND ROBERT I. J E N N R I C H 589

the first time that asymptotic standard errors for analytically rotated loading estimates have been presented in the literature.

There are important and substantial differences between the uncorrected and corrected standard errors. The uncorrected standard errors range from 42% below to 69% above the corrected standard errors. These differences support Wexler's [1968] simulation study which showed large discrepancies between uncorrected standard errors and standard errors obtained by simulation. The differences between uncorrected and corrected standard errors may be made arbitrarily large by choosing the data carefully. Using artificial data

TABLE 3

Varimax rotation of loadings in Table i

Factor

Variate I II III

i .6o5 .306 .3o0

2 .656 .376 .036

3 .589 .19o -.o22

4 .296 .832 .08i

5 .243 .746 -.065

6 .i78 .870 .i87

7 .709 .258 .175

8 .323 .158 .64o

9 .772 .319 .269

Rotation Matrix T

-596 .771 .224

.728 -.637 -253

-.338 -.013 .941

590 PSYCHOMETRIKA

the authors have computed standard errors which differ by more than 50 fold. Because they originated from real data, however, the differences in Table 4 and 5 are probably more relevant.

The standard errors presented give a simple indication of how stable the factor loading estimates are. Since the standard error of the difference of two estimates never exceeds the sum of their standard errors, a quick significance test can be based on the rule which declares an observed difference significant if it exceeds twice this sum. Under this rule the estimates in Table 1 of ~11 and ~2 differ significantly while those of k~2 and k13 do not. A more sensitive test could be made by expressing the standard error of a difference in terms of the variances and covariances between the factor loading estimates.

Indeed, since the covariances are available, familiar procedures make it possible to test almost any hypothesis about rotated factor loadings. For example, is one variable related to the factors in the same way as another or, given a sample from a second population, does the factor pattern there differ from that of the present population? One may also produce simultaneous confidence intervals which allow him to scan across one or more tables of factor loadings in search of significant differences and account for the fact that he is scanning.

TABLE 4

Uncorrected standard errors for the rotated loadings in Table 3

Factor

Variate I II III

1 .o5o .o68 .087

2 .o51 .07o .o7o

3 .058 .079 .o84

4 .o71 .035 .o47

5 .o83 .o4o .o73

6 .o59 .o28 .o4o

7 .o41 .o74 .o8o

8 .077 .083 .128

9 .o32 .068 .o77

CLAUDE O, ARCHF~ AND ROBF_,RT I. JF, NNRICH 5 9 1

TABLE 5

Corrected standard errors for the rotated loadings in Table 3

Factor

Variate I II III

1 .066 .052 .i03

2 .048 .052 .m20

3 .056 .060 .124

4 .041 .029 .070

5 .049 .037 .092

6 .038 .032 .044

7 .048 .05o .lO9

8 .124 .0 55 .191

9 .055 .044 .109

Before attacking such generalizations, however, it may be wise to ascertain how well the asymptotic results perform on finite samples. Pre- liminary work here is encouraging bu t only begun. Another natural next step is to derive similar results for the oblique case.

Pennell [1972] used the "jackknife" to produce confidence intervals for factor loading estimates. Tried independently by one of the authors, this method was earlier rejected as computationally too expensive for routine use and because it did not produce exact asymptotic formulas. There are many ways to implement the jackknife, however, and this judgment may be premature.

REFERENCES

Anderson, T. W., & Rubin, H. Statistical inference ill factor analysis. Proc. Third Berkeley Syrup. Malh. Statist. Probab., 1956, 5, 111-150.

Girshick, M. A. On the sampling theory of roots of determinantat equations. Annals of Mathematical Statistics, 1939, 10, 203-224.

Harmon, H. H. Modern factor analysis. Chicago: University of Chicago Press, 1967. J6reskog, K. G. A general approach to confirmatory maximum likelihood factor a~lalysis.

Psychometrik~, 1969, 34, 183-202.

592 PSYCHOM~I'RIKA

Lawley, D. N. A modified method of estimation in factor analysis and some large sample results. UppsaM Symposium on. Psychological Factor Analysis. Nordisk Psykologi's Monograph Series No. 3. Stockholm: Almqvist and Wiksell, 1953, Pp. 35-42.

Lawley, D. N. Some new results in maximum likelihood factor analysis. Proceedings of the Royal Society of Edinburgh, 1967, 57A, 256-264.

Lawley, D. N., & Maxwell, A. E. Factor analysis as a statistical method. New York: American Elsevier, 1971.

PenneU, R. Routinely computable confidence intervals for factor loadings using the "jackknife", British Journal of Mathematical and Statistical Psychology, 1972, 25, 107-114.

Rao, C. R. Linear statistical inference. New York: Wiley, 1965. Wexler, N. An evaluation of an asymptotic formula for factor loading variance by random

methods. Unpublished doctoral dissertation. Rutgers University, 1968.

Manuscript received 2123/73 Revised manuscript received 7/30t73

standard errors for rotated factor loadings

Documents