
PSYCHOMETRIKA--VOL. 43, NO. 2, JUNE 1978

THE PROBLEM OF THE ADDITIVE CONSTANT AND EIGENVALUES IN METRIC MULTIDIMENSIONAL SCALING

TAKAYUKI SAITO

JAPAN UNIVAC RESEARCH INSTITUTE

This paper is concerned with the additive constant problem in metric multidimensional scaling. First the influence of the additive constant on the eigenvalues of a scalar product matrix is discussed. The second part of the paper introduces a new formulation of the additive constant problem. A solution is given for fixed dimensionality by maximizing a normalized index of fit with a gradient method. An experimental computation has shown that the author's solution is accurate and easy to follow.

Key words: additive constant problem, eigenvalues, metric multidimensional scaling, normalized index of fit.

Introduction

In metric multidimensional scaling, if the original dissimilarity (or similarity) data are interval-scaled, they are customarily processed as if they were ratio-scaled. This process involves what is traditionally called estimation of the additive constant.

The breakthrough and subsequent development of nonmetric MDS procedures since the 1960s have apparently decreased interest in the additive constant problem, probably because of the wide applicability of nonmetric MDS. Given metric dissimilarity data, we can perform nonmetric scaling using only the ordinal relationships among the elements of the data, and the additive constant problem then never arises. However, we argue that applying nonmetric procedures to metric data leaves an important part of the original data unutilized. When we consider the relation between the input data and the output solution, metric procedures have advantages over nonmetric ones. Further, the metric indeterminacy inherent in a nonmetric approach leads to problems for data with a small number of stimuli. For these reasons, metric procedures should be given the attention they deserve, and hence the additive constant problem still remains to be investigated.

Torgerson [1952, 1958] suggested several ways of treating the problem. Among them the one-dimensional subspace scheme is, though rough, the most practical. Messick and Abelson [1956] presented a systematic study of the problem. They assumed that the scalar product matrix B computed from fallible data would be approximately positive semidefinite; that is, there would be a small number of large positive eigenvalues, with the remaining ones scattered around zero, some positive and some negative. Under this assumption they developed an iterative procedure to find the additive constant for a given dimensionality. In cases where the assumption does not hold, applying their procedure entails some trouble, as Cooper [1972] pointed out. This will be discussed later.

Cooper [1972] presented a direct approach, fitting euclidean distances to dissimilarity data in a least-squares sense. He reformulated the problem as a nonlinear optimization problem and proposed to seek an optimal solution for a fixed dimensionality by successive approximation.

The author wishes to thank Mr. Yukio Inukai and anonymous reviewers for their useful suggestions. Sincere thanks are extended to Dr. Yajiro Morita who improved the English manuscript.

Requests for reprints should be sent to Takayuki Saito, Misawa 979-230 A 127, Hino-shi, Tokyo 191, Japan.


In this approach the additive constant and the stimulus coordinates are treated as independent variables, and thus the constant and the coordinates are determined simultaneously. In contrast to Cooper's approach, the traditional approach of metric MDS should be regarded as an indirect one, since it works with scalar products rather than distances. In this sense Cooper's approach seems more natural than the traditional one. However, the traditional approach has some appealing features, including the simplicity of its derivations and its convenience for computer programming.

In this paper we follow the traditional approach. First we are concerned with the influence of the additive constant on the eigenvalues of the matrix B. Next we formulate a new procedure for the additive constant problem. With this procedure we obtain the solution by optimizing a normalized index of fit P, and thus circumvent the difficulties of Messick and Abelson's procedure, for reasons described in the Discussion section.

Notation

Given $n$ stimuli ($n > 3$), let $S_{jk}$ be the dissimilarity between stimuli $j$ and $k$. Assume $S_{jk} \ge 0$, $S_{jk} = S_{kj}$, and $S_{jj} = 0$ for all $j$. Denote the matrix of dissimilarities by

$S = (S_{jk})$.

Following Torgerson [1958], define the matrix of scalar products with the origin at the centroid as

$B = (b_{jk})$

where

(1) $b_{jk} = \frac{1}{2}\left(S_{j\cdot}^2 + S_{\cdot k}^2 - S_{\cdot\cdot}^2 - S_{jk}^2\right)$,

a dot in a subscript denoting the mean of the squared dissimilarities over that index.

Let $\lambda_t$ ($t = 1, 2, \cdots, n$) be the eigenvalues of $B$ in descending order of magnitude and $x_t$ the eigenvector associated with $\lambda_t$,

(2) $B x_t = \lambda_t x_t$.

Write the $j$-th element of $x_t$ as $x_{jt}$. Since $B$ is a doubly centered matrix, there exists an eigenvector $\mathbf{1}$, whose elements are all unity, associated with $\lambda_0$, an eigenvalue equal to zero.
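As a concrete illustration of (1) and (2), the following is a minimal Python/NumPy sketch (not part of the original paper; the small matrix S is invented). It uses the matrix identity $B = -\frac{1}{2} H S^{(2)} H$, where $S^{(2)}$ holds the squared dissimilarities and $H$ is the centralization matrix introduced later in the text, which is an equivalent way of writing (1).

```python
# Minimal numerical sketch of equations (1) and (2); S is an invented
# 4x4 dissimilarity matrix (symmetric, nonnegative, zero diagonal).
import numpy as np

S = np.array([[0., 2., 3., 4.],
              [2., 0., 2., 3.],
              [3., 2., 0., 2.],
              [4., 3., 2., 0.]])
n = len(S)

H = np.eye(n) - np.ones((n, n)) / n        # h_jk = delta_jk - 1/n
B = -0.5 * H @ (S ** 2) @ H                # scalar products, equation (1)

lam, X = np.linalg.eigh(B)                 # eigenpairs of B, equation (2)
lam, X = lam[::-1], X[:, ::-1]             # descending order of magnitude

# B is doubly centered, so the vector of ones is an eigenvector of B
# with eigenvalue zero.
assert np.allclose(B @ np.ones(n), 0.0)
```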

An Index of Fit

Now suppose that the largest $r$ eigenvalues are greater than $\epsilon\,(>0)$, some specified value. Then $B$ may approximately be expressed in the least-squares sense as $B \approx \hat{B}$, where the $(j, k)$ element of $\hat{B}$ is

(3) $\hat{b}_{jk} = \sum_{t=1}^{r} \lambda_t x_{jt} x_{kt}$.

The derived squared distance is given by

(4) $d_{jk}^2 = \sum_{t=1}^{r} \lambda_t (x_{jt} - x_{kt})^2$.

It should be stressed that the traditional procedure of metric MDS is equivalent to performing a least-squares (LS) approximation to $B$ [Eckart & Young, 1936]. In other words, in this procedure $S_{jk}$ is not directly approximated by $d_{jk}$; rather, $b_{jk}$ is directly approximated by $\hat{b}_{jk}$. Thus we can measure the goodness of fit with an index defined by

(5) $P = \sum_{j,k} \hat{b}_{jk}^2 \bigg/ \sum_{j,k} b_{jk}^2$.


Let $B$ have the decomposition $B = X \Lambda X'$, where $X$ is an orthogonal matrix of eigenvectors and $\Lambda$ a diagonal matrix of eigenvalues. Then $\operatorname{tr} BB' = \operatorname{tr} X \Lambda^2 X' = \operatorname{tr} \Lambda^2$; that is,

(6) $\sum_{j,k} b_{jk}^2 = \sum_{t=1}^{n} \lambda_t^2$.

From the same argument for $\hat{B}$, we also have

(7) $\sum_{j,k} \hat{b}_{jk}^2 = \sum_{t=1}^{r} \lambda_t^2$.

Hence $P$ is expressed in terms of eigenvalues,

(8) $P = \sum_{t=1}^{r} \lambda_t^2 \bigg/ \sum_{t=1}^{n} \lambda_t^2$.

Later we shall refer to the following relation,

(9) $\sum_{t=1}^{r} \lambda_t = \sum_{j} d_{0j}^2 = \frac{1}{2n} \sum_{j,k} d_{jk}^2$,

where $d_{0j}$ denotes the distance of point $j$ from the origin in the $r$-dimensional space. This relation follows from the definition of $B$ and the orthonormality of the eigenvectors.
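Continuing the earlier sketch (hypothetical helper names, not from the paper), the index (5)/(8) and relation (9) can be checked directly; the check assumes the top $r$ eigenvalues are positive so that coordinates $y_{jt} = \lambda_t^{1/2} x_{jt}$ exist.

```python
# Sketch: the index of fit P of equations (5)/(8), plus a numerical
# check of relation (9) for the derived r-dimensional configuration.
import numpy as np

def fit_index(B, r):
    """P = (sum of the r largest squared eigenvalues) /
           (sum of all squared eigenvalues), equation (8)."""
    lam = np.sort(np.linalg.eigvalsh(B))[::-1]
    return np.sum(lam[:r] ** 2) / np.sum(lam ** 2)

def check_relation_9(B, r):
    n = len(B)
    lam, X = np.linalg.eigh(B)
    lam, X = lam[::-1], X[:, ::-1]
    Y = X[:, :r] * np.sqrt(lam[:r])                 # coordinates y_jt
    d0_sq = np.sum(Y ** 2, axis=1)                  # squared distances from origin
    d_sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    assert np.isclose(lam[:r].sum(), d0_sq.sum())
    assert np.isclose(lam[:r].sum(), d_sq.sum() / (2 * n))
```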

The Change of Eigenvalues

Suppose that the original data $S_{jk}$ are interval-scaled and, without loss of generality, all nonnegative. We consider an additive constant $\alpha$ and re-scale $S_{jk}$ as

(10) $S_{jk}(\alpha) = S_{jk} + \alpha$.

Let $B(\alpha)$ be the $B$ matrix composed of the $S_{jk}(\alpha)$, and let $x_t(\alpha)$, $\lambda_t(\alpha)$, $P(\alpha)$ correspond to $x_t$, $\lambda_t$, $P$, respectively. Define a centralization matrix $H = (h_{jk})$ as

$h_{jk} = \delta_{jk} - \frac{1}{n}$,

where $\delta_{jk}$ is the Kronecker delta. Then $B(\alpha)$ is expressed as

(11) $B(\alpha) = B + C$,

where

(12) $C = -\alpha H' S H + \frac{\alpha^2}{2} H$.

Let us consider the influence of the additive constant on the eigenvalues of $B(\alpha)$. First, noting that both $B$ and $C$ in (11) are symmetric, we apply a theorem [e.g., Bellman, 1970, p. 117] and find that $\lambda_t(\alpha) \ge \lambda_t$ if $C$ is positive semidefinite. Since $H'H = H$, rewrite $C$ as

(13) $C = -\alpha H'(S - \beta I)H$,

where $\beta = \alpha/2$ and $I$ is the identity matrix of order $n$. As is well known, if $(S - \beta I)$ is positive (or negative) semidefinite, then so is $H'(S - \beta I)H$. Then $C$ becomes positive semidefinite either if $\beta \ge \mu_1$ when $\alpha > 0$, or if $\beta \le \mu_n$ when $\alpha < 0$. Here $\mu_t$ denotes the $t$-th eigenvalue of $S$ in descending order. Thus we have found that

$\lambda_t(\alpha) \ge \lambda_t \quad$ if $\alpha \ge 2\mu_1$.
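The following continuation of the sketch (again hypothetical, not from the paper) verifies (11) and (12) numerically and illustrates the bound just derived. Note that the constant is added only to the off-diagonal dissimilarities, so that $S_{jj}(\alpha) = 0$ is preserved, which is what makes the $\alpha^2 H / 2$ term in (12) appear.

```python
# Numerical check of (11)-(12) and of lambda_t(alpha) >= lambda_t for
# alpha >= 2*mu_1; S and H as in the first sketch.
import numpy as np

S = np.array([[0., 2., 3., 4.],
              [2., 0., 2., 3.],
              [3., 2., 0., 2.],
              [4., 3., 2., 0.]])
n = len(S)
H = np.eye(n) - np.ones((n, n)) / n

def B_of(alpha):
    S_a = S + alpha * (1 - np.eye(n))      # add alpha off the diagonal only
    return -0.5 * H @ (S_a ** 2) @ H

alpha = 2 * np.linalg.eigvalsh(S).max()            # alpha = 2*mu_1
C = -alpha * H @ S @ H + (alpha ** 2 / 2) * H      # equation (12)
assert np.allclose(B_of(alpha), B_of(0.0) + C)     # equation (11)

lam_old = np.sort(np.linalg.eigvalsh(B_of(0.0)))
lam_new = np.sort(np.linalg.eigvalsh(B_of(alpha)))
assert np.all(lam_new >= lam_old - 1e-10)          # no eigenvalue decreases
```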

The result of Lancaster [1964] is more useful in discussing the change of eigenvalues. He investigated the behavior of the eigenvalues of matrices whose elements are analytic functions of a parameter. Among the theorems he presented, Theorem 5 treats the derivatives of simple eigenvalues with respect to the parameter, and Theorem 7 treats those of multiple eigenvalues. As regards metric MDS, it is practical, except for artificial data, to assume that the eigenvalues are distinct. Under this assumption we apply Lancaster's Theorem 5 to $B(\alpha)$.

According to the theorem, the derivative of $\lambda_t(\alpha)$ with respect to $\alpha$, evaluated at $\alpha = \alpha_0$, is given by

(14) $\frac{d\lambda_t}{d\alpha}(\alpha_0) = x_t'(\alpha_0)\, \frac{dB}{d\alpha}(\alpha_0)\, x_t(\alpha_0)$.

From (11) and (12),

(15) $\frac{dB}{d\alpha}(\alpha) = -H'SH + \alpha H$.

Since $x_t(\alpha)$ is an eigenvector of the doubly centered matrix $B(\alpha)$, we find that for $\lambda_t(\alpha) \ne 0$,

$\mathbf{1}' x_t(\alpha) = 0$

from the orthogonality of eigenvectors. Then for the centralization matrix $H$ we can write

(16) $H x_t(\alpha) = x_t(\alpha)$

for $x_t(\alpha)$ associated with $\lambda_t(\alpha) \ne 0$. Thus we have obtained, for the nonzero eigenvalues,

(17) $\frac{d\lambda_t}{d\alpha}(\alpha_0) = -x_t'(\alpha_0)(S - \alpha_0 I)\, x_t(\alpha_0) = -x_t'(\alpha_0)\, S\, x_t(\alpha_0) + \alpha_0$.

Obviously the eigenvalue equal to zero, $\lambda_0$, does not change. Equation (17) implies that if $\alpha_0 > \mu_1$, increasing $\alpha$ from $\alpha_0$ will cause all the eigenvalues, except $\lambda_0$, to increase. We restate the change of the nonzero eigenvalues as:

(a) If $\alpha_0 \ge 2\mu_1$, then $\lambda_t(\alpha_0) \ge \lambda_t$ and $\frac{d\lambda_t}{d\alpha}(\alpha_0) > 0$.

(b) If $\alpha_0 \ge \mu_1$, then $\frac{d\lambda_t}{d\alpha}(\alpha_0) > 0$.
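A quick finite-difference check of (17), under the same distinct-eigenvalue assumption as the text (a sketch with invented data, not the paper's computation):

```python
# Finite-difference check of equation (17) at alpha_0 = mu_1; by (b),
# the derivative of each nonzero eigenvalue should be positive there.
import numpy as np

S = np.array([[0., 2., 3., 4.],
              [2., 0., 2., 3.],
              [3., 2., 0., 2.],
              [4., 3., 2., 0.]])
n = len(S)
H = np.eye(n) - np.ones((n, n)) / n

def B_of(alpha):                                   # as in the earlier sketch
    S_a = S + alpha * (1 - np.eye(n))
    return -0.5 * H @ (S_a ** 2) @ H

a0, eps = np.linalg.eigvalsh(S).max(), 1e-6        # alpha_0 = mu_1
x1 = np.linalg.eigh(B_of(a0))[1][:, -1]            # eigenvector of largest eigenvalue

analytic = -x1 @ S @ x1 + a0                       # equation (17)
numeric = (np.linalg.eigvalsh(B_of(a0 + eps)).max()
           - np.linalg.eigvalsh(B_of(a0 - eps)).max()) / (2 * eps)

assert np.isclose(analytic, numeric, atol=1e-5)
assert analytic > 0                                # statement (b)
```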

Bounds on $\mu_1$ are given by

(18) $0 < \frac{1}{n} \sum_{j,k} S_{jk} \le \mu_1 \le \left(\sum_{j,k} S_{jk}^2\right)^{1/2}$.

For any unit vector $u$, $u'Su \le \mu_1$ [e.g., Graybill, 1969, p. 309]. Taking $u = (n^{-1/2}, \cdots, n^{-1/2})'$ gives the left part of (18). Further, by the same argument used to obtain (6), we find

(18a) $\sum_{t=1}^{n} \mu_t^2 = \sum_{j,k} S_{jk}^2$.


Hence the right part of (18). In passing, it may be of interest to note that

$\sum_{t=1}^{n} \mu_t^2 = 2n \sum_{t=1}^{n} \lambda_t$.

This relation is obtained by combining (18a) with the following equation,

$\sum_{t=1}^{n} \lambda_t = \operatorname{tr} B = \frac{1}{2n} \sum_{j,k} S_{jk}^2$.
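These bounds and trace relations are easy to confirm numerically; the following sketch (invented data, as before) does so:

```python
# Check of the bounds (18), of (18a), and of the trace relation above.
import numpy as np

S = np.array([[0., 2., 3., 4.],
              [2., 0., 2., 3.],
              [3., 2., 0., 2.],
              [4., 3., 2., 0.]])
n = len(S)
H = np.eye(n) - np.ones((n, n)) / n

mu = np.sort(np.linalg.eigvalsh(S))[::-1]
lam = np.linalg.eigvalsh(-0.5 * H @ (S ** 2) @ H)

assert 0 < S.sum() / n <= mu[0] <= np.sqrt((S ** 2).sum())  # bounds (18)
assert np.isclose((mu ** 2).sum(), (S ** 2).sum())          # (18a)
assert np.isclose((mu ** 2).sum(), 2 * n * lam.sum())       # trace relation
```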

A New Procedure for the Additive Constant Problem

We will now present a new formulation of the additive constant problem. De Leeuw [Note 1] also mentioned this formulation, but no details were given. Write the index of fit $P$ as a function of the constant $\alpha$,

(19) $P(\alpha) = \sum_{t=1}^{r} \lambda_t^2(\alpha) \bigg/ \sum_{t=1}^{n} \lambda_t^2(\alpha)$.

If $P(\alpha)$ is assumed to be a unimodal and convex function in the range of interest to the investigator, maximizing $P(\alpha)$ will yield the best LS solution to $B(\alpha)$. Accordingly, the maximization problem will be handled with a gradient method: an iterative method which computes the derivative of $P(\alpha)$ on each iteration in order to determine the change of $\alpha$.

The Derivative

Denote the derivative of $P(\alpha)$ by $P'(\alpha)$, and the value of $\alpha$ at the $i$-th iteration by $\alpha_i$. $P'(\alpha)$ is computed by

(20) $P'(\alpha) = \dfrac{2\left(\sum_{t=1}^{r} \lambda_t(\alpha) \frac{d\lambda_t}{d\alpha}(\alpha) \;-\; P(\alpha) \sum_{t=1}^{n} \lambda_t(\alpha) \frac{d\lambda_t}{d\alpha}(\alpha)\right)}{\sum_{t=1}^{n} \lambda_t^2(\alpha)}$.

Noting here that

(21) $\sum_{t=1}^{n} \lambda_t^2(\alpha) = \sum_{j,k} b_{jk}^2(\alpha)$,

we then have

(22) $\sum_{t=1}^{n} \lambda_t(\alpha) \frac{d\lambda_t}{d\alpha}(\alpha) = \sum_{j,k} b_{jk}(\alpha) \frac{db_{jk}}{d\alpha}(\alpha)$.

Further, the right-hand sides of (21) and (22) are computed by (11) and (15), respectively. These relations indicate that the eigenvalues, their derivatives, and the corresponding eigenvectors are required in (20) only for the largest $r$ eigenvalues. Therefore, to obtain $P'(\alpha)$, it is necessary to solve the eigenequation on each iteration,

(23) $B(\alpha_i)\, x_t(\alpha_i) = \lambda_t(\alpha_i)\, x_t(\alpha_i) \qquad (t = 1, 2, \cdots, r)$.
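The computation (20) through (23) can be written compactly. The sketch below (hypothetical function name) uses a full eigendecomposition for simplicity, although, as just noted, only the $r$ dominant eigenpairs are actually needed: the complete sums over $t$ in (21) and (22) come directly from the matrices $B(\alpha)$ and $dB/d\alpha$.

```python
# Sketch of the derivative computation (20)-(23).
import numpy as np

def P_and_Pprime(S, alpha, r):
    n = len(S)
    H = np.eye(n) - np.ones((n, n)) / n
    S_a = S + alpha * (1 - np.eye(n))
    B = -0.5 * H @ (S_a ** 2) @ H               # B(alpha)
    dB = -H @ S @ H + alpha * H                 # equation (15)

    lam, X = np.linalg.eigh(B)                  # all pairs for simplicity;
    lam, X = lam[::-1], X[:, ::-1]              # only the top r enter (20), cf. (23)
    dlam = np.array([X[:, t] @ dB @ X[:, t] for t in range(r)])   # equation (14)

    denom = np.sum(B ** 2)                      # equation (21)
    sum_all = np.sum(B * dB)                    # equation (22)
    P = np.sum(lam[:r] ** 2) / denom
    P_prime = 2.0 * (lam[:r] @ dlam - P * sum_all) / denom        # equation (20)
    return P, P_prime
```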

The Algorithm

The maximization algorithm includes two phases. The first phase is to find a closed interval $I_0$ in which the maximal point of $P(\alpha)$ lies. We start with a step size $\Delta\alpha$ and an initial estimate $\alpha_0$. Change $\alpha$ by the fixed size $\Delta\alpha$ on each iteration, or by the variable size $i \cdot \Delta\alpha$ on the $i$-th iteration, until we find an interval $I_0 = [\alpha_l, \alpha_u]$ such that $P'(\alpha_l) > 0$ and $P'(\alpha_u) < 0$. After finding $I_0$, the second phase, the Bolzano search, begins. On each iteration we halve the interval by evaluating $P'(\alpha)$ at its midpoint and utilizing the information about $P'(\alpha)$ from past iterations. During the iterations of either phase, the search terminates whenever $|P'(\alpha)|$ becomes smaller than some critical value $\delta$. Under the assumption of a unimodal and convex function, repeating the iterative process will give the maximal point of $P(\alpha)$.
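A sketch of the two-phase search follows, assuming the hypothetical P_and_Pprime helper above and the fixed-step variant of the first phase. With the paper's data and settings, such a routine would retrace runs like those reported in Tables 1 and 2 below, with step and delta playing the roles of $\Delta c$ and $\delta$.

```python
# Two-phase maximization sketch: phase 1 brackets the maximum by
# stepping until P' changes sign; phase 2 is the Bolzano search.
def maximize_P(S, r, alpha0=0.0, step=0.2, delta=1e-5, max_iter=100):
    a = alpha0
    _, g = P_and_Pprime(S, a, r)
    if abs(g) < delta:
        return a
    d = 1.0 if g > 0 else -1.0                 # step uphill
    # Phase 1: find I0 = [lo, hi] with P'(lo) > 0 and P'(hi) < 0.
    while True:
        b = a + d * step
        _, gb = P_and_Pprime(S, b, r)
        if abs(gb) < delta:
            return b
        if gb * g < 0:                         # sign change: maximum bracketed
            lo, hi = (a, b) if d > 0 else (b, a)
            break
        a, g = b, gb
    # Phase 2: Bolzano search, bisecting on the sign of P' at the midpoint.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        _, gm = P_and_Pprime(S, mid, r)
        if abs(gm) < delta:
            break
        lo, hi = (mid, hi) if gm > 0 else (lo, mid)
    return mid
```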

The Initial Estimate

To get $\alpha_0$, the following treatments are suggested. One is to set $\alpha_0 = 0$. Another is the one-dimensional subspace scheme described in Torgerson [1952],

$\alpha_0 = \max_{j,k,l} \left( S_{jl} - S_{jk} - S_{kl} \right)$.

According to (b), we can regard $\mu_1$ as a critical value in the change of the eigenvalues as functions of the additive constant. Then, as a third treatment, we set $\alpha_0$ equal to $\mu_1$. Since the exact value of $\mu_1$ is not required for the present purpose, we may use the upper bound of $\mu_1$ given by (18). Concerning the step size used in the first phase, an arbitrary value should be selected after examining the range of the dissimilarity data.
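The three suggested starting values are simple to compute; a sketch (function name illustrative only):

```python
# The three suggested initial estimates for alpha_0.
import numpy as np

def initial_estimates(S):
    n = len(S)
    a_zero = 0.0                                    # first treatment
    a_torg = max(S[j, l] - S[j, k] - S[k, l]        # one-dimensional
                 for j in range(n)                  # subspace scheme
                 for k in range(n)
                 for l in range(n))
    a_upper = np.sqrt(np.sum(S ** 2))               # upper bound of mu_1, (18)
    return a_zero, a_torg, a_upper
```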

Numerical Examples

Experimental computation with our procedure has been performed successfully on several data sets. Shown here are results for Torgerson's Munsell color data [Torgerson, 1958, p. 286, Table 5], obtained by setting $r$ equal to two. We use $c$ to denote the additive constant for the original data, which include negative values. Table 1 presents the result of setting $c_0 = 2.37$ (i.e., $\alpha_0 = 0$). The initial value was selected so that the smallest dissimilarity would be zero. Table 2 shows a result very close to that in Table 1, obtained by setting $c_0 = 3.93$, which was given by the one-dimensional subspace scheme. Finally we used a large initial value, $c_0 = 20.78$, which is an upper bound of $\mu_1$. Corresponding to this $c_0$, $100P_0 = 52.52$. Applying the iterative procedure with $\Delta c = 4.0$ yielded $100P = 98.9479$ at $c = 3.2148$ in 16 iterations. Throughout these cases the convergence criterion $\delta$ was set equal to $10^{-5}$. On the basis of these results, we conclude that $c$ is estimated to be 3.215, at which $P$ attains its highest value. For comparison, applying Messick and Abelson's procedure [1956] to the same data yielded $100P = 98.9474$ at $c = 3.2380$.

Discussion

Assumptions

We used two assumptions to develop the procedure for maximizing $P(\alpha)$. The first assumption, from which the expression for $P'(\alpha)$ in (20) follows, is that the eigenvalues are all distinct. This assumption seems strong, but it is still practical for real data. If desired, the procedure could be generalized to cases with multiple eigenvalues by utilizing Lancaster's Theorem 7 [1964]. The other assumption, that $P(\alpha)$ is unimodal and convex, is not generally realistic. The assumption is imposed for practical purposes, as is customarily done with methods of successive approximation. Even if the assumption does not hold, our procedure will at least yield a local maximal point of $P(\alpha)$. As Messick and Abelson [1956] stated, the investigator must consider the additive constant when he is interested in an exact analysis rather than an exploratory study. In such situations the positive semidefiniteness of the $B(\alpha)$ matrix might be violated, but the negative eigenvalues would not be far from zero; hence employment of the second assumption would be practical.

TABLE 1

Result with Torgerson's Data (1)*

  i      c        100P       P'
  0    2.3700    98.0078     .023335
  1    2.5700    98.4123     .017192
  2    2.7700    98.6983     .011479
  3    2.9700    98.8738     .006124
  4    3.1700    98.9455     .001092
  5    3.3700    98.9196    -.003627
  6    3.2700    98.9443    -.001306
  7    3.2200    98.9479    -.000117
  8    3.1950    98.9474     .000485
  9    3.2075    98.9478     .000184
 10    3.2138    98.9479     .000033
 11    3.2169    98.9479    -.000042
 12    3.2153    98.9479    -.000004

*Using $c_0 = 2.37$ and $\Delta c = 0.2$ in the first phase.

Algorithms

In the second phase of the algorithm, after we have found an interval that includes the turning point, a unidimensional search procedure such as the Fibonacci search or the golden section search might be applied under an assumption of unimodality. (One of the reviewers pointed out this possibility.) Such a search would be performed using only the function values, i.e., without $P'(\alpha)$. Let $R_m$ be the ratio $I_m / I_0$, where $I_0$ is the initial interval and $I_m$ is the interval after $m$ evaluations. In the Bolzano search, $R_m = (1/2)^m$, which becomes $1/1024$ for $m = 10$. In the Fibonacci search, $R_m = 1/F_m$, where $F_m$ is the $m$-th Fibonacci number, and $R_m$ becomes $1/55$ for $m = 10$. In the golden section search, $R_m = \left((5^{1/2} - 1)/2\right)^m$.

TABLE 2

Result with Torgerson's Data (2)*

  i      c        100P       P'
  0    3.9300    98.3815    -.015179
  1    3.5300    98.8329    -.007178
  2    3.1300    98.9391     .002073
  3    3.3300    98.9323    -.002708
  4    3.2300    98.9477    -.000356
  5    3.1800    98.9464     .000849
  6    3.2050    98.9478     .000244
  7    3.2175    98.9479    -.000057
  8    3.2113    98.9479     .000093
  9    3.2144    98.9479     .000018
 10    3.2159    98.9479    -.000019
 11    3.2152    98.9479    -.000001

*Using $c_0 = 3.93$ and $\Delta c = 0.4$ in the first phase.

Since $\left((5^{1/2} - 1)/2\right)^m > (1/2)^m$, the reduction of the interval is more effective with the Bolzano search than with the Fibonacci or the golden section search, as long as we can utilize $P'(\alpha)$ (see the sketch below). In addition, the present algorithm with the simple gradient method could be improved by refined gradient methods. In the present paper, however, our primary concern is to show that the maximization problem of $P$ can be solved with a gradient method. Any refinement of the method may be a subject for future investigation.
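The ratios quoted above are easy to tabulate (illustrative arithmetic only):

```python
# Interval-reduction ratios R_m after m evaluations for the three searches.
from math import sqrt

def reduction_ratios(m):
    fib = [1, 1]                       # F_1, F_2, ...
    while len(fib) < m:
        fib.append(fib[-1] + fib[-2])
    bolzano = 0.5 ** m
    fibonacci = 1.0 / fib[m - 1]       # 1 / F_m
    golden = ((sqrt(5.0) - 1.0) / 2.0) ** m
    return bolzano, fibonacci, golden

# For m = 10: (1/1024, 1/55, about 0.0081); the Bolzano ratio is smallest.
print(reduction_ratios(10))
```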

Comparison with Messick & Abelson's Procedure

With both the present procedure and Messick and Abelson's procedure, we must obtain the $r$ dominant eigenvalues and eigenvectors. With Messick and Abelson's procedure, we then solve a quadratic equation in the additive constant. With our procedure we need the derivatives of the eigenvalues instead of the solution of the quadratic equation.


Let SUMA be the sum of the largest $r$ eigenvalues and SUMB the sum of the remaining ones. In Messick and Abelson's formulation, SUMA is regarded as a measure of fit of the MDS model to the data; therefore, in their formulation, a solution making SUMB zero is sought. Now we can suppose cases in which SUMB = 0 and yet some of the negative eigenvalues are not negligible. For such cases the MDS model will not show a good fit, and so working with SUMA will not be suitable. (In selecting a root of the quadratic, Messick and Abelson suggested taking the root which gives the largest SUMA. Referring to (9), we see that SUMA is proportional to the sum of squared distances from the origin, or to the sum of squared interpoint distances. In other words, SUMA reflects the degree of divergence of the configuration. Thus their suggestion implies that the configuration derived with their suggested root is more divergent than that with the other root.)

In the present formulation, the normalized measure of fit $P(\alpha)$ is maximized. Since $P(\alpha)$ is expressed in terms of squared eigenvalues, it is more sensitive to large negative eigenvalues than SUMA. Maximizing $P(\alpha)$ thus covers the defect of working with SUMA to a great extent. It should be noted, however, that the maximization will not necessarily make the remaining roots zero. Hence there might be cases where the present procedure becomes ineffective because of large negative eigenvalues. In such a case it will turn out, after performing the procedure, that embedding the stimuli in a euclidean space is not appropriate.

REFERENCE NOTE

1. De Leeuw, J. Finding a positive semidefinite matrix of prescribed rank r in a nonlinear differentiable manifold (Technical Report). Murray Hill, NJ: Bell Laboratories, unpublished.

REFERENCES

Bellman, R. Introduction to matrix analysis. New York: McGraw-Hill, 1970.
Cooper, L. G. A new solution to the additive constant problem in metric multidimensional scaling. Psychometrika, 1972, 37, 311-322.
Eckart, C. & Young, G. The approximation of one matrix by another of lower rank. Psychometrika, 1936, 1, 211-218.
Graybill, F. A. Introduction to matrices with applications in statistics. Belmont: Wadsworth, 1969.
Lancaster, P. On eigenvalues of matrices dependent on a parameter. Numerische Mathematik, 1964, 6, 377-387.
Messick, S. J. & Abelson, R. P. The additive constant problem in multidimensional scaling. Psychometrika, 1956, 21, 1-15.
Torgerson, W. S. Multidimensional scaling: I. Theory and method. Psychometrika, 1952, 17, 401-419.
Torgerson, W. S. Theory and methods of scaling. New York: John Wiley, 1958.

Manuscript received 9/9/76.
First revision received 5/23/77.
Final version received 12/21/77.