
SPECIAL MONOGRAPH SUPPLEMENT

CONFIGURAL ANALYSIS AND PATTERN RECOGNITION*

PAUL HORST

University of Washington

Clinical psychologists have long recognized that in attempting to summarize

and interpret a client's responses to a number of personality items it is not enough merely to count the number of items responded to correctly according to some predetermined scoring key. Zubin was among the first to emphasize the importance of considering specific patterns or configurations of responses rather than a mere sum or even a weighted sum of these responses. Meehl was the first to demonstrate in precise quantitative terms that one might have very low zero order correlations for each of two items with a dichotomous criterion for schizophrenic and normal Ss and yet have appreciable positive validity from a configural scoring of the items. This phenomenon has come to be called "Meehl's paradox." Horst(3)

showed that “Meehl’s paradox” was a special case of linear combinations of products of item scores.
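As a minimal numerical illustration (my own construction, not from the paper), consider two dichotomous items in a balanced fourfold design. Each item taken alone has exactly zero correlation with the criterion, yet the configural score "the two items agree" predicts the criterion perfectly; that configural score is itself expressible through the product term x1*x2, in line with the linear-combination-of-products interpretation:

```python
import numpy as np

# A minimal dataset exhibiting Meehl's paradox: two dichotomous items,
# each with zero zero-order validity, but perfect configural validity.
x1 = np.array([0, 0, 1, 1])
x2 = np.array([0, 1, 0, 1])
y  = np.array([1, 0, 0, 1])              # criterion: 1 iff the items agree

r1 = np.corrcoef(x1, y)[0, 1]            # zero-order validity of item 1
r2 = np.corrcoef(x2, y)[0, 1]            # zero-order validity of item 2
configural = (x1 == x2).astype(float)    # configural score on the pattern
rc = np.corrcoef(configural, y)[0, 1]    # configural validity
```

Here r1 and r2 are both zero while rc is unity; the configural score equals 1 - x1 - x2 + 2*x1*x2, a linear combination of item scores and their product.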

A great deal of empirical work within the area of pattern analysis has been carried out since the appearance of Meehl's article. Among the most significant might be mentioned the work of McQuitty(6). Perhaps the most definitive attempt to reduce configural analysis to precise mathematical formulation for the case of dichotomous variables is the work of Lubin and Osburn(4). In recent years, the work of my own students indicates typical developments in the area of configural and pattern analysis. It appears, however, that in order better to understand the possibilities and limitations of configural analysis it may be useful to attempt a more general mathematical development than has thus far been available.

A. Mathematical Functions and Prediction

Before discussing the mathematical formulations of configural prediction, it

may be useful first to consider some introductory concepts fundamental to scientific prediction in general.

In any scientific prediction endeavor, we typically have measures for a sample of entities on a number of attributes or variables. The variables we wish to predict are the criterion variables, and those from which we predict are the predictor variables. Using this sample of entities which has measures on both the predictor and criterion variables, we develop objective procedures for estimating criterion measures from predictor measures for entities having only predictor measures. It should be noted that no restrictions are necessarily placed on the type of measures. They may be dichotomous, discrete, continuous, etc.

There is evidence of considerable vagueness and loose thinking in discussions of pattern and configural analysis. Essentially, however, configural prediction begins by considering the distinct patterns of measures in the sample for the predictor variables. For each pattern, the average criterion measure for all members of the sample having that pattern of predictor measures is determined. This value is then taken as the estimate of the criterion measure for subsequent entities having

*This study was supported in part by Office of Naval Research Contract Nonr477(33) and Public Health Research Grant MH00743-08. Reproduction in whole or part is permitted for any purpose of the United States Government.

384 PAUL HORST

that pattern of predictor measures but for which criterion measures are not available. For the case of dichotomous measures, Lubin and Osburn(4) have called the set of mean criterion values corresponding to the distinct answer patterns, the configural scale.
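The configural scale can be sketched directly. The following is my own minimal implementation, not code from the paper: for each distinct pattern of predictor measures in the sample, it records the mean criterion value of the entities showing that pattern.

```python
import numpy as np

# Sketch of the configural scale of Lubin and Osburn: a mapping from each
# distinct predictor pattern to the mean criterion value for that pattern.
def configural_scale(X, y):
    """Map each distinct row-pattern of X to the mean of y over that pattern."""
    scale = {}
    for pattern in {tuple(row) for row in X}:
        mask = np.all(X == np.array(pattern), axis=1)
        scale[pattern] = y[mask].mean()
    return scale

X = np.array([[0, 0], [0, 0], [0, 1], [1, 1]])   # predictor patterns
y = np.array([1.0, 3.0, 4.0, 5.0])               # criterion measures
scale = configural_scale(X, y)
```

A new entity with pattern (0, 0) would then be assigned the estimate scale[(0, 0)], the average criterion value of the two sample members sharing that pattern.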

A more common procedure for estimating criterion measures from predictor measures is based on the use of mathematical functions. One or more variables may be expressed as mathematical functions of other variables. When a mathematical function is used as a basis for estimating criterion variables, the estimates of the criterion variables are expressed as mathematical functions of the predictor variables.

A mathematical function consists of constants or parameters, and of variables. When it is used to estimate criterion values from predictor variables for a number of entities, the constants have the same values for each of the entities, while the variables, of course, may have a different set of values for each of the entities. The problem is to determine the constants or parameters of the function on the basis of the known values of the criterion and predictor variables for the entities in a sample so that these parameters, together with the known values of predictor variables, may be used to estimate unknown values of criterion variables for other entities.

We may usefully classify such mathematical functions into one of four types as follows:

Type A. Functions which are linear in the parameters and linear in the variables.

Type B. Functions which are linear in the parameters and nonlinear in the variables.

Type C. Functions which are nonlinear in the parameters and linear in the variables.

Type D. Functions which are nonlinear in both the parameters and the variables.

A special case of a type B function which is linear in the parameters but not in the variables is called a multivariate polynomial function. Each term of the function is the product of a number of factors. One of the factors in a term is a parameter or constant and the others are nonnegative integral powers of the variables. In one or more of the terms, at least one of the variable factors must have an exponent greater than unity, or at least two of the variable factors must have exponents greater than zero. If in every term at most one variable factor has an exponent greater than zero, and that exponent is not greater than unity, then we have a special case of a multivariate polynomial function which degenerates to a type A function, or a function which is linear in both the parameters and the variables.
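The practical consequence of linearity in the parameters is worth a sketch (my own example, with made-up coefficients): a type B function, though nonlinear in the variables, can be fitted by a closed least-squares solution once its product terms are laid out as columns of a design matrix.

```python
import numpy as np

# A type B function: linear in the parameters, nonlinear in the variables.
# Because the parameters enter linearly, ordinary least squares on the
# expanded design matrix recovers them in closed form.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 500))
y = 2.0 + 3.0 * x1 - 1.5 * x1 * x2 + 0.5 * x2**2   # assumed true function

# Columns: 1, x1, x2, x1*x2, x2^2 -- each term a parameter times a
# product of nonnegative integral powers of the variables.
T = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x2**2])
b, *_ = np.linalg.lstsq(T, y, rcond=None)
```

With noiseless data the recovered parameter vector b reproduces the generating constants exactly; a type C or D function would instead require iterative approximation.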

If the factor in a term of a type B function involves transcendental functions of the variables such as trigonometric or exponential, then these may often be expanded in series to yield polynomial functions with known coefficients, and thus reduce the general type B function to a multivariate polynomial function.

A scientific hypothesis can be expressed in terms of one of the four types of functions. To test a scientific hypothesis, one determines the values of the parameters in a mathematical function on the basis of a matrix of experimental data involving measures on both a criterion variable and one or more predictor variables, and finds how well the estimates of the criterion values for a sample of entities compare with the actual criterion values. In these few simple words, we blithely dispense with the troublesome problems of cross validation, degrees of freedom, and other unpleasant topics, even though these are currently major problems in the area of configural prediction.

It is enough to note here that the parameters in a type A or type B function are in general easier to determine than those in the type C and D functions. If


the functions are linear in the parameters, closed solutions are usually possible, while for functions nonlinear in the parameters, successive approximations are typically required and troublesome problems of convergence may be encountered.

We shall now see how configural prediction may be regarded as a special case of linear prediction of both type A and type B functions.

B. Configural Prediction as a Special Case of Linear Prediction

Suppose we have measures for a sample of v entities on a set of predictor

measures (x1 ... xn) and also a standardized criterion variable y. We assume that the possible values on xi run from 1 to mi. The number of possible distinct patterns of values for the x variables is indicated by p. If mi is 2 for all n variables, then it is known that p = 2^n.(4) This is the case where the measures are dichotomous.

If some of the mi are greater than 2, then the possible number of patterns, p, is less than 2^{Σmi}. It would be precisely this value if any combination of the possible score values for a given xi were permissible. Since an entity can have only one of the possible values, the total number of distinct value patterns will be much less. The precise number of distinct possible patterns can be expressed as a function of the mi (it is the product of the mi) but this is not directly relevant to our discussion.

Suppose now that we indicate by d1 ... dp the number of entities in the sample having each of the p possible answer patterns. We assume that each of the di is greater than zero. Obviously, we cannot differentiate on the basis of the xi values alone those entities which have the same pattern of x measures. Therefore, the best estimate of the criterion measure we can make for entities with a given score pattern is the average criterion value of those in the sample with that score pattern. We may indicate more formally what is going on. Let us regard each score pattern as a binary variable. If an entity has a given pattern, it is assigned a score of 1 on the pattern, otherwise a score of 0. Consider a v x p binary matrix L of such measures, and a vth order vector y of criterion measures of the v entities. We then have the conventional regression equation given by Eq. (1)


E = y - L B

and may determine the regression vector B to give the best least square estimate of the criterion y in the sample, from the dichotomous or binary pattern variables.

The solution for B is well known to be given by Eq. (2)

B = (L'L)^{-1} L'y

The multiple correlation squared of a criterion variable with a set of independent variables is also well known to be one less the residual variance. Therefore, we have Eq. (3)

R^2 = 1 - E'E / v

From Eqs. (1) and (2) we have Eq. (4)

E'E = y'y - y'L(L'L)^{-1} L'y

and since y is standardized, from Eqs. (3) and (4) we have Eq. (5)

R^2 = y'L(L'L)^{-1} L'y / v


If we let d be the pth order diagonal matrix of pattern frequencies, and ȳ the vector of criterion means corresponding to the p patterns, then because of the definition of L and Eq. (5) we get Eqs. (6) and (7)

d = L'L

ȳ = d^{-1} L'y

From Eqs. (5), (6), and (7) we get Eq. (8)

R^2 = ȳ'd ȳ / v

Therefore, the multiple correlation squared based on the binary pattern variables is simply the mean of the weighted sum of squares of the criterion means corresponding to the distinct patterns.

Now from Eqs. (1) and (2) we get Eq. (9)

E = y - L(L'L)^{-1} L'y

and from Eqs. (6), (7) and (9), we have Eq. (10)

E = y - Lȳ

Because of the definition of L, we see from Eq. (10) that E is simply a vector of the deviations of the criterion measures from the criterion means.

According to Eq. (3), therefore, the multiple correlation squared can also be expressed as one less the average squared deviation of the criterion values about the criterion means. This again is a generalization of the Lubin and Osburn(4) results. The relationships between these interpretations and a two-way analysis of variance are perhaps too obvious and trivial to merit more than passing notice. The implication that a correlation ratio or eta coefficient is a special case of a multiple correlation, where the matrix of independent variables is orthogonal and has binary measures, is also obvious but apparently not well known.
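The identity between the binary-pattern multiple correlation and the correlation ratio can be verified numerically. The sketch below uses my own simulated data, not data from the paper:

```python
import numpy as np

# Check of Eq. (8): with binary pattern indicators L, the squared multiple
# correlation of a standardized criterion equals the frequency-weighted
# mean square of the pattern means -- the eta coefficient squared.
rng = np.random.default_rng(1)
patterns = rng.integers(0, 3, size=40)           # 3 distinct patterns
y = rng.normal(size=40) + 0.8 * patterns
y = (y - y.mean()) / y.std()                     # standardize the criterion

L = np.eye(3)[patterns]                          # v x p binary matrix
B = np.linalg.solve(L.T @ L, L.T @ y)            # Eq. (2): regression vector
E = y - L @ B                                    # Eq. (1): residuals
R2 = 1 - (E @ E) / len(y)                        # Eq. (3)

d = L.T @ L                                      # Eq. (6): pattern frequencies
ybar = np.linalg.solve(d, L.T @ y)               # Eq. (7): criterion means
R2_patterns = ybar @ d @ ybar / len(y)           # Eq. (8)
```

The two R-squared values agree, and the regression vector B is exactly the vector of pattern means, as Eq. (7) requires.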

Next let us consider the estimate of the criterion variable as a multivariate polynomial function of the predictor variables x. Suppose we have given a p x n matrix x where the p row vectors are the distinct patterns of values for the x variables. Suppose also that we consider a set of p distinct product terms of the type given by Eq. (11)

Tj = Π_{i=1}^{n} xi^{jki}

where the jki are nonnegative integers. For simplicity, we further restrict the jki so that the double summation of the jki over i = 1 to n and j = 1 to p is a minimum. This means in particular that for one of the Tj, all the jki would be 0, and Tj would therefore be unity. We would also have the n cases where all the jki would be zero except one, and that one would be unity. Each of these n cases would obviously give Tj as one of the xi.

Next, suppose we construct a p x p matrix T where the ith row vector consists of the p distinct Tj values corresponding to the ith x value pattern. We assume that


the pth order T matrix is basic and therefore has a regular inverse. Intuitively, this assumption appears justified but no rigorous proof is presented.

We now let B be a pth order weighting vector and write Eq. (12)

T B = ȳ

where it will be recalled that ȳ is the vector of criterion means.

From Eq. (12) we have Eq. (13)

B = T^{-1} ȳ

From Eqs. (12) and (13), therefore, we see that B can be determined so that its minor product with the ith row vector of T is precisely the corresponding ȳi or criterion average for the corresponding x pattern. It follows therefore that a configural or pattern prediction can be expressed as a linear function of the T variables, or as a multivariate polynomial function. Here again is a generalization of the demonstration by Lubin and Osburn (1957) for binary variables.
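For two dichotomous items this can be made concrete (my own numbers, not the paper's): the p = 4 product terms 1, x1, x2, x1*x2 form a square basic matrix T over the four score patterns, and the weighting vector of Eq. (13) reproduces every pattern mean exactly.

```python
import numpy as np

# Configural prediction as a multivariate polynomial, Eq. (13): for the
# 4 patterns of two dichotomous items, B = T^{-1} ybar makes the
# polynomial T @ B reproduce each pattern's criterion mean exactly.
patterns = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
ybar = np.array([2.0, 4.0, 1.0, 5.0])            # criterion mean per pattern

x1, x2 = patterns[:, 0], patterns[:, 1]
T = np.column_stack([np.ones(4), x1, x2, x1 * x2])   # product terms Tj
B = np.linalg.solve(T, ybar)                     # Eq. (13): B = T^{-1} ybar
predictions = T @ B
```

The predictions coincide with the configural scale, so the pattern prediction and the polynomial prediction are one and the same function.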

Suppose next that we have given a v x p matrix M obtained from T by repeating each ith row di times, and the vth order vector of corresponding y measures.

This is then the matrix of the Tj values, and the vector of y measures, for the sample of v entities.

We consider then the residual vector E given by Eq. (14)

E = y - MB

The least square solution for B is given by Eq. (15)

B = (M'M)^{-1} M'y

But because of the definition of M it can readily be shown that M is given by Eq. (16)

M = L T

From Eqs. (15) and (16) we have Eq. (17)

B = (T'L'L T)^{-1} T'L'y

From Eqs. (6) and (7) in (17) we have Eq. (18)

B = (T'd T)^{-1} T'd ȳ

From Eq. (18) we have Eq. (19)

B = T^{-1} ȳ

which is the same as Eq. (13).


The multiple correlation is given by Eq. (20)

R^2 = y'M(M'M)^{-1} M'y / v

which because of Eqs. (6), (7), and (16) becomes Eq. (21)

R^2 = ȳ'd ȳ / v

and this is the same as Eq. (8).

Therefore, the regression vector and the multiple correlation for the multivariate polynomial type B function are precisely the same as for a type A function with binary pattern variables.

Suppose now that only λ of the possible p patterns of x values are represented in the sample. Then λ may be substituted for p in the preceding arguments, and the same conclusions would follow.

It follows, therefore, that the best configural prediction of the criterion y which can be made for a given sample from a set of predictor variables x is identical to the best least square prediction based on a linear combination of the variable terms of a multivariate polynomial function.

These results are again generalizations of the Lubin and Osburn (1957) results. They have shown for the case of n dichotomous variables, scored either +1 or -1, that p is given by Eq. (22)

p = 2^n

and have stated without proof the relations given by Eqs. (23) and (24)

T = T'

T'T = p I

They have stated further that in the case of polychotomous items, "the only change is that the number of possible answer patterns will change." A rigorous proof of Eqs. (23) and (24), and an extension to the general case, would be of interest.

We shall now turn our attention to a somewhat more general application of the multivariate polynomial function and its implications for multiple prediction.

C. Multivariate Transformation to a Multinormal Distribution

Suppose we have given a criterion variable y for a large sample of v entities

with an arbitrary frequency distribution. Let us assume that coefficients of a polynomial function u of y can be found such that the frequency distribution of u adequately approximates a normal distribution.

Suppose also we have given the predictor variable x1, with some arbitrary frequency distribution for the sample of v entities. Let us assume that coefficients of a polynomial function z1 of x1 can be found such that the bivariate frequency distribution of u and z1 can be adequately approximated by the binormal frequency distribution function.

Next let us consider the predictor variable x2 with some arbitrary frequency distribution. Let us assume that coefficients of a multivariate polynomial function


z2 of z1 and x2 can be found such that z2 is orthogonal to z1, and the trivariate frequency distribution of u, z1, and z2 can be adequately approximated by the trinormal frequency distribution function.

More generally, suppose we have given mutually orthogonal predictor variables z1 ... zj and a criterion variable u such that the multivariate frequency distribution of u and z1 to zj can be adequately approximated by the multinormal frequency distribution function. We assume then that for the predictor variable xj+1, coefficients of a multivariate polynomial function zj+1 of z1 to zj and xj+1 can be found such that zj+1 is orthogonal to each of the other zi, and the multivariate frequency distribution of u and z1 to zj+1 can be adequately approximated by the multinormal frequency distribution function.

Suppose now we begin by considering the criterion variable u and predictor variables x1 to xn with some arbitrary multivariate frequency distribution. If our assumptions are valid, it follows that coefficients of n different multivariate polynomial functions z1 to zn of x1 to xn can be found such that the zi are all mutually orthogonal, and the multivariate frequency distribution of u and z1 to zn can be adequately approximated by a multinormal frequency distribution function.

Before discussing the further implications of these assumptions let us present some justification for them. Consider first the assumption that for a variable y, coefficients of a polynomial function u of y can be found such that the frequency distribution of u for the sample of v entities can be adequately approximated by the normal distribution function. As a criterion of adequate approximation, we shall require that some specified number of the first moments of the frequency distribution of u shall be the same as those of a normal distribution with mean zero and variance unity.

It has been well known but sometimes forgotten that the even moments of the normal distribution function are given by Eq. (25)

μ_{2k} = (2k - 1)! / (2^{k-1} (k - 1)!)

and the odd moments of course vanish so that we have Eq. (26)

μ_{2k-1} = 0

for positive integral values of k.

Suppose we let my be the number of distinct score values in the vth order y vector, and assume without loss of generality that these run from 1 ... my. Let yv be a vector of these numbers and yf a vector of the relative frequencies in the y vector corresponding to the elements of yv.

We define a matrix yV by Eq. (27)

yV = (yv^{(1)}, yv^{(2)}, ..., yv^{(my)})

where the parenthetical exponents indicate that each column of yV is the yv vector with each element raised to the indicated power. We define a vector a of polynomial coefficients by Eq. (28)

a' = (a1, ..., amy)


We let yL be a v x my binary matrix indicating the score value of each of the v entities, so that we can write Eq. (29)

yL yv = y

Consider then Eq. (30)

u = yL yV a

where a is to be determined so that a specified number of the first moments of u are the same as for the normal distribution. We define a vector H by Eq. (31)

H = yV a

From Eqs. (30) and (31) we have Eqs. (32) and (33)

u = yL H

u^{(k)} = yL H^{(k)}

From the definition of moments, the kth moment of u is given by Eq. (34)

μk = 1' u^{(k)} / v

From Eq. (33) and (34) we have Eq. (35)

μk = 1' yL H^{(k)} / v

From the definition of yL the frequency vector is given by Eq. (36)

f' = 1' yL

From Eqs. (35) and (36) the kth moment is given by Eq. (37)

μk = f' H^{(k)} / v

The problem, then, is to determine the vector a so that the even moments are given by Eq. (38)

f' H^{(2k)} / v = (2k - 1)! / (2^{k-1} (k - 1)!)


and the odd moments by Eq. (39)

f' H^{(2k-1)} / v = 0

Ordinarily, the number of moments solved for would be less than my, and a least square solution would be required. In any case, of course, the solution for the elements of a would require iteration procedures, since Eqs. (38) and (39) are nonlinear in the elements of a.

The above indicates the feasibility of a polynomial transformation of y to u so that the elements of the latter approximate a distribution whose first i moments are the same as those of a normal distribution.
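The target normal moments appearing in Eqs. (25), (26), (38) and (39) can themselves be checked numerically. The sketch below is my own, using probabilists' Gauss-Hermite quadrature for the integrals:

```python
import numpy as np
from math import factorial, sqrt, pi

# Check of Eqs. (25) and (26): moments of the standard normal computed by
# Gauss-Hermite quadrature (weight exp(-x^2/2)) against the closed forms
#   mu_{2k} = (2k - 1)! / (2^{k-1} (k - 1)!)   and   mu_{2k-1} = 0.
nodes, weights = np.polynomial.hermite_e.hermegauss(30)
norm = sqrt(2 * pi)                      # the weights sum to sqrt(2*pi)

def normal_moment(j):
    return (weights @ nodes**j) / norm

even_ok = all(
    np.isclose(normal_moment(2 * k),
               factorial(2 * k - 1) / (2**(k - 1) * factorial(k - 1)))
    for k in range(1, 6)
)
odd_ok = all(np.isclose(normal_moment(2 * k - 1), 0.0) for k in range(1, 6))
```

With 30 nodes the quadrature is exact for all polynomials of degree up to 59, so the first ten moments (1, 0, 3, 0, 15, ...) are reproduced to machine precision.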

Next we wish to support the assumption that, given u and x1, we can find coefficients of a polynomial function z1 of x1 such that the bivariate frequency distribution of u and z1 can be adequately approximated by the binormal frequency function. As a criterion of adequate approximation we shall also require that the first i moments of the frequency distribution of z1 shall approximate the corresponding moments of the normal distribution. But if two variables are each normally distributed, their joint distribution function need not be binormal. The binormal frequency function of z1 and u is given by Eq. (40)

f(z1, u) = (2π)^{-1} (1 - r1^2)^{-1/2} exp[-(z1^2 - 2 r1 z1 u + u^2) / (2 (1 - r1^2))]

We must also assume therefore that product moment functions of the bivariate frequency distribution are equal to those of the binormal frequency function. This assumption is indicated by Eq. (41)

(1/v) Σ z1^i u^k = ∫∫ z1^i u^k f(z1, u) dz1 du

To develop justification for this assumption and those which follow it, as well as for subsequent developments, we shall therefore require evaluations of certain multiple definite integrals involving the multinormal distribution function.

First, let us consider a set of multinormally distributed variables V = (v1, ..., vn, u) with the vi all mutually orthogonal. We let r be a vector of the correlations of the vi with u, and define the supermatrix t by Eq. (42)

t = [ I, 0 ; r', (1 - r'r)^{1/2} ]


From Eq. (42) we have Eq. (43)

t^{-1} = [ I, 0 ; -(1 - r'r)^{-1/2} r', (1 - r'r)^{-1/2} ]

We define a matrix R by Eq. (44)

R = t t'

From Eqs. (42) and (43) we get Eq. (45)

R = [ I, r ; r', 1 ]

so that R is the matrix of intercorrelations of the v variables and u.

Next we define φ by Eq. (46)

φ = V' R^{-1} V

From Eqs. (44) and (46) we get Eq. (47)

φ = V' t'^{-1} t^{-1} V

We define new variables Z by Eq. (48)

Z = t^{-1} V

Then from Eq. (48) we get Eq. (49)

V = t Z

We partition the V and Z vectors as in Eqs. (50) and (51) respectively

V' = (v', u)

Z' = (z', w)


From Eqs. (43), (48), (50), and (51), we get Eqs. (52) and (53)


z = v

w = (1 - r'r)^{-1/2} (u - r'v)

From Eqs. (42), (49), (50), and (51), we get Eq. (54)

u = r'z + (1 - r'r)^{1/2} w

Consider next the multinormal frequency function given by Eq. (55)

f(V) = A e^{-φ/2}

where A is given by Eq. (56)

A = (2π)^{-(n+1)/2} |R|^{-1/2}

We define a generalized multivariate product moment by Eq. (57)

P(k1, ..., kn, k) = ∫...∫ z1^{k1} ... zn^{kn} u^k f(V) dz1 ... dzn du

We define the multinormal frequency function of the zi and w by Eq. (58)

f(Z) = (2π)^{-(n+1)/2} e^{-Z'Z/2}

From Eqs. (54), (57), and (58) we get Eq. (59)

P(k1, ..., kn, k) = ∫...∫ z1^{k1} ... zn^{kn} (r'z + (1 - r'r)^{1/2} w)^k f(Z) dz1 ... dzn dw


Next we define the scalar Ci by Eq. (60)

Ci = ri / (1 - r'r)^{1/2}

From Eqs. (59) and (60) we get Eq. (61)

P(k1, ..., kn, k) = (1 - r'r)^{k/2} ∫...∫ z1^{k1} ... zn^{kn} (C'z + w)^k f(Z) dz1 ... dzn dw

To simplify notation we define additional symbols by Eqs. (62), (63), and (64)

X = C'z

g = (1 - r'r)^{1/2}

K = z1^{k1} ... zn^{kn}

From Eqs. (61), (62), (63) and (64), we get Eq. (65)

P(k1, ..., kn, k) = g^k ∫...∫ K (X + w)^k f(Z) dz1 ... dzn dw

We can then write Eq. (66)

(X + w)^k = k! Σ_{j=0}^{k} X^{k-j} w^j / ((k - j)! j!)

Let us now define functions as in Eqs. (67) and (68)

f(z) = (2π)^{-n/2} e^{-z'z/2}

f(w) = (2π)^{-1/2} e^{-w^2/2}

so that f(Z) = f(z) f(w).

From Eqs. (58), (66), (67) and (68), we have Eq. (69)


P(k1, ..., kn, k) = g^k k! Σ_{j=0}^{k} [∫ w^j f(w) dw] [∫...∫ K X^{k-j} f(z) dz1 ... dzn] / ((k - j)! j!)

Now if j is odd we have Eq. (70)

∫ w^j f(w) dw = 0

and if j is even we get Eq. (71)

∫ w^j f(w) dw = (j - 1)! / (2^{j/2 - 1} (j/2 - 1)!)

We let j be given by Eq. (72)

j = 2 m

Then because of Eqs. (70), (71), and (72) we can write Eq. (73)

P(k1, ..., kn, k) = g^k k! Σ_{m=0}^{k'} [∫...∫ K X^{k-2m} f(z) dz1 ... dzn] / (2^m m! (k - 2m)!)

where k' = k/2 if k is even, and (k - 1)/2 if k is odd.

Obviously, the powers of X in the integrand of Eq. (73) can be expanded as

a sum of terms involving the appropriate multinomial coefficients and products of powers of the zi, so that the integrand itself becomes a sum of terms which are products of powers of the zi and a multinomial coefficient. The multiple integral of each term can then be expressed as a product of moments of single variables. Obviously, then, the multiple integral of all terms involving at least one odd power will vanish, and the others can be calculated by means of Eq. (71). The number of nonvanishing terms is of course calculable and doubtless small compared to the total number of terms.


Let us now return to the assumption given by Eq. (41). This can be expressed with Eq. (73) by letting n, the number of predictor variables, be 1. In this case we can write Eq. (74)

(1/v) Σ z1^i u^k = P(i, k)

and from Eqs. (63) and (73) we can write Eq. (75)

P(i, k) = (1 - r1^2)^{k/2} k! Σ_{m=0}^{k'} C1^{k-2m} [∫ z1^{i+k-2m} f(z1) dz1] / (2^m m! (k - 2m)!)

In particular, if k = 1 and m = 0, Eq. (75) can be written as Eq. (76)

P(i, 1) = r1 ∫ z1^{i+1} f(z1) dz1

If i is even Eq. (76) vanishes, and if it is odd, then because of Eqs. (60), (71), and (76) we have Eq. (77)

P(i, 1) = r1 i! / (2^{(i-1)/2} ((i - 1)/2)!)

In particular, if i is 1 we get Eq. (78)

P(1, 1) = r1

as obviously we should. For the general case in Eq. (75), if i + k is odd, we have Eq. (79)

P(i, k) = 0

If i + k is even, then from Eq. (71) we have Eq. (80)

∫ z1^{i+k-2m} f(z1) dz1 = (i + k - 2m - 1)! / (2^{(i+k)/2 - m - 1} ((i + k)/2 - m - 1)!)


Suppose now we define the scalar kGm by Eq. (81)

kGm = k! (i + k - 2m - 1)! / (2^{(i+k)/2 - 1} m! (k - 2m)! ((i + k)/2 - m - 1)!)

From Eqs. (75) and (81) we get Eq. (82)

P(i, k) = (1 - r1^2)^{k/2} Σ_{m=0}^{k'} C1^{k-2m} kGm

Then we define scalar quantities by Eqs. (83) thru (86)

a1 = 2^{(i+k)/2 - 1}

a2 = i + k - 2m - 1

a3 = k - 2m

a4 = (i + k)/2 - m - 1

Using Eqs. (83) through (86) in (81) we get Eq. (87)

kGm = a2! k! / (a1 m! a3! a4!)

and if m is increased by 1 we have Eq. (88)

kGm+1 = (a2 - 2)! k! / (a1 (m + 1)! (a3 - 2)! (a4 - 1)!)

From Eqs. (87) and (88) we have Eq. (89)

kGm+1 / kGm = a3 (a3 - 1) a4 / (a2 (a2 - 1) (m + 1))

From Eqs. (84) and (86) we have Eq. (90)

a2 - 1 = 2 a4



From Eqs. (89) and (90) we have Eq. (91)

kGm+1 = kGm a3 (a3 - 1) / (2 a2 (m + 1))

From Eqs. (84), (85), and (91) we have Eq. (92)

kGm+1 = kGm (k - 2m)(k - 2m - 1) / (2 (i + k - 2m - 1)(m + 1))

for k > 2m + 1. For k ≤ 2m + 1, Eq. (81) must be used.

In particular for m = 0 we have Eq. (93)

kG0 = (i + k - 1)! / (2^{(i+k)/2 - 1} ((i + k)/2 - 1)!)

Therefore if i + k is even, because of Eqs. (73), (80) and (81), we have Eq. (94)

P(i, k) = (1 - r1^2)^{k/2} Σ_{m=0}^{k'} C1^{k-2m} kGm

From Eqs. (60) and (94) we have Eq. (95)

P(i, k) = Σ_{m=0}^{k'} kGm r1^{k-2m} (1 - r1^2)^m

or from Eq. (95) we have Eq. (96)

P(i, k) = r1^k Σ_{m=0}^{k'} kGm (r1^{-2} - 1)^m

Suppose now we let z1 be a polynomial function of x1 as in Eq. (97)

z1 = 1b0 + 1b1 x1 + 1b2 x1^2 + ...

From Eqs. (78) and (96) we have Eq. (98)

(1/v) Σ z1^i u^k = r1^k Σ_{m=0}^{k'} kGm (r1^{-2} - 1)^m


From Eqs. (97) and (98) we may vary i and k and presumably solve for r1 and the coefficients in Eq. (97). Again the solution would not be simple, and problems of convergence and uniqueness would have to be investigated.

We next consider Eq. (99)

z2 = Σ_{ij} 2bij z1^i x2^j

where z2 is a multivariate polynomial in z1 and x2 and linear in the coefficients 2bi. We require that the trivariate frequency distribution in u, z1, z2 approximate adequately a trinormal frequency function in terms of the product moments P(k1, k2, k) and also that z2 be orthogonal to z1.

The solution for the coefficients in Eq. (99) would involve Eq. (73) for the case of n = 2, and would be considerably more involved than for the case of z1.

Similarly, we may proceed successively with the remaining variables and by means of Eq. (73) hopefully and eventually obtain a matrix of transformed measures z1 to zn and u such that the n + 1 multivariate frequency distribution adequately approximates a multinormal frequency function with the z measures mutually orthogonal. In the process we would also have solved for the n correlation coefficients ri of the zi with u.

Since the z variables are mutually orthogonal, it is well known that the square of the multiple correlation of u with the z variables is precisely the sum of the squares of the correlations of the zi with u. The regression vector is of course the vector of correlations, r' = (r1, ..., rn).
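This well-known property is easy to verify numerically. The sketch below uses my own simulated data with exactly orthogonalized, standardized predictors:

```python
import numpy as np

# With mutually orthogonal standardized predictors, the regression vector
# equals the vector of zero-order correlations and R^2 equals their sum
# of squares.
rng = np.random.default_rng(2)
raw = rng.normal(size=(100000, 3))
q, _ = np.linalg.qr(raw - raw.mean(axis=0))      # exactly orthogonal columns
z = q / q.std(axis=0)                            # rescale to unit variance
u = z @ np.array([0.5, 0.3, -0.2]) + rng.normal(size=100000) * 0.8
u = (u - u.mean()) / u.std()                     # standardized criterion

r = z.T @ u / len(u)                             # zero-order correlations
beta = np.linalg.solve(z.T @ z / len(u), r)      # regression vector
R2 = r @ beta                                    # squared multiple correlation
```

Because z'z/v is the identity, beta coincides with r and R2 with r'r, to machine precision.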

Next let us inquire whether we can generate binary configural or pattern variables whose multiple correlation with the criterion, or the configural validity, will be increased over that of the best linear combination of the z variables. We have seen earlier that such a multiple correlation can also be expressed in terms of a multivariate polynomial in the variables with optimally determined coefficients given by Eq. (19). Let us consider therefore the v x n matrix given by Eq. (100)

z = (z1, ..., zn)

and the vector given by Eq. (101)

s = z1^{(k1)} ... zn^{(kn)}

where the parenthetical exponents mean that each element of s consists of a product of powers of the zi.

We indicate a type 3 supervector by Eq. (102)

η = (z, s)

We consider also the residual vector given by Eq. (103)

ε = u - η β

where β is to be determined so as to minimize the minor product moment of ε. The solution is well known to be given by Eq. (104)

β = (η'η)^{-1} η'u


From Eqs. (102) and (104) we have Eq. (105)

β = [ z'z, z's ; s'z, s's ]^{-1} [ z'u ; s'u ]

Because of the definitions of z and u we have Eqs. (106) and (107)

z'z = I

z'u = r

Now the scalar elements of s' z are given by Eq. (108)

s'z = ( Σ z1^{k1+1} z2^{k2} ... zn^{kn}, Σ z1^{k1} z2^{k2+1} ... zn^{kn}, ..., Σ z1^{k1} ... zn^{kn+1} )

and for s'u we have Eq. (109)

s'u = Σ z1^{k1} ... zn^{kn} u

Using integrals instead of the summation in Eq. (109) we have Eq. (110)

s'u / v = ∫...∫ z1^{k1} ... zn^{kn} u f(z1, ..., zn, u) dz1 ... dzn du

where f is the multinormal distribution function of the zi and u. Now the right of Eq. (110) is of the same form as Eq. (57) but with k = 1.

We can therefore consider the special case of Eq. (73) with k-2m = 1 and we have Eq. (111)

s'u / v = g ∫...∫ K X f(z1 ... zn) dz1 ... dzn


From Eqs. (42), (43), (44) and (111) we have Eq. (112)


s'u / v = ∫...∫ z1^{k1} ... zn^{kn} (r1 z1 + ... + rn zn) f(z1 ... zn) dz1 ... dzn

Consider also the jth term of Eq. (108). This can be written as Eq. (113)

∫...∫ z1^{k1} ... zj^{kj+1} ... zn^{kn} f(z1 ... zn) dz1 ... dzn

which may be written as Eq. (114)

Gj = [∫ z1^{k1} f(z1) dz1] ... [∫ zj^{kj+1} f(zj) dzj] ... [∫ zn^{kn} f(zn) dzn]

Now Eq. (114) will be nonvanishing if and only if all the ki except kj are even. If, however, all the ki are even except kj, then all except the jth element in Eq. (108) must vanish. Therefore, at most, only one element in Eq. (108) can be distinct from zero.

Expanding Eq. (112) we can show similarly that, at most, only one term can be nonvanishing and this term will be the one corresponding to the nonvanishing term in Eq. (108). Using Eq. (71), it can be shown that the nonvanishing term in Eq. (112) is the same as the nonvanishing term in Eq. (108) multiplied by the corresponding correlation coefficient. We may therefore write Eqs. (115) and (116)

s'z = Gj ej'

s'u = Gj ej' r

where Gj is given by Eq. (114) and assumed to be the nonvanishing element, if any.


From Eqs. (105), (106), (107), (115) and (116) we may now write Eq. (117)

β = [ I, Gj ej ; Gj ej', s's ]^{-1} [ r ; Gj ej' r ]

But from Eq. (117) we can write Eq. (118)

[ I, Gj ej ; Gj ej', s's ] [ r ; 0 ] = [ r ; Gj ej' r ]

which enables us at once to write Eq. (119)

β = [ r ; 0 ]

We may readily generalize to any number of distinct product terms of the

type s in Eq. (101) so that we have a matrix of product elements given by Eq. (120)

S = (s1, s2, ...)

For each vector of S we would get a corresponding elementary vector e and its corresponding G scalar. We can then write Eqs. (115) and (116) respectively as Eqs. (121) and (122)

S'z = DG E'

S'u = DG E' r

where E' is a binary matrix with a single "1" in each row and DG is a diagonal matrix of G's. Then generalizing we can write Eq. (123)

β = [ I, E DG ; DG E', S'S ]^{-1} [ r ; DG E' r ]


which reduces at once to Eq. (124)

β = [ r ; 0 ]

The "0" on the right in Eq. (124) is now a subvector rather than the scalar in Eq.

It can be readily shown that the multiple correlation squared for any number of product terms is given by Eq. (125)

(119).

We have shown therefore that the higher order product terms have zero weight in the regression vector and add nothing to the multiple correlation.
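This zero-weight result can be verified numerically by solving the normal equations of Eqs. (117) and (118) directly. A minimal sketch (the numeric values of r, G_j, and s's below are arbitrary assumptions chosen for illustration, not taken from this development):

```python
import numpy as np

# Orthonormal predictors z1..z3: their correlation matrix is the identity.
n = 3
r = np.array([0.5, 0.3, 0.2])   # validities z'u (assumed values)
j = 1                            # index of the single nonvanishing element of s'z
G = 0.8                          # G_j, the one nonzero element of s'z (assumed)
s2 = 2.0                         # s's, second moment of the product term (assumed, > G**2)

e = np.zeros(n)
e[j] = 1.0                       # elementary vector e_j

# Normal equations for (z1, z2, z3, s), as in Eqs. (117)-(118).
C = np.block([[np.eye(n), (G * e)[:, None]],
              [(G * e)[None, :], np.array([[s2]])]])
v = np.concatenate([r, [G * r[j]]])   # s'u = G_j r_j, as in Eq. (116)

beta = np.linalg.solve(C, v)
print(beta)   # the regression vector is (r, 0): the product term gets zero weight
```

The key fact is that the covariance of s with the criterion, G_j r_j, is exactly what the z's already account for, so the partial weight of s vanishes whenever s's exceeds G_j squared.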

It should be noted that the proof of the nonpredictive value of higher order product terms does not necessarily depend on the multinormal character of the variables. To see this, we write Eq. (126)

$$ \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} z_1^{k_1} \cdots z_j^{k_j+1} \cdots z_n^{k_n}\, f_Z(z_1, \ldots, z_n)\, dz_1 \cdots dz_n = \int_{a_j}^{b_j} z_j^{k_j+1} f_j(z_j)\, dz_j \prod_{i \neq j} \int_{a_i}^{b_i} z_i^{k_i} f_i(z_i)\, dz_i $$

where the a's and b's are the lower and upper limits, respectively, of the distribution. Assume also we can write Eq. (127)

$$ s'u = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} \int_{a_w}^{b_w} z_1^{k_1} \cdots z_n^{k_n}\,(r'z + c_w w)\, f_Z(z_1, \ldots, z_n)\, f_w(w)\, dz_1 \cdots dz_n\, dw $$

If the z's and u have zero means, then Eq. (127) can be written as Eq. (128)

$$ s'u = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} z_1^{k_1} \cdots z_n^{k_n}\,(r_1 z_1 + \cdots + r_n z_n)\, f_Z(z_1, \ldots, z_n)\, dz_1 \cdots dz_n $$


Then because of Eqs. (108) and (126) we can write Eq. (129)

$$ s'z = (a_1, \ldots, a_n) = a' $$

and from Eqs. (109), (126) and (128) we can write Eq. (130)

$$ s'u = a'r $$

In general now s may be a matrix S of product function vectors, with a corresponding matrix A whose rows are the vectors a' of Eq. (129). We may then write the more general form of Eq. (118) as Eq. (131)

$$ \begin{bmatrix} \beta \\ \beta_S \end{bmatrix} = \begin{bmatrix} I & A' \\ A & S'S \end{bmatrix}^{-1} \begin{bmatrix} r \\ A r \end{bmatrix} $$

which, however, again yields Eq. (84) for β and Eq. (125) for R², both independent of the product functions.

To what extent this generalization frees us from the assumption of multinormality of the variables requires further investigation.

In any case we can state with certainty that if a set of n variables x can be transformed to a set of mutually orthogonal variables z, multinormal with u, then no higher order products, and therefore no configural variables, can add to the multiple correlation with u. Therefore, configural prediction cannot be useful with multinormally distributed variables. We have also shown that other classes of multivariate distributions exist for which configural prediction is not useful.
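The multinormal case can also be checked by simulation. The sketch below (sample size and regression weights are invented for illustration) fits a linear function with and without the product term z1·z2; with multinormal z's the product adds essentially nothing to the squared multiple correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000   # large sample, so the sample R^2 approximates its population value

# Mutually orthogonal (independent) standard normal predictors z1, z2, z3.
z = rng.standard_normal((N, 3))
u = z @ np.array([0.5, 0.3, 0.2]) + rng.standard_normal(N)   # criterion

def r_squared(X, y):
    """Squared multiple correlation from a least-squares fit (intercept included)."""
    X = np.column_stack([np.ones(len(X)), X])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

linear = r_squared(z, u)
configural = r_squared(np.column_stack([z, z[:, 0] * z[:, 1]]), u)

print(f"linear R^2 = {linear:.4f}, with product term = {configural:.4f}")
```

In-sample R² can never decrease when a predictor is added, so the interesting quantity is the increment, which here is of sampling-noise order only.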

Suppose then that the variables are multinormally distributed in the population but not in the sample. Does this mean that configural prediction weights determined from the sample will be useful? That would imply that sample parameters are better for prediction purposes than population parameters, which few are yet willing to admit.

The special case of binary variables suggests some interesting questions. Obviously, one cannot begin with a single vector of binary measures and by a polynomial transformation get a vector whose elements are normally distributed. One may however transform a matrix of binary measures by means of more general multivariate polynomial transformations. The conditions under which vectors of the transformed matrix would approach multinormality with respect to their elements would doubtless be an interesting and perhaps fruitful subject for research.

SUMMARY AND CONCLUSIONS

Four types of mathematical functions appropriate for prediction purposes were discussed. One of these is linear in the parameters and non-linear in the variables. A special case of this function is a multivariate polynomial in which the parameters to be determined from experimental data are the coefficients of the polynomial. This type of function is regarded as particularly appropriate for the analysis of variables with complex interrelationships, such as behavior variables.
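As a sketch of this function type — linear in the parameters though non-linear in the variables — the coefficients of a multivariate polynomial can be estimated by ordinary least squares on the expanded monomial terms (the data, degree, and coefficient values below are illustrative assumptions, not from the article):

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_design(X, degree):
    """Design matrix whose columns are 1 and all monomials of the x's up to `degree`."""
    N, n = X.shape
    cols = [np.ones(N)]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

# Illustrative data in which the criterion depends on a configural (product) term.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
y = 1.0 + 0.5 * X[:, 0] + 0.7 * X[:, 0] * X[:, 1]

D = polynomial_design(X, degree=2)            # columns: 1, x1, x2, x1^2, x1*x2, x2^2
coef, *_ = np.linalg.lstsq(D, y, rcond=None)  # linear in the parameters
print(np.round(coef, 3))                      # recovers 1.0, 0.5, 0, 0, 0.7, 0
```

Because the unknowns enter linearly, the fit reduces to a standard linear least-squares problem regardless of how non-linear the monomials are in the original variables.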

Configural prediction was formulated as a special case of linear prediction. The work of Lubin and Osburn(4) is generalized.

It was assumed that for a multivariate set of experimental data, a multivariate polynomial transformation of the data could be found which transforms the set into one which closely approximates multinormality. Mathematical justification for this assumption was developed, and the detailed assumptions required for the development were merely pointed out. It was proved that if a multivariate set of data is multinormal, the addition of product terms cannot increase prediction accuracy. Therefore, it appears that only if a set of data cannot be transformed to multinormality will it be possible to increase prediction accuracy by configural or pattern recognition techniques.

In cases of binary data at least, multinormal transformations are not possible. Therefore, in such cases configural or pattern recognition techniques may increase predictive or prognostic efficiency.
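This is exactly the situation of "Meehl's paradox" noted at the outset. A toy illustration (the response patterns are invented for the example): each of two binary items has zero correlation with the criterion, yet a configural (product) score predicts it perfectly.

```python
import numpy as np

# Four equally likely response patterns on two dichotomous items.
# The criterion u is 1 exactly when the two responses agree (a pure configuration).
x1 = np.array([0, 0, 1, 1])
x2 = np.array([0, 1, 0, 1])
u  = np.array([1, 0, 0, 1])          # agreement pattern

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print(corr(x1, u), corr(x2, u))      # both zero-order validities are 0

# Score the configuration: product of deviation scores.
s = (x1 - x1.mean()) * (x2 - x2.mean())
print(corr(s, u))                    # the configural score correlates perfectly
```

No weighted sum of x1 and x2 alone can do better than chance here, which is why the product term carries all of the predictive information.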

REFERENCES

1. ALF, E. F. Configural scoring and prediction. (Doctoral dissertation, University of Washington) Technical Report, Public Health Research Grant M-743(C2). University of Washington Division of Counseling and Testing Services, November, 1956.

2. FOX, LESLIE A. Configural analysis. Technical Report, Office of Naval Research Contract Nonr-477(33) and Public Health Research Grant 5 R01 MH00743-12. University of Washington, July, 1967.

3. HORST, P. Pattern analysis and configural scoring. J. clin. Psychol., 1954, 10, 3-11.

4. LUBIN, A. and OSBURN, H. G. A theory of pattern analysis for the prediction of a quantitative criterion. Psychometrika, 1957, 22, 63-73.

5. LUNNEBORG, C. E., JR. Dimensional analysis, latent structure, and the problem of patterns. (Doctoral dissertation, University of Washington) Technical Report, Office of Naval Research Contract Nonr-477(08) and Public Health Research Grant M-743(4). University of Washington, September, 1959.

6. McQUITTY, L. L. Agreement analysis: classifying persons by predominant patterns of responses. Brit. J. statist. Psychol., 1956, 9, 5-16.

7. MEEHL, P. E. Configural scoring. J. consult. Psychol., 1950, 14, 165-171.

8. WAINWRIGHT, G. E. Configural information in item responses. (Doctoral dissertation, University of Washington) Technical Report, Office of Naval Research Contract Nonr-477(33) and Public Health Research Grant MH-00743. University of Washington, September, 1965.

9. ZUBIN, J. A technique for measuring like-mindedness. J. abnorm. soc. Psychol., 1938, 33, 508-516.
