# [Wiley Series in Probability and Statistics] Methods of Multivariate Analysis (Rencher/Methods) || Graphical Procedures

Post on 09-Dec-2016

216 views

Category:

## Documents

3 download

Embed Size (px)

TRANSCRIPT

• CHAPTER 16

GRAPHICAL PROCEDURES

In Sections 16.1, 16.2, and 16.3, we consider three graphical techniques: multidi-mensional scaling, correspondence analysis, and biplots. These methods are de-signed to reduce dimensionality and portray relationships among observations or variables.

16.1 MULTIDIMENSIONAL SCALING

16.1.1 Introduction

In the dimension reduction technique called multidimensional scaling, we begin with the distances Sij between each pair of items. We wish to represent the n items in a low-dimensional coordinate system, in which the distances d^ between items closely match the original distances ^, that is,

d^ = Sij for all i, j .

The final distances d^ are usually Euclidean. The original distances 6ij may be actual measured distances between observations y, and y^ in p dimensions, such as

Methods of Multivariate Analysis, Third Edition. By Alvin C. Rencher and William F. Christensen 5 5 5 Copyright 2012 John Wiley & Sons, Inc.

• 5 5 6 GRAPHICAL PROCEDURES

• MULTIDIMENSIONAL SCALING 5 5 7

3.

and only if B is positive semidefinite of rank q (Schoenberg 1935; Young and Householder 1938; Gower 1966; Seber 1984, p. 236).

Since B is a symmetric matrix, we can use the spectral decomposition in (2.109) to write B in the form

B = V A V , (16.3)

where V is the matrix of eigenvectors of B and is the diagonal matrix of eigenvalues of B. If B is positive semidefinite of rank q, there are q pos-itive eigenvalues, and the remaining n q eigenvalues are zero. If = diag(Ai, 2 , . . . , Xq) contains the positive eigenvalues and Vi = (v1 ; v 2 , . . . , vg) contains the corresponding eigenvectors, then we can express (16.3) in the form

B = VjAjVi

= ZZ'

where

V X A} / 2 = ( \ATvi, V/A^V2, ..., \f\qVq) =

K\

VJ

(16.4)

4. The rows '\i "li ,z'n of Z in (16.4) are the points whose interpoint dis-

5.

6.

tances d^ = (z* ZJ) '(ZJ Zj) match the Sij's in the original distance matrix D, as noted following (16.2).

Since q in (16.4) will typically be too large to be of practical interest and we would prefer a smaller dimension fc for plotting, we can use the first k eigenvalues and corresponding eigenvectors in (16.4) to obtain n points whose interpoint distances d^ are approximately equal to the corresponding Si i s.

If B is not positive semidefinite, but its first k eigenvalues are positive and relatively large, then these eigenvalues and associated eigenvectors may be used in (16.4) to construct points that give reasonably good approximations to the ij's.

Note that the method used to obtain Z from B closely resembles principal com-ponent analysis. Note also that the solution Z in (16.4) is not unique, since a shift in origin or a rotation will not change the distances d^. For example, if C is a q x q orthogonal matrix producing a rotation [see (2.101)], then

(CZi - CZjYiCZi - CZj) = (Z; - ZjYC'CiZi - Zj) = ( z i - z j ) ' ( z i - z j ) [see (2.103)].

Thus the rotated points Cz, have the same interpoint distances d^.

• 5 5 8 GRAPHICAL PROCEDURES

EXAMPLE 16.1.2(a)

To illustrate the first four steps of the above algorithm for metric multidimen-sional scaling, consider the 5 x 5 distance matrix

D = (Sij) = 2/2 0 4 2^2 4 0 2y A^/2 4

V 2N/2 4 Ay/2

The matrix A = (^6j') in step 1 is given by

A =

/0 4 4 4 4 0 8 16 4 8 0 8 4 16 8 0

\A 8 16 8

/

A 0 4

4 \ 8 16 8

! 4 4v^

4 0

For the means, we have . = a.i = 16/5, a*. = a.j = 36/5, i = 2 , . . . , 5, a,, - 32 /5 . With n = 5, the matrix B in step 2 is given by

B

The rank of B is clearly 2. For step 3, the (nonzero) eigenvalues and corre-sponding eigenvectors of B are given by = 16, 2 = 16,

V l

5J)A( I-5 / V 1*) = 5 ) (

0 0 0

\o

0 8 0

- 8 0

0 0 8 0

- 8

0 - 8

0 8 0

o\ 0

- 8 0 8 /

L

( \ kV2

0 -hV2

, v2 =

V " o/ have by (16.4)

( \ , / 2 2 ) =

( \ 0 *vs

0 V - \ ^ )

/ 0 0 \ 2 ^ 0

0 2^2 - 2 V2 C )

\ 0 - 2 ^ 2 /

It can be shown (step 4) that the distance matrix for these five points is D. The five points constitute a square with each side of length 4 and a center point at the origin. The five points (rows of Z) are plotted in Figure 16.1. D

• MULTIDIMENSIONAL SCALING 5 5 9

Figure 16.1 Plot of the five points found in Example 16.1.2(a).

EXAMPLE 16.1.2(b)

For another example of metric multidemensional scaling, we use airline dis-tances between 10 US cities, as given in Table 16.1 (Kruskal and Wish 1978, pp. 7-9).

Table 16.1 Airline Distances Between Ten US Cities

City

1 2 3 4 5 6 7 8 9 10

1

0 587 1212 701 1936 604 748 2139 2182

543

2

587 0 920 940 1745 1188 713 1858 1737 597

3

1212

920 0 879 831 1726 1631 949 1021

1494

4

701 940 879 0

1374 968 1420 1645 1891 1220

5

1936

1745 831 1374 0

2339 2451 347 959 2300

6

604 1188 1726 968 2339 0

1092 2594 2734

923

7

748 713 1631 1420 2451 1092

0 2571

2408 205

8

2139

1858 949 1645 347 2594

2571 0 678 2442

9

2182

1737 1021 1891 959 2734

2408 678 0

2329

10

543 597 1494 1220 2300 923 205 2442

2329 0

Cities: (1) Atlanta, (2) Chicago, (3) Denver, (4) Houston, (5) Los Angeles, (6) Miami, (7) New York, (8) San Francisco, (9) Seattle, (10) Washington, DC.

• 5 6 0 GRAPHICAL PROCEDURES

Figure 16.2 Plot of the points found in Example 16.1.2(b).

The points given by metric multidimensional scaling are plotted in Figure 16.2. Note that in the plot, dimensions 1 and 2 roughly correspond to easting and northing, but the sign for each of these dimensions is arbitrary. That is, the eigenvectors v in (16.4) are normalized but are subject to multiplication b y - 1 . D

16.1.3 Nonmetric Multidimensional Scaling

Suppose the m = n(n l ) /2 dissimilarities Sij cannot be measured as in (16.1) but can be ranked in order,

r2s2 < < SrmSm, (16.5)

where riSi indicates the pair of items with the smallest dissimilarity and rmsm rep-resents the pair with greatest dissimilarity. In nonmetric multidimensional scaling, we seek a low-dimensional representation of the points such that the rankings of the distances

driSl < dr2S2 < < drmSm (16.6)

match exactly the ordering of dissimilarities in (16.5). Thus, while metric scaling uses the magnitudes of the ij 's, nonmetric scaling is based only on the rank order of the ^ 's.

For a given set of points with distances d^, a plot of d^ versus ^ may not be monotonic; that is, the ordering in (16.6) may not match exactly the ordering in (16.5). Suitable tZ -^'s can be estimated by monotonic regression, in which we seek

• MULTIDIMENSIONAL SCALING 5 6 1

values of d,j to minimize the scaled sum of squared differences

^2 Ei^dij dl3f ( i 6 7 )

subject to the constraint

" r i S i ^*22 ' * * ^

where riSi,r2S2, . ,rmsm are defined as in (16.5) and (16.6) (Kruskal 1964a, 1964b). The minimum value of S2 for a given dimension, k, is called the STRESS. Note that the d^'s are not distances. They are merely numbers used as a reference to assess the monotonicity of the dij's. The d^'s are sometimes called disparities.

The minimum value of the STRESS over all possible configurations of points can be found using the following algorithm.

1. Rank the m = n(n l ) /2 distances or dissimilarities ^ as in (16.5).

2. Choose a value of k and an initial configuration of points in k dimensions. The initial configuration could be n points chosen at random from a uniform or normal distribution, n evenly spaced points in fc-dimensional space, or the metric solution obtained by treating the ordinal measurements as continuous and using the algorithm in Section 16.1.2.

3. For the initial configuration of points, find the interitem distances d^. Find the corresponding d^'s by monotonic regression as defined above using (16.7).

4. Choose a new configuration of points whose distances dij minimize S2 in (16.7) with respect to the d^'s found in step 3. One approach is to use an iter-ative gradient technique such as the method of steepest descent or the Newton-Raphson method.

5. Using monotonic regression, find new di/s for the d^'s found in step 4. This gives a new value of STRESS.

6. Repeat steps 4 and 5 until STRESS converges to a minimum over all possible fc-dimensional configurations of points.

7. Using the preceding six steps, calculate the minimum STRESS for values of k starting at k = 1 and plot these. As k increases, the curve will decrease, with occasional exceptions due to round off or numerical anomalies in the search procedure for minimum STRESS. We look for a discernible bend in the plot, following which the curve is low and relatively flat. An ideal plot is shown in Figure 16.3. The curve levels off after k = 2, which is convenient for plotting the resulting n points in 2 dimensions.

There is a possibility that the minimum value of STRESS found by the above seven steps for a given value of k may be a local minimum rather than the global

• 5 6 2 GRAPHICAL PROCEDURES

Figure 16.3 Ideal plot of minimum STRESS versus k.

minimum. Such an anomaly may show up in the plot of minimum STRESS versus k. The possibility of a local minimum can be checked by repeating the procedure, starting with a different initial configuration.

As was the case with metric scaling, the final configuration of points from a non-metric scaling is invariant to a rotation of axes.

EXAMPLE 16.1.3

The voting records for fifteen congressmen from New Jersey on nineteen en-vironmental bills are given in Table 16.2 in the form of a dissimilarity matrix (Hand et al. 1994, p. 235). The congressmen are identified by party: R\ for Re-publican 1, D

• MULTIDIMENSIONAL SCALING 5 6 3

Table 16.2 Dissimilarity Matrix for Voting Records of 15 Congressmen

Ri R2 Di D2 R3 RA s D3 D4 Ds D6 Re Rv Rs D7

Ri

R2

I>i D2 Rs R4

Rs D3 >4 D5 De Re Ri Rs D7

0 8 15 15 10 9 7 15 16 14 15 16 7 11 13

8 0 17 12 13 13 12 16 17 15 16 17 13 12 16

15 17 0 9 16 12 15 5 5 6 5 4 11 10 7

15 12 9 0 14 12 13 10 8 8 8 6 15 10 7

10 13 16 14 0 8 9 13 14 12 12 12 10 11 11

9 13 12 12 8 0 7 12 11 10 9 10 6 6 10

7 12 15 13 9 7 0 17 16 15 14 15 10 11 13

15 16 5 10 13 12 17 0 4 5 5 3 12 7 6

16 17 5 8 14 11 16 4 0 3 2 1 13 7 5

14 15 6 8 12 10 15 5 3 0 1 2 11 4 6

15 16 5 8 12 9 14 5 2 1 0 1 12 5 5

16 17 4 6 12 10 15 3 1 2 1 0 12 6 4

7 13 11 15 10 6 10 12 13 11 12 12 0 9 13

11 12 10 10 11 6 11 7 7 4 5 6 9 0 9

13 16 7 7 11 10 13 6 5 6 5 4 13 9 0

We now use a different initial configuration of points drawn from a uniform distribution. The resulting plot is given in Figure 16.6. The approximate orien-tation of the points in the two-dimensional space is similar, with the exception of the locations of R2 and 5.

Figure 16.4 Plot of STRESS for each value of k for the voting data in Table 16.2.

• 5 6 4 GRAPHICAL PROCEDURES

Figure 16.5 Plot of points found using initial points from a multivariate normal distribu-tion.

Figure 16.6 Plot of points found using initial points from a uniform distribution.

• CORRESPONDENCE ANALYSIS 5 6 5

Figure 16.7 Plot of points found using initial points from a metric solution.

We next use a third initial configuration of points resulting from the metric solution as described in Section 16.1.2. The resulting plot is given in Figure 16.7. All three plots are very similar, indicating a good fit. D

16.2 CORRESPONDENCE ANALYSIS

16.2.1 Introduction

Correspondence analysis is a graphical technique for representing the information in a two-way contingency table, which contains the counts (frequencies) of items for a cross-classification of two categorical variables. With correspondence analysis, we construct a plot that shows the interaction of the two categorical variables along with the relationship of the rows to each other and the columns to each other. In Sections 16.2.2-16.2.4, we consider correspondence analysis for ordinary two-way contingency tables. In Section 16.2.5 we consider multiple correspondence analysis for three-way and higher-order contingency tables. Useful treatments of correspon-dence analysis have been given by Greenacre (1984), Jobson (1992, Section 9.4), Khattree and Naik (1999, Chapter 7), Gower and Hand (1996, Chapters 4 and 9), and Benzecri (1992).

To test for significance of association of the two categorical variables in a con-tingency table, we can use a chi-square test or a log-linear model, both of which represent an asymptotic approach. Since correspondence analysis is associated with the chi-square approach, we will review it in Section 16.2.3. If a contingency table

• 5 6 6 GRAPHICAL PROCEDURES

Table 16.3 Contingency Table with a Rows and b Columns

Rows

1 2

a

Column Total

1

n n 2 1

rial

n.i

Columns

2

n i 2

2 2

rial

71.2

b

nib

ri2b

nab

n.b

Row Total

ni.

ri2.

na.

n

has some cell frequencies that are small or zero, the chi-square approximation is not very satisfactory. In this case, some categories can be combined to increase the cell frequencies. Correspondence analysis may be useful in identifying the categories that are similar, which we may thereby wish to combine.

In correspondence analysis, we plot a point for each row and a point for each column of the contingency table. These points are in effect projections of the rows and columns of the contingency table onto a two-dimensional Euclidean space. The goal is to preserve as far as possible the relationship of the rows (or columns) to each other in a two-dimensional space. If two row points are close together, the profiles of the two rows (across the columns) are similar. Likewise, two column points that are close together represent columns with similar profiles across the rows (see Section 16.2.2 for a definition of profiles). If a row point is close to a column point, this combination of categories of the two variables occurs more frequently than would occur by chance if the two variables were independent. Another output of a correspondence analysis is the inertia or amount of information in each of the two dimensions in the plot (see Section 16.2.4).

16.2.2 Row and Column Profiles

A contingency table with a rows and b columns is represented in Table 16.3. The entries n^ are the counts or frequencies for every two-way combination of row and column (every cell). The marginal totals are shown using the familiar "dot" notation: m. )= nij and n.3 = )"= 1 nzj. The overall total frequency is denoted by n instead of n.. for simplicity: n = nij-

The frequencies n,j in a contingency table can be converted to relative frequencies Pij by dividing by n: p^ = n,j /. The matrix of relative frequencies is called the correspondence matrix and is denoted by P :

P = (Pa) = (nij/n). (16.8)

In Table 16.4 we show the contingency table in Table 16.3 converted to a correspon-dence matrix.

• CORRESPONDENCE ANALYSIS

Table 16.4 Correspondence Matrix of Relative Frequencies

567

Rows

1 2

a

Column Total

1

p n P21

Pal

P-i

Columns

2

Pl2

P22

Pa2

P.2

b

Plb

P2b

Pab

P.b

Row Total

Pi.

P2.

Pa.

1

The last column of Table 16.4 contains the row sums ,. =- This column vector is denoted by r and can be obtained as

r = P j = (pi.,P2.,---,Pa.Y = (ni./n,n2./n,...,na./n)',...