[wiley series in probability and statistics] applied multiway data analysis || graphical displays...

CHAPTER 11

GRAPHICAL DISPLAYS FOR COMPONENTS

A picture is worth a thousand words.¹

11.1 INTRODUCTION

Various kinds of supplementary information are necessary for an in-depth interpretation of results from a multiway principal component analysis. In this chapter we will discuss various types of plots useful in multiway component analyses, review different methods to plot components, and pay attention to plots of single modes as well as plots that portray more than one mode at a time. Elsewhere in this book other types of plots in use in multiway analyses are discussed, such as plots to assess model fit (see Section 8.5, p. 179), sums-of-squares plots to assess the fit of levels of modes (see Section 12.6.2, p. 290), and various residual plots (see Section 12.7, p. 292).

¹“One look is worth a thousand words” was coined by Fred R. Barnard in Printers’ Ink, 8 December 1921, p. 96. He changed it to “One picture is worth a thousand words” in Printers’ Ink, 10 March 1927, p. 114 and called it “a Chinese Proverb so that people would take it seriously.” http://www2.cs.uregina.ca/~hepting/research/web/words/index.html. Accessed May 2007.

Applied Multiway Data Analysis. By Pieter M. Kroonenberg. Copyright © 2007 John Wiley & Sons, Inc.

11.2 CHAPTER PREVIEW

The types of plots considered are scatter plots of coordinate spaces involving one or more modes. Considerations for constructing such plots are discussed throughout the chapter and examples are provided. First, we will discuss various options for displaying component spaces for a single mode; then, a variety of plots for more than one set of components in a single space, such as line plots, joint biplots, nested-mode biplots, and nested-mode per-component plots.

The major sources for the material presented in this chapter are Kroonenberg (1983c, Chapter 6), Kroonenberg (1987b), and Geladi, Manley, and Lestander (2003), but especially Kiers (2000a).

11.2.1 Data sets for the examples

Most plots will be demonstrated using the Sempé girls’ growth curves data, but in addition a plot from Andersson’s spectrometry data is included, as well as one from a publication by Dai (1982) and one from a publication by Nakamura and Sinclair (1995).

Illustrative data set: Girls’ growth curves data. Thirty girls (first mode) were selected from the French auxiological study (1953 to 1975) conducted under supervision of Michel Sempé (Sempé, 1987). Between the ages of 4 and 15 they were measured yearly (third mode) on a number of variables, eight of which are included here (second mode): weight, length, crown-coccyx length (crrump), chest circumference (chest), left upper-arm circumference (arm), left calf circumference (calf), maximal pelvic width (pelvis), and head circumference (head)². The data set is thus a three-way block of size 30 (girls) × 8 (variables) × 12 (time points). Before the three-mode analyses, the data were profile preprocessed; see Section 6.6.1, p. 130. One of the consequences is that all scores are deviations from the average girl’s profile. After preprocessing, the average girl’s profile contains zeroes on all variables at all ages, and therefore in many plots she is located at the origin. Figure 11.1 gives an impression of the growth curves of this average French girl. The variables weight, length, and crown-coccyx length have been divided by 10 to equalize the ranges somewhat; no further processing was done.

The data were fitted with a Tucker3 model with three components for the first mode (girls), three components for the second mode (variables), and two components for the third mode (years of age). This solution had a fit of 77% and was able to describe the most important aspects of the physical development of the 30 French girls.

Figure 11.1 Girls' growth curves data: Growth curves of the average girl, plotted against age (4 to 15 years). The starred variables have been divided by 10 to reduce their ranges.

²Earlier versions of the graphs can be found in Kroonenberg (1987b); 1987 © Plenum Press; reproduced and adapted with kind permission of Springer Science and Business Media. The data can be obtained from the data set section of the website of The Three-Mode Company; http://three-mode.leidenuniv.nl/. Accessed May 2007.

11.3 GENERAL CONSIDERATIONS

As preparation for the material of this chapter, Chapters 1 and 2 of Krzanowski (2000) are warmly recommended. In those chapters concepts such as distances, inner products, direction cosines, subspaces, projections, and graphical interpretation of principal components are treated in great detail, especially from a user's point of view; Appendix B also provides some guidance for these concepts. In this chapter, we will not provide such a systematic introduction but explain various aspects when they arise.

11.3.1 Representations of components

As explained in more detail in Section 9.3.1, there are two versions of representing principal component analysis. We have referred to these as the covariance form, in which the variables are in principal coordinates and the subjects in standard coordinates, and the distance form, in which the subjects are in principal coordinates and the variables in normalized coordinates. Because in the distance form the Euclidean distances between the subjects are preserved as well as possible given the reduced dimensionality of the space, it is the form of choice when one wants to portray the entities of a single mode. Detailed explanations of the correctness of plotting the distance form are presented in Legendre and Legendre (1998) and Kiers (2000a). The distance form is the following:


$$\mathbf{X} \approx \mathbf{A}^{*}\mathbf{B}' \qquad (11.1)$$

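where $\mathbf{B}$ is orthonormal, $\mathbf{B}'\mathbf{B} = \mathbf{I}_S$, and $\mathbf{A}^{*}$ is orthogonal, $\mathbf{A}^{*\prime}\mathbf{A}^{*} = \mathbf{G}^2$, with $\mathbf{G}^2$ a diagonal matrix with the squares of the singular values $g_{ss}$.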

The next question is what happens to the other mode, which is in normalized coordinates, in particular, the variable mode in the distance form. In the full-dimensional space the original variables are the (almost certainly nonorthogonal) axes. After calculating the principal components in the full space, these components are the orthonormal coordinate axes that span the same $J$-dimensional full space as the variables. The variables are directions or variable axes in this space. When we restrict ourselves to the $S$ major principal components, the variable axes are also projected onto this $S$-dimensional space. To investigate them, we can plot the variable axes in the reduced space using the rows of $\mathbf{B}$, that is, $(b_{j1}, \ldots, b_{jS})$ (see Kiers, 2000a, p. 159).
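The distance form can be illustrated with a small numerical sketch. The code below is only an illustration under stated assumptions: it uses NumPy's singular value decomposition on random, column-centered toy data, and the function name `distance_form` is hypothetical rather than taken from the book or any multiway software.

```python
import numpy as np

def distance_form(X, S):
    """Distance form of a two-mode PCA with S components.

    Returns the subjects in principal coordinates (A* = A G), the
    orthonormal variable axes B, and the S leading singular values.
    """
    U, g, Vt = np.linalg.svd(X, full_matrices=False)
    A_star = U[:, :S] * g[:S]   # scale each component by its singular value
    B = Vt[:S].T                # row j holds (b_j1, ..., b_jS)
    return A_star, B, g[:S]

# Toy data (an assumption for illustration): 30 subjects by 8 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
X = X - X.mean(axis=0)          # column-center before the decomposition

A_star, B, g = distance_form(X, S=2)
```

With these definitions `B.T @ B` is the identity and `A_star.T @ A_star` is the diagonal matrix of squared singular values, matching the conditions stated below Eq. (11.1).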

11.4 PLOTTING SINGLE MODES

In multiway analysis, either one treats the component matrices evenhandedly and plots all of them in principal coordinates, or one assigns the modes different roles and plots them in different metrics. In the former option, high-dimensional distances are correctly represented in the low-dimensional plots within the accuracy of the approximation. In the latter case, the variables are generally portrayed in principal coordinates to represent their correlations as well as possible and the subjects in normalized coordinates. If plotted together, variables are generally represented as arrows and subjects as points.

11.4.1 Paired-components plots

With the term paired-components plot we refer to plots in which the components of a single mode are plotted against each other in two- or higher-dimensional scatter plots. Such plots should have aspect ratios of 1, so that distances in the plots are properly portrayed.

Normalized versus principal coordinates. When the explained variabilities are almost equal, there will be few visual differences between the plots in normalized and principal coordinates, but differences can be considerable when the explained variabilities are very different. Note, however, that the principal coordinates are scaled by the square roots of the explained variabilities which makes the differences less dramatic. In Fig. 11.2, the proportional explained variabilities, which were 0.56 and 0.14 (a factor of 4), become 0.74 and 0.37 at the root scale (a factor of 2).


Figure 11.2 Girls’ growth curves data: Difference between normalized and principal coordinates for the first mode (girls). Square roots of explained variability of the two components, which serve as scaling constants, are 0.74 and 0.37, respectively. For comparability the normalized components have been scaled so that the lengths of the coordinate axes in both plots are equal.

Figure 11.3 Coping data: Demonstration of the inclusion of external variables to enhance the interpretation of the subject space.


Adding explanatory information. It always pays to make the plots as informative as possible. Especially when the subject or object space is being displayed, the figure typically looks like an unstructured set of points. There are several ways to enhance such plots, in particular, by using non-three-mode (background) information. If a cluster analysis has been carried out, the subjects can be labeled according to these clusters, or they may be labeled by another grouping variable. External variables may be plotted in the space, such that they can be used for interpretation by establishing which subjects have high and which subjects have low values on these external variables. Assuming the subject space is centered and the axes are indicated by $\mathbf{a}_1$ and $\mathbf{a}_2$, one then regresses a standardized external variable $\mathbf{z}_Y$ on the axes, that is, $\mathbf{z}_Y = \beta_1\mathbf{a}_1 + \beta_2\mathbf{a}_2 + \mathbf{e}$. In the plot, the axis for $\mathbf{z}_Y$ is drawn through the points $(0, 0)$ and $(\beta_1, \beta_2)$; in addition the squared multiple correlation indicates how much of the variable $Y$ is explained by the axes. Figure 11.3 gives an example of adding external information for the full Röder (2000) Coping data (for an explanation of the data and the variables, see Section 14.4, p. 354).
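The regression of an external variable on the plot axes can be sketched as follows. This is a minimal illustration, assuming centered axes and an ordinary least-squares fit via NumPy; the function name `external_axis` and the toy data are hypothetical and are not the Coping data.

```python
import numpy as np

def external_axis(a1, a2, y):
    """Regress a standardized external variable on two centered axes.

    Returns the weights (beta1, beta2), which give the direction of the
    external variable's axis through the origin, and the squared
    multiple correlation.
    """
    z = (y - y.mean()) / y.std()            # standardize the external variable
    A = np.column_stack([a1, a2])
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ beta
    r2 = 1.0 - (resid @ resid) / (z @ z)    # z is centered, so z'z is the total SS
    return beta, r2

# Toy subject space (an assumption): two centered axes for 40 subjects.
rng = np.random.default_rng(1)
a1 = rng.normal(size=40)
a1 -= a1.mean()
a2 = rng.normal(size=40)
a2 -= a2.mean()
y = 3.0 * a1 + 1.0 * a2                     # external variable lying exactly in the plane

beta, r2 = external_axis(a1, a2, y)
```

In this noise-free toy case the squared multiple correlation equals 1 and the ratio of the two weights is 3; with real external variables the value of `r2` indicates how far the drawn axis can be trusted.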

An imaginative use of auxiliary information is contained in a plot published in a paper by Dai (1982) on the industrial design of chairs (Fig. 11.4).

11.4.2 Higher-dimensional spaces

When we have higher-dimensional spaces, a clear presentation on paper becomes more difficult and many attempts have been made to cope with this problem. The most effective solutions consist of dynamic graphics in which one can spin the configuration in higher-dimensional, effectively three-dimensional, space along the axes.

Paired-components plots. Failing that, one may inspect the plots by pairs via a paired-components plot to create a mental three-dimensional picture. In Fig. 11.5 the variables of the girls’ growth curves data have been plotted as vectors. Because the variables are in principal coordinates and the appropriate input preprocessing has been applied (see Section 9.4, p. 218), the angles between the vectors are approximations to the correlations. If such conditions are not fulfilled, the angles can be interpreted as measures of association: the smaller the angle, the more strongly the variables are associated. The circle in the plot is the equilibrium circle (see Legendre & Legendre, 1998, p. 398ff.), which can be used to assess how well variables are represented by the low-dimensional space given normalized coordinates. The closer the end points of the arrows are to the circle, the better the variables are represented in the space.

Figure 11.4 Chair-styles data: The inclusion of external information to enhance an object space. The Japanese text next to the vectors describes the adjectives used to characterize chairs. Source: Dai (1982); reproduced with kind permission from the Japanese Psychological Review.

Figure 11.5 Girls’ growth curves data: Side-by-side paired-components plots of the second mode (variables). The variables have been connected to the origin and scaled so that the angles indicate the similarities between the variables.

Figure 11.6 Girls’ growth curves data: Side-by-side paired-components plots of the variable space. The variables have been connected by a minimum spanning tree based on the three-dimensional configuration. The lengths of the connecting paths are indicated as well.

Minimum spanning tree. One problem with three-dimensional graphs is that it is not always clear whether two vectors that are close together in the graph are also close together in high-dimensional space. Both in a high-dimensional and a low-dimensional space, vectors or arrows from the origin are often only represented by their end points, and thus reference is often made to “points” rather than “vectors”. This usage is sometimes followed here as well. The end points of vectors might end up close together in a low-dimensional space because their projections just happen to lie close together. To solve that issue, a minimum spanning tree may be calculated. Minimum spanning trees are generally based on distances in the full-dimensional space, that is, for each pair of variables $j$ and $j'$. First, such distances are calculated over all subjects and conditions: $d_{jj'} = \left(\sum_i \sum_k (x_{ijk} - x_{ij'k})^2\right)^{1/2}$. Then the minimum spanning tree is constructed in such a way that the sum of all the paths connecting the points (without loops) is the shortest possible. As closest neighbors in the full-dimensional space are connected, one must doubt the closeness of two points in a low-dimensional space when they are not connected. By drawing lines between closest points in the graph, one can judge more easily whether points that seem close in the low-dimensional space are actually close or not. The minimum spanning tree is equally useful for assessing the closeness of the individuals as portrayed in Fig. 11.2. In this case the distances $d_{ii'}$ are computed between pairs of individuals across all variables and conditions.
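The two steps described above, computing the full-dimensional distances and then building the tree, can be sketched in a few lines. This is an illustrative sketch only: the tree is built with Prim's algorithm, which is one standard way to obtain a minimum spanning tree and not necessarily the procedure used for the figures; the function names and the tiny toy array are assumptions.

```python
import numpy as np

def variable_distances(X):
    """Distances d_jj' between the variables of an I x J x K array,
    accumulated over all subjects (i) and conditions (k)."""
    V = np.moveaxis(X, 1, 0).reshape(X.shape[1], -1)   # J x (I*K)
    diff = V[:, None, :] - V[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def minimum_spanning_tree(D):
    """Prim's algorithm on a full distance matrix.

    Returns a list of edges (j, j', distance) connecting all points
    without loops and with minimal total length.
    """
    n = D.shape[0]
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        for u in in_tree:
            for v in range(n):
                if v not in in_tree and (best is None or D[u, v] < best[2]):
                    best = (u, v, D[u, v])
        edges.append(best)
        in_tree.append(best[1])
    return edges

# Toy array (an assumption): one subject, four variables, one condition,
# so the variables are simply points on a line at 0, 1, 3, and 7.
X = np.array([0.0, 1.0, 3.0, 7.0]).reshape(1, 4, 1)
D = variable_distances(X)
edges = minimum_spanning_tree(D)
total_length = sum(d for _, _, d in edges)
```

The resulting edges can be drawn as line segments between the corresponding points in any of the paired-components plots; for distances between individuals, the same functions apply after moving the subject mode into the second position of the array.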

A poor man’s version for calculating distances between points is to calculate distances in the projected space if it has a higher dimension than two. In that way, one can judge whether points that are close in the two-dimensional plane are also close in the three- or higher-dimensional space. In Fig. 11.6 the minimum spanning tree is drawn for the variables of the girls’ growth curves data based on a three-dimensional configuration. From the first two dimensions one might be under the mistaken impression that head circumference is fairly close to pelvis, while it is not.

Figure 11.7 Girls’ growth curves data: Three-dimensional representation of the subject space anchored by the coordinate axes and the projections onto the 1-2 plane, 1-3 plane, and the 2-3 plane. The figure shows the position of the girls in the three-dimensional space by vectors which are labeled with the girls’ sequence numbers. Points on the walls and the floor are the projections onto the 1-2 plane, 1-3 plane, and the 2-3 plane, respectively, and they are labeled with the first character of their labels.

Three-dimensional graphs. Ideally we would like to look at the three-dimensional space itself, and a figure such as Figure 11.7 is an example of how it can be constructed. The three-dimensional impression is created by boxing-in the points and projecting their values on the walls and the floor. The walls and the floor are in fact the three paired-components plots. The coordinate axes are included as well as their projections on the walls and the floor. For such a plot to work there should not be too many points or it becomes too messy to interpret.

Size as a third dimension. Another alternative is to use different sizes of points to indicate their values in the third dimension (see Fig. 11.8). This option can be combined with the anchoring of the points in some plane, which can be either the base of the plot or the plane of the first two dimensions (see Fig. 11.9, taken from Nakamura and Sinclair, 1995, p. 104). For all representations, it is true that they only really work for a limited number of points with a reasonably clear structure.

Stereo-pairs and anaglyphs. A further approach, which has seen very little application but should be useful for publications, is the use of stereo vision. Examples are stereo-pairs, which consist of two graphs of the same space constructed in such a way that side-by-side they provide a stereo view of the space. Unfortunately, there are very few examples of the use of stereo-pairs, and not many standard computer programs produce such plots. For a general introduction see Holbrook (1997)³. Comparable variants are anaglyphs, which are two graphs, one in blue and the other in red, but with a slightly shifted picture, which when viewed through a pair of spectacles with blue glass for one eye and red glass for the other eye give the illusion of depth. To be able to employ these one needs programs⁴ to produce the graphs and, depending on one’s training, optical aids to see proper depth in the pictures. Three-dimensional spinning of the coordinate space generally has to precede the making of a stereo picture to find the most informative point of view to inspect the three-dimensional graph. For further information on three-dimensional stereo graphs see Huber (1987).

Figure 11.8 Girls’ growth curves data: Two-dimensional representation of the three-dimensional variable space. The sizes of the points indicate their values on the third axis. An open diamond indicates a negative value; a closed diamond a positive value.

11.4.3 All-components plots

When there is a natural order in the levels of a mode, it can be advantageous not to plot the components against each other, but to plot them against the levels of the mode. Depending on the spacing of these levels, this plotting can be done against the level number or against the real values of the levels. As an example, Fig. 11.10 depicts the two components from the age mode of the girls’ growth curves data.

³http://oxygen.vancouver.wsu.edu/amsrev/theory/holbrook11-97.html. Accessed May 2007.

⁴For an example see http://www.stereoeye.jp/software/index_e.html. Accessed May 2007.


Figure 11.9 World of woman data: Three-dimensional representation of a component space with anchoring of the points on the 1-2 plane. The picture displays the collection of words closely associated with the word woman as can be found in the “Bank of English”, the corpus of current English compiled by Cobuild. The corpus contains words appearing in the English language in sources like the BBC, books, the spoken word and The Times. Source: Nakamura & Sinclair (1995), Fig. 2, p. 104. 1995 © Oxford University Press. Reproduced with kind permission from the Oxford University Press.

In Fig. 11.10, the principal coordinates have been used to emphasize the relative importance of the two components. Given the patterns in the two components, to facilitate interpretation, one should consider a rotation toward a target of orthonormal polynomials to get the first component to portray the constant term, and the second one the deviation from this overall level. This results in Fig. 11.11, but note that in that figure the components are in normalized coordinates.
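A rotation toward a target of orthonormal polynomials can be sketched as below. The sketch rests on assumptions not stated in the text: the target is built by a QR decomposition of a Vandermonde matrix on the ages, and the rotation is found by orthogonal Procrustes; the function names and the artificial component matrix are hypothetical, and the rotation actually used for Fig. 11.11 may differ.

```python
import numpy as np

def orthonormal_polynomials(levels, degree):
    """Orthonormal polynomials of degree 0..degree evaluated on `levels`."""
    x = np.asarray(levels, dtype=float)
    V = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    Q, R = np.linalg.qr(V)
    return Q * np.sign(np.diag(R))                  # make leading coefficients positive

def procrustes_rotation(C, target):
    """Orthonormal T that minimizes ||C T - target|| (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(C.T @ target)
    return U @ Vt

ages = np.arange(4, 16)                             # the 12 yearly measurements
P = orthonormal_polynomials(ages, degree=1)         # constant and linear targets

# Artificial component matrix (an assumption): the target itself, rotated
# by an arbitrary angle, so that the correct answer is known in advance.
theta = 0.6
R0 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
C = P @ R0

T = procrustes_rotation(C, P)
C_rotated = C @ T
```

In this constructed case the rotated components coincide with the target; with real age components the match is only approximate. Because the rotation is applied to normalized components, the counter-rotation has to be applied to the core array to keep the model fit unchanged.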

All-components plots are almost standard in chemical applications using the Parafac model, because there a component very often represents the spectrum of a specific analyte. This spectrum can be compared with the spectra of known analytes for identification. Figure 11.12 gives an example of the results from a four-component Parafac solution. It displays the input spectrum of excitation frequencies applied to a set of sugar samples⁵. Experimental details for this research, as well as full analyses for this type of data, can be found in Bro (1999). The graph shows a number of theoretically unacceptable anomalies such as the rising pattern at the end of one of the components. In his paper, Bro discusses such anomalies and how Parafac analyses can be improved via constraints on the components to regularize the results.

Figure 11.10 Girls’ growth curves data: All-components plot. The horizontal axis consists of the ages of the girls and the vertical axis displays the component coefficients in principal coordinates.

Figure 11.11 Girls’ growth curves data: All-components plot. The horizontal axis consists of the ages of the girls and the vertical axis displays the component coefficients in normalized coordinates after rotating the age mode to a maximally smooth first component.

Figure 11.12 Anderson sugar data: Plot of Parafac components for the excitation mode in a fluorescence spectroscopy experiment on sugar. The irregularities in the spectra can be suppressed by introducing further restrictions on the components such as single-peakedness.

⁵This picture was made using unpublished data collected by Claus Anderson, KVL, Copenhagen, Denmark.

11.4.4 Oblique-components plots

Component spaces from a Parafac analysis are seldom orthogonal, which poses some problems in plotting the components. In Section 11.5.1 we will look at portraying the components across modes, but here our aim is to discuss plots of the components within a mode. Plotting nonorthogonal components as rectangular axes destroys any distance or correlation interpretation of the plot. Therefore Kiers (2000a) proposed computing an auxiliary orthonormal basis for the component spaces and projecting unit-length Parafac components (or any nonorthonormal axes for that matter) onto such a plot.

The following steps are necessary to construct a plot of the loadings of the variables in a two-mode situation in which the unit-length columns of $\mathbf{A}$ are nonorthogonal. First, one searches for a transformation matrix $\mathbf{T}$ for the component matrix $\mathbf{A}$, such that $\mathbf{A}^{*} = \mathbf{A}\mathbf{T}$ is column-wise orthonormal; then the variable coefficients $\mathbf{B}$ are transformed by the inverse of $\mathbf{T}$, that is, $\mathbf{B}^{*} = \mathbf{B}(\mathbf{T}')^{-1}$, which are the coefficients to be plotted.
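The two steps can be sketched numerically. The transformation matrix is not unique; the sketch below takes it from a QR decomposition of $\mathbf{A}$, which is one convenient choice assumed here for illustration and not necessarily the one used by Kiers (2000a). The function name `auxiliary_basis` and the toy matrices are hypothetical.

```python
import numpy as np

def auxiliary_basis(A, B):
    """Auxiliary orthonormal basis for nonorthogonal components.

    A is I x S with unit-length, nonorthogonal columns; B is J x S.
    Returns A* = A T (column-wise orthonormal), B* = B (T')^{-1}, and T.
    """
    Q, R = np.linalg.qr(A)            # A = Q R, hence A R^{-1} = Q is orthonormal
    T = np.linalg.inv(R)
    A_star = A @ T
    B_star = B @ np.linalg.inv(T.T)   # counter-transformation keeps A B' unchanged
    return A_star, B_star, T

# Toy oblique components (an assumption for illustration).
rng = np.random.default_rng(2)
A = rng.normal(size=(30, 2))
A[:, 1] += 0.8 * A[:, 0]              # make the two columns correlated
A /= np.linalg.norm(A, axis=0)        # unit-length columns
B = rng.normal(size=(8, 2))

A_star, B_star, T = auxiliary_basis(A, B)
oblique_axes = np.linalg.inv(T.T)     # row s: direction of original component s
```

The product `A_star @ B_star.T` equals `A @ B.T`, so the fitted values are untouched by the change of basis. The rows of `oblique_axes` have unit length and give the directions of the original oblique components within the auxiliary orthonormal space.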

Let us first write the Parafac model in its two-mode form $\mathbf{X} = \mathbf{B}\mathbf{F}'$, with $\mathbf{F} = (\mathbf{A} \otimes \mathbf{C})\mathbf{G}'$, with $\mathbf{G}$ the matricized form of the $S \times S \times S$ superdiagonal core array $\underline{\mathbf{G}}$ with the $g_{sss}$ on the superdiagonal. To plot the variables in $\mathbf{B}$, this matrix needs to be in principal coordinates; thus, $\mathbf{B}\mathbf{G}$ is plotted with $\mathbf{G}$ an $S \times S$ diagonal matrix with the $g_{sss}$ on the diagonal. A transformation matrix $\mathbf{Q}$ is derived to find the auxiliary orthonormal axes for $\mathbf{F}$, such that $\mathbf{F}^{*} = \mathbf{F}\mathbf{Q}$, and $\mathbf{B}^{*} = \mathbf{B}(\mathbf{Q}')^{-1}$ can be plotted. However, because the Parafac axes are unique, it is not enough to plot the variables with respect to the auxiliary axes; we also need to know the directions for the original Parafac axes in that space. As shown in Kiers (2000a, p. 159), the rows of $(\mathbf{Q}')^{-1}$ are the required axes. The coordinates of the variables on the Parafac axes are found by projecting them perpendicularly onto these axes as indicated in the plot (see Fig. 11.13).

Figure 11.13 Girls’ growth curves data: Variable mode from a two-dimensional Parafac analysis. The Parafac components are plotted in the two-dimensional space spanned by orthonormal “auxiliary axes”.

11.5 PLOTTING DIFFERENT MODES TOGETHER

The basic aim of carrying out a three-mode analysis is generally to investigate the relationships between the elements of different modes. To assist in evaluating this aim, it is desirable to have plots across modes. Given the unique orientation of the components of the Parafac model in combination with the parallel proportional profiles property, it seems sensible not to make spatial plots of these components, but to portray the $s$th components of the three modes in a single plot, a per-component plot or line plot⁶.

⁶In the statistical literature there are other definitions for the term “line plot”, but in this book we will only use it in the sense defined here.


11.5.1 Per-component plots or line plots

To construct per-component plots, each term of the model $\mathbf{a}_s \otimes \mathbf{b}_s \otimes \mathbf{c}_s$ is separately portrayed by plotting the coefficients for all modes along a single line. The argument for using principal coordinates is not applicable here, as we are not concerned with spatial representations but only across-modes comparisons. Therefore, one should use either the orthonormalized components or, even better, unit mean-square scaling of components. The advantage of the latter scaling is that it compensates for the differences in number of levels in the components of the three modes; see Section 9.4, p. 218.

Per-component plots can also be useful in the Tucker3 model when it is possible to interpret all components of all modes (quite a task), and when it makes sense to interpret each and every one of the terms of the model, that is, all $\mathbf{a}_p \otimes \mathbf{b}_q \otimes \mathbf{c}_r$ that have sizeable core elements, $g_{pqr}$. Especially after core rotations that have achieved great simplicity in the core array coupled with (high) nonorthogonality of the components, per-component plots may be very useful. In this case, too, components scaled to unit mean-squares are indicated (for examples in the Tucker3 case see Kroonenberg & Van der Voort, 1987e).

Example per-component plot: Girls’ growth curves data. Crucial in the interpretation of the per-component plot in Fig. 11.14 is that the mean scores per variable have been removed at each time point. This means that the girl with a zero score on the girl components has average growth curves for all variables and the scores portrayed here are deviations from the average girl’s growth curves. Thus, if the product of the three terms is positive for a girl, she is growing faster than the average girl; if it is negative, the girl is lagging behind in growth. As all variables and all years have positive coefficients, whether a girl grows faster or slower than average is entirely determined by the coefficients on the girl component. Differences in body length are the most dramatic; those in arm circumference are much less. Differences with the average girl peak around the 13th year and diminish after that. The reason for this is that some girls have their growth spurts earlier or later than the average girl. It would have been really nice if the measurements had continued for another three years to see whether the differences level off, as one would expect.
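The unit mean-square scaling recommended for per-component plots can be sketched as follows. This is an illustration under assumptions: the Parafac model is written with explicit component weights $g_s$, the scale factors removed from the components are absorbed into those weights, and the function name and toy matrices are hypothetical.

```python
import numpy as np

def unit_mean_square(A, B, C, g):
    """Rescale the columns of A, B, and C to mean square 1.

    The removed scale factors are multiplied into the weights g, so the
    fitted values sum_s g_s a_is b_js c_ks do not change.
    """
    g = np.asarray(g, dtype=float).copy()
    scaled = []
    for M in (A, B, C):
        rms = np.sqrt((M ** 2).mean(axis=0))   # root mean square per component
        scaled.append(M / rms)
        g *= rms
    return scaled[0], scaled[1], scaled[2], g

# Toy components (an assumption): 30 girls, 8 variables, 12 ages, 2 components.
rng = np.random.default_rng(3)
A = rng.normal(size=(30, 2))
B = rng.normal(size=(8, 2))
C = rng.normal(size=(12, 2))
g = np.array([2.0, 0.5])

A1, B1, C1, g1 = unit_mean_square(A, B, C, g)

fit_before = np.einsum("is,js,ks,s->ijk", A, B, C, g)
fit_after = np.einsum("is,js,ks,s->ijk", A1, B1, C1, g1)
```

After this scaling a coefficient of 1 has the same meaning (an average-sized coefficient) in each mode, regardless of whether the mode has 8, 12, or 30 levels, which is what makes the three columns of a per-component plot comparable.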

11.5.2 Two-mode biplots

A standard (two-mode) biplot is a low-dimensional graph in which the rows (say, subjects) and the columns (say, variables) are displayed in a single plot. It is constructed by performing a singular value decomposition on the data matrix: $\mathbf{X} = \mathbf{A}\boldsymbol{\Lambda}\mathbf{B}'$, where $\boldsymbol{\Lambda}$ is the diagonal matrix with singular values, which are the square roots of the eigenvalues. The biplot technique often uses one of two asymmetric mappings of the markers. For example, in a row-metric preserving two-dimensional biplot (Gabriel & Odoroff, 1990), the row markers have principal coordinates $(\lambda_1 a_{i1}, \lambda_2 a_{i2})$ and the column markers have normalized coordinates $(b_{j1}, b_{j2})$ (Greenacre, 1993, Chapter 4).


Figure 11.14 Girls’ growth curves data: Per-component plot from a two-dimensional Parafac analysis with first component of each of the three modes. Left: Variables; Middle: Age; Right: Girls. Abbreviations: CrRump = Crown-coccyx length; Pelvis = maximum pelvic width. All other measures, except weight, are circumferences. The mean squared lengths of all components of all modes are 1.

In row-isometric biplots, the distances between the row markers are faithfully represented but those between the columns are not, with the reverse for column-isometric biplots (Gower, 1984). An alternative is to divide $\boldsymbol{\Lambda}$ equally between the rows and the columns, that is, the rows are displayed as $\mathbf{A}^{*} = \mathbf{A}\boldsymbol{\Lambda}^{1/2}$ and the variables as $\mathbf{B}^{*} = \mathbf{B}\boldsymbol{\Lambda}^{1/2}$, where we assume that $\mathbf{A}^{*}$ and $\mathbf{B}^{*}$ consist only of the first few (mostly two) dimensions.
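The three ways of dividing the singular values between row and column markers can be sketched as below. This is an illustrative sketch using NumPy's singular value decomposition; the function name `biplot_coordinates`, its `scaling` argument, and the rank-two toy matrix are assumptions made for the example.

```python
import numpy as np

def biplot_coordinates(X, ndim=2, scaling="row"):
    """Row and column markers for a two-mode biplot of X.

    scaling="row":       rows A Lambda,       columns B        (row-isometric)
    scaling="column":    rows A,              columns B Lambda (column-isometric)
    scaling="symmetric": rows A Lambda^(1/2), columns B Lambda^(1/2)
    """
    U, lam, Vt = np.linalg.svd(X, full_matrices=False)
    A, B, lam = U[:, :ndim], Vt[:ndim].T, lam[:ndim]
    if scaling == "row":
        return A * lam, B
    if scaling == "column":
        return A, B * lam
    if scaling == "symmetric":
        root = np.sqrt(lam)
        return A * root, B * root
    raise ValueError("scaling must be 'row', 'column', or 'symmetric'")

# Toy matrix of exact rank two (an assumption), so that a two-dimensional
# biplot reproduces the data without error.
rng = np.random.default_rng(4)
X = rng.normal(size=(10, 2)) @ rng.normal(size=(2, 4))

rows_r, cols_r = biplot_coordinates(X, scaling="row")
rows_c, cols_c = biplot_coordinates(X, scaling="column")
rows_s, cols_s = biplot_coordinates(X, scaling="symmetric")
```

In all three scalings the inner products of row and column markers reproduce the (approximated) data, so projections of subjects on variable vectors are interpreted in the same way; only the row-isometric version also preserves the distances between the rows.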

Commonly, the subjects are displayed as points and referred to as row markers. The variables are then displayed as arrows or vectors and referred to as column markers. In the plot, the relative order of the subjects on each of the variables can be derived from their projections on the variable vectors, and the angles between the variable vectors indicate to what extent these orders are similar for the variables (see Fig. 13.10 for an example with detailed interpretation). The extent to which a biplot is successful in representing these aspects of the data depends on the quality of the low-dimensional (mostly two-dimensional) approximation to the original data.

Gabriel (1971) introduced the word biplot and developed further interpretational procedures with his colleagues (e.g., Gabriel & Odoroff, 1990). The idea goes further back, at least to Tucker (1960). Based on Kroonenberg (1995c), a more detailed introduction to the biplot, its relation with the singular value decomposition, interpretational rules, and the effect of preprocessing on a biplot are presented in Appendix B.

11.5.3 Joint biplots

A joint biplot in three-mode analysis is like a standard biplot and all the interpretational principles of standard biplots can be used. What is special is that one constructs a biplot of the components of two modes (the display modes) given a component of the third or reference mode. Each joint biplot is constructed using a different slice of the core array. The slicing is done for each component of the reference mode. Each slice contains the strength of the links or weights for the components of the display modes. The coefficients in the associated component of the reference mode weight the entire joint biplot by their values, so that joint biplots are small for subjects with small values on the component and large for those with large coefficients.

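The starting point for constructing a joint biplot after a Tucker3 analysis is the $I \times J$ matrix $\boldsymbol{\Delta}_r = \mathbf{A}\mathbf{G}_r\mathbf{B}' = \mathbf{A}_r^{*}\mathbf{B}_r^{*\prime}$ or the $I \times J$ matrix $\boldsymbol{\Delta}_k = \mathbf{A}\mathbf{H}_k\mathbf{B}' = \mathbf{A}_k^{*}\mathbf{B}_k^{*\prime}$ after a Tucker2 analysis. For each core slice, $\mathbf{G}_r$ (or $\mathbf{H}_k$), a joint biplot for the $I \times P$ component matrix $\mathbf{A}^{*}$ and the $J \times Q$ component matrix $\mathbf{B}^{*}$ needs to be constructed such that the $P$ columns of $\mathbf{A}^{*}$ and the $Q$ columns of $\mathbf{B}^{*}$ are as close to each other as possible; see Kroonenberg and De Leeuw (1977). Closeness is measured as the sum of all $P \times Q$ squared distances $\delta^2(\mathbf{a}_i^{*}, \mathbf{b}_j^{*})$, for all $i$ and $j$.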

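The construction of a joint biplot is as follows (see also Kroonenberg, 1994, pp. 83-84). The $P \times Q$ core slice $\mathbf{G}_r$ is decomposed via a singular value decomposition into

$$\mathbf{G}_r = \mathbf{U}_r\boldsymbol{\Lambda}_r\mathbf{V}_r',$$

and the orthonormal left singular vectors $\mathbf{U}_r$ and the orthonormal right singular vectors $\mathbf{V}_r$ are combined with $\mathbf{A}$ and $\mathbf{B}$, respectively, and the diagonal matrix $\boldsymbol{\Lambda}_r$ with the singular values is divided between them in such a way that

$$\mathbf{A}_r^{*} = (I/J)^{1/4}\,\mathbf{A}\mathbf{U}_r\boldsymbol{\Lambda}_r^{1/2} \quad \text{and} \quad \mathbf{B}_r^{*} = (J/I)^{1/4}\,\mathbf{B}\mathbf{V}_r\boldsymbol{\Lambda}_r^{1/2}, \qquad (11.2)$$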

where the fourth-root fractions take care of different numbers of levels in the two component matrices. The columns of the adjusted (asterisked) component matrices are referred to as the joint biplot axes. When the $\mathbf{G}_r$ ($\mathbf{H}_k$) are not square, their rank is equal to $M = \min(P, Q)$, and only $M$ joint biplot axes can be displayed. The complete procedure comes down to rotating each of the component matrices by an orthonormal matrix, followed by a stretching (or shrinking) of the rotated components. The size of the stretching or shrinking of the axes is regulated by the square roots of the $\lambda_{mm}^{r}$ and the (inverse) fourth root of $J/I$. Note that even if there is a large difference in explained variability of the axes, that is, between $(\lambda_{mm}^{r})^2$ and $(\lambda_{m'm'}^{r})^2$, there can be a sizeable visual spread in the plot as the component coefficients are multiplied by the $(\lambda_{mm}^{r})^{1/2}$, which makes their values much closer.


As an example, with explained variabilities equal to 0.30 and 0.02 (ratio 15:1), the respective components are multiplied by their fourth roots, or 0.74 and 0.38 (ratio 2:1).
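The construction in Eq. (11.2) can be sketched numerically. The code is a minimal illustration assuming NumPy's singular value decomposition; the function name `joint_biplot` and the random toy component matrices and core slice are hypothetical and do not come from the girls' growth curves analysis.

```python
import numpy as np

def joint_biplot(A, B, G_r):
    """Joint biplot coordinates for one core slice G_r (P x Q).

    A is I x P and B is J x Q. The singular values of G_r are divided
    equally over both modes, and the fourth-root factors compensate for
    the different numbers of levels I and J.
    """
    I, J = A.shape[0], B.shape[0]
    U, lam, Vt = np.linalg.svd(G_r, full_matrices=False)
    root = np.sqrt(lam)
    A_star = (I / J) ** 0.25 * (A @ U) * root
    B_star = (J / I) ** 0.25 * (B @ Vt.T) * root
    return A_star, B_star

# Toy Tucker3 ingredients (an assumption): orthonormal A (30 x 3) and
# B (8 x 3), and one 3 x 3 slice of the core array.
rng = np.random.default_rng(5)
A, _ = np.linalg.qr(rng.normal(size=(30, 3)))
B, _ = np.linalg.qr(rng.normal(size=(8, 3)))
G_r = rng.normal(size=(3, 3))

A_star, B_star = joint_biplot(A, B, G_r)
delta_r = A @ G_r @ B.T          # the I x J matrix approximated by the joint biplot
```

Because the two fourth-root factors cancel in the product, `A_star @ B_star.T` equals `delta_r`, so inner products between subject and variable markers in the joint biplot carry the strength of their link for this core slice.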

As $\mathbf{A}_r^{*}\mathbf{B}_r^{*\prime} = \boldsymbol{\Delta}_r$, each element $\delta_{ij}^{r}$ is equal to the inner product $\mathbf{a}_i^{*\prime}\mathbf{b}_j^{*}$, and it provides the strength of the link between $i$ and $j$ in as far as it is contained in the $r$th core slice. By simultaneously displaying the two modes in one plot, visual inferences can be made about their relationships. The spacing and the order of the subjects’ projections on a variable correspond to the sizes of their inner products, and thus to the relative importance of that variable to the subjects.

One of the advantages of the joint biplot is that the interpretation of the relationships of variables and subjects can be made directly, without involving component axes or their labels. Another feature of the joint biplots is that via the core slice $\mathbf{G}_r$ ($\mathbf{H}_k$) the coordinate axes of the joint biplots are scaled according to their relative importance, so that visually a correct impression of the spread of the components is created. However, in the symmetric scaling of the components as described here, the distances between the subjects are not approximations to their Euclidean distances (i.e., they are not isometric), nor are the angles between the variables (approximate) correlations. The joint biplot for the Tucker3 model is explicitly meant to investigate the subjects with respect to the variables, given a component of the third mode. For the Tucker2 model, the joint biplot provides information on the relationships between subjects and variables given a level of the third mode.

In practice, joint biplots have proved a powerful tool for disentangling complex relationships between levels of two modes, and many applications make use of them, as do several examples in this book.

Example: Girls’ growth curves data - Joint biplots.

Tucker3 joint biplots. To illustrate a joint biplot, we again use the girls' growth curves data. The time mode was chosen to be the reference mode and its components are shown in Fig. 11.10. The first time component shows a steady increase with a peak at 13 years of age. The last two years show a fairly steep decline. The primary interpretation of the first time component is that it indicates overall variability at each of the years. Thus, there was increasing differentiation between the girls up to year 13 and this difference decreased for the last two years. Figure 11.15 shows the joint biplot with girls and variables as display modes associated with the first time component. The joint biplot portrays the first two (varimax-rotated) axes of a three-dimensional plot belonging to a 3×3×2 solution. By and large, the variables fall in two main groups, the length (or skeletal) variables (Length and crown-coccyx length) and the circumference (or soft-tissue) variables (Arm, Calf, Chest). Pelvic circumference is associated with both the skeletal variables and the soft-tissue variables. The third axis (not shown) provides a contrast with the head circumference, and chest and arm.

Girls 19 and 27 have sizeable projections on all variables and are relatively tall and sturdy, while 9 and 11 are short and petite. In contrast, girl 4 has average length but


Figure 11.15 Girls' growth curves data: Joint plot from 3×3×2-Tucker3 analysis for first component of age mode. Abbreviations: CrRump = Crown-coccyx length; Pelvis = maximum pelvic width. All other measures, except weight, are circumferences.

is rather hefty; girl 18 is shorter but has a similar amount of soft tissue. There are no really tall and slim girls (6 and 24 come closest), which seems reasonable since part of someone's weight is due to bones, but short girls can get heavy due to an increase in soft tissue.

Figure 11.16 Girls' growth curves data: Joint plots from a 3×3-Tucker2 analysis at 4 years old and at 13 years old. Abbreviations: CrRump = Crown-coccyx length; Pelvis = maximum pelvic width. All other measures, except weight, are circumferences. Note that the scales for the two plots are different, and that the second axes are more or less mirrored.


Table 11.1 Girls' growth curves data: Tucker2 core slices at 4 and 13 years of age

                  Variable components       Variable components
Girl                  4-years-old              13-years-old
components         V1      V2      V3        V1      V2      V3

G1                0.32   -0.03   -0.04      0.83    0.02    0.03
G2                0.11   -0.12   -0.02     -0.06   -0.40   -0.00
G3               -0.10   -0.04    0.17      0.13   -0.01    0.21

Weight axes       0.20    0.04    0.01      1.08    0.24    0.06

The weights of the axes are those of the joint biplots for each age group.

Tucker2 joint biplots. Figure 11.16 shows two joint biplots from a 3×3-Tucker2 analysis. The left-hand plot shows the relative positions of the girls on the variables at the start of the observational period. Young girls clearly show less variability or deviation from the average girl, as can be seen from the smaller scale of the coordinate axes. Moreover, two axes suffice for this age group, showing that there is less differentiation between the girls. Note that at this age girls 9 and 11 were already comparatively small, and 21 was already rather tall. At the age of 13, when the differences between the girls were at their largest, the variables fan out more, indicating that the girls are different from each other on different variables. The Tucker models consist of a single space for the girls and a single one for the variables for all time points, but the core slices combine these common components in different ways over the years, as can be seen in Table 11.1. At age 4, the relationship between girl and variable components is very diffuse, whereas at age 13 there are clear links between them, indicated by the sizeable core elements.
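The two core slices of Table 11.1 can be inspected with a singular value decomposition, in the same way as in the construction of the joint biplot. The sketch below is illustrative only: the array and function names are hypothetical, the entries are the rounded values from the table, and no attempt is made to reproduce the tabulated axis weights, whose scaling is not specified in this section. It merely compares the total sum of squares of the two slices and the relative share of each joint biplot axis.

```python
import numpy as np

# Rounded core slices from Table 11.1 (rows: girl components G1-G3,
# columns: variable components V1-V3)
H_age4 = np.array([[ 0.32, -0.03, -0.04],
                   [ 0.11, -0.12, -0.02],
                   [-0.10, -0.04,  0.17]])
H_age13 = np.array([[ 0.83,  0.02,  0.03],
                    [-0.06, -0.40, -0.00],
                    [ 0.13, -0.01,  0.21]])

def axis_shares(H_k):
    """Share of each joint biplot axis in the sum of squares of a core slice."""
    singular_values = np.linalg.svd(H_k, compute_uv=False)
    return singular_values**2 / np.sum(singular_values**2)

shares_age4 = axis_shares(H_age4)
shares_age13 = axis_shares(H_age13)

# Total sum of squares per slice: a rough indication of the overall
# amount of girl-by-variable structure at each age
total_age4 = np.sum(H_age4**2)
total_age13 = np.sum(H_age13**2)
```

With these rounded entries the slice at age 13 has a clearly larger sum of squares than the slice at age 4, in line with the larger scale of the coordinate axes for the 13-year-olds.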

11.5.4 Nested-mode biplots

The alternative to joint biplots, which display two modes given a component of the third or reference mode, is to construct a matrix in which the rows consist of the fully crossed levels of two of the modes (also called interactive coding), and columns consist of the levels of the remaining mode. The coordinates for such plots can be constructed starting with the basic form for the Tucker3 model.


Figure 11.17 Girls’ growth curves data: Nested-mode biplot for the normalized component space of the variables. Each trajectory represents a girl’s development over time with respect to the variables with high values on the first variable component. Numbers are sequence numbers; 10, 11, 12 are indicated with a 1; real ages are sequence numbers + 3.

Written with the variables as columns, the structural part of the Tucker3 model is $\hat{x}_{ijk} = \sum_{q} b_{jq} f_{(ik)q}$, with $f_{(ik)q} = \sum_{p}\sum_{r} g_{pqr}\, a_{ip}\, c_{kr}$ the interactively coded scores, so that $\mathbf{F}$ is a tall combination-mode matrix. A content-based decision has to be made whether the distances between the $f_{(ik)q}$ have to be displayed as (approximations to the) distances in high-dimensional space, which requires them to be in principal coordinates. In this case, the rows of $\mathbf{B}$ with normalized coordinates are displayed as axes (distance form). Alternatively, if we want the correlations between the variables in Mode B to be correctly displayed, the variables have to be in principal coordinates (covariance form). When plotting both of them in a single biplot, the second alternative would be the most attractive one, but it all depends on the purpose of the plot.

Because the interpretation of components is generally carried out via the variables, it often makes sense to have the subjects × conditions in the rows and the variables in the columns. Either the subjects are nested in the conditions, or vice versa, and because of this nesting, these plots are referred to as nested-mode biplots. When, in addition, one of the modes in the rows has a natural order, its coefficients can be fruitfully connected for each level of the other mode. For instance, conditions or time points can be nested within each subject. The resulting trajectories in variable space show how the values of a subject change over time or over the conditions. When the variables have been centered per subject-occasion combination, the origin of the plot represents the profile of the subject with average values on each variable on all occasions, and the patterns in the plot represent the deviations from the average subject's profile.


An interesting aspect of the subject-condition coefficients on the combination-mode components is that per variable component they are the inner products between the subjects and the occasions and thus express the closeness of the elements from the two modes in the joint biplot. For further technical details, see Kiers (2000a) and Kroonenberg (1983c, pp. 164-166).
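The interactively coded scores can be sketched in a few lines of NumPy. The fragment below is a hedged illustration under stated assumptions: it uses the usual Tucker3 notation (core elements $g_{pqr}$, component matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ with elements $a_{ip}$, $b_{jq}$, $c_{kr}$), random orthonormal components, and hypothetical variable names. It forms the scores $f_{(ik)q} = \sum_p \sum_r g_{pqr} a_{ip} c_{kr}$, checks that together with $\mathbf{B}$ they reproduce the structural part of the model, and checks that for each variable component they are inner products between subject and condition coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 6, 5, 4          # subjects, variables, conditions (illustrative sizes)
P, Q, R = 3, 2, 2          # numbers of components per mode

# Illustrative orthonormal component matrices and a random core array
A = np.linalg.qr(rng.standard_normal((I, P)))[0]
B = np.linalg.qr(rng.standard_normal((J, Q)))[0]
C = np.linalg.qr(rng.standard_normal((K, R)))[0]
G = rng.standard_normal((P, Q, R))

# Interactively coded scores: f_(ik)q = sum_p sum_r g_pqr a_ip c_kr
F = np.einsum('pqr,ip,kr->ikq', G, A, C)      # shape (I, K, Q)
F_tall = F.reshape(I * K, Q)                  # rows: conditions nested within subjects

# Structural part of the Tucker3 model, arranged as (subject x condition) by variable
X_hat = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
X_tall = X_hat.transpose(0, 2, 1).reshape(I * K, J)

# Per variable component q, the scores are inner products a_i' G_q c_k
q = 0
inner_q = A @ G[:, q, :] @ C.T                # shape (I, K)

# Trajectory of the first subject over the K conditions in variable space
trajectory_first_subject = F[0]               # shape (K, Q)
```

Connecting the rows of `trajectory_first_subject` in their natural order would give one trajectory of a nested-mode biplot, and the rows of `B` would supply the variable axes.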

Example: Girls’ growth curves data - Nested-mode biplots. Figure 11.17 shows the trajectories of the girls in the normalized variable space. The general movement is outward from the origin, indicating that the girls in question deviate more and more from the average profile, but it is also clear that this deviation is decreasing by the age of 15. Given the lack of data beyond age 15, it is not clear whether most of them will regain their initial position with respect to the profile of the average girl.

In Fig. 11.18 the average position of the girls on their growth curves is indicated by their sequence number. In addition, the projections of the variable axes are included. They are plotted as the rows of the orthonormal matrix $\mathbf{B}$. As all variables point to the right, girls at the right-hand side of the origin, such as 24, increase their values with respect to the average girl, while the girls on the left-hand side, such as 9 and 11, lag behind and until their 12th year lag further and further behind. Girls in the southeast corner, such as 18, gain especially more in body mass, as is evident from their elevated increases in the soft-tissue variables, while girls like 21 increase especially in skeletal width.

11.5.5 Nested-mode per-component plots

Sometimes it is not very useful to display the interactively coded scores, $f_{(ik)q}$, for different components $q$ and $q'$ in one plot, and it is clearer to plot the subject-condition scores for each of the components separately. Such plots are sometimes easier to use, explain, or present than the nested-mode biplots in which one has to inspect projections on vectors.

In case of a good approximation of the model to the data, the interactively coded component scores as described above will resemble the component scores from a standard principal component analysis on a data matrix in which the columns are variables, and the rows the subject-condition combinations. Other writers (e.g., Hohn, 1979) have also suggested using such component scores.
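The resemblance with ordinary component scores can be illustrated in the limiting case of a perfect fit. The sketch below is an illustration under explicit assumptions, not a general result: the data are generated without noise from a Tucker3 model with random orthonormal components (all names and sizes are hypothetical), and the principal component analysis is carried out by a singular value decomposition without any further centering. In that case the first $Q$ principal component scores span the same space as the interactively coded scores, so that the two sets of scores differ only by a nonsingular transformation.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K, P, Q, R = 10, 6, 5, 3, 2, 2   # illustrative sizes

A = np.linalg.qr(rng.standard_normal((I, P)))[0]
B = np.linalg.qr(rng.standard_normal((J, Q)))[0]
C = np.linalg.qr(rng.standard_normal((K, R)))[0]
G = rng.standard_normal((P, Q, R))

# Interactively coded scores and the noise-free (subject x condition) by variable matrix
F_tall = np.einsum('pqr,ip,kr->ikq', G, A, C).reshape(I * K, Q)
X_tall = F_tall @ B.T

# Standard PCA of the tall matrix via SVD (assumption: no additional centering)
U, s, Vt = np.linalg.svd(X_tall, full_matrices=False)
pca_scores = U[:, :Q] * s[:Q]

def projector(M):
    """Orthogonal projector onto the column space of M."""
    basis = np.linalg.qr(M)[0]
    return basis @ basis.T

# Equal projectors mean equal column spaces
same_space = np.allclose(projector(pca_scores), projector(F_tall))
```

With noisy data the two spaces will no longer coincide exactly, and how close they are then depends on the quality of the fit.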

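For the Tucker2 model the component scores may be derived by rewriting the basic form (Eq. (4.11), p. 53) as follows:

$$\hat{x}_{ijk} = \sum_{p=1}^{P}\sum_{q=1}^{Q} a_{ip}\, b_{jq}\, h_{pqk} = \sum_{q=1}^{Q} b_{jq} \Big[\sum_{p=1}^{P} a_{ip}\, h_{pqk}\Big] = \sum_{q=1}^{Q} b_{jq}\, f_{(ik)q}.$$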
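For the Tucker2 case the scores can be computed directly from the extended core array. The sketch below assumes the Tucker2 structural form $\hat{x}_{ijk} = \sum_p \sum_q a_{ip} b_{jq} h_{pqk}$, so that $f_{(ik)q} = \sum_p a_{ip} h_{pqk}$; the random components, the sizes, and the variable names are hypothetical and serve only to show the bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 8, 6, 12         # subjects, variables, time points (illustrative sizes)
P, Q = 3, 3                # components for subjects and variables

A = np.linalg.qr(rng.standard_normal((I, P)))[0]
B = np.linalg.qr(rng.standard_normal((J, Q)))[0]
H = rng.standard_normal((P, Q, K))            # extended core array with slices H_k

# Component scores of the subject-condition combinations: f_(ik)q = sum_p a_ip h_pqk
F = np.einsum('ip,pqk->ikq', A, H)            # shape (I, K, Q)

# Structural part of the Tucker2 model, and its reconstruction from the scores
X_hat = np.einsum('ip,jq,pqk->ijk', A, B, H)
X_from_scores = np.einsum('ikq,jq->ijk', F, B)

# Scores on the first variable component: one row (trajectory) per subject
first_component_scores = F[:, :, 0]           # shape (I, K)
```

Plotting each row of `first_component_scores` against the time points would give a nested-mode per-component plot for the first variable component.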

Example: Girls’ growth curves data - Nested-mode per-component plots. The trajectories of the girls for the first variable component are depicted in a nested-mode per-component plot (Fig. 11.19). All variables have positive values on this


Figure 11.18 Girls’ growth curves data: Nested-mode biplot for the normalized component space of the variables. Numbers indicate girls; if they are located close to the origin they have been deleted. The trajectories end at age 15. Arrows indicate the projections of the variable axes in the low-dimensional space. Angles do not correspond to correlations.

component, so that the trajectories show the girls’ relative growth with respect to the growth of the average girl. A number of patterns can be discerned with respect to these curves: (1) all but one girl stay on the same side of the horizontal line, indicating that smaller (larger) than average at 4 means smaller (larger) than average all the way through puberty; (2) only a handful of girls have growth curves parallel to that of the average girl, showing that they have the same growth rate; and (3) the measurements stopped a couple of years too early to establish where the girls ended up with respect to the average girl.

11.6 CONCLUSIONS

Given the complexity of multiway analysis, it is clear that interpretation becomes more challenging when “multi” means “greater than three”. However, well-chosen plots can be of great help in getting an overview of the patterns in the data, and in making sense of them. As Kiers (2000a, p. 169) remarked, the plots used in multiway analysis “rely on two-way (PCA) models, obtained after rewriting the three-


Figure 11.19 Girls’ growth curves data: Nested-mode per-component plot for the first component of the variables. Scores on the first component of the girl-time combinations. Numbers at the right are the identification numbers of the girls, corresponding to those of the earlier plots.

way models at hand.” Therefore, they do not capture all information and should be used in conjunction with the numerical results of the multiway analyses. Nevertheless, it seems that no multiway analysis can succeed in conveying what is present in the data without resorting to one or more graphical displays.
