[wiley series in probability and statistics] applied multiway data analysis || exploratory...


CHAPTER 15

EXPLORATORY MULTIVARIATE LONGITUDINAL ANALYSIS

15.1 INTRODUCTION¹

15.1.1 What are multivariate longitudinal data?

Multivariate longitudinal data are nearly by definition three-way data, as they consist of scores of observational units (subjects, objects, etc.) on variables measured at various time points (occasions); that is, they form a fully crossed design. Higher-way data can be imagined in which the fourth mode consists of different conditions under which the longitudinal data have been collected. However, such data sets are rare. Many longitudinal data sets have only one or two variables, and observations consist of single measurements of these variables over a long series of occasions. In such cases, there are really only two modes, variables and points in time. Alternatively, one or two subjects are measured on a single variable for a long period of time, in which case we again have two modes, subjects and points in time. The interest in this chapter is in the situation when there are so many subjects, variables, and points in time that some type of condensation of at least two modes is required in order to describe or model the patterns present in the data. The questions that can then be asked mostly have the form: Which groups of subjects have, for which variables, different patterns of development over time?

¹Parts of the text in this chapter have been taken from Kroonenberg, Lammers, and Stoop (1985b). Reproduced and adapted with kind permission from Sage Publications.

Applied Multiway Data Analysis. By Pieter M. Kroonenberg. Copyright © 2007 John Wiley & Sons, Inc.

15.1.2 Types of models

A useful way of ordering techniques for multivariate longitudinal data is to look at the different roles time can play in the analysis of longitudinal data. First, time can be included in the definition of stochastic models. Second, the ordered aspect of time can be included in descriptive, nonstochastic models by, for instance, assuming smooth changes from one time point to the next. Finally, time can be used as a post-hoc interpretational device in otherwise unrestricted modeling.

Multivariate longitudinal data and multiway models

Stochastic modeling

• without latent structures: general linear model [repeated measures analysis of variance, autoregressive models, multivariate time series models];

• with latent structures: structural equation models; latent growth curves

  - non-multiway: subjects are exchangeable [multivariable-multioccasion or multimode models];

  - multiway: information on subjects is present in the model via components and/or factors [multimode models: multiway variants]

Descriptive modeling

• component models with functional restrictions on the time modes, for example, smooth functions, explicit time-related functions (growth functions).

• component models with constraints on the components, for example, smoothness constraints

Time as interpretational device

• Tucker1 model [only components for variables]

• Tucker2 model with uncondensed time mode [no time components, only components for variables and subjects]

• Tucker3 model [components for each of the three modes, including the time mode]

• exploratory multimode structural equation models

• STATIS

• using rotations to a time structure (e.g., orthogonal polynomials) or toward smooth functions

15.2 CHAPTER PREVIEW

The core of this chapter is a detailed exposition of exploratory three-mode models, which can be used for longitudinal analysis with a special emphasis on three-mode models for continuous variables. We will concentrate on data consisting of more than a few subjects, variables, and time points, and designs and research questions taking individual differences into account. Moreover, we will look at truly longitudinal data in which the same subjects are measured over time, rather than cross-sectional data consisting of several samples of subjects.

To illustrate the way three-mode component models can provide useful descriptions of multivariate longitudinal data, we will present detailed analyses of two data sets: one example dealing with the growth of 188 Dutch hospitals over time (Kroonenberg et al., 1985b), and one example about the morphological growth of 30 French girls (Kroonenberg, 1987a, 1987b).

15.3 OVERVIEW OF LONGITUDINAL MODELING

15.3.1 Stochastic modeling: Using explicit distributional assumptions

Doubly multivariate repeated measures. A standard, mainly experimental, setup consists of a single dependent variable measured several times with a between-subject design. The aim of such studies is to test differences over time between means of groups that have undergone different (levels of) treatments. Typically, such designs are analyzed by repeated measures analysis of variance. The data are mostly analyzed using multivariate analysis of variance, where multivariate refers to the repeated measurement of the same variable. An extension is the much less common doubly multivariate repeated measures analysis of variance design, where several variables are measured over time. In fact, the basic data for such a design are three-way data with a between-subject design.

Such designs make assumptions about the distributions of the variables, which are then used to construct significance tests as to whether the means or trends in means are significantly different from each other. Thus, the observations on the subjects are seen as the result of repeated sampling from specific distributions, but the subjects are not really seen as individuals. Anybody else drawn randomly from the same distribution would have served as well. What is lacking in such an approach is attention to the individuals themselves; moreover, the possibly evolving structure of the variables is not a central part of the investigation.

Structural equation models. As part of the solution, specific models have been devised, which on the one hand specify how the latent variables are related to the observed measurement (the measurement model), and on the other hand specify the relationships between the latent variables (the structural model). The complete model is now generally referred to as a structural equation model (SEM). A detailed, both conceptual and technical, introduction is contained in the book by Bollen (1989), and more recent books dealing explicitly with longitudinal designs for continuous data are Verbeke and Molenberghs (2000) and Fitzmaurice et al. (2004).

The development, analysis, and testing of structural equation models is generally carried out using (means and) covariance matrices as a starting point, and hypothesized models are used to calculate estimates for these means and covariances. The adequacy of the fit between the observed values and those reconstructed on the basis of the model is assessed with various fit indices; see Hu and Bentler (1999) for an overview. Generally, goodness-of-fit testing relies on distributional assumptions about multivariate normality, but some of the procedures have been shown to be rather robust given large samples. Nonnormal procedures and tests have been developed as well; see Boomsma and Hoogland (2001) for an overview.

Structural equation models have frequently been used for multivariate longitudinal data, especially in panel data with a limited number of variables and a small to medium number of time points. The analyzed covariance matrices have the form of multivariable-multioccasion covariance matrices, or multimode covariance matrices. An example of a structural equation model is shown in Fig. 15.1.

In non-three-mode covariance models, the individuals themselves do not really play a role and they are exchangeable in the sense that any other subject from the same population would have done as well. Within this framework, autoregressive models and dynamic factor models have been proposed. For an overview of these multivariate longitudinal models and multiway modeling see Timmerman (2001).

Latent growth-curve models. Another type of model often analyzed within the structural equation framework is the latent growth-curve model, in which not only the (co)variances but also the (structured) means of the latent variables are estimated. See Bijleveld, Mooijaart, Van der Kamp, and Van der Kloot (1998) for an overview of the use of structural equation models for longitudinal data, and Verbeke and Molenberghs (2000) and Fitzmaurice et al. (2004) for an in-depth treatment of the whole field.

Three-mode covariance models. Three-mode covariance models are characterized by the inclusion of information on subjects through components or factors for the subjects (i.e., component scores). The basic data used for modeling are still the multimode covariance matrices, but the latent models can be formulated to include either the component scores themselves, or, more commonly, only their (co)variances or correlations. Oort (1999) provides the most comprehensive overview of multimode common factor models, while Oort (2001) deals entirely with longitudinal models in this context. Finally, Kroonenberg and Oort (2003a) discuss the relative advantages and disadvantages of the descriptive and the confirmatory approach toward three-mode covariance modeling; one of their examples deals with longitudinal data.

15.3.2 Descriptive modeling

As the major part of this chapter is concerned with descriptive, exploratory modeling, we will touch upon some general principles that may be used in component models. It is assumed that the researcher intends to investigate the changes over time between the variables in some condensed form such as linear combinations, using either time itself or composite functions that describe the time development of the scores. There are basically two approaches: either one uses predetermined functions and estimates their parameters alongside other parameters of the component models, or one imposes general restrictions on the components such as smoothness and monotonicity.

Component models with functional restrictions on the time modes. In order to use specific functions, one needs to have a good grasp of the underlying processes. Latent growth modeling is an example in which this is explicitly the case and in which time-related functions can be used. For instance, in Bus’s learning-to-read study discussed in Chapter 5 of Timmerman’s (2001) monograph, the psychological and experimental information is that initially the subjects do not have the accomplishments to fulfill the task (baseline). As they are learning how to master the task, their performance improves until the task is more or less fully mastered. Thus, some kind of S-shaped curve will describe the performance over time. Following Browne (1993) and Browne and Toit (1991), Timmerman (2001, Chapter 5) used a Gompertz function to model latent growth curves. In other situations, other curves, such as sinusoidal curves in periodic phenomena, will be more appropriate.
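As a rough illustration of the S-shaped growth just described, the following sketch evaluates a Gompertz curve over a series of occasions. The three-parameter form and the names `asymptote`, `displacement`, and `rate` are assumptions made here for illustration (one common parametrization); they are not necessarily the form used by Timmerman (2001).

```python
import numpy as np

def gompertz(t, asymptote, displacement, rate):
    """One common Gompertz form: asymptote * exp(-displacement * exp(-rate * t))."""
    t = np.asarray(t, dtype=float)
    return asymptote * np.exp(-displacement * np.exp(-rate * t))

# Illustrative occasions and parameter values (hypothetical, not from the text)
occasions = np.arange(0, 11)
curve = gompertz(occasions, asymptote=1.0, displacement=5.0, rate=0.6)

# Low baseline, steady improvement, then leveling off near the asymptote
for k, value in zip(occasions, curve):
    print(f"occasion {k:2d}: {value:.3f}")
```

The curve starts near zero (baseline), rises monotonically, and approaches the asymptote (mastery), which is the qualitative pattern the learning-to-read example calls for.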

Component models with general constraints on the components. Timmerman (2001, Chapter 4) discusses a number of possibilities for including smoothness constraints in three-mode component models. In particular, she outlines the use of B-splines for general smoothness and I-splines for monotonically increasing functions. Note that this approach is not limited to longitudinal designs. The assumption in this context is that from one observation to the next, regular changes can be expected and approximated by smooth curves. This implies that there is sufficient autoregressiveness in the process itself. For example, yearly measurements of yield in an agricultural experiment do not fulfill this condition, because weather conditions are not a very continuous phenomenon from one year to the next and certainly not an autoregressive one.


Time as an interpretational device. Whereas in the studies discussed in the previous sections time was included in the design and analysis, general descriptive models do not always have facilities for including time explicitly. The analysis is performed as if the time order does not exist, and only afterwards, during the interpretation, do we call in the help of the ordered aspect of time to evaluate the solution. Excluding an essential part of the design has its drawbacks, but on the other hand not making a priori assumptions has its advantages as well.

Several devices are available to use time in a post-hoc manner. One can make trajectories in dimensional graphs connecting time points and inspect to what extent the time order is indeed reproduced in the analysis. Moreover, one might get an idea how to include time in a more sophisticated statistical model. A further possibility lies in optimally rotating time components toward ordered functions, such as higher-order polynomials or otherwise smooth functions.
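The last possibility, rotating time components toward ordered functions, can be sketched with an orthogonal Procrustes rotation toward orthonormal polynomials. This is only one way to carry out such a rotation and is not claimed to be the procedure used in the chapter; the function names and the equally spaced time points are assumptions for illustration.

```python
import numpy as np

def orthonormal_polynomials(n_points, degree):
    """Columns are orthonormal polynomials of degree 0..degree on equally spaced points."""
    t = np.arange(n_points, dtype=float)
    vandermonde = np.vander(t, degree + 1, increasing=True)
    q, r = np.linalg.qr(vandermonde)
    # Flip signs so that every polynomial has a positive leading coefficient
    return q * np.sign(np.diag(r))

def rotate_to_target(components, target):
    """Orthogonal Procrustes: rotation T minimizing ||components @ T - target||."""
    u, _, vt = np.linalg.svd(components.T @ target)
    rotation = u @ vt
    return components @ rotation, rotation

# Hypothetical time components: polynomial trends hidden by an arbitrary rotation
rng = np.random.default_rng(0)
target = orthonormal_polynomials(n_points=8, degree=2)
hidden_rotation = np.linalg.qr(rng.normal(size=(3, 3)))[0]
components = target @ hidden_rotation

rotated, rotation = rotate_to_target(components, target)
print(np.round(rotated - target, 10))  # differences vanish when the trends are recoverable
```

With real data the rotated components will only approximate the polynomial target; the size of the remaining differences indicates how well a constant, linear, and quadratic trend describe the time mode.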

15.4 LONGITUDINAL THREE-MODE MODELING

In this section we explicitly discuss the analysis of multivariate longitudinal data with three-mode component models. Whereas in multiway profile data the main dependence is between variables, in longitudinal data the serial dependence or autocorrelation between observations on different occasions is important as well. Interactions between the two kinds of dependence introduce further complications.

15.4.1 Scope of three-mode analysis for longitudinal data

The promise of three-mode principal component analysis and its analogues in longitudinal multivariate data analysis lies in the simultaneous treatment of serial and variable dependence. In the Tucker3 model the serial dependence can be assessed from the component analysis of the time mode, the variable dependence from the variable mode, and their interaction from the core array or from the latent covariance matrix (see Section 15.4.2). When standard principal component analysis is applied to the data arranged either as a tall combination-mode matrix of subjects × occasions by variables, or as a wide combination-mode matrix of subjects by variables × occasions, the variable dependence, the serial dependence, and their interactions become confounded.
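The two two-way arrangements mentioned in this subsection can be made concrete with a small sketch. The array sizes and the variable names `tall` and `wide` are illustrative assumptions; the point is only to show how the same three-way array is unfolded in two different ways before an ordinary principal component analysis.

```python
import numpy as np

I, J, K = 6, 4, 3  # subjects, variables, occasions (illustrative sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(I, J, K))  # three-way data, indexed [subject, variable, occasion]

# Tall combination-mode matrix: (subjects x occasions) by variables
tall = X.transpose(0, 2, 1).reshape(I * K, J)

# Wide combination-mode matrix: subjects by (variables x occasions),
# with variables running fastest within each occasion
wide = X.transpose(0, 2, 1).reshape(I, K * J)

print(tall.shape, wide.shape)
```

A standard component analysis of `tall` yields one set of variable components pooled over occasions, and one of `wide` yields components for variable-occasion combinations; in neither case are separate components for variables and occasions obtained, which is why the two kinds of dependence cannot be disentangled.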

15.4.2 Analysis of multivariate autoregressive processes

Lohmöller (1978a, 1978b, 1989) made several proposals toward the interpretation of serial and variable dependence via three-mode principal component analysis of multivariate longitudinal data. His contribution will be discussed in this section.

Introduction. Basic to Lohmöller’s approach is the assumption that the changes in the variables and the scores of the subjects on these variables can be modeled by multivariate autoregressive processes. Given that the assumption is tenable, he shows how one can get an indication of the size of the parameters of the assumed autoregressive process from the results of the three-mode analysis. In other words, once a three-mode analysis has been performed, an interpretation of its results can be given in terms of the parameters of the autoregressive process.

Lohmöller’s procedure to arrive at these indicators or “estimators” is rather indirect. Some 625 data sets were first generated according to specific autoregressive processes, then analyzed with three-mode principal component analysis. Empirical “estimating” equations were derived for the parameters of the autoregressive processes by regressing them on the parameters in the three-mode analyses. Lohmöller himself recognized that the procedure has serious drawbacks, as for each new kind or size of data set new simulation studies have to be performed. On the other hand, via his simulation studies he was able to investigate which results in a three-mode analysis are particularly sensitive to changes in specific parameters in the autoregressive models, and which results reflect the general characteristics of the autoregressive models. Lohmöller (1978a, 1978b) pointed out that, in fact, three-mode path models are to be preferred, because in those the autoregressive processes can be modeled directly. Some data sets, however, are too large to be handled by such an approach, because the $JK \times JK$ multimode covariance matrix can become too large.

Component analysis of time modes. One of the problems in three-mode analysis of longitudinal data is the interpretation of the decomposition of the time mode, as its correlation matrix is more often than not a (quasi-)simplex. Entries of a simplex are such that the correlations between two time points are a decreasing function of their difference in rank order. Typically the bottom-left and top-right corners of the correlation matrix have the lowest entries (for examples see Table 15.2). As Guttman (1954) has shown, simplexes have standard principal components. In particular, the components of equidistant simplexes can be rotated in such a way that the loadings on the first component are equal, those on the second component are a linear function of the ranks of the time points, those on the third component are a quadratic function of the ranks, and so on. After extraction of the first two principal components, all time points have roughly the same communalities, and the configuration of time points resembles a horseshoe or “U”, the opening of which increases with the relative size of the first eigenvalue.
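The standard component structure of a simplex can be inspected numerically. The sketch below builds an equidistant simplex with correlations $\rho^{|k-k'|}$ (an assumed, illustrative choice with $\rho = 0.8$ and eight occasions) and counts the sign changes in the leading unrotated eigenvectors; the helper names are hypothetical.

```python
import numpy as np

def simplex_correlation(n_occasions, rho):
    """Equidistant simplex: correlation rho**|k - k'| between occasions k and k'."""
    idx = np.arange(n_occasions)
    lags = np.abs(np.subtract.outer(idx, idx))
    return rho ** lags

def sign_changes(v):
    """Number of sign changes along the time order of a loading vector."""
    s = np.sign(v)
    return int(np.sum(s[:-1] * s[1:] < 0))

R = simplex_correlation(8, 0.8)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

changes = [sign_changes(eigvecs[:, k]) for k in range(3)]
print("sign changes of first three components:", changes)
print("proportion of variance, first component:", eigvals[0] / eigvals.sum())
```

The unrotated components are not exactly constant, linear, and quadratic, but their pattern of sign changes already mirrors that ordering (level, trend, curvature), which is the regularity exploited when the components are rotated toward polynomials.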

The problem is not so much the standard solution as the fact that there are many different processes that could have generated the simplex. In other words, certain processes are sufficient for the appearance of a simplex in the correlation matrix, but they are not necessary. It is therefore very difficult to “prove” the existence of such processes from the appearance of a simplex in the time mode without a substantive theory why the processes should be present.

One kind of change that produces correlation matrices with a (Markov) simplex structure (e.g., see Jöreskog, 1970) is a first-order autoregressive process, to be discussed later. Thus, a simplex-like correlation matrix may be explained by such an autoregressive process. Furthermore, a process of steady growth with a level and a gain component produces a correlation matrix with a (Wiener) simplex form (e.g., see Jöreskog, 1970). Thus, one may describe the underlying change process, at least approximately, by a level and a gain component. In the case that one has a simplex-like correlation matrix, the results of a component analysis on this correlation matrix may therefore be interpreted both in terms of parameters of the autoregressive process and in terms of a level and a gain component, provided the subject matter warrants the use of such models.

Figure 15.1 A three-wave multivariate autoregressive model with per phase three correlated latent variables ($\eta$), nine observed variables ($z$), and three correlated external variables ($\zeta$).

Autoregressive processes. In Fig. 15.1 an example is given of the kind of autoregressive models one might consider. The model was suggested by Lohmöller (1978a, 1978b) in this context, and we will follow his description.

The structural part of the model (see also Section 15.3.1) has the form of a multivariate regression equation; it is assumed that the state of the latent variables $\eta_k$ at occasion $k$ depends on only two influences: the state of the variables at occasion $k-1$ (i.e., it is a first-order process), and the state of the external variables $\zeta_k$ at occasion $k$,

$$\eta_k = \Phi\,\eta_{k-1} + \Psi\,\zeta_k, \tag{15.1}$$

where $\eta_k$ and $\eta_{k-1}$ are the vectors of all $Q$ latent variables. In the model it is assumed that the external variables $\zeta_k$ and $\zeta_{k'}$ ($k \neq k'$) are uncorrelated, which is almost always an oversimplification as it implies that all time-dependent influences are included in the model. We have written the external variables, possibly slightly incorrectly, as latent rather than manifest ones. Finally, it is assumed in this model that the latent and external variables are normalized.

The matrix $\Phi$, called the transition matrix, describes how strongly the latent variables at occasion $k-1$ influence those at occasion $k$. When $\Phi$ is diagonal, as is the case for the model in Fig. 15.1, the latent variables only influence themselves, and no other latent variables (i.e., $\phi_{qq'} = 0$, $q \neq q'$). When $\Phi$ is diagonal, the changes in the component structure of the variables are entirely due to external influences.

The matrix $\Psi$ describes the influences of external variables on occasion $k$ on the latent variables on occasion $k$. The external variables represent the entire influence of the environment on the latent variables. When $\Psi$ is diagonal, as in the model in Fig. 15.1, each external variable only influences one latent variable (i.e., $\psi_{qq'} = 0$, $q \neq q'$). Note that here the matrices $\Psi$ and $\Phi$ are assumed to be independent of $k$, so the model assumes that the first-order influences remain identical over time. Differences in influence over time of both latent and external variables cannot be accounted for by this particular autoregressive process, and as such it is almost always an oversimplification of reality. Until now the structural part has been discussed entirely in terms of latent variables, and therefore we also need a measurement model to link the data with the structural model. In the present case, the measurement model is simply the three-mode principal component model itself, in which the components are the latent variables of the autoregressive model.
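To get a feeling for such a first-order process, the sketch below simulates latent variables with a diagonal transition matrix and a diagonal matrix of external influences. It is a simplified illustration under assumptions made here: the external variables are drawn as independent standard normal variables (the model in Fig. 15.1 allows them to be correlated within an occasion), the external weights are set to $\sqrt{1 - \phi_{qq}^2}$ so that the latent variables stay normalized, and the function name `simulate_ar1` is hypothetical.

```python
import numpy as np

def simulate_ar1(n_subjects, n_occasions, phi, rng):
    """Simulate latent scores with diagonal transition and external-influence matrices."""
    phi = np.asarray(phi, dtype=float)      # diagonal of the transition matrix
    psi = np.sqrt(1.0 - phi ** 2)           # diagonal external weights (unit variance kept)
    eta = np.empty((n_subjects, phi.size, n_occasions))
    eta[:, :, 0] = rng.normal(size=(n_subjects, phi.size))
    for k in range(1, n_occasions):
        # External variables, uncorrelated across occasions
        zeta = rng.normal(size=(n_subjects, phi.size))
        eta[:, :, k] = eta[:, :, k - 1] * phi + zeta * psi
    return eta

rng = np.random.default_rng(1)
phi = np.array([0.9, 0.5])                  # one stable and one less stable latent variable
eta = simulate_ar1(n_subjects=20000, n_occasions=5, phi=phi, rng=rng)

# Correlation of each latent variable at the first occasion with later occasions
lag_corr = np.array([
    [np.corrcoef(eta[:, q, 0], eta[:, q, lag])[0, 1] for lag in range(1, 5)]
    for q in range(phi.size)
])
print(np.round(lag_corr, 2))
```

Under these assumptions the correlation between occasions decays roughly as $\phi_{qq}^{\text{lag}}$, which is the simplex pattern in the time mode discussed earlier.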

Latent covariance matrix. Before entering into a discussion of the role autoregressive models can play in three-mode analysis, it is necessary to look at what we will call the latent covariance matrix $S = (\sigma_{qr,q'r'})$, called core covariance matrix by Lohmöller (1978a, 1978b). The $\sigma_{qr,q'r'}$ are the inner products of the elements of the Tucker2 core array,

$$\sigma_{qr,q'r'} = \sum_{i=1}^{I} h_{i(qr)}\, h_{i(q'r')}, \tag{15.2}$$

where we assume that the $I$ observational units or subjects constitute the unreduced first mode in a Tucker2 model with a second mode of variables, and a third mode of occasions (see Fig. 15.2). Depending on the scaling, the covariances may be cross products, “real” covariances, or correlations. A similar covariance matrix could be defined for the Tucker3 model, but condensing a mode first and then computing the covariance matrix for it does not seem sensible, and no experience has been gained with a similar matrix for the Parafac model.
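The inner products that define the latent covariance matrix can be computed in a few lines. The sketch assumes the Tucker2 core array is stored as a NumPy array indexed [subject, variable component, occasion component], and that the combined index runs over variable components fastest; both the storage layout and the function name `latent_covariance` are assumptions for illustration.

```python
import numpy as np

def latent_covariance(core):
    """Cross products of Tucker2 core elements over subjects.

    core: array of shape (I, Q, R); returns a (QR x QR) matrix whose row and
    column index is alpha = q + Q * r (variable components run fastest).
    """
    n_subjects, Q, R = core.shape
    H = core.transpose(0, 2, 1).reshape(n_subjects, R * Q)  # subjects by (q, r) combinations
    return H.T @ H

# Hypothetical core array: 10 subjects, 2 variable components, 3 occasion components
rng = np.random.default_rng(2)
core = rng.normal(size=(10, 2, 3))
S = latent_covariance(core)
print(S.shape)
```

Centering or rescaling the core elements over subjects beforehand would turn these cross products into covariances or correlations, in line with the remark about scaling.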

Figure 15.2 From $I \times J \times K$ three-way data to the $JK \times JK$ multimode covariance matrix (multivariable-multioccasion matrix) for the $I$ subject scores on the $JK$ variable-occasion combinations to the latent covariance matrix of the $I$ component scores on the combinations of $Q$ variable components and $R$ occasion components. [Panels: Three-mode data; Multivariable-multioccasion covariance matrix; Latent covariance matrix.]

The core elements $h_{i(qr)}$ may be interpreted as scores of the observational units on the $Q \times R$ structured components of the second and third modes. In this context the $\alpha$th structured component is a $J \times K$ vector $\xi_{\alpha}$ with elements $\xi^{\alpha}_{jk}$, and $\xi^{\alpha}_{jk} = b_{jq} c_{kr}$, with $jk = 1, \ldots, JK$, and $\alpha = 1, \ldots, QR$. In the example in Section 15.5, one of the structured components, for instance, is labeled as gain in degree of specialization. In that case an $h_{iqr}$ represents the gain in degree of specialization of the $i$th hospital.

The value of $\sigma_{\alpha\alpha'} = \sigma_{(qr),(q'r')}$ thus indicates the covariance of the $\alpha$th and $\alpha'$th structured components. Within structural equation modeling where the mode of observational units is stochastic, the latent covariance matrix arises in a natural way. If we follow Bloxom’s (1968) formulation, the Tucker2 model can be written as

$$x = (C \otimes B)\,\xi + \varepsilon, \tag{15.3}$$

where $\xi = (\xi_{\alpha})$ is the random vector of unobserved scores on the $QR$ structured components, $x$ the random vector of observations on the $J \times K$ variables, and $\varepsilon$ the random vector of unobserved residuals. If we indicate the $JK \times JK$ residual covariance matrix by $\Theta$, then the $JK \times JK$ multimode covariance matrix $\Sigma$ is modeled as

$$\Sigma = (C \otimes B)\, S\, (C' \otimes B') + \Theta, \tag{15.4}$$

where $\Theta$ contains the residual variances on the diagonal and nonzero off-diagonal elements in the case of correlated residuals. Loosely speaking, one may say that the latent covariance matrix underlies the observed multimode covariance matrix and embodies the basic covariances present in the data. This interpretation is based on the fact that it contains the covariances between the unobserved scores on the structured components.
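The covariance structure in Eq. (15.4) can be checked numerically in the residual-free case. The sketch below is an illustration under assumptions made here: the component matrices and scores are random, there are no residuals (so the residual covariance matrix is zero), and cross products are used instead of centered covariances.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K, Q, R = 50, 4, 3, 2, 2                 # illustrative sizes

B = np.linalg.qr(rng.normal(size=(J, Q)))[0]   # variable components (J x Q), orthonormal
C = np.linalg.qr(rng.normal(size=(K, R)))[0]   # occasion components (K x R), orthonormal
scores = rng.normal(size=(I, Q * R))           # subject scores on the QR structured components

# Structured components as columns: element (k*J + j, r*Q + q) equals C[k, r] * B[j, q]
structured = np.kron(C, B)

# Residual-free data, one row per subject, columns ordered with variables fastest
X = scores @ structured.T

S = scores.T @ scores                                   # latent cross-product matrix
Sigma_model = structured @ S @ np.kron(C.T, B.T)        # model side of Eq. (15.4), no residuals
Sigma_data = X.T @ X                                    # multimode cross-product matrix

print(np.allclose(Sigma_model, Sigma_data))
```

With residuals present the two matrices differ by the residual covariance matrix, which is what the last term in Eq. (15.4) absorbs.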

Normalization of $S$. The options for normalization of the latent covariance matrix parallel those in ordinary PCA. The component scores may be scaled so that their lengths reflect the eigenvalues (principal coordinates), that is the distance form, or they may have length one (normalized coordinates), that is the covariance form (see Section 9.3.1, p. 215). Using normalized score coordinates, the size of the data is transferred to the variable components $B$ and/or the occasion components $C$. This corresponds with the usage in, for instance, psychology, where the variables are in principal coordinates. Lohmöller generally scaled the latent covariance matrix such that score components are in normalized coordinates, and we have done the same in the example in Section 15.5 for comparability with his results.

The major purpose in discussing the latent covariance matrix from a Tucker2 model is that its structure can be used to investigate the parameters of postulated autoregressive processes underlying the observations. Lohmöller (1978a, 1978b, 1989) derived general relationships between latent covariances and general characteristics of the autoregressive processes based on a series of simulation studies.

Apart from this way of interpreting the latent covariance matrix, one may interpret the variances and covariances directly, provided the score components are in principal coordinates. The latent variances of the scores on the structured components, $\sigma_{\alpha\alpha}$, may be divided by the total variation present in the data, SS(Total), so that they can be interpreted as the proportions of variance accounted for by the structured components. After all,

$$\sigma_{\alpha\alpha} = \sum_{i=1}^{I} h_{i(qr)}^{2},$$

and the squared elements of the core array can be interpreted as explained variances. The covariances $\sigma_{\alpha\alpha'}$ may be transformed into direction cosines, here written $\cos\theta_{\alpha\alpha'}$, between the structured components $\alpha$ and $\alpha'$:

$$\cos\theta_{\alpha\alpha'} = \frac{\sigma_{\alpha\alpha'}}{\sigma_{\alpha\alpha}^{1/2}\,\sigma_{\alpha'\alpha'}^{1/2}}. \tag{15.5}$$

Interpreting latent covariances in this way is more direct and has a wider applicability than the interpretation via parameters of autoregressive processes. On the other hand, the latter interpretation gives more specific and more substantive information because of the postulated model.
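Both direct interpretations, proportions of explained variation and direction cosines, follow from simple operations on the latent covariance matrix. The sketch uses a small made-up matrix and a made-up value for SS(Total); the function names are hypothetical.

```python
import numpy as np

def explained_proportions(S, ss_total):
    """Latent variances divided by the total variation in the data."""
    return np.diag(S) / ss_total

def direction_cosines(S):
    """Latent covariances rescaled by the square roots of the latent variances."""
    scale = np.sqrt(np.diag(S))
    return S / np.outer(scale, scale)

# Made-up latent covariance matrix for two structured components
S = np.array([[4.0, 1.0],
              [1.0, 1.0]])
ss_total = 10.0  # made-up SS(Total)

proportions = explained_proportions(S, ss_total)
cosines = direction_cosines(S)
print(proportions)
print(cosines)
```

Here the first structured component accounts for 0.4 and the second for 0.1 of the total variation, and the cosine between the two is 0.5.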

Linking autoregressive parameters to three-mode results. In the introduction to this chapter we referred to two major sources of dependence in multivariate longitudinal data: variable and serial dependence. It is of interest to know whether these kinds of dependence influence each other. One may have a structure between variables (variable dependence) that is not changing over time. Furthermore, subjects may maintain their relative positions on the variables irrespective of the structure of the variables; that is, there is stability of the variables or high autocorrelations (serial dependence). Finally, variables and subjects may change simultaneously, in which case there is neither stability nor stationarity.

A set of latent variables is said to be stationary when the same component structure is present on all occasions. Furthermore, the set is homogeneous when the variables are highly correlated so that they can be represented in a low-dimensional space by a few components. In autoregressive processes the homogeneity is indicated by the covariances of the latent variables at each time point (see Fig. 15.1).

In three-mode component analysis we derive one set of orthonormal variable components over all occasions simultaneously, and the dimensionality of the component space is an indication of the overall homogeneity. As there is only one component matrix for the variables, one could get the impression that the model does not allow for nonstationarity. This, for instance, is indeed the case in a model without a core array like the Parafac model (see Section 4.6.1, p. 57), but not in the Tucker2 model, in which the deviations from stationarity show up in the core array and the latent covariance matrix (see Section 15.4.2). Lohmöller’s contribution is that he attempted to investigate what kind of stationarity could be inferred from the latent covariance matrix. He claimed that for an autoregressive process as shown in Fig. 15.1, increasing and decreasing homogeneity can be gleaned, ceteris paribus, from the size and the signs of the covariances between the latent variables in the latent covariance matrix. We will return to this point in some detail when discussing the example in Section 15.5.6.

A (latent) variable will be called stable when the relative positions of the observational units on that (latent) variable stay the same in time. The stability of a (latent) variable may be judged from the covariances of the latent variables on different occasions. A set of variables will be called stable when all variables are stable. In autoregressive processes the stability of a latent variable $\eta_q$ is given by $\phi_{qq}$ (Fig. 15.1); $\phi_{qq}$ indicates to what extent the latent variable is determined by its predecessor. Stable variables are sometimes called trait-like, that is, mainly determined by the defining construct or trait, and unstable variables are sometimes called state-like, that is, mainly determined by the moment at which they are measured (see Cattell, 1966, p. 357). The size of the covariances of a latent variable between time points is thus an indication of the stability of a variable. High values indicate a trait-like and low values a state-like variable. The overall stability of a set of variables may be determined from the first eigenvalue of the time mode, given that first-order autoregressive processes underlie the data (see Lohmöller, 1978a, p. 29). When a second-order autoregressive model applies, the stability will most likely be overestimated using the first eigenvalue.
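As a brief worked illustration of how a first-order process ties stability to the correlations between occasions, consider a single standardized latent variable that follows a stationary first-order autoregressive process. The symbols $\eta_k$, $\phi$, and $\zeta_k$ are assumed here for the illustration only, with the disturbance $\zeta_k$ uncorrelated with $\eta_{k-1}$; they need not coincide with Lohmöller’s notation.

```latex
\eta_k = \phi\,\eta_{k-1} + \zeta_k ,
\qquad
\operatorname{corr}(\eta_k, \eta_{k-\ell}) = \phi^{\ell},
\quad \ell = 0, 1, 2, \ldots
% Example with \phi = 0.95:
%   lag 1: 0.95    lag 2: 0.9025    lag 5: 0.774    lag 10: 0.599
```

The correlations fall off steadily with the lag, and they do so more slowly the closer $\phi$ is to 1, which corresponds to the trait-like case.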

Lohmöller also investigated the structure of the latent covariance matrix for autoregressive models like those in Fig. 15.1 in case of different stabilities of the latent variables, and in case of equal stabilities on changing dimensions. However, we will not go into that part of his study.


Finally, it is interesting to consider the situation in which all latent variables have equal stability (i.e., uniform autocorrelations) so that no partial cross-lag correlations exist. For this situation, Lohmöller (1978a, p. 4) showed that the latent covariance matrix is a $(QR \times QR)$ identity matrix, $I$, or a diagonal matrix, $D$, depending on the particular scaling of the components. This means that the three-mode model for the observed covariance matrix Eq. (15.4) reduces to

$$\Sigma = (C \otimes B)\, D\, (C \otimes B)' + \Theta. \qquad (15.6)$$

This model, which was also described by Bentler and Lee (1978b), may be used as a kind of “null-hypothesis” against which to evaluate latent covariance matrices.
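The following is a minimal numerical sketch of a covariance matrix with this kind of Kronecker structure. It reads the model as a structured loading matrix, the Kronecker product of the time components and the variable components, wrapped around a diagonal latent covariance matrix, plus a diagonal matrix of unique variances. The sizes, the random component matrices, and the names `W`, `D`, and `Theta` are assumptions made for the illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
J, K, Q, R = 4, 3, 2, 2   # hypothetical sizes: variables, occasions, and their numbers of components

# Orthonormal component matrices, taken here from QR decompositions of random matrices.
B, _ = np.linalg.qr(rng.standard_normal((J, Q)))   # variable components (J x Q)
C, _ = np.linalg.qr(rng.standard_normal((K, R)))   # time components (K x R)

D = np.diag(rng.uniform(0.5, 2.0, size=Q * R))       # diagonal latent covariance matrix
Theta = np.diag(rng.uniform(0.1, 0.3, size=J * K))   # diagonal unique variances (assumed form)

W = np.kron(C, B)             # (JK x QR) structured loading matrix
Sigma = W @ D @ W.T + Theta   # implied covariance matrix of the JK variable-occasion combinations
```

Because both component matrices are orthonormal, so is their Kronecker product, which keeps the latent covariance matrix directly interpretable.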

Discussion. The above approach to evaluating change phenomena in multivariate longitudinal data very much depends on the appropriateness of the multivariate autoregressive models. It is also still very sketchy from a mathematical point of view and therefore requires further investigation. Further practical experience is also necessary to assess its potential. It seems that in some cases the assumption of an underlying autoregressive process is not unreasonable, as in the example in Section 15.4.2. In connection with this example we will discuss rough and ready ways to assess whether the assumption of the autoregressive model is tenable.

Lohmöller’s major contribution has been that he provides a framework for the interpretation of multivariate longitudinal data, which cannot easily be handled directly by causal modeling or time series analysis. One of the problems in that area is that the present estimation procedures for multimode covariance matrices may have difficulty handling the size of covariance matrices considered here, apart from the fact that there are often not enough observations to make maximum likelihood and generalized least-squares estimation feasible. The advantage of such approaches, however, is that very refined modeling is possible within a hypothesis-testing framework.

Oort (2001) discussed confirmatory longitudinal three-mode common factor analysis within the structural equation framework and Kroonenberg and Oort (2003a) presented a comparison between two approaches of handling multimode covariance matrices with three-mode models, but the latter authors did not discuss Lohmöller’s approach.

15.5 EXAMPLE: ORGANIZATIONAL CHANGES IN DUTCH HOSPITALS

15.5.1 Stage 1: Objectives

In order to gain some insight into the growth and development of large organizations, Lammers (1974) collected data on 22 organizational characteristics (variables) of 188 hospitals in The Netherlands (“subjects”) from the annual reports of 1956-1966 (time). His main questions with respect to these data were (1) whether the organizational structure as defined by the 22 variables was changing over time, and (2) whether


Table 15.1 Hospital study: Description and mnemonics of the variables

Var. No.  Label     Description
   1      Training  Training capacity
   2      Resear    Research capacity
   3      FinDir    Financial director
   4      Facili    Facility index
   5      QExtern   Ratio of qualified nurses in outside wards
   6      QRatio    Ratio of qualified nurses/total number of nurses
   7      Functi    Number of functions
   8      Staff     Total staff
   9      RushIn    Rushing index
  10      ExStaf    Executive (managerial and supervising) staff
  11      NMedProf  Nonmedical professionals
  12      Admin     Administrative (i.e., clerical) staff
  13      ParaMd    Paramedical staff
  14      NonMed    Other nonmedical staff
  15      Nurses    Total number of nurses
  16      Beds      Total number of beds
  17      Patien    Total number of patients
  18      Openns    Openness
  19      ClinSp    Main clinical specialties
  20      OutPSp    Main outpatient specialties
  21      ClinSub   Clinical subspecialties
  22      OutPSub   Outpatient subspecialties

there were different kinds of hospitals with different organizational structures and/or different trends in their structures. These two questions will be taken up in this section (for an earlier analysis see Kroonenberg et al., 1985b). In the next section we will try to assess (3) the stability of the latent variables, (4) the stationarity of the latent-variable domain, and (5) the interaction between serial and variable dependencies. In particular, we will try to assess the parameters of a possibly underlying autoregressive process.

15.5.2 Stage 2: Data description and design

Prior to the three-mode analysis, the majority of the variables (see Table 15.1) were categorized for practical reasons into roughly ten intervals of increasing length in order to remove skewness of the counted variables, ease visual inspection, and prepare the data for other analyses. The variables were slice centered and slice normalized per variable in order to remove incomparable standard deviations, while maintaining the trends over the years. As discussed in Section 6.5.2, p. 129, fiber centering


Figure 15.3 Hospital study: Time components in principal coordinates (all-components plot). Explained variability: first component 55%; second component 0.4%.

is another, possibly better, option, but this would have removed the time trends from the data.
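The difference between the two centerings can be made concrete with a small sketch on simulated data. The array sizes, the simulated trend, and the function names are hypothetical, and normalizing to unit mean square is only one of several equivalent scaling conventions.

```python
import numpy as np

def slice_center(X):
    """Remove one mean per variable, taken over all subjects and occasions."""
    return X - X.mean(axis=(0, 2), keepdims=True)

def fiber_center(X):
    """Remove one mean per variable-occasion combination, taken over subjects."""
    return X - X.mean(axis=0, keepdims=True)

def slice_normalize(Z):
    """Scale each variable to unit mean square over all subjects and occasions."""
    return Z / np.sqrt((Z ** 2).mean(axis=(0, 2), keepdims=True))

rng = np.random.default_rng(1)
I, J, K = 50, 3, 11                       # hypothetical numbers of subjects, variables, occasions
trend = np.linspace(0.0, 2.0, K)          # a common upward trend over the years
scales = np.array([1.0, 10.0, 100.0])     # incomparable measurement scales
X = (rng.standard_normal((I, J, K)) + trend) * scales[None, :, None]

Z_slice = slice_normalize(slice_center(X))
Z_fiber = slice_normalize(fiber_center(X))

trend_slice = Z_slice.mean(axis=0)   # yearly means per variable: the trend is still present
trend_fiber = Z_fiber.mean(axis=0)   # zero everywhere: the trend has been removed
```

In both cases the variables end up on a comparable scale, but only slice centering leaves the yearly means, and hence the time trends, in the data to be analyzed.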

15.5.3 Stage 3: Model and dimensionality selection

The hospital data were analyzed both with a 2×2×2-Tucker3 model with two components each for the hospitals, the variables, and the years and with a 2×2-Tucker2 model. Details of the dimensionality selection can be found in the original publication (Kroonenberg et al., 1985b).

15.5.4 Stage 4: Results and their interpretation

Time trends. For the inspection of the time-mode components (Fig. 15.3), it is advantageous to present the components in principal coordinates (standardized component weights: 0.55 for the first component and 0.004 for the second), because an assessment of their relative importance is crucial. The strong, stable time trend dominating the figure shows that the overall structural organization remains the same, except for a slight increase in the first years (say, 1956-1961). The second trend, gain, shows a very steady increase but is relatively unimportant. From Section 15.4.2 we know that we may expect such components from longitudinal data to show a simplex structure in the time mode. Table 15.2 shows the correlation matrix of the time mode (top) and of two of the variables (number of beds and main outpatient specialties).

Some authors (Van de Geer, 1974; Lohmöller, 1978a) suggest that it is advantageous to rotate the components from a simplex to orthogonal polynomials, leading to


Table 15.2 Hospital study: Correlations between years

Time mode (based on 188×22 observations)
Year    1    2    3    4    5    6    7    8    9   10   11
  1   100
  2    96  100
  3    94   97  100
  4    93   95   98  100
  5    89   92   94   95  100
  6    87   90   92   93   97  100
  7    86   88   90   91   94   95  100
  8    85   88   90   91   94   95   97  100
  9    83   85   87   89   91   93   94   96  100
 10    81   84   86   87   90   91   93   95   97  100
 11    80   82   85   86   89   90   92   94   95   97  100

Number of beds
Year    1    2    3    4    5    6    7    8    9   10   11
  1   100
  2    97  100
  3    97   98  100
  4    96   98   99  100
  5    95   96   98   98  100
  6    94   96   97   98   99  100
  7    94   95   97   97   98   99  100
  8    93   95   96   97   98   98   99  100
  9    92   94   95   95   97   97   98   99  100
 10    92   93   94   94   96   96   96   98   99  100
 11    91   92   93   93   95   95   95   97   97   99  100

Main outpatient specialties
Year    1    2    3    4    5    6    7    8    9   10   11
  1   100
  2    94  100
  3    90   92  100
  4    87   90   98  100
  5    79   82   90   90  100
  6*   70   72   79   80   89  100
  7*   72   74   82   83   86   79  100
  8    74   77   84   86   89   83   93  100
  9    70   72   79   80   82   76   88   93  100
 10    66   68   77   78   80   75   86   90   96  100
 11    65   66   75   77   79   72   84   89   95   98  100

Note the curious break in the simplex of the main outpatient specialties at the years 6, 7, and 8. The correlations rise after year 6, and only systematically fall off again after year 8.


Figure 15.4 Hospital study: Joint biplot of variables and hospitals (display modes) associated with the first time component (reference mode).

a first component that has more or less equal entries, a second component with entries increasing linearly over time when the time points are equidistant, and a third component that shows a quadratic function of time, that is, first an acceleration and then a deceleration, or vice versa. For the present data, it was attempted to rotate the time mode to such a matrix of orthogonal polynomials, but the rotation matrix was practically an identity matrix ($r_{11} = 0.9996$; $r_{12} = r_{21} = 0.0294$; $r_{22} = 0.9996$). Not surprisingly, it only transferred a very small amount of the growth in overall level from the first to the second component. We will therefore continue to show the unrotated time components.
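A rotation of this kind can be sketched as an orthogonal Procrustes problem with the orthogonal polynomials as target. The sketch below is only an illustration under stated assumptions: the time components `C` are hypothetical (the polynomials themselves, turned through a small angle chosen to mimic the size of the off-diagonal entries quoted above), and the Procrustes solution via a singular value decomposition is one standard way to obtain such a rotation, not necessarily the procedure used for the hospital data.

```python
import numpy as np

K = 11                                   # eleven equidistant occasions
t = np.arange(K, dtype=float)

# Orthonormal polynomials (constant and linear) from a QR decomposition of [1, t].
P, _ = np.linalg.qr(np.vander(t, N=2, increasing=True))
P = P * np.sign(P[-1, :])                # choose signs so that both columns end positive

# Hypothetical time components: the polynomials turned through a small angle.
s = 0.0294
c = np.sqrt(1.0 - s ** 2)
C = P @ np.array([[c, -s], [s, c]])

# Orthogonal Procrustes rotation of C toward the polynomial target P.
U, _, Vt = np.linalg.svd(C.T @ P)
T = U @ Vt                               # orthogonal rotation matrix
C_rotated = C @ T
```

When the diagonal of `T` is this close to 1, the rotated and unrotated components are practically indistinguishable in a plot.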

Hospitals and variables. To answer the first question with respect to the changes in organizational structure, we will inspect a joint biplot with the hospitals and variables as display modes and time as the reference mode (see Section 11.5.3, p. 273). Figure 15.4 is the joint biplot associated with the first time component, which reflects the overall stable characteristics of the variable and hospital domains.

The joint biplot shows that in terms of the variables the axes may be interpreted as size and degree of specialization, with the former component not only representative of the variables intended to measure size (see Table 15.1), but also of most of the other variables. The degree of specialization is primarily indicated by a deficit of main specialties (OutPSp and ClinSp), a somewhat larger research capacity (Research), greater proportions of qualified nurses (QRatio), and more qualified nurses outside the wards (QExtern). In terms of the hospitals the axes can be interpreted as


Table 15.3 Hospital study: Full core array with relationships between the components of the three modes

             First hospital type          Second hospital type
             Size    Degree of            Size    Degree of
                     specialization               specialization

Raw values of full core array
Level         150         1                 -1        53
Gain            2        -7                 10        -5

Standardized contribution to SS(Fit)
Level       0.490     0.000              0.000     0.060
Gain        0.000     0.001              0.002     0.000

general hospitals and specialized hospitals, respectively. From the relative sizes of the standardized component weights (48% and 8%, respectively) we may conclude that the first components of the hospitals and the variables are by far the most important ones. The second component is essentially determined by the fact that some 15-20 hospitals lack a considerable number of main specialties compared to the other hospitals; that is, they are more specialized. Incidentally, the sharp boundary of the hospitals on the positive Y-axis in Fig. 15.4 is caused by ceiling effects due to the fact that a large number of hospitals have all the main specialties a hospital can have. From Fig. 15.3 in combination with Fig. 15.4, we can deduce that the answer to the first research question is that the overall organizational structure was stable; that is, the relative position of the hospitals remained unaltered, but there is a steady but small increase or decrease in overall level or size, depending on the signs of the loadings on other components (see Section 9.5.3, p. 230 for a discussion on keeping track of signs for components and core arrays).

Interaction between hospitals and variables. To answer the second research question, a decomposition in terms of components alone does not suffice, and the full core array must be inspected as well. First of all, Table 15.3 confirms the answer to the first research question. The combination of the first components of all three modes (general hospitals, size, and level), $g_{111} = 150$, explains most of the fitted variation: (proportion of SS(Fit) due to $g_{111}$)/(overall SS(Fit)) = 0.49/0.56 = 0.88. The gain in size of the general hospitals, $g_{112} = 2$, is negligible over and above the increase already contained in the level component.

The second important combination ($g_{221} = 53$; Prop. SS(Fit) = 0.06) indicates that the specialized hospitals also maintain their overall level of specialization. There is a slight tendency ($g_{222} = -5$) to become less specialized, and to grow in overall size ($g_{212} = 10$). Similarly the large general hospitals tend to become somewhat less specialized ($g_{122} = -7$). The standardized contributions to the SS(Fit) show that


Figure 15.5 Hospital study: Trends of hospital-variable combinations. Expressed via the core elements, $h_{pqk}$, of the extended core array of the Tucker2 model.

these effects are very small, leading to the conclusion that the specialized hospitals do not have a very different growth pattern from that of the other hospitals.
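The relative sizes of these contributions can be checked from the raw core elements, because with orthonormal components the fitted sum of squares equals the sum of the squared core elements. The sketch below uses the rounded core values of Table 15.3, with the element for the general hospitals' gain in degree of specialization taken as -7 as quoted in the text, and the overall fit of 0.56; the array layout and variable names are choices made for the illustration.

```python
import numpy as np

# G[p, q, r]: hospital type p, variable component q (size, specialization),
# time component r (level, gain). Rounded values; G[0, 1, 1] is taken as -7.
G = np.zeros((2, 2, 2))
G[0, 0, 0] = 150.0
G[0, 0, 1] = 2.0
G[0, 1, 0] = 1.0
G[0, 1, 1] = -7.0
G[1, 0, 0] = -1.0
G[1, 0, 1] = 10.0
G[1, 1, 0] = 53.0
G[1, 1, 1] = -5.0

share = G ** 2 / np.sum(G ** 2)    # share of each combination in the fitted sum of squares
contribution = 0.56 * share        # contribution as a proportion of the total sum of squares
```

Because the core values are rounded, the resulting proportions agree with the standardized contributions in the table only up to rounding.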

A more detailed inspection of the time trends is given in Fig. 15.5, where the elements of the extended core array with years as unreduced third mode of a Tucker2 analysis have been plotted against time. The patterns, of course, are in accordance with the Tucker3 analysis presented earlier, but the development of the relations between the hospital and variable components over time is shown more explicitly.
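How such an extended core array arises can be sketched as follows: given orthonormal components for the first two modes, each occasion has its own small core slice obtained by projecting that occasion's data slice onto the components. The data below are random stand-ins, and the components come from singular value decompositions of the unfolded array, which is a simple starting solution and not the alternating least-squares estimate of an actual Tucker2 analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K, P, Q = 10, 6, 5, 2, 2     # hypothetical sizes and numbers of components
X = rng.standard_normal((I, J, K))

# Orthonormal components from SVDs of the unfolded array (starting solution only).
A = np.linalg.svd(X.reshape(I, J * K), full_matrices=False)[0][:, :P]
B = np.linalg.svd(np.moveaxis(X, 1, 0).reshape(J, I * K), full_matrices=False)[0][:, :Q]

# Extended core array: one P x Q slice per occasion, H_k = A' X_k B.
H = np.einsum('ip,ijk,jq->pqk', A, X, B)

# Fitted data and the proportion of the total sum of squares that is fitted.
X_hat = np.einsum('ip,pqk,jq->ijk', A, H, B)
fit = np.sum(X_hat ** 2) / np.sum(X ** 2)
```

Plotting each series `H[p, q, :]` against the occasions gives a trend display of the kind shown in Fig. 15.5.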

15.5.5 Checking the order of autoregressive process

Before attempting to estimate the parameters of an autoregressive process, it should be established whether it is reasonable to postulate such a process for the data, and if so, whether it is of the right order. There seem to be a number of ways to do this.

First, check whether the time mode is a simplex, as we know that autoregressive processes generate simplexes. Inspection of the hospital data shows the correlation matrix to be a simplex (see Table 15.2), and moreover the points in time (years) are equidistant.
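A rough mechanical version of this check is sketched below: it tests whether correlations never increase when moving away from the diagonal, both along rows and down columns. The function name and this particular operationalization of a simplex are assumptions made for the illustration; the two matrices are the time-mode and main-outpatient-specialties correlations of Table 15.2 (multiplied by 100, lower triangles).

```python
def is_simplex(lower):
    """True if correlations never increase when moving away from the diagonal."""
    n = len(lower)
    for i in range(n):
        row = lower[i]                                   # row i ends on the diagonal
        if any(row[j] > row[j + 1] for j in range(i)):
            return False
    for j in range(n):
        col = [lower[i][j] for i in range(j, n)]         # column j starts on the diagonal
        if any(col[i] < col[i + 1] for i in range(len(col) - 1)):
            return False
    return True

time_mode = [
    [100],
    [96, 100],
    [94, 97, 100],
    [93, 95, 98, 100],
    [89, 92, 94, 95, 100],
    [87, 90, 92, 93, 97, 100],
    [86, 88, 90, 91, 94, 95, 100],
    [85, 88, 90, 91, 94, 95, 97, 100],
    [83, 85, 87, 89, 91, 93, 94, 96, 100],
    [81, 84, 86, 87, 90, 91, 93, 95, 97, 100],
    [80, 82, 85, 86, 89, 90, 92, 94, 95, 97, 100],
]
outpatient = [
    [100],
    [94, 100],
    [90, 92, 100],
    [87, 90, 98, 100],
    [79, 82, 90, 90, 100],
    [70, 72, 79, 80, 89, 100],
    [72, 74, 82, 83, 86, 79, 100],
    [74, 77, 84, 86, 89, 83, 93, 100],
    [70, 72, 79, 80, 82, 76, 88, 93, 100],
    [66, 68, 77, 78, 80, 75, 86, 90, 96, 100],
    [65, 66, 75, 77, 79, 72, 84, 89, 95, 98, 100],
]

time_mode_ok = is_simplex(time_mode)        # the time mode passes the check
outpatient_ok = is_simplex(outpatient)      # the break around years 6 to 8 fails it
lag_one = [time_mode[i][i - 1] for i in range(1, len(time_mode))]
```

The strict monotonicity test is deliberately crude; with sampling error, small violations in an otherwise clear simplex would not by themselves rule out an autoregressive process.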

Second, perform a multiple regression of $x_k$ for the data at time $k$ on the earlier observations on the same variables, $x_{k-1}, x_{k-2}, \ldots$, where $x_k$ is the $(IJ \times 1)$ vector for the $k$th occasion. When the autoregressive process is first-order, only $x_{k-1}$ should have a sizeable regression coefficient. Although ordinary least-squares estimation will lead to incorrect standard errors for the estimators of the regression coefficients, they are in general unbiased (e.g., see Visser, 1985, p. 71). Table 15.4 shows the relevant


Table 15.4 Hospital study: Standardized regression coefficients for predicting an occasion from earlier occasions

                        Predictors
Criteria   t-1   t-2   t-3   t-4   t-5   t-6   t-7   t-8   t-9   t-10
    2       96    -
    3       79    19    -
    4       85    14    -
    5       75    14     7
    6       87    10
    7       63    19    10
    8       70    16     8     5
    9       75    13    12
   10       76    15     4     4
   11       75    11     8

All values have been multiplied by 100

standardized regression coefficients, and the hospital data seem to follow at least a second-order autoregressive process with a dominant first order. This implies that the lag-one correlations will overestimate the overall stability of the process.
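The regression check can be sketched on simulated data. Everything in the example is hypothetical: the series are generated from a second-order process with a dominant first-order term, the coefficients 0.80 and 0.15 are chosen only to resemble the pattern described above, and the last occasion is regressed on all earlier ones by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 5000, 6                    # hypothetical: n subject-variable combinations, K occasions
X = np.empty((n, K))
X[:, 0] = rng.standard_normal(n)
X[:, 1] = 0.95 * X[:, 0] + 0.5 * rng.standard_normal(n)
for k in range(2, K):             # second-order process with a dominant first-order term
    X[:, k] = 0.80 * X[:, k - 1] + 0.15 * X[:, k - 2] + 0.5 * rng.standard_normal(n)

Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize each occasion
y = Z[:, -1]                                 # criterion: the last occasion
predictors = Z[:, -2::-1]                    # columns ordered t-1, t-2, ..., t-5
beta, *_ = np.linalg.lstsq(predictors, y, rcond=None)
```

In such a setting the coefficient for t-1 dominates, the one for t-2 is clearly nonzero, and the remaining ones hover around zero, which is the signature of a second-order process with a strong first-order component.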

Combining the above information, the assumption of a second-order autoregressive process with a rather strong first-order seems plausible. However, as pointed out in Section 15.4.2, Lohmöller’s procedures were only developed for first-order autoregressive processes, because higher-order autoregressive processes turned out to be unmanageable. We will proceed as if the autoregressive process is a first-order one, keeping in mind this is only an approximation and might lead to an overestimation of the stability.

15.5.6 Assessment of change phenomena

In this section we will apply Lohmöller’s proposals for assessing change phenomena to the hospital study. In order to remain compatible with his discussion, we will again use the results from the fiber-standardized data (i.e., standardized per variable on each occasion), instead of the recommended profile preprocessing of fiber centering the data per variable-occasion combination and slice normalization of variables over all occasions together. Our major tool will be the latent covariance matrix, discussed in Section 15.4.2.

The overall stability of the variable domain may be assessed in two different ways. First, Lohmöller gives tables linking the overall stability to the eigenvalues of the first two components from the time mode. From these tables, the overall stability is estimated as between 0.85 and 0.95. This estimate may be compared with the correlations between adjacent occasions (i.e., $r_{k,k-1}$ of the $(K \times K)$ correlation


matrix R of the time mode in Table 15.2). The comparison of the lag-one correlations in Table 15.4 shows good agreement. In addition, it should be observed that the lag-one correlations do not vary much. This leads us to accept the assumption that the overall stability is independent of time, and that by and large the hospitals maintain their overall rank order on the variables over the years.

With a high overall stability, the variable components should be very stable as well. Following Lohmöller’s guidelines we may infer from $\sigma_{11,11} = 1.71$ and $\sigma_{21,21} = 1.70$ of the latent covariance matrix (Table 15.5A) that both latent variables are equally stable and trait-like. One may seek confirmation for this by inspecting the cross-lag correlations for representative variables (see Table 15.2). Taking the variable Beds as indicator for size, the stability is obvious; taking the variable Main outpatient specialties to indicate degree of specialization, the stability is still clearly visible but rather irregular between years 6 and 8. The cause of the latter is a matter for separate investigation. One might speculate that the definition of what constitutes a specialty changed at the time.

The zero value of the covariance between size and degree of specialization for level, $\sigma_{21,11}$, indicates that no cross-lag covariances exist between the latent variables ($\phi_{pp'} = 0$, $p \neq p'$); that is, $\Phi$ is diagonal. The interaction between level and gain of the two latent variables ($\sigma_{22,11} \neq 0$; $\sigma_{21,12} \neq 0$) shows that the set of variables is not stationary. There is a negative covariance between level of size and gain in degree of specialization ($\sigma_{22,11} = -2.57$), which indicates that hospitals that are large through the years tend to lose or at least not gain in degree of specialization, and vice versa. We suspect this might be the ceiling effect: large hospitals already had all the specialties they could have. Furthermore, there is a positive covariance between gain in size and degree of specialization ($\sigma_{21,12} = 1.57$): very specialized hospitals tend to become larger, and vice versa.

The importance of the deviations from stationarity is, however, relatively small. This can be assessed from the direction cosines between the structured components; see Eq. (15.5). From Table 15.5B it follows that the direction cosine between level of size and gain in degree of specialization is -0.02 ($\theta = 91^\circ$), and the direction cosine between level of degree of specialization and gain in size is 0.01 ($\theta = 89^\circ$). In other words, the deviations from stationarity do not succeed in introducing substantial nonorthogonalities between the structured components. These small direction cosines may seem strange when considering the sizes of the elements in Table 15.5A. It should, however, be realized that the normalizations of the elements of the latent covariance matrix have eliminated the dependence of the elements on the importance of the components to which they refer. This means that all components are treated on the same level. This normalization was chosen partly to fit in with the assumption that all latent variables were normalized per occasion, that is, are fiber normalized. It has some advantage in highlighting the interactions, but at the same time gives a rather incorrect impression of their sizes.


Table 15.5 Hospital study: Latent covariance matrix

                                   Size               Degree of specialization
                               Level     Gain           Level     Gain

A. Lohmöller scaling
Size             Level          1.71     0.01            0.00    -2.57
                 Gain           0.01     1.44            1.57    -0.10
Degree of        Level          0.00     1.57            1.70    -0.09
specialization   Gain          -2.57    -0.10           -0.09     3.86

B. Standardized
Size             Level          0.50     0.00            0.00    -0.02
                 Gain           0.00     0.00            0.01     0.00
Degree of        Level          0.00     0.01            0.06     0.00
specialization   Gain          -0.02     0.00            0.00     0.00

C. Labeling of elements
Size             Level        σ11,11   σ11,12          σ11,21   σ11,22
                 Gain         σ12,11   σ12,12          σ12,21   σ12,22
Degree of        Level        σ21,11   σ21,12          σ21,21   σ21,22
specialization   Gain         σ22,11   σ22,12          σ22,21   σ22,22

15.6 EXAMPLE: MORPHOLOGICAL DEVELOPMENT OF FRENCH GIRLS³

15.6.1 Stage 1: Objectives

The original study from which the present data were taken was initiated in order to get insight into the physical growth patterns of children from ages four to fifteen. Details about the study together with several analyses of the data for 30 normal French girls can be found in the volume Data analysis. The ins and outs of solving real problems (Janssen et al., 1987).

15.6.2 Stage 2: Data description and design

The thirty girls were chosen from a larger database because they had measurements available for all variables for each of the twelve years. The eight variables under consideration were the following: Weight, Length, Crown-coccyx length (CrownRump),

³This section is partly based on Kroonenberg (1987b); © 1987 Plenum Press; reproduced and adapted with kind permission of Springer Science and Business Media. The data can be obtained from the data set section of the website of The Three-Mode Company; http://three-mode.leidenuniv.nl. Accessed May 2007.


Figure 15.6 Girls’ growth curves data: Mean curves of the eight variables under consideration, representing the average girl’s growth curves. The variables have not yet been normalized to equal sum of squares; therefore, no scale has been attached to the vertical axis.

Chest circumference (Chest), Left upper-arm circumference (Arm), Left calf circumference (Calf), Maximum pelvic width (Pelvis), and Head circumference (Head); all measured in millimeters. Thus, the data form a 30×8×12 longitudinal data block of girls by variables by years; see Chapter 11 for further analyses of these data.

15.6.3 Stage 3: Model and dimensionality selection

Preprocessing and mean curves. Before the analysis proper, profile preprocessing was applied to the data; that is, the fiber means of each variable at each time point, $\bar{x}_{.jk}$, were removed and the variables were slice-normalized (i.e., divided by $s_{.j.}$), which removed the unequal influences of different measurement scales and variances. The effect of this is that the mean growth curve is removed from the analysis and the scores analyzed are those in normalized deviations from the average girl’s growth curves; see Section 6.6.1 (p. 130).

From the growth curves of the average girl (Fig. 15.6), it can be seen that she has a nearly linear growth in most variables, except for chest circumference, which has a change in slope around the tenth year, while weight has a change in slope around the eighth year. Both variables level off around fourteen years of age.

Dimensionality selection. In this section we will present the results from a Tucker2 analysis with 3 components for the girls mode and the variable mode, as well as a Tucker3 analysis in which, in addition, the time mode is represented by two components. The number of components for the analysis was determined by examining deviance plots (see Section 8.5.2, p. 181). In this plot for the Tucker3 model


Figure 15.7 Girls’ growth curves data: Deviance plot for the Tucker3 model. On the vertical axis the residual sum of squares is indicated and on the horizontal axis the associated degrees of freedom. The preferred models lie on the convex hull. The 3 (girl components) x 3 (variable components) x 2 (time components) model is the one presented in this chapter.

(Fig. 15.7) we settled for a 3×3×2-Tucker3 model and a 3×3-Tucker2 model, because both the 2×2×1-Tucker3 model favored by the Ceulemans-Kiers st-criterion (see Section 8.5.3, p. 182) and the 3×3×1-Tucker3 model were deemed too simple as they only had one time component.

In addition to the overall fit, the fit of the components for each of the modes was examined as well as that of their combinations (see Table 15.6). The small difference in fit between the two Tucker models indicates that condensing the time mode to two components does not affect the results much, the more so because the fit of the components for the variables and the girls is essentially similar for both analyses. We will therefore switch between the two models whenever expedient.

15.6.4 Stage 4: Results and their interpretation

Variables. The three components of the variables with normalized coordinates are portrayed together with the equilibrium circle in Fig. 15.8 (see Legendre & Legendre, 1998, p. 398ff.). The figure shows that all variables fit equally well and that a third dimension is present because of the deviating pattern for the growth of the head. Its similarity to pelvis and crown-coccyx in two dimensions is somewhat deceptive. This becomes clearer when we portray the variable components in principal coordinates and connect them in accordance with their minimum spanning tree, as was done in Fig. 11.6 (see Section 11.4.1, p. 262). That figure shows that Head dips deep under the 1-2 plane and Pelvis less so, while Length is above the plane. A varimax rotation


Table 15.6 Fit for components of the 3×3×2-Tucker3 model and the 3×3-Tucker2 model; fit expressed as proportion of total sum of squares.

                  Tucker3 model            Tucker2 model
Mode              1      2      3          1      2      3

Proportional fit per component
Girls           0.56   0.14   0.07       0.56   0.15   0.07
Variables       0.58   0.14   0.06       0.58   0.14   0.06
Years           0.75   0.02

Nonnegligible elements of core arrays
Tucker3 full core array            Tucker2 extended core array
G1,V1,Y1   0.56                    G1,V1   0.56
G2,V2,Y1   0.14                    G2,V2   0.14
G3,V3,Y1   0.05                    G3,V3   0.06
G3,V1,Y2   0.01                    G3,V1   0.01

Overall fit   0.77                 Overall fit   0.78

Gp,Vq,Yr indicates the core element that is the combination of the pth girl component, the qth variable component, and the rth time component; there are no time components for the Tucker2 model.

Table 15.7 Varimax component solution for variables

                                                    Components
Variables                       Abbreviation      1      2      3

Left upper-arm circumference    Arm             0.51   0.01   0.16
Chest circumference             Chest           0.47   0.16   0.12
Left calf circumference         Calf            0.45   0.17   0.18
Weight                          Weight          0.45   0.32   0.24

Length                          Length          0.08   0.62   0.14
Crown-coccyx length             CrownRump       0.16   0.55   0.28

Maximum pelvic width            Pelvis          0.27   0.34   0.40
Head circumference              Head            0.16   0.21   0.78

Component 1: Soft tissue; Component 2: Skeletal length; Component 3: Skeletal width.

of the component space (Table 15.7) shows that the three rotated components can be characterized by the Soft tissue variables, the Skeletal length variables, and the Skeletal width variables, although the pelvic width has noticeable loadings on all components. We will use these components to further examine the growth patterns.
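For readers who want to experiment with such a rotation, the following is a minimal sketch of a raw varimax rotation computed by repeated singular value decompositions. It is a generic textbook-style implementation, not the routine used for the analysis reported here, and it applies no Kaiser normalization (whether the reported solution used it is not stated). As input it takes the loadings of Table 15.7 after deliberately scrambling them with an arbitrary, hypothetical rotation, and then lets varimax search for a simple structure again.

```python
import numpy as np

def varimax_criterion(L):
    """Sum over components of the variance of the squared loadings."""
    return float(np.sum(np.var(L ** 2, axis=0)))

def varimax(L, max_iter=200, tol=1e-10):
    """Raw varimax rotation (no Kaiser normalization) by repeated SVDs."""
    n_rows, n_cols = L.shape
    T = np.eye(n_cols)
    previous = 0.0
    for _ in range(max_iter):
        LT = L @ T
        gradient = L.T @ (LT ** 3 - LT * (LT ** 2).sum(axis=0) / n_rows)
        U, singular_values, Vt = np.linalg.svd(gradient)
        T = U @ Vt
        if singular_values.sum() < previous * (1.0 + tol):
            break
        previous = singular_values.sum()
    return L @ T, T

# Loadings of Table 15.7 (rows: Arm, Chest, Calf, Weight, Length, CrownRump, Pelvis, Head).
L0 = np.array([
    [0.51, 0.01, 0.16], [0.47, 0.16, 0.12], [0.45, 0.17, 0.18], [0.45, 0.32, 0.24],
    [0.08, 0.62, 0.14], [0.16, 0.55, 0.28], [0.27, 0.34, 0.40], [0.16, 0.21, 0.78],
])

# Hypothetical scrambling: plane rotations of 40 and 25 degrees.
a, b = np.deg2rad(40.0), np.deg2rad(25.0)
R12 = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
R23 = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
scrambled = L0 @ R12 @ R23

rotated, T = varimax(scrambled)
before, after = varimax_criterion(scrambled), varimax_criterion(rotated)
```

The rotated loadings may come back with permuted or sign-reflected columns relative to the table, since varimax determines the components only up to order and sign.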


Figure 15.8 Girls’ growth curves data: Components for the variables in normalized coordinates with the equilibrium circle added (see Section 11.4.1, p. 262).

Girls. We will not portray the component space of the girls as it is not very revealing except that, due to the centering employed, nearly all combinations of positive and negative coefficients on the three components occur. In accordance with the relative importance of the components (see Table 15.6), the variability in the third dimension is small and primarily caused by a few girls with larger coefficients.

Figure 15.9 Girls’ growth curves data: All-components plots of the Time components after rotation to an optimally constant first component. Normalized coordinates.


Years. Figure 15.9 shows that the first time component after being optimally rotated toward a constant value is indeed nearly constant, indicating the overall highly constant rank-order of the girls as expressed by their positive correlation matrix. The second component reflects the changes in variability in growth rates, which is gradually and from age 8 more markedly increasing until age 13, when the variability decreases sharply, indicating that by then all the girls have had their growth spurts and their differences level off and stabilize. Note that the normalized coordinates preclude the assessment of the relative importance of the components from the plot.

15.6.5 Trends over time for individual girls

To investigate the trends in growth of the individual girls over time, we can construct nested-mode biplots (also called interactive biplots) (see Section 11.5.4, p. 276) or nested-mode per-component plots. The row coefficients in nested-mode plots represent the structured component scores of the 30 × 12 girl-year combinations on each of the components of the variables. These coefficients of nested-mode plots are comparable with the coordinates of the interstructure in STATIS (see Section 5.8, p. 105). For each variable component, we can portray the girl × age coefficients to examine the growth curves in a nested-mode per-component plot, and we can examine the trajectories of the girls over time in a nested-mode biplot (see Section 11.5.3, p. 273).
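How such row coefficients arise from a Tucker3 solution can be sketched as follows. This is a minimal illustration on random numbers, assuming component matrices `A` (girls), `B` (variables), `C` (ages) and a core array `G`; the number of variables (8) and the numbers of components are assumptions chosen for the example, not values taken from the analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 30, 8, 12     # girls, variables (assumed), ages
P, Q, R = 3, 3, 2       # numbers of components (illustrative)

# Hypothetical Tucker3 solution: orthonormal component matrices and a core array.
A, _ = np.linalg.qr(rng.normal(size=(I, P)))
B, _ = np.linalg.qr(rng.normal(size=(J, Q)))
C, _ = np.linalg.qr(rng.normal(size=(K, R)))
G = rng.normal(size=(P, Q, R))

# Structured component scores: one row per girl-age combination,
# one column per variable component (the reference mode).
F = np.einsum('ip,kr,pqr->ikq', A, C, G)     # shape (I, K, Q)
F_rows = F.reshape(I * K, Q)                 # 360 x Q rows of a nested-mode plot

# Combining the scores with the variable components reproduces the model part.
X_hat_nested = np.einsum('ikq,jq->ijk', F, B)
X_hat_tucker = np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)
```

Plotting `F[i, :, q]` against age for every girl `i` gives the curves of a per-component plot for variable component `q`; plotting the rows of `F_rows` together with the rows of `B` gives the biplot version.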

Individual growth curves per variable component. Here, we will only show the growth curves corresponding to the first rotated component of the reference mode, that is, variables representing Soft tissue. In Fig. 15.6 we saw that the average girl had fairly similar growth curves for most variables, except for a faster increase in weight and chest circumference and a very slow increase in head circumference. The growth curves observed in the nested-mode per-component plot (Fig. 15.10) show to what extent individual curves followed those of the average girl with respect to the Soft tissue variables. The general pattern is one of increasing variability until 13 years of age and a decreasing variability after that. This reflects the situation that some girls grew more and/or earlier than others, and that around the 13th year the fast growers stopped and the average girl caught up. At the same time the late growers started to catch up with the average girl. At the end of the measuring period the differences between the girls had increased considerably compared to when they were 4, but the relative rank order at the beginning and at the end is generally maintained. Once a girl is smaller than her peers, she is very unlikely to catch up in puberty, even though she might temporarily do so if her growth spurt comes early. This follows from the fact that very few growth lines cross that of the average girl. Nearly all early growers fall back in the end to the relative position they had at the beginning. The message from the other components, Skeletal length and Skeletal width, is similar.
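The two features read off the plot, the spread between girls at each age and the number of times a girl's curve crosses that of the average girl, can be computed directly from a girls-by-ages matrix of scores. The sketch below uses a made-up 3 × 4 matrix purely for illustration; the function name `summarize_deviation_curves` is hypothetical.

```python
import numpy as np

def summarize_deviation_curves(scores):
    """scores: (n_girls, n_ages) array of scores on one variable component.
    Returns deviations from the average curve, the spread between girls at
    each age, and per girl the number of crossings of the average curve."""
    dev = scores - scores.mean(axis=0)       # average curve becomes the line at 0
    spread = dev.std(axis=0)                 # variability between girls per age
    signs = np.sign(dev)
    # A crossing is a strict sign change between two consecutive ages.
    crossings = np.sum(signs[:, 1:] * signs[:, :-1] < 0, axis=1)
    return dev, spread, crossings

# Made-up scores for three girls at four ages (not the book's data).
toy = np.array([[1.0, 2.0, 6.0, 4.0],
                [2.0, 3.0, 4.0, 5.0],
                [3.0, 4.0, 2.0, 6.0]])
dev, spread, crossings = summarize_deviation_curves(toy)
```

A pattern like the one described in the text would show up as a `spread` that rises up to about age 13 and falls afterwards, combined with `crossings` that are zero for most girls.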

Figure 15.10 Girls’ growth curves data: Individual growth curves on Soft tissue for the 30 girls; scores are deviations from the average growth curve (represented by the horizontal line at 0).

Growth curves as trajectories in nested-mode biplots. The same girl-year coordinates can also be plotted in the component space of the variables, rather than with respect to each component separately. By connecting the scores over the years for each girl we get trajectories in the component space as portrayed in Fig. 15.11 for the Soft tissue-Skeletal length space.

The origin of the space consists of the coordinates of the 12 years of the average girl. All trajectories start in the neighborhood of the origin, move away from it, and more or less return to it. The star-like pattern indicates that there are many different combinations of growth in Soft tissue and Skeletal length, something that is difficult to observe from the per-component growth curve plots. It can also be observed that the largest distance from the origin occurs around 13 years of age, after which the curve returns to the origin, indicating that the relative difference from the average girl is diminishing. There is also one girl who is close to the origin all the time, indicating that her growth pattern mirrors that of the average girl as depicted in Fig. 15.6.
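The observation about the largest distance from the origin can be checked numerically once the trajectory coordinates are available. The sketch below assumes an array of shape girls × ages × 2 holding coordinates relative to the average girl, and assumes ages 4 through 15; the trajectories themselves are synthetic, constructed to peak at 13, and serve only to show the computation.

```python
import numpy as np

def peak_distance_age(traj, ages):
    """traj: (n_girls, n_ages, 2) coordinates in a two-component space,
    expressed relative to the average girl (the origin).
    Returns the distances from the origin and, per girl, the age of the maximum."""
    dist = np.linalg.norm(traj, axis=2)
    return dist, ages[np.argmax(dist, axis=1)]

ages = np.arange(4, 16)                                   # 12 yearly occasions (assumed)
angles = np.linspace(0.0, 2.0 * np.pi, 30, endpoint=False)
bump = np.exp(-0.5 * ((ages - 13) / 3.0) ** 2)            # synthetic: away, then back
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
traj = bump[None, :, None] * directions[:, None, :]       # star-like pattern

dist, peak_age = peak_distance_age(traj, ages)
```

With real coordinates, a histogram of `peak_age` would show how tightly the turning points cluster around a particular age.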

Figure 15.11 Girls’ growth curves data: Individual growth curves for the 30 girls presented in the component space of Soft tissue (horizontal) and Skeletal length (vertical).

15.7 FURTHER READING

Timmerman (2001) presents one of the few extended treatments of longitudinal data within the three-mode context, treating simultaneous component analysis, the application of functional constraints on components, modeling longitudinal data of different people measured at different times (cross-sectional data and multiset), as well as dynamic factor models; subjects not treated in this chapter. Unfortunately, two important topics could not be included here. In particular, Murakami (1983) uses the Tucker2 model in an innovative way to assess the changes in the components for variables and in the components for occasions both separately and in conjunction, and Meredith and Millsap (1982) employed a variant of canonical correlation analysis for a similar purpose. The book by Lohmöller (1989) is also an invaluable source for studying three-mode methods and longitudinal data, including his latent three-mode path models.

15.8 CONCLUSIONS

Insight into developmental processes in multivariate longitudinal data can be acquired by inspecting relationships between the components of the three modes: observational units, variables, and points in time. A detailed analysis of the latent covariance matrix can supply information on differential growth patterns if they exist, and the extended core array can help to inspect the changes in the interrelationships between the components over time. In fact, the entries in the extended core array can be seen as the scores of observational units on structured components of the latent variables and trends. A further level of detail is introduced by using the scores on the structured components of subjects and time to portray individual growth curves per variable component in a nested-mode per-component plot, and by plotting these same scores in a nested-mode biplot so that their trajectories can be examined and interpreted.

A description of data in terms of an autoregressive model has been shown to be acceptable for certain data sets, but first-order processes might be difficult to find. Our example and most of Lohmöller’s needed a second-order term. Furthermore, the theory is still very much underdeveloped; only in specific cases are detailed statements possible. At present the most useful aspect is the interpretational framework that an autoregressive model can provide for three-mode analysis of multivariate longitudinal data.
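The difference between a first-order and a second-order description can be made concrete with a plain univariate least-squares fit. This is only an illustration of the general idea on a noise-free synthetic series; it is not the latent-variable autoregressive model discussed in this chapter, and the function name `fit_ar` and the coefficients 0.5 and 0.3 are made up for the example.

```python
import numpy as np

def fit_ar(x, order):
    """Least-squares fit of x[t] = sum_k phi[k] * x[t-k-1] (no intercept).
    Returns the coefficients and the residual sum of squares."""
    x = np.asarray(x, dtype=float)
    # Column k holds the series lagged by k+1 occasions.
    X = np.column_stack([x[order - k - 1 : len(x) - k - 1] for k in range(order)])
    y = x[order:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ phi) ** 2))
    return phi, rss

# Synthetic second-order series over 12 occasions (made-up coefficients).
series = [1.0, 2.0]
for _ in range(10):
    series.append(0.5 * series[-1] + 0.3 * series[-2])

phi1, rss1 = fit_ar(series, order=1)
phi2, rss2 = fit_ar(series, order=2)
```

Here the second-order fit recovers the generating coefficients and leaves essentially no residual, whereas the first-order fit does not. Note that the two residual sums are based on 11 and 10 occasions respectively, so with real, noisy data a fair comparison would need to account for that.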