CANONICAL VARIATE ANALYSIS: SOME PRACTICAL ASPECTS
by
Norman Albert Campbell
Thesis submitted for the degree of Doctor of Philosophy in
the University of London and for the Diploma of Membership
of the Imperial College
ABSTRACT
Techniques and guidelines are developed for more effective
application of canonical variate analysis.
The influence function is used to develop criteria for the
detection of atypical observations in discriminant analysis. For
Mahalanobis D2, the influence function is a quadratic function of the
discriminant score. The use of robust estimators of means and of
covariances, in conjunction with probability plots of associated
Mahalanobis distances, is shown to lead to enhanced detection of
atypical observations.
Robust M-estimation for canonical variate analysis is developed,
based on a functional relationship formulation. An alternative approach,
based on M-estimation of the canonical variate scores, is also presented.
Graphically-oriented procedures for comparing within-groups
covariance matrices are developed, using basic ideas from analysis of
variance and regression. A multivariate comparison leading to
graphical representation is also considered.
The role of shrunken estimation procedures in canonical variate
analysis is examined. Marked improvement in the stability of the
canonical vectors can be effected when directions of small between-groups
variation coincide with directions of small within-groups variation.
A functional relationship model is used to develop methods for
comparing canonical variate analyses for several independent sets of
data. Criteria for examining the parallelism and coincidence of
discriminant planes, and the dispersal of the means, are given.
The usual canonical variate analysis is generalized to the
situation where the covariance matrices are not assumed to be equal.
Three generalizations are developed, corresponding to different
formulations of the usual approach.
Analyses of data from various fields are given throughout the
thesis to illustrate the application of the approaches developed.
ACKNOWLEDGEMENTS
I would like to thank Professors D.R. Cox and M.J.R. Healy for
their extensive contributions to an enjoyable and rewarding two
years of study at Imperial College. Their advice and encouragement
throughout the period and their constructive comments on an earlier
draft of the thesis and on papers arising therefrom are greatly
appreciated. I would also like to thank fellow students John
Tomenson, Daryl Pregibon and Peter Rundell, who discussed various
aspects of the work with me.
It is a pleasure to be able to acknowledge the collaboration
of colleagues in zoology, botany and genetics. Problems arising
during collaborative studies have led to the techniques proposed in
this thesis. Bruce Phillips provided the initial stimulation with
his data on geographic variation in whelks. My interest in multivariate
analysis dates from this project. Stephen Hopper, Chris Green,
Darrell Kitchener and John Dearn have spent many hours discussing the
role of multivariate studies in biological problems, and improving
my biological understanding of the areas of application. Collaboration
with Cathy Campbell, Rod Mahon, Tony Watson and Lou Koch is also
appreciated. William Atchley and Richard Reyment have encouraged me
to develop improved techniques for analyzing multivariate data.
My thanks are also due to the Commonwealth Scientific and
Industrial Research Organization and in particular the Division of
Mathematics and Statistics for the CSIRO Divisional Postgraduate
Studentship which made this period of study possible.
TABLE OF CONTENTS

Page
ABSTRACT 1
ACKNOWLEDGEMENTS 2
TABLE OF CONTENTS 3
LIST OF TABLES 6
LIST OF FIGURES 8
CHAPTER 1 A general introduction 10
1.1 Outline of the thesis 10
1.2 Canonical variate analysis 13
1.3 Canonical variate analysis - a functional relationship formulation 19
1.4 Geometry of canonical variate analysis 24
1.5 Computation of the canonical variate solution 26
1.6 Adequacy of discriminant functions and subsets of variables 32
CHAPTER 2 Detection of atypical observations in discriminant analysis 37
2.1 Influence function 37
2.2 Influence function in discriminant analysis 39
2.2.1 Influence function for Mahalanobis D2 41
2.2.2 Influence function for discriminant means 42
2.2.3 Influence function for the discriminant function coefficients 43
2.2.4 Approximations and representation of the influence functions 43
2.3 Probability plots to detect possible atypical values 45
2.4 An example 47
CHAPTER 3 Robust procedures to examine variation within a group 55
3.1 Introduction 55
3.2 Robust estimation of multivariate location and scatter 56
3.3 Robust principal components analysis 62
3.4 Some practical examples 65
3.5 Discussion 74
CHAPTER 4 Robust canonical variate analysis 80
4.1 M-estimation of the canonical variate scores 80
4.2 Robust M-estimation of the canonical vectors 85
4.3 Some practical examples 96
4.4 Discussion 105
CHAPTER 5 Graphical comparison of covariance matrices 109
5.1 Introduction 109
5.2 Graphical comparisons 110
5.2.1 Individual-Average plot 111
5.2.2 A multivariate comparison 121
5.2.3 Orthogonalized variables 126
5.3 Some examples 128
5.4 Further practical aspects 149
CHAPTER 6 Shrunken estimators in canonical variate analysis 151
6.1 Introduction 152
6.2 Shrunken or ridge-type estimators in discriminant analysis 153
6.3 Mean square error of shrunken estimators for discriminant analysis 157
6.4 Shrunken estimators in canonical variate analysis 161
6.5 Practical aspects 165
6.6 Discussion 174
CHAPTER 7 Comparison of canonical variates 179
7.1 Introduction 179
7.2 Comparison of solutions 183
7.2.1 Individual orientation and dispersal 185
7.2.2 Common orientation, individual dispersal 187
7.2.3 Common orientation and common dispersal 189
7.2.4 Coincidence but individual dispersal 191
7.2.5 Coincidence and common dispersal 192
7.2.6 Common orientation, dispersal and position 194
7.2.7 Likelihood ratio statistics 197
7.3 An example 197
7.4 Discussion of some practical aspects 204
CHAPTER 8 Canonical variate analysis with unequal covariance matrices 207
8.1 Introduction 207
8.2 Generalizations of the usual solution 209
8.2.1 Weighted between-groups formulation 209
8.2.2 Likelihood ratio formulation 211
8.2.3 Functional relationship formulation 216
8.3 Computation of the generalized solutions 220
8.4 Performance of the generalizations when the covariance matrices are equal 223
8.5 Comparison of solutions 228
8.6 Practical application 231
REFERENCES 236
LIST OF TABLES Page
3.1 Extract of listing of Thais data 68
3.2 Stem-and-leaf plot for ratios of robust to usual
variances for Thais data and for generated multivariate
Gaussian data 77
3.3 Stem-and-leaf plot for ratios of robust to usual
variances for scorpion data 78
4.1 Underlying means, standard deviations and correlations
for generated data 94
4.2 Summary of robust M-estimation canonical variate analyses
of generated data 95
4.3 Canonical roots and vectors for Dicathais data 97
4.4 Summary of non-unit weights from robust analyses of
Dicathais data 100
4.5 Canonical roots and vectors for Thais data 101
4.6 Summary of non-unit weights from robust analyses of
Thais data 103
5.1 Analysis of variance SSQ's from comparison of regressions
calculations with row means as regressor variables 117
5.2 Variances and correlations for the grasshopper data 129
5.3 Fitted linear regressions and analysis of variance table
for log variances for grasshopper data 133
5.4 Analysis of variance table for arctanh correlations for
grasshopper data 135
5.5 Fitted linear regressions and analysis of variance table
for log variances for Thais data 140
5.6 Group rankings for each row of I-A plot for arctanh
correlations for Thais data 145
5.7 Fitted linear regressions and analysis of variance table
for arctanh correlations for Thais data 146
5.8 Correlation coefficients for Thais data 148
6.1 Means, pooled standard deviations and correlations
for Dicathais data 166
6.2 Eigenanalysis and canonical variate analyses for
Dicathais data 167
6.3 Canonical roots and vectors for alternative shrunken
estimator formulations for Dicathais data 173
7.1 Representation of comparisons of models of interest 181
7.2 Summary of main results for various models 198
7.3 Summary of grasshopper data 200
7.4 Canonical roots and vectors and determinants for
various models for grasshopper data 203
8.1 Simulation results for comparison of generalized
solutions with usual solution 225
8.2 Maximized log likelihoods for Afrobolivina data 233
LIST OF FIGURES Page
2.1 Plot of change in Mahalanobis D2 against discriminant score 50
2.2 Gamma probability plot of influence function values for D2 51
2.3 Gamma probability plot of influence function values for length of coefficient vector 52
3.1 Gaussian probability plots of cube root of Mahalanobis squared distances for group 3 of Thais data 69
3.2 Gaussian and gamma probability plots of Mahalanobis squared distances for group 8 of Thais data 72
4.1 Components of squared distance d2km for robust canonical variate analysis 92
4.2 Canonical variate means for Dicathais data 98
4.3 Canonical variate means for Thais data 102
4.4 Gaussian and gamma probability plots of Mahalanobis squared distances for group 6 of Thais data 104
5.1 I-A, Q-Q and R-R plots for arctanh correlations for generated data 122
5.2 I-A, Q-Q and R-R plots for log variances for grasshopper data 130
5.3 M-S and C-V plots for log variances for grasshopper data 136
5.4 I-A plot for arctanh correlations for grasshopper data 137
5.5 C-V plot for arctanh correlations for grasshopper data 137
5.6 I-A and Q-Q plots for log variances for Thais data 139
5.7 M-S and C-V plots for log variances for Thais data 141
5.8 I-A plot for arctanh correlations for Thais data 144
5.9 M-S and C-V plots for arctanh correlations for Thais data 147
6.1 Plots of canonical variate coefficients and roots for
Dicathais data 168
7.1 Representation of three groups for three sets for various models 182
7.2 Canonical variate means for grasshopper data 201
8.1 Plot of canonical variate means versus depth for borehole samples for Afrobolivina 234
CHAPTER ONE: A GENERAL INTRODUCTION
1.1 Outline of the Thesis
Canonical variate analysis, or multiple discriminant analysis, is a
widely used multivariate technique, particularly in biology, geology,
and medicine. The emphasis in the former fields is on description
and summarization of group differences, while in the latter field the
emphasis is primarily on allocation or diagnosis.
The motivation for the study reported here arose from extensive
consultation and collaboration with colleagues in CSIRO, University of
Western Australia and the W.A. Museum, on the application of multivariate
techniques to biological and agronomic problems.
In the course of these case studies (see author references), it
became obvious that despite the widespread applicability of canonical
variate analysis, surprisingly little is available to guide the
applied statistician in the use of the approach. What little guidance
exists relates to the two-group discriminant function, and even here
the emphasis is almost solely on allocation rates. The general aim of
this study is to examine in detail various practical aspects of
canonical variate analysis, and where necessary to develop techniques
and provide guidelines for more effective application of the approach.
The emphasis in this study is on canonical variate analysis as a
multivariate approach which provides a description and summary of
multivariate differences between groups. Allocation is not considered.
A general outline of canonical variate analysis, with particular
emphasis on the underlying geometry, is given in the remaining Sections
of this Chapter. The remainder of the present Section provides an
introduction to subsequent Chapters.
Procedures for detecting atypical observations are considered in
Chapters Two, Three and Four. The provision of analyses little
influenced by such observations is considered in Chapter Four.
A graphically-oriented approach for indicating atypical observations
in discriminant analysis, based on the influence function, is developed
in Chapter Two. In Chapter Three, the role of robust M-estimation of
means and covariances is considered. The use of probability plots of
Mahalanobis distances to indicate atypical observations is examined.
A functional relationship formulation for canonical variate analysis
is used in Chapter Four to develop a robust M-estimation approach to
canonical variate analysis. An alternative approach, based on robust
M-estimation of the canonical variate scores, is also presented.
The basic assumptions in canonical variate analysis are that the
vectors of observations on distinct individuals are independently and
identically distributed and that the group covariance matrices are
equal. The original development of the discriminant function, due
to Fisher (1936), does not assume a specific distributional form.
The optimum properties of the discriminant function for the Gaussian
case were first presented by Welch (1939). The assumption of a
multivariate Gaussian distribution leads to a formal derivation of the
canonical vectors using a likelihood ratio approach, and this turns
out to have some useful extensions. The assumption of an underlying
multivariate Gaussian distribution can be examined using techniques
described in Gnanadesikan (1977, Section 5.4.2). In particular,
probability plots of Mahalanobis distances are very useful. A refinement,
using robust estimates, is presented in Chapter Three.
The commonly used test procedure for examining the equality of
covariance matrices, namely the likelihood-ratio test based on
determinants, is known to be very sensitive to departure from
Gaussian form. Moreover, no readily-interpretable information is
provided as to how the matrices differ. In Chapter Five of this
study, graphically-oriented procedures for comparing covariance
matrices are developed, using basic ideas from analysis of variance
and regression. These procedures are complemented by formal multivariate
tests, together with further graphical description.
Chapter Six considers the stability of the canonical vectors.
The use of shrunken estimators is developed. Their adoption is shown
to lead to improved stability when certain of the directions describing
the within-groups variation are associated with small between-groups
variation.
A common problem in multivariate discrimination studies is the
analysis and comparison of several sets of data, each set relating to
the same physical or biological problem. Chapter Seven develops
likelihood ratio criteria for comparing the canonical variate solutions
for different sets of data. Criteria for examining the parallelism and
concurrence of the discriminant planes, and the dispersal of the
canonical variate means, are given.
Chapter Eight generalizes the usual canonical variate analysis
to the situation where the covariance matrices are not assumed to be
equal; three generalizations are developed, corresponding to different
formulations of the usual canonical variate problem. With the
assumption of equal covariance matrices, all formulations lead to the
same eigenanalysis. However, each generalization leads to a slightly
different solution, though two of them can be considered as special
cases of the third. All three generalizations are computationally
more complicated than the usual solution. The usual canonical variate
solution is well understood both conceptually and theoretically, and
there are undoubted advantages in using it if possible. Procedures
are suggested for comparing the generalizations with the usual
solution, to determine the effect of differences in covariance
structure on the directions of maximum between-group variation.
1.2 Canonical Variate Analysis
Consider g groups of data, with v variables measured on each of
nk individuals for the kth group. Let xkm represent the vector of
observations on the mth individual for the kth group (m = 1,...,nk;
k = 1,...,g). Define the sums of squares and products (SSQPR)
matrix for the kth group as
Sk = Σ_{m=1}^{nk} (xkm - x̄k)(xkm - x̄k)^T    (1.1)

where

x̄k = nk^{-1} Σ_{m=1}^{nk} xkm    (1.2)

and write

W = Σ_{k=1}^{g} Sk = S    (1.3)

for the within-groups SSQPR matrix on

nW = Σ_{k=1}^{g} (nk - 1)    (1.4)

degrees of freedom (d.f.).
Define the between-groups SSQPR matrix as

B = Σ_{k=1}^{g} nk(x̄k - x̄T)(x̄k - x̄T)^T    (1.5)

where

x̄T = nT^{-1} Σ_{k=1}^{g} nk x̄k    (1.6)

and

nT = Σ_{k=1}^{g} nk .    (1.7)
Note that nT will also be written as n, without the subscript.
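The SSQPR definitions in (1.1)-(1.7) translate directly into matrix code. The following is a minimal NumPy sketch on small generated data (the group sizes, means and variable count are illustrative, not taken from the thesis); it also checks the standard identity that W + B equals the total SSQPR matrix about the grand mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: g = 3 groups, v = 2 variables, unequal group sizes.
groups = [rng.normal(loc=mu, size=(n, 2))
          for mu, n in [(0.0, 10), (1.0, 12), (2.0, 15)]]

means = [x.mean(axis=0) for x in groups]                       # x̄k  (1.2)
S_k = [(x - xb).T @ (x - xb) for x, xb in zip(groups, means)]  # Sk  (1.1)
W = sum(S_k)                                                   # (1.3)
n_W = sum(len(x) - 1 for x in groups)                          # d.f. (1.4)

n_k = [len(x) for x in groups]
n_T = sum(n_k)                                                 # (1.7)
x_T = sum(nk * xb for nk, xb in zip(n_k, means)) / n_T         # x̄T  (1.6)
B = sum(nk * np.outer(xb - x_T, xb - x_T)
        for nk, xb in zip(n_k, means))                         # (1.5)

# Sanity check: W + B is the total SSQPR matrix about the grand mean.
X = np.vstack(groups)
assert np.allclose(W + B, (X - x_T).T @ (X - x_T))
```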
The simplest formulation of canonical variate analysis is the
distribution-free one of finding that linear combination of the original
variables which maximizes the variation between groups, relative to the
variation within groups. That is, find the canonical vector c1 which
maximizes the ratio c1^T B c1 / c1^T W c1; the vector is usually scaled so that
c1^T W c1 = nW. The maximized ratio gives the first canonical root f1.
The canonical vector c1 and canonical root f1 can be found by explicit
use of a function maximization routine (and this is done in the
generalization in Section 8.2.1). However, use of Lagrange multipliers
leads directly to the eigenanalysis
(B - fW)c = 0 . (1.8)
Write
C = (c1, ..., ch)
and
F = diag(f1, ..., fh)
where
h = min(v,g-1) . (1.9)
Then the eigenanalysis in (1.8) leads to
BC = WCF

with

C^T WC = nW I    (1.10)

and

C^T BC = nW F;
the canonical variates are uncorrelated both within and between groups,
and have unit variance within groups. The approach described in this
paragraph will be referred to in Section 8.2.1 as the weighted between-
groups formulation.
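The eigenanalysis (1.8), with the scaling conventions in (1.10), can be computed by reducing the generalized problem to a symmetric one through the Cholesky factor of W. A sketch, assuming only that W is positive definite and B symmetric; the function name and the generated test matrices are illustrative:

```python
import numpy as np

def canonical_variates(B, W, n_W):
    """Solve (B - fW)c = 0 and scale the vectors so C^T W C = n_W I  (1.10)."""
    L = np.linalg.cholesky(W)                            # W = L L^T
    M = np.linalg.solve(L, np.linalg.solve(L, B).T).T    # L^{-1} B L^{-T}, symmetric
    f, U = np.linalg.eigh(M)
    f, U = f[::-1], U[:, ::-1]                           # largest canonical root first
    C = np.linalg.solve(L.T, U) * np.sqrt(n_W)           # back-transform and scale
    return f, C

# Illustrative matrices: W positive definite, B symmetric of low rank.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)); W = A @ A.T + 4 * np.eye(4)
G = rng.normal(size=(4, 2)); B = G @ G.T
f, C = canonical_variates(B, W, n_W=30)

assert np.allclose(B @ C, W @ C @ np.diag(f))            # (1.8)
assert np.allclose(C.T @ W @ C, 30 * np.eye(4))          # (1.10)
assert np.allclose(C.T @ B @ C, 30 * np.diag(f))
```

The Cholesky route is used instead of forming W^{-1}B directly because it keeps the reduced problem symmetric, so np.linalg.eigh applies.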
For two groups, Fisher's linear discriminant function results.
Write dx = x̄1 - x̄2, nT = n1 + n2 and define Mahalanobis squared distance
as D² = nW dx^T W^{-1} dx. Then c = nW D^{-1} W^{-1} dx and f = nW^{-1} n1 n2 nT^{-1} D².
The distribution-free approach given above follows the original
derivation of the linear discriminant function by Fisher (1936) and the
generalization to canonical vectors by Fisher (1938), Bartlett (1938)
and Hotelling (1936). Rao (1948, 1952) generalized the linear
discriminant function by finding the linear combinations ci^T x which
maximize the total Mahalanobis distance between all pairs of groups
in the reduced number of dimensions. The sum of the squares of
distances between the canonical variate means for either formulation
in all v dimensions for any pair of groups in the analysis is equal
to the corresponding Mahalanobis D2 for the pair of groups. However,
the first p canonical vectors as defined in (1.10) do not in general
maximize the total D2 over all pairs of groups in p dimensions. For
this formulation, the unweighted between-groups matrix

BU = Σ_{k=1}^{g} (x̄k - x̄U)(x̄k - x̄U)^T, with x̄U = g^{-1} Σ_{k=1}^{g} x̄k,

replaces B in (1.8) and (1.10). Rao (1952, Sections 9c.2 and 9d.1) and Gower
(1966, p.589) discuss the two formulations.
Write
T = B + W;

then an equivalent formulation is to maximize the ratio
c1^T B c1 / c1^T T c1, leading to the eigenanalysis

(B - rT)c = 0 .    (1.12)

The ratio r1 is the square of the first sample canonical correlation
coefficient. The vector c1 is scaled so that c1^T T c1 = nW(1 - r1)^{-1} =
nW(1 + f1), so that once again c1^T B c1 = nW r1(1 - r1)^{-1} = nW f1 and
c1^T W c1 = nW.
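The correspondence between the roots f of (B - fW)c = 0 and the roots r of (B - rT)c = 0, namely r = f/(1 + f), can be checked numerically; the matrices below are arbitrary generated examples, not data from the thesis:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)); W = A @ A.T + 3 * np.eye(3)
G = rng.normal(size=(3, 3)); B = G @ G.T
T = B + W

f = np.sort(np.linalg.eigvals(np.linalg.solve(W, B)).real)  # roots of (B - fW)c = 0
r = np.sort(np.linalg.eigvals(np.linalg.solve(T, B)).real)  # roots of (B - rT)c = 0
assert np.allclose(r, f / (1 + f))   # r = f/(1 + f), equivalently f = r/(1 - r)
```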
Now assume that xkm ~ Nv(μk, Σ). The maximized likelihood when the
μk are unrestricted is

(2π)^{-nv/2} |n^{-1}W|^{-n/2} e^{-nv/2}    (1.13)
with v(v+1)/2 + gv estimated parameters. The maximized likelihood for
the hypothesis specifying equality of the μk is

(2π)^{-nv/2} |n^{-1}(W + B)|^{-n/2} e^{-nv/2}    (1.14)

with v(v+1)/2 + v estimated parameters. This leads to the well-known
likelihood ratio statistic given by |W|/|W + B|, commonly referred
to as Wilks Λ. The statistic Λ may be written as

Λ = |W|/|W + B| = |W|/|T| = |I + W^{-1}B|^{-1} = Π_{i=1}^{h} (1 + fi)^{-1} = Π_{i=1}^{h} (1 - ri);    (1.15)

-n log Λ is asymptotically distributed as χ² on v(g-1) d.f. An improved
approximation due to Bartlett is given in Kshirsagar (1972, p.301).
The non-centrality parameter for the χ² distribution is the trace of the
population analogue of W^{-1}B. The matrix W^{-1}B is referred to here as the
sample non-centrality matrix. As (1.10) shows, an eigenanalysis of
this matrix gives the sample canonical roots and vectors. The approach
described in this paragraph will be referred to in Section 8.2.2 as the
likelihood ratio formulation.
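The identity in (1.15) linking Wilks Λ to the canonical roots can likewise be verified numerically; again the matrices are arbitrary generated examples:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3)); W = A @ A.T + 3 * np.eye(3)
G = rng.normal(size=(3, 2)); B = G @ G.T        # rank 2, as when g - 1 = 2

wilks_det = np.linalg.det(W) / np.linalg.det(W + B)
f = np.linalg.eigvals(np.linalg.solve(W, B)).real   # roots f_i (zeros included)
wilks_roots = np.prod(1.0 / (1.0 + f))
assert np.allclose(wilks_det, wilks_roots)      # |W|/|W+B| = prod (1 + f_i)^{-1}
```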
Now assume that all g of the v×1 vectors of group means μk lie on
a p-dimensional hyperplane (p < h), or that there are v - p linear functional
relationships between the means. This is equivalent to specifying that

μk = μ0 + ΣΨζk    (1.16)

where Ψ is the v×p matrix of population canonical vectors. This approach,
which is used extensively in Chapters Four, Seven and Eight, is outlined
in Section 1.3. It again leads to the eigenanalysis (1.10); the
estimator for Ψ is given by the first p columns of C. The maximized
likelihood is found to be

(2π)^{-nv/2} |n^{-1}W|^{-n/2} {Π_{i=p+1}^{h} (1 + fi)}^{-n/2} e^{-nv/2}    (1.17)

with v(v+1)/2 + v + vp - p² + p(g-1) estimated parameters. The approach
outlined in this paragraph will be referred to in Section 8.2.3 as the
functional relationship formulation.
The functional relationship model (1.16) and associated maximized
likelihood in (1.17) encompass the hypotheses resulting in the
maximized likelihoods in (1.13) and (1.14). The hypothesis of no
restriction on the means specifies that no reduction in dimensionality
is possible, so that p = h, and (1.17) reduces to (1.13). The hypothesis
of equality of the μk is equivalent to specifying that μk = μ0 in (1.16),
so that p = 0, and since

{Π_{i=1}^{h} (1 + fi)}^{-1} = |n^{-1}W| / |n^{-1}(W + B)|,

(1.17) reduces to (1.14).
An explicit eigenanalysis exists for the estimator of Ψ in (1.16).
However, if function maximization routines were to be used (and they
are in the generalizations in Chapter Eight, since explicit solutions do
not result there), the maximized likelihoods in (1.14) and (1.13) set bounds
for the maximized likelihood corresponding to (1.16) as p varies from
1 to h.
1.3 Canonical Variate Analysis - A Functional Relationship
Formulation
The descriptive appeal of canonical variate analysis lies in its
ability to provide a graphical representation of the essential
differences between the groups in a reduced number of dimensions.
Group similarities and differences can be readily discerned from a
scatter plot of group means using the important canonical variates
for the coordinate system. Since the canonical variates are chosen to
be uncorrelated within groups and are usually standardized to have
unit standard deviation within groups, Euclidean distance is the
appropriate metric for interpreting distances. The number of canonical
vectors, p, required to describe the between-groups variation specifies
the effective dimensionality of the space spanned by the group means.
The specification that p canonical vectors are required is equivalent
to the specification that the vectors of group means lie on a
p-dimensional hyperplane, with p < h. An excellent discussion of this
aspect is given in Kshirsagar (1972, pp. 354-360).
Consider again g independent v-variate Nv(μk, Σ) populations, and
assume that all g of the v×1 vectors of population means lie on a
p-dimensional hyperplane, where p is specified. This can be written
as the following model:

μk = μ0 + ΣΨζk    (1.18)

where Ψ is the v×p matrix of population canonical vectors. In (1.18),
μ0 is an unknown v×1 fixed vector; Σ is the unknown v×v population
covariance matrix, assumed common for all populations; and the ζk are
unknown p×1 vectors. The population canonical vectors Ψ are uncorrelated
within groups, and are standardized to have unit standard deviation
within groups, so that Ψ^T ΣΨ = I, the p×p identity. Writing ΣΨ = Ξ
in (1.18), with Ξ^T Σ^{-1} Ξ = I, it follows that the columns ξi of Ξ are
basis vectors for the canonical variate space, with the ζk specifying
the coordinates for each mean.
Rao (1973, Section 8c.6) gives results for the model in (1.18) when
Σ is known. Anderson (1951) derives the functional relationship solution
via a canonical correlation or regression formulation, and gives
(Anderson, 1951, Section 7) some results for the g-sample problem
(which is equivalent to the formulation in (1.18)). A direct derivation
is given here, using results for matrix differentiation given in Bibby
and Toutenburg (1977, Appendix B).
Consider again the model

μk = μ0 + ΣΨζk;

then the relevant part of the log likelihood is

-n log|Σ| - tr Σ^{-1}S - Σ_{k=1}^{g} nk(x̄k - μ0 - ΣΨζk)^T Σ^{-1}(x̄k - μ0 - ΣΨζk).    (1.19)

Differentiation with respect to (w.r.t.) ζk gives

ζ̂k = (Ψ^T ΣΨ)^{-1} Ψ^T (x̄k - μ0).    (1.20)

Write

P = ΣΨ(Ψ^T ΣΨ)^{-1} Ψ^T,    (1.21)

noting that P² = P and that (I - P)^T Σ^{-1}(I - P) = Σ^{-1}(I - P). Here P is a
generalized projection operator with respect to the metric Σ.
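The stated properties of P, idempotence and (I - P)^T Σ^{-1}(I - P) = Σ^{-1}(I - P), are easy to confirm numerically for an arbitrary generated Σ and any full-column-rank Ψ (dimensions here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
v, p = 4, 2
A = rng.normal(size=(v, v)); Sigma = A @ A.T + v * np.eye(v)
Psi = rng.normal(size=(v, p))        # any v x p matrix of full column rank

P = Sigma @ Psi @ np.linalg.inv(Psi.T @ Sigma @ Psi) @ Psi.T   # (1.21)
I = np.eye(v)
Sinv = np.linalg.inv(Sigma)
assert np.allclose(P @ P, P)                                   # P is idempotent
assert np.allclose((I - P).T @ Sinv @ (I - P), Sinv @ (I - P))
```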
The log likelihood in (1.19) becomes
-n log|Σ| - tr Σ^{-1}S - Σ_{k=1}^{g} nk(x̄k - μ0)^T Σ^{-1}(I - P)(x̄k - μ0),

and differentiation w.r.t. μ0 leads to

(I - P)μ̂0 = (I - P)x̄T.    (1.22)

Then the log likelihood in (1.19) maximized w.r.t. μ0 and the ζk is

-n log|Σ| - tr Σ^{-1}S - tr Σ^{-1}B + tr Σ^{-1}PB;    (1.23)

here

Σ^{-1}P = Ψ(Ψ^T ΣΨ)^{-1} Ψ^T.
Using results in Bibby and Toutenburg (1977, Appendix B),
differentiation of (1.23) w.r.t. Σ and w.r.t. Ψ gives
    -n Σ̂^{-1} + Σ̂^{-1}(S+B)Σ̂^{-1} - Ψ̂ (Ψ̂^T Σ̂ Ψ̂)^{-1} Ψ̂^T B Ψ̂ (Ψ̂^T Σ̂ Ψ̂)^{-1} Ψ̂^T = 0   (1.24)
and
    -(Ψ̂^T Σ̂ Ψ̂)^{-1} Ψ̂^T B Ψ̂ (Ψ̂^T Σ̂ Ψ̂)^{-1} Ψ̂^T Σ̂ + (Ψ̂^T Σ̂ Ψ̂)^{-1} Ψ̂^T B = 0 .   (1.25)
Now introduce the usual conditions in canonical variate analysis,
namely that the canonical vectors are uncorrelated within groups with
unit variance, and are uncorrelated between groups, viz.
    Ψ̂^T Σ̂ Ψ̂ = I
and   (1.26)
    Ψ̂^T B Ψ̂ = n F_p ,
where F_p is a diagonal matrix.
Substitution of (1.26) into (1.24) leads to
    n Σ̂ = S + B - n Σ̂ Ψ̂ F_p Ψ̂^T Σ̂ ,   (1.27)
while substitution of (1.26) into (1.25) leads to
    B Ψ̂ = n Σ̂ Ψ̂ F_p .   (1.28)
Postmultiplication of (1.27) by Ψ̂, and substitution of (1.26) and
(1.28), gives
    n Σ̂ Ψ̂ = S Ψ̂ ,   (1.29)
which gives the fundamental canonical variate equation
    B Ψ̂ = S Ψ̂ F_p ,   (1.30)
which is of the same form as (1.10). Premultiplication of (1.29) by
Ψ̂^T and use of (1.26) gives
    Ψ̂^T Σ̂ Ψ̂ = Ψ̂^T n^{-1} S Ψ̂ = I .
Substitution of (1.29) into (1.27) gives
    n Σ̂ = S + B - S Ψ̂ F_p Ψ̂^T S n^{-1}
         = S + B - B Ψ̂ Ψ̂^T S n^{-1}   (1.31)
         = S + B - B Ψ̂ F_p^{-1} Ψ̂^T B n^{-1} .
It now remains to specify the canonical vectors Ψ̂ which maximize
(1.23). From (1.27),
    n Σ̂ (I + Ψ̂ F_p Ψ̂^T Σ̂) = S + B ,
and so
    n^v |Σ̂| |I + Ψ̂ F_p Ψ̂^T Σ̂| = |S+B|
or
    n^v |Σ̂| |I + F_p| = |S+B| .
Also,
    tr Σ̂^{-1}(S+B) = n tr(I + Ψ̂ F_p Ψ̂^T Σ̂) = n(v + tr F_p)
and
    tr Σ̂^{-1} P B = tr Ψ̂ Ψ̂^T B = n tr F_p .
Hence the maximized log likelihood in (1.23) becomes
    -n log(n^{-v}|S+B|) + n log|I + F_p| - n v .
Now partition C and F in (1.10) as
    C = (C_p, C_q)
and
    F = ( F_p  0  )
        ( 0    F_q ) ,   (1.32)
where C_p is v×p and F_p is p×p; then the log likelihood is maximized
by choosing Ψ̂ = C_p. That is, the first p vectors of C give the
required canonical vectors under the functional relationship formulation.
It now remains to find expressions for the γ̂_k and hence the μ̂_k.
From (1.20) and (1.26),
    γ̂_k = C_p^T (x̄_k - μ̂_0) ,
and so from (1.18) and (1.29),
    μ̂_k = μ̂_0 + n^{-1} S C_p C_p^T (x̄_k - μ̂_0) .
From (1.21), (1.29) and (1.22), this becomes
    μ̂_k = x̄ + V C_p C_p^T (x̄_k - x̄) ,
with V = n^{-1} S. The canonical variate means are given by
    Ψ̂^T μ̂_k = C_p^T x̄_k .
These are the usual canonical variate means found by substituting the
vectors of sample means in the first p canonical variate equations.
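As a numerical illustration of the reduced-rank estimates above, the following sketch (Python with numpy; the simulated data, group sizes and dimensions are arbitrary assumptions, not the thesis data) computes the leading p canonical vectors with the normalization Ψ̂^T(S/n)Ψ̂ = I and checks that the estimated means μ̂_k project onto the usual canonical variate means C_p^T x̄_k.

```python
import numpy as np

rng = np.random.default_rng(8)
g, v, p, n_k = 4, 5, 2, 30
X = np.vstack([rng.normal(size=(n_k, v)) + rng.normal(size=v) for _ in range(g)])
labels = np.repeat(np.arange(g), n_k)
n = len(X)

xbar = X.mean(axis=0)
S = np.zeros((v, v)); B = np.zeros((v, v)); xbars = np.zeros((g, v))
for k in range(g):
    Xk = X[labels == k]
    xbars[k] = Xk.mean(axis=0)
    dk = Xk - xbars[k]
    S += dk.T @ dk                      # within-groups SSQPR
    B += n_k * np.outer(xbars[k] - xbar, xbars[k] - xbar)

# Canonical vectors normalized so that C_p^T (S/n) C_p = I: two-stage
# eigenanalysis of V = S/n followed by PCA of the between-groups matrix.
V = S / n
e, U = np.linalg.eigh(V)
T1 = U / np.sqrt(e)
f, A = np.linalg.eigh(T1.T @ B @ T1)
Cp = (T1 @ A)[:, ::-1][:, :p]           # leading p canonical vectors

# Reduced-rank (functional relationship) estimates of the group means:
# mu_k = xbar + V C_p C_p^T (xbar_k - xbar), in row form.
Mu = xbar + (xbars - xbar) @ Cp @ Cp.T @ V

# The canonical variate means are the projections C_p^T xbar_k.
print(np.allclose(Mu @ Cp, xbars @ Cp))   # True
```

The check works because C_p^T V C_p = I under the normalization above, so projecting μ̂_k onto C_p reproduces C_p^T x̄_k exactly.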
1.4 Geometry of Canonical Variate Analysis
Geometrically, canonical variate analysis may be considered as a
two-stage rotation procedure, as illustrated in Rempe and Weber (1972).
The first stage involves orthogonal rotation of the original variables
to new uncorrelated variables. This may be accomplished in a number of
ways, one of the most common being to determine the principal components
of the original data, which corresponds to finding the principal axes
of the pooled within-groups covariance ellipsoid. The new
uncorrelated variables are then scaled by the square roots of the
corresponding eigenvalues to have unit variance, so that the resulting
variables are orthonormal. The second stage of the procedure involves
a principal components analysis of the SSQPR matrix of the group means
in the space of the orthonormal variables. Mahalanobis D2 is the
square of the usual Euclidean distance in the rotated scaled orthonormal
variable space. Note that it is in the rotated scaled space, in which
concentration ellipsoids have become concentration spheres, that the
canonical variate group means are most usefully plotted.
The geometrical approach for the eigenanalysis first-stage rotation
may be expressed algebraically as follows. Write W in terms of its
eigenvectors U = (u_1,...,u_v) and eigenvalues E = diag(e_1,...,e_v), i.e.
write
    W = U E U^T .   (1.33)
The matrix of scaled eigenvectors
    U E^{-1/2} = (u_1/√e_1, ..., u_v/√e_v)
provides the first-stage orthonormalization, to
    z = E^{-1/2} U^T x .   (1.34)
The between-groups SSQPR matrix for these variables is given by
    E^{-1/2} U^T B U E^{-1/2} .   (1.35)
The second-stage principal components analysis is
    (E^{-1/2} U^T B U E^{-1/2} - f I) a = 0   (1.36)
and gives the canonical roots f_i and canonical vectors a_i for the orthonormal
variables directly. Premultiplication by U E^{-1/2} and comparison
with (1.8) shows that the canonical vectors c_i for the original variables
x are found from the a_i by
    c_i = U E^{-1/2} a_i .   (1.37)
Note that the canonical variate scores c_i^T x and a_i^T z are the same.
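The two-stage rotation can be sketched directly. The following fragment (Python with numpy; the simulated three-group data are an assumption for illustration only) performs the first-stage eigenanalysis of W, the second-stage principal components analysis of the between-groups SSQPR matrix, and verifies that the recovered vectors satisfy the fundamental equation B c = f W c.

```python
import numpy as np

rng = np.random.default_rng(0)
g, v, n_k = 3, 4, 30
X = np.vstack([rng.normal(size=(n_k, v)) + rng.normal(size=v) for _ in range(g)])
labels = np.repeat(np.arange(g), n_k)

# Within-groups (W) and between-groups (B) SSQPR matrices.
xbar = X.mean(axis=0)
W = np.zeros((v, v)); B = np.zeros((v, v))
for k in range(g):
    Xk = X[labels == k]
    dk = Xk - Xk.mean(axis=0)
    W += dk.T @ dk
    B += n_k * np.outer(Xk.mean(axis=0) - xbar, Xk.mean(axis=0) - xbar)

# First stage (1.33)-(1.34): eigendecompose W = U E U^T and scale.
e, U = np.linalg.eigh(W)
T1 = U / np.sqrt(e)                  # columns u_i / sqrt(e_i), i.e. U E^{-1/2}

# Second stage (1.35)-(1.36): PCA of the between-groups matrix for z.
f, A = np.linalg.eigh(T1.T @ B @ T1)

# (1.37): canonical vectors for the original variables x.
C = T1 @ A
c, froot = C[:, -1], f[-1]           # vector for the largest canonical root

print(np.allclose(B @ c, froot * W @ c))     # True: fundamental equation
print(np.allclose(C.T @ W @ C, np.eye(v)))   # True: unit within-groups scaling
```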
1.5 Computation of the Canonical Variate Solution
The usual numerical procedure is to determine the canonical roots f
and canonical vectors a of W^{-1/2} B W^{-1/2}; the c then follow since
c = W^{-1/2} a. This procedure implicitly involves orthonormalization of
the original variables x, by transformation to new variables z = W^{-1/2} x
with identity within-groups matrix. The square-symmetric matrix
W^{-1/2} B W^{-1/2} is simply the between-groups SSQPR matrix for the
orthonormalized variables.
The two-stage eigenanalysis or principal components analysis given
in the previous Section provides one approach.
Let X* be the g×v matrix of group means and Z* = X* U E^{-1/2} be the
matrix of group means for the orthonormal variables defined in (1.34).
It is assumed that X* is mean-centred by columns, such that 1^T N X* = 0,
where
    N = diag(n_1,...,n_g) .
For an unweighted between-groups formulation, 1^T X* = 0.
Let
    X̃ = N^{1/2} X*
and
    Z̃ = N^{1/2} Z* = X̃ U E^{-1/2} .   (1.38)
For an unweighted between-groups formulation, X̃ = X* and Z̃ = Z*.
Then
    B = X̃^T X̃   (1.39)
and the between-groups SSQPR matrix for the orthonormal variables is,
from (1.35) and (1.38),
    E^{-1/2} U^T X̃^T X̃ U E^{-1/2} = Z̃^T Z̃ .
The eigenanalysis given in (1.36) becomes
    (Z̃^T Z̃ - f I) a = 0 .   (1.40)
With
    m = Z̃ a ,   (1.41)
this may be written as
    Z̃ Z̃^T m = f m ,   (1.42)
where
    m^T m = f .
From (1.41), (1.38) and (1.37),
    m = X̃ U E^{-1/2} a = X̃ c   (1.43)
and so N^{-1/2} m = X* c, which is just the vector of canonical variate
means.
The eigenanalysis of Z̃ Z̃^T as in (1.42), rather than of the more
usual Z̃^T Z̃ in (1.40), is often referred to as the Q-technique analysis
(Gower, 1966).
From (1.43),
    X̃^T m = X̃^T X̃ c .
But from (1.39) and (1.10),
    X̃^T X̃ c = f W c ,
so
    X̃^T m = f W c
and hence
    c = f^{-1} W^{-1} X̃^T m .   (1.44)
From (1.42), (1.39) and (1.33),
    X̃ U E^{-1} U^T X̃^T m = f m
or
    X̃ W^{-1} X̃^T m = f m .   (1.45)
The eigenanalysis based on (1.42) (or (1.45)) and (1.44) will be
preferable when g « v.
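A minimal sketch of the Q-technique route (Python with numpy on simulated data, both assumptions for illustration): with g = 3 groups and v = 10 variables, the second-stage eigenanalysis is of the small g×g matrix Z̃ Z̃^T, and the canonical vector is recovered from (1.44).

```python
import numpy as np

rng = np.random.default_rng(1)
g, v, n_k = 3, 10, 40
X = np.vstack([rng.normal(size=(n_k, v)) + rng.normal(size=v) for _ in range(g)])
labels = np.repeat(np.arange(g), n_k)

xbar = X.mean(axis=0)
W = np.zeros((v, v)); Xt = np.zeros((g, v))
for k in range(g):
    Xk = X[labels == k]
    dk = Xk - Xk.mean(axis=0)
    W += dk.T @ dk
    Xt[k] = np.sqrt(n_k) * (Xk.mean(axis=0) - xbar)   # X~ = N^{1/2} X*
B = Xt.T @ Xt

# First stage: Z~ = X~ U E^{-1/2} from the eigendecomposition W = U E U^T.
e, U = np.linalg.eigh(W)
Zt = Xt @ (U / np.sqrt(e))

# Q-technique (1.42): eigenanalysis of the small g x g matrix Z~ Z~^T.
f, M = np.linalg.eigh(Zt @ Zt.T)
froot = f[-1]
m = M[:, -1] * np.sqrt(froot)          # scaled so that m^T m = f

# (1.44): recover the canonical vector c = f^{-1} W^{-1} X~^T m.
c = np.linalg.solve(W, Xt.T @ m) / froot

print(np.allclose(B @ c, froot * W @ c))   # True
```

Only a g×g eigenproblem is solved, which is the point of the Q-technique when g is much smaller than v.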
An alternative procedure (Ashton, Healy and Lipton, 1957; Gower,
1966) is to base the first-stage orthonormalization on a triangular
decomposition or Cholesky decomposition of W.
Let
    W = U_T^T U_T ,
where U_T is a v×v upper triangular matrix. The orthonormal variables
from the first stage are given by
    z_T = U_T^{-T} x .
The orthonormalization is a successive procedure: z_1 is x_1 apart from
scaling; z_2 is a function of x_1 and x_2 only, and represents the residual
component of x_2 after regressing on x_1 (= z_1); and finally z_v represents
the residual component of x_v after regressing on all the previous x's.
Moreover, the ith diagonal term of U_T^{-1} is simply the inverse of the
ith diagonal term of U_T; and the square of the latter is the residual
or conditional SSQ of x_i given x_1,...,x_{i-1}.
Now U_T can be written as
    U_T = E_T U_U ,   (1.46)
where E_T = diag(u_T11,...,u_Tvv) and U_U is a unit upper triangular matrix.
Write
    Z̃_T = X̃ U_U^{-1} E_T^{-1} .   (1.47)
The eigenanalysis for the second stage is
    (Z̃_T^T Z̃_T - f I) a_T = 0   (1.48)
and
    c = U_U^{-1} E_T^{-1} a_T .
The eigenanalysis in (1.48) can be written, from (1.47), as
    (E_T^{-1} U_U^{-T} X̃^T X̃ U_U^{-1} E_T^{-1} - f I) a_T = 0 .   (1.49)
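The Cholesky-based first stage can be sketched as follows (Python with numpy; the simulated data are an assumption). numpy returns the lower triangular factor L, so U_T = L^T in the notation above, and c = U_U^{-1} E_T^{-1} a_T = U_T^{-1} a_T.

```python
import numpy as np

rng = np.random.default_rng(2)
g, v, n_k = 4, 5, 25
X = np.vstack([rng.normal(size=(n_k, v)) + rng.normal(size=v) for _ in range(g)])
labels = np.repeat(np.arange(g), n_k)

xbar = X.mean(axis=0)
W = np.zeros((v, v)); Xt = np.zeros((g, v))
for k in range(g):
    Xk = X[labels == k]
    dk = Xk - Xk.mean(axis=0)
    W += dk.T @ dk
    Xt[k] = np.sqrt(n_k) * (Xk.mean(axis=0) - xbar)
B = Xt.T @ Xt

# Cholesky factorization W = U_T^T U_T; numpy returns the lower factor L = U_T^T.
L = np.linalg.cholesky(W)
Ut = L.T

# First stage: Z_T = X~ U_T^{-1}, i.e. solve L Z_T^T = X~^T.
Zt = np.linalg.solve(L, Xt.T).T

# Second stage (1.48): eigenanalysis of Z_T^T Z_T.
f, At = np.linalg.eigh(Zt.T @ Zt)
aT, froot = At[:, -1], f[-1]

# Recover the canonical vector: c = U_T^{-1} a_T.
c = np.linalg.solve(Ut, aT)
print(np.allclose(B @ c, froot * W @ c))   # True
```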
The procedures outlined above involve formation of SSQPR matrices.
The trend now is to use numerically more accurate procedures which
avoid this, by working directly on decompositions of the original data
matrix (Chambers, 1977, Chapter 5). The procedures discussed above can
be specified in terms of singular value (SVD) and QR decompositions
(see Chambers, 1977, Sections 5e and 5b). Consider the Q-technique
formulation: by analogy with principal components via singular value
decomposition (Chambers, 1977, p.125), it follows immediately that
    Z̃_0 = Y G A_0^T
is the appropriate singular value decomposition for the second stage of
the canonical variate analysis. Here
    A_0 = (a_01, ..., a_0m) ,
    F = diag(f_1, ..., f_m) = G² ,
and Z̃_0 is either Z̃ or Z̃_T, whence a_0 is either a or a_T, while
    M = (m_1, ..., m_m) = Y G = Y F^{1/2} .
This procedure will determine the second-stage analysis more
accurately; it also unifies the usual and Q-technique approaches.
However, the first stage still involves formation of SSQPR matrices.
Hence it would be desirable to be able to determine the eigenvectors
or triangular matrix for the first-stage orthonormalization using
numerically accurate procedures. An obvious approach is to determine
Z and E (or ZT and ET) simultaneously - but this does not appear
possible.
One possible computational approach for the first stage is as
follows. Let X be the n×v matrix of observations and let X_G be the
n×v matrix such that the column sum for each group is zero; then
X_G^T X_G = W. The first-stage simultaneous rotation can be effected by
a singular value decomposition of X_G as Y_G G_G U^T, which gives E = G_G²
and so Z̃ = X̃ U G_G^{-1}. Since X_G is mean-centred by groups, Y_G is of little
direct interest. Similarly, if X_G = Y_T U_T is the QR decomposition,
then Z̃_T U_T = X̃. There is, of course, the actual size of the data
matrix to be considered. It is not uncommon for a canonical variate
problem to include more than 500 observations; the Dicathais example
considered in Chapters Four and Six includes more than 850 observations,
while Hopper and Campbell (1977) consider 670 observations and 30
variables. The size of the data matrix and the computational time may
preclude explicit consideration of the above approaches for the first-
stage computations.
If numerically accurate procedures are to be used for the first-
stage computations, an alternative approach to that given in the previous
paragraph is to consider the n×v matrix X_0 such that the overall column
sum is zero. Then X_0^T X_0 = T. Here a singular value or QR decomposition
of X_0 will lead directly to orthonormalized variables which, when
averaged, will produce matrices which play the same role as Z̃ or Z̃_T
respectively. The resulting singular values of Z̃ or Z̃_T will lead to the
canonical correlation coefficients r_i, with f_i = r_i²(1 - r_i²)^{-1}.
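The relation f_i = r_i²(1 - r_i²)^{-1} can be checked numerically. The sketch below (numpy and simulated data assumed) orthonormalizes with respect to the total matrix T, takes the singular values of the weighted, centred group means as the canonical correlations r_i, and compares with the roots of W^{-1}B.

```python
import numpy as np

rng = np.random.default_rng(3)
g, v, n_k = 3, 4, 50
X = np.vstack([rng.normal(size=(n_k, v)) + rng.normal(size=v) for _ in range(g)])
labels = np.repeat(np.arange(g), n_k)

xbar = X.mean(axis=0)
X0 = X - xbar                          # overall-centred data: X_0^T X_0 = T
T = X0.T @ X0

W = np.zeros((v, v)); Mt = np.zeros((g, v))
for k in range(g):
    Xk = X[labels == k]
    dk = Xk - Xk.mean(axis=0)
    W += dk.T @ dk
    Mt[k] = np.sqrt(n_k) * (Xk.mean(axis=0) - xbar)
B = Mt.T @ Mt

# Orthonormalize w.r.t. T (Cholesky T = L L^T); the weighted, centred group
# means of the T-orthonormalized variables have singular values r_i.
L = np.linalg.cholesky(T)
Zbar = np.linalg.solve(L, Mt.T).T
r = np.linalg.svd(Zbar, compute_uv=False)

# f_i = r_i^2 (1 - r_i^2)^{-1} should match the nonzero roots of W^{-1} B.
f_from_r = r**2 / (1 - r**2)
f_direct = np.sort(np.linalg.eigvals(np.linalg.solve(W, B)).real)[::-1]
print(np.allclose(f_from_r[:g - 1], f_direct[:g - 1]))   # True
```

Only the first g - 1 roots are compared, since B has rank at most g - 1.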
1.6 Adequacy of Discriminant Functions and Subsets of Variables
This Section reviews results for examining in greater detail the
nature of the discrimination provided by the canonical vectors. Let
x_v denote all v variables, let x_p denote a subset of p variables, and
let x_q denote the remaining q = v - p variables. Write the Wilks ratios
in (1.15) corresponding to x_v and to x_p as Λ_v and Λ_p. If Λ_p is similar
to Λ_v, then it is reasonable to conclude that the variables x_q do not
contain any additional information for discrimination when the variables
x_p are first included. The hypothesis of no additional information in
the x_q given the x_p has also been termed the hypothesis of sufficiency
of the x_p by Rao (1970). Since the Wilks ratios are invariant under
linear transformations, the subset x_p may be equated with hypothetical
discriminant vectors; in this context, Bartlett (1951) refers to the
adequacy of the hypothetical vectors for discrimination.
One formulation for the hypothesis of no additional information,
which is outlined below, is to consider the equality of the conditional
means E(x_q | x_p) for each group, assuming an underlying multivariate
Gaussian distribution. The Wilks ratio for the hypothesis is
Λ_q·p = Λ_v/Λ_p, as given in (1.53) below. The derivation can also proceed
via a multivariate analysis of covariance formulation with x_p as the
covariates and x_q as the variates (see Rao, 1952, Section 7d.4). Rao
(1970, p.589) presents other formulations.
The hypothesis of no additional information can be extended to
examine the adequacy of hypothetical discriminant vectors. Here p
hypothetical variables Ξ^T x are equated with the x_p, and q hypothetical
variables with the x_q. The approach can be extended to consider
agreement in direction of the sample and hypothetical discriminant
vectors, and coplanarity of the means. Consider initially a single
hypothetical discriminant vector ξ. Then the Wilks ratio for the single
variable ξ^T x is Λ_ξ = ξ^T W ξ / ξ^T T ξ, and the adequacy of ξ^T x is examined by
considering Λ_v/Λ_ξ. Note that the ratio of the between-to-total SSQ
for the hypothetical variable is θ = 1 - Λ_ξ = ξ^T B ξ / ξ^T T ξ, which is the
square of the canonical correlation coefficient for the hypothetical
vector. Wilks Λ_v can be written as (1 - r_1²) ∏_{i=2}^h (1 - r_i²) using (1.15),
and so
    Λ_v/Λ_ξ = {(1 - r_1²)/(1 - θ)} ∏_{i=2}^h (1 - r_i²) .
The first term, (1 - r_1²)/(1 - θ), represents the ratio of within-to-total
SSQ for the sample and hypothetical vectors and is a reflection of
their agreement in direction. The second term, ∏_{i=2}^h (1 - r_i²), is a
measure of the lack of collinearity of the group means. The above
formulation is essentially due to Bartlett (1951).
The extension to p hypothetical discriminant vectors Ξ is immediate:
with 1 - θ = |Ξ^T W Ξ|/|Ξ^T T Ξ|, the direction term is ∏_{i=1}^p (1 - r_i²)/(1 - θ),
while the coplanarity term is ∏_{i=p+1}^h (1 - r_i²). Radcliffe (1966) notes
that tests based on the above factorization are approximate in the sense
that the exact distributions of the factors are not known, nor is it
claimed that the factors are independent or almost independent.
Bartlett (1951) and Williams (1961, 1967) have developed exact factori-
zations using regression arguments, though the so-called approximate
factorization has undoubted practical appeal.
The direction and coplanarity terms can be derived as ratios of
maximized likelihoods. The direction term corresponds to the ratio of
maximized likelihoods for the hypothesis that ETx is adequate for
discrimination versus the hypothesis that the means lie on a
p-dimensional hyperplane. The second, coplanarity, term corresponds
to the ratio of maximized likelihoods for the p-dimensional hyperplane
hypothesis versus the hypothesis of no restriction on the means.
Radcliffe (1967) has derived the likelihood ratios in the context of
a canonical correlation or reduced-rank regression formulation. A more
direct approach in the context of canonical variate analysis is to
specify various models or hypotheses for the means μ_k as in Section 1.3,
and this is now outlined.
Partition x_v, μ_k and Σ as follows. Write x_v = (x_p^T, x_q^T)^T,
μ_k = (μ_pk^T, μ_qk^T)^T, and
    Σ = ( Σ_pp  Σ_pq )
        ( Σ_qp  Σ_qq ) .
Similar partitions will be used for the various matrices and vectors
introduced in Section 1.2.
The hypothesis that there is no information in x_q for discrimination
conditional on x_p may be specified as the equality of the conditional
means μ_q·p,k, where μ_q·p,k = μ_qk - β_qp μ_pk and β_qp = Σ_qp Σ_pp^{-1}. Write
W = ∑_{k=1}^g S_k as in (1.3), and W_qq·p = W_qq - W_qp W_pp^{-1} W_pq. The maximized
likelihood for no restriction on the conditional means is easily shown
to be
    (2π)^{-np/2} |n^{-1} W_pp|^{-n/2} e^{-np/2} (2π)^{-nq/2} |n^{-1} W_qq·p|^{-n/2} e^{-nq/2} .
Since
    |W| = |W_pp| |W_qq·p| ,   (1.50)
this likelihood is equivalent to the usual likelihood given in (1.13).
Let T_pp, T_pq and T_qq be the partition of T in (1.11), and define
T_qq·p as for W_qq·p above. The maximized likelihood for the hypothesis
specifying equality of the conditional means μ_q·p,k but no restriction
on the μ_pk is
    (2π)^{-nv/2} |n^{-1} W_pp|^{-n/2} |n^{-1} T_qq·p|^{-n/2} e^{-nv/2}
with ½v(v+1) + gp + q estimated parameters. Using a determinantal
identity similar to that in (1.50), this may be written as
    (2π)^{-nv/2} |n^{-1} W_pp|^{-n/2} |n^{-1} T|^{-n/2} |n^{-1} T_pp|^{n/2} e^{-nv/2} .   (1.51)
Note that the variables xq do not enter explicitly into any of the
maximized likelihoods in (1.13), (1.14) or (1.51).
If x_p is equated with the hypothetical canonical variates Ξ^T x,
then
    W_pp = Ξ^T W Ξ
and   (1.52)
    T_pp = Ξ^T T Ξ = Ξ^T (B+W) Ξ .
The hypothesis that the means lie on a p-dimensional hyperplane
is discussed briefly in Section 1.2 and outlined in more detail in
Section 1.3. The maximized likelihood is given in (1.17).
The likelihood ratio statistic for examining the adequacy of the
p variables x_p is given by comparing the maximized likelihood for the
unrestricted hypothesis with that specifying equality of the conditional
means. From (1.13) and (1.51), this is
    {(|W_pp| |T|)/(|W| |T_pp|)}^{-n/2} = (Λ_v/Λ_p)^{n/2} ,   (1.53)
as noted in the second paragraph of this Section.
Now equate x_p with Ξ^T x. From (1.52), Λ_p = |Ξ^T W Ξ|/|Ξ^T (B+W) Ξ|, and
the adequacy of the p hypothetical discriminant vectors Ξ is again
examined by considering Λ_v/Λ_p.
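A short numerical check of (1.50) and (1.53) (Python with numpy; the data are simulated and the split into the first p variables is an arbitrary assumption): the additional-information statistic Λ_v/Λ_p equals the ratio of determinants of the conditional SSQPR matrices.

```python
import numpy as np

rng = np.random.default_rng(4)
g, v, p, n_k = 3, 5, 2, 40
X = np.vstack([rng.normal(size=(n_k, v)) + rng.normal(size=v) for _ in range(g)])
labels = np.repeat(np.arange(g), n_k)

W = np.zeros((v, v))
for k in range(g):
    dk = X[labels == k] - X[labels == k].mean(axis=0)
    W += dk.T @ dk
X0 = X - X.mean(axis=0)
T = X0.T @ X0                              # total SSQPR matrix

det = np.linalg.det
Lv = det(W) / det(T)                       # Wilks ratio for all v variables
Lp = det(W[:p, :p]) / det(T[:p, :p])       # Wilks ratio for the first p variables

# (1.50): |W| = |W_pp| |W_qq.p|, and the same identity for T, give
# Lambda_q.p = Lambda_v / Lambda_p as the ratio of conditional determinants.
Wqq_p = W[p:, p:] - W[p:, :p] @ np.linalg.solve(W[:p, :p], W[:p, p:])
Tqq_p = T[p:, p:] - T[p:, :p] @ np.linalg.solve(T[:p, :p], T[:p, p:])
print(np.isclose(det(Wqq_p) / det(Tqq_p), Lv / Lp))   # True
```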
The ratio of maximized likelihoods for the hypothesis that Ξ^T x
is adequate for discrimination versus the hypothesis that the means
lie on a p-dimensional hyperplane is, from (1.51), (1.52) and (1.17),
and the relationships in (1.15),
    {Λ_v ∏_{i=p+1}^h (1+f_i) / Λ_p}^{n/2} = {Λ_p ∏_{i=1}^p (1+f_i)}^{-n/2}
                                          = {Λ_p^{-1} ∏_{i=1}^p (1 - r_i²)}^{n/2} .   (1.54)
This is the direction term referred to above, raised to the power n/2,
since Λ_p = 1 - θ.
The ratio of maximized likelihoods for the p-dimensional hyperplane
hypothesis versus the hypothesis of no restriction on the means is,
from (1.17) and (1.13), and the relationships in (1.15),
    {∏_{i=p+1}^h (1+f_i)}^{-n/2} = {∏_{i=p+1}^h (1 - r_i²)}^{n/2} .   (1.55)
This is the coplanarity term referred to above, raised to the power n/2.
The product of the two ratios in (1.54) and (1.55) is
{∏_{i=1}^h (1 - r_i²)/Λ_p}^{n/2}, which, from (1.15), is simply (Λ_v/Λ_p)^{n/2} as in (1.53).
CHAPTER TWO: DETECTION OF ATYPICAL OBSERVATIONS IN DISCRIMINANT
ANALYSIS
In this Chapter, the influence function is used to develop criteria
for detecting atypical observations in discriminant analysis. Section
2.1 discusses the influence function. Section 2.2 develops the
influence function for various aspects of discriminant analysis:
Mahalanobis D2, the discriminant function group means, and functions
of the vector of coefficients. For Mahalanobis D2, the influence function
is a quadratic function of the deviation of the discriminant score for
the perturbed observation from the discriminant score for the mean of
the corresponding group. Chi-squared approximations to the distributions
of the influence functions of interest are also developed in Section 2.2,
and graphical representation is considered in Section 2.3. An example
is given in Section 2.4.
2.1 Influence Function
The applied statistician is often faced with the problem of how to
detect and then treat apparently atypical observations. The detection
of such observations in multivariate data is often more complex than
in the univariate case, since the effect of an atypical observation on
the means, variances and on the correlations between the variables may
need to be taken into account. Gnanadesikan and Kettenring (1972) and
Gnanadesikan (1977) discuss problems of detection of atypical observations
in multivariate data and point out some of the difficulties (see, in
particular, Gnanadesikan, 1977, Section 6.4.2).
One obvious and intuitively appealing approach is to carry through
an analysis with and without a suspected atypical observation, and to
compare the results so obtained. The influence function of Hampel
(1974) provides a useful tool to formalize this. The theoretical
influence function for a particular parameter 6, such as the mean,
is found by perturbing the distribution function F by adding a small
contribution from a unit mass at the point x, evaluating the parameter
at the perturbed distribution function, and subtracting the parameter
evaluated at the unperturbed distribution function. A formal definition
is given by Gnanadesikan (1977, p.272) (see also Devlin et al., 1975).
Essentially, the influence function is the derivative of the parameter θ
w.r.t. the distribution function F.
Distinction is made between the theoretical influence function,
described in the previous paragraph, and the sample influence function,
in which an actual observation is deleted. If θ̂ is an estimator based
on n observations, and θ̂_{-m} is an estimator, of the same form as θ̂,
determined without the mth observation, Devlin et al. (1975) have
suggested calling I(x_m; θ̂) = (n - 1)(θ̂ - θ̂_{-m})
the sample influence function. It is the sample influence function
which is of practical interest in the present context, though it is more
convenient to study the theoretical influence function, and extend the
results to the sample case.
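A minimal sketch of the sample influence function for the simplest case, the univariate mean (Python with numpy; the data and the planted outlier are assumptions for illustration). For the mean, (n - 1)(θ̂ - θ̂_{-m}) reduces algebraically to x_m - x̄:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=20)
x[0] = 6.0                     # plant one atypical observation

n = len(x)
theta = x.mean()
# Sample influence function: (n - 1)(theta_hat - theta_hat_{-m}).
infl = np.array([(n - 1) * (theta - np.delete(x, m).mean()) for m in range(n)])

# For the mean this reduces algebraically to x_m - xbar, so the atypical
# observation stands out directly.
print(np.allclose(infl, x - theta))     # True
print(int(np.argmax(np.abs(infl))))     # index of the planted outlier
```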
When the parameter of interest involves more than one population,
as in multivariate between-group studies, the theoretical influence
function is then determined by perturbing only one of the distribution
functions in the above way (in the sample, by eliminating an observation
from only one of the groups); the parameter is evaluated for one of
the distribution functions perturbed and the others unchanged, and
the parameter evaluated for all the distribution functions unperturbed
is then subtracted.
This may be written more formally as follows. Consider a general
parameter θ = T(F_1,...,F_k,...,F_g), expressed as a functional of the
distribution functions F_k, k = 1,...,g. The perturbed distribution
function may be written
    F̃_k = (1 - ε) F_k + ε δ_x ,
where δ_x is the distribution function which assigns unit probability to
the point x. Write θ̃_k = T(F_1,...,F̃_k,...,F_g) for the parameter evaluated
at the perturbed distribution function; the influence function at x is
then given by
    I_k(x; θ) = lim_{ε→0} (θ̃_k - θ)/ε .
The subscript k is not retained in the remainder of this Chapter, since
only the distribution function for the first population is perturbed.
In the evaluation of the theoretical influence function only terms of
order ε need be retained; for the sample version, this is equivalent to
assuming that terms of order n^{-2} can be ignored.
2.2 Influence Function in Discriminant Analysis
Consider the population linear discriminant function, given by
δ^T Σ^{-1} x = ψ^T x, where x ~ N(μ_k, Σ) if x belongs to population k = 1,2, and
δ = μ_1 - μ_2. The parameters considered here are Mahalanobis Δ² = δ^T Σ^{-1} δ;
the discriminant means ψ^T μ_k; and the vector of coefficients ψ = Σ^{-1} δ.
These all involve Σ^{-1} and μ_k. Hence to determine the influence functions,
it is first necessary to consider the effect of the perturbation on μ_k
and Σ^{-1}.
To do this, formally consider
    Σ = w_1 Σ_F1 + w_2 Σ_F2 , with w_1 + w_2 = 1 and w_k > 0 ,   (2.1)
where
    Σ_Fk = ∫ (x - μ_k)(x - μ_k)^T dF_k   (2.2)
and
    μ_k = ∫ x dF_k .   (2.3)
In the following derivation, it is assumed that Σ_F1 = Σ_F2 (i.e.
that the covariance matrices are equal). The adoption of general w_1
and w2 in (2.1) is to cover the possibility of unequal sample sizes in
the extension to the sample influence function.
Now perturb the first population, evaluating (2.2) and (2.3) at
F̃_1 = (1 - ε)F_1 + ε δ_x. Write → to indicate the parameter after
perturbation; then
    μ_1 → (1 - ε)μ_1 + εx = μ_1 + ε(x - μ_1) = μ_1 + εz ,
where z = x - μ_1, and so the expression for δ becomes
    δ → δ + εz .   (2.4)
Similarly,
    Σ_F1 → (1 - ε)Σ_F1 + ε z z^T ,
and so
    Σ → (1 - εw_1)Σ + εw_1 z z^T ,
giving
    Σ^{-1} → (1 - εw_1)^{-1} { Σ^{-1} - εw_1 Σ^{-1} z z^T Σ^{-1} / (1 - εw_1 + εw_1 z^T Σ^{-1} z) }
         → (1 + εw_1)Σ^{-1} - εw_1 Σ^{-1} z z^T Σ^{-1} ,   (2.5)
to order ε.
2.2.1 Influence function for Mahalanobis Δ²

Mahalanobis Δ² is defined as (μ_1 - μ_2)^T Σ^{-1} (μ_1 - μ_2). Evaluation
of Δ² at the perturbed distribution function F̃_1 and the unperturbed
distribution function F_2, using (2.4) and (2.5), gives
    Δ² → (δ + εz)^T {(1 + εw_1)Σ^{-1} - εw_1 Σ^{-1} z z^T Σ^{-1}} (δ + εz) .
Again retain terms only up to order ε, and write
    φ = δ^T Σ^{-1} z ,   (2.6)
which leads to
    Δ² → (1 + εw_1)Δ² + 2εφ - εw_1 φ² .
Hence the influence function for Δ² becomes
    I(x; Δ²) = w_1 Δ² + 2φ - w_1 φ² .
In (2.6), ψ = Σ^{-1} δ is the vector of discriminant coefficients, so
that φ is simply ψ^T(x - μ_1), which is the deviation of the discriminant
score from the discriminant mean for the first population.
Note that φ is not standardized to unit variance within populations;
the variance is Δ². Write the standardized vector of coefficients as
ψ_st = Δ^{-1} ψ, and let φ_st = Δ^{-1} φ. Since φ ~ N(0, Δ²), then
    φ_st ~ N(0, 1) ,
which corresponds to the usual form for the discriminant score.
The influence function for Δ² in terms of φ_st is given by the
following main result:
    I(x; Δ²) = w_1 Δ² + 2Δ φ_st - w_1 Δ² φ_st² .   (2.7)
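The closed form can be checked against a direct numerical perturbation of F_1 (a sketch in Python with numpy; the parameter values are arbitrary assumptions). Perturbing μ_1 and Σ as in (2.4)-(2.5) and differentiating Δ² numerically at ε = 0 reproduces I(x; Δ²):

```python
import numpy as np

rng = np.random.default_rng(6)
v = 3
mu1, mu2 = rng.normal(size=v), rng.normal(size=v)
A = rng.normal(size=(v, v))
Sigma = A @ A.T + v * np.eye(v)            # an arbitrary positive definite Sigma
w1 = 0.5
x = rng.normal(size=v)                     # point at which to evaluate the influence

delta = mu1 - mu2
D2 = delta @ np.linalg.solve(Sigma, delta)
z = x - mu1
phi = delta @ np.linalg.solve(Sigma, z)    # discriminant-score deviation (2.6)

# Closed form: I(x; Delta^2) = w1*Delta^2 + 2*phi - w1*phi^2.
I_formula = w1 * D2 + 2 * phi - w1 * phi**2

# Numerical check: apply the order-eps perturbation and differentiate.
eps = 1e-7
mu1_e = mu1 + eps * z
Sigma_e = Sigma + eps * w1 * (np.outer(z, z) - Sigma)
d_e = mu1_e - mu2
D2_e = d_e @ np.linalg.solve(Sigma_e, d_e)
print(np.isclose((D2_e - D2) / eps, I_formula, rtol=1e-3, atol=1e-3))   # True
```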
2.2.2 Influence function for discriminant means
The discriminant mean for the kth population is ψ^T μ_k. Now follow
the approach outlined in Section 2.2.1, and define
    η_k = z^T Σ^{-1} μ_k ,
which leads to
    ψ^T μ_1 → (1 + εw_1) ψ^T μ_1 + εφ - εw_1 φ η_1 + ε η_1
and hence
    I(x; ψ^T μ_1) = w_1 ψ^T μ_1 + φ - w_1 φ η_1 + η_1 .
Similarly,
    I(x; ψ^T μ_2) = w_1 ψ^T μ_2 - w_1 φ η_2 + η_2 .
2.2.3 Influence function for the discriminant function coefficients
The vector of discriminant coefficients is ψ = Σ^{-1} δ. From (2.4)
and (2.5),
    I(x; ψ) = w_1 ψ - w_1 φ Σ^{-1} z + Σ^{-1} z
            = w_1 ψ + (1 - w_1 φ) Σ^{-1} z .   (2.8)
A simple scalar summary is the squared length of the vector. The
influence function for the squared length is given by
    I(x; ψ^T ψ) = 2w_1 ψ^T ψ - 2w_1 φ ψ^T Σ^{-1} z + 2 ψ^T Σ^{-1} z
                = 2w_1 ψ^T ψ + 2(1 - w_1 φ) ψ^T Σ^{-1} z .   (2.9)
By the results (2.4) and (2.5), or equivalently by (2.8) and (2.7), the
influence function for the standardized coefficients is
    I(x; ψ_st) = (½w_1 - φΔ^{-2} + ½w_1 φ² Δ^{-2}) ψ_st - (w_1 φ - 1) Δ^{-1} Σ^{-1} z .
2.2.4 Approximations and representation of the influence functions
The influence function is a useful tool for assessing the effect of
a point x on the parameter of interest. It is also possible to consider
the influence function as a random variable, since it is a mathematical
transformation of a random variable x and as such will have a probability
distribution. The distribution is considered here for the theoretical
influence function and the results are extended in Section 2.3 to the
sample influence function.
With x assumed to follow a multivariate Gaussian distribution,
the distribution of the influence function for A2 is negatively skewed.
Differentiation of (2.7) w.r.t. θst shows that the maximum value of
I(x;Δ²) occurs at θst = (w1Δ)⁻¹ (or θ = w1⁻¹); hence
Imax(x;Δ²) = w1⁻¹ + w1Δ². From initial experience with the approach,
it seems to be more convenient to consider Imax(x;Δ²) - I(x;Δ²) =
IM(x;Δ²) say, where

I_M(x; \Delta^2) = w_1^{-1}(1 - 2w_1\Delta\theta_{st} + w_1^2\Delta^2\theta_{st}^2) ,   (2.10)
since its distribution is non-negative and positively skewed, and so can
be more readily approximated.
From (2.10), IM(x;Δ²) can be written as

I_M(x; \Delta^2) = w_1\Delta^2(\theta_{st} - w_1^{-1}\Delta^{-1})^2 .

That is, IM(x;Δ²) is distributed as w1Δ² times a non-central chi-squared
variate with 1 d.f. and non-centrality parameter (w1²Δ²)⁻¹. The moments
of IM(x;Δ²) are readily evaluated: E(IM) = w1⁻¹ + w1Δ², and
E(IM²) = w1⁻² + 6Δ² + 3w1²Δ⁴.
Johnson and Kotz (1970, Section 28.8) suggest approximating the
non-central chi-squared distribution by a gamma distribution; empirical
evidence also supports the approximation. Equating the moments for
IM(x;Δ²) with those for the bχ²ν distribution gives

\nu = 1 + (2w_1^2\Delta^2 + w_1^4\Delta^4)^{-1}   and   b = \nu^{-1}(w_1^{-1} + w_1\Delta^2) .
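As a numerical sketch (mine, not the thesis's), the moment matching above can be checked directly; the values of w1 and Δ² are illustrative, chosen close to those of the simulated example of Section 2.3.

```python
# Moment-matching sketch for the b*chi-squared_nu approximation to IM(x; Delta^2).
# w1 and delta2 are illustrative values, not taken from any data set.

def gamma_approx(w1, delta2):
    """Match the first two moments of IM(x; Delta^2) with those of b*chi2_nu."""
    mean = 1.0 / w1 + w1 * delta2                        # E(IM)
    var = 4.0 * delta2 + 2.0 * w1 ** 2 * delta2 ** 2     # Var(IM) = E(IM^2) - E(IM)^2
    nu = 2.0 * mean ** 2 / var                           # since E = b*nu and Var = 2*b^2*nu
    b = mean / nu
    return nu, b

w1, delta2 = 0.5, 21.05        # two equal groups; Delta^2 near the simulated value
nu, b = gamma_approx(w1, delta2)

# agrees with the closed form nu = 1 + (2 w1^2 Delta^2 + w1^4 Delta^4)^{-1}
nu_closed = 1.0 + 1.0 / (2.0 * w1 ** 2 * delta2 + w1 ** 4 * delta2 ** 2)
```

The resulting gamma shape ν/2 is close to 0.5, consistent with the χ² value quoted for the simulated example.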
The form of the influence function in (2.7) suggests plotting the
change in Mahalanobis A2 against the corresponding discriminant score.
Useful graphical representations and approximations also exist for the
other summary statistics in discriminant analysis. The influence function
for the squared length of the coefficient vector, given in (2.9),
involves the discriminant score and ψᵀΣ⁻¹z. With κ = ψᵀΣ⁻¹z, a plot of
the influence function for ψᵀψ against θ and κ shows a quadratic
relationship, with the maximum occurring at θ = w1⁻¹, κ = 0. Its value
is then 2w1ψᵀψ. For graphical representation, a plot of the change
in squared length of ψ against some linear combination of θ and κ
will usually be adequate, since the two are generally highly correlated.
The variances and covariance of θ and κ are βᵀβ, βᵀΛ⁻²β and βᵀΛ⁻¹β,
where Σ = ΓΛΓᵀ and β = Λ^{-1/2}Γᵀδ. Either the regression of θ on κ or
the first eigenvector seems suitable for graphical representation.
As for the theoretical influence function I(x;Δ²), the shape
of the distribution for I(x;ψᵀψ) suggests considering instead
IM(x;ψᵀψ) = Imax(x;ψᵀψ) - I(x;ψᵀψ), where

I_M(x; \psi^T\psi) = 2(w_1\Delta\theta_{st} - 1)\,\kappa ,   (2.11)
since its distribution is positively skewed; the second moment is more
difficult to evaluate, though it can be found by following the approach
given in Kshirsagar (1972, Chapter 6, Section 5).
2.3 Probability Plots to Detect Possible Atypical Values
In practical applications of discriminant analysis, interest usually
focuses on the degree of group separation, reflected in Mahalanobis D²,
and on the relative orders of magnitude of the coefficients as indicators
of the important discriminating variables.
The sample analogues of the theoretical influence functions are now
considered. As noted in Section 2.1, results for the theoretical influence
function carry over to the sample case if, in the derivation, e is replaced
by -1/(n - 1) and terms of order n⁻² can be ignored. Rather than
consider sample analogues of I(x;Δ²) in (2.7) and I(x;ψᵀψ) in (2.9),
it is more instructive, as in Section 2.2.4, to consider the sample
influence functions corresponding to IM(x;Δ²) in (2.10) and to
IM(x;ψᵀψ) in (2.11). The sample influence functions are given by
replacing θst by csᵀ(xm - x̄1), where cs is the standardized vector of
sample discriminant coefficients, xm is the mth observation, and x̄k
is the vector of means for the kth group, k = 1,2. The weights wk
are given by nk(n1 + n2)⁻¹. Mahalanobis Δ² is replaced by D², while
ψᵀΣ⁻¹z becomes (x̄1 - x̄2)ᵀV⁻²(xm - x̄1), where V is the pooled covariance
matrix on n1 + n2 - 2 d.f.
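The sample influence function for D² can be illustrated by explicit deletion of each observation in turn. The sketch below uses synthetic bivariate data, not the thesis data; the function and variable names are mine.

```python
import numpy as np

def mahalanobis_D2(g1, g2):
    """Mahalanobis D^2 between two samples, pooled covariance on n1+n2-2 d.f."""
    d = g1.mean(axis=0) - g2.mean(axis=0)
    V = ((len(g1) - 1) * np.cov(g1, rowvar=False)
         + (len(g2) - 1) * np.cov(g2, rowvar=False)) / (len(g1) + len(g2) - 2)
    return float(d @ np.linalg.solve(V, d))

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
g1 = rng.multivariate_normal([0.0, 0.0], cov, size=50)
g2 = rng.multivariate_normal([0.0, 2.0], cov, size=50)

D2 = mahalanobis_D2(g1, g2)
# change in D^2 when observation m of group 1 is deleted; plotted against the
# discriminant score this traces out the quadratic of Figure 2.1
change = np.array([D2 - mahalanobis_D2(np.delete(g1, m, axis=0), g2)
                   for m in range(len(g1))])
```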
The results presented in Section 2.2.4 hold only asymptotically
for the sample influence functions. Before proceeding with detailed
applications in practical situations, the adequacy of the chi-squared
approximations was checked against data generated from bivariate
Gaussian distributions with unit variances and correlation 0.9. The
means were taken as (0,0) and (0,2); 50 observations were generated
for the first group. A brief summary of one such run is as follows.
The observed variances are 0.70 and 0.90, and the correlation 0.89.
Mahalanobis D² is 20.67. The following results are for the sample values
D² - D²m, rather than (n - 1) = 49 times this quantity. (Note that for
the sample influence function, (n - 1)(D² - D²m), the n refers to the
number of observations associated with the perturbed group.) A Q-Q
gamma plot of (n - 1)⁻¹IM(x;D²), with the parameters estimated by
maximum likelihood using the smallest 45 order statistics (Wilk,
Gnanadesikan and Huyett, 1962), is similar to that derived from the χ²
approximation. The maximum likelihood estimate of the shape parameter
is 0.55; the χ² approximation gives 0.50. The corresponding estimates
for the scale parameter are 2.05 and 2.16 respectively. A Q-Q
plot of the sample influence function values for IM(x;cᵀc) against
the quantiles of a gamma distribution with parameters estimated by
maximum likelihood also provides a good approximation.
Examination of the results for the simulated data, and
subsequent practical experience, suggests that a gamma approximation for
IM(x;D²) and for IM(x;cᵀc) is adequate. The approach is now applied
to a practical example.
2.4 An Example
The following examination of data on two species of the rock crab
Leptograpsus is presented to show how the influence function and related
summary statistics can be used to assess the relative influences of the
different observations on the statistics of interest. It is then
necessary for the statistician to examine the absolute influence of the
observation; even when a few points appear to have a large relative
influence, they still may not affect the statistics of interest to any
important extent. Chapters Three and Four discuss robust procedures
for accommodating atypical observations.
Campbell and Mahon (1974) examined morphological divergence between
the two species of rock crab, here referred to as the blue species and
the orange species. A canonical variate analysis showed the major
divergence to be that between the species, with sexual dimorphism
being considerably less marked. In further unpublished work, the sexes
have been combined and this is done here. Five characters were measured
on 100 individuals of each species. Bivariate scatter plots are presented
in Figure 4 of Campbell and Mahon (1974).
The observed variances and correlations are very similar for the two
species. A discriminant analysis shows no overlap of discriminant scores.
The observed Mahalanobis D2 is 27.8.
Figures 2.1(a) and 2.1(b) show plots of the change in D² against
the deviation of the standardized discriminant score from that for the
species mean - the term discriminant score will be taken to imply this.
The quadratic nature of the plots is obvious, with the maximum
occurring around csᵀ(xm - x̄1) = 0.4 (= 2/D) in each case. Mahalanobis
D2 is decreased for observations with discriminant scores between
approximately -0.5 and 1.5 while D2 is actually increased when an
observation with a discriminant score outside this interval is deleted.
The greatest increase in D2 corresponds to the deletion of observations
whose scores are furthest from that for the mean of the other species.
Examination of the 100 discriminant scores for each species shows
some negative skewness. For example, the first and last deciles, the
quartiles and median for the two species are (interpolated from the
ecdf and rounded): orange -1.50, -1.0, 0.0, 0.75, 1.25; and blue
-1.25, -0.75, 0.0, 0.50, 1.00. The scores for the orange species are
more dispersed.
Application of the often used, but only asymptotically true,
argument that the discriminant scores can be treated as standard
Gaussian deviates would suggest that three orange specimens and three
blue specimens should be examined further. The maximum increase in D2
corresponds to a discriminant score of 3.0 for the blue and 2.4 for
the orange species.
Figures 2.2(a) and 2.2(b) show Q-Q gamma plots of IM(x;D²), with
parameters estimated by maximum likelihood from the smallest 95 order
statistics. The linearity of both plots is apparent; the slope of the
plot for the orange species is very close to unity, as is that for the
blue species if the atypical observation is ignored. The estimated
shape parameters are 0.65 and 0.70 respectively. The linearity of the
![Page 50: CANONICAL VARIATE ANALYSIS: SOME PRACTICAL ASPECTS by ... · 5.5 C-V plot for arctanh correlations for grasshopper data 137 5.6 I-A and Q-Q plots for log variances for Thais data](https://reader033.vdocuments.mx/reader033/viewer/2022052106/60412acbe2295f639322b604/html5/thumbnails/50.jpg)
Figure 2.1 Plot of D² - D²m against csᵀ(xm - x̄1) for
(a) orange species - score for blue mean is 5.27
(b) blue species - score for orange mean is -5.27
Figure 2.2 Gamma probability plot of IM(x;D2), with parameters
estimated by maximum likelihood for
(a) orange species - shape parameter = 0.65
(b) blue species - shape parameter = 0.71
Figure 2.3 Gamma probability plot of IM(x;cTc), with parameters
estimated by maximum likelihood for
(a) orange species
(b) blue species
The symbol *, • or # in these Figures and the Figures in the
following Chapters represents one individual; a number, on the
same figure, indicates that number of overprintings; and 9
indicates at least 9 overprintings
[Figure 2.1 appears here: scatter plots of D² - D²m against discriminant
score for (a) the orange species and (b) the blue species.]
[Figure 2.2 appears here: gamma probability plots of IM(x;D²) against
gamma quantiles for (a) the orange species and (b) the blue species.]
[Figure 2.3 appears here: gamma probability plots of IM(x;cᵀc) against
gamma quantiles for (a) the orange species and (b) the blue species.]
plots contrasts with the lack of linearity of a Gaussian probability
plot of the discriminant scores; the latter is not surprising for
this data set in view of the skewness discussed above.
Figures 2.3(a) and 2.3(b) show Q-Q gamma plots of IM(x;cᵀc), with
parameters estimated by maximum likelihood. Again the main trend for
each species lies close to the unit slope line through the origin.
For the orange species, the observed distribution is slightly shorter-
tailed; slight curvature is also evident for the blue species, though
three atypical observations are indicated.
None of the summary graphs and analyses indicates any atypical
observations for the orange species. The histogram and ecdf of the
discriminant scores, the plot of D² - D²m against csᵀ(xm - x̄1), and the
gamma plot for the influence function of cTc all indicate that three
blue observations warrant further examination. The gamma plot for the
influence function for D2 indicates one of the three as being obviously
atypical. Examination of Figure 4(a) in Campbell and Mahon (1974)
shows three observations with carapace width larger than expected on
the basis of that for front lip when compared with the remaining
observations.
The detailed examination of the rock crab example using the
influence function and related summary statistics has identified one,
and possibly three, observations for the blue species which warrant
further consideration. Moreover, observations from the orange species
with similar discriminant scores to two of the three blue observations
are shown to have minimal influence on D2. The rock crab data had
previously been screened by visual scanning of observations ordered by
largest variable measurement, by comparison of correlations and
variances across groups, and by examination of canonical variate scores
after a preliminary species/sex analysis. It was unlikely, therefore,
that observations having a substantial effect on the analysis would
have remained undetected. And this is so; the deletion of the one
obvious, possibly atypical value increases D² by only 0.9 (to 28.7),
so that the absolute influence of the value is in this case minimal.
What this study has shown, somewhat surprisingly, is the asymmetric
way in which observations influence D2. Moreover, inclusion of an
observation lying furthest from the mean of the other group decreases
rather than increases D2. In the latter case the increase in variances
and/or change in correlation must offset the increase in the separation
of the means. The linear and quadratic components for θst in (2.7)
reflect the linear and the quadratic nature of (2.4) and (2.5)
respectively.
The analysis of the Leptograpsus data illustrates that inspection
of discriminant scores per se may sometimes be misleading. Obvious
atypical scores in discriminant and canonical variate analysis are
often taken to be indicative of incorrect measurement or incorrect
allocation of specimens (for example with respect to sex). However,
the assumption of an approximate Gaussian distribution of the
discriminant scores to guide such decisions may not always be justified.
CHAPTER THREE: ROBUST PROCEDURES TO EXAMINE VARIATION WITHIN A GROUP
In this Chapter, the performance of robust procedures when applied
to multivariate data is examined. Robust M-estimation of means and
covariances is reviewed in Section 3.2, and the use of the robust
estimates in conjunction with probability plots of associated
Mahalanobis squared distances is considered. It is shown that
detection of atypical observations is enhanced when the robust
estimates are used, rather than the usual estimates. The weights
associated with the robust estimation can also themselves be used to
indicate atypical observations. The robust estimates are found to be
similar to the usual estimates for uncontaminated data. A procedure
for robust principal components analysis is given in Section 3.3.
Typical data sets are examined in Section 3.4, while some general
recommendations are given in Section 3.5.
3.1 Introduction
The increasing interest in robust procedures over recent years
has been motivated in part by the observation that actual data sets
contain occasional gross errors. This becomes important for largish
data sets, since careful quantitative inspection is difficult. Since
the performance of classical procedures is seriously influenced by
atypical values, robust methods which are little influenced by such
values provide an attractive alternative. The survey papers by Huber
(1972) and Hampel (1973) summarize the important methods and results
from the earlier years of univariate robust studies, while Hampel
(1977) includes a review of more recent results. An introductory paper
by Hogg (1977) gives the basic ideas.
The emphasis throughout this Chapter is on the provision of
estimates of means and of covariances for a single group which are
little influenced by atypical observations, and on the detection of
observations having undue influence on the estimates. The procedures
to be described are based on the assumption of symmetry of the under-
lying distribution; in fact, a multivariate Gaussian form is examined
in the probability plotting. It seems essential to make some
distributional assumption, otherwise the concept of an atypical
observation has little meaning (see Barnett and Lewis, 1978). It is
assumed in what follows that the data are consistent with an approximately
symmetric distribution. If preliminary analyses indicate that such an
assumption is not warranted, it is assumed that a suitable transformation
is applied to achieve approximate symmetry. Gnanadesikan (1977,
Chapter 5) suggests procedures to achieve this.
3.2 Robust Estimation of Multivariate Location and Scatter
Healy (1968) and Cox (1968) have suggested an extension of
probability plots of univariate data to the multivariate situation, by
plotting the Mahalanobis squared distance of each observation against
the order statistic for a chi-squared distribution with v d.f., where
v is the number of variables.
If x̄ represents the v×1 vector of sample means, and V the sample
covariance matrix, then the Mahalanobis squared distance of the mth
observation from the mean of the observations is defined by
d_m^2 = (x_m - \bar{x})^T V^{-1} (x_m - \bar{x}) .
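A minimal sketch of this construction on synthetic data follows; the closing check uses the identity that the squared distances sum to (n - 1)v when V is the usual sample covariance matrix.

```python
import numpy as np

# Healy-Cox construction: the ordered squared distances d2 would be plotted
# against chi-squared (v d.f.) order statistics. Synthetic data for illustration.
rng = np.random.default_rng(1)
n, v = 100, 3
x = rng.standard_normal((n, v))

xbar = x.mean(axis=0)
V = np.cov(x, rowvar=False)              # sample covariance, n - 1 divisor
z = x - xbar
d2 = np.einsum('mi,ij,mj->m', z, np.linalg.inv(V), z)

d2_ordered = np.sort(d2)                 # for plotting against chi2_v quantiles
```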
Gnanadesikan (1977, p.172) discusses probability plots of dm in
detail; as with univariate Gaussian plotting, its particular appeal
is that it combines examination of the distributional assumption of
a multivariate Gaussian form with detection of atypical observations
and is especially suited to informal graphical description.
Observations which are grossly atypical in a single component
can often be detected using univariate techniques applied to each
variable. However for multivariate data, observations are often only
found to be atypical when the value for each variable is considered in
relation to the other variables. The extract of data listed in Table
3.1 in Section 3.4 shows that the atypical values are within the range
of the data when each variable is considered separately. However, the
underlined values fail to maintain the pattern of relationships between
the variables evident in the majority of the observations.
As noted in the previous Chapter, an atypical multivariate vector
of observations will influence both the means and covariances. The
tendency will usually be to deflate the correlations and possibly
inflate the variances, and so to inflate the size of the associated
concentration ellipsoid. This will in general have the effects of
decreasing the Mahalanobis distance for the atypical observation and
distorting the rest of the plot.
The Mahalanobis distances play a basic role in multivariate
M-estimation. From the applied viewpoint, M-estimators can be considered
as a simple modification of classical estimators; the contribution of an
observation to the statistic(s) of interest is given full unit weight if
it is a reasonable observation, otherwise its contribution is downweighted.
They can be considered as classical estimators after a weight function
has been applied to the data (Hampel, 1977). The weight function is
given by wm = w(tm)/tm. Here tm is a measure of deviation reflecting
the discrepancy of the mth observation from the robust average value,
relative to a robust measure of scatter, and w is a bounded influence
function (Hampel, 1974), linear over the range of values of tm
corresponding to reasonable data, but bounded outside this range.
For robust M-estimation of multivariate location and scatter, the
appropriate measure of deviation turns out to be the Mahalanobis
distance (see below). Hampel (1973) has suggested that the influence
and hence the weight of an extreme atypical observation should be zero,
so that w should redescend for values of tm sufficiently large.
M-estimators of multivariate location and scatter are discussed
by Maronna (1976), Hampel (1977) and Huber (1977a,b). The equations
used here to define robust M-estimators of mean and covariance are
given in (3.4) and (3.5) below.
An outline of their derivation is as follows. Consider an
elliptically symmetric density of the form |Σ|^{-1/2} h(δm), where
δm = {(xm - μ)ᵀΣ⁻¹(xm - μ)}^{1/2}. Then the relevant part of the log
likelihood is

-\tfrac{n}{2}\log|\Sigma| + \sum_{m=1}^{n} \log h(\delta_m) .

Write

u(\delta_m) = -h'(\delta_m)\{h(\delta_m)\,\delta_m\}^{-1} ,

and let dm be defined analogously to δm with μ and Σ replaced by their
maximum likelihood estimators.
Then differentiation w.r.t. μ gives

\hat{\mu} = \sum_{m=1}^{n} u(d_m)\,x_m \Big/ \sum_{m=1}^{n} u(d_m) .   (3.1)

Differentiation w.r.t. Σ gives

\hat{\Sigma} = n^{-1} \sum_{m=1}^{n} u(d_m)(x_m - \hat{\mu})(x_m - \hat{\mu})^T .

Huber (1977a, p.168; 1977b, p.42) has suggested the modified form

\hat{\Sigma} = \sum_{m=1}^{n} u(d_m)(x_m - \hat{\mu})(x_m - \hat{\mu})^T \Big/ \sum_{m=1}^{n} v(d_m)   (3.2)
with arbitrary v(dm) as the most general form for an affinely invariant
M-estimator of covariance. This form is adopted for the definition of
Vc in (3.5) below, with the v(dm) related to the weights to give an
unbiased estimator when all observations have full weight.
The above derivation is for an elliptically symmetric density;
for the multivariate Gaussian density, u(dm) = 1 and so the usual
estimators result. The appropriate form for robust M-estimators results
by associating the elliptical density with a contaminated multivariate
Gaussian density. The robust estimators effectively give full weight to
observations assumed to come from the main body of the data, from the
uncontaminated distribution, but reduced weight or influence to
observations from the tails of the contaminating distribution. In
practice, this means downweighting the influence of observations with
unduly large Mahalanobis distances.
To define robust M-estimators of location and scatter, rewrite
(3.1) and (3.2) as
\hat{\mu} = \sum_{m=1}^{n} d_m^{-1} w(d_m)\,x_m \Big/ \sum_{m=1}^{n} d_m^{-1} w(d_m) ,

where

w(d_m) = d_m u(d_m) ,

and

\hat{\Sigma} = \sum_{m=1}^{n} d_m^{-2}\,\phi(d_m)(x_m - \hat{\mu})(x_m - \hat{\mu})^T \Big/ \sum_{m=1}^{n} v(d_m) ,

where

\phi(d_m) = d_m^2 u(d_m) .   (3.3)
For robust M-estimation, the influence of an observation must be
bounded. As defined, w reflects the linear influence of an observation
on the sample mean, and φ reflects the quadratic influence of an
observation on the variances and covariances. The bounded forms
chosen for w are the influence function proposed by Huber (1964) and a
re-descending form which gives the qualitative behaviour proposed by
Hampel (1973), as given in (3.6) and (3.7) below. If w(dm) is bounded,
then (3.3) suggests taking φ(dm) = dm w(dm), which is not bounded. A
more suitable approach is to bound φ(dm) directly; a simple choice is
to set φ(dm) = w²(dm).
The equations used here to define robust estimators of means and
covariances are as follows:
\bar{x}_c = \sum_{m=1}^{n} w_m x_m \Big/ \sum_{m=1}^{n} w_m   (3.4)

and

V_c = \sum_{m=1}^{n} w_m^2 (x_m - \bar{x}_c)(x_m - \bar{x}_c)^T \Big/ \Big(\sum_{m=1}^{n} w_m^2 - 1\Big) ,   (3.5)

where

w_m = w(d_m)/d_m

and

d_m = \{(x_m - \bar{x}_c)^T V_c^{-1} (x_m - \bar{x}_c)\}^{1/2} .

The solution for x̄c and Vc is iterative.
The two forms of w used here are:
(i) the non-descending form proposed by Huber (1964):

w(d_m) = d_m   if d_m \le b_1
       = b_1   if d_m > b_1 ;   (3.6)

(ii) a re-descending form suggested by Hampel (pers. comm.):

w(d_m) = d_m   if d_m \le b_1
       = b_1 \exp\{-\tfrac{1}{2}(d_m - b_1)^2/b_2^2\}   if d_m > b_1 .   (3.7)
I have found that b2 in the range 2.5 to 1.0 gives good performance;
the smaller the value of b2, the faster is the rate at which w descends.
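The two weight functions can be sketched as follows; reading the exponent in (3.7) as -½(dm - b1)²/b2², so that b2 acts as a decay scale, is an assumption consistent with the remark that smaller b2 gives faster descent.

```python
import numpy as np

def w_huber(d, b1):
    """(3.6): linear up to b1, then held constant at b1."""
    d = np.asarray(d, dtype=float)
    return np.where(d <= b1, d, b1)

def w_redescend(d, b1, b2):
    """(3.7): linear up to b1, then decaying towards zero; smaller b2, faster descent."""
    d = np.asarray(d, dtype=float)
    return np.where(d <= b1, d, b1 * np.exp(-0.5 * (d - b1) ** 2 / b2 ** 2))

v, b0 = 7, 2.0
b1 = np.sqrt(v) + b0 / np.sqrt(2.0)      # the cut-off of Section 3.2
d = np.linspace(0.0, 12.0, 241)
wh = w_huber(d, b1)
wr = w_redescend(d, b1, b2=1.25)
```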
The constant b1 is taken here as

b_1 = \sqrt{v} + b_0/\sqrt{2} ,

where b0 is in the range 1.64-3.09; I have usually taken b0 = 2.0.
This form of b1 is derived by assuming that d²m is distributed as χ² on
v d.f. Fisher's square root approximation then gives dm approximately
N(√v, 1/√2) (strictly, E(dm) ≈ (v - ½)^{1/2}).
Then, on the assumption that most of the data points are reasonable
observations, b0 is here equated with a percentage point of the
standard Gaussian distribution.
The approach adopted here is to determine x̄c and Vc and the
associated weights based on the non-descending w, possibly for a range
of values of b0. The initial estimates are the usual means and
covariance matrix. Then the redescending w is introduced, and up to
10 iterations are performed for each value of b2. Examination of
estimates from successive iterations with v = 7 shows that around six
iterations are needed to effect the qualitative changes reported in
Section 3.4; little change in distances, and hence weights, for the
redescending w seems to result after ten iterations.
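The iterative scheme just described can be sketched as below. This is a simplified reading (fixed iteration counts, no convergence test, b0 = 2.0 and b2 = 1.25 as illustrative constants), not the thesis implementation.

```python
import numpy as np

def m_estimate(x, b0=2.0, b2=1.25, n_huber=10, n_redescend=10):
    """Iterate (3.4)-(3.5): Huber weights first, then the re-descending form."""
    n, v = x.shape
    b1 = np.sqrt(v) + b0 / np.sqrt(2.0)
    mean, cov = x.mean(axis=0), np.cov(x, rowvar=False)   # usual estimates to start
    for it in range(n_huber + n_redescend):
        z = x - mean
        d = np.sqrt(np.einsum('mi,ij,mj->m', z, np.linalg.inv(cov), z))
        if it < n_huber:
            w = np.where(d <= b1, d, b1)                  # non-descending stage
        else:
            w = np.where(d <= b1, d,
                         b1 * np.exp(-0.5 * (d - b1) ** 2 / b2 ** 2))
        wm = w / np.maximum(d, 1e-12)                     # w_m = w(d_m)/d_m; unity for typical points
        mean = (wm[:, None] * x).sum(axis=0) / wm.sum()   # (3.4)
        z = x - mean
        cov = (wm ** 2 * z.T) @ z / (np.sum(wm ** 2) - 1.0)   # (3.5)
    return mean, cov, wm

rng = np.random.default_rng(1)
x = rng.standard_normal((100, 4))
x[0] = [8.0, -8.0, 8.0, -8.0]        # plant one grossly atypical observation
mean_c, V_c, weights = m_estimate(x)
```

Under the re-descending form the planted observation receives essentially zero weight, while the remaining observations keep full unit weight.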
3.3 Robust Principal Components Analysis
A principal components analysis of the covariance matrix V (or
associated correlation matrix R) seeks a linear combination ym = uTxm
of the original variables xm such that the usual sample variance of the
ym is a maximum. The solution is given by an eigenanalysis of V,
V = UEUᵀ; viz.
the eigenvectors ui of U define the linear combinations while the
corresponding diagonal elements ei of the diagonal matrix of eigenvalues
E are the sample variances of the derived variables.
It is arguable that the direction ul should not be determined by
one or two atypical values. Consider an example in which the data
form a tight ellipse except for one observation (see, e.g., the figure
in Hinkley, 1978); this observation may well determine the direction
of the first eigenvector.
An obvious procedure for robustifying the analysis is to replace V
by the robust estimator Vc; this is the M-estimator solution to robust
principal components. This procedure weights an observation according
to its total distance dm from the robust estimate of location. But
this distance can be decomposed into components along each eigenvector;
and an observation may have a large component along one direction and
small components along the remaining directions and hence not be
adequately downweighted. It is therefore appealing to apply robust
M-estimation of mean and variance to each principal component. The
direction cosines will then be chosen to maximize the robust variance
of the resulting linear combination. The aim is to determine those
observations having undue influence on the resulting directions, and
to determine directions which are little influenced by atypical
observations.
The proposed procedure is as follows.
1. Take as an initial estimate of u1 the first eigenvector from an
eigenanalysis of V or Vc.
2. Form the principal component scores ym = u1ᵀxm.
3. Determine the M-estimators of mean and variance of ym, and the
associated weights wm. The median and {0.74 (interquartile range)}²
of the ym can be used to provide initial robust estimates.
Here 0.74 = (2 x 0.675)-1 and 0.675 is the 75% quantile for
the N(0,1) distribution. This choice of initial estimate of
variance ensures that the proportion of observations downweighted
is kept reasonably small.
3(a) After the first iteration, take the weights wm as the minimum of
the weights for the current and previous iterations; this is
necessary to prevent oscillation of the solution.

4. Calculate

\bar{x}_w = \sum_{m=1}^{n} w_m x_m \Big/ \sum_{m=1}^{n} w_m

and

V_w = \sum_{m=1}^{n} w_m^2 (x_m - \bar{x}_w)(x_m - \bar{x}_w)^T \Big/ \Big(\sum_{m=1}^{n} w_m^2 - 1\Big) .
5. Determine the first eigenvalue and eigenvector ui of Vw.
6. Repeat steps (2) to (5) until successive estimates of the eigenvalue
are sufficiently close.
To determine successive directions u_i, 2 ≤ i ≤ v, project the data
onto the space orthogonal to that spanned by the previous eigenvectors
u_1, ..., u_{i-1}, and repeat steps (2) to (5); take as the initial estimate
the second eigenvector from the last iteration for the previous eigen-
vector. The proposed procedure for successive directions can be set
out as follows.
7. Form x_m^i = (I - U_{i-1} U_{i-1}^T) x_m, where U_{i-1} = (u_1, ..., u_{i-1}).
8. Repeat steps (2) to (5) with x_m^i replacing x_m, and determine u_i.
The covariance matrix based on the x_m^i will be singular, with
rank v - i + 1. However, only the first eigenvalue and eigenvector
are required.
9. The principal component scores are given by u_i^T x_m^i =
u_i^T (I - U_{i-1} U_{i-1}^T) x_m, and hence u_i* = (I - U_{i-1} U_{i-1}^T)^T u_i.
Steps (7), (8) and (9) are repeated until all v eigenvalues e_i
and eigenvectors u_i, together with the associated weights, are determined.
Alternatively the procedure may be terminated after some specified
proportion of variation is explained.
When the principal components analysis is based on the correlation
matrix, the data must be standardized to unit variance for each variable
to determine successive eigenvalues and eigenvectors. Two possibilities
exist: the data can be standardized by the robust estimates of standard
deviation determined from V_c before determining u_i, or the robust
estimates of standard deviation from V_w can be used to standardize
the data following the calculation of u_i.
Finally, an alternative robust estimate of the covariance or correlation
matrix can be found from U* E* U*^T.
Both this approach and that described in the previous Section give a
positive definite correlation/covariance matrix. Robust estimation of
each entry separately does not always achieve this.
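Steps (7) to (9), together with the reconstruction U* E* U*^T, can be sketched as follows. Here `first_direction` is a placeholder for any routine returning the leading eigenvalue and unit eigenvector of a data matrix; in the text this role is played by the robust iteration of steps (2) to (5).

```python
import numpy as np

def robust_pca_deflation(X, first_direction):
    """Steps (7)-(9): obtain all v directions by repeated deflation."""
    n, v = X.shape
    Xi = X.copy()
    directions, eigenvalues = [], []
    for i in range(v):
        lam, u = first_direction(Xi)    # leading direction of deflated data
        if directions:
            # Step 9: map back to the original coordinates,
            # u_i* = (I - U_{i-1} U_{i-1}^T)^T u_i.
            Uprev = np.column_stack(directions)
            u = (np.eye(v) - Uprev @ Uprev.T).T @ u
            u = u / np.linalg.norm(u)
        directions.append(u)
        eigenvalues.append(lam)
        # Step 7: project the data onto the orthogonal complement of the
        # directions found so far.
        U = np.column_stack(directions)
        Xi = X @ (np.eye(v) - U @ U.T).T
    # Covariance reconstructed as U* E* U*^T: positive semi-definite by
    # construction, unlike entry-by-entry robust estimation.
    U = np.column_stack(directions)
    return U @ np.diag(eigenvalues) @ U.T
```

With the usual (non-robust) first eigenvector supplied as `first_direction`, the reconstruction reproduces the usual covariance matrix, which is a convenient check of the deflation.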
3.4 Some Practical Examples
The examples discussed here are used to illustrate results which
are typical of those obtained when the robust procedures discussed in
the previous Sections are applied to data arising from practical problems.
With experience, a recommended procedure has now evolved. Initially,
however, a fair amount of exploratory work, using five data sets typical
of discrimination studies in morphometric and medical problems, was
carried out. First, the usual estimates of means and covariance matrix
were calculated. Gamma probability plots of the associated d_m² and
Gaussian probability plots of the d_m^{2/3} were made. The latter
uses the well-known Wilson-Hilferty (1931) transformation of a gamma
variate to Gaussian form (see also Healy, 1968, p.159). Then robust
M-estimates were determined, with non-descending w and with b0 in
the range 1.64-3.09 (the 0.95-0.999 percentage points of
the N(0,1) distribution). The weights were noted and probability plots
of associated distances were made. Finally the redescending w was
introduced, with b2 = 2.5(0.25)0.75, that is, from 2.5 down to 0.75
in steps of 0.25. The values b2 = 2.25, 1.75 and
1.25 were found to be sufficiently representative, while a value of
b0 = 2.0 seems to indicate atypical observations and yet not be too
sensitive to random variation (see Section 3.5).
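The two kinds of plotting positions can be sketched as follows; as an assumption of the sketch, the chi-square quantiles for the gamma plot are themselves obtained by inverting the Wilson-Hilferty transformation, so that only numpy and the standard library are needed.

```python
import numpy as np
from statistics import NormalDist

def qq_positions(d2, v):
    """Plotting positions for Q-Q plots of squared Mahalanobis distances.

    Returns sorted d^2 with approximate chi-square_v quantiles (gamma
    plot) and sorted cube roots with N(0,1) quantiles (Wilson-Hilferty
    Gaussian plot).
    """
    d2 = np.sort(np.asarray(d2, dtype=float))
    n = len(d2)
    p = (np.arange(1, n + 1) - 0.5) / n               # probability points
    z = np.array([NormalDist().inv_cdf(pi) for pi in p])
    # Inverting the Wilson-Hilferty approximation gives approximate
    # chi-square_v quantiles from Gaussian ones.
    c = 2.0 / (9.0 * v)
    chi2_q = v * (1.0 - c + z * np.sqrt(c)) ** 3
    return d2, chi2_q, d2 ** (1.0 / 3.0), z
```

A near-linear plot of the cube roots against the Gaussian quantiles is the behaviour expected for well-behaved multivariate Gaussian data.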
The first data set to be discussed is taken from a study of
geographic variation in the whelk Thais lamellosa (C. Campbell, 1978).
Data are available for twelve groups on the west coast of North
America. The group sizes are: 50, 72, 99, 76, 37, 36, 46, 46, 51, 34,
28 and 43. Measurements were made on twenty variables; canonical
analyses show that seven of these provide much of the between-groups
discrimination. Robust covariance estimation was applied to the twelve
groups based on the seven variables.
The probability plots with b0 = 2, b2 = 1.25 show that nine of the
groups have one or two atypical observations, while one group has seven
atypical observations. The associated weights are all less than 0.35;
18 of the 21 atypical observations have zero weight. Probability plots
of the usual distances indicate only ten of the atypical observations,
with a further four or five doubtful. The analyses are carried out
group by group; a between-groups analysis is discussed in Chapter Four.
Four of the 21 atypical observations have two atypical values in
the vector, giving 25 out of a total of 4326 (= 618 x 7 variables)
variable values which are atypical. The variables for each group show
high correlations, so that with the elementary precaution of listing
the observations in increasing order of the largest variable (here
overall length), atypical values are readily apparent once the
observation is indicated. When the atypical observations are compared
with those above and below them on the listing (see Table 3.1),
corrections of either 100 or 50 units would provide good agreement for
all but four values, while a further two are obviously the result of
interchanging the order of two numbers (e.g. 015 should almost certainly
read 105). The correction of 100 or 50 units is an obvious and acceptable
one since the dial on the calipers used to measure the whelks records to
50 units, and it is to be expected in more than 4000 measurements that
occasionally the linear scale will be misread.
Figure 3.1(a) shows a Gaussian probability plot of usual distances
for group 3 (n = 99) (i.e. a Q-Q probability plot of cube root of
squared Mahalanobis distances against Gaussian order statistics with
the distances calculated using the usual means and covariances); there
is some indication of one atypical observation (#79). Robust
M-estimation with b0 = 2.0, b2 = ∞ (non-descending w) gives a weight of
0.03 for this observation; a second observation (#97) has a weight of
0.11. With b2 = 1.25 (redescending w), both observations have a zero
weight. Figure 3.1(b) shows a Gaussian probability plot of robust
distances when b0 = 2.0, b2 = 1.25. Two observations, 79 and 97, are
clearly atypical. Both are atypical in the second variable, the first
being out by 100 units and the second by 50 units. Figure 3.1(c) shows
a Gaussian probability plot of robust distances with b0 = 2.0, b2 = 1.25
after the two observations have been corrected. The linearity of the
plot is obvious. None of the weights is now less than 0.35. The usual
estimate of standard deviation is 48.9; the robust estimate is 44.9
Table 3.1
Extract of listing of data for Group 2 (n = 72) for
Thais data, and overall summary.

observation             v1      v2      v4      v6
    27                  313     258     208     45
    28                  313     271     212     44
    29                  314     264     197     43
    30(a)               315     265     265     44
    31                  316     279     224     40
    32                  317     265     200     41
    33(a)               318     200     255     42
    34                  321     271     213     45

minimum value           208     173     150     34
maximum value           416     355     269     61
mean, original data     320.5   268.3   212.1   44.1
std devn                42.44   36.77   26.67   5.43
mean, corrected data(b) 320.5   269.0   210.7   44.1
std devn                42.44   35.92   25.39   5.43

(a) observations indicated as possibly atypical by robust M-estimation;
underlined values are probably out by 50 units.
(b) observations indicated as possibly atypical are adjusted by 50 units.
Figure 3.1 - Q-Q plots of Mahalanobis squared distances for Thais data from group 3 [plots not reproduced; horizontal axes show Gaussian quantiles]
(a) Gaussian plot - cube root of usual squared distances
(b) Gaussian plot - cube root distances - redescending estimates (b0 = 2.0, b2 = 1.25); observations 79 and 97 stand apart
Figure 3.1 (cont.) - Q-Q plots - Thais data - group 3 [plot not reproduced; horizontal axis shows Gaussian quantiles]
(c) Gaussian plot - cube root of squared distances - redescending function - corrected data (b0 = 2.0, b2 = 1.25)
while the robust estimate after correcting the data is 45.8. The
correlations of v2 with the remaining variables are increased by 0.02
to 0.06.
Figure 3.2(a) shows a Gaussian probability plot of usual distances
for group 8 (n = 46). The curvilinearity suggests a departure from
multivariate Gaussian form. Robust M-estimation with b0 = 2.0,
b2 = 1.25 gives two zero weights (the corresponding weights with
b2 = ∞ are 0.17 and 0.15 respectively). The Gaussian probability plot
of robust distances (b2 = 1.25) in Figure 3.2(b) indicates two atypical
observations (#24 and #28). The atypical values are out by 100
units for v2 for #24 and by 50 units for v3 for #28; the Q-Q plot of
corrected distances, in Figure 3.2(c), is now linear. Figure 3.2(d)
shows a gamma probability plot of usual distances for group 8 (i.e.
a Q-Q probability plot of squared Mahalanobis distances against gamma
quantiles with parameters (0.5, 3.5)). While the indication that there
are two atypical values is clearer than in Figure 3.2(a), there is some
curvilinearity in the plot. A gamma probability plot of robust distances,
corresponding to Figure 3.2(b), gives the same striking indication of
two atypical observations. For this group, the standard deviations are
little affected by the robust estimation, with that for v2 reduced from
36.1 to 34.7. However the correlations are increased: for example,
r(1,2) increases from 0.863 to 0.985 and r(2,3) from 0.855 to 0.965.
Observation 24 also appears to be atypical for v4, by about 30 units;
r(2,4) and r(3,4) are increased from 0.476 and 0.455 to 0.773 and 0.816
respectively.
The second data set is taken from an unpublished study of morpho-
metric divergence in male and female scorpions occurring in Australia.
Nine variables were measured on each specimen. The male data for one
species (n = 181) are discussed here.
Figure 3.2 - Q-Q plots of Mahalanobis squared distances for Thais data from group 8 [plots not reproduced; horizontal axes show Gaussian quantiles]
(a) Gaussian plot - cube root of usual squared distances
(b) Gaussian plot - cube root distances - redescending function (b0 = 2.0, b2 = 1.25)
Figure 3.2 (cont.) - Q-Q plots - Thais data - group 8 [plots not reproduced]
(c) Gaussian plot - cube root of squared distances - redescending function - corrected data (b0 = 2.0, b2 = 1.25)
(d) Gamma plot - usual squared distances against Gamma (0.5, 3.5) quantiles
Robust covariance estimation (b0 = 2.0, b2 = 1.25) indicates five
observations with weights less than 0.35; one of these (#139) has a
weight of 0.34. The associated Gaussian probability plot shows only
four atypical observations. A robust principal components analysis
indicates a further three observations with weight less than 0.10 for
at least one component and a further sixteen with weight less than 0.35.
The advantage of the combined approach is that different combinations
of variables are examined. For example, one of the observations (#162)
with a zero weight for robust covariance estimation (b0 = 2.0, b2 = 1.25)
has zero weight for principal components 5 and 6. Examination of the
eigenvectors shows variable 8 to have a high loading for these components.
The measurement of 3.5 was checked by my co-worker and found to be 6.0
(the robust estimate of standard deviation is 0.77). Another observation
(not yet rechecked) has a weight of 0.60 for robust covariance estimation
(b0 = 2.0, b2 = 1.25) but weights of 0.06, 0.21, 0.27 and 0.23 for
principal components 3-6. Of those observations checked and corrected,
the common error was a measurement out by 1 or 1.5 mm; the metal dial
calipers were graduated to 0.05 mm.
3.5 Discussion
The value of Q-Q plots of Mahalanobis distances to assess the
distributional properties of the data and to indicate possibly atypical
values is well-recognized in multivariate studies. Because of the
fundamental role that the distances play in robust M-estimation of
location and scatter, the combination of robust Mahalanobis distances
and Q-Q plots seems an obvious one. As the examples reported here show,
the combined approach enhances the detection of atypical values; in the
whelk example, a probability plot of the usual distances fails to
indicate an atypical observation, due largely to inflation of the
variances for some variables.
The weights w_m associated with the d_m also indicate atypical
observations. Extensive examination of a number of data sets
(v = 3, 4, 5, 7, 9) shows that a weight of less than 0.30 with
b0 = 2.0, b2 = 1.25 (corresponding approximately to a weight of less
than 0.60 with b2 = ∞) has always indicated an atypical observation;
this judgment is based on an examination of the corresponding Q-Q
plots and of the variable values for the observations. A weight of
more than 0.70 with b2 = ∞ is associated with a typical observation.
For v between 4 and 10, a weight of approximately 0.42 with b2 = 1.25
(or 0.66 with b2 = ∞) corresponds to a squared distance whose value
coincides with the 0.1% point of the χ²_v distribution. Some flexibility
exists with the choice of b0; values in the range 1.65 to 2.35 give
similar qualitative results.
If the robust estimates are to be used in subsequent statistical
analyses, such as principal components and canonical variate analysis,
it is important that they differ little from the usual estimates when
applied to uncontaminated data. Now for multivariate Gaussian data,
φ(d_m) = d_m², which is distributed as χ²_v, and hence E(φ) = v. For the
non-descending w function defined in (3.6) (recall that φ = w²), Huber
(1977a, p.183) gives the equation for the correction factor to standardize
the estimates so that they have the correct asymptotic values for the
Gaussian distribution. As his Table 1 shows, the robust estimate of
covariance is within 2% of the usual estimate for v between 4 and 10.
Table 3.2(a) gives a stem-and-leaf plot of the ratio of robust to usual
estimates of variance for the Thais data (v = 7, g = 10), while Table 3.3
gives results for the scorpion data (v = 9, g = 19). Table 3.2 also
gives the ratios for ten sets of generated multivariate Gaussian data;
each set consists of six groups with sample size and underlying means
and covariances corresponding to the first six Thais groups (hence
v = 7 and g = 60). The actual Thais data show good agreement with a
multivariate Gaussian form. It seems reasonable to conclude from
Tables 3.2 and 3.3 that the robust estimates of the variances are
generally within 2% of the usual estimates for well-behaved data.
For the generated data, three observations have low weight; and this is
reflected in the corresponding ratios of variances, all of which are
less than 0.97. For the actual data sets, a low ratio for a variable
is always associated with an atypical value for one (or more) of the
observations having low weight w_m.

A recommended approach is to determine the means, covariances,
distances and associated weights for b0 = ∞; for b0 = 2.0, b2 = ∞;
and for b0 = 2.0, b2 = 1.25. Gaussian probability plots of cube root
of squared distances (with b0 = 2.0, b2 = 1.25) together with the
magnitude of non-unit weights will indicate atypical observations.
For more than six or seven variables, a robust principal components
analysis is also useful for identifying atypical observations.
The Mahalanobis squared distances are usually plotted against the
quantiles of a gamma distribution with shape parameter v/2. The results
of this study show that the Wilson-Hilferty cube-root transformation
behaves well on the d_m². There is good agreement between Q-Q plots of
d_m² versus gamma quantiles and of d_m^{2/3} versus N(0,1), though the
cube root transformation tends to lessen the visual impact of the large
distances on the gamma plot. The general conclusion here reinforces the remarks
on the gamma plot. The general conclusion here reinforces the remarks
of Healy (1968, p.159) that the normal or Gaussian plot seems more
than adequate for detecting atypical observations and examining the
multivariate Gaussian assumption.
Table 3.2
Stem-and-leaf plot for ratios of robust to usual variances for
(a) Thais data (v = 7, g = 10) and (b) generated multivariate
Gaussian data with the same underlying structure as the Thais
data (v = 7, g = 60).

ratio   (a) actual                          (b) generated
1.05    1
1.04                                        4
1.03                                        1, 1, 3, 4, 5, 5
1.02                                        0, 0, 1, 2, 2, 3, 5, 5, 5, 6, 8, 9, 9
1.01    0, 3, 4, 7, 8, 9                    1, 2, 2, 3, 3
1.00    3, 5, 8                             0(243), 1(19), 2(9), 3, 3, 4, 4, 5, 6, 6, 6, 7
0.99    3, 4, 5, 6, 6, 6, 8                 0(9), 1(6), 2, 3(10), 4(7), 5(5), 6(7), 7(7), 8(13), 9(19)
0.98    0, 0, 0, 2, 3, 3, 5, 6, 8, 9, 9, 9  1, 1, 2, 2, 4, 5, 5, 6, 6, 6, 8, 8, 8, 9, 9, 9
0.97    3                                   3, 4, 4, 7, 8, 8
0.96    3, 4, 4, 5, 7, 9                    1, 2, 2, 2, 7, 9

below 0.96 (a): 0.957, 0.957, 0.954, 0.952, 0.947, 0.938, 0.937, 0.936,
0.936, 0.934, 0.934, 0.933, 0.929, 0.928, 0.927, 0.923, 0.922, 0.912,
0.912, 0.911, 0.910, 0.89, 0.89, 0.88, 0.88, 0.87, 0.87, 0.86, 0.85,
0.84, 0.82, 0.81, 0.78, 0.59
Table 3.3
Stem-and-leaf plot for ratios of robust to usual
variances for scorpion data (v = 9, g = 19)
1.02 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 4
1.01 3, 4, 6, 8, 9, 9, 9
1.00 0(83), 3(10), 7, 8
0.99 1, 1, 4, 6, 7
0.98 0, 0, 1, 2, 3, 3, 6
0.97 1, 1, 4, 7, 7
0.96 0, 1, 4, 5
0.95 5
0.94 1, 2, 3, 4, 9
0.93 0, 9
0.92 2, 4, 8, 9
0.91 4, 8
0.90 0, 2
0.89, 0.89, 0.89, 0.88, 0.88, 0.87, 0.86
0.83, 0.82, 0.81, 0.80, 0.80, 0.80,
0.77, 0.73, 0.71
0.68, 0.60, 0.58, 0.56.
A robust principal components analysis provides a useful complement
to robust covariance estimation when there are more than a few variables.
The eigenvectors usually provide contrasts between subsets of variables
(after the first), and hence observations will have low weight only for
those principal components with large loadings for the variable(s) with
the atypical values. This is well-illustrated by the scorpion data:
an observation was not indicated as atypical by the Q-Q plots and non-unit
weights, whereas the robust principal components analysis indicated the
observation and gave some guide as to the variable involved.
The procedures discussed in this Chapter and in Chapter Four on
robust between-groups procedures can be readily used for routine
screening of data. With the increasing trend to direct entry of data
at or near collection time, automatic application of robust estimation
procedures will indicate possibly atypical observations; the measure-
ments can then be checked while the individuals or specimens are still
available. The computer costs involved are minimal when compared with
the costs of the experimental or survey work involved in collecting
the data and the time spent reanalyzing data following detection of
errors at a later stage of analysis.
CHAPTER FOUR: ROBUST CANONICAL VARIATE ANALYSIS
In this Chapter, robust procedures for canonical variate analysis
are developed. Robust M-estimation is applied to the scores for each
canonical variate in Section 4.1 to determine appropriate weights to
define robust estimates of the between- and within-groups SSQPR
matrices. Robust M-estimation for canonical variate analysis, based
on the functional relationship formulation outlined in Section 1.3,
is developed in Section 4.2. The weights are shown to depend on the
distance of an observation from the canonical variate mean for the
group. For uncontaminated data, the robust M-estimation procedure
performs similarly to the usual canonical variate analysis. Two
typical data sets are examined in Section 4.3. The ordinary canonical
vectors are little affected by the presence of atypical observations,
though the canonical roots are considerably influenced. Some general
conclusions are given in Section 4.4.
4.1 M-estimation of the Canonical Variate Scores
When different groups of data are available, as in multivariate
discrimination studies, the robust procedures described in Chapter
Three can be applied to each group separately. This will provide
robust estimates of means and of covariance matrices for canonical
variate analysis; possibly atypical observations are also identified.
To be more specific, let x̄_k and V_k denote robust M-estimates of the
vector of means and of the covariance matrix for the kth group,
determined as in (3.4) and (3.5) with the group subscript added, and
let w_km denote the associated weights. The pooled within-groups
SSQPR matrix W_c and the between-groups SSQPR matrix B_c are formed in
an obvious way, analogous to (4.2) and (4.1) below, using the weights
w_km. The canonical roots and canonical vectors of W_c^{-1} B_c are then
found, with the vectors scaled as in (iv) in Section 4.2 below.
With this approach, an observation is weighted according to its
total distance d_km = {(x_km - x̄_k)^T V_k^{-1} (x_km - x̄_k)}^{1/2}. For similar
V_k, this is essentially similar to the overall Mahalanobis distance
based on W_c. But this latter distance can be partitioned into a
component along the canonical variate plane, and a component orthogonal
to it. As such, the approach may be relatively insensitive to
observations atypical for one linear combination, but typical for the
rest.
Now canonical variate analysis can be considered as a one way
analysis of variance for a linear combination y_km = c^T x_km of the
original variables. The procedure for robust canonical variate analysis
proposed in this Section is to apply robust M-estimation to the
canonical variate scores ykm for each group. For robust regression,
and hence one-way analysis of variance, the appropriate measure of
deviation tkm on which to base the weight function is the residual
(Huber, 1977b). In the context of one-way analysis of variance for
canonical variate analysis, the residuals for a given c are
c^T x_km - c^T x̄_k. The influence function for Mahalanobis D² in Chapter Two
is a quadratic function of this deviation score.
The procedure is as follows:
(i) Take as an initial estimate of c either the usual canonical
vector or that resulting from the usual analysis based on M-estimates
k and Vk for each group.
(ii) Form the canonical variate scores y_km for each group, and
determine the weights w_km associated with M-estimation of the mean.
The variance is either set equal to 1 or is estimated simultaneously
(see below).
(iia) After the first iteration, take the weights wkm as the
minimum of the weights for the current and previous iterations.
(iii) Calculate

n_k* = Σ_{m=1}^{n_k} w_km,

x̄_k = Σ_{m=1}^{n_k} w_km x_km / n_k*

and

V_k = Σ_{m=1}^{n_k} w_km² (x_km - x̄_k)(x_km - x̄_k)^T / (Σ_{m=1}^{n_k} w_km² - 1).

Calculate

x̄_T = Σ_{k=1}^{g} n_k* x̄_k / Σ_{k=1}^{g} n_k*,

and form

B_w = Σ_{k=1}^{g} n_k* (x̄_k - x̄_T)(x̄_k - x̄_T)^T                    (4.1)

and

W_w = Σ_{k=1}^{g} (Σ_{m=1}^{n_k} w_km² - 1) V_k.                     (4.2)

(iv) Determine the first canonical root f and canonical vector c
from (B_w - f W_w)c = 0. The vector c is scaled such that

c^T W_w c = Σ_{k=1}^{g} (Σ_{m=1}^{n_k} w_km² - 1).
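Step (iv) is a symmetric-definite generalized eigenproblem. A minimal numpy sketch follows; the Cholesky reduction to a standard symmetric eigenproblem is one common route and is chosen here for illustration, not necessarily the computation used in the thesis.

```python
import numpy as np

def first_canonical_vector(Bw, Ww, s):
    """Step (iv): largest root f and vector c of (Bw - f*Ww)c = 0,
    with c scaled so that c^T Ww c = s, where s stands for the sum
    over groups of (sum of squared weights - 1).
    """
    # Reduce the generalized problem to a standard one via the
    # Cholesky factor of the (positive definite) Ww.
    L = np.linalg.cholesky(Ww)
    Linv = np.linalg.inv(L)
    M = Linv @ Bw @ Linv.T
    roots, Q = np.linalg.eigh(M)          # eigenvalues in ascending order
    c = Linv.T @ Q[:, -1]                 # back-transform the leading vector
    c = c * np.sqrt(s / (c @ Ww @ c))     # impose the scaling c^T Ww c = s
    return roots[-1], c
```

The returned pair satisfies B_w c = f W_w c, so it can be checked directly against the defining equation.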
(v) Repeat steps (ii) to (iv) until successive estimates of the
canonical root are sufficiently close.

The procedure for robust M-estimation in (ii) follows that in
Section 3.2. The non-descending form of w in (3.6) is chosen initially:

w(t_km) = t_km    if t_km ≤ b1
        = b1      if t_km > b1,

where

t_km² = (y_km - y_k*)² / var*(y_km),

with

y_k* = Σ_{ℓ=1}^{n_k} w_kℓ y_kℓ / Σ_{ℓ=1}^{n_k} w_kℓ

and var*(y_km) = 1 or

var*(y_km) = Σ_{ℓ=1}^{n_k} w_kℓ² (y_kℓ - y_k*)² / (Σ_{ℓ=1}^{n_k} w_kℓ² - 1).

Here the weights are given by

w_km = w(t_km)/t_km.

The constant b1 is taken here as b1 = 1 + b0/2^{1/2}; I have usually taken
b0 = 2.25.
Once convergence of the y_k* is achieved, the redescending form of w
in (3.7) is introduced for up to ten more iterations. This form is

w(t_km) = t_km                                   if t_km ≤ b1
        = b1 exp{-0.5(t_km - b1)²/b2²}           if t_km > b1;

from the results in Chapter Three and preliminary work here, I have
taken b2 to be 1.25.
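The two weight forms can be sketched as follows, with the cut-off b1 = 1 + b0/2^{1/2} as read here; the function name and defaults are illustrative only.

```python
import numpy as np

def cv_weight(t, b0=2.25, b2=1.25, redescend=True):
    """Weight for a canonical variate score with standardized deviation t.

    psi(t) = t up to b1 = 1 + b0/sqrt(2), then either constant (the
    non-descending form (3.6)) or decaying (the redescending form (3.7));
    the weight applied to the observation is psi(t)/t.
    """
    t = np.abs(np.asarray(t, dtype=float))
    b1 = 1.0 + b0 / np.sqrt(2.0)
    if redescend:
        psi = np.where(t <= b1, t,
                       b1 * np.exp(-0.5 * (t - b1) ** 2 / b2 ** 2))
    else:
        psi = np.where(t <= b1, t, b1)
    # Weight psi(t)/t: exactly 1 below the cut-off, decreasing above it.
    return np.where(t > 0, psi / np.maximum(t, 1e-12), 1.0)
```

Observations within the cut-off receive weight 1, while the redescending form drives the weight of extreme scores towards zero much faster than the non-descending form.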
The choice var*(y_km) = 1 for determining the weights w_km, for
a given c, effectively coincides with M-estimation for the one-way
analysis of variance, since c is scaled to have unit within-groups
variance. The choice of the robust variance for a group as var*(ykm)
will determine for each group those observations which have atypical
canonical variate scores relative to those for the rest of the group.
When the canonical variate variances are similar for all groups, the
two choices will give similar results. For somewhat different variances,
the effect of using the robust estimate compared with using 1 will be
to place more emphasis on atypical observations for groups with small
variance and less on those from groups with large variance.
To determine successive canonical vectors c_i, 2 ≤ i ≤ h, project
the data onto the space orthogonal to that spanned by the previous
canonical vectors c_1, ..., c_{i-1}, and repeat steps (ii) to (iv). Take
the second canonical vector from the last iteration for the previous
one as the initial estimate. The proposed procedure is as follows.
(vi) Form the generalized projection operator

P = W_w C_{i-1} C_{i-1}^T,

and calculate x_km^i = (I - P) x_km; here C_{i-1} = (c_1, ..., c_{i-1}). In
practice, it is only necessary to form P_{i-1} = W_w c_{i-1} c_{i-1}^T and
to calculate x_km^i = (I - P_{i-1}) x_km^{i-1}, since W_w is usually similar
for each vector.
(vii) Repeat steps (ii) to (iv), with x_km^i replacing x_km, and
determine c_i.
The pooled W_w will now be singular, with rank v - i + 1. This is
readily incorporated into the numerical procedures used. If the
eigenvalue/vector decomposition in Section 1.4 is used as the
first-stage orthonormalization, the smallest i - 1 eigenvalues will be zero.
(viii) The canonical variate scores for successive directions are
given by

c_i^T x_km^i = c_i^T (I - P) x_km,

and hence

c_i* = (I - P)^T c_i.
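The generalized projection of step (vi) can be sketched as follows. As an assumption of the sketch, each previous vector c is rescaled so that c^T W_w c = 1 (the thesis uses a different overall scaling); with that normalization the deflated data have exactly zero scores on the previous vectors.

```python
import numpy as np

def deflate(X, Ww, C_prev):
    """Step (vi): generalized projection P = Ww C_{i-1} C_{i-1}^T and
    the deflated data x_km^i = (I - P) x_km, applied to the rows of X.
    Assumes each column c of C_prev satisfies c^T Ww c = 1.
    """
    v = X.shape[1]
    P = Ww @ C_prev @ C_prev.T
    # Rows of the result are (I - P) x_km.
    return X @ (np.eye(v) - P).T
```

Unlike the orthogonal projection used for principal components, P here is not symmetric, so the transpose in c_i* = (I - P)^T c_i matters.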
4.2 Robust M-estimation of the Canonical Vectors
The derivation outlined in the previous Section is essentially a
distribution-free approach, in the spirit of the original Fisher-Rao
derivations in Section 1.2. In this Section, robust M-estimators for
the canonical vectors will be developed, based on the functional
relationship formulation in Section 1.3.
To do this, consider, as in Section 3.2, an elliptically symmetric
density of the form |Σ|^{-1/2} h(δ_km²), where

δ_km² = (x_km - μ_k)^T Σ^{-1} (x_km - μ_k).                          (4.3)
Assume, as in the model (1.18) in Section 1.3, that the $\mu_k$ are specified
by
$$\mu_k = \mu_0 + \Sigma \Psi \gamma_k, \qquad (4.4)$$
where $\Psi$ is the $v \times p$ matrix of population canonical vectors.
The relevant part of the log likelihood is
$$-\tfrac{1}{2} n \log|\Sigma| + \sum_{k=1}^{g}\sum_{m=1}^{n_k} \log h(\delta_{km}),$$
with $\delta_{km}$ given by (4.3) and the $\mu_k$ by (4.4).
Write
$$u(\delta_{km}) = -2 h(\delta_{km})^{-1} h'(\delta_{km}),$$
and define
$$n_k^* = \sum_{m=1}^{n_k} u(\hat\delta_{km}), \qquad \bar x_k^* = n_k^{*-1}\sum_{m=1}^{n_k} u(\hat\delta_{km})\, x_{km}, \qquad (4.5)$$
$$n^* = \sum_{k=1}^{g} n_k^*, \qquad \bar x_T^* = n^{*-1}\sum_{k=1}^{g} n_k^* \bar x_k^*,$$
where $\hat\delta_{km}$ is defined analogously to $\delta_{km}$ in (4.3), with $\mu_k$ and $\Sigma$ replaced
by their maximum likelihood estimators.
Write
$$P = \Sigma\Psi(\Psi^T\Sigma\Psi)^{-1}\Psi^T$$
as in (1.21), and follow the same derivation as in Section 1.3 to obtain
$$\hat\gamma_k = (\Psi^T\Sigma\Psi)^{-1}\Psi^T(\bar x_k^* - \mu_0),$$
$$(I-P)\hat\mu_0 = (I-P)\bar x_T^* .$$
Define
$$S^* = \sum_{k=1}^{g}\sum_{m=1}^{n_k} u(\hat\delta_{km})(x_{km} - \bar x_k^*)(x_{km} - \bar x_k^*)^T$$
and
$$B^* = \sum_{k=1}^{g} n_k^*(\bar x_k^* - \bar x_T^*)(\bar x_k^* - \bar x_T^*)^T. \qquad (4.6)$$
Differentiation of the log likelihood, maximized for $\gamma_k$ and $\mu_0$, w.r.t. $\Sigma$
and w.r.t. $\Psi$ leads to equations as in (1.24) and (1.25), with $S$ and $B$
replaced by $S^*$ and $B^*$.
Introduction of conditions analogous to (1.26) and subsequent
simplification leads to the fundamental canonical variate equation
$$B^*\Psi = V^*\Psi\, n^* F_p^*, \qquad (4.7)$$
where
$$V^* = n^{*-1}S^*,$$
with the scaling
$$\Psi^T V^* \Psi = I \qquad (4.8)$$
and
$$\Psi^T B^* \Psi = n^* F_p^*,$$
with $F_p^*$ a diagonal matrix.
Also, as in Section 1.3,
$$n^*\hat\Sigma = S^* + B^* - B^*\Psi F_p^{*-1}\Psi^T B^* n^{*-1}. \qquad (4.9)$$
The log likelihood is maximized by taking $\Psi$ to be the first $p$
eigenvectors of $V^{*-1}B^*$ in (4.7).
As before,
$$\hat\mu_k = \bar x_T^* + V^*\Psi\Psi^T(\bar x_k^* - \bar x_T^*). \qquad (4.10)$$
The above derivation is for an elliptically symmetric density;
for the multivariate Gaussian density, $u(\delta_{km}) = 1$, and so the usual
canonical variate solution in Section 1.3 results. As in Section 3.2,
the above derivation also leads to the appropriate form for a robust
M-estimator solution, by associating the elliptical density with a
contaminated multivariate Gaussian density. The robust estimators will
again give full weight to observations assumed to come from the main
body of the data, but reduced weight or influence otherwise.
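The weight functions (3.6) and (3.7) are defined in Chapter Three and are not reproduced in this Section; the sketch below (Python/NumPy) therefore assumes a Campbell-type form — full weight up to a cutoff $d_0$ depending on $b_0$, then a non-descending tail for $b_2 = \infty$ or a redescending Gaussian-decay tail for finite $b_2$. The cutoff $d_0 = \sqrt{v} + b_0/\sqrt{2}$ is an assumed constant, not a definition taken from the text.

```python
import numpy as np

def observation_weight(d, v, b0=2.25, b2=np.inf):
    """Observation weight w_km = w(d)/d for a Mahalanobis distance d in
    v dimensions.  Assumed Campbell-type form: full weight up to the
    cutoff d0 = sqrt(v) + b0/sqrt(2); beyond d0, a Huber-type tail for
    b2 = inf (non-descending w) or a Gaussian-decay tail for finite b2
    (redescending w)."""
    d = np.asarray(d, dtype=float)
    d0 = np.sqrt(v) + b0 / np.sqrt(2.0)
    wt = np.ones_like(d)
    far = d > d0
    wt[far] = (d0 / d[far]) * np.exp(-0.5 * (d[far] - d0) ** 2 / b2 ** 2)
    return wt
```

Observations from the main body of the data (distances near $\sqrt{v}$) then receive full weight, while weights well below one flag possibly atypical observations; the redescending form drives the weight of a very distant observation towards zero.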
To define robust M-estimators of the canonical vectors, rewrite
the equation for $\bar x_k^*$ in (4.5) as
$$\bar x_k^* = \sum_{m=1}^{n_k} d_{km}^{-1} w(d_{km})\, x_{km} \Big/ \sum_{m=1}^{n_k} d_{km}^{-1} w(d_{km}),$$
where $d_{km} = \hat\delta_{km}^{1/2}$ and
$$w(d_{km}) = d_{km}\, u(\hat\delta_{km}),$$
and rewrite the equation for $S^*$ in (4.6) as
$$S^* = \sum_{k=1}^{g}\sum_{m=1}^{n_k} d_{km}^{-2}\phi(d_{km})(x_{km} - \bar x_k^*)(x_{km} - \bar x_k^*)^T,$$
where
$$\phi(d_{km}) = d_{km}^2\, u(\hat\delta_{km}).$$
For robust M-estimation, $w$ and $\phi$ must be bounded. As in Section
3.2, take $\phi(d_{km}) = w^2(d_{km})$, with $w$ defined in (3.6) or (3.7).
Hence the equations used here to define robust estimators of
means and of covariances for the robust canonical variate solution
are as follows:
$$n_k^* = \sum_{m=1}^{n_k} w_{km}, \qquad \bar x_k^* = n_k^{*-1}\sum_{m=1}^{n_k} w_{km} x_{km},$$
$$n^* = \sum_{k=1}^{g} n_k^*, \qquad \bar x_T^* = n^{*-1}\sum_{k=1}^{g} n_k^* \bar x_k^*,$$
$$S^* = \sum_{k=1}^{g}\sum_{m=1}^{n_k} w_{km}^2 (x_{km} - \bar x_k^*)(x_{km} - \bar x_k^*)^T$$
and
$$B^* = \sum_{k=1}^{g} n_k^*(\bar x_k^* - \bar x_T^*)(\bar x_k^* - \bar x_T^*)^T,$$
where
$$w_{km} = w(d_{km}^*)/d_{km}^*$$
and
$$d_{km}^{*2} = (x_{km} - \hat\mu_k)^T\hat\Sigma^{-1}(x_{km} - \hat\mu_k), \qquad (4.11)$$
with $\hat\mu_k$ and $\hat\Sigma$ given by (4.10) and (4.9), with $\Psi$ given by (4.7), and
the scaling for $\Psi$ given by (4.8).
To ensure agreement with the usual unbiased estimator when all
weights are unity, the definition of $V^*$ as $n^{*-1}S^*$ is replaced by
$$V^* = S^* \Big/ \sum_{k=1}^{g}\Big(\sum_{m=1}^{n_k} w_{km}^2 - 1\Big),$$
and a similar divisor is applied to the equation for $\hat\Sigma$ in (4.9).
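One sweep of the weighted updates above — the weighted group sums $n_k^*$, weighted means $\bar x_k^*$ and $\bar x_T^*$, and the matrices $S^*$ and $B^*$ with the unbiasedness divisor for $V^*$ — can be sketched as follows. This Python/NumPy function is illustrative only (the names `X_groups` and `weights` are hypothetical), and takes the current weights $w_{km}$ as given:

```python
import numpy as np

def weighted_cva_summaries(X_groups, weights):
    """One sweep of the weighted updates: given observations X_k (n_k x v)
    and current weights w_km for each group k, form n_k*, xbar_k*, xbar_T*,
    S* and B*, with V* = S* / sum_k (sum_m w_km^2 - 1)."""
    nk_star = [w.sum() for w in weights]
    xbark = [(w[:, None] * X).sum(axis=0) / ws
             for X, w, ws in zip(X_groups, weights, nk_star)]
    n_star = sum(nk_star)
    xbarT = sum(ws * xb for ws, xb in zip(nk_star, xbark)) / n_star
    v = X_groups[0].shape[1]
    S = np.zeros((v, v))
    B = np.zeros((v, v))
    for X, w, ws, xb in zip(X_groups, weights, nk_star, xbark):
        R = X - xb
        S += (w[:, None] ** 2 * R).T @ R   # sum_m w_km^2 (x - xbar_k)(x - xbar_k)^T
        B += ws * np.outer(xb - xbarT, xb - xbarT)
    V = S / sum((w ** 2).sum() - 1.0 for w in weights)
    return V, B, xbark, xbarT

# with all weights unity, V reduces to the usual pooled (unbiased) estimator
rng = np.random.default_rng(1)
Xg = [rng.standard_normal((30, 3)) for _ in range(4)]
wg = [np.ones(30) for _ in range(4)]
V, B, _, _ = weighted_cva_summaries(Xg, wg)
```

With unit weights the divisor is $\sum_k (n_k - 1) = n - g$, which is the agreement with the usual unbiased pooled estimator referred to above.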
Consider now the distance $d_{km}^*$ in (4.11); the superscript is dropped
for convenience. For two groups, $B = n_1 n_2 n_T^{-1} d_x d_x^T$, where $n_T = n_1 + n_2$
and $d_x = \bar x_1 - \bar x_2$. Then $\psi = V^{-1}d_x/(d_x^T V^{-1} d_x)^{1/2} = D^{-1}V^{-1}d_x$ and
$nf = n_1 n_2 n_T^{-1} D^2$. Hence $\hat\mu_k = \bar x_k$ and $\hat\Sigma = V$, so that
$$d_{km}^2 = (x_{km} - \bar x_k)^T V^{-1}(x_{km} - \bar x_k).$$
This is the usual Mahalanobis distance for the pooled matrix $V$.
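The two-group identity can be checked numerically: the single non-zero eigenvalue of $V^{-1}B$ equals $n_1 n_2 n_T^{-1} D^2$, with $D^2$ the squared Mahalanobis distance between the group means. The Python/NumPy sketch below uses arbitrary generated data (not any data set from the text):

```python
import numpy as np

# Two groups: the non-zero eigenvalue of V^{-1}B is n1 n2 nT^{-1} D^2,
# where D^2 = dx^T V^{-1} dx and V is the pooled matrix (divisor n).
rng = np.random.default_rng(2)
n1, n2 = 40, 60
X1 = rng.standard_normal((n1, 3))
X2 = rng.standard_normal((n2, 3)) + np.array([1.0, 0.5, -0.5])
dx = X1.mean(0) - X2.mean(0)
S = ((X1 - X1.mean(0)).T @ (X1 - X1.mean(0))
     + (X2 - X2.mean(0)).T @ (X2 - X2.mean(0)))
V = S / (n1 + n2)
B = (n1 * n2 / (n1 + n2)) * np.outer(dx, dx)
D2 = dx @ np.linalg.solve(V, dx)            # squared Mahalanobis distance
lam = max(np.linalg.eigvals(np.linalg.solve(V, B)).real)
print(np.isclose(lam, n1 * n2 / (n1 + n2) * D2))   # True
```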
Now let $C$ be the $v \times v$ matrix of vectors from the eigenanalysis of
$V^{-1}B$ and $F$ the diagonal matrix of canonical roots. Partition $C$ and $F$
as in (1.32) in Section 1.3, and remember $\Psi = C_p$. From (4.8) and (4.9),
$$C^T \hat\Sigma C = \begin{pmatrix} I & 0 \\ 0 & I + F_q \end{pmatrix}$$
and so $\hat\Sigma^{-1} = C_p C_p^T + C_q(I+F_q)^{-1}C_q^T$. Since
$C_p^T\hat\mu_k = C_p^T \bar x_k$ and $C_q^T\hat\mu_k = C_q^T \bar x_T$,
$$d_{km}^2 = (x_{km} - \bar x_k)^T C_p C_p^T (x_{km} - \bar x_k) + (x_{km} - \bar x_T)^T C_q (I+F_q)^{-1} C_q^T (x_{km} - \bar x_T).$$
This is illustrated geometrically in
Figure 4.1. The representation is in the space of the orthonormal
variables after the first-stage analysis. Provided the canonical roots
$f_{qi}$ are small, this distance is essentially that from the observation
to the projection of the group mean in the canonical variate plane.
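The matrix identity underlying this decomposition — that $C^T\hat\Sigma C = \mathrm{diag}(I,\, I+F_q)$ implies $\hat\Sigma^{-1} = C_pC_p^T + C_q(I+F_q)^{-1}C_q^T$ — holds for any nonsingular $C$, and can be verified directly. A Python/NumPy sketch, not tied to any particular data set:

```python
import numpy as np

rng = np.random.default_rng(3)
v, p = 5, 2                        # v variables, p retained canonical variates
C = rng.standard_normal((v, v))    # any nonsingular matrix of vectors
fq = rng.uniform(0.0, 0.5, v - p)  # small residual canonical roots f_qi
M = np.diag(np.concatenate([np.ones(p), 1.0 + fq]))   # C^T Sigma C = diag(I, I+Fq)
Sigma = np.linalg.solve(C.T, M) @ np.linalg.inv(C)    # Sigma = C^{-T} M C^{-1}
Cp, Cq = C[:, :p], C[:, p:]
Sigma_inv = Cp @ Cp.T + Cq @ np.diag(1.0 / (1.0 + fq)) @ Cq.T
print(np.allclose(Sigma_inv, np.linalg.inv(Sigma)))   # True
```

The identity follows because $\hat\Sigma = C^{-T} M C^{-1}$ gives $\hat\Sigma^{-1} = C M^{-1} C^T$, whose blocks are exactly the two terms in the decomposed distance.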
It remains to specify the constants b0 and b2 for the influence
functions w in (3.6) and (3.7). For the robust estimation of covariance
in Chapter Three, a value of b0 = 2.0 was adopted. Examination of
computer-generated data shows that the robust estimates of variance
are within 2% of the usual estimates for an underlying multivariate
Gaussian distribution. A similar approach is adopted here. Two group
configurations, representative of typical data sets to be examined in
Section 4.3, are used. The first is a four-variable, five-group data
set with group means and pooled covariance matrix taken from a study
of the whelk Dicathais by Phillips, Campbell and Wilson (1973). The
second is a seven-variable, six-group data set taken from a study of the
whelk Thais lamellosa by C. Campbell (1978). The means, standard
deviations and correlations for each set are given in Table 4.1.
For both the Dicathais and Thais data, five data sets with a
sample size of 50 for each group were generated from an underlying
multivariate Gaussian distribution. Two further sets were generated
corresponding to the Thais data with sample size 200 in each group.
The procedure used to generate the data is outlined below. The robust
M-estimation canonical variate analysis described above was carried out
[Figure]
Figure 4.1 - Representation of the components of the squared
distance $d_{km}^2$ for three groups and two variables.
The vectors $c_p$ and $c_q$ are arbitrarily centred
at $\bar x_T$. The horizontal component is the squared
distance of $x_{km}$ from $\bar x_k$ in the canonical variate
plane. The vertical component is the squared
distance of $x_{km}$ above the canonical variate
plane, scaled by the corresponding canonical root
as $(1+f_i)^{-1/2}$.
with $b_0$ in the range 2.0 to 3.0, with the non-descending $w$ in (3.6),
until successive estimates of the canonical roots were within $10^{-3}$
of the previous estimates, and then with $b_2 = 1.25$ for the redescending
$w$ in (3.7) until convergence.
The results from these analyses with b0 = 2.25 are given in
Table 4.2. Only three of the twelve runs show an increase in the
sample canonical root of more than 1% when robust estimation is intro-
duced. The redescending function in (3.7) with b2 = 1.25 shows little
increase in the canonical roots over those calculated with the non-
descending function in (3.6). For the Dicathais-based data, only
three weights are less than 0.75, and none is less than 0.50 (out of
5x5x50 observations). For the Thais-based data with nk = 50, only three
weights are less than 0.75 (out of 5x6x50); one of these is less than
0.50 (0.35). For the Thais-based data with nk = 200, only four weights
are less than 0.75 (out of 2x6x200) and none is less than 0.50. When
b0 is taken as 2.0, the ratio of canonical roots is increased by less
than 1%, while the weights are reduced by approximately 0.11. On the
basis of these results for the generated data, it seems reasonable to
conclude that with b0 = 2.25 and b2 = 1.25, a weight of less than 0.30
will be associated with an atypical observation; probably observations
with weights between 0.30 and 0.50 warrant closer examination.
The multivariate Gaussian vectors were produced by orthonormal
rotation of vectors of independent Gaussian random numbers. The latter
were generated by the polar method of Marsaglia and Bray (1964) (see
Atkinson and Pearce, 1976, Section 5.1.6, for further details). If
$z \sim N_v(0, I)$ and if $V = U\Lambda U^T$ is an eigendecomposition of $V$, then
$x = \bar x + U\Lambda^{1/2}z \sim N_v(\bar x, V)$. According to timings given in Barr and
Slezak (1972), this is not the most efficient method; orthonormalization
based on the Choleski triangular decomposition given in Section 1.5 is
Table 4.1 Underlying means, standard deviations and correlations
for the data generated in Section 4.2.
(a) Dicathais

       Grp 1   Grp 2   Grp 3   Grp 4   Grp 5     v1      v2      v3      v4
v1     39.36   33.39   44.02   33.34   55.94     9.391a  0.967   0.984   0.975
v2     16.10   11.99   14.91   13.34   25.00             4.077   0.916   0.911
v3     28.04   25.58   33.51   24.92   38.93                     6.664   0.987
v4     12.81   12.02   17.46   13.02   20.84                             3.374

(b) Thais

       Grp 1   Grp 2   Grp 3   Grp 4   Grp 5   Grp 6    v1     v2     v3     v4     v5     v6     v7
v1     346.3   320.6   410.3   549.3   263.8   341.9    88.1   0.990  0.983  0.837  0.938  0.971  0.969
v2     276.4   269.3   344.8   428.7   191.5   254.5           66.1   0.991  0.848  0.945  0.977  0.975
v3     218.1   210.8   280.5   343.2   149.9   195.9                  52.2   0.845  0.952  0.976  0.967
v4      62.1    44.2    73.3    87.1    49.0    63.1                         13.6   0.803  0.853  0.845
v5      65.9    75.1    97.4   119.3    51.6    66.9                                16.8   0.935  0.923
v6     193.5   193.1   245.1   301.8   139.1   176.5                                       47.1   0.985
v7     158.2   163.8   200.2   247.4   117.6   150.1                                              39.2

a standard deviations on diagonal, correlations off diagonal
Table 4.2 Summary of robust M-estimation canonical variate analyses
of generated data. The constant b0 for the non-descending
function w is 2.25.
(a) Ratio of M-estimation canonical roots to usual canonical roots, for
each of the runs for the first two roots (f1 and f2). The first line is
for the non-descending w, and the second is for the re-descending w with
b2 = 1.25.

(i) Dicathais-based data, nk = 50
f1: 1.014 1.019 1.000 1.004 1.006    f2: 1.007 1.007 1.008 1.003 1.005
    1.017 1.027 1.000 1.004 1.007        1.010 1.010 1.011 1.004 1.006

(ii) Thais-based data, nk = 50
f1: 1.002 1.007 1.002 1.001 1.008    f2: 1.005 1.002 1.003 1.001 1.000
    1.002 1.010 1.002 1.002 1.011        1.007 1.003 1.004 1.001 1.000

(iii) Thais-based data, nk = 200
f1: 1.001 1.003    f2: 1.002 1.001
    1.001 1.003        1.003 1.001

(b) Non-unit weights for each of the runs

(i) Dicathais-based data, nk = 50, v = 4, g = 5 (groups 1-5, runs 1-5)
    0.78,0.96   0.85,0.90,0.93   0.95   0.55
    0.96   0.98,0.99   0.93   0.61   0.81,0.84

(ii) Thais-based data, nk = 50, v = 7, g = 6
Grp 1: 0.35,0.96   0.80   0.85
Grp 2: 0.95
Grp 3: 0.90   0.93
Grp 4: 0.86   0.67
Grp 5: 0.86   0.96   0.87
Grp 6: 0.91   0.97   0.98   0.67,0.97

(iii) Thais-based data, nk = 200, v = 7, g = 6 (only two runs)
Grp 1: 0.82, 0.84, 0.89, 0.91, 0.93   0.99
Grp 2: 0.70
Grp 3: 0.79, 0.88, 0.99
Grp 4: 0.88, 0.89, 0.95, 0.97   0.74, 0.97
Grp 5: 0.66, 0.85, 0.94, 0.97
Grp 6: 0.71, 0.81
faster. However, I have found the relative magnitude of the smallest
eigenvalue to be a more sensitive indicator of near-singularity than
that of the smallest diagonal element of the triangular matrix. For
this reason, I have adopted the eigenrotation for occasional generation
of multivariate data, so that simple monitoring of the nature of the
within-groups variation is provided. For a large-scale study, the
more efficient routine should be implemented.
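The eigenrotation generator described above, together with monitoring of the relative magnitude of the smallest eigenvalue, can be sketched as follows (a Python/NumPy illustration; the tolerance `1e-8` is an assumed value, not one from the text):

```python
import numpy as np

def gaussian_via_eigenrotation(xbar, V, n, rng):
    """Generate n vectors x = xbar + U L^{1/2} z with z ~ N_v(0, I),
    where V = U L U^T is the eigendecomposition of V.  The eigenvalues
    also provide a direct check for near-singularity of V."""
    vals, U = np.linalg.eigh(V)
    if vals.min() / vals.max() < 1e-8:   # relative magnitude of smallest eigenvalue
        raise ValueError("V is nearly singular")
    Z = rng.standard_normal((n, len(xbar)))
    return xbar + Z * np.sqrt(vals) @ U.T

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)); V = A @ A.T + 0.5 * np.eye(4)
X = gaussian_via_eigenrotation(np.zeros(4), V, 20000, rng)
```

The sample covariance of the generated vectors reproduces $V$, since $U\Lambda^{1/2}z$ has covariance $U\Lambda U^T = V$; a Choleski-based generator would be faster, as noted, at the cost of the eigenvalue monitoring.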
4.3 Some Practical Examples
The first data set to be examined is the Dicathais study by Phillips,
Campbell and Wilson (1973). Fourteen groups were collected around the
coast of Australia and New Zealand. Four variables were measured on
each animal. Group sizes are 102, 101, 75, 69, 29, 48, 32, 83, 88, 44,
34, 33, 82 and 61. Table 4.3 summarizes the canonical roots and vectors
for the first two canonical variates for the usual analysis and for the
various robust procedures described in Sections 4.1 and 4.2. The robust
M-estimates of means and of covariances were calculated as described in
Chapter Three, with b0 = 2.0 and b2 = 1.25. Figure 4.2 shows a plot of
the group means for the first two canonical variates.
The relative positions of the main clusters of groups are little
changed by the introduction of the robust procedures, though there are
some variations in the large cluster with low canonical variate scores.
In particular, the positions of groups 5 and 6 relative to groups 1, 3,
4 and 10 and to groups 2 and 7 have been altered under robust
M-estimation of the canonical vectors.
The estimates of the canonical vectors are little changed for the
various approaches. However, there is a marked change in the canonical
roots. For the robust M-estimates, the increase is 29% and 15%. The
Table 4.3 Canonical roots and vectors for Dicathais data.
Coefficients for standardized variables are given in brackets.

                                        canonical vector I          c-root  canonical vector II         c-root
usual c.v. analysis                     -0.50   0.47  -0.35   1.62   2.13    0.48  -0.99  -0.31   0.43   1.68
                                       (-4.8)  (2.0) (-2.4)  (5.6)          (4.6) (-4.3) (-2.1)  (1.5)
c.v. analysis based on robust           -0.67   0.78  -0.23   1.50   2.35    0.47  -0.98  -0.51   0.86   1.87
M-estimates of means and covariances   (-6.3)  (3.2) (-1.5)  (5.1)          (4.4) (-4.0) (-3.4)  (2.9)
robust M-estimation of canonical        -0.62   0.86  -0.31   1.46   2.75    0.39  -0.90  -0.54   1.04   1.94
vectors, b0=2.25, b2=1.25              (-5.6)  (3.3) (-2.0)  (4.8)          (3.5) (-3.5) (-3.5)  (3.4)
robust M-estimation applied to c.v.     -0.55   0.61  -0.41   1.72   2.58    0.50  -1.09  -0.42   0.72   2.22
scores, b0=2.25, b2=1.25, variance = 1 (-5.2)  (2.5) (-2.7)  (5.8)          (4.7) (-4.5) (-2.8)  (2.4)
robust M-estimation applied to c.v.     -0.58   0.58  -0.34   1.68   2.36    0.51  -1.08  -0.39   0.59   2.06
scores, b0=2.25, b2=1.25, robust       (-5.5)  (2.5) (-2.3)  (5.8)          (5.0) (-4.6) (-2.6)  (2.0)
variance
[Figure: group means plotted on the first two canonical variates; legends: usual c.v.a.; M-estimates of means and covariances; M-estimation of c.v. scores; robust M-estimates of canonical vectors (non-descending and re-descending w).]
Figure 4.2 - Canonical variate means for Dicathais data
use of the redescending w results in an increase of 15% and 5% over
the non-descending w, whereas the change was less than 1% for the
generated data.
Table 4.4 gives the non-unit weights less than 0.35 for the various
robust procedures adopted. Ten of the last 15 individuals for group 6
are listed. Inspection of the data shows that 18 individuals (31-48)
have measurements larger than those recorded for any of the remaining
groups. The variance for the first canonical variate for group 6 for
the usual analysis is twice that of the other variances. The robust
M-estimation of the canonical vectors has downweighted the influence
of these larger observations. Robust M-estimation of the means and
covariances indicates only one observation as being possibly atypical.
However, on closer inspection the Q-Q Gaussian plot of cube root of
Mahalanobis distances does show some slight curvilinearity, being
approximately linear for the first 30+ observations, and then curving
upwards, though the effect is not pronounced. Apart from group 6,
there is generally good agreement as to observations indicated as being
possibly atypical by robust M-estimation of canonical vectors and of
means and covariances. Moreover, of 19 observations with low weight
for robust M-estimation of the canonical vectors, 14 also have low
weight for either the first or the second canonical vector from robust
M-estimation of the scores.
The second data set to be examined is the Thais study by
C. Campbell (1978). Brief details and group sizes are given in Section
3.4. The results from applying robust M-estimation of means and
covariances are given in Section 3.4. The corrections noted in that
Section, namely changes of 50 or 100 units or the interchange of the
order of two numbers, have been made to the data set considered here. As Table 4.6
shows, only three observations then have weights less than 0.35.
Table 4.4 - Summary of non-unit weights less than 0.35 from robust
analyses of Dicathais data. b0 = 2.25, b2 = 1.25 for robust canonical
variate analyses. b0 = 2.0, b2 = 1.25 for robust covariance estimation.

grp  obs  robust M-est  M-est c.v. scores,  M-est c.v. scores,  M-est means &  c-variate group
          of c-vecs     variance = 1        robust variance     covariances    variance
                        cv1    cv2          cv1    cv2
1 96 36a 10 2,34,15b 40 17,31c 1.3 2.7 102 03 55 02 2,74,19 67 17 49
2,82,18
3 16 32 17 90 03 2,22,30 45 1.3 1.3 75 04 01 00 2,23,18 00
4 65 32 05 2,56,26 06 39 60,22 1.3 1.6 68,27
6 34 28 5.5 2.0 38 04 05 65 39 07 15 41 30 42 07 32 44 07 37 35 45 00 02 01 04 37 46 00 47 12 48 00 00 33 79
7 31 00 03 2,25,08 11 2,25,26 00 1.2 2.4
8 2,6,15 83,08 2.0 2.0 2,1,30 2,4,31
10 1,2,20 1,2,17 1.3 0.6 2,42,05
11 34 07 03 02 00 27,00 1.5 1.0 31,00
12 15 02 42 2,33,27 85 00 2.7 3.1
13 26 00 79 79 00 80,08 1.8 1.4 82,01
a observation 96 for group 1 has a weight of 0.36 for robust M-estimation of the canonical vectors, weights of 1.00 and 0.10 for M-estimation of the canonical variate scores for the first two vectors when a variance of 1 is adopted and weights of 1.00 and 0.40 when a robust variance is adopted, and a weight of 1.00 for robust M-estimation of means and covariances.
b observation 34 has a weight of 0.15 for M-estimation of the canonical variate scores for the second (2) canonical vector, and unit weight for robust M-estimation of the canonical vectors.
c observation 17 has a weight of 0.31 for M-estimation of the means and covariance matrix, and unit weight for robust M-estimation of the canonical vectors.
Table 4.5 Canonical roots and vectors for Thais data. Coefficients for
standardized variables are given in brackets.

canonical vector I                                                         c-root
usual c.v. analysis                  -0.62  0.35  0.40 -0.85  0.35  0.14  0.24   3.585
                                    (-5.5) (2.4) (2.1)(-1.2) (0.6) (0.7) (0.9)
c.v. analysis based on M-estimates   -0.62  0.36  0.40 -0.85  0.34  0.14  0.23   3.615
of means and covariances            (-5.5) (2.4) (2.1)(-1.2) (0.6) (0.7) (0.9)
robust M-estimation of canonical     -0.69  0.42  0.42 -0.87  0.31  0.16  0.24   4.189
vectors, b0=2.25, b2=1.25           (-5.2) (2.4) (1.8)(-1.0) (0.4) (0.7) (0.8)
robust M-estimation applied to c.v.  -0.69  0.42  0.45 -0.90  0.37  0.17  0.19   4.382
scores, b0=2.25, b2=1.25,           (-6.0) (2.7) (2.3)(-1.2) (0.6) (0.8) (0.7)
variance = 1
robust M-estimation applied to c.v.  -0.65  0.40  0.43 -0.91  0.36  0.12  0.23   4.037
scores, b0=2.25, b2=1.25,           (-5.8) (2.6) (2.3)(-1.2) (0.6) (0.6) (0.9)
robust variance

canonical vector II                                                        c-root
usual c.v. analysis                  -0.09 -0.20  0.19  0.74  0.36  0.16 -0.07   1.209
                                    (-0.8)(-1.3) (1.0) (1.0) (0.6) (0.8)(-0.3)
c.v. analysis based on M-estimates   -0.10 -0.19  0.19  0.78  0.40  0.13 -0.07   1.237
of means and covariances            (-0.9)(-1.2) (1.0) (1.0) (0.7) (0.6)(-0.3)
robust M-estimation of canonical     -0.15 -0.27  0.37  0.89  0.08  0.29 -0.10   1.631
vectors, b0=2.25, b2=1.25           (-1.1)(-1.5) (1.6) (1.0) (0.1) (1.1)(-0.3)
robust M-estimation applied to c.v.  -0.15 -0.18  0.24  0.94  0.00  0.31 -0.04   2.101
scores, b0=2.25, b2=1.25,           (-1.3)(-1.2) (1.3) (1.2) (0.0) (1.4)(-0.1)
variance = 1
robust M-estimation applied to c.v.  -0.12 -0.18  0.15  0.77  0.31  0.20 -0.02   1.277
scores, b0=2.25, b2=1.25,           (-1.0)(-1.2) (0.8) (1.0) (0.5) (0.9)(-0.1)
robust variance
[Figure: two panels of group means plotted against the first two canonical variates.]
Figure 4.3 - Canonical variate means for Thais data.
(a) usual canonical variate analysis
(b) robust M-estimation of the canonical vectors, with b0 = 2.25, b2 = 1.25.
Table 4.6 - Summary of non-unit weights less than 0.35 from robust
analyses of Thais data. b0 = 2.25, b2 = 1.25 for robust canonical
variate analyses. b0 = 2.00, b2 = 1.25 for robust covariance estimation.
See Table 4.4 for explanation of entries.

grp  obs  robust M-est  M-est c.v. scores,  M-est c.v. scores,  M-est means &
          of c-vecs     variance = 1        robust variance     covariances
                        cv1    cv2          cv1    cv2
1 50 05
4 52 42 1,45,17 1,45,22 20 1,64,07 1,64,11
5 23 40 70 1,5,23 2,1,29
2,3,16 2,32,35
6 23 01 00 1,20,37 24 00 21 00 2,22,01 25 40 02 26 00 07 00 87 27 00 60 03 28 01 02 29 01 00 30 08 00 31 00 19 00 32 02 00 33 00 00 34 00 00 35 00 00 36 00 73 00
7 22 11 2,1,03 2,1,28 32 02 81 2,3,03 2,3,2w
2,4,32
8 43 19 08 1,34,08 22 1,34,24 45 07
10 1,28,24
11 1,28,73 1,28,05 28,00
12 5 24 00
![Page 105: CANONICAL VARIATE ANALYSIS: SOME PRACTICAL ASPECTS by ... · 5.5 C-V plot for arctanh correlations for grasshopper data 137 5.6 I-A and Q-Q plots for log variances for Thais data](https://reader033.vdocuments.mx/reader033/viewer/2022052106/60412acbe2295f639322b604/html5/thumbnails/105.jpg)
14.2.o, 62..0 +
3. 0 (a)
4
4*
104
-+-
2.0+
# 4' 60= 2 o, 4= oo _}. 1 0+ 4- -- — --I- + -I- 4-
-2.4 -1.2 0.0 1.2 2.4 Gaussian quonf;les
20. 0+ (6)
11 1*
#ir1111 * #
0.0 + -I- -I- +- -1- -I- 0.0 5.0 10.0 15.0 20.0
Gamma (2,3.5) quantiles
Figure 4.4 - Q-Q plots of Thais data from group 6.
(a) Gaussian plot of cube root of squared Mahalanobis distance
(b) Gamma plot of squared Mahalanobis distance
Table 4.5 summarizes the canonical roots and vectors for the usual
and various robust canonical variate analyses. Figure 4.3 shows a plot
of the canonical variate group means. The two plots are very similar,
except for the marked change in the position of group 6 for the second
canonical variate. The largest 14 individuals for this group have low
weights for the robust M-estimation and for the second canonical variate
for M-estimation of the scores with unit variance (Table 4.6). None of
these has low weight for the robust covariance estimation. Inspection
of the data shows that individuals 22-36 are larger than any collected
from the other groups. Q-Q plots of Mahalanobis distances in Figure 4.4
show marked lack of linearity.
There is generally good agreement between the weights from the
robust M-estimation of the canonical vectors and from M-estimation of
the scores with variance unity. The canonical vectors are little changed
by the various robust procedures, though there is again a marked change
in the roots.
For both examples, there is an obvious explanation for the changes
resulting from the use of robust procedures. And in each case, the data
from the group with the large animals do not agree with a multivariate
Gaussian form, unlike those for the remaining groups, so that initial
examination of the data would indicate the need for caution.
4.4 Discussion
The conclusion to be drawn from the analyses reported here and
those of other data sets is that the canonical vectors are little
influenced by a small number of atypical observations, and hence the
assessment of the relative importance of the variables is little affected.
However, when the influence of the atypical observations is downweighted,
the canonical roots are increased, often by as much as 15%. Unless a
particular group contains a reasonable proportion of observations
which are downweighted, as in the Thais example, the pattern of the
canonical variate means is generally similar for the usual and robust
analyses.
The robust M-estimation of the canonical vectors indicates values
which are atypical in relation to the summary provided by the estimated
group means and covariance matrix in the chosen number of dimensions.
Observations may be genuinely atypical in some way. However, it is
important to ensure that the representation provided by the canonical
variates in the reduced number of dimensions considered is adequate for
the group or groups with atypical observations before interpreting the
results. It may well be that a group mean lies a significant distance
above or below the canonical variate plane so that observations are
distant rather than atypical in the sense of one or more wrongly
recorded measurements. The use of robust M-estimation of the vectors
and of the scores will be informative here. For the Thais data, the
larger observations for group 6 have low weight for the M-estimation of
the vectors and for the second vector resulting from M-estimation applied
to the scores with unit variance adopted. The agreement for the
Dicathais data is less pronounced.
The question arises as to whether the robust procedures should be
used in preference to the usual analysis, and if so, which one(s) should
be used. For an occasional atypical observation(s) in some of the
groups, my preference is for the robust M-estimates of the vectors and
the vectors from M-estimation of the scores adopting unit variance. For
the examples considered, the choice is more complicated. The Dicathais
data show reasonable agreement with an underlying multivariate Gaussian
distribution, even for group 6. And the third canonical root is small,
indicating an adequate representation. It may well be that the animals
from that region grow to a larger size, and that techniques which allow
for different covariance matrices, such as those discussed in Chapter
Eight, are more appropriate for the analysis. The Thais data show
reasonable agreement with a multivariate Gaussian distribution, apart
from group 6 for which the Q-Q plots are somewhat distorted. The data
for group 6 would warrant closer examination. The larger animals do not
form part of the same population as the smaller animals, and the robust
approaches indicate that the larger animals are atypical when compared
with the remaining data. It seems reasonable to conclude that for
occasional atypical observations, the robust procedures provide a simple
and effective means of accommodating such observations in the analysis.
In addition, indication of subsets of atypical observations may lead to
further insight into the data.
Ahmed and Lachenbruch (1977) and Randles, Broffitt, Ramberg and
Hogg (1978) have considered the use of robust procedures for discriminant
analysis. The interest in both cases is in allocation rates. Ahmed and
Lachenbruch (1977) use an iterative trimming suggestion of Gnanadesikan
and Kettenring (1972), with either 5% or 10% trimming of observations
with the largest Mahalanobis distances, to estimate means and covariances,
and then calculate the discriminant function. They also use 15% or 25%
trimming of the discriminant scores as an alternative procedure, with
the discriminant vector recalculated from the remaining observations.
They show improved performance over the usual discriminant function
when contamination is present, with similar performance of the three
approaches for Gaussian data. To me, trimming is less appealing than
M-estimation. The latter reduces the influence of an observation only
if the observation is atypical, whereas trimming always reduces the
influence of some observations. Randles, Broffitt, Ramberg and Hogg
(1978) use M-estimation of means and covariances with b1 = 2.0,
b2 = ∞, and then calculate the discriminant function. As an alternative,
they suggest estimating the discriminant vector by maximising a function
of the between-to-within groups ratio of a linear combination of the
variables. To do this, they suggest choosing the function so that the
influence of observations whose scores are a "great distance from a
robust measure of the middle" is reduced. This suggestion has a basic
weakness. For consider two groups which are well separated. Then all
observations will be a great distance from the cutoff point, and so
under their suggestion all observations will be markedly downweighted.
CHAPTER FIVE: GRAPHICAL COMPARISON OF COVARIANCE MATRICES
This Chapter develops procedures for comparing within-group
covariance matrices. The procedures are based on separate analyses
of the variances and of the correlations. The variances and correlations
are represented as two-way tables, with the columns representing groups.
Section 5.2.1 develops graphical procedures based on comparisons of
linear regressions, by considering a multiplicative columns-regression
model for the interaction of groups x variances and of groups x
correlations. A multivariate comparison is considered in Section 5.2.2,
and this leads to the use of canonical variate analysis to display the
differences in covariance structure. Section 5.2.3 presents procedures
based on orthonormalization of the original variables. For equal
covariance matrices, correlations between suitably orthonormalized
variables should be zero, and variances should be unity. Section 5.3
applies the procedures to two sets of data, while Section 5.4 gives
further discussion of the various approaches.
5.1 Introduction
A fundamental assumption in canonical variate analysis is that of
equality of covariance matrices. The commonly-used procedure for
comparing covariance matrices for several groups is based on the
likelihood ratio criterion; this is the procedure presented in virtually
all recent multivariate texts (an exception is Gnanadesikan, 1977).
This criterion is known to be very sensitive to non-normality (Layard,
1972, 1974). A further drawback in applied studies is that no readily-
interpretable information is provided as to how the covariance matrices
differ.
There are at least two distinct reasons for studying differences
in covariance structure. One is to detect and identify differences in
covariance structure per se when these are of direct interest in the
particular field of application. A second reason is to be able to
relate the differences to the subsequent effect, if any, on the
ordination resulting from a canonical variate analysis.
5.2 Graphical Comparisons
Let Vk be the sample covariance matrix for the kth group,
k = 1,...,g, and write Vk in terms of its variances ski, i = 1,...,v,
and its correlations rkj, j = 1,...,v(v-1)/2. The ordering of the j
is taken here to correspond with the order of entry in the upper
triangle of the correlation matrix, viz. (1,2),(1,3),...,(v-1,v).
The variances and correlations for each group provide one natural
summary of the scatter of the variables and of the linear relationships
between them, summarizing size and orientation aspects of the data.
They also provide a basic description in the sample space geometry of
multivariate analysis - the variance is represented by the squared
length of a vector and the correlation by the cosine of the angle
between two vectors. Moreover, the variances and the correlations can
be transformed so that the distributions of the transformed statistics
have second moments approximately independent of the corresponding
population parameter; the distributions more closely approximate to the
Gaussian form. For the variance, either the logarithmic transformation
(Bartlett and Kendall, 1946) or the cube root transformation (Wilson
and Hilferty, 1931) can be used. For the correlation coefficient, the
arctanh transformation of Fisher (1921) is used.
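These transformations are immediate to compute; as a sketch (NumPy; the function name is hypothetical, not from the thesis):

```python
import numpy as np

def transform_elements(variances, correlations):
    """Variance-stabilizing transforms for the elements of a
    covariance matrix: log or cube root for the variances,
    Fisher's arctanh (z) transform for the correlations."""
    log_var = np.log(variances)               # Bartlett and Kendall (1946)
    cube_root_var = variances ** (1.0 / 3.0)  # Wilson and Hilferty (1931)
    z = np.arctanh(correlations)              # Fisher (1921)
    return log_var, cube_root_var, z
```

For a correlation based on n observations, var(arctanh r) is approximately (n-3)^-1 whatever the population value, which is what makes these scales convenient for the comparisons that follow.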
The comparisons of covariance matrices described in this Chapter
are based on separate comparisons of the variances and of the
correlations. For convenience, either the transformed variances or
the transformed correlations for the g groups will be denoted by tik
with i = 1,...,q where q = v or q = v(v-1)/2, and k = 1,...,g, and will
be referred to as elements.
The basis of the procedure to be described in detail in the next
Section is to consider the q elements for the g groups as two q x g
matrices. Each matrix is analogous to a two-way data matrix in
analysis of variance; the interest here is in the nature of the
interaction, if any, between the groups and elements.
Alternatively, each column can be written as a multivariate vector
tk, k = 1,...,g. In a multivariate context, for the groups to have
similar covariance matrices, they must have similar profile vectors
tk (see Morrison, 1976, Section 4.6, for a discussion of the latter).
A multivariate comparison is discussed in Section 5.2.2.
5.2.1 Individual-average plot
Consider now a q x g elements x groups two-way table. If there are
no differences in the covariance matrices across groups, this means
that the elements within each row will be essentially the same and that
the pattern of changes of the elements is similar from group to group.
In the context of analysis of variance, this is equivalent to specifying
that there is no group effect and that there is no interaction of
elements with groups.
Since the elements are themselves summary statistics, they have
associated variances and covariances. If these are assumed known (see
discussion of this below), then a weighted analysis of variance can be
formed, and the hypothesis of no interaction (and of no main effect)
in the two-way table can be examined by the usual F-test. However
this gives little insight into the nature of the interaction. To gain
more insight, a graphical approach is proposed which, when combined
with a formal statistical analysis, also provides the SSQs for
interaction and for groups in the analysis of variance table.
The basis of the procedure is the result that the hypothesis of
no interaction in the analysis of a two-way table can be expressed as
that of a linear regression, with slope unity, of the set of elements
for a particular group on the row means. The hypothesis of no group
effect corresponds to that of the same zero intercept for each group.
To see this, consider the familiar rows x columns model for the
analysis of variance, viz.
E(tik) = u + ri + ck + Tik .                                    (5.1)

The expected row means are given by

E(ti.) = u + ri                                                 (5.2)

under the usual constraints that r. = c. = Ti. = T.k = 0, where the
dot subscript denotes an average over that subscript. From (5.1) and
(5.2), if the interaction Tik is null and the column (here group)
effect ck is null, then E(tik) = E(ti.). Equivalently, there is a
linear relationship between E(tik) and E(ti.) with slope unity and
intercept zero.
Since a single linear regression with unit slope and zero intercept
is of ultimate interest, the proposed graphical procedure is to plot
the individual element values t1k,...,tqk against the row means
t1.,...,tq. for all g groups and see if there is reasonable agreement
with the null model of a single common linear relationship. The
formal statistical analysis consists of the fitting of linear
regressions for each group, and the comparison of the fitted slopes
and intercepts. The plot is here called the I-A (individual-average)
plot. The idea was proposed by Yates and Cochran (1938) in the
context of an examination of the interaction of varieties of barley
and the places at which they were grown.
The alternative specification of linear regressions with unspecified
intercept and slope is adopted because it usually provides a simple
description of the resulting I-A plot. The summary statistics of slope,
intercept, % variation explained, and residual mean square complement
the graphical interpretation. A scatter plot of mean level against
fitted slope will be referred to subsequently as an M-S plot. Moreover,
the calculations for the comparison of the fitted regressions for
equality of slope and thence of intercept give the complete analysis of
variance table.
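The construction of the I-A plot fits can be sketched in NumPy (a modern illustration, not part of the thesis; the unweighted case, with hypothetical names; columns are assumed non-constant):

```python
import numpy as np

def ia_plot_fits(T):
    """T is a q x g table of transformed elements (rows = elements,
    columns = groups).  For each group, regress its column on the
    row means; under equal covariance matrices every fitted line
    should have slope near 1 and intercept near 0."""
    row_means = T.mean(axis=1)                    # t_i.
    x = row_means - row_means.mean()
    fits = []
    for k in range(T.shape[1]):
        y = T[:, k]
        slope = np.dot(x, y - y.mean()) / np.dot(x, x)
        intercept = y.mean() - slope * row_means.mean()
        resid = y - (intercept + slope * row_means)
        pct_explained = 1.0 - resid.var() / y.var()
        fits.append((slope, intercept, pct_explained))
    return fits
```

Plotting each column against the row means, together with these summary statistics, gives the I-A plot and its formal companion analysis.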
Consider the alternative specification in more detail, namely a
linear relationship between E(tik - t.k) = ri + Tik and
E(ti. - t..) = ri. That is, consider the special structure

ri + Tik = δk ri .                                              (5.3)

Summation over k gives

g ri = ri Σk δk ,                                               (5.4)

where Σk denotes summation over k = 1,...,g, or

δ. = 1 .
From (5.3),

Tik = (δk - 1)ri ,

and so the model (5.1) then becomes

E(tik) = u + ri + ck + (δk - 1)ri
       = u + ck + δk ri ,                                       (5.5)

with the δk constrained as in (5.4). This specifies a linear
relationship between E(tik) and E(ti.), with slope δk and intercept
E(t.k) - δk E(t..). When δk = δ for all k, (5.4) shows that
δk = δ = 1, and so, from (5.3), Tik = 0. That is, within this setup,
a common slope of unity corresponds to a null interaction effect. The
intercept then becomes E(t.k) - E(t..) = ck; a zero intercept
corresponds to a null column (i.e. group) effect. The specification
adopted in this paragraph is that of a multiplicative model for the
interaction in the two-factor linear model (see, e.g., Williams, 1952;
Mandel, 1961).

The parameters in the model (5.5) can be estimated by maximum
likelihood under the assumption of Gaussian errors. This leads to an
eigenanalysis, the solution of which gives the estimates of the δk and
of the ri (Williams, 1952; Mandel, 1971).
An alternative is to use a conditional regression approach (to
use the terminology of Mandel, 1961); this approach follows directly
from the Yates-Cochran (1938) formulation. First, the parameters in
(5.1) are estimated with Tik = 0, under the assumption of independent,
identically distributed Gaussian errors. This gives the usual estimates
ri = ti. - t.. and ck = t.k - t.. . Then the estimates of the slopes
δk are found by regressing the tik on the ti. for each column
k = 1,...,g.
The conditional regression approach is adopted here, largely
because of the computational benefits which result from the subsequent
comparison of regressions table. Since the regressor variable is simply
the row mean and hence is the same for all regressions, the latter
table contains the SSQs for the complete analysis of variance. Moreover
the partition of the interaction SSQ into components due to the
comparison of slopes and to deviations from regression leads to standard
statistical tests. As Tukey (1949), Scheffe (1959, Section 4.8) and
Mandel (1961, Section 4) have shown, the SSQs follow χ2 distributions,
with g - 1 and (g-1)(g-2) d.f. respectively. The procedure for
comparison of regressions is well-known (see, e.g., Sprent, 1969,
Section 7.4): first fit individual regressions for each column; then
fit a common within-columns regression; and finally bulk the data over
all columns and fit a single overall linear regression. The differences
in residual SSQs for the three stages give the SSQ due to common slope,
and conditionally on common slope the SSQ due to common position. To
see how the simplifications for the comparison of regressions result,
write yik for the observed tik and xik for the regressor variable, so
that xik = ti. for all k. Then

Σk Σi (yik - y.k)(xik - x.k) = Σk Σi (tik - t.k)(ti. - t..)
                             = g Σi (ti. - t..)^2

and

Σk Σi (xik - x.k)^2 = Σk Σi (ti. - t..)^2 = g Σi (ti. - t..)^2 ,

where Σk Σi denotes summation over k = 1,...,g and i = 1,...,q.
Hence the SSQ due to common regression is g Σi (ti. - t..)^2 = row SSQ,
while the estimate of common slope is unity. Similar calculations
for the remaining terms in the comparison of regressions table lead
to Table 5.1.
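The correspondence summarized in Table 5.1 can be checked numerically. The following NumPy sketch (unweighted, equal group sizes; names hypothetical, not from the thesis) recovers the ANOVA SSQs and confirms that the bulked overall regression on the row means has slope exactly one, with regression SSQ equal to the row SSQ:

```python
import numpy as np

def regression_ssqs(T):
    """ANOVA SSQs for a q x g two-way table, plus the bulked
    (single overall) regression of t_ik on the row means t_i.;
    a sketch of the unweighted, equal-group-size case."""
    q, g = T.shape
    grand = T.mean()
    row = T.mean(axis=1)
    col = T.mean(axis=0)
    row_ssq = g * np.sum((row - grand) ** 2)
    col_ssq = q * np.sum((col - grand) ** 2)
    total_ssq = np.sum((T - grand) ** 2)
    inter_ssq = total_ssq - row_ssq - col_ssq
    # stage (c): bulk the data over all columns and fit one overall
    # regression of t_ik on t_i.; the slope is exactly 1 and the
    # regression SSQ equals the row SSQ
    x = np.tile(row, g)          # regressor: row means, repeated per column
    y = T.T.ravel()              # observations, column by column
    slope = np.dot(x - x.mean(), y - y.mean()) / np.sum((x - x.mean()) ** 2)
    reg_ssq = slope ** 2 * np.sum((x - x.mean()) ** 2)
    return row_ssq, col_ssq, inter_ssq, slope, reg_ssq
```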
The results outlined above assume independent and identically
distributed Gaussian errors for each element. In the two-way tables
for comparison of covariance matrices, this does not hold, since the
elements within a column (group) are in general correlated, while the
columns will have different variances if the group sizes differ.
Hence weighted regressions are desirable. Consider first the effect of
different group sizes. Specifically, var(tanh^-1 rkj) = (nk-3)^-1,
var(log ski) = 2(nk-1)^-1, and var(ski^1/3) = 2{9(nk-1)}^-1. Hence when
the group sizes differ, the weights will differ from column to column.
As a consequence, while the orthogonality of the row and the column
effects will still hold, the interaction or rows x columns effect will
no longer be orthogonal to the row effect. However, an added advantage
of the conditional regression approach is that since the interaction
SSQ is conditional on the usual row and column effects, it gives the
SSQ usually calculated when non-orthogonality exists.
A more fundamental problem is that although the variances of the
elements may be regarded as known, their asymptotic covariances (and
hence the weighting in the regressions) depend on the unknown
population correlations pij, as summarized below. The
approach adopted here is based on the common observation that in many
discrimination studies, the pooled covariance matrix is calculated
with a reasonable number of degrees of freedom and should therefore
provide good estimates of the pij.
Some protection against atypical elements is desirable, however,
since they may unduly influence the average across groups; to provide
Table 5.1  Relationship between analysis of variance SSQs for a two-way
table and comparison of regressions calculations with row means as
regressor variable.  Degrees of freedom for χ2 are also given.

                              Total SSQ          Regression SSQ    Deviation SSQ

(a) individual regressions    row + interaction  row + regression  deviation
                              g(q-1)             q+g-2             (q-2)(g-1)

(b) common slope regressions  row + interaction  row               interaction
                              g(q-1)             q-1               (q-1)(g-1)

difference of (a) and (b)                        regression        regression
                                                 g-1               g-1

(c) overall (single)          row + column       row               column
    regression                  + interaction                        + interaction
                              qg-1               q-1               q(g-1)

difference of (b) and (c)                        column            column
                                                 g-1               g-1
this, a robust average of the elements for each row is used.
Specifically, the midmeans of the arctanh rkm for each m are calculated,
and the robust means are back-transformed to provide estimates of the pij.
Remember that the order m = 1,...,q corresponds to (i,j) in the
sequence (1,2),(1,3),...,(v-1,v).
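A simple version of the midmean and the back-transformed robust estimates of the pij can be sketched as follows (NumPy; a plain 25%-trimmed midmean without interpolation; names hypothetical, not from the thesis):

```python
import numpy as np

def midmean(x):
    """Midmean: the mean of the middle half of the ordered values
    (25% trimmed from each tail; no interpolation)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    lo, hi = n // 4, n - n // 4
    return xs[lo:hi].mean()

def robust_rho(arctanh_r):
    """arctanh_r is a q x g array of arctanh correlations (rows =
    element pairs, columns = groups).  The midmean across groups is
    back-transformed to give a robust estimate of each rho_ij."""
    z = np.apply_along_axis(midmean, 1, arctanh_r)
    return np.tanh(z)
```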
An alternative to the calculations proposed above is to determine
the pooled estimates of the pij from those groups which appear to be
similar from the I-A plots; the pooling is over all the elements in
similar groups. There is then the choice of whether to average the
arctanh values and backtransform, or to pool the individual correlations
directly, or to pool the individual covariance matrices in the usual
way and then calculate the correlations. With a computer it is
straightforward to use the various alternatives and compare the results.
In practical applications to date, the graphical descriptions provided
by the alternative estimates have been similar to those provided by
the midmean estimates. This is discussed further in Section 5.3.
In the practical application of the approach, the assumption is
made that the population covariance matrix P for the elements is known.
Asymptotic variances and correlations of the transformed variances and
transformed correlations can be derived from results given in Elston
(1975). Let sii denote the sample variance for the ith variable, rij
the sample correlation coefficient, and let σii and pij denote the
corresponding population parameters. Asymptotically, var(sii) =
2n^-1 σii^2 and cov(sii,sjj) = 2n^-1 σii σjj pij^2. Now use the
second-order result for the variance of a function of sii. Since
∂log sii/∂sii evaluated at σii gives σii^-1, var(log sii) = 2n^-1 for
all i = 1,...,v, and cov(log sii, log sjj) = 2n^-1 pij^2. The result
for var(log sii) agrees with that of Bartlett and Kendall (1946) if
n-1 replaces n. Hence the asymptotic covariance matrix for the log sii
is of the simple form
2(n-1)^-1 P, where P is a v x v matrix with unit diagonal terms and
off-diagonal elements pij^2. For the correlation coefficient,
asymptotically var(rij) = n^-1 (1-pij^2)^2 (Elston, 1975, p.136), while

cov(rij,rik) = n^-1 {pjk(1 - pij^2 - pik^2)
               - (1/2)pij pik(1 - pij^2 - pjk^2 - pik^2)}

and

cov(rij,rkm) = n^-1 {(1/2)pij pkm(pik^2 + pim^2 + pjk^2 + pjm^2)
               - (pij pik pim + pij pjk pjm + pik pjk pkm + pim pjm pkm)
               + pik pjm + pim pjk} .

Since ∂tanh^-1 rij/∂rij evaluated at pij gives (1-pij^2)^-1, it follows
that var(tanh^-1 rij) = n^-1 for all i = 1,...,v-1; j = i+1,...,v,
which agrees with Fisher (1921) if n-3 replaces n. The asymptotic
covariance matrix for the tanh^-1 rij is of the simple form (n-3)^-1 P,
where P is a v(v-1)/2 x v(v-1)/2 matrix with unit diagonal terms and
off-diagonal terms given by n cov(rij,rik){(1-pij^2)(1-pik^2)}^-1 or
n cov(rij,rkm){(1-pij^2)(1-pkm^2)}^-1.
Note that the asymptotic covariance matrices depend only on the unknown
pij. The back-transformed robust midmeans are substituted for the pij
in the practical applications in Section 5.3, unless otherwise indicated.
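These asymptotic results translate directly into code. The sketch below (NumPy; names hypothetical, not from the thesis) builds the matrix P for the arctanh correlations; the general four-index covariance formula is used throughout, since with a unit diagonal it reduces correctly to the shared-index case:

```python
import numpy as np
from itertools import combinations

def arctanh_corr_cov(R):
    """Correlation matrix P of the arctanh-transformed sample
    correlations, from the asymptotic covariances of the r_ij;
    R is the (estimated) population correlation matrix.  Pairs are
    ordered (1,2),(1,3),...,(v-1,v) as in the text."""
    v = R.shape[0]
    pairs = list(combinations(range(v), 2))
    q = len(pairs)

    def n_cov_r(i, j, k, m):
        # asymptotic n * cov(r_ij, r_km); because R has a unit
        # diagonal, this also covers pairs sharing an index
        return (0.5 * R[i, j] * R[k, m]
                * (R[i, k] ** 2 + R[i, m] ** 2 + R[j, k] ** 2 + R[j, m] ** 2)
                - (R[i, j] * R[i, k] * R[i, m] + R[i, j] * R[j, k] * R[j, m]
                   + R[i, k] * R[j, k] * R[k, m] + R[i, m] * R[j, m] * R[k, m])
                + R[i, k] * R[j, m] + R[i, m] * R[j, k])

    P = np.eye(q)
    for a in range(q):
        for b in range(a + 1, q):
            (i, j), (k, m) = pairs[a], pairs[b]
            c = n_cov_r(i, j, k, m) / ((1 - R[i, j] ** 2) * (1 - R[k, m] ** 2))
            P[a, b] = P[b, a] = c
    return P
```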
The calculations for the comparison of regressions with known
covariance matrix follow directly from the usual theory (see, e.g.,
Sprent, 1969, Section 4.1). In matrix notation, with tk = (t1k,...,tqk)T
and tk ~ Nq(uk, σkk P), where σkk ∝ nk^-1 as in the preceding paragraph,
the calculations proceed by replacing tk by σkk^-1/2 P^-1/2 tk,
t. = (t1.,...,tq.)T by σkk^-1/2 P^-1/2 t., and 1q by σkk^-1/2 P^-1/2 1q.
For example, the slope δk is estimated by

δk = {σkk^-1 tkT P^-1 t. - σkk^-1 (tkT P^-1 1q)(t.T P^-1 1q)(1qT P^-1 1q)^-1} /
     {σkk^-1 t.T P^-1 t. - σkk^-1 (t.T P^-1 1q)^2 (1qT P^-1 1q)^-1} ;

in this example, the σkk^-1 factors cancel.
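The cancellation of the scale factor can be seen in a small generalized least-squares sketch (NumPy; names hypothetical, not from the thesis):

```python
import numpy as np

def gls_slope(tk, t_bar, P, sigma_kk=1.0):
    """Weighted least-squares slope of one group's elements t_k on
    the row means t_bar, with covariance sigma_kk * P assumed known;
    the scalar sigma_kk appears in numerator and denominator and
    cancels."""
    W = np.linalg.inv(P) / sigma_kk
    one = np.ones(len(tk))
    s = one @ W @ one
    num = tk @ W @ t_bar - (tk @ W @ one) * (t_bar @ W @ one) / s
    den = t_bar @ W @ t_bar - (t_bar @ W @ one) ** 2 / s
    return num / den
```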
Since the regression calculations, and hence the analysis of
variance calculations, are weighted with weights assumed known, the
various SSQs are compared with the appropriate χ2 distributions.
As in the usual analysis of variance, various residual plots can
be made. To emphasize those groups or elements in a group which differ
from the rest, take the residuals as the departures of the individual
elements from the robust midmeans. Under the null hypothesis of no
group differences, these departures will have zero mean and common
standard deviation if the group sizes are equal. An obvious summary
is provided by a Q-Q Gaussian plot. Two views of the same plot reveal
the differences of interest: the first could use different symbols,
such as letters of the alphabet, for each group; and the second could
use different symbols, such as the numbers 1 to v or v(v-1)/2, for each
element. I have found teletype and line printer plots to be adequate.
The elements within a group are not uncorrelated, though empirical
evidence and some results in Tukey (1976, Section 5.3) suggest that
the linearity of the plot should not be affected. Another plot which is
sometimes useful, referred to here as an R-R plot, is that of the
departures from the robust midmean against the departures from the
usual mean; ideally, the plot should be linear with unit slope and
zero intercept. Atypical groups or elements are indicated by departures
from unit slope and/or by the clustering of elements for a particular
group(s). Again the same plot viewed with group symbols and with
element symbols shows the differences of interest.
To illustrate the simplicity of the I-A plot, and the subsequent
residual plots, consider the following computer generated example. All
populations are assumed to have unit variances. Two groups are generated
from a four-variate multivariate Gaussian population with correlations
0.975, 0.950, 0.925, 0.900, 0.850, and 0.875, while for a third group,
four of the six population correlations are reduced, viz. 0.975, 0.600,
0.575, 0.500, 0.450, and 0.875. The generation of the covariance
matrices, via the Bartlett decomposition (see, e.g., Newman and Odell,
1971, Section 5.2), assumes a sample size of 50. Figure 5.1(a) shows
the I-A plot for the arctanh-transformed correlation coefficients.
Since greater resolution on the teletype exists for the abscissa,
the individual elements are represented along that axis. The evidence
for departure from equality of correlation structure is obvious, and
needs no formal analysis. It is also clear that the two groups
generated from the same population do indeed behave similarly.
Figures 5.1(b) and 5.1(c) show Q-Q and R-R plots for the same example.
The nature of the differences between the groups is again obvious.
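The generation step can be sketched as follows (NumPy; a sketch of the Bartlett-decomposition sampling scheme for Wishart matrices, not the thesis's original program; names hypothetical):

```python
import numpy as np

def sample_cov_bartlett(Sigma, n, rng):
    """Draw a sample covariance matrix for sample size n from
    N(0, Sigma) via the Bartlett decomposition of the Wishart
    distribution, without generating individual observations.
    A is lower triangular with chi(df - i) diagonals and standard
    Gaussian sub-diagonals; then L A A^T L^T ~ Wishart(df, Sigma)."""
    v = Sigma.shape[0]
    df = n - 1
    L = np.linalg.cholesky(Sigma)
    A = np.zeros((v, v))
    for i in range(v):
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        A[i, :i] = rng.normal(size=i)
    W = L @ A @ A.T @ L.T
    return W / df                 # unbiased: E[W/df] = Sigma
```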
5.2.2 A multivariate comparison
The approach proposed in Section 5.2.1 uses the univariate concept
of regression. As noted in the introduction to this Section, it is also
possible to consider the elements for each group as a profile represented
by a multivariate vector tk.
Consider the column vector tk ~ Nq(uk, σkk P), with σkk and P assumed
known; σkk depends only on the group size. Then the relevant part of
the log-likelihood for all g groups is

-(1/2) tr P^-1 Σk σkk^-1 (tk - uk)(tk - uk)T ,                  (5.6)

where Σk denotes summation over k = 1,...,g. Under the hypothesis of
equality of the uk, differentiation of (5.6) w.r.t. the common u gives

ū = Σk σkk^-1 tk / Σk σkk^-1 ,

and the maximized term is -(1/2) tr P^-1 B, where
[Figure 5.1(a): I-A plot - arctanh correlations - generated data - two
groups (A and B) are generated from the same population; the individual
arctanh correlations are on the abscissa.]
[Figure 5.1(b): Q-Q plot - arctanh correlations - generated data - with
Gaussian quantiles on the abscissa.]
[Figure 5.1(c): R-R plot - arctanh correlations - generated data - with
the usual residuals tik - ti. on the abscissa; for the Q-Q and R-R
plots, a number indicates the number of overprintings.]
B = Σk σkk^-1 (tk - ū)(tk - ū)T ,

the sum being over k = 1,...,g. Hence the likelihood ratio statistic
is exp(-(1/2) tr P^-1 B), and so -2 log λ = tr P^-1 B. The statistic
tr P^-1 B is distributed as χ2 on q(g-1) d.f. under the null
hypothesis, and as a non-central χ2 with non-centrality parameter
tr P^-1 Σk σkk^-1 uk ukT under the alternative hypothesis (see, e.g.,
Chakravarti, 1966, Section 3.1).
The form of the test statistic suggests that a conventional
canonical variate analysis, in which the vector of column elements tk
is treated like a vector of group means, may be used to examine
differences in covariance structure. Specifically, the roots fi and
vectors ci of (B-fP)c = 0 are determined; there are min(g-1,q) non-zero
roots, the sum of which equals the trace statistic above. The plot of
canonical variate means, referred to subsequently as a C-V plot,
indicates those groups which have similar patterns of elements, and
those groups which differ. The value of the plots of canonical variate
means will depend on whether most of the information is in the first
few canonical roots (termed concentrated structure by Olson, 1974,
following Schatzoff, 1966) or whether it is spread over most of the
roots (diffuse structure).
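The trace statistic and the canonical roots and vectors of (B - fP)c = 0 can be computed as in the following sketch (NumPy only; the generalized eigenproblem is reduced by a Cholesky factorization of P; names hypothetical, not from the thesis):

```python
import numpy as np

def cv_of_elements(T, P, sigma):
    """Trace statistic tr(P^-1 B) for comparing the g group element
    profiles, referred to chi-square on q(g-1) d.f., together with
    the canonical roots and vectors of (B - fP)c = 0.  T is q x g;
    P is the known element correlation matrix; sigma holds the
    per-group scale factors sigma_kk (proportional to 1/n_k)."""
    q, g = T.shape
    w = 1.0 / np.asarray(sigma, dtype=float)   # weights sigma_kk^-1
    u_bar = (T * w).sum(axis=1) / w.sum()      # weighted mean profile
    D = T - u_bar[:, None]
    B = (w * D) @ D.T                          # sum_k w_k (t_k-u)(t_k-u)^T
    trace_stat = np.trace(np.linalg.solve(P, B))
    # reduce (B - fP)c = 0 to an ordinary symmetric eigenproblem
    L = np.linalg.cholesky(P)
    Li = np.linalg.inv(L)
    roots, Y = np.linalg.eigh(Li @ B @ Li.T)
    order = np.argsort(roots)[::-1]
    return trace_stat, q * (g - 1), roots[order], Li.T @ Y[:, order]
```

The sum of the roots equals the trace statistic, and only min(g-1, q) of them are non-zero, as in the text.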
The statistic tr P-1B is just the sum of the column SSQ and
interaction SSQ in the analysis of variance discussed in Section 5.2.1.
This is easily seen by considering the case when P = I and σkk = 1.
Then ū is the vector of row means ti., and

tr P^-1 B = tr Σk (tk - ū)(tk - ū)T = Σi Σk (tik - ti.)^2

          = Σi Σk {(t.k - t..)^2 + (tik - ti. - t.k + t..)^2} ,

with sums over i = 1,...,q and k = 1,...,g,
as required. The result is also obvious when the vectors tk are
considered as profiles (refer again to Morrison, 1976, Section 4.6).
A profile is simply a graphical representation obtained by plotting
the components of the vector against the variable number (from 1 to q).
Equality of mean vectors implies that the profiles are similar in
shape or are parallel, and that they also have the same overall mean
value; the former implies lack of interaction, and the latter lack of
group effect.
Layard (1972) has demonstrated the non-robustness of the likelihood
ratio test for the equality of two covariance matrices. Of particular
relevance to this study is that one of the tests he proposes, his
standard error test, is very similar to the multivariate comparison
outlined above. The standard error test is a simultaneous test of
equality of log variances and arctanh correlations for two groups;
a consistent estimate of the asymptotic covariance matrix is obtained
by substituting sample quantities for population moments in the
asymptotic covariance matrix of the sample second moments (see Layard,
1972, p.125). A sampling experiment with two variables presented in
Layard (1974, p.464) shows that empirical significance levels for a
nominal 5% level for the standard error test are between 5.8%, for
an underlying standard Gaussian distribution, and 8.5%, for a double
exponential; the levels for the usual likelihood ratio test are 4.5%
for the Gaussian, 21.8% for the double exponential and 39.2% for a 10%
contaminated Gaussian with a scale factor of 3. The results are based
on 1000 replications of samples of size 25. The power of the standard
error test is comparable with that of the likelihood ratio test for
Gaussian samples. Layard's results suggest that the multivariate trace
statistic, and the analysis of variance table, will provide useful
guidelines for indicating the statistical significance of the differences
in variance and/or correlation structure.
5.2.3 Orthogonalized variables
If the covariance matrices are similar, then the corresponding
concentration ellipsoids will be similar in orientation and in size.
After suitable rotation and standardization, the concentration ellipsoids
will then become concentration spheres. This is, of course, the first-
stage rotation in canonical variate analysis (see Section 1.4). The
correlations of the orthonormalized variables within-groups should be
zero, and the variances should be unity. The rotation and scaling is
based on a pooled within-groups covariance or correlation matrix.
The previous paragraph suggests two further procedures for examining
equality of covariance structure, based on the equality of the correla-
tions between orthonormalized variables, and on the equality of the
variances of the orthonormalized variables. However, the elements of
the pooled covariance or correlation matrix are influenced by an atypical
group(s) or element(s) of a group, with the result that all the correla-
tions and the variances between the orthonormalized variables will tend
to differ from zero and from unity respectively. A simple but effective
solution is to use the robust estimate of the correlation matrix described
in Section 5.2.1, with the variables standardized by the back-transform
of the midmean of the log variances.
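A minimal sketch of this robust scale, assuming the midmean is taken as the interquartile mean (one common definition); the function names and data are illustrative, not from the text:

```python
import numpy as np

def midmean(x):
    """Interquartile mean: average of the values between the 25th and
    75th percentiles; a simple robust estimate of location."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    lo, hi = int(np.floor(n * 0.25)), int(np.ceil(n * 0.75))
    return x[lo:hi].mean()

def robust_variance(group_variances):
    """Back-transform of the midmean of the log variances across
    groups, used to standardize a variable robustly."""
    return np.exp(midmean(np.log(group_variances)))
```

The log transform makes the variances roughly symmetric before the midmean is applied, and exponentiating returns to the variance scale.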
The orthonormalizing transformation based on the eigenanalysis of
the robust correlation matrix is applied to the covariance matrices for
each of the groups after the latter have been standardized by the robust
variances. Specifically, let R_A be the robust correlation matrix, let
V_A be the diagonal matrix of robust variances, and let V_k be the
covariance matrix for the kth group. Let R_A = U_A E_A U_A^T be the
eigenanalysis of R_A. Then the standardized covariance matrix is given
by V_A^{-1/2} V_k V_A^{-1/2}, and that for the orthonormalized variables
for the kth group is given by

    E_A^{-1/2} U_A^T V_A^{-1/2} V_k V_A^{-1/2} U_A E_A^{-1/2} = V_k^0,

say.
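A numerical sketch of this transformation, assuming numpy and the notation above (R_A, V_A, V_k):

```python
import numpy as np

def orthonormalized_cov(V_k, R_A, v_A):
    """Covariance matrix of the kth group on the scale of the
    orthonormalized variables: standardize by the robust variances,
    then rotate and rescale by the eigenanalysis R_A = U E U^T.
    V_k : (p, p) covariance matrix for group k
    R_A : (p, p) robust correlation matrix
    v_A : (p,)  robust variances (diagonal of V_A)"""
    D = np.diag(1.0 / np.sqrt(v_A))        # V_A^{-1/2}
    E, U = np.linalg.eigh(R_A)             # R_A = U diag(E) U^T
    S = np.diag(1.0 / np.sqrt(E))          # E^{-1/2}
    return S @ U.T @ D @ V_k @ D @ U @ S   # V_k^0
```

As a check on the algebra: if the kth group has exactly the robust pooled structure, V_k = V_A^{1/2} R_A V_A^{1/2}, then V_k^0 reduces to the identity matrix.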
All variances, for all groups, for the orthonormal variables should
be approximately unity under the null hypothesis, and the variances will
be approximately χ² distributed. Moreover, from the results in Section
5.2.1 on the variances and correlations of variances and of correlations,
the variances within a group will tend to be uncorrelated after transforma-
tion. Hence, if the group sizes are similar, the equality of the variances
can be examined by a gamma probability plot.
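One way to construct such a gamma probability plot, assuming the variances share a common degrees of freedom df, so that each behaves roughly as χ²_df/df, i.e. a gamma variate with shape df/2 and scale 2/df (a sketch, not the exact procedure of the text):

```python
import numpy as np
from scipy import stats

def gamma_plot_points(variances, df):
    """Plotting positions for a gamma probability plot of the variances
    of the orthonormalized variables: sorted variances against gamma
    quantiles with shape df/2 and scale 2/df."""
    v = np.sort(np.asarray(variances, dtype=float))
    n = len(v)
    p = (np.arange(1, n + 1) - 0.5) / n              # plotting positions
    q = stats.gamma.ppf(p, a=df / 2.0, scale=2.0 / df)
    return q, v                                       # plot v against q
```

Under the null hypothesis the points should fall roughly on the 45° line.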
All correlations, for all groups, for the orthonormal variables
should be approximately zero under the null hypothesis, and again will
tend to be uncorrelated. If the group sizes are similar, a Gaussian
probability plot of the arctanh transformed values should have slope
(n-3)^{-1/2} (see, e.g., Hills, 1969).
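The Gaussian plot of the arctanh correlations can be sketched as follows (Fisher's z; the function name is illustrative):

```python
import numpy as np
from scipy import stats

def arctanh_plot(correlations, n):
    """Points for a Gaussian probability plot of arctanh-transformed
    correlations.  If the true correlations are zero, arctanh(r) is
    approximately Gaussian with standard deviation (n - 3)^{-1/2},
    so the plot should be roughly linear with that slope."""
    z = np.sort(np.arctanh(np.asarray(correlations, dtype=float)))
    m = len(z)
    p = (np.arange(1, m + 1) - 0.5) / m
    q = stats.norm.ppf(p)                 # Gaussian quantiles
    slope = (n - 3) ** -0.5               # expected slope under the null
    return q, z, slope
```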
The examination of the orthonormal variables has an added advantage:
if a canonical variate analysis is carried out on these variables, the
within-groups directions that do not contribute to the between-groups
discrimination can be identified and eliminated, and this will often
lead to a marked improvement in the stability of the canonical variate
coefficients, as discussed in Chapter Six. Moreover, in the context
of discrimination, it is then only necessary to base the comparison of
the covariance matrices on the remaining orthonormalized variables,
and determine whether the correlations between these variables are
approximately zero and the variances approximately one. Plots of group
means and associated concentration ellipses for pairs of important
orthonormal variables can also be made; examples of this are given in
Campbell (1979) and in Campbell and Atchley (1979, paper in preparation).
The approach described in this Subsection will be of particular
value when the first-stage principal component analysis provides a
useful biological or physical interpretation. Campbell and Atchley
(1979) relate the principal components to differences in the patterns
of correlation between species of grasshoppers.
5.3 Some Examples
The first data set to be examined is from a study of variation within
and between species of grasshopper in the Snowy Mountains, New South
Wales (Campbell and Dearn, 1979). In all, twenty-five groups were
collected along three altitudinal transects. Each population contains
one of three species. The species are denoted as P, C and U. Thirteen
groups were collected along the first transect (altitude 980 m - 2140 m),
ten groups along the second transect (1040 m - 1540 m) and two groups
from a third transect. Sample sizes are given in Table 5.2.
Of twelve variables measured, three contain much of the information
for discrimination between the species. To illustrate the approaches
described in Section 5.2, differences in the variance and correlation
structure for the twenty-five groups for the three variables are
considered here. The variances and correlations for each group are
given in Table 5.2. The approaches are presented for illustrative
purposes. I am not proposing that such a detailed analysis is warranted
when there are only three variables; careful inspection of the log
variances and arctanh correlations will often then suffice, though the
I-A plot may also be a useful aid.
Figure 5.2(a) shows an I-A plot of average log variance against
individual log variance. The regressions to be fitted are of individual
values against average values; however, the reverse is plotted because
of the greater resolution on the abscissa on teletype plots. The order
Table 5.2 Variances (x1000) and correlations for the grasshopper data
Grpa   n    v1     v2     v3     r(1,2)  r(1,3)  r(2,3)
1 P 20 3.225 4.216 11.251 0.670 0.799 0.549
2 C 20 1.646 5.352 16.645 0.721 0.742 0.539
3 C 20 0.889 6.873 9.900 0.664 0.503 0.827
4 C 39 2.648 6.253 11.371 0.698 0.669 0.715
5 C 20 1.604 3.150 11.183 0.465 0.423 0.602
6 C 20 1.973 5.197 17.197 0.786 0.757 0.761
7 C 17 1.047 3.563 5.988 0.502 0.634 0.518
8 C 19 1.743 6.792 19.493 0.412 0.401 0.852
9 C 20 2.171 5.546 11.321 0.558 0.756 0.583
10 C 13 2.073 6.677 14.476 0.616 0.903 0.642
11 U 18 1.156 3.603 8.356 0.419 0.614 0.367
12 U 15 4.441 8.681 31.078 0.879 0.820 0.812
13 U 26 1.572 4.961 16.496 0.620 0.609 0.661
14 P 10 2.023 8.093 9.854 0.684 0.916 0.720
15 C 16 2.660 5.783 9.273 0.553 0.685 0.474
16 C 20 5.947 7.973 24.020 0.790 0.907 0.891
17 C 20 3.289 4.592 19.199 0.691 0.586 0.624
18 C 19 3.120 4.447 14.809 0.627 0.797 0.802
19 C 20 1.625 5.722 12.729 0.644 0.685 0.762
20 C 16 2.733 2.710 13.319 0.563 0.804 0.765
21 C 17 3.293 6.894 22.224 0.634 0.839 0.757
22 C 12 1.988 4.209 18.950 0.459 0.624 0.663
23 U 19 1.858 3.813 12.558 0.521 0.506 0.578
24 C 19 1.676 4.439 15.898 0.447 0.325 0.726
25 U 16 2.126 5.233 21.056 0.406 0.682 0.470
a species identification (P, C or U) is also given. Groups 1-13 are
from transect I, groups 14-23 are from transect II and groups 24 and
25 are from transect III.
[Teletype plot not reproduced. Abscissa: individual log variances; ordinate: average log variances.]
Figure 5.2(a) I-A plot - log variances - grasshopper data - order of groups for each row: 7 11 15 14 3 5 1 9 4 23 19 20 10 18 24 13 2 6 22 17 8 25 21 16 12; 20 5 7 11 23 22 1 24 18 17 13 6 25 2 9 19 15 4 10 8 3 21 16 14 12; 3 7 11 13 5 19 2 24 8 23 6 22 14 10 25 9 4 15 20 18 1 17 21 12 16.
[Teletype plot not reproduced. Abscissa: Gaussian quantiles.]
Figure 5.2(b) Q-Q plot - log variances - grasshopper data
[Teletype plot not reproduced. Abscissa: usual residuals.]
Figure 5.2(c) R-R plot - log variances - grasshopper data
of the groups for each row is given in the Figure legend to distinguish
groups whose symbols are overprinted in Figure 5.2(a). The general
visual impression is of a band of linear trends oriented at around 45°;
this is indicative of a lack of groups x elements interaction. Figures
5.2(b) and 5.2(c) show Q-Q and R-R plots with the letters A-Y representing
the 25 groups; linearity of the plots is again evident. Each of the
plots shows groups 12(L) and 16(P), and 7(G) and 11(K), at the extremes
of the plot.
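The coordinates underlying such an I-A plot can be sketched as follows, for hypothetical data; the function name is illustrative:

```python
import numpy as np

def ia_plot_data(values):
    """Coordinates for an I-A (individual versus average) plot.
    values is (rows, groups): e.g. log variances, one row per element
    (variable) and one column per group.  Each row is plotted at
    ordinate equal to its average, with the individual group values on
    the abscissa.  The per-row left-to-right order of the groups (as
    listed in a figure legend) is also returned."""
    values = np.asarray(values, dtype=float)
    row_means = values.mean(axis=1)        # average per element
    order = np.argsort(values, axis=1)     # group order within each row
    return row_means, order
```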
Table 5.3 summarizes the fitted linear regressions and formal
analysis of variance table. The intercepts for a common slope of one
are also given. All but three of the fitted regressions account for
more than 90% of the variation; and for only two of these three are
the deviations significant at the nominal 5% level. The analysis of
variance table shows that none of the SSQs of interest reaches even
the 90% point of the appropriate χ² distributions. In particular,
the pooled residual SSQ is not significant, nor is there a significant
difference in slopes. An M-S plot of mean against slope in Figure
5.3(a) provides graphical support for the analysis and for the
subjective conclusions from the I-A plot. Again groups 12 and 16, and
group 7, are on the edges of the plot. Groups 12(L) and 16(P) have
large variances, while group 7 has small variances.
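The per-group regressions and the columns/slopes/deviations partition can be sketched as follows. This is an unweighted version: the weighting by the matrix P of the pij used in the text is omitted, and the data are hypothetical.

```python
import numpy as np

def ia_regressions(t):
    """Per-group regressions for the I-A analysis (unweighted sketch).
    t is (rows, groups): t[i, k] is e.g. the log variance of element i
    in group k.  For each group, regress t[:, k] on the row means t_i.
    and partition the (columns + rows x columns) sum of squares into
    columns, slopes and deviations."""
    t = np.asarray(t, dtype=float)
    r, g = t.shape
    row_mean = t.mean(axis=1)                    # t_i.
    grand = row_mean.mean()                      # t_..
    x = row_mean - grand
    sxx = (x ** 2).sum()
    col_mean = t.mean(axis=0)                    # t_.k
    # least-squares slope of group k's values on the row means
    slopes = (x[:, None] * (t - col_mean)).sum(axis=0) / sxx
    fitted = col_mean + np.outer(x, slopes)
    dev_ssq = ((t - fitted) ** 2).sum(axis=0)    # deviation SSQ per group
    col_ssq = r * ((col_mean - grand) ** 2).sum()
    slope_ssq = sxx * ((slopes - 1.0) ** 2).sum()  # mean slope is one
    return slopes, dev_ssq, col_ssq, slope_ssq
```

The interaction SSQ is slope_ssq + dev_ssq.sum(), so col_ssq plus these gives the C + RxC total of the tables.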
The fitted regressions and analysis of variance table were also
calculated with the weighting matrix P based on alternative derivations
of the pij, viz. the usual pooled within-groups SSQPR matrix; the
pooled within-groups SSQPR matrix for the 12 groups 4, 5, 9-11, 14,
17-21, 23, with these groups chosen subjectively from the I-A, M-S and
C-V plots; and the backtransform of the average of the arctanh correla-
tions for the 12 groups. All fitted slopes and intercepts for these
three alternatives are within 0.02 of those for the robust midmean
Table 5.3 Summary of fitted linear regressions and analysis of variance
table for the log variances for the grasshopper data
(a) fitted linear regressions
Grp   Total SSQ   t.k-Skt..   t.k-t..   Sk     r2     Deviation SSQ
1 17.0 -1.65 0.01 0.68 0.94 1.04
2 50.6 1.13 -0.03 1.21 1.00 0.09
3 58.4 0.81 -0.30 1.20 0.85 8.68
4 39.8 -1.19 0.09 0.76 0.98 0.65
5 37.8 -0.10 -0.34 1.04 0.98 0.57
6 45.0 0.83 0.03 1.15 1.00 0.02
7 24.3 -1.19 -0.63 0.89 0.94 1.45
8 51.8 1.51 0.12 1.26 0.99 0.52
9 25.6 -0.75 -0.03 0.86 0.99 0.28
10 22.4 0.15 0.09 1.01 0.98 0.44
11 32.8 -0.32 -0.49 1.03 0.99 0.41
12 28.0 0.93 0.67 1.04 0.98 0.44
13 69.0 1.21 -0.07 1.24 1.00 0.03
14 12.1 -1.06 0.04 0.79 0.83 2.03
15 11.6 -1.86 0.00 0.65 0.97 0.30
16 21.3 -0.56 0.67 0.76 0.94 1.34
17 34.6 0.05 0.20 0.97 0.93 2.50
18 24.8 -0.68 0.09 0.85 0.94 1.37
19 39.7 0.28 -0.09 1.06 0.98 0.90
20 26.1 -0.72 -0.16 0.89 0.82 4.58
21 30.2 0.51 0.39 1.02 0.99 0.21
22 29.7 1.14 -0.01 1.21 0.98 0.56
23 34.2 -0.05 -0.19 1.02 0.99 0.30
24 46.3 0.95 -0.10 1.19 1.00 0.06
25 40.7 1.33 0.12 1.22 0.99 0.24
(b) analysis of variance table

source of variation     d.f.    SSQa     SSQb
row (elements)            2     797      782
column (groups)          24      28.1     28.3
row x column             48      56.9     56.4
  slopes                 24      27.9     26.5
  deviations             24      29.0     29.9
C + RxC                  72      84.9     84.7

a correlations derived from robust midmeans
b correlations derived from usual pooled within-groups SSQPR matrix
calculations. The (C + RxC) SSQs are within 2% of the C + RxC SSQ
for the robust midmean. Table 5.3(b) gives the analysis of variance
table when the pij are derived from the usual pooled within-groups
SSQPR matrix.
Now consider the multivariate approach in Section 5.2.2. With
only three variables, there are only three canonical roots. Their
values for the analysis of the log variances are 42.7, 24.8 and 17.4,
with the sum, the trace statistic, 84.9, agreeing with that for
columns SSQ + rowxcolumns SSQ in Table 5.3. Figure 5.3(b) shows a
C-V plot for the three canonical variates. There is in general
excellent agreement with Figure 5.3(a), with the obvious exception of
group 3 which has a significant residual SSQ and relatively low r2.
Figure 5.4 shows an I-A plot of average arctanh correlation
against individual arctanh correlations. The reasonably smooth linear
trends evident in Figure 5.2(a) no longer obtain, and this is reflected
in the fitted regressions. Only ten of the groups exhibit significant
row variation within a column at even the 20% level, and only four at
the 10% level; only for four of the ten do the fitted regressions
account for more than 50% of the total variation. The analysis of
variance in Table 5.4 shows no significant interaction or column
effects. The significant deviation from regression, or residual, SSQ
when the interaction is partitioned further reflects the poor description
afforded by the linear regressions.
The fitted regressions and analysis of variance table were also
calculated based on the alternative derivations of the pij discussed
above. The SSQs for (C + RxC) are again within 2% of the value for the
robust midmean calculations (see, for example, Table 5.4). There is
close agreement between the M-S plots from the various calculations.
The three canonical roots from the canonical variate analysis of
the arctanh correlations are 40.9, 26.5 and 16.0, with sum 83.4
(c.f. Table 5.3). The C-V plot in Figure 5.5 shows no obviously
different groups, though population 16 is again on the edge of the plot.
Table 5.4 Analysis of variance table for the arctanh
correlations for the grasshopper data
Source of variation     d.f.    SSQa     SSQb
row (elements)            2       9.7      9.2
column (groups)          24      24.7     24.7
row x column             48      58.7     58.5
  slopes                 24      20.4     19.4
  deviations             24      38.3     39.1
C + RxC 72 83.4 83.2
a correlations derived from robust midmean
b correlations derived from within-groups SSQPR matrix pooled over groups 4, 5, 9-11, 14, 17-21, and 23.
The grasshopper example illustrates the simplicity of the
interpretation based on I-A, Q-Q and R-R plots and associated regression
statistics when linear regressions adequately describe the individual-
average relationship and the use of the multivariate C-V plot to
complement the univariate regression approach. No marked differences
in covariance structure exist, though group 16 may be worth closer
examination.
A second example, from a study of geographic variation in the
whelk Thais lamellosa along the west coast of North America
(C. Campbell, 1978), does exhibit differences in covariance structure.
The comparison of the covariance matrices for nine of the groups for
five variables is discussed here. Group 6 (see Chapter Four) is excluded.
[Teletype plots not reproduced. Figure 5.3(a): abscissa, estimated slope Sk. Figure 5.3(b): abscissa, canonical variate I.]
Figure 5.3(a) M-S plot - log variances - grasshopper data
Figure 5.3(b) C-V plot - log variances - grasshopper data - third canonical variate is indicated by vertical lines.
[Teletype plots not reproduced. Figure 5.4: abscissa, individual arctanh correlations. Figure 5.5: abscissa, canonical variate I.]
Figure 5.4 I-A plot - arctanh correlations - grasshopper data
Figure 5.5 C-V plot - arctanh correlations - grasshopper data - third canonical variate is indicated by vertical lines.
Group sizes are: 50, 99, 76, 37, 46, 50, 33, 28, 43. It is not the
intention to provide a biological interpretation of the data, but
rather to provide a basis for making such an interpretation. The
analyses are based on robust M-estimates of covariances, and hence
correlations (see Chapter Three).
Figure 5.6(a) shows an I-A plot for the log variances; the group
rankings are given in the Figure legend. A series of straight lines
would provide a reasonable description. Figure 5.6(b) shows a Q-Q
plot. Three groups of residuals are evident, and these correspond to
groups with visually similar positions in Figure 5.6(a), viz. H, I;
F, A, B, G; and C, E, D.
Table 5.5 summarizes the fitted linear regressions and formal
analysis of variance table for the analysis of the log variances.
Linearity of the fitted regressions holds for all but groups 2(B)
and 4(D) where departure from linearity is significant at the 1% and
0.1% levels respectively, when the deviation SSQ is compared with the
χ² distribution on 3 d.f. Only for group 4 is less than 99% of the variation
explained by the linear regression. An M-S plot of mean versus slope
is given in Figure 5.7(a). It is clear from Figures 5.6 and 5.7(a)
that groups 8(H) and 9(I), and groups 3(C) and 5(E) and possibly 4(D),
have similar variance structure. The groups 1(A), 2(B), 6(F) and
7(G) have similar mean level but differ in slope.
There are five canonical vectors with non-null roots. The roots
for the analysis of the log variances are 181, 84, 20, 11 and 3. The
sum, the trace statistic, is 299 (c.f. the SSQ due to C + RxC in
Table 5.5), to be compared with χ² on 40 d.f. A C-V plot for the first three
canonical vectors is shown in Figure 5.7(b). Again groups 3 and 5,
and groups 8 and 9, are similar.
[Teletype plots not reproduced. Figure 5.6(a): abscissa, individual log variances. Figure 5.6(b): abscissa, Gaussian quantiles.]
Figure 5.6(a) - I-A plot - log variances - Thais data - order of groups for each row, from top: 9 8 6 1 2 7 3 5 4; 8 9 1 2 6 7 3 5 4; 8 9 6 2 1 7 3 5 4; 8 9 6 2 1 7 3 5 4; 9 8 2 6 1 7 3 5 4.
Figure 5.6(b) - Q-Q plot - log variances - Thais data.
Table 5.5 Summary of fitted linear regressions and analysis of
variance table for the log variances for the Thais data
Grp    n    Total SSQ   t.k-Skt..   t.k-t..   Sk     r2     Deviation SSQ
1 50 805 0.16 -0.21 0.95 0.99 4.6
2 99 2478 -1.50 -0.19 1.17 0.99 14.6
3 76 1316 0.77 0.65 0.98 1.00 1.9
4 37 454 2.31 0.98 0.82 0.96 16.9
5 46 706 1.32 0.78 0.92 0.99 5.4
6 50 566 1.29 -0.29 0.78 0.99 7.3
7 33 699 -0.50 0.07 1.07 0.99 4.2
8 28 588 -1.81 -1.12 1.09 0.99 3.3
9 43 830 -1.43 -1.13 1.04 0.99 5.8
Source of variation     d.f.    SSQ
row (elements)            4     8243
column (groups)           8       99
row x column             32      200
  slopes                  8      136
  deviations             24       64
C + RxC                  40      299
[Teletype plots not reproduced. Figure 5.7(a): abscissa, estimated slope Sk. Figure 5.7(b): abscissa, canonical variate I.]
Figure 5.7(a) - M-S plot - log variances - Thais data.
Figure 5.7(b) - C-V plot - log variances - Thais data.
The formal analysis of variance and comparison of regressions can
be partitioned for comparisons of particular groups. Using the
conventional 5%, 1% and 0.1% levels as guidelines, specific comparisons
indicate that groups 8 and 9 do not differ significantly, nor do groups
3 and 5, though the inclusion of group 4 leads to a highly significant
RxC SSQ. Comparisons within groups 1, 2, 6 and 7 (highly significant
overall) show that groups 1 and 2 differ significantly, while the
overall comparison (C + RxC SSQ) for groups 1(A) and 6(F) is not
significant, though there is a significant difference (p < 0.01) in
slopes. The level of significance is, however, relatively weak when
compared with that between groups 1 and 2, for example, or within groups
3, 4 and 5.
When the individual versus average trends are reasonably linear
and roughly parallel, the differences in mean or, equivalently, position
of the common slope lines provide a first summary of group differences.
Differences in slope within groups of roughly equal position are
indicative of differences in relative magnitudes of the variances for
the groups, and hence of possible differences in the orientation of the
associated concentration ellipsoids.
Figure 5.8 shows an I-A plot for the arctanh correlation coefficients;
the group rankings are given in Table 5.6, together with the row means
and identity of the elements. Examination of the order and clustering
of the groups in each row of Figure 5.8 and Table 5.6 shows that groups
6(F), 9(I) and 5(E) tend, for different elements, to have correlations
somewhat lower than those for the remaining groups, while group 4(D)
sometimes has higher correlations. Groups 5, 6 and 9 rank as the
bottom three groups for virtually all rows, while groups 1 and 4 rank
in the top three for nearly all rows.
Table 5.7 summarizes the fitted regressions and formal analysis of
variance table. The fitted regressions provide a reasonable description
for only about half the groups, both when evaluated by r2 and by the
significance of the deviation SSQ. Figure 5.9(a) shows an M-S plot of
column mean against fitted slope. The groups 6, 9, 5 and 4 are at the
edges of the plot. Group 1(A) has a low r2 and so its position must be
interpreted with care.
There are eight canonical vectors with non-null roots; the canonical
roots for the arctanh correlations are 60, 56, 38, 26, 14, 10, 3, 1
with trace statistic of 208 on 80 d.f. A C-V plot for the first four
canonical vectors is shown in Figure 5.9(b).
Comparison of Figures 5.9(a) and 5.9(b) shows a number of differences.
These can in part be explained by the poor description afforded by the
linear regressions, e.g. groups 5, 6, 8, 9. However, groups 2, 3 and 4
are well fitted by linear regressions, and yet their relative
similarities differ in the two representations.
The clustering of the rows themselves in Figure 5.8 suggests that
further examination of the patterns of correlation within each matrix
may provide further insight. To show how this might proceed, Table
5.8 lists the correlation coefficients for each group. The order is
as in Figure 5.8 and Table 5.5. The first three rows in Figure 5.8
refer to correlations between v1-v3, while the fourth row refers to
correlations between v4 and v5. The bottom rows refer to correlations
of v4 and of v5 with v1-v3. The second part of Table 5.8 provides a
subjective pooling of the correlations. Correlations are pooled if
they are within 0.02 of each other, otherwise the individual values
are given; the value 0.02 is arbitrarily chosen, though it does
accommodate most correlations. From the smoothed summary, some clear
patterns emerge. Groups 1, 2, 3, 4 and 7 have correlations >0.97
[Teletype plot not reproduced. Abscissa: individual arctanh correlations.]
Figure 5.8 - I-A plot - arctanh correlations - Thais data.
Table 5.6 Order of the groups for each row of the I-A plot for
arctanh correlations for Thais data. Row means and
identity of the element are also given.
element row mean group rankings
1,2 2.73 9 5 6 8 2 3 1 7 4
2,3 2.56 6 9 5 7 1 8 2 3 4
1,3 2.31 9 6 5 7 8 2 1 3 4
4,5 2.29 9 6 8 5 2 3 1 4 7
2,5 2.08 5 9 6 3 8 2 7 1 4
1,5 2.07 5 9 6 2 3 7 1 4 8
2,4 2.06 9 5 6 8 3 2 4 7 1
3,4 2.03 5 9 8 6 3 1 4 2 7
1,4 2.01 9 5 6 2 8 3 4 7 1
3,5 1.96 5 9 6 8 3 2 7 1 4
Table 5.7 Summary of fitted linear regressions and analysis of
variance table for the arctanh correlations for the
Thais data
Grp    Total SSQ   t.k-Skt..   t.k-t..   Sk     r2     Deviation SSQ
1 29 1.41 0.13 0.47 0.45 16.2
2 132 0.14 0.08 0.97 0.88 15.4
3 116 0.02 0.15 1.05 0.89 13.0
4 86 -0.49 0.40 1.37 0.94 4.8
5 110 -0.99 -0.31 1.28 0.82 20.2
6 73 -0.12 -0.27 0.94 0.74 19.4
7 55 0.25 0.22 0.98 0.69 17.1
8 46 0.35 -0.03 0.84 0.49 23.0
9 73 -0.81 -0.46 1.14 0.92 6.1
Source of variation     d.f.    SSQ
row (elements)            9      555
column (groups)           8       43
row x column             72      165
  slopes                  8       30
  deviations             64      135
C + RxC                  80      208
[Teletype plots not reproduced. Figure 5.9(a): abscissa, estimated slope Sk. Figure 5.9(b): abscissa, canonical variate I.]
Figure 5.9(a) - M-S plot - arctanh correlations - Thais data.
Figure 5.9(b) - C-V plot - arctanh correlations - Thais data - third and fourth canonical variates are indicated by vertical and horizontal lines.
Table 5.8 Summary of correlation coefficients (x100) for Thais data,
and smoothed summary
(a) original correlations for various combinations of elements
4,5 2,5 1,5 2,4 3,4 1,4 3,5 1,2 2,3 1,3
1 99 99 99 99 99 98 99 98 99 98
2 99 99 98 98 98 97 98 98 96 97
3 99 99 99 98 98 97 98 98 98 97
4 100 100 99 99 98 98 98 98 98 98
5 98 99 98 97 94 91 94 91 94 87
6 98 97 94 97 95 96 95 95 94 94
7 100 99 98 99 98 98 98 98 98 98
8 99 99 98 97 95 99 95 95 97 96
9 98 97 94 95 91 91 91 92 88 92
(b) smoothed summary for various combinations
1,2   2,3   1,3   4,5   4 v 1-3a   5 v 1-3   4,5 v 1-3b   v1,2,3c
4 100 100 99 98 98 98 98 100
1 99 99 99 99 99 98 99 99
7 100 99 98 99 98 98 98 99
3 99 99 98 98 98 97 98 99
2 99 99 99 98 98 97 98 99
8 99 99 98 97 95 95,96,99 96 99
5 98 99 98 97 91,94,94 87,91,94 93d 98
6 98 97 94 97 95 95 95 97d
9 98 97 94 95 88,91,92 91 91d 97d
a average of r(1,4), r(2,4), r(3,4)
b average of r(1,4), r(2,4), r(3,4), r(1,5), r(2,5), r(3,5)
c average of r(1,2), r(1,3), r(2,3)
d subjective average, since range greater than 0.02.
for all pairs of variables, including those of v4 and v5 with v1-v3,
in contrast with groups 5, 6 and 9, which have lower correlations of
v4 and v5 with v1-v3. Group 8 is somewhat intermediate. Both
Figures 5.9(a) and 5.9(b) show the distinction between the two subsets.
v1-v3 are length measurements while v4 and v5 are width measurements.
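The subjective pooling rule of Table 5.8 (average a set of correlations when their range is within 0.02, otherwise keep the individual values) can be stated as a short routine; the function name is illustrative:

```python
import numpy as np

def smooth_correlations(rs, tol=0.02):
    """Pool a set of correlations into their average if their range
    does not exceed tol; otherwise return the individual values."""
    rs = np.asarray(rs, dtype=float)
    if rs.max() - rs.min() <= tol:
        return float(rs.mean())
    return rs.tolist()
```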
There are some differences in the patterns of correlations within
groups 1, 2, 3, 4 and 7, though it is doubtful whether these are of any
practical significance. It is the overall similarity of the correlations
which leads to the different graphical summaries, depending on those
features of the patterns which are emphasized by the particular linear
combinations of the elements chosen. A change in correlation from 0.98
to 0.94 or to 0.92 will have considerably more effect on the orientation
of the corresponding concentration ellipsoid than will a change from
0.99 to 0.98.
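This last claim can be checked numerically. The variances 1 and 2 below are hypothetical, chosen only so that the concentration ellipse has a well-defined orientation (with equal variances the major axis sits at 45° for any r):

```python
import numpy as np

def axis_angle(var1, var2, r):
    """Angle in degrees of the major axis of the concentration ellipse
    for a bivariate covariance matrix with the given variances and
    correlation r."""
    cov = r * np.sqrt(var1 * var2)
    C = np.array([[var1, cov], [cov, var2]])
    vals, vecs = np.linalg.eigh(C)
    major = vecs[:, np.argmax(vals)]       # eigenvector of largest root
    if major[0] < 0:                       # fix an arbitrary sign
        major = -major
    return float(np.degrees(np.arctan2(major[1], major[0])))
```

With variances 1 and 2, the rotation of the major axis between r = 0.98 and r = 0.94 is several times that between r = 0.99 and r = 0.98.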
5.4 Further Practical Aspects
The aim of the approaches presented in this Chapter is to provide
procedures for comparing covariance matrices which will indicate the
nature of any differences which exist. The I-A plots and accompanying
residual plots indicate atypical groups and/or elements. For more than
five groups and five to ten variables, the overall plots may need to be
combined with plots of, say, five groups at a time, still plotted against
the overall row means. This will offset overprinting of symbols in the
overall plot and hence provide better group identification.
The separate treatment of the variances and of the correlations
leads to ready identification of particular structure. For example,
proportional changes in variance from group to group will be reflected
in parallel straight lines in the I-A plot with different positions.
Uniform correlation matrices or a common variance for all variables
will be indicated by a null row effect.
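The additive structure being exploited here can be sketched numerically (a minimal illustration; the data are invented, and the pairing of each group's log variances with the row means across groups is the assumed form of the I-A plot):

```python
import numpy as np

# Invented log-variance matrix: rows = groups, columns = variables.
base = np.array([1.0, 2.0, 1.5, 0.5])
logvar = np.vstack([
    base,                                # group 0
    base + [0.1, 0.1, -0.1, 0.1],        # group 1: small perturbations
    base + np.log(1.5),                  # group 2: variances proportional (x 1.5)
])

row_means = logvar.mean(axis=0)          # the "average" coordinate, per variable

# The I-A plot pairs each group's values with these averages;
# a fitted line per group summarizes the group's pattern.
for g, y in enumerate(logvar):
    slope, intercept = np.polyfit(row_means, y, 1)
    print(f"group {g}: slope {slope:.2f}, intercept {intercept:+.2f}")
```

The proportional-variance group recovers the same slope as the baseline group with a shifted intercept, which is the parallel-line signature described above.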
The formal analysis of variance and multivariate comparison should
be used to complement the graphical displays and fitted regressions,
supported by commonsense interpretation of the results. When most of
the elements are similar, as in the Thais correlation example, so that
a small row effect results, the fitted regressions may be misleading.
The advantage of the I-A plot for the Thais example is that it draws
attention to the correlations for groups 5(E), 6(F) and 9(I), which
are somewhat lower than the corresponding correlations for the remaining
groups. Care must be taken when interpreting results from fitted
regressions, since the latter may be effectively determined by only
one or two observations, as will occur when all but one or two of the
variances or correlations in each group are essentially the same.
Another possibility is that the slope of the fitted line, and the
corresponding residual SSQ, may be unduly influenced by an atypical
element(s); a robust fitting procedure based on M-estimators could be
used to provide smoothed trends.
CHAPTER SIX: SHRUNKEN ESTIMATORS IN CANONICAL VARIATE ANALYSIS
This Chapter examines the role of shrunken estimation procedures
in discriminant and canonical variate analysis, and delineates
situations where they are likely to be effective. Shrunken estimators
are considered in Section 6.2 for the discriminant function and a
simple hypothetical example is discussed to illustrate the ideas.
Some asymptotic results for the mean square error (MSE) of the
coefficients in the two-group discriminant analysis are given in
Section 6.3. It is shown that for the g-inverse estimator, the MSE
of the corresponding coefficients will be less than that for the usual
solution provided the contribution to Mahalanobis 02 along the smallest
eigenvalue/vector combination is sufficiently small. Section 6.4
introduces shrunken estimators for canonical variate analysis, by
increasing the variance or standard deviation of the orthogonalized
variables from the first-stage calculations before scaling. When
the between-groups SSQ for an orthonormal variable is small and the
corresponding variance (and particularly eigenvalue) is also small,
shrinking will lead to improved stability of the canonical vectors.
Section 6.5 discusses a practical application. Some practical guidelines
are given in Section 6.6, together with some recommendations for variable
selection.
6.1 Introduction
In many applications of canonical variate analysis, the relative
magnitudes of the coefficients for the variables standardized to unit
variance by the pooled within-groups standard deviation are useful
indicators of the important variables for discrimination. If the
relative magnitudes of the standardized coefficients are to be used
in this way, stability of the coefficients is important. Stability
here refers to the sampling variation of the coefficients over repeated
samples.
Discriminant analysis, the two-group canonical variate analysis,
can be considered as a regression problem with a dummy y-variate. In
regression, the presence of high correlation between a pair of
regressor variables leads to instability in the corresponding regression
coefficients, reflected in large standard errors. Generalized ridge or
shrunken estimation procedures produce more stable estimates; Alldredge
and Gilb (1976) give a comprehensive bibliography.
In discriminant analysis, it is easy to show that high correlation
within groups when combined with between-groups correlation of the
opposite sign leads to greater group separation and a more powerful test
than when the within-groups correlation is low. However, if the
instability inherent in regression analysis with highly correlated
regressor variables carries over to discriminant analysis and thence
to canonical variate analysis, interpretation of the importance of
variables based on the relative magnitudes of the standardized coefficients
will be misleading.
6.2 Shrunken or Ridge-Type Estimators in Discriminant Analysis
If W is the pooled within-groups SSQPR matrix on n_W = n_1 + n_2 - 2
d.f., and d_x = x̄_1 - x̄_2, then the sample discriminant function is c^T x,
where

    c = n_W W^{-1} d_x .
For consistency of notation in the following derivations, the usual
sample discriminant vector will be denoted by c^U. There is some
inconsistency of notation in this Section with that in Chapter One and
in Section 6.4 of this Chapter, in that the canonical vector c in (1.10)
is standardized to have unit variance within groups. I have chosen to
retain the same notation in this Section since the essential meaning
of the vectors is unchanged.
Write

    n_W W^{-1} = U E^{-1} U^T = Σ_{i=1}^{v} u_i u_i^T / e_i ,

where the columns of U are the eigenvectors of n_W^{-1} W, and the diagonal
elements of the diagonal matrix E are the eigenvalues. Then

    c^U = U E^{-1} U^T d_x = Σ_{i=1}^{v} u_i u_i^T d_x / e_i = U E^{-1/2} a^U ,

where

    a^U = E^{-1/2} U^T d_x = E^{-1/2} d_y = d_z .

Here y = U^T x denotes the orthogonalized variables and z = E^{-1/2} U^T x
the orthonormalized variables derived from x; a^U is the vector of
discriminant function coefficients for the orthonormalized variables,
and d_y and d_z are the vectors of mean differences for the new variables.
Then a_i^U = e_i^{-1/2} d_{yi} = d_{zi}.
The Mahalanobis squared distance is

    D^2 = d_x^T n_W W^{-1} d_x = Σ_{i=1}^{v} d_{yi}^2 / e_i = Σ_{i=1}^{v} d_{zi}^2 .

When the means are virtually coincident along one of the principal
components, d_{yi} = ȳ_{1i} - ȳ_{2i} ≈ 0. The coefficient a_i^U involves
d_{yi}/e_i^{1/2}. When the eigenvalue is also small, a_i^U will be given by
the ratio of two small quantities and will be subject to wide fluctuations
from sample to sample. However, the contribution to D^2 is the square of this
quantity and so will tend to be small, even when the corresponding
eigenvalue is also small. Of course there is no reason in principle
why the largest d_{yi} should not occur with the smallest e_i.
The geometry of discriminant and canonical variate analysis suggests
that a solution to the problem of small d_{yi} and small e_i is to shrink
a_i towards the origin by increasing the scaling factor e_i for the first-
stage orthonormalization. One possibility is to consider the class of
shrunken estimators defined by

    a^{GR} = (E+H)^{-1/2} d_y   or   a_i^{GR} = d_{yi}/(e_i + h_i)^{1/2} ,    (6.1)

with H = diag(h_1, ..., h_v). These are generalized ridge estimators (see,
e.g., Goldstein and Smith, 1974, in the context of regression analyses).
In terms of the original variables, (6.1) gives the estimator

    c^{GR} = (n_W^{-1} W + U H U^T)^{-1} d_x = U (E+H)^{-1/2} a^{GR} .    (6.2)
Generalized-inverse (or principal component) estimators are
obtained by setting the smallest v-p coefficients a_i to zero, which is
equivalent to setting h_i = 0 for i ≤ p and h_i = ∞ for i > p. Hence

    c^{GI} = U_p E_p^{-1} d_{yp} = U_p E_p^{-1/2} a_p ,

where the subscript p denotes the appropriate v×p, p×p and p×1 partitions
of U, E, d_y and a^U respectively. By contrast, when h_i = h for all i,
the ordinary ridge estimator c^R = (n_W^{-1} W + hI)^{-1} d_x results.
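The three estimators follow directly from the eigenanalysis of the pooled matrix. A minimal numerical sketch (the two-group data and the choice of shrinkage constant are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented two-group data on three correlated variables
n1, n2 = 30, 30
L = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.3, 0.0],
              [0.8, 0.1, 0.2]])          # mixing matrix -> high correlations
x1 = rng.standard_normal((n1, 3)) @ L.T
x2 = rng.standard_normal((n2, 3)) @ L.T + [1.0, 0.9, 0.8]

nW = n1 + n2 - 2
W = np.cov(x1.T) * (n1 - 1) + np.cov(x2.T) * (n2 - 1)   # pooled within SSQPR
dx = x1.mean(axis=0) - x2.mean(axis=0)

e, U = np.linalg.eigh(W / nW)            # ascending eigenvalues e_i, vectors u_i
dy = U.T @ dx                            # mean differences, orthogonalized variables

c_usual = U @ (dy / e)                   # c^U = n_W W^{-1} d_x
h = np.zeros_like(e)
h[0] = 10 * e[0]                         # shrink the smallest component (illustrative)
c_ridge = U @ (dy / (e + h))             # generalized ridge estimator, as in (6.2)
c_ginv = U[:, 1:] @ (dy[1:] / e[1:])     # g-inverse: smallest component dropped

print("usual:", c_usual)
print("ridge:", c_ridge)
print("g-inv:", c_ginv)
```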
It is instructive to consider a simple hypothetical example which
illustrates the problem of instability. Assume for convenience that
there are two variables, and that they have unit variance and correlation
r within groups. Then the eigenvalues are 1 + r and 1 - r, with eigen-
vectors 2^{-1/2}(1, 1)^T and 2^{-1/2}(1, -1)^T. With differences d_{x1} and d_{x2} in
the original variables, the differences along the principal components
are (d_{x1} + d_{x2})/√2 and (d_{x1} - d_{x2})/√2.

The discriminant coefficients for the orthonormal variables are

    a_1^{GR} = (d_{x1} + d_{x2})/{√2 (1+r+h_1)^{1/2}}   and   a_2^{GR} = (d_{x1} - d_{x2})/{√2 (1-r+h_2)^{1/2}} ,

and hence

    c_1^{GR} = (d_{x1} + d_{x2})/{2(1+r+h_1)} + (d_{x1} - d_{x2})/{2(1-r+h_2)}   and
    c_2^{GR} = (d_{x1} + d_{x2})/{2(1+r+h_1)} - (d_{x1} - d_{x2})/{2(1-r+h_2)} .

Consider the situation in which d_{x1} ≈ d_{x2}, and r is high. Assume
that d_{x1} is slightly greater than d_{x2}. For h_2 = 0, |a_2^{GR}| will be
inflated by small 1 - r, while small increases in h_2, which may be
large compared with 1 - r, will cause |a_2^{GR}| to tend rapidly to zero.
In this situation c_2^{GR} will be negative for h_2 = 0, and will become
positive as h_2 increases sufficiently. If d_{x1} differs appreciably from
d_{x2}, then the corresponding contribution to D^2, (d_{x1} - d_{x2})^2/{2(1-r)},
will become important
and shrinkage will be accompanied by a marked decrease in discrimination.
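The sign change just described is easily reproduced (a sketch of the two-variable case; the values r = 0.98, d_x1 = 1.00, d_x2 = 0.95 are illustrative only):

```python
import numpy as np

r, d1, d2 = 0.98, 1.00, 0.95       # high correlation, d_x1 slightly greater than d_x2

def c_gr(h1, h2):
    """Generalized ridge discriminant coefficients, two-variable case."""
    s = (d1 + d2) / (2 * (1 + r + h1))   # first principal-component term
    t = (d1 - d2) / (2 * (1 - r + h2))   # second (near-singular) term
    return s + t, s - t                  # (c1, c2)

# c2 is negative at h2 = 0 and turns positive once h2 is large relative to 1 - r
for h2 in (0.0, 0.02, 0.05, 0.10):
    c1, c2 = c_gr(0.0, h2)
    print(f"h2 = {h2:.2f}: c1 = {c1:6.3f}, c2 = {c2:6.3f}")
```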
It seems natural to require that if the contribution of
d_{y2} = (d_{x1} - d_{x2})/√2 to D^2 is small, then the estimate a_2 should also
be small in magnitude. This would be achieved here by the generalized-
ridge estimator a_2^{GR}, with h_2 chosen such that a_2^{GR} is not sensitive to
small variations in h_2.
Now dxl will be approximately equal to dx2 if the major axes of
the concentration ellipses for the two groups are virtually collinear.
If this situation is combined with high within-groups correlation, then
the overall correlation of the combined data, ignoring group distinctions,
will also be high.
The effect of high correlation on the stability of the coefficients
from a discriminant analysis, and its relation to regression analysis,
can now be explained. When discriminant analysis is viewed as a
regression problem with a dummy response variable, the total matrix for
the combined data, namely W + t d_x d_x^T with t = (n_1^{-1} + n_2^{-1})^{-1}, determines the
discriminant coefficients; the correlations between the dummy response
and the observed (regressor) variables are essentially the d_{xi}. Problems
of instability arise in regression when r_{x_i x_j} is large, so that
r_{x_i y} ≈ r_{x_j y}. Moreover, one of the eigenvectors will contrast x_i and x_j:
there will be a relatively large positive component corresponding
to x_i, a relatively large negative component for x_j, and small
components for the remaining variables. When the regression analysis
is carried out on the principal components of the original variables,
the sum of squares corresponding to this contrast-type eigenvector
will be very small. The corresponding terms in the dummy-variable
analysis are the correlation of x_i and x_j in W + t d_x d_x^T and the
similarity of the values of d_{xi} and d_{xj}. As noted in the previous
paragraph, only when d_{xi} ≈ d_{xj} will the correlation between x_i and x_j
in the combined matrix be high.
6.3 Mean Square Error of Shrunken Estimators for Discriminant Analysis

In this Section, the asymptotic mean square error of the shrunken
estimators is derived, and is compared with that for the usual
estimator.

Assume now that x ~ N_v(μ_k, Σ) if x belongs to the kth population,
k = 1, 2, and let δ = μ_1 - μ_2. The population vector of discriminant
coefficients is given by c* = Σ^{-1} δ.

From (6.1) and (6.2) the generalized ridge estimator may be written

    c^{GR} = Σ_{i=1}^{v} u_i u_i^T d_x/(e_i + h_i) = Σ_{i=1}^{v} u_i d_{yi}/(e_i + h_i) .    (6.3)

The expectation and mean square error (MSE) of c^{GR} involve the
moments of e_i and u_i. Exact results do not appear to be available.
Anderson (1963) has derived asymptotic results, showing that the e_i
and u_i are asymptotically independent. Write Σ = Γ Λ Γ^T, where the
columns of Γ, the γ_i, are the eigenvectors of Σ, with corresponding
eigenvalues λ_i. Then the asymptotic distributions, for distinct roots,
are

    e_i ~ N(λ_i, 2λ_i^2/n_W) ,    (6.4a)

    u_i ~ N_v(γ_i, Γ_i Ω_i Γ_i^T/n_W) ,    (6.4b)

where the subscript i for Γ denotes that the ith vector of Γ is deleted,
so that Γ_i is v×(v-1), while the subscript i for Ω_i denotes that the ith
diagonal element is deleted; Ω_i is (v-1)×(v-1). The jth diagonal term
of Ω_i is λ_j λ_i/(λ_j - λ_i)^2.
From (6.4) the following can be established:

    E(u_i u_i^T) = γ_i γ_i^T + Γ_i Ω_i Γ_i^T/n_W ;    (6.5a)

    E{(e_i+h_i)^{-1}} = (λ_i+h_i)^{-1} {1 + 2λ_i^2/[n_W(λ_i+h_i)^2] + 12λ_i^4/[n_W^2(λ_i+h_i)^4]}
                      = (λ_i+h_i)^{-1} {1 + g(n_W, h_i)} , say;    (6.5b)

and

    E{(e_i+h_i)^{-2}} = (λ_i+h_i)^{-2} {1 + 6λ_i^2/[n_W(λ_i+h_i)^2] + 60λ_i^4/[n_W^2(λ_i+h_i)^4]} .    (6.5c)

The independence of W and d_x implies the independence of (e_i, u_i)
and d_x. Hence (6.5) with (6.3) gives

    E(c^{GR}) = Σ_{i=1}^{v} γ_i γ_i^T δ/(λ_i+h_i) + Σ_{i=1}^{v} γ_i γ_i^T δ g(n_W, h_i)/(λ_i+h_i)
              + Σ_{i=1}^{v} {1 + g(n_W, h_i)} Γ_i Ω_i Γ_i^T δ/[n_W(λ_i+h_i)] ,

and for large n_W,

    E(c^{GR}) → Σ_{i=1}^{v} γ_i γ_i^T δ/(λ_i+h_i) = (Σ + Γ H Γ^T)^{-1} δ ,

the population analogue of c^{GR}.
The asymptotic MSE (aMSE) of c^{GR} can be evaluated using (6.5).
After some algebra, and ignoring terms involving 1/n_W^2 and smaller,

    aMSE(c^{GR}) = (1/n_W) Σ_{i=1}^{v} [(δ^T γ_i)^2/{λ_i^2(λ_i+h_i)^3}] (2λ_i^3 - 4h_iλ_i^2 + n_W h_i^2 λ_i + n_W h_i^3)
                 + (1/n_W) Σ_{i=1}^{v} Σ_{j≠i} (δ^T γ_j)^2 λ_j λ_i (λ_j - 2λ_i)/{λ_j (λ_j-λ_i)^2 (λ_i+h_i)^2} .

With h_i = 0 for all i,

    aMSE(c^U) = (2/n_W) Σ_{i=1}^{v} (δ^T γ_i)^2/λ_i^2 + (1/n_W) Σ_{i=1}^{v} Σ_{j≠i} (δ^T γ_j)^2 (λ_j - 2λ_i)/{λ_i (λ_j-λ_i)^2} .    (6.6)

The generalized inverse result follows by setting h_i = 0 for i ≤ p and
h_i = ∞ for i > p, to give

    aMSE(c^{GI}) = (2/n_W) Σ_{i=1}^{p} (δ^T γ_i)^2/λ_i^2 + Σ_{i=p+1}^{v} (δ^T γ_i)^2/λ_i^2
                 + (1/n_W) Σ_{i=1}^{p} Σ_{j≠i} (δ^T γ_j)^2 (λ_j - 2λ_i)/{λ_i (λ_j-λ_i)^2} .    (6.7)

Comparison of (6.6) and (6.7) shows that aMSE(c^{GI}) < aMSE(c^U)
provided

    Σ_{i=p+1}^{v} (δ^T γ_i)^2/λ_i^2 ≤ {1/(n_W-2)} Σ_{i=p+1}^{v} Σ_{j≠i} (δ^T γ_j)^2 (λ_j - 2λ_i)/{λ_i (λ_j-λ_i)^2} .

In the interesting case in practice of the single small eigenvalue,
with p = v-1 and λ_v ≪ λ_i for all i < v, the condition becomes
    (δ^T γ_v)^2/λ_v ≤ {1/(n_W-2)} Σ_{i=1}^{v-1} (δ^T γ_i)^2/λ_i .    (6.8)

Denote (δ^T γ_i)^2/λ_i, the contribution to Mahalanobis Δ^2 along the
ith eigenvector, by Δ(i). Then the condition (6.8) may be written as

    Δ(v) ≤ Δ^2/(n_W-1) .    (6.9)
The requirement (6.8) or (6.9) is intuitively reasonable and
accords with practice: when the contribution to the overall
discrimination (measured by Δ^2) is small, and the eigenvalue λ_v is also
small, the MSE will be reduced by eliminating the corresponding principal
component, that is, by adopting the g-inverse or principal component solution.
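The screening rule (6.9) amounts to a one-line check (a sketch; the contributions Δ(i) are invented, ordered from largest to smallest eigenvalue):

```python
import numpy as np

def drop_smallest(contrib, n_w):
    """Rule (6.9): dropping the smallest eigenvalue/vector combination lowers
    the asymptotic MSE when its contribution to Mahalanobis distance-squared
    is at most D^2/(n_w - 1); contrib[-1] is the smallest-eigenvalue term."""
    D2 = float(np.sum(contrib))
    return contrib[-1] <= D2 / (n_w - 1)

# Invented contributions Delta(i), largest eigenvalue first
contrib = np.array([2.4, 1.1, 0.6, 0.01])
print(drop_smallest(contrib, n_w=58))    # tiny final contribution: dropping pays
```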
Unfortunately, a simple result cannot be established for the
comparison of aMSE(c^{GR}) with aMSE(c^U). Even a simplification such as
h_i = 0 for all i < v, h_v = (b-1)λ_v fails to lead to a useful comparison.
One situation in which some insight can be gained is the case v = 2 with
λ_2 ≪ λ_1, and h_1 = h_2 = bλ_2. Under the assumption that terms of order
λ_2/λ_1 can be ignored, it can be shown that aMSE(c^R) < aMSE(c^U) provided

    [(n_W-2)b^2 + (n_W-6)b - 10]/{(b+1)(b+2)} · (δ^T γ_1)^2/λ_1 ≥ (δ^T γ_2)^2/λ_2 .    (6.10)
Consider the situation in the previous Section, with two variables
with equal unit variance and correlation ρ. Let the difference between
the means for the first variable be δ (= δ_1) and that for the second be
tδ (= δ_2). The condition t = 1 implies that the means lie along the major
axis of either ellipsoid, while t = -1 implies that the means lie along
the minor axis. For this situation, λ_1 = 1+ρ, λ_2 = 1-ρ, (δ^T γ_1)^2 = δ^2(1+t)^2/2
and (δ^T γ_2)^2 = δ^2(1-t)^2/2. The condition (6.10) becomes
    (b^2 + 3b + 2)/[(n_W-2)b^2 + (n_W-6)b - 10] ≤ [(1-ρ)/(1+ρ)] · [(1+t)^2/(1-t)^2] .
As t → +1, the condition is readily satisfied, and this is just
the situation where δ_1 ≈ δ_2 and little or no information for discrimination
lies in the direction of smaller within-groups variation. However,
when t → -1 the condition will not be satisfied; here all the information
for discrimination lies in the direction of smaller within-groups
variation.
6.4 Shrunken Estimators in Canonical Variate Analysis
The argument given in Section 6.2 to illustrate the instability of
the discriminant coefficients is readily extended to more than two groups.
When the sum of squares between the means along a particular eigenvector
of the within-groups dispersion matrix is small, and the corresponding
eigenvalue is also small, instability of the coefficients will again result.
To see this, consider a hypothetical situation with two highly
correlated variables, with the remaining v-2 variables having lower
correlations. Then u_v will be approximately of the form
2^{-1/2}(*, ..., *, 1, -1)^T, where the * represent small numbers, while
u_1 ≈ v^{-1/2} 1. The remaining u_i will have small components for the (v-1)th
and vth variables. The form of the eigenvectors can be readily verified
empirically. If the between-groups SSQ e_v^{-1} u_v^T B u_v for the vth orthonormal
variable is also small, then the corresponding component a_v of the
canonical vector(s) a will also be small. Since c = U E^{-1/2} a from
(1.37), it follows that for the hypothetical situation,
    c_{v-1} ≈ (2e_v)^{-1/2} a_v + (ve_1)^{-1/2} a_1   and   c_v ≈ -(2e_v)^{-1/2} a_v + (ve_1)^{-1/2} a_1 .

But a_v/e_v^{1/2} involves the ratio of two small numbers and may be
fortuitously large if e_v is small enough, in which case the term will
dominate both c_{v-1} and c_v, giving coefficients of similar magnitude
but opposite sign.
The introduction of shrunken estimators in Section 6.2 proceeds
by increasing the eigenvalues e_i by positive shrinkage constants h_i
before the first-stage orthonormalization. The procedure for canonical
variate analysis is to find the canonical roots f^{GR} and canonical
vectors a^{GR} of

    (E + EH)^{-1/2} U^T B U (E + EH)^{-1/2} ,    (6.11)

where H = diag(h_1, ..., h_v); here the shrinkage constants are multiples
of the eigenvalues. Then

    c^{GR} = U E^{-1/2} (I+H)^{-1/2} a^{GR}

gives the shrunken estimators of the canonical vectors for the original
variables.
Consider now the formulation of canonical variate analysis in
Section 1.5. Write

    Z_H = X U (E + EH)^{-1/2} = Z (I+H)^{-1/2} .    (6.12)

The symmetric matrix in (6.11) above becomes Z_H^T Z_H. The eigen-
analysis to give f^{GR} and a^{GR}, and hence c^{GR}, is

    (Z_H^T Z_H - f^{GR} I) a^{GR} = 0

or

    Z_H^T Z_H a^{GR} = f^{GR} a^{GR} .    (6.13)
The first-stage simultaneous rotation produces new variables
z_H = (I+H)^{-1/2} E^{-1/2} U^T x = (I+H)^{-1/2} z, which corresponds to a principal
component analysis of W_H = U (E + EH) U^T.

The relationship (6.13) gives, as in Section 1.5,

    Z_H Z_H^T m^{GR} = f^{GR} m^{GR} ,   with   m^{GR} = Z_H a^{GR} ,

and so the Q-technique and singular value decomposition calculations
carry over directly.
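The two-stage calculation with multiplicative shrinkage, as in (6.11) and (6.12), can be sketched as follows (W and B are invented within- and between-groups SSQPR matrices, and the choice of h is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
Bh = rng.standard_normal((4, 3))
W = A.T @ A                          # invented within-groups SSQPR matrix
B = Bh.T @ Bh                        # invented between-groups SSQPR matrix

e, U = np.linalg.eigh(W)             # eigenvalues ascending
e, U = e[::-1], U[:, ::-1]           # reorder: e[0] largest, e[-1] smallest

h = np.array([0.0, 0.0, 5.0])        # shrink only the smallest-eigenvalue direction

scale = 1.0 / np.sqrt(e * (1.0 + h))                  # (E + EH)^{-1/2}
M = scale[:, None] * (U.T @ B @ U) * scale[None, :]   # the matrix (6.11)
f, a = np.linalg.eigh(M)
f, a = f[::-1], a[:, ::-1]           # canonical roots f^GR and vectors a^GR

c = U @ (scale[:, None] * a)         # c^GR = U E^{-1/2} (I+H)^{-1/2} a^GR
print("canonical roots:", f)
```

With h = 0 this reduces to the usual two-stage canonical variate solution, whose roots are the eigenvalues of W^{-1}B.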
In the usual canonical variate solution, the two-stage computations
give the same canonical vectors cU, irrespective of the first-stage
orthonormalization z = W^{-1/2} x used. The invariance of canonical variate
analysis under different first-stage orthonormalizations is one of its
attractive features. However, this invariance will no longer result
when shrunken estimators are introduced. There is even some ambiguity
over the way in which the shrunken estimators are introduced above.
Explicit consideration of the alternative first-stage rotations suggests
that shrunken estimators could equally well be introduced as the solution
to the eigenanalysis of Z_H^T Z_H, with Z_H now defined by

    Z_H = X U (E^{1/2} + H_2 E^{1/2})^{-1} = Z (I + H_2)^{-1} .    (6.14)

Here the shrinkage constants h_{2i} are multiples of the standard
deviations of the orthogonalized variables U^T x.
Shrunken estimators can also be introduced for the triangular
decomposition or successive orthonormalization of Section 1.5. When
two variables are highly correlated within groups, the diagonal term
u_{Tjj} of U_T corresponding to the second of the variables will be small.
From (1.46), the corresponding diagonal term of E_T will then also be
small. Instability of some of the coefficients will again result if
the corresponding diagonal term of Z_T^T Z_T is also small, where Z_T is
defined in (1.47). For then a_{Tj} will be small, and the calculation
for some components of c will again involve the ratio of two small
numbers, this time a_{Tj}/u_{Tjj}.

The proposed solution is to increase the u_{Tjj}, so that

    Z_{TH} = X U_T^{-1} (E_T + H_T E_T)^{-1} = Z_T (I + H_T)^{-1} .    (6.15)

The eigenanalysis becomes

    (I + H_T)^{-1} Z_T^T Z_T (I + H_T)^{-1} a_T^{GR} = f^{GR} a_T^{GR} ,

and hence

    c_T^{GR} = U_T^{-1} (I + H_T)^{-1} a_T^{GR} .
Again, by using the form (6.15), the calculations can also be
set out in Q-technique and singular value decomposition form.
Unfortunately, the lack of invariance goes even further. Canonical
variate analysis can be carried out with the data in standardized
form, where the standardization is based on the pooled within-groups
standard deviations. Then B and W in (1.5) and (1.3) are written in
standardized form, and the resulting canonical vectors are those for
the standardized variables, as referred to in the opening paragraph
of Section 6.1. Hence the shrunken estimates can be determined using
eigenanalyses based on any of (6.12), (6.14) or (6.15), applied to either
the original or the standardized variables; each choice gives a different
first-stage analysis, and hence different shrunken estimates.
As shown in Section 1.2, the maximization of the between- to
total SSQ for a linear combination of the variables also leads to the
usual canonical variate solution. Shrunken estimators can be defined
in a similar way to that above, since directions of near-singularity
within groups and corresponding small differences in these directions
between groups will also be reflected in the smallest eigenvalue/
vector combinations for the total SSQPR matrix. Again, the shrunken
estimators can be based on a simultaneous or a successive first-stage
rotation, and on the original or on standardized data.
6.5 Practical Aspects
To illustrate the ideas presented in the previous Sections, data
examined by Phillips, Campbell and Wilson (1973) in a study of
geographic variation in the whelk Dicathais around the coast of
Australia and New Zealand are re-analyzed. Four variables describing
the size and shape of the shell were measured, namely overall length
(L), length of spire (LS), length of aperture (LA) and width of
aperture (WA). Means, pooled standard deviations and correlations
are given in Table 6.1.
The presentation for most of this Section uses the generalized-
ridge formulation in (6.11) and (6.12). The analysis is based on
standardized variables, so that the first-stage eigenanalysis is on
the within-groups correlation matrix. The last part of this Section
briefly examines alternative formulations.
Table 6.2 lists the eigenvalues and eigenvectors for the correlation
matrix given in Table 6.1. As might be expected from the high
correlations, there are two very small eigenvalues. The smallest
accounts for less than 0.08% of the within-groups variation. The
eigenvector corresponding to the smallest eigenvalue, hereafter referred
Table 6.1 Means, pooled standard deviations and correlations for
the Dicathais data.
group 1 2 3 4 5 6 7 8 9
L 39.36a 33.39 35.54 33.86 27.43 51.73 37.47 40.11 38.43
LS 16.10 11.99 14.06 13.07 10.14 20.73 13.79 13.16 12.71
LA 28.04 25.58 25.81 25.10 20.42 37.21 28.55 31.94 30.40
WA 12.81 12.02 11.76 11.60 9.64 17.97 13.39 16.08 14.90
group 10 11 12 13 14 L LS LA WA
L 33.17 32.39 44.02 33.34 55.94 9.728b 0.967 0.983 0.975
LS 12.36 13.29 14.91 13.34 25.00 4.312 0.913 0.912
LA 24.67 23.12 33.51 24.92 38.93 6.817 0.986
WA 11.21 11.76 17.46 13.02 20.84 3.476
a values in columns 1-14 are the group means for the four variables
b diagonal elements are the pooled standard deviations; off-diagonal are the corresponding correlation coefficients.
Table 6.2 Eigenanalysis of within-groups correlation matrix, and summary of canonical variate analyses for Dicathais data

eigenvector (coefficients for L, LS, LA, WA) and e-value:
  I     0.50   0.49   0.50   0.50    3.869
  II    0.08   0.79  -0.42  -0.43    0.112
  III  -0.33   0.15  -0.56   0.75    0.016
  IV    0.79  -0.33  -0.51   0.03    0.003

                              canonical vector I          c-root    canonical vector II         c-root
aU a  (PCI-PCIV)           -0.32  -0.08  -0.93   0.17     2.13      0.09  -0.93  -0.02  -0.35   1.68
cU (h4=0)b (L, LS, LA, WA) -4.82   2.02  -2.41   5.64     2.13      4.65  -4.28  -2.12   1.51   1.68
cGI (h4=∞)                 -2.42   0.66  -3.78   5.91     2.09      0.17  -2.54   1.93   0.18   1.48

cU for reduced variable sets (hi = 0):
  -7.96   3.70   4.79   (c-root 2.03);    0.52  -2.65   2.05   (c-root 1.60)
  -0.79  -4.75   5.83   (c-root 1.98);   -2.36   2.84   0.81   (c-root 1.45)
  -1.85  -3.70   5.86   (c-root 2.06);    5.17  -5.57   0.69   (c-root 0.90)

between-group SSQ for each orthonormal variable: 0.549, 1.491, 1.872, 0.381; sum = tr(W^{-1}B) = 4.293

a standardized canonical vectors for the orthonormal variables
b h1 = h2 = h3 = 0; h4 = 0 gives the usual canonical vector cU.
[Figure 6.1 - Plots of the canonical variate coefficients (for L, LS, LA and WA, panels CV I and CV II) and canonical roots f as the smallest eigenvector/value contribution is shrunk progressively to zero, by increasing h4; the horizontal axis is the shrinkage constant (= h4 e4), running from 0 to 0.05.]
to as the smallest eigenvector, contrasts L with LS and LA.
Table 6.2 also gives the between-groups SSQ e_i^{-1} u_i^T B u_i corresponding
to each eigenvalue/vector combination, together with the coefficients
aU for the orthonormal variables, the canonical roots fU and the
canonical vectors cU for the standardized original variables. The
smallest eigenvalue/vector contains 9% of the between-groups variation.
Examination of a1 and a2 indicates that the third principal component
dominates the first canonical variate, while the second principal
component dominates the second canonical variate. The smallest
principal component makes a greater contribution to the second canonical
variate than it does to the first.
Figure 6.1 and Table 6.2 show the coefficients for the first two
canonical variates as a14 and a24 are shrunk towards zero. The changes
in the coefficients for L, LS and LA are evident, even for a small
amount of shrinking. These changes in magnitude, and indeed sign,
hold for a wide range of values of h4, with virtually no change in the
first canonical root, and only minimal change in the second canonical
root.
The changes in the coefficients can be predicted from the results
given in the early discussion of the example: the smallest eigenvalue
is very small; the corresponding eigenvector is dominated by L in
relation to LS and LA; the corresponding between-group SSQ is relatively
small; and the greater (but still small) contribution made by the
smallest principal component to the second canonical variate is
reflected in the more marked changes in the coefficients for the
second canonical vector under shrinking.
The generalized inverse coefficients c_1^{GI} and c_2^{GI} for the canonical
variates (h_4 = ∞) provide a stable basis for interpretation. The first
canonical variate reflects differences in the shape and size of the
aperture, while the second reflects differences in the relative length
of spire. This interpretation is supported by an examination of the
second and third eigenvectors: the latter contrasts LA with WA in
particular, while the former contrasts LS with LA and WA.
The lack of stability of the canonical variate coefficients for L,
LS and LA suggests that one or some of these variables may be redundant
for discrimination. None of the variables has a small standardized
coefficient so that none of them is an obvious candidate for deletion.
However, the canonical roots are little affected by shrinking, while
the coefficients for L and LA change markedly. Moreover, after
shrinking, L enters less noticeably into either canonical variate.
This suggests that an analysis based only on L, LS and WA or on LS,
LA and WA is worth examining. The canonical vectors and canonical
roots are given in Table 6.2. Note that the sum of the coefficients
for L, LS and LA is virtually the same. The interpretation of the
canonical variates is unchanged from that based on the generalized
inverse estimates.
The contribution of the eliminated variable can be assessed
formally by a multivariate analysis of covariance (see, e.g. Kshirsagar,
1972, Chapter 8). The procedure is simple: carry out a canonical
variate analysis based on all v variables, and carry out an analysis
based on the p retained variables (so here v-p = 1), determining Wilks
Λ for each analysis. Wilks Λ is defined in (1.15). The ratio Λ_v/Λ_p
gives the Wilks Λ_{v·p}, which is the basis of the statistic for assessing
the importance of the v-p variables after first including the p retained
variables (see Section 1.6). In general,

-{n_g + n_w - p - ½(v - p + n_g + 1)} log Λ_{v·p} ~ χ²_{(v-p)n_g} ,
where n_g is the between-groups d.f. When v-p = 1, as in the example,

{(n_w - p)/n_g} f_{v·p} ~ F_{n_g, n_w-p} , where f_{v·p} = (1 - Λ_{v·p})/Λ_{v·p} .
Exclusion of LA (respectively L) gives Λ_3 = 0.093 (0.102),
while Λ_4 = 0.078, so that Λ_{4·3} = 0.838 (0.765) and f_{4·3} = 0.193
(0.308). Hence, with n_w = 866 and n_g = g-1 = 13, 12.81 (20.43) is
to be compared with an F(13,863) distribution (or, approximately,
151.9 (230.6) with a χ²_13 distribution). Clearly, the result in both
cases is highly significant, suggesting that the omitted variable is
of value statistically for discrimination, in addition to the
discrimination contained in the other three.
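The test described in the preceding paragraphs reduces to a few lines of arithmetic. The sketch below is illustrative only (the function name is mine, and the chi-squared multiplier is the approximation quoted above); applied to the Dicathais lambdas it agrees with the figures in the text to within rounding.

```python
import math

def extra_variables_test(lam_v, lam_p, v, p, n_g, n_w):
    """Assess the v - p variables omitted after retaining p of the v
    variables, via the ratio of Wilks lambdas (cf. Section 1.6)."""
    lam_vp = lam_v / lam_p                 # Wilks lambda for the extra variables
    f_vp = (1.0 - lam_vp) / lam_vp
    F = (n_w - p) / n_g * f_vp             # exactly F(n_g, n_w - p) when v - p = 1
    chi2 = -(n_g + n_w - p - 0.5 * (v - p + n_g + 1)) * math.log(lam_vp)
    return lam_vp, F, chi2                 # chi2: approx., (v - p) n_g d.f.

# Dicathais example: LA omitted after retaining L, LS and WA
lam_vp, F, chi2 = extra_variables_test(0.078, 0.093, v=4, p=3, n_g=13, n_w=866)
```

With v - p = 1 the F form is exact; for larger v - p only the chi-squared approximation applies.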
It is tempting to apply the same argument to the use of the
g-inverse estimator, which eliminates the fourth principal component
rather than one of the original variables. The formal calculations
give Λ_3 = 0.097, so that Λ_{4·3} = 0.804 and f_{4·3} = 0.243. In this case,
16.12 would be compared with the F(13,863) distribution.
It has been my experience that the conclusions reached by the
formal approach outlined in the previous three paragraphs are often
misleading. With the large number of degrees of freedom within groups,
a ratio of Wilks lambdas as high as 0.97 will be adjudged significant
at the 5% level. Moreover canonical variates which contain virtually
no practical information for discrimination, even though statistically
significant, influence this ratio. For example, the significance of
the last two canonical roots for the whelk data can be assessed using
the Bartlett (Kshirsagar, 1972, equation 8.7.3) or Lawley (Kshirsagar,
1972, equation 8.7.4) chi-squared approximations; the value is
approximately 370, to be compared with the χ²_22 distribution. In this
example, and many others analyzed by me, a canonical root as small as
0.36, the value for the third root, contains no practical information,
and yet its statistical significance is marked. As in many areas of
applied statistics, distinction must be made between the practical
significance and statistical significance of a result. This is
particularly true of multivariate discrimination problems, where
statistically significant differences between the means may be
associated with considerable overlap of individual canonical variate
scores; it is the latter which usually determines the practical value
of a canonical variate.
A more realistic guide to the information lost by excluding certain
variables or principal components is given by the ratio of canonical
roots corresponding to canonical variates judged to be of practical
significance. In this example, the first two canonical variates
summarize the variation between the groups; the g-inverse solution
retains 94% of this information while that based on L, LA and WA
(LS, LA and WA) retains 95% (90%) of the information. As some further
guide, the ratio Π_{i=1}^{2} f_i for the g-inverse solution to that for the
full solution, and a similar ratio for the f_i based on three and on four
variables, can be calculated; their values are 0.912 and 0.940 (0.871)
respectively.
In this example, there is little to choose between the canonical
variate solution based on the generalized inverse solution, and that
based on the three variables L, LA and WA. The interpretation of the
nature of the group differences is similar from both.
Now consider the alternative shrunken estimator formulations in
(6.14) and (6.15) in Section 6.4, for either the original or standardized
data, using either the B/W or B/T formulation. Table 6.3 gives the
canonical roots and vectors for a selection of the analyses. For the
eigenanalysis-standardized data combination, the B/W and B/T
formulations give very similar results. For the eigenanalysis-B/W
combination using the within-groups covariance matrix, the diagonal
terms of Z are 0.51, 1.36, 1.99 and 0.44.

Table 6.3  Canonical roots and vectors for a selection of analyses for
           alternative shrunken estimator formulations for Dicathais data

                                             Canonical vector I             Canonical vector II
                                            L    LS    LA   WA  c-root     L    LS    LA   WA  c-root
Usual (h = 0)                             -4.8   2.0  -2.4  5.6  2.13     4.6  -4.3  -2.1  1.5  1.68
correlation matrix; eigenanalysis; B/W:
  h24 = ∞                                 -2.4   0.7  -3.8  5.9  2.09     0.2  -2.5   1.9  0.2  1.48
  h24 = 1                                 -3.3   1.1  -3.3  5.9  2.09     2.7  -3.6   0.1  0.5  1.53
  h24 = 5                                 -2.7   0.8  -3.6  5.9  2.09     1.0  -2.9   1.4  0.2  1.48
correlation matrix; eigenanalysis; B/T:
  h24                                     -2.8   0.9  -3.6  5.9  2.10     0.3  -2.6   1.8  0.3  1.49
covariance matrix; eigenanalysis; B/W:
  h24 = ∞                                 -4.9   2.1  -2.4  5.6  2.13    -1.5  -1.8   2.4  0.8  1.37
  h24 = 1                                 -4.9   2.1  -2.4  5.6  2.13     1.5  -3.1   0.1  1.2  1.45
correlation matrix; triangular; B/W:
  hT3 = ∞                                 -1.2   0.4  -4.8  5.9  2.02     3.5  -3.9  -0.6  0.7  1.66
  hT3 = 1                                 -2.8   1.1  -3.7  5.9  2.04     4.2  -4.1  -1.3  1.0  1.67

The smallest eigenvector is (0.56, -0.55, -0.58, 0.21)ᵀ. The coefficients a14 and a24 are
-0.005 and -0.44. The first is smaller and the second is larger
than the corresponding coefficients in Table 6.2. For the g-inverse
solution, the first canonical root is unchanged, while the second is
decreased more than in the analysis using the correlation matrix.
The change in sign for c23 is again evident. The diagonal terms of
Z for the triangular orthonormalization are 0.50, 1.63, 0.26 and
1.90. The coefficients a13 and a23 are 0.24 and -0.10. The third row
of (U⁻¹)ᵀ is (0.90, -0.15, 0.11, 0.00). Setting hT3 = ∞ results in less
change in the second canonical vector than does setting h24 = ∞. The
change in the first canonical root is more marked than for the
eigenanalysis orthonormalization. In this example, the third row of (U⁻¹)ᵀ
places much less emphasis on the second and third variables, and none
on the fourth. Shrinkage based on the triangular orthonormalization
effects little change in the coefficients for L, LS and LA for the
second vector.
6.6 Discussion
When the group separation along a particular eigenvector(s) is
small and the corresponding eigenvalue(s) is also small, a marked
improvement in the stability of the canonical variate coefficients
can be effected by shrinking the ai corresponding to the eigenvector(s)
towards zero. Instability will be largely confined to those variables
which exhibit highest loadings on this eigenvector(s). In many of the
examples considered to date, the smallest eigenvector reflects a
contrast involving two or at the most three variables, and the instability
if it exists is confined to the corresponding coefficients; in fact the
sum of the coefficients is usually virtually stable.
The practical question is to decide on the nature and degree of
shrinking to be adopted. Out of 16 data sets examined (8 published,
8 unpublished), roughly one third exhibited no instability which could
be overcome while maintaining group separation. In each of these
cases, much of the discrimination was associated with the smallest
eigenvector/value combination. When there is little discrimination
associated with the smallest eigenvector(s)/value(s), it will often be
satisfactory to use a generalized inverse approach. Shrinking the
effect of a component to zero gives results which differ little from
partial shrinking. Moreover, since the instability is usually
associated with only one or two of the principal components, there is
no advantage in drastic shrinking along the other directions - the aim
here is to improve stability and at the same time maintain group
separation.
Specific choice of the shrinkage constants remains an open
question, as indeed it does in ridge regression. The analysis of the
whelk data and similar analyses of other examples indicate that precise
determination of the constants is not necessary when the aim is to
summarize and interpret the nature of group differences. In practice,
a range of values of hi can be used, beginning with hi between 0.5 and
1, and increasing hi until stable estimates consistent with maintaining
group separation are achieved.
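As a concrete illustration of this procedure, the sketch below is one plausible rendering of a shrunken canonical variate analysis of the general form discussed here: a first-stage eigenanalysis of the within-groups matrix, shrinkage constants h_i added to the eigenvalues, and a second-stage eigenanalysis of the between-groups matrix in the shrunken orthonormal basis. It is not a transcription of (6.11), and all names are illustrative.

```python
import numpy as np

def shrunken_cva(B, W, h):
    """One plausible shrunken canonical variate estimator (illustrative):
    eigendecompose the within-groups matrix W, add the shrinkage constants
    h_i to the eigenvalues, and solve for the between-groups structure in
    the resulting basis. Setting h_i = np.inf suppresses component i (a
    generalized-inverse solution); h = 0 recovers the usual analysis."""
    e, U = np.linalg.eigh(W)              # eigenvalues, ascending
    e, U = e[::-1], U[:, ::-1]            # order components largest first
    d = e + np.asarray(h, dtype=float)    # shrunken eigenvalues e_i + h_i
    keep = np.isfinite(d)                 # h_i = inf removes the component
    T = U[:, keep] / np.sqrt(d[keep])     # shrunken orthonormalizing transform
    f, A = np.linalg.eigh(T.T @ B @ T)    # between-groups SSQ in this basis
    order = np.argsort(f)[::-1]
    f, A = f[order], A[:, order]          # canonical roots, largest first
    C = T @ A                             # shrunken canonical vectors
    return f, C
```

With the last entry of h set to ∞, the smallest within-groups component is discarded, mimicking the generalized-inverse analyses above; intermediate h values give the partial shrinking discussed in the text.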
The ideas presented here have implications for variable selection.
A set of variables with unstable coefficients often indicates that
some of the variables are redundant and can be safely eliminated.
Variables with small standardized coefficients can also be eliminated.
The variables, amongst those remaining, with the largest standardized
coefficients will then usually be the more important variables for
discrimination. Clearly, when variables are being eliminated, care
must be taken to ensure that discrimination is little affected.
Precise guidelines are difficult to set down (see the discussion in
Section 6.5); much depends on the degree of separation contained in
the remaining variables.
From the viewpoint of effective data analysis, selection of
variables based on examination of the relative magnitudes and stability
of the standardized coefficients may be preferable to a stepwise
procedure. The former focusses attention on how correlations between
the variables, and their relation to differences between the means,
affect the relative importance of the variables. In the Dicathais
example, either L or LA can be eliminated with little loss of
discrimination - and there is marked instability in the corresponding
coefficients. Both variables are involved in the smallest eigenvector,
and so the statistician must make the conscious choice, perhaps bringing
in taxonomic considerations, as to which variable is the more useful
to retain.
Practical considerations have a strong bearing on the form in
which shrunken estimators are introduced. In my experience, it is easier
to interpret the nature of the eigenvectors in the simultaneous rotation
than it is to interpret the rows of the transpose of the inverse of
the unit triangular matrix for the successive orthonormalization.
The eigenanalysis is straightforward: the smallest eigenvalue/vector
combinations are the ones which will reflect directions of near-
singularity within groups if they exist. And the variables with high
absolute loadings for those eigenvectors are the ones which together
result in the near-singularities. The diagonal terms of the triangular
matrix do not seem to reflect near-singularities to the same degree.
Moreover, all variables appear in all eigenvectors, whereas only the
first j variables appear in the jth linear combination formed by the
Choleski procedure. I have also found the eigenanalysis to be a more
sensitive indicator of near-singularity. This and ease of interpretation
lead to a preference for the eigenrotation.
There appears to be little difference between shrunken estimates
based on any of (6.11) to (6.15) and those based on an equivalent
formulation corresponding to (1.12) for the between-to-total approach.
This is intuitively reasonable, since shrunken estimators will be
effective when there is little between-groups variation for a direction
corresponding to little within-groups variation. But this suggests
that directions of near-singularity for the total matrix will then
correspond with those for the within-groups matrix. High within-groups
correlation and high overall correlation of the same sign implies that
the group means are virtually coincident for some of the minor
directions of within-group variation (see also Section 6.2); but for
the overall correlations to also be high, the latter must be similar
to minor directions of total or overall variation.
Experience with the use of shrunken estimators has mainly been
gained using (6.11). The formulation for the Choleski decomposition
and the unification of the approaches in Section 1.5 was in response
to questions of lack of invariance and of uniqueness of the eigenvector
decomposition. The choice of the shrinkage constants as multiples of
the standard deviations of the orthogonal variables from the first-stage
eigenanalysis, rather than as multiples of the variances, makes little
difference to the analysis. A more important point for the eigenvector
rotation is the choice of original or standardized data. My own
preference is for the latter, since an eigenanalysis of the correlation
matrix is more readily interpretable. However, the obvious recommenda-
tion is to carry out the analysis on both forms of the data.
In summary, the recommended approach is to shrink markedly those
components corresponding to a small eigenvalue and small contribution
to trace(W⁻¹B). Since the sum of the coefficients tends to be stable,
deletion of one or some of the variables with unstable coefficients
may suggest itself as the next step in the analysis. It should be
pointed out that in many cases there will be no advantage in shrinking;
this occurs when much of the between-group variation coincides with
the directions of the smallest eigenvectors. Whereas in regression
the presence of high correlation will almost certainly indicate
instability of the coefficients of the variables with highest loadings
in the smallest eigenvector(s), within-groups correlations as high as
0.98 may be associated with marked stability in discriminant analysis.
With high positive within-group correlation and negative between-groups
correlation, marked shrinking will never be necessary. However, with
high positive within- and between-groups correlations, marked shrinking
will nearly always be advantageous.
CHAPTER SEVEN: COMPARISON OF CANONICAL VARIATES
In this Chapter, the functional relationship formulation for
canonical variate analysis outlined in Section 1.3 is used to develop
methods for comparing canonical variate analyses for several independent
sets of data. Section 7.2 develops likelihood ratio criteria for
examining common orientation of discriminant planes; coincidence of
discriminant planes; common orientation and common dispersal of means;
coincidence and common dispersal, and overall coincidence or common
orientation and common position. An example is given in Section 7.3.
Section 7.4 discusses some practical aspects.
7.1 Introduction
A common problem in multivariate discrimination studies is the
analysis and comparison of different sets of data, where each set
relates to the same physical or biological problem. For example, in
medical studies, data on patients in various disease categories are
often available from a number of regions; the data may also be
available for different socioeconomic classes, races and so on.
The general problem considered is that of a discrimination study,
with s sets of data, each set relating to the same g groups, with the
same v variables in each group. A commonly adopted approach is to
carry out separate canonical variate analyses for each set, and also a
combined analysis based on all gs groups. Visual comparisons of the
resulting canonical vectors and of plots of canonical variate means
are then made. The problem can also be considered in the context of
multivariate analysis of variance. The total variation can be
partitioned into effects for sets, for groups, and for sets x groups.
The interaction and appropriate main effect can also be partitioned
into the group effect for each set, and, formally, into the set effect
for each group. An examination of the contribution of the sets x
groups effect, relative to the variation within-groups, will give
some indication of whether the variation between the groups is similar
for all sets. However, as discussed further in Section 7.4, by
analogy with the univariate analysis of variance and partitioning
of polynomial trends, this can lead to an insensitive comparison.
It is desirable to be able to make more detailed comparisons of the
between-group discrimination for each set. While separate canonical
variate analyses and a combined analysis go some way to achieving this,
they still require subjective comparisons of the results.
This Chapter presents a more formal approach by formulating
models in terms of structure of the group means. The simple represen-
tation in Figure 7.1 together with Table 7.1 give the sequence of
models considered here. It is assumed that the number, p, of canonical
variates of interest is specified, and that the corresponding canonical
roots are well-separated, so that the canonical vectors for each set
are well-defined. This is, in my experience, a reasonable assumption.
The term dispersal in Figure 7.1 and Table 7.1 refers to the relative
positions of the (projected) means for each set. The sequence
1(c) } 1(d) is not specified in Table 7.1; it will only be of interest
when either factor can be designated as sets or as groups. It can be
approached by interchanging the designation of sets and groups, and
carrying out parallel analyses for each designation.
Table 7.1 Representation of comparisons of models of interest.
          1(·) refers to Figure 7.1, while 1(·') denotes the same
          Figure with coincident vectors.

  individual orientation and dispersal       - 1(a)
  common orientation, individual dispersal   - 1(b)
  common orientation, common dispersal       - 1(c)
  coincidence, individual dispersal          - 1(b')
  coincidence, common dispersal              - 1(c')
  coincidence, common dispersal and position,
    or overall coincidence                   - 1(d')
Figure 7.1 - Representation of three groups for three sets.
The axes represent variables. The symbols •, ■
and ♦ represent sets. The means for each set lie on a
discriminant vector: (a) different orientation for
each vector; (b) common orientation but different
dispersal; (b') coincidence but different dispersal;
(c) common orientation and dispersal but different
positions; (c') coincidence and common dispersal;
(d') with means collapsed along dotted lines -
overall coincidence or common orientation, dispersal
and position.
7.2 Comparison of Solutions
Consider s sets of data, with g groups in each set. Assume that
an observed v×1 vector x_ktm is distributed as N_v(μ_kt, Σ), where
m = 1,...,n_kt; k = 1,...,g; and t = 1,...,s.
The general model considered in this Chapter is the generalization
of that considered in Section 1.3, viz.

μ_kt = μ_0t + Σ Ψ_t ζ_kt ,

with Ψ_t the v×p matrix of population canonical vectors for the t-th set.
Estimation under this model, and under those to be discussed
subsequently, can all be reduced to a generalized eigenanalysis of
the sort obtained when there is only one set (s = 1). The approach
adopted is to reduce the relevant part of the log likelihood to a
form analogous to (1.23) for the single-set case in Section 1.3, whence
estimates of the canonical vectors and covariance matrix follow by
analogy with (1.30), (1.26) and (1.31). Since these results for the
single-set case are used repeatedly in the remainder of this Section,
they will be reviewed very briefly here. The relevant part of the
log likelihood maximized with respect to ζ_k and μ_0 is given by

-n log|Σ| - tr Σ⁻¹S - tr Σ⁻¹B + tr Σ⁻¹PB .        (7.1)

Maximization w.r.t. Σ and Ψ gives Ψ̂ = C_p, where C_p denotes the first p
columns of C, and C satisfies

BC = SCF ;        (7.2)
here F is the diagonal matrix of canonical roots. The canonical
vectors and roots satisfy
CᵀBC = nF
and                                               (7.3)
CᵀSC = nI .

Finally,

nΣ̂ = S + B - n⁻¹ B C_p F_p⁻¹ C_pᵀ B .        (7.4)
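Computationally, (7.2) and (7.3) form a generalized symmetric eigenproblem; a minimal numerical sketch follows (the Cholesky reduction is an implementation choice, not part of the derivation, and the function name is illustrative):

```python
import numpy as np

def single_set_cva(B, S, n):
    """Solve BC = SCF with the scalings C^T S C = n I and C^T B C = n F,
    as in (7.2) and (7.3), by reducing to an ordinary symmetric
    eigenproblem via S = L L^T."""
    L = np.linalg.cholesky(S)             # S = L L^T
    Li = np.linalg.inv(L)
    f, A = np.linalg.eigh(Li @ B @ Li.T)  # orthonormal eigenvectors A
    order = np.argsort(f)[::-1]           # canonical roots, largest first
    f, A = f[order], A[:, order]
    C = np.sqrt(n) * (Li.T @ A)           # back-transform and rescale
    return C, np.diag(f)                  # vectors C and diagonal F
```

The returned C satisfies both constraints in (7.3), and taking its first p columns gives the Ψ̂ of (7.2) used in the estimate (7.4).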
For consistency of notation, the matrices and vectors required
will now be defined for general t = 1,...,s. Write
x̄_kt = n_kt⁻¹ Σ_{m=1}^{n_kt} x_ktm ,

n_·t = Σ_{k=1}^{g} n_kt ,

n_k· = Σ_{t=1}^{s} n_kt ,

x̄_·t = n_·t⁻¹ Σ_{k=1}^{g} n_kt x̄_kt ,

x̄_k· = n_k·⁻¹ Σ_{t=1}^{s} n_kt x̄_kt ,

n_T = Σ_{t=1}^{s} Σ_{k=1}^{g} n_kt ,        (7.5)

and

x̄_T = n_T⁻¹ Σ_{t=1}^{s} Σ_{k=1}^{g} n_kt x̄_kt .
Let St and Bt be the usual within-groups and between-groups
SSQPR matrices for the tth set, viz.
S_t = Σ_{k=1}^{g} Σ_{m=1}^{n_kt} (x_ktm - x̄_kt)(x_ktm - x̄_kt)ᵀ        (7.6)

and

B_t = Σ_{k=1}^{g} n_kt (x̄_kt - x̄_·t)(x̄_kt - x̄_·t)ᵀ ,

and write

S_T = Σ_{t=1}^{s} S_t .
The derivation of the main results for the models of interest
now follows the sequence given in Table 7.1.
7.2.1 Individual orientation and dispersal
Consider the model

μ_kt = μ_0t + Σ Ψ_t ζ_kt ,        (7.7)

specifying different canonical vectors for each set. Then the relevant
part of the log likelihood may be written as

-n_T log|Σ| - tr Σ⁻¹S_T - Σ_{t=1}^{s} Σ_{k=1}^{g} n_kt (x̄_kt - μ_0t - ΣΨ_tζ_kt)ᵀ Σ⁻¹ (x̄_kt - μ_0t - ΣΨ_tζ_kt) .
With

Q_t = (Ψ_tᵀ Σ Ψ_t)⁻¹ Ψ_tᵀ

and

P_t = Σ Ψ_t Q_t ,

proceed as in the single-set case to obtain

ζ̂_kt = Q_t(x̄_kt - μ_0t)

and

(I - P_t)μ̂_0t = (I - P_t)x̄_·t .

With

B_T = Σ_{t=1}^{s} B_t ,

the log likelihood maximized with respect to μ_0t and ζ_kt is

-n_T log|Σ| - tr Σ⁻¹S_T - tr Σ⁻¹B_T + tr Σ⁻¹ Σ_{t=1}^{s} P_t B_t .

Following the same steps as for the single-set case leads to

n_T Σ̂ = S_T + B_T - n_T⁻¹ Σ_{t=1}^{s} B_t Ψ̂_t F_tp⁻¹ Ψ̂_tᵀ B_t ,        (7.8)

where the diagonal matrix F_tp satisfies

Ψ̂_tᵀ B_t Ψ̂_t = n_T F_tp ,        (7.9)

while

B_t Ψ̂_t = n_T Σ̂ Ψ̂_t F_tp .        (7.10)
Now consider a particular value of t, say t = f. Then from (7.8),

n_T Σ̂ Ψ̂_f = S_T Ψ̂_f + B_f Ψ̂_f - n_T⁻¹ B_f Ψ̂_f F_fp⁻¹ Ψ̂_fᵀ B_f Ψ̂_f
            + Σ_{t≠f} B_t (I - Ψ̂_t F_tp⁻¹ Ψ̂_tᵀ B_t n_T⁻¹) Ψ̂_f .        (7.11)

From (7.9), the second and third terms on the r.h.s. of (7.11) cancel.
Write

H_f = Σ_{t≠f} B_t (I - Ψ̂_t F_tp⁻¹ Ψ̂_tᵀ B_t n_T⁻¹) ;

then from the simplified form of (7.11) and from (7.10),

B_f Ψ̂_f = (S_T + H_f) Ψ̂_f F_fp ,        (7.12)

and the vectors Ψ̂_f are scaled so that

Ψ̂_fᵀ (S_T + H_f) Ψ̂_f = n_T I .
The solution is iterative; this is discussed in Section 7.4.
7.2.2 Common orientation, individual dispersal
Common orientation or parallelism of the discriminant planes is
specified by the model
μ_kt = μ_0t + Σ Ψ ζ_kt .        (7.13)

Write

Q = (Ψᵀ Σ Ψ)⁻¹ Ψᵀ

and

P = Σ Ψ Q .
Then proceeding as in the single set case and as in Section 7.2.1
gives
ζ̂_kt = Q(x̄_kt - μ_0t)

and

(I - P)μ̂_0t = (I - P)x̄_·t .

The relevant part of the log likelihood maximized with respect to μ_0t
and ζ_kt is

-n_T log|Σ| - tr Σ⁻¹S_T - tr Σ⁻¹B_T + tr Σ⁻¹PB_T .

But this is of the same form as (7.1) for the single-set case,
with S_T, B_T and n_T replacing S, B and n. This gives Ψ̂_T as the solution of

B_T Ψ̂_T = S_T Ψ̂_T F_Tp ,        (7.14)

with

Ψ̂_Tᵀ B_T Ψ̂_T = n_T F_Tp

and

Ψ̂_Tᵀ S_T Ψ̂_T = n_T I ,

while, from (7.4),

n_T Σ̂ = S_T + B_T - n_T⁻¹ B_T Ψ̂_T F_Tp⁻¹ Ψ̂_Tᵀ B_T .        (7.15)
The vectors Ψ̂_T are again the first p columns of the matrix of
eigenvectors C_T which satisfy (7.14).
The estimated means are given by

μ̂_kt = x̄_·t + V_T Ψ̂_T Ψ̂_Tᵀ (x̄_kt - x̄_·t) ,

with V_T = n_T⁻¹S_T, and the canonical variate means are given by Ψ̂_Tᵀ x̄_kt.
The model (7.13) specifies common orientation of the discriminant
planes. However, it does not specify common dispersal of the projected
means for the canonical vectors across sets and hence does not
necessarily specify equal canonical roots.
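As a computational aside, the common-orientation solution only requires pooling the per-set matrices before a single eigenanalysis of the usual kind; a minimal sketch follows, under the scaling conventions of (7.14), with illustrative names.

```python
import numpy as np

def common_orientation(S_list, B_list, n_T, p):
    """Common canonical vectors across sets, per (7.14): pool the within-
    and between-groups matrices over the s sets and solve the single-set
    problem B_T Psi = S_T Psi F with Psi^T S_T Psi = n_T I."""
    S_T = sum(S_list)                      # pooled within-groups SSQPR
    B_T = sum(B_list)                      # pooled between-groups SSQPR
    L = np.linalg.cholesky(S_T)
    Li = np.linalg.inv(L)
    f, A = np.linalg.eigh(Li @ B_T @ Li.T)
    order = np.argsort(f)[::-1][:p]        # keep the first p canonical vectors
    Psi = np.sqrt(n_T) * (Li.T @ A[:, order])
    return Psi, f[order]                   # vectors and canonical roots
```

The estimated canonical variate means for each set are then obtained by projecting the x̄_kt onto the columns of Psi.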
7.2.3 Common orientation and common dispersal
Following the first sequence of models, viz. 1(b) to 1(c) in
Figure 7.1, the model specifying common orientation and dispersal,
and hence common canonical roots, is

μ_kt = μ_0t + Σ Ψ ζ_k .        (7.16)

Write

μ̄_0(k) = n_k·⁻¹ Σ_{t=1}^{s} n_kt μ_0t .

Differentiation of the log likelihood w.r.t. ζ_k gives

ζ̂_k = Q(x̄_k· - μ̄_0(k)) ,

and so the log likelihood maximized with respect to ζ_k may be written as

-n_T log|Σ| - tr Σ⁻¹S_T - Σ_{k=1}^{g} Σ_{t=1}^{s} n_kt (x̄_kt - μ_0t)ᵀ Σ⁻¹ (x̄_kt - μ_0t)
             + Σ_{k=1}^{g} n_k· (x̄_k· - μ̄_0(k))ᵀ Σ⁻¹ P (x̄_k· - μ̄_0(k)) .
Differentiation w.r.t. μ_0t leads to

Σ_{k=1}^{g} {n_kt Σ⁻¹(μ_0t - x̄_kt) - n_k·⁻¹ n_kt Σ⁻¹ P (Σ_{f=1}^{s} n_kf μ_0f - n_k· x̄_k·)} = 0 ,

and so the expression for μ̂_0t becomes

μ̂_0t = x̄_·t - n_·t⁻¹ P Σ_{k=1}^{g} n_kt (x̄_k· - μ̄_0(k)) .

Write

B_DG = Σ_{k=1}^{g} Σ_{t=1}^{s} n_kt (x̄_kt - μ̂_0t)(x̄_kt - μ̂_0t)ᵀ

and

B_D = Σ_{k=1}^{g} n_k· (x̄_k· - μ̄_0(k))(x̄_k· - μ̄_0(k))ᵀ .

Then the relevant part of the log likelihood becomes

-n_T log|Σ| - tr Σ⁻¹S_T - tr Σ⁻¹B_DG + tr Σ⁻¹PB_D
= -n_T log|Σ| - tr Σ⁻¹S_T - tr Σ⁻¹(B_DG - B_D) - tr Σ⁻¹B_D + tr Σ⁻¹PB_D .

But this is of the same form as (7.1) for the single-set case, with
S_T + B_DG - B_D replacing S and B_D replacing B. From (7.2), (7.3) and
(7.4), this gives the canonical vectors Ψ̂_D and canonical roots F_Dp as
the solution of

B_D Ψ̂_D = (S_T + B_DG - B_D) Ψ̂_D F_Dp ,        (7.17)
with

Ψ̂_Dᵀ (S_T + B_DG - B_D) Ψ̂_D = n_T I

and

Ψ̂_Dᵀ B_D Ψ̂_D = n_T F_Dp ,

while

n_T Σ̂ = S_T + B_DG - n_T⁻¹ B_D Ψ̂_D F_Dp⁻¹ Ψ̂_Dᵀ B_D .        (7.18)

The solution is iterative in that B_DG and B_D depend on μ̂_0t, which
depends on Ψ̂_D; this is discussed in Section 7.4.
7.2.4 Coincidence but individual dispersal
The model (7.13), and Figure 7.1(b), specifies parallel discriminant
planes with different dispersal and different position in the direction
orthogonal to the plane. The requirement of coincidence of the
discriminant planes (Figure 7.1(b')) implies that μ_0t = μ_0 + ΣΨκ_t, which
gives

μ_kt = μ_0 + ΣΨκ_t + ΣΨζ_kt ,

or

μ_kt = μ_0 + ΣΨζ'_kt , where ζ'_kt = κ_t + ζ_kt .        (7.19)

Proceeding as in the previous part of this Section gives

ζ̂_kt = Q(x̄_kt - μ_0)

and

(I - P)μ̂_0 = (I - P)x̄_T .
Write

B_{GT} = \sum_{k=1}^{g}\sum_{t=1}^{s} n_{kt}(\bar x_{kt} - \bar x_T)(\bar x_{kt} - \bar x_T)^T .

Then it is easy to show that the solution is once again as for the
single-set case, with S_T replacing S and B_GT replacing B. This gives
the canonical vectors Ψ̂_G and roots F_Gp as the solution of

B_{GT}\hat\Psi_G = S_T\hat\Psi_G F_{Gp}          (7.20)

with the usual constraints analogous to (7.3), while

n_T\hat\Sigma = S_T + B_{GT} - n_T^{-1}B_{GT}\hat\Psi_G F_{Gp}^{-1}\hat\Psi_G^T B_{GT} .          (7.21)
7.2.5 Coincidence and common dispersal

From the first form of (7.19), the requirement of coincidence and
common dispersal is specified by the model

\mu_{kt} = \mu_0 + \Sigma\Psi\kappa_t + \Sigma\Psi\xi_k .          (7.22)

Write

\kappa_{(k)} = n_{k\cdot}^{-1}\sum_{t=1}^{s} n_{kt}\kappa_t .

Differentiation of the log likelihood w.r.t. ξ_k gives

\hat\xi_k = Q(\bar x_{k\cdot} - \hat\mu_0 - \Sigma\Psi\kappa_{(k)}) .
and so the log likelihood maximized with respect to ξ̂_k may be written as

-n_T\log|\Sigma| - \operatorname{tr}\Sigma^{-1}S_T - \sum_{k=1}^{g}\sum_{t=1}^{s} n_{kt}(\bar x_{kt}-\mu_0-\Sigma\Psi\kappa_t)^T\Sigma^{-1}(\bar x_{kt}-\mu_0-\Sigma\Psi\kappa_t)
    + \sum_{k=1}^{g} n_{k\cdot}(\bar x_{k\cdot}-\mu_0-\Sigma\Psi\kappa_{(k)})^T\Sigma^{-1}P(\bar x_{k\cdot}-\mu_0-\Sigma\Psi\kappa_{(k)}) .

Differentiation w.r.t. κ_t leads to

\Sigma\Psi\hat\kappa_t = P\bar x_{\cdot t} - Pn_{\cdot t}^{-1}\sum_{k=1}^{g} n_{kt}\bar x_{k\cdot} + n_{\cdot t}^{-1}\sum_{k=1}^{g} n_{kt}\Sigma\Psi\hat\kappa_{(k)} .

Define

\kappa_\cdot = n_T^{-1}\sum_{t=1}^{s} n_{\cdot t}\kappa_t ;

then the maximum likelihood estimator for μ_0 satisfies

(I-P)\hat\mu_0 = (I-P)(\bar x_T - \Sigma\Psi\hat\kappa_\cdot) .

Write

B_{CG} = \sum_{k=1}^{g}\sum_{t=1}^{s} n_{kt}(\bar x_{kt}-\Sigma\Psi\hat\kappa_t)(\bar x_{kt}-\Sigma\Psi\hat\kappa_t)^T ,

B_{CO} = \sum_{k=1}^{g} n_{k\cdot}(\bar x_{k\cdot}-\Sigma\Psi\hat\kappa_{(k)})(\bar x_{k\cdot}-\Sigma\Psi\hat\kappa_{(k)})^T

and

B_{CT} = n_T(\bar x_T-\Sigma\Psi\hat\kappa_\cdot)(\bar x_T-\Sigma\Psi\hat\kappa_\cdot)^T .
Then the relevant part of the log likelihood maximized with
respect to ξ_k, κ_t and μ_0 may be written as

-n_T\log|\Sigma| - \operatorname{tr}\Sigma^{-1}S_T - \operatorname{tr}\Sigma^{-1}B_{CG} + \operatorname{tr}\Sigma^{-1}PB_{CO} + \operatorname{tr}\Sigma^{-1}B_{CT} - \operatorname{tr}\Sigma^{-1}PB_{CT} .

But this is of the same form as (7.1), with S_T + B_CG - B_CO
replacing S, and B_CO - B_CT replacing B. From (7.2), (7.3) and (7.4)
this gives the canonical vectors Ψ̂_C and canonical roots F_Cp as the
solution of

(B_{CO} - B_{CT})\hat\Psi_C = (S_T + B_{CG} - B_{CO})\hat\Psi_C F_{Cp}          (7.23)

with

\hat\Psi_C^T(S_T + B_{CG} - B_{CO})\hat\Psi_C = n_T I

and

\hat\Psi_C^T(B_{CO} - B_{CT})\hat\Psi_C = n_T F_{Cp} ,

while

n_T\hat\Sigma = S_T + B_{CG} - B_{CT} - n_T^{-1}(B_{CO} - B_{CT})\hat\Psi_C F_{Cp}^{-1}\hat\Psi_C^T(B_{CO} - B_{CT}) .          (7.24)

The solution is iterative in that B_CG, B_CO and B_CT depend on
Σ̂, Ψ̂ and κ̂_t.
7.2.6 Common orientation, dispersal and position
The model for common position in addition to coincidence and
common dispersal, or overall coincidence, is specified by
\mu_{kt} = \mu_0 + \Sigma\Psi\xi_k .          (7.25)
Note that while this model follows as a direct simplification
of (7.22) (or Figure 7.1(c')) in Table 7.1, comparison of (7.25)
(or Figure 7.1(d')) with (7.16) and (7.19) suggests that it could
also follow directly from either of these models. However, comparison
of the corresponding Figures 7.1(c) and 7.1(b') with Figure 7.1(d')
suggests that the sequence as specified is more logical.
The maximum likelihood estimators of ξ_k and μ_0 are easily shown
to satisfy

\hat\xi_k = Q(\bar x_{k\cdot} - \hat\mu_0)

and

(I-P)\hat\mu_0 = (I-P)\bar x_T .

Write

B_G = \sum_{k=1}^{g}\sum_{t=1}^{s} n_{kt}(\bar x_{kt} - \bar x_{k\cdot})(\bar x_{kt} - \bar x_{k\cdot})^T

and

B_0 = \sum_{k=1}^{g} n_{k\cdot}(\bar x_{k\cdot} - \bar x_T)(\bar x_{k\cdot} - \bar x_T)^T .

Then the relevant part of the log likelihood maximized with respect to
μ_0 and ξ_k may be written as

-n_T\log|\Sigma| - \operatorname{tr}\Sigma^{-1}S_T - \operatorname{tr}\Sigma^{-1}B_G - \operatorname{tr}\Sigma^{-1}B_0 + \operatorname{tr}\Sigma^{-1}PB_0 .

But this is again of the same form as (7.1) for the single-set
case, with S_T + B_G replacing S, and B_0 replacing B. Note that
S_T + B_G may be written as
\sum_{k=1}^{g}\Big\{\sum_{t=1}^{s}\sum_{m=1}^{n_{kt}}(x_{ktm}-\bar x_{kt})(x_{ktm}-\bar x_{kt})^T + \sum_{t=1}^{s} n_{kt}(\bar x_{kt}-\bar x_{k\cdot})(\bar x_{kt}-\bar x_{k\cdot})^T\Big\}
  = \sum_{k=1}^{g}\sum_{t=1}^{s}\sum_{m=1}^{n_{kt}}(x_{ktm}-\bar x_{k\cdot})(x_{ktm}-\bar x_{k\cdot})^T ,
and this is simply the sum of the within-groups SSQPR matrices for
each group for the data bulked over all sets. From (7.2), (7.3) and
(7.4), the required canonical vectors Ψ̂_0 and canonical roots F_0p are
given by

B_0\hat\Psi_0 = (S_T + B_G)\hat\Psi_0 F_{0p}          (7.26)

with

\hat\Psi_0^T(S_T + B_G)\hat\Psi_0 = n_T I

and

\hat\Psi_0^T B_0\hat\Psi_0 = n_T F_{0p} ,

while

n_T\hat\Sigma = S_T + B_G + B_0 - n_T^{-1}B_0\hat\Psi_0 F_{0p}^{-1}\hat\Psi_0^T B_0 .          (7.27)
This is an intuitively acceptable solution since the data for each group
are bulked over all sets, and a single canonical variate analysis is
then carried out to examine the variation between the g (larger bulked)
groups. In particular, with V_0 = n_T^{-1}(S_T + B_G),

\hat\mu_{kt} = \bar x_T + V_0\hat\Psi_0\hat\Psi_0^T(\bar x_{k\cdot} - \bar x_T)

and Ψ̂_0^T μ̂_kt = Ψ̂_0^T x̄_k·, which are simply the
canonical variate means for the g larger groups.
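The bulking identity above (S_T + B_G equals the sum over groups of the within-group SSQPR matrices for the data bulked over all sets) can be checked numerically; a sketch on simulated data, assuming numpy (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
g, s, v = 3, 2, 4                       # groups, sets, variables
data = {(k, t): rng.normal(size=(5 + k + t, v))
        for k in range(g) for t in range(s)}

def ssqpr(X, centre):
    """Sums-of-squares-and-products matrix about a given centre."""
    D = X - centre
    return D.T @ D

# S_T + B_G built from the set-by-set pieces ...
lhs = np.zeros((v, v))
for k in range(g):
    Xk = np.vstack([data[k, t] for t in range(s)])   # group k bulked over sets
    xk_bar = Xk.mean(axis=0)
    for t in range(s):
        X = data[k, t]
        lhs += ssqpr(X, X.mean(axis=0))              # within-(k,t) SSQPR -> S_T
        d = X.mean(axis=0) - xk_bar
        lhs += len(X) * np.outer(d, d)               # sets-within-group -> B_G

# ... equals the within-groups SSQPR for the bulked groups
rhs = np.zeros((v, v))
for k in range(g):
    Xk = np.vstack([data[k, t] for t in range(s)])
    rhs += ssqpr(Xk, Xk.mean(axis=0))
```

The agreement of `lhs` and `rhs` is exact (it is the usual one-way analysis-of-variance decomposition applied within each group).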
7.2.7 Likelihood ratio statistics
The likelihood ratio statistics for comparison of the various
models reduce to the ratio of the determinants of the estimates of
Σ under the models. The determinant can usually be factorized: for
the single-set case, the form for Σ̂ given in (7.4) may be rewritten as

n\hat\Sigma = S + n^{-1}BC_qF_q^{-1}C_q^TB = S + n^{-1}SC_qC_q^TB ,

where the C_q are the last v-p columns of C and some of the diagonal
elements of F_q will be zero if g-1 < v. Hence

|n\hat\Sigma| = |S||I+n^{-1}C_qC_q^TB| = |S||I+n^{-1}C_q^TBC_q| = |S||I+F_q| .

However, it is straightforward when programming the comparisons to
calculate Σ̂ and evaluate the determinant directly. The d.f. for the
various comparisons are given, as usual, by the difference in the
number of estimated parameters for the models. Table 7.2 summarizes
the equation numbers for the main results for each model (including
those for Σ̂) and the corresponding degrees of freedom. With
Table 7.1 (and Figure 7.1), it provides a ready reference for the
comparisons of the models.
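A sketch of the determinant-ratio comparison as it might be programmed, assuming numpy and scipy; the function name and toy matrices are illustrative only:

```python
import numpy as np
from scipy.stats import chi2

def lr_test(sigma_restricted, sigma_full, n_T, df):
    """Likelihood ratio test between two nested models for the means.

    The statistic is n_T * log(|Sigma_hat_0| / |Sigma_hat_1|), referred
    to a chi-squared distribution on df degrees of freedom, where df is
    the difference in the number of estimated parameters.
    """
    _, logdet0 = np.linalg.slogdet(sigma_restricted)
    _, logdet1 = np.linalg.slogdet(sigma_full)
    stat = n_T * (logdet0 - logdet1)
    return stat, chi2.sf(stat, df)

# toy illustration with two nearby covariance estimates
S1 = np.diag([1.0, 2.0, 3.0])          # "full" model estimate
S0 = np.diag([1.1, 2.1, 3.1])          # "restricted" model estimate
stat, p = lr_test(S0, S1, n_T=256, df=3)
```

Using `slogdet` rather than `det` avoids overflow for larger v; the restricted model always has the larger determinant, so the statistic is non-negative.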
7.3 An Example
The data set to be discussed is taken from a study of morphological
divergence and altitudinal variation in three species of grasshopper
(see also Section 5.3). Groups were sampled at a number of sites along
two independent altitudinal transects in the Snowy Mountains, New South
Wales. Details are given in Campbell and Dearn (1979). The first
canonical variate for the data for the two transects effects complete
separation of the three species; information for discrimination between
the species is restricted to this variate. The question then arises as
Table 7.2 Summary of models, with equation numbers for the model and for the estimates of Ψ̂ and of Σ̂,
          and the degrees of freedom for the model. Refer to Table 7.1 and Figure 7.1 for cross-
          reference of model details.

model                                                    text eqn   eqn for Ψ̂   eqn for Σ̂   degrees of freedom*

1(a)  μ_kt = μ_0t + ΣΨ_t ξ_kt                              7.7        7.12         7.8         sv + s(vp-p²) + sp(g-1)
1(b)  μ_kt = μ_0t + ΣΨξ_kt                                 7.13       7.14         7.15        sv + vp-p² + sp(g-1)
1(b') μ_kt = μ_0 + ΣΨξ_kt                                  7.19       7.20         7.21        v + vp-p² + sp(g-1)
1(c)  μ_kt = μ_0t + ΣΨξ_k                                  7.16       7.17         7.18        sv + vp-p² + p(g-1)
1(c') μ_kt = μ_0 + ΣΨκ_t + ΣΨξ_k                           7.22       7.23         7.24        v + vp-p² + p(s-1) + p(g-1)
1(d') μ_kt = μ_0 + ΣΨξ_k                                   7.25       7.26         7.27        v + vp-p² + p(g-1)

*there are also v(v+1)/2 d.f. for estimation of Σ common to all models
to whether the canonical variate effecting species separation is
the same for both transects. Three variables - eye width, eye depth,
and width of head - contain much of the information for discrimination.
Data for fourteen of the groups considered by Campbell and Dearn (1979)
are reexamined. Table 7.3 gives a summary.
Figure 7.2 shows a plot of the group means for the first canonical
variate for all fourteen groups, and for the analysis of each transect
separately. The overall canonical variate analysis and the separate
canonical variate analyses show the same general pattern of group
dispersal. Separation of the group means along transect II appears to
be greater than that along transect I, largely as a result of the greater
separation of the Praxibulus and K. usitatus groups. However, as
Table 7.4 shows, the canonical roots for the two transects are very
similar (2.16 vs 2.08); the slightly greater separation of the
K. cognatus means for the second transect offsets the separation of the
other two species. The slight tendency for K. cognatus groups along
transect II to have larger canonical variate means than groups along
transect I is a reflection of the slightly larger size of animals from
the second transect. This is shown in the SIZE column in Table 7.3 for
the means for the first principal component derived from the pooled
correlation matrix.
Table 7.4 lists the canonical root and canonical vector for the
various models outlined in Tables 7.1 and 7.2. The determinant of Σ̂
is also given. The similarity of the canonical roots and vectors and of
|Σ̂| for the individual orientation and the common orientation models is
evident. The additional specification of common dispersal results in a
marked increase in |Σ̂|, from 0.83 to 0.98, and a corresponding change in
the canonical root. The specification of coincidence but individual
dispersal has relatively less effect on these statistics, with |Σ̂|
Table 7.3 Summary of data for Praxibulus and Kosciuscola species

set   species       group^a  altitude    n    EW    ED    HW    SIZE^b
t=1   Praxibulus     1-5        980     20   1.57  2.22  3.17   33.34
      K. cognatus    1-16      1010     20   1.51  2.18  3.24   32.88
                     1-6       1180     20   1.53  2.17  3.31   33.28
                     1-27      1230     39   1.53  2.18  3.26   33.15
                     1-26      1390     12   1.54  2.21  3.33   33.61
                     1-50      1520     13   1.55  2.16  3.34   33.44
      K. usitatus    1-50      1520     18   1.59  2.11  3.67   34.54
t=2   Praxibulus     2-35      1040     10   1.59  2.25  3.17   33.65
      K. cognatus    2-32      1010     16   1.61  2.24  3.52   34.88
                     2-22      1180     20   1.55  2.14  3.44   33.58
                     2-23      1240     19   1.55  2.22  3.46   34.12
                     2-41      1380     20   1.53  2.20  3.37   33.54
                     2-57      1480     17   1.53  2.18  3.35   33.40
      K. usitatus    2-56      1520     12   1.61  2.18  3.88   35.71

pooled correlation matrix                    1.0   0.63  0.70
                                                   1.0   0.69
                                                         1.0
pooled within-groups standard deviations     0.049 0.074 0.121

a the first number refers to the transect (here set), the second to a
  group code used by Campbell and Dearn (1979).
b SIZE is the mean for each group of the first principal component
  (the eigenanalysis is based on the correlation matrix derived from
  the pooled covariance matrix).
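A sketch of a SIZE-type score from the pooled correlation matrix and standard deviations of Table 7.3, assuming numpy; the exact centring and scaling used in the thesis are not specified here, so the scores are illustrative rather than a reproduction of the SIZE column:

```python
import numpy as np

# pooled within-groups correlations and standard deviations for
# EW, ED, HW, as given in Table 7.3
R = np.array([[1.00, 0.63, 0.70],
              [0.63, 1.00, 0.69],
              [0.70, 0.69, 1.00]])
sd = np.array([0.049, 0.074, 0.121])

# first principal component of the pooled correlation matrix
vals, vecs = np.linalg.eigh(R)
a1 = vecs[:, np.argmax(vals)]
if a1.sum() < 0:                 # fix the arbitrary sign so loadings are positive
    a1 = -a1

def size_score(mean_vec):
    """SIZE-type score: a group mean vector standardized by the pooled
    within-groups sd, projected on the first principal component.
    (Hypothetical scaling; the thesis does not state it exactly.)"""
    return float(a1 @ (np.asarray(mean_vec) / sd))

s = size_score([1.57, 2.22, 3.17])   # group 1-5 means from Table 7.3
```

With all correlations positive, the first component has uniformly positive loadings, so it behaves as a general "size" axis, as the SIZE column suggests.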
[Figure: three aligned dot plots of population means along the FIRST
CANONICAL VARIATE axis (scale approximately -1 to 9); population
numbers as in Table 7.3.]
Figure 7.2 Canonical variate means for populations of Praxibulus (•), Kosciuscola cognatus (0) and K. usitatus (♦) for a canonical variate analysis for (a) all populations for transects I and II combined; (b) for transect I data; and (c) for transect II data.
Population numbers are given for cross-reference with Table 7.3.
increasing from 0.83 to 0.87.
The ratio of determinants for the hypothesis of common orientation,
viz. 1(b) vs 1(a) in Figure 7.1, is 1.007 (= 0.8278 : 0.8220). The
total number of observations is 256 (see Table 7.3), so that
256 log 1.007 = 1.81 is to be compared with a χ²₂ distribution; taking
the d.f. of 242 as the multiplier gives 1.7. The corresponding
probability is around 0.50. The specification of common dispersal
given common orientation, viz. 1(c) vs 1(b), gives a ratio of determinants
of 1.18 and hence a value of 42 (40 using the d.f.) to be compared with a
χ²₆ distribution; the result is highly significant. The specification of
coincidence but individual dispersal, viz. 1(b') vs 1(b), gives a ratio
of determinants of 1.05 and a value of 13.0 (12.3 using the d.f.) to be
compared with a χ²₃ distribution; the associated probability is between
0.01 and 0.001.
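The three comparisons can be reproduced from the determinants in Table 7.4 and the total sample size of 256 (error d.f. 242); a sketch assuming numpy and scipy:

```python
import numpy as np
from scipy.stats import chi2

# (determinant ratio, d.f.) for each comparison, from Table 7.4
comparisons = {
    "1(b) vs 1(a)":  (0.8278 / 0.8220, 2),
    "1(c) vs 1(b)":  (0.9761 / 0.8278, 6),
    "1(b') vs 1(b)": (0.8711 / 0.8278, 3),
}

results = {}
for name, (ratio, df) in comparisons.items():
    stat = 256 * np.log(ratio)        # using n as the multiplier
    stat_df = 242 * np.log(ratio)     # using the error d.f. instead
    results[name] = (stat, stat_df, chi2.sf(stat, df))
```

The statistics agree with the values quoted in the text (about 1.8, 42 and 13.0, or 1.7, 40 and 12.3 with the d.f. multiplier).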
The significant departure from the model for common dispersal could
be anticipated from Figure 7.2, despite the similarity of the canonical
roots for the two transects. There are obvious differences in the
nature of the separation. For common dispersal to hold, the differences
between corresponding groups from set to set must be essentially the
same.
The formal comparison of canonical variates for the two transects
suggests that the nature of the discrimination - viz. discrimination in
terms of eye shape and size relative to head shape and size - is
effectively the same for both transects. Examination of a plot of group
means and associated concentration ellipses for the three species shows
that animals from transect II have a slightly larger eye depth, relative
to head width. This results in slight separation of the (parallel)
discriminant vectors and leads to the significant result for the model
specifying coincidence.
Table 7.4 Canonical root and vector and |Σ̂| for various models given in Tables 7.1 and 7.2.
          Canonical root and vector for overall canonical variate analysis (cva) are also given.
          The variables are denoted EW, ED and HW.

                                                    |Σ̂| x 10⁷   c-root    EW     ED      HW
cva, transects I and II                                  -        4.59    7.00   12.66  -12.41
1(a) individual c-vectors^a          transect I        0.8220     2.16    5.69   13.31  -11.85
                                     transect II                  2.08    7.11   12.22  -12.11
1(b) common orientation - individual dispersal         0.8278     4.23    6.41   12.81  -11.99
1(b') coincidence and individual dispersal             0.8711     4.61    6.81   12.33  -12.08
1(c) common orientation - common dispersal^b           0.9761     3.44    7.45   11.70  -11.16
1(c') coincidence - common dispersal^c                 1.0319     3.44    7.47   11.67  -11.16
1(d') overall coincidence                              1.2892     2.79    6.39   11.70   -9.77

convergence after: a 3 iterations; b 3 iterations; c 5 iterations.
7.4 Discussion of Some Practical Aspects
As shown in Section 7.2, the calculations for all models reduce
to application of the algorithm for the single-set case, though for
three models the solution given is iterative. The single-set algorithm
is just the usual canonical variate algorithm, as discussed in Section
1.5. Hence the procedures discussed in previous Chapters, such as the
use of robust estimates of means and of covariances rather than the
usual estimates, the adoption of shrunken estimator procedures, and
indeed adoption of a full robust M-estimator approach, could all be
implemented.
The iterative solution for the model specifying individual
orientation results from the assumption of a common covariance matrix for
all sets. Initial estimates of Ψ_t and F_tp are given by the individual
canonical vectors and roots for each set. Where parallelism obtains,
convergence typically seems to take place in three to five iterations.
However, for one data set examined, where the nature of the canonical
vectors for the two sets considered differed appreciably, some thirty
iterations were required to achieve successive estimates of the canonical
roots within 10⁻⁵. For the model specifying common orientation and
dispersal, an explicit canonical variate solution follows given estimates
of the μ_0t. When common orientation holds, convergence is typically in
two to three iterations, as in the example in Section 7.3.
The comparisons developed in Section 7.2 provide a detailed
examination of the nature of the interaction, by concentrating the
examination on the subspace containing much of the information for
discrimination. For the univariate two-factor analysis of variance,
the total variation is partitioned into main effects for A and for B
and the interaction A×B. When one factor, say B, is quantitative, single
d.f. polynomial trends are often fitted. If these trends adequately
describe the variation, then parallelism of responses implies lack of
interaction. Moreover, by isolating single or few d.f. effects, the
resulting comparisons are usually more sensitive.
In this Chapter, a separate canonical variate plane is determined
for each set. If the planes do not differ in orientation (i.e. if they
are parallel), analogy with the above would suggest examining the
position of the planes, since parallelism above implies lack of
interaction. However, a fundamental difference is that the canonical
vectors define linear combinations of the variables, rather than of
powers of a single variable as for the polynomial trends. Both axes
in Figure 7.1 represent variables, rather than one axis representing a
response variable and the other levels of the quantitative factor as
in the univariate case.
It is assumed in this Chapter that the p canonical vectors adequately
describe the variation between the group means, so that the model (7.7)
is a reasonable one. With this assumption, the significance of the
set x groups interaction for multivariate data is related to the
parallelism of the discriminant planes. But parallelism does not in
itself imply lack of sets x groups interaction. Consideration of the
interaction term μ_kt - μ_k· - μ_·t + μ_·· and/or Figure 7.1 shows that the
added condition of common dispersal of means in the discriminant plane
is also required for the interaction to be null (compare Figures 7.1(b)
and 7.1(c)). This latter condition implies equality of the corresponding
canonical roots. Given common orientation and common dispersal, the
added condition of overall coincidence of the planes (viz. 1(d')) results
in a null set effect, as in analysis of covariance and in the partitioning
of interaction for the univariate case. Since the interaction and main
effect are considered in the space of the canonical vectors of interest,
a more sensitive examination of the nature of the sets x groups effect
should in general result.
CHAPTER EIGHT: CANONICAL VARIATE ANALYSIS WITH UNEQUAL COVARIANCE
MATRICES
In this Chapter, canonical variate analysis is extended for use
when the covariance matrices are not assumed to be equal. Linear
combinations of variates are derived, in Section 8.2.1, by generalizing
the weighted between-groups SSQ approach, in Section 8.2.2 by
generalizing the likelihood ratio test and the associated non-centrality
matrix, and in Section 8.2.3 by generalizing the functional relationship
formulation. Function minimization routines must be used for the solution
to two of the generalizations. Computational aspects are discussed in
Section 8.3. In Section 8.4, the usual solution and the first two
generalizations are compared via generated data for a few typical
configurations of means in a situation in which the covariance matrices
are in fact equal. The MSE of the canonical variate coefficients and
group means for the generalizations are approximately three times those
for the usual solution, due to the corresponding changes in the variances.
Section 8.5 outlines some possible approaches for comparing the
generalized solutions with the usual solution. There was not time to
consider this in detail. An example is discussed in Section 8.6.
8.1 Introduction
Ideally, bivariate scatter plots of pairs of canonical variates
should exhibit approximately uncorrelated clusters for each group, with
unit standard deviation within each group. Unfortunately, this does not
always occur; there can be marked differences both in the scatter and
in the correlations of the scores. The idea of forming linear combinations
of the variables is widely accepted in practice. Because of this and the
attendant simplicity of the representations, generalizations of the
usual solution which lead to a representation by linear combinations
of the original variables are considered.
There are some obvious heuristic approaches which can be adopted
to examine the effect of within-group heterogeneity on the description
of between-group differences by a small number of linear functions.
One approach is to compare the analyses using different estimates of the
within-groups SSQPR matrix. The latter could be the robust estimate
used in Chapter Five; the matrix calculated by pooling over all but one
of the groups; the covariance matrix for each group in turn, provided
sample sizes are large enough; or the matrix calculated by pooling subsets
of covariance matrices, weighting in various ways if sample sizes are
unequal. Another approach is to base the calculation of Mahalanobis D²
for each pair of groups on the overall pooled covariance matrix, or on
the covariance matrix calculated by pooling only the matrices for the
two groups involved. The ordinations from a principal coordinates
analysis of the matrices of D2 values can be compared (Campbell and
Mahon, 1974). The recent results of Constantine and Gower (1978) on
the analysis of asymmetric matrices suggest determining D2 twice for
each pair of groups, using each of the covariance matrices in turn as
the within-groups matrix, and examining the degree of asymmetry in the
representations. With the use of alternative estimates of the within-
groups metric, or the determination of D2 twice for each pair of groups,
a large number of ordinations can be produced; and these must then be
compared. Gower (1971, 1975) suggests comparing ordinations by
minimizing the distance between the group means in the simplified
representations (see also Sibson, 1978). To date, detailed guidelines
are not available for interpreting the resulting measure.
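The doubled-D² idea above can be sketched as follows, with toy data standing in for real group summaries (numpy assumed; names hypothetical):

```python
import numpy as np

def mahalanobis_d2(mean_a, mean_b, cov):
    """Squared Mahalanobis distance between two group means,
    for a given within-groups covariance metric."""
    d = mean_a - mean_b
    return float(d @ np.linalg.solve(cov, d))

# toy data: three groups with unequal covariance matrices
rng = np.random.default_rng(2)
means = [rng.normal(size=3) for _ in range(3)]
covs = [np.diag(rng.uniform(0.5, 2.0, size=3)) for _ in range(3)]

# D^2 computed twice for each pair, using each group's covariance
# matrix in turn as the within-groups metric
g = len(means)
D2 = np.zeros((g, g))
for i in range(g):
    for j in range(g):
        if i != j:
            D2[i, j] = mahalanobis_d2(means[i], means[j], covs[i])

asymmetry = np.abs(D2 - D2.T)    # degree of asymmetry per pair
```

With a common metric the matrix would be symmetric; the `asymmetry` entries summarize how much the choice of within-groups matrix matters for each pair, in the spirit of the Constantine and Gower (1978) analysis.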
Ideally, procedures should consider the covariance structure for
each group in relation to the position of the mean. It may be that in
certain situations, differences in covariance structure have little or
no effect on major directions of between-group variation. Possible
examples include when a group with different covariance structure is
widely separated from the remaining groups; and when only one or a few
variables result in the differences, with the variables contributing
little to the discrimination. The solutions proposed in Section 8.2
generalize the usual canonical variate solution by associating the
covariance matrix for each group with the corresponding mean in the
various formulations in Section 1.2.
The differences in covariance structure are assumed to reflect
real biological or physical differences in the underlying variability of
the populations. This is in contrast with the situation where appropriate
transformation of the data achieves reasonable equality of covariance
structure, as will occur for example when there is systematic change
of variances with means. Of course, the effect of a transformation on
any distributional assumptions must also be considered.
8.2 Generalizations of the Usual Solution
8.2.1 Weighted between-groups formulation
Let x̄_k represent the mean for the kth group, and V_k the covariance
matrix, where V_k = (n_k-1)^{-1}S_k and S_k is defined in (1.1).
Then for any linear combination c^Tx, the mean and variance for the
kth group are c^Tx̄_k and c^TV_kc.

Define a weighted mean by

\bar x_I = \Big\{\sum_{k=1}^{g} n_k(c^TV_kc)^{-1}\bar x_k\Big\}\Big/\Big\{\sum_{k=1}^{g} n_k(c^TV_kc)^{-1}\Big\} ;          (8.1)
here the weights are the inverse of the sample variances for the
linear combination.
Then, by analogy with the usual one-way analysis of variance with
known weights, an appropriate weighted between-groups SSQ to consider is

\sum_{k=1}^{g} n_k(c^TV_kc)^{-1}(c^T\bar x_k - c^T\bar x_I)^2 = \sum_{k=1}^{g} n_k(c^TV_kc)^{-1}c^TB_kc ,          (8.2)

where

B_k = (\bar x_k - \bar x_I)(\bar x_k - \bar x_I)^T .

The SSQ in (8.2) could be defined with general weights w_k replacing the
n_k, and in particular with w_k = constant. There is no direct
generalization of the Rao extension to canonical variates (see Section
1.2).
Maximization of (8.2) w.r.t. c will lead to coefficients c1 and a
maximized ratio f1; these will be termed the canonical vector and
canonical root for the generalization of the weighted between-groups
formulation. When each Vk is replaced by the pooled covariance matrix
V_P = n^{-1}W (the latter being defined in (1.3) and (1.4)), the usual
canonical variate formulation in (1.8) results.
In the usual canonical variate analysis, the assumption of common
covariance matrix leads to the pooled SSQPR matrix W as the appropriate
scaling metric for successive canonical vectors. However, for the
generalization discussed here, there is no obvious associated scaling
metric for successive vectors. The approach adopted here is to introduce
some average covariance matrix VA and to choose successive vectors ci
to maximize (8.2) subject to the constraint (c_i)^T V_A c_j = 0, j < i.
Some possible choices of VA are the usual pooled within-groups
covariance matrix VP; the inverse of the sum of the inverses of the
individual SSQPR matrices Sk (or covariance matrices Vk); and some
form of weighted average of the individual covariance matrices. Another
possibility is to backtransform the robust midmeans of the log variances
and of the arctanh correlations, and recombine to form a covariance
matrix VR. The latter is used in the example in Section 8.6.
Write vc_k = c^TV_kc and cv_k = V_kc; then the derivatives of the
weighted between-groups criterion w.r.t. the vector of coefficients c
are given by

2\sum_{k=1}^{g} n_k\,vc_k^{-1}B_kc - 2\sum_{k=1}^{g} n_k\,vc_k^{-2}(c^TB_kc)\,cv_k .          (8.3)
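A sketch of the criterion (8.2) with the weighted mean (8.1), maximized here by crude random search over the unit sphere purely for illustration; in practice a function minimization routine would be used (Section 8.3). Names are hypothetical; numpy assumed:

```python
import numpy as np

def weighted_between_ssq(c, means, covs, ns):
    """Weighted between-groups SSQ of (8.2) for coefficients c."""
    c = np.asarray(c, float)
    vck = np.array([c @ V @ c for V in covs])    # per-group variances c'V_k c
    w = ns / vck                                 # weights n_k / (c'V_k c)
    proj = np.array([c @ m for m in means])      # projected group means
    xbar_I = np.sum(w * proj) / np.sum(w)        # weighted mean (8.1)
    return float(np.sum(w * (proj - xbar_I) ** 2))

rng = np.random.default_rng(3)
means = [rng.normal(size=3) for _ in range(4)]
covs = [np.eye(3) * u for u in (0.5, 1.0, 1.5, 2.0)]
ns = np.array([10.0, 12.0, 9.0, 11.0])

best_c, best_f = None, -np.inf
for _ in range(2000):
    c = rng.normal(size=3)
    c /= np.linalg.norm(c)                       # criterion is scale-invariant
    f = weighted_between_ssq(c, means, covs, ns)
    if f > best_f:
        best_c, best_f = c, f
```

Note that the criterion is invariant to rescaling of c (the weights shrink exactly as the squared deviations grow), which is why a scaling convention such as the V_A constraint above is needed.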
8.2.2 Likelihood ratio formulation
Consider g independent N_v(μ_k, Σ_k) populations, and let x_km be a
vector of observations from a sample of size n_k from the kth population,
with m = 1,...,n_k.

The relevant part of the log likelihood may be written as

-\sum_{k=1}^{g} n_k\log|\Sigma_k| - \sum_{k=1}^{g}\operatorname{tr}\Sigma_k^{-1}S_k - \sum_{k=1}^{g} n_k(\bar x_k-\mu_k)^T\Sigma_k^{-1}(\bar x_k-\mu_k) ,          (8.4)

with S_k and x̄_k defined in (1.1) and (1.2). Then μ̂_k = x̄_k. Differentiation
of (8.4) w.r.t. Σ_k gives

\hat\Sigma_k = n_k^{-1}S_k = V_k .          (8.5)
Hence the maximized likelihood becomes

(2\pi)^{-nv/2}\prod_{k=1}^{g}|V_k|^{-n_k/2}e^{-nv/2} ,          (8.6)

with gv(v+1)/2 + gv estimated parameters.

Now consider the hypothesis of equality of mean vectors. Replace
the μ_k in (8.4) by μ and differentiate w.r.t. μ to give

\hat\mu = \Big(\sum_{k=1}^{g} n_k\hat\Sigma_k^{-1}\Big)^{-1}\sum_{k=1}^{g} n_k\hat\Sigma_k^{-1}\bar x_k .          (8.7)

Write

B_k = (\bar x_k-\hat\mu)(\bar x_k-\hat\mu)^T .          (8.8)

Then the maximum likelihood estimator of Σ_k is given by

n_k\hat\Sigma_k = S_k + n_kB_k

or

\hat\Sigma_k = V_k + B_k .          (8.9)

Hence the maximized likelihood becomes

(2\pi)^{-nv/2}\prod_{k=1}^{g}|V_k+B_k|^{-n_k/2}e^{-nv/2} ,          (8.10)

with gv(v+1)/2 + v estimated parameters.

The determinant in (8.10) may be written as

|V_k+B_k| = |V_k||I+V_k^{-1}B_k|
          = |V_k|\{1 + (\bar x_k-\hat\mu)^TV_k^{-1}(\bar x_k-\hat\mu)\} .          (8.11)
The ratio of maximized likelihoods is, from (8.6), (8.10) and
(8.11),

\prod_{k=1}^{g}\{1 + (\bar x_k-\hat\mu)^TV_k^{-1}(\bar x_k-\hat\mu)\}^{-n_k/2} .

An alternative test statistic, the generalization of Hotelling's
trace statistic, is tr(M), where

M = \sum_{k=1}^{g} n_kV_k^{-1}B_k .          (8.12)

The distribution of tr(M) is discussed further below.
The matrix M can be used in a practical way, as the analogue of the
sample non-centrality matrix in canonical variate analysis. An eigen-
analysis of the square non-symmetric matrix M will produce eigenvectors
and eigenvalues, and these will be termed the canonical vectors and
canonical roots for the generalization of the likelihood ratio
formulation.
The non-centrality matrix M involves B_k. As (8.7), (8.8) and (8.9)
show, the maximum likelihood solution is iterative. James (1954) and
Chakravarti (1966) replace Σ̂_k in (8.7) by V_k to give x̄_W in (8.13) below.
Chakravarti (1966) examines the level and power of the statistic tr(M).
This statistic differs from the usual trace statistic in that the Vk
replace Vp, and the calculations for the between-groups components
involve a weighted overall mean. Chakravarti (1966) shows that the
important modification is the replacing of the pooled sample covariance
matrix by the individual group covariance matrices. The choice of the
weighted mean is less important.
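A minimal sketch of this construction (my naming; numpy assumed): form the weighted mean x̄_W by replacing Σ̂_k in (8.7) with V_k, build M as in (8.12), and eigenanalyse the non-symmetric result.

```python
import numpy as np

def noncentrality_matrix(xbar, V, n):
    """M = sum_k n_k V_k^{-1} B_k of (8.12), with B_k taken about the
    weighted mean x_W of (8.13) (V_k used in place of Sigma_k)."""
    g, v = xbar.shape
    Z = [n[k] * np.linalg.inv(V[k]) for k in range(g)]   # Z_k = n_k V_k^{-1}
    xW = np.linalg.solve(sum(Z), sum(Z[k] @ xbar[k] for k in range(g)))
    M = sum(np.outer(Z[k] @ (xbar[k] - xW), xbar[k] - xW) for k in range(g))
    return M, xW

# M is square but not symmetric, so np.linalg.eig may return complex
# roots and vectors; the text notes this occurs only for the smaller roots.
```

For two equal-covariance groups the single non-zero root agrees with the closed form given later in this section.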
Consider now the non-centrality matrix M for a linear combination
c^T x. Then, as in Section 8.2.1, the mean and variance for the kth group
are c^T x̄_k and c^T V_k c respectively. Let the overall mean be c^T x̄_0,
with the choice of x̄_0 discussed below. Write B_k = (x̄_k - x̄_0)(x̄_k - x̄_0)^T.
Then the scalar quantity corresponding to M in (8.12) for the linear
combination c^T x is

    ∑_{k=1}^{g} n_k (c^T V_k c)^{-1} c^T B_k c .

If x̄_0 is set equal to the weighted between-groups mean x̄_I in (8.1),
the weighted between-groups quantity in (8.2) results.
Examination of generated data in Section
8.4 shows that the weighted between-groups and likelihood ratio
generalizations give very similar performance for the first vector when
the covariance matrices are assumed to be equal.
Each of the generalizations discussed in this Section has one or
more drawbacks in its practical implementation. For the non-centrality
matrix generalization, some of the eigenvalues may be complex (see
next paragraph). From experience gained with the approach to date, this
seems to occur only for the smaller roots and when the group separation
in the corresponding directions is minimal. In the usual solution,
successive vectors are chosen to be uncorrelated within groups. Here,
successive vectors are sometimes highly correlated within groups.
The roots and vectors of M need not be real. In general, M cannot
be written as the product of two square symmetric matrices. However,
for two groups, the canonical root is always real. To see this, write
    x̄_W = (∑_{k=1}^{g} n_k V_k^{-1})^{-1} ∑_{k=1}^{g} n_k V_k^{-1} x̄_k        (8.13)

and

    Z_k = n_k V_k^{-1} ,    Z_S = ∑_{k=1}^{g} Z_k .        (8.14)

When g = 2, x̄_1 - x̄_W = Z_S^{-1} Z_2 d_x, with d_x = x̄_1 - x̄_2, and
x̄_2 - x̄_W = -Z_S^{-1} Z_1 d_x. Since

    Z_S^{-1} = Z_1^{-1} (Z_1^{-1} + Z_2^{-1})^{-1} Z_2^{-1} = Z_2^{-1} (Z_1^{-1} + Z_2^{-1})^{-1} Z_1^{-1} ,

it follows that
    M = Z_1 B_1 + Z_2 B_2 = (Z_1^{-1} + Z_2^{-1})^{-1} d_x d_x^T
      = (n_1^{-1} V_1 + n_2^{-1} V_2)^{-1} d_x d_x^T .

The non-zero eigenvalue of M is given by d_x^T (n_1^{-1} V_1 + n_2^{-1} V_2)^{-1} d_x.
Note that n_k^{-1} V_k is the sample covariance matrix of the vector of
means x̄_k; James (1954) refers to the above as the multivariate analogue
of the Behrens-Fisher problem. James (1954, Section 7) develops improved
χ² approximations for tr(M). With x̄_W as in (8.13) and Z_k and Z_S as in
(8.14), write
    B_k = (x̄_k - x̄_W)(x̄_k - x̄_W)^T .        (8.15)

Then

    tr(M) = tr ∑_{k=1}^{g} n_k V_k^{-1} B_k = ∑_{k=1}^{g} (x̄_k - x̄_W)^T Z_k (x̄_k - x̄_W) .

The improved χ² approximation is of the form χ²_{v(g-1)} (h_0 + h_1 χ²_{v(g-1)}),
where

    h_0 = 1 + {2v(g-1)}^{-1} ∑_{k=1}^{g} (n_k - 1)^{-1} {tr(I - Z_S^{-1} Z_k)}²

and

    h_1 = [2v(g-1){2v(g-1)+2}]^{-1} [ ∑_{k=1}^{g} (n_k - 1)^{-1} tr{(I - Z_S^{-1} Z_k)²}
          + 2 ∑_{k=1}^{g} (n_k - 1)^{-1} {tr(I - Z_S^{-1} Z_k)}² ] .
For two groups, this becomes

    h_0 = 1 + (2v)^{-1} ∑_{k=1}^{2} (n_k - 1)^{-1} {tr(V_S^{-1} n_k^{-1} V_k)}²

and

    h_1 = {2v(2v+2)}^{-1} ∑_{k=1}^{2} (n_k - 1)^{-1} [ tr{(V_S^{-1} n_k^{-1} V_k)²} + {tr(V_S^{-1} n_k^{-1} V_k)}² ] ,

where

    V_S = ∑_{k=1}^{2} n_k^{-1} V_k .
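To make the calculation concrete, h_0 and h_1 for general g can be coded directly from the expressions above. The expressions are as reconstructed here and are worth checking against James (1954) before serious use; numpy is assumed and the naming is mine.

```python
import numpy as np

def james_h(V, n):
    """h0 and h1 for the improved chi-squared approximation to tr(M):
    reject equality of means when tr(M) > q*(h0 + h1*q), q being the
    upper-alpha point of chi-squared on v(g-1) d.f."""
    g, v = len(n), V[0].shape[0]
    Z = [n[k] * np.linalg.inv(V[k]) for k in range(g)]
    ZS = sum(Z)
    A = [np.eye(v) - np.linalg.solve(ZS, Z[k]) for k in range(g)]  # I - Z_S^{-1} Z_k
    r = v * (g - 1)
    t1 = sum(np.trace(A[k] @ A[k]) / (n[k] - 1) for k in range(g))
    t2 = sum(np.trace(A[k]) ** 2 / (n[k] - 1) for k in range(g))
    h0 = 1.0 + t2 / (2.0 * r)
    h1 = (t1 + 2.0 * t2) / (2.0 * r * (2.0 * r + 2.0))
    return h0, h1
```

For equal covariance matrices and equal group sizes h_0 is close to one and h_1 is small, so the adjustment to the χ² critical value is mild.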
8.2.3 Functional relationship formulation
Consider again g independent v-variate N_v(μ_k, Σ_k) populations.
Assume that the v×1 vectors of population means μ_k are specified by
the model

    μ_k = μ_0 + Σ_k Ψ ζ_k ,        (8.16)
where Ψ is again the v×p matrix of population canonical vectors and the
ζ_k are p×1 vectors of coefficients. The model (8.16) associates each
population mean with its own covariance matrix through the postulated
multivariate Gaussian form and through the direct association of each
group with its own basis vectors. This formulation of the model is not
as intuitively acceptable as that in (1.13). A more obvious
generalization of the model in (1.13) would be to replace Σ_k in (8.16)
by some fixed scaling metric Σ_AVE, with the vectors chosen so that
Ψ^T Σ_AVE Ψ = I. However, the resulting maximum likelihood estimators
are considerably more complicated than those given below. Because of
its relative algebraic tractability, the formulation in (8.16) is
considered here. The derivation parallels that given in
Section 1.3.
The relevant part of the log likelihood is

    -½ ∑_{k=1}^{g} n_k log|Σ_k| - ½ ∑_{k=1}^{g} tr Σ_k^{-1} S_k
    - ½ ∑_{k=1}^{g} n_k (x̄_k - μ_0 - Σ_k Ψ ζ_k)^T Σ_k^{-1} (x̄_k - μ_0 - Σ_k Ψ ζ_k) .

Differentiation w.r.t. ζ_k gives

    ζ̂_k = (Ψ^T Σ_k Ψ)^{-1} Ψ^T (x̄_k - μ_0) .

Write

    P_k = Σ_k Ψ (Ψ^T Σ_k Ψ)^{-1} Ψ^T .        (8.17)

Since (I - P_k)^T Σ_k^{-1} (I - P_k) = Σ_k^{-1} (I - P_k), the log likelihood may be
written as

    -½ ∑_{k=1}^{g} n_k log|Σ_k| - ½ ∑_{k=1}^{g} tr Σ_k^{-1} S_k
    - ½ ∑_{k=1}^{g} n_k (x̄_k - μ_0)^T Σ_k^{-1} (I - P_k) (x̄_k - μ_0) .

Differentiation w.r.t. μ_0 gives

    { ∑_{k=1}^{g} n_k Σ_k^{-1} (I - P_k) } μ̂_0 = ∑_{k=1}^{g} n_k Σ_k^{-1} (I - P_k) x̄_k .        (8.18)

Write

    B_k = (x̄_k - μ̂_0)(x̄_k - μ̂_0)^T .        (8.19)
Then the log likelihood may be written as

    -½ ∑_{k=1}^{g} n_k log|Σ_k| - ½ ∑_{k=1}^{g} tr Σ_k^{-1} S_k
    - ½ ∑_{k=1}^{g} n_k tr Σ_k^{-1} B_k + ½ ∑_{k=1}^{g} n_k tr Σ_k^{-1} P_k B_k .        (8.20)
Differentiation w.r.t. Σ_k gives

    n_k Σ̂_k = S_k + n_k B_k - Σ̂_k Ψ (Ψ^T Σ̂_k Ψ)^{-1} Ψ^T n_k B_k Ψ (Ψ^T Σ̂_k Ψ)^{-1} Ψ^T Σ̂_k .        (8.21)

Pre- and post-multiplication of (8.21) by Ψ^T and by Ψ gives

    Ψ^T Σ̂_k Ψ = Ψ^T n_k^{-1} S_k Ψ = Ψ^T V_k Ψ ,        (8.22)

while postmultiplication of (8.21) by Ψ and substitution of (8.22) gives

    Σ̂_k Ψ = (V_k + B_k) Ψ {Ψ^T (V_k + B_k) Ψ}^{-1} Ψ^T V_k Ψ .        (8.23)

Write

    T_k = V_k + B_k

and substitute (8.22), (8.23) and its transpose in (8.21) to obtain

    Σ̂_k = T_k - T_k Ψ (Ψ^T T_k Ψ)^{-1} Ψ^T B_k Ψ (Ψ^T T_k Ψ)^{-1} Ψ^T T_k .        (8.24)

The determinant of Σ̂_k in (8.24) can be written as

    |Σ̂_k| = |T_k| |Ψ^T V_k Ψ| / |Ψ^T T_k Ψ| .

Unfortunately, Σ̂_k itself must be calculated, since it occurs in (8.18).
Hence the computational savings implicit in the expression for |Σ̂_k|
cannot be realized.
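A small numerical check of (8.24), and of the determinant factorization |Σ̂_k| = |T_k||Ψ^T V_k Ψ|/|Ψ^T T_k Ψ| that follows from it, can be sketched as below (names mine; numpy assumed).

```python
import numpy as np

def sigma_hat(Vk, Bk, Psi):
    """Sigma_k from (8.24), with T_k = V_k + B_k and Psi of order v x p."""
    Tk = Vk + Bk
    G = Psi.T @ Tk @ Psi                         # Psi' T_k Psi
    TP = Tk @ Psi
    C = np.linalg.solve(G, Psi.T @ Bk @ Psi)     # (Psi'T_kPsi)^{-1} Psi'B_kPsi
    return Tk - TP @ C @ np.linalg.solve(G, TP.T)

def logdet_sigma_hat(Vk, Bk, Psi):
    """log|Sigma_k| from the factorization, without forming Sigma_k."""
    Tk = Vk + Bk
    return (np.linalg.slogdet(Tk)[1]
            + np.linalg.slogdet(Psi.T @ Vk @ Psi)[1]
            - np.linalg.slogdet(Psi.T @ Tk @ Psi)[1])
```

The second routine is the form that would give the computational saving were Σ̂_k itself not needed in (8.18).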
To obtain the maximum likelihood estimator of Ψ, it is necessary to
introduce some conditions or constraints on Ψ. A natural one in the
context of canonical variate analysis is to choose some suitable metric
V_R and require that Ψ^T V_R Ψ = I. In fact, as discussed below, the
choice V_R = I will suffice for the actual optimization.
So differentiate (8.20) with respect to Ψ, subject to the
constraint that Ψ^T Ψ = I, and let M be a symmetric p×p matrix of Lagrange
multipliers; the resulting likelihood equation is

    - ∑_{k=1}^{g} (Ψ^T Σ̂_k Ψ)^{-1} Ψ^T n_k B_k Ψ (Ψ^T Σ̂_k Ψ)^{-1} Ψ^T Σ̂_k
    + ∑_{k=1}^{g} (Ψ^T Σ̂_k Ψ)^{-1} Ψ^T n_k B_k + M Ψ^T = 0 .        (8.25)

Postmultiplication by Ψ gives M = 0, so that, on using (8.22) and
(8.23), the likelihood equation becomes

    ∑_{k=1}^{g} (Ψ^T V_k Ψ)^{-1} Ψ^T n_k B_k Ψ (Ψ^T T_k Ψ)^{-1} Ψ^T T_k
    - ∑_{k=1}^{g} (Ψ^T V_k Ψ)^{-1} Ψ^T n_k B_k = 0 ,        (8.26)
but no further effective simplification appears possible.
Consider again the orthogonality constraint Ψ^T Ψ = I introduced
above. If instead the required condition is Ψ_0^T V_R Ψ_0 = I, write
Ψ^T V_R Ψ = Q_R, say, with eigenanalysis Q_R = U_Q E_Q U_Q^T; then
E_Q^{-1/2} U_Q^T Ψ^T V_R Ψ U_Q E_Q^{-1/2} = I, and hence Ψ_0 = Ψ U_Q E_Q^{-1/2}.
The numerical maximization of the log likelihood in (8.20) reduces
to choosing Ψ to minimize ∑_{k=1}^{g} n_k log|Σ̂_k|, with Σ̂_k given by (8.24). The
derivative of the log likelihood is given by (8.41). The maximized
likelihood is invariant under orthogonal (but not orthonormal) rotation
of the original variables (see Section 8.3). This results in a considerable
advantage computationally, since the likelihood can then be maximized
for each vector ψ_i conditionally on the previous ψ_1, ..., ψ_{i-1}. Further
discussion is given in Section 8.3.
If the Σ_k are assumed known, the functional relationship
formulation reduces to choosing Ψ to maximize the last term of (8.20).
From (8.17), this can be written as

    tr ∑_{k=1}^{g} n_k (Ψ^T Σ_k Ψ)^{-1} Ψ^T B_k Ψ
    = tr ∑_{k=1}^{g} n_k (Ψ^T Σ_k Ψ)^{-1} Ψ^T (x̄_k - μ_0)(x̄_k - μ_0)^T Ψ .

When p = 1, this becomes ∑_{k=1}^{g} n_k (ψ^T Σ_k ψ)^{-1} ψ^T (x̄_k - μ_0)(x̄_k - μ_0)^T ψ.
If the sample V_k replace the Σ_k and x̄_I replaces μ_0, the weighted
between-groups quantity in (8.2) results.
If the Σ_k are assumed known and equal to V_P = n_W^{-1} W, (8.25) reduces
to the usual canonical variate solution in (1.10).
8.3 Computation of the Generalized Solutions
The likelihood ratio-non-centrality matrix generalization requires
the eigenvalues and vectors of an unsymmetric matrix. I have used the
NAG routine F02AGF for the eigenanalysis.
The weighted between-groups and functional relationship generalizations
require explicit use of function minimization/maximization routines.
I have experienced considerable difficulty in developing an effective
overall computing procedure, and this has hampered adequate evaluation
of the various generalizations. The current program has an option for
using either the Simplex procedure described by Nelder and Mead (1965)
as implemented in the NAG routine E04CCF, or Powell's hybrid steepest-descent/quasi-Newton
method as implemented in the NAG routine E04DCF.
The Simplex procedure is generally considered to be inefficient when
compared with gradient methods when more than a few variables are
involved. It does, however, have the advantage of being relatively
insensitive to poor initial estimates.
The functional relationship formulation encompasses the situations
where no restrictions are placed on the mean vectors (p = v) and where
all mean vectors are assumed equal (p = 0). The maximized likelihood
for the functional relationship with 1 ≤ p < v will lie between those
for the usual unrestricted and restricted hypotheses in (8.6) and (8.10).
The function maximization can be carried out on the original
variables, perhaps standardized with respect to some average covariance
matrix V_A to unit variance, or on orthogonal or orthonormal
transformations of the original or standardized variables. The effect
of the various possible transformations on the maximized likelihood
can be found by considering the effect on |Σ̂_k|. Let V_A = S_A R_A S_A be
the decomposition of V_A in terms of the diagonal matrix of standard
deviations S_A and the correlation matrix R_A, and let V_A = U_A E_A U_A^T
and R_A = U_R E_R U_R^T be eigenanalyses of V_A and of R_A. It is
straightforward to establish the following table of determinants for the
possible transformations; each entry gives the determinant in terms of
the original determinant |Σ̂_k| and the determinants of the matrix of
standard deviations and/or of eigenvalues.
                    original            orthogonal          orthonormal
    original        |Σ̂_k|               |Σ̂_k|               |E_A|^{-1} |Σ̂_k|                          (8.27)
    standardized    |S_A|^{-2} |Σ̂_k|    |S_A|^{-2} |Σ̂_k|    |S_A|^{-2} |E_R|^{-1} |Σ̂_k| = |E_A|^{-1} |Σ̂_k|

The bottom right-hand entry follows from |E_A| = |V_A| = |S_A|² |R_A| = |S_A|² |E_R|.
Hence for an orthonormal transformation of the original variables, the
maximized likelihoods are given by |E_A|^{n/2} times the maximized
likelihoods in (8.6) and (8.10).
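The determinantal relations underlying the table are easy to confirm numerically; a throwaway check (numpy assumed, arbitrary V_A):

```python
import numpy as np

# Check |E_A| = |V_A| = |S_A|^2 |R_A| = |S_A|^2 |E_R| for an arbitrary V_A.
VA = np.array([[4.0, 1.2], [1.2, 2.0]])
SA = np.diag(np.sqrt(np.diag(VA)))               # standard deviations
SAinv = np.linalg.inv(SA)
RA = SAinv @ VA @ SAinv                          # correlation matrix
det_EA = np.prod(np.linalg.eigvalsh(VA))         # |E_A|
det_ER = np.prod(np.linalg.eigvalsh(RA))         # |E_R|
assert np.isclose(det_EA, np.linalg.det(VA))
assert np.isclose(det_EA, np.linalg.det(SA) ** 2 * det_ER)
```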
Successive vectors for the weighted between-groups and functional
relationship generalizations are to be chosen to satisfy c_i^T V_A c_i = 1
and c_i^T V_A c_j = 0, j = 1, ..., i-1. In the functional relationship
derivation, ψ̂_i = c_i. Alternatively, consider variables orthonormal
with respect to V_A. Write a = E_A^{1/2} U_A^T c, as in Section 1.4. Then
a_i^T a_i = 1 and a_i^T a_j = 0.
Initially, the constraints were incorporated by substitution for
one of the coefficients, with the variables orthonormalized. Specifically,
for the first coefficient vector, all coefficients are divided by the
largest component, and the maximization proceeds for the remaining
v-1 coefficients. At each iteration, a_1^T a_1 = 1 is used to solve for the
excluded component. For the second coefficient vector, the coefficients
are divided by the largest component ignoring the already excluded
component, and the maximization proceeds for the remaining v-2 coefficients.
At each iteration, a_2^T a_2 = 1 and a_2^T a_1 = 0 are used to solve for the two
excluded components. This procedure leads to an unconstrained
minimization, though the way in which the constraints are accommodated
seems to lead to relatively poor performance of the routines.
Very limited empirical evidence suggests that a more effective
procedure results if the orthogonality constraint c_i^T V_A c_j = 0 is
accommodated by explicit projection of the data orthogonal to the
previous c_j. Assume that the first vector, c_1, has been found and that
c_1^T V_A c_1 = 1. The residual projection operator R_1 = I - V_A c_1 c_1^T projects
the observations x onto the space orthogonal (with respect to V_A) to
c_1. But V_{A2} = R_1 V_A R_1^T will now be of rank v-1. So form the eigenanalysis
V_{A2} = U_{A2} E_{A2} U_{A2}^T and set V_{A2}^P = U_{A2,v-1}^T V_{A2} U_{A2,v-1}, where U_{A2,v-1} denotes
the first v-1 columns of U_{A2}. The (v-1)×(v-1) matrix V_{A2}^P will be of
full rank. Now carry out the maximization to determine c_2^P, the
coefficients for the variables U_{A2,v-1}^T R_1 x, and scale c_2^P so that
(c_2^P)^T V_{A2}^P c_2^P = 1. Then the required c_2 is given by c_2 = R_1^T U_{A2,v-1} c_2^P.
Note that c_2^T V_A c_1 = (c_2^P)^T U_{A2,v-1}^T R_1 V_A c_1 = 0 and
c_2^T V_A c_2 = (c_2^P)^T U_{A2,v-1}^T R_1 V_A R_1^T U_{A2,v-1} c_2^P = (c_2^P)^T V_{A2}^P c_2^P = 1,
as required. In general, with
P_i = R_1^T U_{A2,v-1} ⋯ R_{i-1}^T U_{Ai,v-i+1}, then c_i = P_i c_i^P. Note that P_1 = I.
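The projection step for the second vector can be sketched as follows; this is an illustration under my own naming, with numpy assumed.

```python
import numpy as np

def second_vector_map(VA, c1):
    """Given c1 with c1' VA c1 = 1, build R1 = I - VA c1 c1', the reduced
    full-rank metric, and the matrix P2 mapping reduced coefficients
    c2^P back to c2 = P2 c2^P in the original variables."""
    v = VA.shape[0]
    R1 = np.eye(v) - VA @ np.outer(c1, c1)
    VA2 = R1 @ VA @ R1.T                       # rank v-1, since VA2 c1 = 0
    w, U = np.linalg.eigh(VA2)
    U1 = U[:, np.argsort(w)[::-1][: v - 1]]    # first v-1 eigenvector columns
    VA2p = U1.T @ VA2 @ U1                     # (v-1) x (v-1), full rank
    P2 = R1.T @ U1
    return VA2p, P2
```

Any c_2^P scaled so that (c_2^P)^T V_{A2}^P c_2^P = 1 then yields c_2 = P_2 c_2^P with c_2^T V_A c_1 = 0 and c_2^T V_A c_2 = 1.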
When the orthonormal form is used, V_A = I and R_i = R_i^T = I - a_i a_i^T.
The length constraint for each vector is accommodated in the actual
maximization routine by using polar coordinates θ. The constraint
c^T c = 1 is adopted for the maximization and the vector rescaled so
that c^T V_A c = 1.
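The polar-coordinate device can be sketched as below (illustrative only; numpy assumed):

```python
import numpy as np

def polar_to_unit(theta):
    """Map v-1 angles to a coefficient vector c with c'c = 1, so that the
    length constraint is absorbed into the parameterization."""
    c, s = [], 1.0
    for t in theta:
        c.append(s * np.cos(t))
        s *= np.sin(t)
    c.append(s)
    return np.array(c)

# after the unconstrained search over theta, rescale in the V_A metric:
# c = c / np.sqrt(c @ VA @ c), so that c' V_A c = 1
```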
The original means and covariance matrices can be retained for the
actual calculation of the log likelihood at each iteration. Specifically,
for the ith vector and for any iteration, calculate c_i^P from θ_i, and then
c_i = P_i c_i^P. For the gradient calculations, the chain rule can be used
to form the derivatives for the polar coordinates from those for the
original variables. Specifically, (∂L/∂θ_i)^T is given by
(∂L/∂c_i)^T P_i (∂c_i^P/∂θ_i).
The alternative computational approaches are currently being
evaluated.
8.4 Performance of the Generalizations when the Covariance Matrices
are Equal
The performance of the weighted between-groups and likelihood ratio
generalizations and the usual solution is studied for some general
situations using computer-generated data. The functional relationship
generalization was developed after this aspect was completed; because
of the computing difficulties discussed in Section 8.3 and the similarity
of the results for the two generalizations, I have not carried out the
calculations for the third generalization. The situations examined are:
(i) one group differs from the rest, either in one or all variables;
(ii) two directions are of interest, the first two groups differing
from the rest, each on only one or two variables; (iii) a simple
bivariate configuration with groups symmetric about the 1 direction;
and (iv) configurations corresponding to two actual data sets. The
population covariance matrix is taken as the identity matrix; for the
actual data sets, the orthonormal variable configurations are used.
The simulations are blocked in that each of the three solutions is
calculated for the same generated data set. The independent Gaussian
observations are generated by the polar method of Marsaglia and Bray
(1964). The generation of the covariance matrices is via the Bartlett
decomposition (Newman and Odell, 1971, Section 5.2). The vectors of
means are produced by dividing the vectors of generated observations
by the assumed sample size and adding the vectors of population means.
Usually, 100 sets of data are generated. I have used the MSE of the
coefficients and of the group means for each canonical variate to
compare the solutions. The vectors are scaled to have unit length.
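The polar method referred to above can be sketched as follows; this is the standard rejection form of the generator, not the thesis program itself.

```python
import math, random

def polar_pair(rng):
    """Polar (rejection) method: one pair of independent N(0,1) deviates."""
    while True:
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:                    # accept points inside the unit disc
            f = math.sqrt(-2.0 * math.log(s) / s)
            return u * f, v * f
```

The acceptance rate is π/4, and no trigonometric calls are needed, which was the method's original attraction.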
Table 8.1 gives the results for a two direction configuration,
with the overall mean for the likelihood ratio solution calculated from
(8.13). The usual solution performs somewhat better than the
generalizations, both for componentwise MSE and overall MSE. The MSE,
means and SSQ's for the two generalizations are similar. The lower
MSE for the usual solution is due to the lower variance of the coefficients
and means. The bottom part of Table 8.1(a) and 8.1(b) gives the results
when the unweighted mean xT in (1.6) and the full maximum likelihood
iterative estimate in (8.7)-(8.9) are used. The full maximum likelihood
solution seems to perform badly. It may be that the component B̂_k, which
enters into the estimation of the Σ̂_k and hence of μ̂, and which itself
depends on μ̂, is more sensitive to random fluctuations than when the
Σ̂_k are replaced by the sample V_k. The canonical roots also have lowest
variance for the usual solution and highest for the full maximum
likelihood solution. The approximately threefold change in MSE is again
evident when the sample size is reduced to 15 for each group. A seven
Table 8.1 Summary of simulations to compare the usual canonical
variate solution with the weighted between-groups and likelihood ratio
generalizations. Results are for 100 runs of a three-variate,
four-population configuration, with μ_1 = (4,0,0)^T, μ_2 = (0,2,0)^T,
μ_3 = μ_4 = 0, and Σ_k = Σ = I. Sample size is 50 for each group.

(a) For this configuration ψ_1 = (0.978, -0.208, 0)^T and
ψ_2 = (0.208, 0.978, 0)^T, with canonical roots 621 and 128.8. The MSE
is calculated as ∑_{m=1}^{100} (c_{mi}^* - ψ_i)², where c_{mi}^* represents the ith
component of the mth sample vector and * denotes either U, LR or WBG.
The overall mean for the likelihood ratio solution is calculated using
(8.13). The bottom part gives the results when the unweighted mean in
(1.6), denoted by superscript U, and the full likelihood ratio iterative
solution in (8.7)-(8.10), denoted by M, are used.

(b) The population means for the canonical variates are
ψ_1^T μ_k : 3.913, -0.415, 0, 0 and ψ_2^T μ_k : 0.830, 1.956, 0, 0. The MSE is
calculated as ∑_{m=1}^{100} {(c^{*T} x̄_k)_m - ψ^T μ_k}².
8.1(a) Comparison of vectors and roots

CVI
  MSE          USL   0.0084   0.200    0.108    Sum = 0.316
               LR    0.0274   0.561    0.284          0.872
               WBG   0.0271   0.559    0.261          0.847
  Mean         USL   0.978   -0.202   -0.009
               LR    0.975   -0.204   -0.010
               WBG   0.975   -0.204   -0.013
  SSQ (x100)   USL   0.0085   0.198    0.101
               LR    0.0263   0.565    0.276
               WBG   0.0262   0.564    0.246
  canonical roots: mean (SSQ)  USL: 618.2 (1265); LR: 622.0 (2751); WBG: 621.1 (2676)

CVII
  MSE          USL   0.011    0.0006   0.162    Sum = 0.174
               LR    0.070    0.006    0.518          0.594
               WBG   0.293    0.014    0.262          0.569
  Mean         USL   0.205    0.978    0.001
               LR    0.207    0.975    0.000
               WBG   0.205    0.976   -0.005
  SSQ (x100)   USL   0.011    0.0006   0.164
               LR    0.071    0.0052   0.523
               WBG   0.295    0.0137   0.262
  canonical roots: USL: 128.1 (40.2); LR: 130.2 (124); WBG: 130.6 (67)

CVI
  MSE          LRU   0.027    0.567    0.280    Sum = 0.874
               LRM   0.024    2.642    0.444          3.110
  Mean         LRU   0.974   -0.205   -0.011
               LRM   0.990   -0.076   -0.013
  SSQ (x100)   LRU   0.026    0.572    0.270
               LRM   0.010    0.920    0.432
  canonical roots: LRU: 625.4 (2894); LRM: 764.8 (7215)
8.1(b) Comparison of canonical variate means

CVI
  MSE          USL   0.182    0.820    0.033    0.038    Sum = 1.072
               LR    0.487    2.233    0.033    0.038          2.791
               WBG   0.510    2.227    0.034    0.038          2.808
  Mean         USL   3.912   -0.403   -0.0027  -0.0021
               LR    3.898   -0.408   -0.0025  -0.0021
               WBG   3.899   -0.407   -0.0026  -0.0022
  SSQ (x100)   USL   0.183    0.813    0.033    0.038
               LR    0.469    2.251    0.033    0.038
               WBG   0.495    2.243    0.033    0.038

CVII
  MSE          USL   0.144    0.047    0.039    0.033    Sum = 0.264
               LR    1.103    0.070    0.039    0.034          1.246
               WBG   4.752    0.090    0.039    0.033          4.914
  Mean         USL   0.824    1.953    0.0005   0.0018
               LR    0.832    1.950    0.0003   0.0010
               WBG   0.823    1.951    0.0005   0.0018
  SSQ (x100)   USL   0.142    0.047    0.040    0.033
               LR    1.114    0.067    0.039    0.034
               WBG   4.794    0.083    0.039    0.033

CVI
  MSE          LRU   0.489    2.258    0.033    0.038    Sum = 2.819
               LRM   0.439   10.56     0.033    0.037         11.07
  Mean         LRU   3.897   -0.410   -0.0025  -0.0021
               LRM   3.961   -0.152   -0.0025  -0.0020
  SSQ (x100)   LRU   0.469    2.278    0.033    0.038
               LRM   0.206    3.668    0.033    0.038
variable run with changes in two variables, viz μ_1,
μ_2 = (0,0,1.8,1.8,0,0,0)^T, μ_3 = 0, μ_4 = (0,0,0,0,0,0.4,0.4)^T, and
sample sizes of 50 again shows an approximately threefold change in
MSE. For the first vector, the MSE of the coefficients is 1.251 for
the usual solution, 2.828 for the likelihood ratio solution using xW,
and 2.875 for the weighted between-groups solution. The differences
in MSE are again due to differences in the variances. For the group
means, the MSE are 4.23, 9.61 and 10.87 respectively. The means of the
canonical roots are 484.0, 495.3 and 491.6 respectively, for a population
value of 481.4; corresponding SSQs are 459, 1260 and 1140.
The results outlined above are paralleled for the other configurations
examined. The usual solution is preferable to the generalizations when
the covariance matrices do not differ, in that the generalizations show
greater variation for the canonical variate coefficients, roots and
group means.
8.5 Comparison of Solutions
An obvious question to ask is whether the descriptions given by
the generalized canonical variate solutions differ from the description
given by the usual canonical variate solution.
One approach is to consider the usual canonical vectors as
hypothetical vectors, and ask whether they provide discrimination which
is as good as that provided by a generalized solution. For the usual
situation, different formulations lead to the same statistic Λ_v/Λ_p, as
outlined in Section 1.6. When the covariance matrices are not assumed
to be equal, the various formulations lead to different and often more
complicated solutions.
The maximized likelihood for the functional relationship formulation
can be compared with the value obtained when C^U is substituted for Ψ in
(8.24) (and hence (8.17), (8.18) and (8.19)). The relative magnitudes
of the changes in the likelihood using either Ψ̂ or C^U can be examined.
A more formal comparison follows by comparing 2 log(ratio of likelihoods)
with the χ² distribution with vp - p² d.f.; equivalently, set an asymptotic
confidence region for Ψ of size α and determine whether C^U is within
the region (see Cox and Hinkley, 1974, p.343, for further discussion).
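The test quantity and its degrees of freedom are trivial to compute; for instance, with the Table 8.2 values for five variables and p = 1 (a sketch; the function name is mine):

```python
def lr_comparison(logL_mle, logL_hyp, v, p):
    """2 log(likelihood ratio) and the asymptotic chi-squared d.f. vp - p^2."""
    return 2.0 * (logL_mle - logL_hyp), v * p - p * p

# e.g. the maximized value -3165 against -3194 for the usual vectors:
# lr_comparison(-3165, -3194, 5, 1) gives (58.0, 4)
```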
In Section 1.6, the adequacy of hypothetical vectors is examined
by considering the equality of conditional means. The approach is
outlined here when the covariance matrices are not assumed to be equal.
The derivation follows that considered in Section 1.6, with an observation
x_km ∼ N_v(μ_k, Σ_k). Partition

    Σ_k = ( Σ_{pp,k}  Σ_{pq,k} )
          ( Σ_{qp,k}  Σ_{qq,k} )
with a similar partition for Sk in (1.1). The maximized likelihood
for no restriction on the conditional means is easily shown to be

    (2π)^{-np/2} ∏_{k=1}^{g} |n_k^{-1} S_{pp,k}|^{-n_k/2} e^{-np/2}
    × (2π)^{-nq/2} ∏_{k=1}^{g} |n_k^{-1} S_{qq·p,k}|^{-n_k/2} e^{-nq/2} .        (8.28)

Using a determinantal identity similar to (1.50), this reduces to (8.6).
For the hypothesis specifying equality of the conditional means,
the unconditional part of the likelihood is unchanged, giving
Σ̂_{pp,k} = n_k^{-1} S_{pp,k} and hence the first part of (8.28). The conditional
part of the log likelihood may be written as

    -½ ∑_{k=1}^{g} n_k log|Σ_{qq·p,k}|
    - ½ ∑_{k=1}^{g} ∑_{m=1}^{n_k} (x_{qkm} - β_{qp,k} x_{pkm} - μ_{q·p})^T Σ_{qq·p,k}^{-1} (x_{qkm} - β_{qp,k} x_{pkm} - μ_{q·p}) ,

with

    β_{qp,k} = Σ_{qp,k} Σ_{pp,k}^{-1}   and   Σ_{qq·p,k} = Σ_{qq,k} - Σ_{qp,k} Σ_{pp,k}^{-1} Σ_{pq,k} .

Differentiation w.r.t. μ_{q·p} gives

    μ̂_{q·p} = ( ∑_{k=1}^{g} n_k Σ̂_{qq·p,k}^{-1} )^{-1} ∑_{k=1}^{g} n_k Σ̂_{qq·p,k}^{-1} (x̄_{qk} - β̂_{qp,k} x̄_{pk}) ,        (8.29)
while differentiation w.r.t. β_{qp,k} gives

    β̂_{qp,k} (S_{pp,k} + n_k x̄_{pk} x̄_{pk}^T) = S_{qp,k} + n_k (x̄_{qk} - μ̂_{q·p}) x̄_{pk}^T .        (8.30)

Since the solution for μ̂_{q·p} involves β̂_{qp,k}, an explicit solution
is not possible.
Differentiation w.r.t. Σ_{qq·p,k} gives

    n_k Σ̂_{qq·p,k} = ∑_{m=1}^{n_k} (x_{qkm} - β̂_{qp,k} x_{pkm} - μ̂_{q·p})(x_{qkm} - β̂_{qp,k} x_{pkm} - μ̂_{q·p})^T ;        (8.31)

this may be rewritten in terms of the sample means and covariance matrices.
Again the solution is iterative.
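The iteration over (8.29)-(8.31) can be sketched in raw-data form as follows. This is a rough illustration under my own arrangement and naming (numpy assumed), starting each β_k from the separate within-group regressions; it uses the identities X_p^T X_p = S_{pp,k} + n_k x̄_{pk} x̄_{pk}^T and (X_q - μ)^T X_p = S_{qp,k} + n_k (x̄_{qk} - μ) x̄_{pk}^T to apply (8.30) directly.

```python
import numpy as np

def cond_means_iterate(Xp, Xq, n_iter=30):
    """Iterate (8.29)-(8.31) for equality of the conditional means.
    Xp, Xq: lists of (n_k x p) covariate and (n_k x q) conditional-variate
    data arrays, one pair per group."""
    g, q = len(Xp), Xq[0].shape[1]
    beta = [np.linalg.lstsq(Xp[k] - Xp[k].mean(0), Xq[k] - Xq[k].mean(0),
                            rcond=None)[0].T for k in range(g)]
    mu = np.zeros(q)
    for _ in range(n_iter):
        # (8.31): n_k Sigma_qq.p,k from the current residuals
        R = [Xq[k] - Xp[k] @ beta[k].T - mu for k in range(g)]
        Sig = [r.T @ r / len(r) for r in R]
        # (8.29): weighted estimate of the common conditional mean
        W = [len(Xp[k]) * np.linalg.inv(Sig[k]) for k in range(g)]
        rhs = sum(W[k] @ (Xq[k].mean(0) - beta[k] @ Xp[k].mean(0))
                  for k in range(g))
        mu = np.linalg.solve(sum(W), rhs)
        # (8.30): update each beta_k given the current mu
        for k in range(g):
            beta[k] = ((Xq[k] - mu).T @ Xp[k]) @ np.linalg.inv(Xp[k].T @ Xp[k])
    return beta, mu, Sig
```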
Because of the iterative solution in (8.29), (8.30) and (8.31), the
maximum likelihood estimator for Σ_{qq·p,k} in (8.31) does not reduce to
the analogue of the equal covariance matrix case, namely T_{qq·p,k}. Hence
the equivalent determinantal identity to (1.50) cannot be used to
simplify the maximized likelihood. Note that the generalized result
requires explicit definition of the conditional variates x_q as well
as the covariates x_p. The maximized likelihood is

    (2π)^{-nv/2} ∏_{k=1}^{g} |n_k^{-1} S_{pp,k}|^{-n_k/2} ∏_{k=1}^{g} |Σ̂_{qq·p,k}|^{-n_k/2} e^{-nv/2} ,        (8.32)

with ½gv(v+1) + gp + q estimated parameters.
The intuitively acceptable notions of dimension and collinearity
given in Section 1.6 do not obtain here. Because of time limitations,
I have not examined this equality of conditional means approach.
8.6 Practical Application
Campbell and Reyment (1978) have applied the shrunken estimation
techniques described in Chapter Six to data on the foraminifer
Afrobolivina afra from 46 borehole samples taken at approximately equal
depth intervals. Five of the nine variables measured contain much of
the discrimination. Group sizes are: 24, 12, 30, 28, 62, 31, 16, 16,
16, 15, 16, 16, 16, 12, 15, 12, 11, 26, 13, 26, 12, 39, 13, 18, 30, 17,
21, 42, 31, 28, 34, 13, 17, 14, 14, 22, 12, 14, 16, 13, 14, 15, 29, 31,
15 and 60. The total sample size is 997. The canonical roots for the
usual analysis based on five variables are 2.29, 0.64, 0.25, 0.08 and
0.05. The first canonical variate reflects depth changes down the
borehole, as shown by the solid line in Figure 8.1. The group variances
for the first canonical variate range from 0.2 to 3.7; the vector is
scaled so that the average variance is unity. The techniques discussed
in Chapter Five indicate highly significant differences in both variance
and correlation structure.
Figure 8.1 shows a plot of the group means for the first canonical
variate against depth for the usual solution and two of the generalizations.
There is very good agreement between the three profiles.
Table 8.2 gives the maximized likelihoods for the unrestricted
and equal means hypotheses, corresponding to (8.6) and (8.10), and
for the functional relationship generalization, with p = 1 and p = 2,
resulting from (8.24). The difference in the maximized likelihoods
for p = 0 and p = v is 777. Of this, the first vector (p = 1)
explains some 80%. The values of the log likelihood when the usual
canonical vectors C^U and the likelihood ratio generalization vectors C^{LR}
are substituted for Ψ in (8.24) are also given. Because of the
iterative solution in (8.18), (8.19) and (8.24), even this calculation
may be relatively time consuming on the computer. For example, to
calculate the log likelihood with p = 1 and c_1^U replacing ψ_1 takes 15
iterations for the value to fall below 3199 and 35 iterations to fall
below 3195. The value after the first iteration, with μ̂_0 calculated
using (8.13), is 3326.
The similarity of the various solutions is evident in the magnitudes
of the log likelihoods. The value of -3098 for Ψ̂ for p = 2 may be
slightly large, due to convergence difficulties; the values obtained
range from -3098 to -3125 using different initial estimates and
computing routines.
For this data set, the differences in covariance structure have
little effect on the ordination of the groups along the first canonical
variate. The first canonical root for the likelihood ratio-non-centrality
matrix generalization, given by (c^{LR})^T M c^{LR}, is 5.18, compared with the
value of 2.29 for the usual solution. This difference reflects the
effect of the few groups with large variances on the pooled covariance
matrix for the usual solution. When the group variances for the canonical
variate(s) are similar, the two values will also be similar.
There are some practical problems to clear up before the approaches
described in this Chapter will be suitable for general use. Extensive
Table 8.2 Maximized log likelihoods, excluding the factor
-(1/2)nv(1 + log 2π), for the functional relationship generalization
in Section 8.2.3 and for the unrestricted (p = v) and usual null
(p = 0) hypotheses in Section 8.2.2. The log likelihoods when the
usual and likelihood ratio canonical vectors replace those for the
generalized functional relationship solution are also given.

μ_k = μ_0                     (p = 0 in (8.16))    -3786
μ_k = μ_0 + ξ_k1 γ̂_1          (p = 1)              -3165
μ_k = μ_0 + Σ_i ξ_ki γ̂_i      (p = 2)              -3098
μ_k unrestricted              (p = v)              -3009
μ_k = μ_0 + ξ_k1 c_U1         (p = 1)              -3194
μ_k = μ_0 + Σ_i ξ_ki c_Ui     (p = 2)              -3105
μ_k = μ_0 + ξ_k1 c_LR1        (p = 1)              -3174
μ_k = μ_0 + Σ_i ξ_ki c_LRi    (p = 2)              -3098
Figure 8.1 Plot of first canonical variate means for each borehole sample versus depth for the usual solution and two of the generalizations. [Figure not reproduced: horizontal axis is Canonical Variate I (0 to 9); vertical axis is depth (5 to 45); the legend distinguishes the functional relationship, usual and likelihood ratio solutions.]
application of the functional relationship generalization to a number
of data sets, combined with detailed analysis of specific configurations,
is needed to develop general guidelines for the use of the approach.
Evaluation of proposals for comparing the solutions is also needed.
Time limitations and computing difficulties have precluded these aspects
of the study. Another problem requiring solution is the extension of
the use of shrunken estimators to the various generalizations. Here,
small c^T B_k c and small c^T V_k c for all groups would seem to be the natural
analogue. A further aspect needing clarification is the role of the
roots and particularly vectors from the likelihood ratio-non-centrality
matrix generalization. For one data set examined, the first two roots
were similar in magnitude, and the second vector duplicated the first.
The third root and vector paralleled the information given by the second
root and vector for the usual solution.
Despite the unresolved problems, the generalized approaches, and
particularly the functional relationship generalization, appear to be
useful for evaluating the effect of unequal covariance matrices on the
description provided by the canonical variates. This descriptive aim
should be clearly distinguished from that of allocation. Whereas
differences in orientation and size of the associated concentration
ellipsoids may have little effect on overall conclusions about group
differences and similarities, they may have a marked effect on the correct
allocation of individuals to the groups. This failure to distinguish
between allocation and description is a potential source of confusion
in the canonical variate and particularly discriminant analysis
literature. Geisser (1977) gives an excellent discussion of the two
(often quite distinct) aims (see his pp. 302-309).
REFERENCES
AHMED, S.W. and LACHENBRUCH, P.A. (1977). Discriminant analysis when
scale contamination is present in the initial sample. In
Classification and Clustering (J. Van Ryzin, ed), pp. 331-353.
New York: Academic Press.
ALLDREDGE, J.R. and GILB, N.S. (1976). Ridge regression: an annotated
bibliography. Int. Stat. Rev., 44, 355-360.
ANDERSON, T.W. (1951). Estimating linear restrictions on regression
coefficients for multivariate normal distributions. Ann. Math.
Statist., 22, 327-351.
ANDERSON, T.W. (1963). Asymptotic theory for principal component
analysis. Ann. Math. Statist., 34, 122-148.
ASHTON, E.H., HEALY, M.J.R. and LIPTON, S. (1957). The descriptive
use of discriminant functions in physical anthropology.
Proc. Roy. Soc. Lond. Ser. B, 146, 552-572.
ATKINSON, A.C. and PEARCE, M.C. (1976). The computer generation of
beta, gamma and normal random variables (with Discussion).
J. R. Statist. Soc. A, 139, 431-460.
BARNETT, V. and LEWIS, T. (1978). Outliers in Statistical Data.
New York: Wiley.
BARR, D.R. and SLEZAK, N.L. (1972). A comparison of multivariate
normal generators. Comm. ACM, 15, 1048-1049.
BARTLETT, M.S. (1938). Further aspects of the theory of multiple
regression. Proc. Camb. Phil. Soc., 34, 33-40.
BARTLETT, M.S. (1951). The goodness of fit of a single hypothetical
discriminant function in the case of several groups. Ann. Eugen.,
16, 199-214.
BARTLETT, M.S. and KENDALL, D.G. (1946). The statistical analysis of
variance heterogeneity and the logarithmic transformation.
J. R. Statist. Soc. B, 8, 128-138.
BIBBY, J. and TOUTENBURG, H. (1977). Prediction and Improved Estimation
in Linear Models. New York: Wiley.
CAMPBELL, C.A. (1978). The frilled dogwinkle: ecological genetics of
a morphologically variable snail, Thais lamellosa. Ph.D. Thesis,
University of California, Davis.
CAMPBELL, N.A. (1976). A multivariate approach to variation in micro-
filariae: Examination of the species Wuchereria lewisi and demes
of the species W. bancrofti. Aust. J. Zool., 24, 105-114.
CAMPBELL, N.A. (1978). Multivariate analysis in biological anthropology:
some further considerations. J. Hum. Evol., 7, 197-203.
CAMPBELL, N.A. (1979). Some practical aspects of canonical variate
analysis. BIAS, 6 (to appear).
CAMPBELL, N.A. and ATCHLEY, W.R. (1979). The geometry of multivariate
analysis and its relation to the analysis of morphometric shape.
In preparation for Syst. Zool.
CAMPBELL, N.A. and DEARN, J.M. (1979). Altitudinal variation in, and
morphological divergence between, three related species of grasshopper.
Aust. J. Zool., 27 (to appear).
CAMPBELL, N.A. and MAHON, R.J. (1974). A multivariate study of variation
in two species of rock crab of the genus Leptograpsus. Aust. J.
Zool., 22, 417-425.
CAMPBELL, N.A. and REYMENT, R.A. (1978). Discriminant analysis of a
Cretaceous foraminifer using shrunken estimators. Math. Geol.,
10, 347-359.
CHAKRAVARTI, S. (1966). A note on multivariate analysis of variance
test when dispersion matrices are different and unknown.
Calcutta Statist. Assoc. Bull., 15, 75-86.
CHAMBERS, J.M. (1977). Computational Methods for Data Analysis.
New York: Wiley.
CONSTANTINE, A.G. and GOWER, J.C. (1978). Graphical representation of
asymmetric matrices. Appl. Statist., 27, 297-304.
COX, D.R. (1968). Notes on some aspects of regression analysis.
J. R. Statist. Soc. A, 131, 265-279.
COX, D.R. and HINKLEY, D.V. (1974). Theoretical Statistics. London:
Chapman and Hall.
DEVLIN, Susan J., GNANADESIKAN, R. and KETTENRING, J.R. (1975). Robust
estimation and outlier detection with correlation coefficients.
Biometrika, 62, 531-545.
ELSTON, R.C. (1975). On the correlation between correlations.
Biometrika, 62, 133-140.
FISHER, R.A. (1921). On the probable error of a coefficient of
correlation deduced from a small sample. Metron, 1, 1-32.
FISHER, R.A. (1936). The use of multiple measurements in taxonomic
problems. Ann. Eugen., 7, 179-188.
FISHER, R.A. (1938). The statistical utilization of multiple
measurements. Ann. Eugen., 8, 376-386.
GEISSER, S. (1977). Discrimination, allocatory and separatory, linear
aspects. In CIassification and Clustering (J. Van Ryzin, ed.),
pp. 301-330. New York: Academic Press.
GNANADESIKAN, R. (1977). Methods for Statistical Data Analysis of
Multivariate Observations. New York: Wiley.
GNANADESIKAN, R. and KETTENRING, J.R. (1972). Robust estimates,
residuals, and outlier detection with multiresponse data.
Biometrics, 28, 81-124.
GOLDSTEIN, M. and SMITH, A.F.M. (1974). Ridge type estimators
for regression analysis. J.R. Statist. Soc. B, 36, 284-291.
GOWER, J.C. (1966). A Q-technique for the calculation of canonical
variates. Biometrika, 53, 588-590.
GOWER, J.C. (1971). Statistical methods of comparing different
multivariate analyses of the same data. In Mathematics in the
Archaeological and Historical Sciences (F.R. Hodson, D.G. Kendall
and P. Tautu, eds), pp. 138-149. Edinburgh: Edinburgh University Press.
GOWER, J.C. (1975). Generalized procrustes analysis. Psychometrika,
40, 33-51.
HAMPEL, F.R. (1973). Robust estimation: a condensed partial survey.
Z. Wahr. verw. Geb., 27, 87-104.
HAMPEL, F.R. (1974). The influence curve and its role in robust
estimation. J. Amer. Statist. Assoc., 69, 383-393.
HAMPEL, F.R. (1977). Modern trends in the theory of robustness.
Res. Rep. no. 13, Fachgruppe für Stat., Eidgenössische Technische
Hochschule, Zürich.
HEALY, M.J.R. (1968). Multivariate normal plotting. Appl. Statist.,
17, 157-161.
HILLS, M. (1969). On looking at large correlation matrices. Biometrika,
56, 249-253.
HINKLEY, D.V. (1978). Improving the jackknife with special reference
to correlation estimation. Biometrika, 65, 13-21.
HOGG, R.V. (1977). An introduction to robust procedures. Comm. Statist.-
Theor. Meth., A6, 789-794.
HOPPER, S.D. and CAMPBELL, N.A. (1977). A multivariate morphometric
study of taxonomic relationships in kangaroo paws (Anigozanthos
Labill. and Macropidia Drumm. ex Harv.: Haemodoraceae). Aust. J.
Bot., 25, 523-544.
HOTELLING, H. (1936). Relations between two sets of variates.
Biometrika, 28, 321-377.
HUBER, P.J. (1964). Robust estimation for a location parameter.
Ann. Math. Statist., 35, 73-101.
HUBER, P.J. (1972). Robust statistics: A review. Ann. Math. Statist.,
43, 1041-1067.
HUBER, P.J. (1977a). Robust covariances. In Statistical Decision Theory
and Related Topics II (Shanti S. Gupta and David S. Moore, eds),
pp. 165-191. New York: Academic Press.
HUBER, P.J. (1977b). Robust Statistical Procedures. Philadelphia:
SIAM.
JAMES, G.S. (1954). Tests of linear hypotheses in univariate and
multivariate analysis when the ratios of the population variances
are unknown. Biometrika, 41, 19-43.
JOHNSON, N.L. and KOTZ, S. (1970). Continuous Univariate Distributions - 2.
Boston, Mass.: Houghton Mifflin.
KSHIRSAGAR, A.M. (1972). Multivariate Analysis. New York: Marcel
Dekker.
LAYARD, M.W.J. (1972). Large sample tests for the equality of two
covariance matrices. Ann. Math. Statist., 43, 123-141.
LAYARD, M.W.J. (1974). A Monte Carlo comparison of tests for equality
of covariance matrices. Biometrika, 61, 461-465.
MANDEL, J. (1961). Non-additivity in two-way analysis of variance.
J. Amer. Statist. Ass., 56, 878-888.
MANDEL, J. (1971). A new analysis of variance model for non-additive
data. Technometrics, 13, 1-18.
MARONNA, R.A. (1976). Robust M-estimators of multivariate location and
scatter. Ann. Statist., 4, 51-67.
MARSAGLIA, G. and BRAY, T.A. (1964). A convenient method for generating
normal variables. SIAM Rev., 6, 260-264.
MORRISON, D.F. (1976). Multivariate Statistical Methods, second
edition. New York: McGraw-Hill.
NELDER, J.A. and MEAD, R. (1965). A simplex method for function
minimization. Computer J., 7, 308-313.
NEWMAN, T.G. and ODELL, P.L. (1971). The Generation of Random Variates.
London: Griffin.
OLSON, C.L. (1974). Comparative robustness of six tests in multivariate
analysis of variance. J. Amer. Statist. Assoc., 69, 894-908.
PHILLIPS, B.F., CAMPBELL, N.A. and WILSON, B.R. (1973). A multivariate
study of geographic variation in the whelk Dicathais. J. Exp.
Mar. Biol. Ecol., 11, 29-63.
RADCLIFFE, J. (1966). Factorizations of the residual likelihood
criterion in discriminant analysis. Proc. Camb. Phil. Soc., 62,
743-752.
RADCLIFFE, J. (1967). A note on an approximate factorization in
discriminant analysis. Biometrika, 54, 665-668.
RANDLES, R.H., BROFFITT, J.D., RAMBERG, J.S. and HOGG, R.V. (1978).
Generalized linear and quadratic discriminant functions using
robust estimates. J. Amer. Statist. Ass., 73, 564-568.
RAO, C.R. (1948). The utilization of multiple measurements in problems
of biological classification. J. R. Statist. Soc. B, 10, 159-193.
RAO, C.R. (1952). Advanced Statistical Methods in Biometric Research.
New York: Wiley.
RAO, C.R. (1970). Inference on discriminant function coefficients.
In Essays on Probability and Statistics (R.C. Bose, et al., eds),
pp. 587-602. Chapel Hill: University of North Carolina and
Statistical Publishing Society.
RAO, C.R. (1973). Linear Statistical Inference and its Applications,
second edition. New York: Wiley.
REMPE, U. and WEBER, E.E. (1972). An illustration of the principal
ideas of MANOVA. Biometrics, 28, 235-238.
ROY, S.N., GNANADESIKAN, R. and SRIVASTAVA, J.N. (1971). Analysis and
Design of Certain Quantitative Multiresponse Experiments. Oxford:
Pergamon.
SCHATZOFF, M. (1966). Sensitivity comparisons among tests of the general
linear hypothesis. J. Amer. Statist. Assoc., 61, 415-435.
SCHEFFE, H. (1959). The Analysis of Variance. New York: Wiley.
SIBSON, R. (1978). Studies in the robustness of multidimensional
scaling. J. R. Statist. Soc. B, 40, 234-238.
SPRENT, P. (1969). Models in Regression and Related Topics. London: Methuen.
TUKEY, J.W. (1949). One degree of freedom for non-additivity.
Biometrics, 5, 232-242.
TUKEY, P.A. (1976). Statistical models with covariance constraints.
Ph.D. Thesis, University of London.
WELCH, B.L. (1939). Note on discriminant functions. Biometrika,
31, 218-220.
WILK, M.B., GNANADESIKAN, R. and HUYETT, Miss M.J. (1962). Estimation
of the parameters of the gamma distribution using order statistics.
Biometrika, 49, 525-545.
WILLIAMS, E.J. (1952). The interpretation of interactions in factorial
experiments. Biometrika, 39, 65-81.
WILLIAMS, E.J. (1961). Tests for discriminant functions. J. Austral.
Math. Soc., 2, 243-252.
WILLIAMS, E.J. (1967). The analysis of association among many variates
(with Discussion). J.R. Statist. Soc. B, 29, 199-242.
WILSON, E.B. and HILFERTY, M.M. (1931). The distribution of chi-square.
Proc. Nat. Acad. Sci., Washington, 17, 684-688.
YATES, F. and COCHRAN, W.G. (1938). The analysis of groups of experiments.
J. Agric. Sci., 28, 556-580.