

POLAR DECOMPOSITIONS, FACTOR ANALYSIS AND PROCRUSTES PROBLEMS IN FINITE DIMENSIONAL INDEFINITE SCALAR PRODUCT SPACES

ULRIC KINTZEL

Abstract. A criterion for the existence of H-polar decompositions based on comparing canonical forms is presented, and a numerical procedure is explained for computing H-polar decompositions of a matrix for which the product of itself with its H-adjoint is diagonalisable. Furthermore, the H-orthogonal or H-unitary Procrustes problem is stated, and solved by application of H-polar decompositions.

Key words. Indefinite scalar products, polar decompositions, factor analysis, Procrustes problems.

AMS subject classifications. 15A63, 15A23.

1. Introduction. Let F be the field of the real numbers R or of the complex numbers C and let $F^n$ be an $n$-dimensional vector space over F. Furthermore, let H be a fixed chosen regular symmetric or hermitian matrix of $F^{n\times n}$ and let $x = (x_1, \dots, x_n)^T$, $y = (y_1, \dots, y_n)^T$ be column vectors of $F^n$. Then the bilinear or sesquilinear functional

$$[x, y] = (Hx, y) \quad\text{where}\quad (x, y) = \sum_{\nu=1}^{n} x_\nu \bar{y}_\nu \qquad (\bar{y}_\nu = y_\nu \text{ if } F = R)$$

defines an indefinite scalar product in $F^n$. Indefinite scalar products have almost all the properties of ordinary scalar products, except for the fact that the squared norm $[x, x]$ of a vector $x \neq 0$ can be positive, negative or zero. A corresponding vector is called positive (space-like), negative (time-like) or neutral (isotropic, light-like) respectively. If A is an arbitrary matrix of $F^{n\times n}$ then its H-adjoint $A^{[*]}$ is characterised by the property that

$$[Ax, y] = [x, A^{[*]}y] \quad\text{for all } x, y \in F^n.$$

This is equivalent to the fact that between the H-adjoint $A^{[*]}$ and the ordinary adjoint $A^* = \bar{A}^T$ the relationship

$$A^{[*]} = H^{-1}A^*H$$

    exists. If in particular A[] = A or AH = HA, one speaks of an H-selfadjoint orH-symmetric or H-hermitian matrix, and an invertible matrix U with U[] = U1 or

    UHU = H is called an H-isometry or H-orthogonal matrix or H-unitary matrix. IfA is a given matrix ofFnn, then a representation such as

    A = UM with UHU = H and MH = HM

    is called an H-polar decomposition of A.
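Both defining conditions are easy to test numerically. The following sketch is illustrative only (the matrices H, U, M are my own choices, not taken from the text): it checks $U^*HU = H$, $M^*H = HM$ and $A = UM$ for $H = \mathrm{diag}(1, -1)$, a hyperbolic rotation U and a diagonal M.

```python
import numpy as np

def is_h_polar_decomposition(A, U, M, H, tol=1e-10):
    """Check A = U M with U an H-isometry and M H-selfadjoint."""
    return (np.allclose(U.conj().T @ H @ U, H, atol=tol)       # U*HU = H
            and np.allclose(M.conj().T @ H, H @ M, atol=tol)   # M*H = HM
            and np.allclose(A, U @ M, atol=tol))

# Illustrative example with H = diag(1, -1): hyperbolic rotations are
# H-orthogonal, and a real diagonal M is H-selfadjoint.
H = np.diag([1.0, -1.0])
t = 0.7
U = np.array([[np.cosh(t), np.sinh(t)],
              [np.sinh(t), np.cosh(t)]])
M = np.diag([2.0, 3.0])
A = U @ M
print(is_h_polar_decomposition(A, U, M, H))  # True
```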

Institut für Mathematik, MA 4-5, TU Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany; email: [email protected]


Decompositions of this kind have been investigated in detail in the publications [BMRRR1-3] and [MRR] as well as in the further references specified there. More specialised results concerning polar decompositions of H-normal matrices, i.e. matrices which commute with their H-adjoint, are discussed in [LMMR].

H-polar decompositions are also the central subject of this paper, in which theoretical as well as practical questions are discussed. For example, the more theoretical Chapter 3 is primarily concerned with finding a further criterion for the existence of H-polar decompositions, whereas the more practical Chapter 4 presents a numerical procedure for computing H-polar decompositions of a matrix A for the case in which the matrix $A^{[*]}A$ is diagonalisable. In both chapters some statements are required concerning subspaces of $F^n$, which are first of all derived in Chapter 2, where some numerical questions are already examined in preparation for Chapter 4. In the final Chapter 5, two applications from a branch of mathematics known in psychology as factor analysis or multidimensional scaling are transferred to the setting of indefinite scalar products. This involves on the one hand the task of constructing sets of points which realise given distances, and on the other hand the task of matching two sets of such points in the sense of the method of least squares or the Procrustes problem¹, which is achieved with the help of an H-polar decomposition.

In a typical application of multidimensional scaling (MDS, for example see [BG]) test persons are first of all requested to estimate the dissimilarity (or similarity) of specified objects, which are selected terms describing the subject of the analysis. For example, if professions are to be analysed, terms such as politician, journalist, physician, etc. can be used as the objects. In this way the comparison of N objects in pairs produces the similarity measures, called proximities, $p_{kl}$, $1 \le k, l \le N$, from which the distances $d_{kl} = f(p_{kl})$ are then determined using a function f, for example $f(x) = ax + b$, which is called the MDS model. Using these distances, the coordinates of points $x_k$ in an $n$-dimensional Euclidean space are constructed such that $\|x_k - x_l\| = d_{kl}$, whereby $\|\cdot\|$ stands for the Euclidean norm. Thus each object is now represented by a point in a coordinate system and the data can be analysed with regard to their geometric properties. For example, it can thereby be attempted to interpret the basis vectors of the space in the sense of psychological factors, such as social status in the given example of professions.
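The construction of points from the distances $d_{kl}$ can be sketched with the standard classical-scaling algorithm (Torgerson); this is an illustrative implementation of that well-known method, not the procedure of the present paper, and it assumes that a Euclidean embedding of the distances exists.

```python
import numpy as np

def classical_mds(D, n):
    """Points x_k in R^n with ||x_k - x_l|| = d_kl (classical scaling)."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N     # centring matrix
    B = -0.5 * J @ (D ** 2) @ J             # Gram matrix of centred points
    w, V = np.linalg.eigh(B)                # eigenvalues in ascending order
    w, V = w[::-1][:n], V[:, ::-1][:, :n]   # keep the n largest
    return V * np.sqrt(np.maximum(w, 0.0))  # rows are the points x_k

# The corner distances of the unit square are reproduced exactly in R^2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, 2)
D2 = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D, D2))  # True
```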

The results of interrogating the test persons are often categorised in M groups, e.g. according to gender and/or age, producing several descriptive constellations of points $x_k^{(r)}$, $1 \le r \le M$, in a Euclidean space of dimension $n = \max\{n^{(r)}\}$ which must be mutually compared in the analysis. To make such a comparison of two constellations $x_k$ and $y_k$ possible, it is first of all necessary to compensate for irrelevant differences resulting from the different locations in space. This is done with an orthogonal transformation U devised such that $\sum_k \|Ux_k - y_k\|^2$ is minimised. Thereafter the constellations $x_k' = Ux_k$ and $y_k$ can be analysed.
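The minimisation of $\sum_k \|Ux_k - y_k\|^2$ over orthogonal U is the classical orthogonal Procrustes problem, whose well-known solution reads $U = WZ^*$ with a singular value decomposition $YX^* = WSZ^*$ of the cross-product of the coordinate matrices (Schönemann). A minimal sketch with synthetic data of my own choosing:

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Orthogonal U minimising the Frobenius norm of U X - Y.

    Column k of X and Y holds x_k and y_k respectively."""
    W, _, Zt = np.linalg.svd(Y @ X.T)   # Y X* = W S Z*
    return W @ Zt

# Synthetic check: if Y is an exactly rotated copy of X, the rotation
# is recovered and the residual vanishes.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Y = Q @ X
U = orthogonal_procrustes(X, Y)
print(np.allclose(U @ X, Y))  # True
```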

Thus the MDS model f is chosen in particular by adding a constant b (and by making further assumptions such as $d_{kk} = 0$), so that the triangular inequality is fulfilled and therefore the points can be embedded in a Euclidean space [BG, Chapter 18]. This restriction is not required mathematically if a pseudo-Euclidean geometry is

¹Procrustes, a robber in Greek mythology, who lived near Eleusis in Attica. Originally he was called Damastes or Polypemon. He was given the name Procrustes (the stretcher) because he tortured his victims to fit them into a bed. If they were too tall, he chopped off their limbs or formed them with a hammer. If they were too small, he stretched them. He was overcome by Theseus, who served him the same fate by chopping off his head to fit him into the bed.


admitted. This is the subject of the investigations in Chapter 5, where the stated task of constructing points from given distances and of rotating data in the sense of optimum balancing in the environment of indefinite scalar products is considered.

The following notation is used in the course of this work: The kernel, the image and the rank of a matrix A are designated ker A, im A and rank A respectively. If the matrix A is square, then tr A, det A and $\sigma(A)$ are its trace, determinant and spectrum respectively. Furthermore, the abbreviation $A^{-*} = (A^*)^{-1} = (A^{-1})^*$ is used. The symbol 0 is used for zero vectors as well as for zero matrices. In some places it is additionally provided with size attributes $0_{p,q} \in F^{p\times q}$ or $0_p \in F^{p\times p}$, whereby lower indices may also be intended as enumeration indices. This is evident from the respective context. $I_p$, $N_p$ and $Z_p$ respectively designate the $p \times p$ identity matrix, the $p \times p$ matrix with ones on the superdiagonal and otherwise zeros, and the $p \times p$ matrix with ones on the antidiagonal and otherwise zeros. In particular $J_p(\lambda) = \lambda I_p + N_p$ specifies an upper Jordan block for the eigenvalue $\lambda$. Moreover, $A_1 \oplus \dots \oplus A_k$ stands for the block diagonal matrix consisting of the specified blocks, and $\mathrm{diag}(a_1, \dots, a_k)$ stands for a diagonal matrix with the specified diagonal elements.

Even when no further specifications are made, a regular (real) symmetric or (complex) hermitian matrix is always meant by H, and instead of $A^{[*]}$ sometimes $A^{[*]_H}$ is written to specify the matrix on which the scalar product is based. Lastly, the direct sum of two subspaces $X, Y \subseteq F^n$ is denoted by $X \dotplus Y$.

2. Subspaces. The properties of subspaces of an indefinite scalar product space over the field of the real or complex numbers are discussed in detail in [GLR, Chapter I.1]. In this chapter some additional properties are described which are required in the course of the further considerations.

Let F = R or F = C and let $[\cdot, \cdot]$ be an indefinite scalar product of $F^n$ with the underlying regular symmetric or hermitian matrix $H \in F^{n\times n}$. A subspace $M \subseteq F^n$ is said to be positive (non-negative, neutral, non-positive, negative) if

$$[x, x] > 0 \quad ([x, x] \ge 0,\ [x, x] = 0,\ [x, x] \le 0,\ [x, x] < 0)$$

is satisfied for all $0 \neq x \in M$. The set defined by

$$M^{[\perp]} = \{x \in F^n : [x, y] = 0 \text{ for all } y \in M\}$$

is also a subspace of $F^n$ and is termed the H-orthogonal companion of M, for which the important equations

$$(M^{[\perp]})^{[\perp]} = M \quad\text{and}\quad \dim M + \dim M^{[\perp]} = n$$

hold.

    hold. A subspace M is called non-degenerate if x M and [x, y] = 0 for all y Mimply that x = 0, otherwise M is called degenerate. Furthermore, the equations

    M M[] = {0} and M M[] = Fnare satisfied if and only if M is non-degenerate. (This is an essential difference com-pared with the ordinary scalar product, for which these equations are always fulfilledfor the ordinary orthogonal complement M.) It is now shown in [GLR, Theorem1.4] that every non-negative (non-positive) subspace is a direct sum of a positive (neg-ative) and a neutral subspace. However, the following more general theorem, whoseproof is based on statements made in [GR, Chapter IX, 2], is also true.
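Numerically, the H-orthogonal companion of $M = \mathrm{span}\{x_1, \dots, x_m\}$ is simply the kernel of $X^*H$, where X carries the basis vectors as columns. The sketch below (function name, tolerance and data are my own illustrative choices) also exhibits the degenerate case: a neutral vector lies in its own companion.

```python
import numpy as np

def companion(X, H, tol=1e-12):
    """Orthonormal basis of M^[perp] = {x : [x, y] = 0 for all y in M}."""
    A = X.conj().T @ H                 # [x, y] = 0 for all y in M <=> A x = 0
    _, s, Vt = np.linalg.svd(A)
    rank = int((s > tol).sum())
    return Vt[rank:].conj().T          # kernel basis as columns

H = np.diag([1.0, -1.0])
x = np.array([[1.0], [1.0]])           # neutral: [x, x] = 0
C = companion(x, H)
print(C.shape[1])                      # 1, so dim M + dim M^[perp] = 2 = n
print(np.allclose(x.conj().T @ H @ C, 0))  # True: M lies in its companion
```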

    Theorem 2.1 (Decomposition of subspaces).

    3

  • 8/6/2019 Fa and Procrustes Problems

    4/48

1. Every non-degenerate subspace $M \subseteq F^n$ can be expressed as a direct sum $M = M^+ \dotplus M^-$ whereby $M^+$ is positive, $M^-$ is negative and both spaces are H-orthogonal.

2. Every subspace $M \subseteq F^n$ can be expressed as a direct sum $M = M^0 \dotplus M^1$ whereby $M^0$ is neutral, $M^1$ is non-degenerate and both spaces are H-orthogonal.

Proof. 1. Assume that $M^+$ is a positive subspace of M with maximum dimension. Then $M^+$ is non-degenerate and $M^+ \dotplus M^{+[\perp]} = F^n$. Thus a representation

$$M^+ \dotplus (M \cap M^{+[\perp]}) = M, \quad M^- = M \cap M^{+[\perp]},$$

exists with two H-orthogonal summands, and it remains to show that $M^-$ is negative. Suppose that a vector $x \in M^-$ exists with $[x, x] > 0$. Then it would follow that $[x + y, x + y] = [x, x] + [y, y] > 0$ for all $y \in M^+$. But this would mean that the subspace $M^+ \dotplus \mathrm{span}\{x\}$ is also positive, in contradiction to the maximality of $M^+$.

Thus $M^-$ is non-positive and the Schwarz inequality [GLR, Chapter I.1.3]²

$$|[x, y]|^2 \le [x, x][y, y]$$

can be applied. Now assume that $x_0 \in M^-$ with $[x_0, x_0] = 0$. Then the Schwarz inequality shows that $[x_0, x] = 0$ must be fulfilled for all $x \in M^-$. Since it is also true that $[x_0, y] = 0$ for all $y \in M^+$, it follows that $[x_0, z] = 0$ for all $z \in M$. Thus $x_0 = 0$, because M is non-degenerate.

2. Let $M^0 = M \cap M^{[\perp]}$. Then $M^0$ is neutral, because if a vector $x \in M^0 \subseteq M$ were to exist with $[x, x] \neq 0$, it would follow that $x \notin M^{[\perp]} \supseteq M^0$. Now let $M^1$ be a complementary subspace, so that

$$(M \cap M^{[\perp]}) \dotplus M^1 = M, \quad M^0 = M \cap M^{[\perp]},$$

with two H-orthogonal summands applies. To show that $M^1$ is non-degenerate, let $x_0 \in M^1$ with $[x_0, x] = 0$ for all $x \in M^1$. Furthermore, $[x_0, y] = 0$ for all $y \in M^0$, so that $[x_0, z] = 0$ for all $z \in M$. Thus it follows that $x_0 \in M^1$ and $x_0 \in M^0$, so that $x_0 = 0$. ∎

On combining the two statements of the theorem, it is clear that every subspace $M \subseteq F^n$ can be expressed in the form

$$M = M^+ \dotplus M^- \dotplus M^0$$

with a positive, a negative and a neutral, mutually H-orthogonal, subspace. In order to deduce the dimensions of these spaces, we can refer to the following classical result [GR, Chapter IX, §2].

Remark 2.2 (Projection onto subspaces). Let $M = \mathrm{span}\{x_1, \dots, x_m\}$ be a subspace of $F^n$. Then every vector $y \in M$,

$$y = \sum_{\mu=1}^{m} \eta_\mu x_\mu,$$

can be represented uniquely by its coordinates $\hat{y} = (\eta_1, \dots, \eta_m)^T \in F^m$ with respect to the given basis of M. Now if $X = [x_1 \dots x_m] \in F^{n\times m}$ is a matrix whose columns

²There is a typing error contained in equation (1.8): It must be read $|(Hy, z)|^2 \le (Hy, y)(Hz, z)$.


are the basis vectors, then $y = X\hat{y}$ and for $\hat{H} = X^*HX \in F^{m\times m}$ we obtain

$$(Hy, z)_n = (HX\hat{y}, X\hat{z})_n = (X^*HX\hat{y}, \hat{z})_m = (\hat{H}\hat{y}, \hat{z})_m \quad\text{where}\quad (x, y)_k = \sum_{\nu=1}^{k} x_\nu \bar{y}_\nu.$$

Consequently all properties of the non-degenerate scalar product $H : F^n \times F^n \to F$ in the subspace M can be studied with the help of the possibly degenerate scalar product $\hat{H} : F^m \times F^m \to F$. In particular, if $\sigma(\hat{H})$ contains p positive and q negative eigenvalues, and if $r = m - p - q$ is the multiplicity of the eigenvalue 0, then for the decomposition of M described above,

$$\dim M^+ = p, \quad \dim M^- = q, \quad \dim M^0 = r.$$

The dimensions of the subspaces are uniquely determined. This is a consequence of Sylvester's law of inertia, according to which the numbers of positive, negative and vanishing elements are invariant in all diagonal representations of $\hat{H}$. Furthermore, the subspace $M^0$ is uniquely determined by the nullspace $\ker \hat{H}$ of the degenerate scalar product. If M is ultimately a non-degenerate subspace, then $\det \hat{H} \neq 0$, i.e. $r = 0$, and the maximum dimension of a neutral subspace of M is given by $\min(p, q)$. This is not explicitly shown here, but the proof is given in [GLR, Theorem 1.5].
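In floating point, the dimensions of Remark 2.2 can be read off from the inertia of $\hat{H} = X^*HX$; the sketch below (names and the tolerance are my own choices) returns $(\dim M^+, \dim M^-, \dim M^0)$.

```python
import numpy as np

def subspace_inertia(X, H, tol=1e-10):
    """(p, q, r) = (dim M+, dim M-, dim M0) for M = span of columns of X."""
    w = np.linalg.eigvalsh(X.conj().T @ H @ X)   # eigenvalues of H_hat
    p = int((w > tol).sum())
    q = int((w < -tol).sum())
    return p, q, len(w) - p - q

# Minkowski-like H on F^3; M is spanned by a positive and a neutral vector.
H = np.diag([1.0, 1.0, -1.0])
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
print(subspace_inertia(X, H))  # (1, 0, 1)
```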

The following theorem describes the interesting fact that a single subspace induces a decomposition of the entire space into four complementary subspaces.

Theorem 2.3 (Decomposition of the space). Let $M \subseteq F^n$. Then four subspaces $M_1$, $M_2$, $M_0'$, $M_0''$ of $F^n$ exist with the following properties:

1. $F^n = M_0 \dotplus M_1 \dotplus M_2$ with $M_0 = M_0' \dotplus M_0''$.
2. $M_0' = M \cap M^{[\perp]}$ and $M = M_1 \dotplus M_0'$ as well as $M^{[\perp]} = M_2 \dotplus M_0'$.
3. $M_0$, $M_1$, $M_2$ are non-degenerate and H-orthogonal in pairs.
4. $M_0'$, $M_0''$ are neutral and $\dim M_0' = \dim M_0''$.

Proof. Suppose that $M_1$ and $M_2$ are the complements of $M_0'$ which exist according

to Theorem 2.1 and for which the assertion 2. is fulfilled. Then $M_1 \subseteq M$ and $M_2 \subseteq M^{[\perp]}$ are H-orthogonal and non-degenerate, so that $M_1 \dotplus M_2$, too, is non-degenerate. Consequently

$$F^n = (M_1 \dotplus M_2) \dotplus (M_1 \dotplus M_2)^{[\perp]}$$

and, moreover, $M_0' \subseteq (M_1 \dotplus M_2)^{[\perp]}$. If it is now chosen that $M_0 = (M_1 \dotplus M_2)^{[\perp]} = M_0' \dotplus M_0''$, then the assertions 1. and 3. are fulfilled too. From

$$F^n = (M_1 \dotplus M_2 \dotplus M_0') \dotplus M_0'' = (M + M^{[\perp]}) \dotplus M_0''$$

it furthermore follows that

$$\dim M_0'' = n - \dim(M + M^{[\perp]}) = n - (\dim M + \dim M^{[\perp]} - \dim(M \cap M^{[\perp]})) = \dim M_0',$$

whereby the equations

$$\dim M + \dim N = \dim(M + N) + \dim(M \cap N) \quad\text{and}\quad \dim M + \dim M^{[\perp]} = n,$$


which are valid for all subspaces, have been applied [GR, Chapter I.1.21], [GLR, Chapter I.1.2].

It remains to show that $M_0''$ is neutral. Let $r = \dim M_0' = \dim M_0''$. Then $M_0$ is a $2r$-dimensional non-degenerate subspace of $F^n$, which can be split according to Theorem 2.1 into a positive and a negative subspace $M_0 = M_0^+ \dotplus M_0^-$. Let $p = \dim M_0^+$ and $q = \dim M_0^-$. Since $M_0$ must contain the $r$-dimensional neutral subspace $M_0'$ it follows that $r \le \min(p, q)$ and thus $p \ge r$ and $q \ge r$ [GLR, Theorem 1.5]. On the other hand $p + q = 2r$, so that $p = q = r$. Thus for the subspace $M_0$ the representations

$$M_0 = M_0^+ \dotplus M_0^- = M_0' \dotplus M_0''$$

exist with

$$\dim M_0^+ = \dim M_0^- = \dim M_0' = \dim M_0''$$

and H-orthogonal spaces $M_0^+$, $M_0^-$, so that the three bases

$$M_0^+ = \mathrm{span}\{x_1^+, \dots, x_r^+\}, \quad M_0^- = \mathrm{span}\{x_1^-, \dots, x_r^-\}, \quad M_0' = \mathrm{span}\{x_1, \dots, x_r\}$$

with

$$[x_k^+, x_k^+] > 0, \quad [x_k^-, x_k^-] < 0, \quad [x_k^+, x_l^-] = 0 \quad\text{and}\quad [x_k, x_l] = 0 \quad\text{for } 1 \le k, l \le r$$

can now be chosen. Since $M_0^+$ is positive, $M_0^-$ is negative and $M_0'$ is neutral, it must also be true that $M_0^+ \cap M_0' = M_0^- \cap M_0' = \{0\}$, so that each basis vector of $M_0'$ can be expressed in the form

$$x_k = \sum_{i=1}^{r} \alpha_{ki} x_i^+ + \sum_{i=1}^{r} \beta_{ki} x_i^- \quad\text{with}\quad (\alpha_{k1}, \dots, \alpha_{kr})^T \neq 0, \quad (\beta_{k1}, \dots, \beta_{kr})^T \neq 0.$$

Furthermore, the vectors defined by

$$\tilde{x}_k^+ = \sum_{i=1}^{r} \alpha_{ki} x_i^+ \quad\text{and}\quad \tilde{x}_k^- = \sum_{i=1}^{r} \beta_{ki} x_i^-$$

can be used as new basis vectors for $M_0^+$, $M_0^-$, because if it is assumed that constants $(\gamma_1, \dots, \gamma_r) \neq 0$ with $\gamma_1 \tilde{x}_1^+ + \dots + \gamma_r \tilde{x}_r^+ = 0$ exist, then $0 \neq \gamma_1 x_1 + \dots + \gamma_r x_r = \gamma_1(\tilde{x}_1^+ + \tilde{x}_1^-) + \dots + \gamma_r(\tilde{x}_r^+ + \tilde{x}_r^-) = \gamma_1 \tilde{x}_1^- + \dots + \gamma_r \tilde{x}_r^- \in M_0^-$ and thus $M_0' \cap M_0^- \neq \{0\}$. The linear independence of the vectors $\tilde{x}_1^-, \dots, \tilde{x}_r^-$ can be shown analogously. Finally, defining

$$x_k'' = \tilde{x}_k^+ - \tilde{x}_k^- \quad\text{for } 1 \le k \le r \quad\text{and}\quad M_0'' = \mathrm{span}\{x_1'', \dots, x_r''\},$$

then $M_0''$ is on the one hand a neutral subspace because

$$[x_k'', x_l''] = [\tilde{x}_k^+ - \tilde{x}_k^-, \tilde{x}_l^+ - \tilde{x}_l^-] = [\tilde{x}_k^+, \tilde{x}_l^+] + [\tilde{x}_k^-, \tilde{x}_l^-] = [\tilde{x}_k^+ + \tilde{x}_k^-, \tilde{x}_l^+ + \tilde{x}_l^-] = [x_k, x_l] = 0$$

and on the other hand

$$M_0 = M_0^+ \dotplus M_0^- = \mathrm{span}\{\tilde{x}_1^+, \dots, \tilde{x}_r^+\} \dotplus \mathrm{span}\{\tilde{x}_1^-, \dots, \tilde{x}_r^-\} = \mathrm{span}\{\tilde{x}_1^+ + \tilde{x}_1^-, \dots, \tilde{x}_r^+ + \tilde{x}_r^-\} \dotplus \mathrm{span}\{\tilde{x}_1^+ - \tilde{x}_1^-, \dots, \tilde{x}_r^+ - \tilde{x}_r^-\} = M_0' \dotplus M_0'',$$


so that the 4th assertion of the theorem is fulfilled too. ∎

Whereas the statements have been proved so far without reference to particular bases, we will also continue to use H-orthogonal bases. The following two theorems contain corresponding generalisations of the Schmidt orthonormalisation method. They will in particular be used to construct H-orthogonal bases of eigenspaces.

Theorem 2.4 (Orthonormalisation of bases). Let F = R or F = C and let X be a subspace of $F^n$ with $\dim X = m$. Then there exists a basis $\{u_1, \dots, u_m\}$ of X such that

$$[u_k, u_l] = \epsilon_k \delta_{kl}, \quad \epsilon_k = \begin{cases} +1, & \text{for } 1 \le k \le p \\ -1, & \text{for } p+1 \le k \le p+q \\ 0, & \text{for } p+q+1 \le k \le p+q+r \end{cases}$$

and $p + q + r = m$. In particular, if X is non-degenerate, then $r = 0$.

Proof. (Complete induction) Let X initially be non-degenerate and let $\{x_1, \dots, x_m\}$ be a basis of X. Also let k, l be two indices of $\{1, \dots, m\}$ such that $|[x_k, x_l]|$ is maximised. Then it necessarily follows that $[x_k, x_l] \neq 0$, because otherwise X would be degenerate. For the case $k = l$ let the basis which is obtained by interchanging $x_1$ and $x_k$ still be designated as $\{x_1, \dots, x_m\}$. Otherwise, on account of the polar identities which are valid for all $x, y \in F^n$,

$$[x, y] = \tfrac{1}{4}\big([x+y, x+y] - [x-y, x-y]\big) \quad\text{if } F = R \quad\text{and}$$

$$[x, y] = \tfrac{1}{4}\big([x+y, x+y] - [x-y, x-y]\big) + \tfrac{i}{4}\big([x+iy, x+iy] - [x-iy, x-iy]\big) \quad\text{if } F = C,$$

the following selection can be made: Let

$$z^+ = x_k + x_l, \quad z^- = x_k - x_l, \quad\text{if } F = R, \text{ or } F = C \text{ and } |\mathrm{Re}[x_k, x_l]| \ge |\mathrm{Im}[x_k, x_l]|;$$
$$z^+ = x_k + i x_l, \quad z^- = x_k - i x_l, \quad\text{otherwise,}$$

and

$$\tilde{x}_k = z^+, \quad \tilde{x}_l = z^-, \quad\text{if } |[z^+, z^+]| \ge |[z^-, z^-]|;$$
$$\tilde{x}_k = z^-, \quad \tilde{x}_l = z^+, \quad\text{otherwise.}$$

Then $\mathrm{span}\{\tilde{x}_k, \tilde{x}_l\} = \mathrm{span}\{x_k, x_l\}$ and $[\tilde{x}_k, \tilde{x}_k] \neq 0$. Let the particular basis obtained by replacing $x_k$, $x_l$ with $\tilde{x}_k$, $\tilde{x}_l$ and then exchanging $x_1$ and $\tilde{x}_k$ still be designated as $\{x_1, \dots, x_m\}$. If now we make

$$u_1 = x_1/\sqrt{|[x_1, x_1]|} \quad\text{and}\quad \epsilon_1 = \mathrm{sig}[x_1, x_1] \in \{+1, -1\},$$

then $[u_1, u_1] = [x_1, x_1]/|[x_1, x_1]| = \epsilon_1$ and for the vectors defined by

$$\tilde{x}_i = x_i - \epsilon_1 [x_i, u_1] u_1 \quad\text{for } 2 \le i \le m$$

we obtain $[\tilde{x}_i, u_1] = [x_i, u_1] - \epsilon_1 [x_i, u_1][u_1, u_1] = 0$. Thus X can be expressed as direct sum of its H-orthogonal subspaces $\mathrm{span}\{u_1\}$ and $X' = \mathrm{span}\{\tilde{x}_2, \dots, \tilde{x}_m\}$, so that $X'$ too is non-degenerate. Now according to the induction hypothesis there exists a basis $\{u_2, \dots, u_m\}$ of $X'$ with the demanded properties, so that finally $\{u_1, \dots, u_m\}$ is the wanted basis of X, if a suitable sorting is also made in the case of $\epsilon_1 = -1$. If X is a degenerate subspace, the same construction can be applied, but it then


terminates after a certain number of steps, namely when no more non-vanishing scalar products can be found. The remaining r vectors $\tilde{x}_i$ thus satisfy $[\tilde{x}_i, \tilde{x}_j] = 0$ for $m - r + 1 \le i, j \le m$. ∎

Theorem 2.5 (Orthonormalisation of pairs of bases). Let F = R or F = C and

let X, Y be two neutral subspaces of $F^n$ with $\dim X = \dim Y = m$ and $X \cap Y = \{0\}$. Then there exists a basis $\{u_1, \dots, u_m\}$ of X and a basis $\{v_1, \dots, v_m\}$ of Y such that

$$[u_k, v_l] = \epsilon_k \delta_{kl}, \quad \epsilon_k = \begin{cases} 1, & \text{for } 1 \le k \le p \\ 0, & \text{for } p+1 \le k \le p+r \end{cases}$$

and $p + r = m$. In particular, if $X \dotplus Y$ is non-degenerate, then $r = 0$.

Proof. (Complete induction) Let $X \dotplus Y$ initially be non-degenerate and let $\{x_1, \dots, x_m\}$ be a basis of X and $\{y_1, \dots, y_m\}$ be a basis of Y. Also let k, l be two indices from $\{1, \dots, m\}$ so that $|[x_k, y_l]|$ is maximised. Then it necessarily follows that $[x_k, y_l] \neq 0$, because otherwise $X \dotplus Y$ would be degenerate. Let the particular bases obtained by exchanging $x_1$ and $x_k$ or $y_1$ and $y_l$ still be designated as $\{x_1, \dots, x_m\}$ and $\{y_1, \dots, y_m\}$ respectively. In the case F = R now let $\sigma_1 \in \{+1, -1\}$ such that $\mu_1 = [x_1, \sigma_1 y_1] > 0$ and let

$$u_1 = x_1/\sqrt{\mu_1} \quad\text{as well as}\quad v_1 = \sigma_1 y_1/\sqrt{\mu_1},$$

so that $[u_1, v_1] = [x_1, \sigma_1 y_1]/\mu_1 = 1$; in the case F = C let $\mu_1 = [x_1, y_1]$ and let

$$u_1 = x_1/\lambda_1 \quad\text{as well as}\quad v_1 = y_1/\bar{\lambda}_1 \quad\text{where}\quad \lambda_1^2 = \mu_1,$$

so that $[u_1, v_1] = [x_1, y_1]/\mu_1 = 1$. For the vectors defined by

$$\tilde{x}_i = x_i - [x_i, v_1] u_1 \quad\text{as well as}\quad \tilde{y}_i = y_i - [y_i, u_1] v_1 \quad\text{for } 2 \le i \le m$$

we obtain $[\tilde{x}_i, v_1] = [x_i, v_1] - [x_i, v_1][u_1, v_1] = 0$ and $[\tilde{y}_i, u_1] = [y_i, u_1] - [y_i, u_1][v_1, u_1] = 0$. Thus $X \dotplus Y$ can be expressed as direct sum of its H-orthogonal subspaces $\mathrm{span}\{u_1, v_1\}$ and $X' \dotplus Y' = \mathrm{span}\{\tilde{x}_2, \dots, \tilde{x}_m\} \dotplus \mathrm{span}\{\tilde{y}_2, \dots, \tilde{y}_m\}$, so that $X' \dotplus Y'$, too, is non-degenerate. Now according to the induction hypothesis, two bases $\{u_2, \dots, u_m\}$ and $\{v_2, \dots, v_m\}$ of $X'$ and $Y'$ exist with the demanded properties, so that finally $\{u_1, \dots, u_m\}$ and $\{v_1, \dots, v_m\}$ are the wanted bases of X and Y. If $X \dotplus Y$ is a degenerate subspace, the same construction can be applied, but it then terminates after a certain number of steps, namely when no more non-vanishing scalar products can be found. The remaining 2r vectors $\tilde{x}_i$, $\tilde{y}_i$ thus satisfy $[\tilde{x}_i, \tilde{y}_j] = 0$ for $m - r + 1 \le i, j \le m$. ∎

Numerical Procedure 2.6 (Orthonormalisation of bases). The proofs of

the Theorems 2.4 and 2.5 were formulated by choice of the maximised scalar product (pivoting) such that they can be implemented directly as stable numerical methods. To limit the coordinates of the neutral basis vectors

$$u_{p+q+1}, \dots, u_m \quad\text{or}\quad u_{p+1}, \dots, u_m \text{ and } v_{p+1}, \dots, v_m$$

they should additionally be brought to the Euclidean length $\|x\| = \sqrt{(x, x)} = 1$ after completing the orthogonalisation process. Furthermore, the normalisation in the method of Theorem 2.5 can be modified by a factor $\omega > 0$:

$$u_1 = \frac{\omega}{\sqrt{\mu_1}}\, x_1, \quad v_1 = \frac{1}{\omega\sqrt{\mu_1}}\, \sigma_1 y_1 \quad (F = R) \quad\text{or}\quad u_1 = \frac{\omega}{\lambda_1}\, x_1, \quad v_1 = \frac{1}{\omega\bar{\lambda}_1}\, y_1 \quad (F = C).$$
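A possible floating-point realisation of Theorem 2.4 with the pivoting of this procedure, restricted to the case F = R for brevity (the complex branch via the polar identity is omitted, and all names are my own):

```python
import numpy as np

def h_orthonormalise(X, H, tol=1e-10):
    """H-orthogonalise the columns of X (real case of Theorem 2.4).

    Returns vectors u_k with [u_k, u_k] = +-1 and the signs eps_k; the
    trailing neutral directions of a degenerate span are dropped here."""
    X = X.astype(float)
    us, signs = [], []
    while X.shape[1] > 0:
        G = X.T @ H @ X                        # Gram matrix [x_k, x_l]
        k, l = np.unravel_index(np.argmax(abs(G)), G.shape)
        if abs(G[k, l]) < tol:                 # only neutral vectors remain
            break
        if k != l:                             # build a non-neutral pivot
            zp, zm = X[:, k] + X[:, l], X[:, k] - X[:, l]
            piv = zp if abs(zp @ H @ zp) >= abs(zm @ H @ zm) else zm
        else:
            piv = X[:, k]
        sq = piv @ H @ piv
        u, eps = piv / np.sqrt(abs(sq)), np.sign(sq)
        X = X - eps * np.outer(u, u @ H @ X)   # remove the [., u] components
        _, s, Vt = np.linalg.svd(X)            # re-extract a basis of the rest
        X = X @ Vt[: int((s > tol).sum())].T
        us.append(u)
        signs.append(eps)
    return np.array(us).T, signs

H = np.diag([1.0, -1.0, 1.0])
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
U, eps = h_orthonormalise(X, H)
print(np.round(U.T @ H @ U, 8))  # diagonal matrix with the signs eps
```

For a non-degenerate span, $U^THU = \mathrm{diag}(\epsilon_1, \dots, \epsilon_m)$ with signs matching the inertia of H restricted to the span.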


In particular, if the choice $\omega = \sqrt{\|y_1\|/\|x_1\|}$ is made, then $\|u_1\| = \|v_1\|$ is ensured, which is found to be particularly advantageous in practice. This can be demonstrated with the following example in which $\|A\| = \sqrt{\mathrm{tr}(A^*A)}$ denotes the Frobenius norm and $\mathrm{cond}\, A = \|A\|\, \|A^{-1}\|$ denotes the condition number of a matrix A:

Let $H = \mathrm{diag}(1, -1)$ and $x = (\xi,\ \xi)^T$, $y = (\eta,\ -\eta)^T$ with $\xi, \eta \in C\backslash\{0\}$. Then $X = \mathrm{span}\{x\}$ and $Y = \mathrm{span}\{y\}$ are neutral subspaces of equal dimension such that $X \cap Y = \{0\}$ and Theorem 2.5 can be applied. Let $\mu = [x, y] = 2\xi\bar{\eta}$ and let $\lambda$ be one of the two square roots of $\mu$. Then the columns $[\tilde{x}\ \tilde{y}]$ of the matrix $X_1$ where

$$X_1 = \begin{pmatrix} \xi/\lambda & \eta/\bar{\lambda} \\ \xi/\lambda & -\eta/\bar{\lambda} \end{pmatrix}, \quad\text{i.e.}\quad X_1^{-1} = \frac{1}{2}\begin{pmatrix} \lambda/\xi & \lambda/\xi \\ \bar{\lambda}/\eta & -\bar{\lambda}/\eta \end{pmatrix},$$

are the vectors obtained by orthonormalisation without modification and it is true that

$$\mathrm{cond}\, X_1 = \frac{|\xi|^2 + |\eta|^2}{|\xi||\eta|} \quad\text{because}\quad \|X_1\|^2 = \frac{2(|\xi|^2 + |\eta|^2)}{|\lambda|^2}, \quad \|X_1^{-1}\|^2 = \frac{|\lambda|^2(|\xi|^2 + |\eta|^2)}{2|\xi|^2|\eta|^2}.$$

If now $\omega = \sqrt{\|y\|/\|x\|} = \sqrt{|\eta|/|\xi|}$, then the columns $[\tilde{x}\ \tilde{y}]$ of the matrix $X_2$ where

$$X_2 = \begin{pmatrix} \omega\xi/\lambda & \eta/(\omega\bar{\lambda}) \\ \omega\xi/\lambda & -\eta/(\omega\bar{\lambda}) \end{pmatrix}, \quad\text{i.e.}\quad X_2^{-1} = \frac{1}{2}\begin{pmatrix} \lambda/(\omega\xi) & \lambda/(\omega\xi) \\ \omega\bar{\lambda}/\eta & -\omega\bar{\lambda}/\eta \end{pmatrix},$$

are the vectors obtained by orthonormalisation with modification and it is true that

$$\mathrm{cond}\, X_2 = 2 \quad\text{because}\quad \|X_2\|^2 = \frac{4|\xi||\eta|}{|\lambda|^2}, \quad \|X_2^{-1}\|^2 = \frac{|\lambda|^2}{|\xi||\eta|}.$$

But for arbitrary real numbers a, b we find that $0 \le (a - b)^2 = a^2 - 2ab + b^2$ or $2ab \le a^2 + b^2$, from which in the case of $ab > 0$ the inequality $2 \le (a^2 + b^2)/ab$ follows, so that in particular $\mathrm{cond}\, X_1 \ge 2$ and thus

$$\mathrm{cond}\, X_1 \ge \mathrm{cond}\, X_2$$

is fulfilled. Therefore, the matrix $X_2$ obtained by modification with the factor $\omega$ is at least as well conditioned as $X_1$; but in the case of $|\xi| \neq |\eta|$ it is always better conditioned.
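The comparison can be checked in floating point; the sketch below (with one illustrative choice of $\xi$, $\eta$ of my own; frob_cond is the Frobenius condition number used above) reproduces $\mathrm{cond}\,X_1 = (|\xi|^2 + |\eta|^2)/(|\xi||\eta|)$ and $\mathrm{cond}\,X_2 = 2$.

```python
import numpy as np

def frob_cond(A):
    """Condition number based on the Frobenius norm."""
    return np.linalg.norm(A) * np.linalg.norm(np.linalg.inv(A))

xi, eta = 3.0, 0.5                            # illustrative nonzero scalars
x = np.array([xi, xi])
y = np.array([eta, -eta])
lam = np.sqrt(2 * xi * np.conj(eta))          # lambda^2 = mu = [x, y]
X1 = np.column_stack([x / lam, y / np.conj(lam)])
om = np.sqrt(np.linalg.norm(y) / np.linalg.norm(x))
X2 = np.column_stack([om * x / lam, y / (om * np.conj(lam))])
print(round(frob_cond(X1), 6))  # (xi^2 + eta^2)/(xi*eta) = 6.166667
print(round(frob_cond(X2), 6))  # 2.0
```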

It is easily seen that Theorem 2.4 provides a special basis for the statements from Theorem 2.1. A corresponding basis representation of Theorem 2.3 can be formulated as follows.

Theorem 2.7 (Extension of bases). Let F = R or F = C and let X be a subspace of $F^n$ with $\dim X = m$. Then there exists a basis $\{u_1, \dots, u_n\}$ of $F^n$ which has the following properties:


1. If $U = [u_1 \dots u_n]$ is a matrix whose columns are the basis vectors, then

$$U^*HU = \begin{pmatrix} I_p & 0 \\ 0 & -I_q \end{pmatrix} \oplus \begin{pmatrix} 0 & I_r \\ I_r & 0 \end{pmatrix} \oplus \begin{pmatrix} I_s & 0 \\ 0 & -I_t \end{pmatrix}$$

with $p + q + r = m$ and $p + q + 2r + s + t = n$.

2. If the subspaces $X_1$, $X_0'$, $X_0''$, $X_2$ are defined by

$$X_1 = \mathrm{span}\{u_1, \dots, u_{p+q}\},$$
$$X_0' = \mathrm{span}\{u_{p+q+1}, \dots, u_{p+q+r}\},$$
$$X_0'' = \mathrm{span}\{u_{p+q+r+1}, \dots, u_{p+q+2r}\},$$
$$X_2 = \mathrm{span}\{u_{p+q+2r+1}, \dots, u_{p+q+2r+s+t}\},$$

then $X_0'$, $X_0''$ are neutral subspaces with the same dimension, $X_1$, $X_2$ and $X_0 = X_0' \dotplus X_0''$ are non-degenerate and mutually H-orthogonal, and $F^n = X_0 \dotplus X_1 \dotplus X_2$ as well as

$$X = X_1 \dotplus X_0', \quad X^{[\perp]} = X_2 \dotplus X_0', \quad X \cap X^{[\perp]} = X_0'.$$

Proof. Let $p + q + r = m$ and let $E = \{e_i\}_{i=1}^m$ be a basis of X which exists

according to Theorem 2.4 such that

$$[e_i, e_j] = \begin{cases} +1, & \text{for } r+1 \le i = j \le r+p \\ -1, & \text{for } r+p+1 \le i = j \le r+p+q \\ 0, & \text{otherwise.} \end{cases}$$

Furthermore, let $\tilde{E} = \{\tilde{e}_i\}_{i=1}^m$ be a dual basis with respect to E, i.e.

$$[e_i, \tilde{e}_j] = \delta_{ij} \quad\text{for } 1 \le i, j \le m.$$

Then the vectors defined by

$$\tilde{e}_k' = \tilde{e}_k - \frac{1}{2}\sum_{\nu=1}^{r} [\tilde{e}_k, \tilde{e}_\nu]\, e_\nu \quad\text{for } 1 \le k \le r$$

satisfy the equations³

$$[\tilde{e}_k', \tilde{e}_l'] = 0 \quad\text{for } 1 \le k, l \le r \quad\text{and}\quad [e_i, \tilde{e}_k'] = \delta_{ik} \quad\text{for } 1 \le i \le m,\ 1 \le k \le r.$$

If now we make

$$e_k' = \frac{1}{\sqrt{2}}(e_k + \tilde{e}_k') \quad\text{and}\quad e_k'' = \frac{1}{\sqrt{2}}(e_k - \tilde{e}_k') \quad\text{for } 1 \le k \le r,$$

then it is true that

$$[e_k', e_l'] = \delta_{kl}, \quad [e_k'', e_l''] = -\delta_{kl}, \quad [e_k', e_l''] = 0 \quad\text{for } 1 \le k, l \le r \quad\text{and}$$
$$[e_i, e_k'] = 0, \quad [e_i, e_k''] = 0 \quad\text{for } r+1 \le i \le m,\ 1 \le k \le r.$$

³The same construction is also specified in [BMRRR2] and in [BR] within the scope of the proof for Witt's theorem. However, the necessary orthonormalisation of the vectors $\tilde{e}_k$ is there not carried out completely, so that the basis $\{e_i, e_k', e_k''\}$ also constructed there is not orthonormalised.


Thus the set of the vectors

$$\{u_1, \dots, u_{m+r}\} = \{e_i\}_{i=r+1}^{m} \cup \{e_k'\}_{k=1}^{r} \cup \{e_k''\}_{k=1}^{r}$$

constitutes an orthonormalised basis of a non-degenerate subspace Y which can be complemented with $n - m - r$ further vectors $u_{m+r+1}, \dots, u_n$ to give an orthonormalised basis of $F^n$:

$$[u_i, u_j] = \epsilon_i \delta_{ij}, \quad \epsilon_i \in \{+1, -1\} \quad\text{for } 1 \le i, j \le n.$$

For the matrix U consisting of these basis vectors it is true that

$$U^*HU = (I_p \oplus -I_q) \oplus (I_r \oplus -I_r) \oplus (I_s \oplus -I_t),$$

whereby s specifies the number of positive, t specifies the number of negative complementing vectors, and a suitable sorting method is assumed. Instead of the basis $\{u_1, \dots, u_n\}$ it is also possible to use the basis

$$\{u_1, \dots, u_{p+q}, u_{p+q+1}', \dots, u_{p+q+2r}', u_{p+q+2r+1}, \dots, u_n\} \quad\text{with}\quad \{u_{p+q+1}', \dots, u_{p+q+2r}'\} = \{e_k\}_{k=1}^{r} \cup \{\tilde{e}_k'\}_{k=1}^{r}.$$

For the matrix U consisting of these basis vectors it is true that

$$U^*HU = \begin{pmatrix} I_p & 0 \\ 0 & -I_q \end{pmatrix} \oplus \begin{pmatrix} 0 & I_r \\ I_r & 0 \end{pmatrix} \oplus \begin{pmatrix} I_s & 0 \\ 0 & -I_t \end{pmatrix},$$

and evidently the second part of the assertion is fulfilled too by this basis. ∎

An important application of this result is the following Theorem of Witt concerning the extension of isometries, whose proof has been taken over from the papers [BR, Theorem 4.1] and [BMRRR2, Theorem 2.1]. Thereby $\pi(H)$ gives the number of positive eigenvalues of an hermitian matrix H.

Theorem 2.8 (Witt, extension of isometries). Let F = R or F = C and let $[\cdot, \cdot]_1$, $[\cdot, \cdot]_2$ be two indefinite scalar products of $F^n$ with the underlying regular symmetric or hermitian matrices $H_1, H_2 \in F^{n\times n}$ for which $\pi(H_1) = \pi(H_2)$ is fulfilled. If $X_1$ and $X_2$ are subspaces of $F^n$ and $U_0 : X_1 \to X_2$ is a regular transformation such that

$$[U_0 x, U_0 y]_2 = [x, y]_1 \quad\text{for all } x, y \in X_1,$$

then there exists a regular transformation $U : F^n \to F^n$ such that

$$[Ux, Uy]_2 = [x, y]_1 \quad\text{for all } x, y \in F^n \quad\text{and}\quad Ux = U_0 x \quad\text{for all } x \in X_1.$$

Proof. Let $\dim X_1 = m$ and let $\{e_1, \dots, e_m\}$ be an orthonormalised (according to Theorem 2.4) basis of $X_1$ with

$$[e_k, e_l] = \epsilon_k \delta_{kl}, \quad \epsilon_k = \begin{cases} +1, & \text{for } 1 \le k \le p \\ -1, & \text{for } p+1 \le k \le p+q \\ 0, & \text{for } p+q+1 \le k \le p+q+r \end{cases}$$

and $p + q + r = m$. Then $\{f_1, \dots, f_m\}$ with $f_k = U_0 e_k$ for $1 \le k \le m$ is an orthonormalised basis of $X_2$, and both bases can be extended to bases of $F^n$ according


to Theorem 2.7. For the matrices $R_1 = [e_1 \dots e_n]$ and $R_2 = [f_1 \dots f_n]$ consisting of the extended basis vectors it follows that

$$R_1^* H_1 R_1 = R_2^* H_2 R_2 = \begin{pmatrix} I_p & 0 \\ 0 & -I_q \end{pmatrix} \oplus \begin{pmatrix} 0 & I_r \\ I_r & 0 \end{pmatrix} \oplus \begin{pmatrix} I_s & 0 \\ 0 & -I_t \end{pmatrix}$$

with $r + s + t = n - m$, since the number of positive and the number of negative vectors, s and t respectively, must be identical for both bases, because of the assumption that the signatures of the matrices $H_1$ and $H_2$ are the same. Thus the transformation defined by

$$U R_1 = R_2 \quad\text{or}\quad U = R_2 R_1^{-1}$$

fulfils the assertion of the theorem. ∎

The singular value decomposition is a suitable tool for practical application of

the Theorems 2.7 and 2.8. It can be used for constructing a dual basis as well as for complementing a basis of a subspace to give a basis of $F^n$.

Numerical Procedure 2.9 (Extension of bases and isometries). Let $X = [x_1 \dots x_m]$ be an $n \times m$ matrix ($m < n$) whose columns constitute a basis of $X_1$, and let

$$HX = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^* = U_1 \Sigma V^*$$

be a singular value decomposition of the matrix HX. Then the columns $[y_1 \dots y_m]$ of

$$Y = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} \Sigma^{-1} \\ 0 \end{pmatrix} V^* = U_1 \Sigma^{-1} V^*$$

constitute a basis which is dual with respect to the columns of X because $Y^*HX = V\Sigma^{-1}U_1^*U_1\Sigma V^* = I_m$. Thus if it is assumed that

$$X^*HX = I_p \oplus -I_q \oplus 0_r \quad\text{with}\quad p + q + r = m,$$

as can always be achieved by orthonormalisation of the columns of X according to Theorem 2.4, then it is true for the matrix constituted by $X' = [x_1 \dots x_{p+q}\ x_{p+q+1} \dots x_m\ y_{p+q+1}' \dots y_m']$ with

$$y_k' = y_k - \frac{1}{2}\sum_{\nu=p+q+1}^{m} [y_k, y_\nu]\, x_\nu \quad\text{for } p+q+1 \le k \le m$$

that

$$(X')^*H(X') = I_p \oplus -I_q \oplus \begin{pmatrix} 0_r & I_r \\ I_r & 0_r \end{pmatrix}$$

and for the matrix defined by $X'' = [x_1 \dots x_{p+q}\ x_{p+q+1}' \dots x_m'\ x_{p+q+1}'' \dots x_m'']$ with

$$x_k' = \frac{1}{\sqrt{2}}(x_k + y_k'), \quad x_k'' = \frac{1}{\sqrt{2}}(x_k - y_k') \quad\text{for } p+q+1 \le k \le m$$

it is found that

$$(X'')^*H(X'') = I_p \oplus -I_q \oplus I_r \oplus -I_r.$$


If the columns of $X''$ are renamed as $[x_1 \dots x_{m+r}]$ and if

$$X'' = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^*$$

is a further singular value decomposition, then the columns $[u_{m+r+1} \dots u_n]$ of $U_2$ complement the columns of $X''$ to give a basis of $F^n$, and for $(X''|U_2') = [x_1 \dots x_{m+r}\ u_{m+r+1}' \dots u_n']$ with the vectors obtained by orthonormalisation

$$u_k' = u_k - \sum_{\nu=1}^{m+r} \epsilon_\nu [u_k, x_\nu]\, x_\nu, \quad \epsilon_\nu = \mathrm{sig}[x_\nu, x_\nu] = [x_\nu, x_\nu] \quad\text{for } m+r+1 \le k \le n$$

it follows that

$$(X''|U_2')^*H(X''|U_2') = I_p \oplus -I_q \oplus I_r \oplus -I_r \oplus (U_2')^*H(U_2').$$

Finally, the columns of $U_2'$ can be orthonormalised, giving a matrix $U_2''$ for which

$$(X''|U_2'')^*H(X''|U_2'') = I_p \oplus -I_q \oplus I_r \oplus -I_r \oplus I_s \oplus -I_t = Z$$

with $p + q + 2r + s + t = n$ whereby $\pi(H) = p + r + s$ specifies the positive index of inertia of H.

Starting with the matrix $Y = [y_1 \dots y_m] = [U_0 x_1 \dots U_0 x_m]$, a matrix $(Y''|V_2'')$ can be constructed in the same manner, for which, too,

$$(Y''|V_2'')^*H(Y''|V_2'') = I_p \oplus -I_q \oplus I_r \oplus -I_r \oplus I_s \oplus -I_t = Z,$$

as is also possible by using two scalar products $[\cdot, \cdot]_1$ and $[\cdot, \cdot]_2$ with $\pi(H_1) = \pi(H_2)$. Thus the matrix

$$U = (Y''|V_2'')(X''|U_2'')^{-1} = (Y''|V_2'')Z(X''|U_2'')^*H$$

gives the wanted isometry.
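The first step of the procedure, namely the dual basis obtained from one singular value decomposition, can be sketched as follows (H and the random basis are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3
H = np.diag([1.0, 1.0, 1.0, -1.0, -1.0])
X = rng.standard_normal((n, m))              # basis of a generic subspace
U1, s, Vt = np.linalg.svd(H @ X, full_matrices=False)
Y = U1 @ np.diag(1.0 / s) @ Vt               # Y = U1 Sigma^{-1} V*
print(np.allclose(Y.conj().T @ H @ X, np.eye(m)))  # dual basis: Y*HX = I_m
```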

    3. Canonical forms and H-polar decompositions. The following theoremreviews a well-known statement concerning a transformation of the form ( A, H) (S1AS, SHS), which is designated as canonical form. It goes back to results ofKronecker and Weierstrass and is fundamental for handling H-selfadjoint matrices.

Theorem 3.1 (Canonical form). Let H ∈ C^{n×n} be regular and hermitian and let A ∈ C^{n×n} be H-hermitian. Then there exists a regular matrix S ∈ C^{n×n} such that

    S^{-1}AS = A_1 ⊕ ... ⊕ A_k and S*HS = H_1 ⊕ ... ⊕ H_k    (3.1a)

whereby the blocks A_j and H_j are of equal size and each pair (A_j, H_j) has one and only one of the following forms:

1. Pairs belonging to real eigenvalues

    A_j = J_p(λ) and H_j = εZ_p    (3.1b)

with λ ∈ R, p ∈ N and ε ∈ {+1, −1}.


2. Pairs belonging to non-real eigenvalues

    A_j = [J_p(λ) 0; 0 J_p(λ̄)] and H_j = [0 Z_p; Z_p 0]    (3.1c)

with λ ∈ C\R, Im(λ) > 0 and p ∈ N.

Moreover, the canonical form (S^{-1}AS, S*HS) of (A, H) is uniquely determined up to the permutation of blocks.

Proof. See [GLR, Theorem 3.3].

The ordered set of the signs ε appearing in the blocks (3.1b) is an invariant of the canonical form and is called its sign characteristic. Furthermore, an analogous form also exists for real matrices [GLR, Theorem 5.3], but this is not required for the following discussion.

If now F = R or F = C and A ∈ F^{n×n}, then a representation of the kind A = UM, in which U ∈ F^{n×n} is an H-isometry and M ∈ F^{n×n} is H-selfadjoint, is called an H-polar decomposition of A. Decompositions of this kind have been studied in detail in the papers [BMRRR1-3] and [MRR] as well as in the references cited therein. The following results are important for these investigations.

Theorem 3.2 (H-polar decomposition of real matrices). Let H ∈ R^{n×n} be regular and symmetric, and let A ∈ R^{n×n}. Then A admits a real H-polar decomposition A = U_rM_r (U_r ∈ R^{n×n} is H-orthogonal, M_r ∈ R^{n×n} is H-symmetric) if and only if it admits a complex H-polar decomposition A = U_cM_c (U_c ∈ C^{n×n} is H-unitary, M_c ∈ C^{n×n} is H-hermitian).

Proof. See [BMRRR1, Lemma 4.2].

Theorem 3.3 (Existence of H-polar decompositions). Let F = R or F = C and let A ∈ F^{n×n}. Then A admits an H-polar decomposition if and only if there exists an H-selfadjoint matrix M ∈ F^{n×n} such that M² = A^[*]A and ker M = ker A.

Proof. See [BMRRR1, Theorem 4.1] and [BR, Lemma 4.1].

Theorem 3.4 (Existence of H-selfadjoint square roots). Let F = R or F = C and let A ∈ F^{n×n}. Then there exists an H-selfadjoint matrix M such that M² = A^[*]A and ker M = ker A if and only if the canonical form of (A^[*]A, H) satisfies the following conditions:

1. Blocks belonging to a negative real eigenvalue λ < 0 can be represented in the form

    ( ⊕_{i=1}^{r} [J_{p_i}(λ) ⊕ J_{p_i}(λ)], ⊕_{i=1}^{r} [Z_{p_i} ⊕ −Z_{p_i}] ).

2. Blocks belonging to the eigenvalue 0 can be represented in the form (J_1 ⊕ J_2 ⊕ J_3, Z_1 ⊕ Z_2 ⊕ Z_3) where

    (J_1, Z_1) = ( ⊕_{i=1}^{r} [N_{p_i} ⊕ N_{p_i}], ⊕_{i=1}^{r} [Z_{p_i} ⊕ −Z_{p_i}] ) with p_i ≥ 1,

    (J_2, Z_2) = ( ⊕_{j=1}^{s} [N_{p_j} ⊕ N_{p_j−1}], ⊕_{j=1}^{s} [ε_jZ_{p_j} ⊕ ε_jZ_{p_j−1}] ) with p_j > 1,

    (J_3, Z_3) = ( ⊕_{k=1}^{t} 0, ⊕_{k=1}^{t} ε_k ).


3. If a basis in which the blocks from 2. exist is designated with E_1 ∪ E_2 ∪ E_3,

    E_1 = {e^(1)_{i,k} : 1 ≤ i ≤ r, 1 ≤ k ≤ 2p_i},  E_2 = {e^(2)_{i,k} : 1 ≤ i ≤ s, 1 ≤ k ≤ 2p_i − 1},  E_3 = {e^(3)_{i,1} : 1 ≤ i ≤ t},

then such a basis must exist in which

    ker A = span{e^(1)_{i,1} + e^(1)_{i,p_i+1} : 1 ≤ i ≤ r} ⊕ span{e^(2)_{i,1} : 1 ≤ i ≤ s} ⊕ span{e^(3)_{i,1} : 1 ≤ i ≤ t}.

(Remark: From this condition it follows that ker M = ker A.)

When all conditions are fulfilled, the following relationships exist between the canonical form of (M², H) and of (M, H):

a. Non-real eigenvalues. If the canonical form of (M², H) contains a block of the form

    (J_p(μ + iν) ⊕ J_p(μ − iν), Z_{2p}) with μ, ν ∈ R and ν > 0,

then the canonical form of (M, H) contains a block of the form

    (J_p(λ) ⊕ J_p(λ̄), Z_{2p}) or (J_p(−λ) ⊕ J_p(−λ̄), Z_{2p}) with λ² = μ + iν.

b. Positive real eigenvalues. If the canonical form of (M², H) contains a block of the form

    (J_p(α²), εZ_p) with α > 0,

then the canonical form of (M, H) contains a block of the form

    (J_p(α), εZ_p) or (J_p(−α), ε(−1)^{p+1}Z_p).

c. Negative real eigenvalues. If the canonical form of (M², H) contains a block of the form

    (J_p(−α²) ⊕ J_p(−α²), Z_p ⊕ −Z_p) with α > 0,

then the canonical form of (M, H) contains a block of the form

    (J_p(iα) ⊕ J_p(−iα), Z_{2p}).

d. First case with eigenvalue 0. If the canonical form of (M², H) contains a block of the form

    (N_p ⊕ N_p, Z_p ⊕ −Z_p) ⊆ (J_1, Z_1),

then the canonical form of (M, H) contains a block of the form

    (N_{2p}, Z_{2p}) or (N_{2p}, −Z_{2p}).

e. Second case with eigenvalue 0. If the canonical form of (M², H) contains a block of the form

    (N_p ⊕ N_{p−1}, εZ_p ⊕ εZ_{p−1}) ⊆ (J_2, Z_2),

then the canonical form of (M, H) contains a block of the form

    (N_{2p−1}, εZ_{2p−1}).


f. Third case with eigenvalue 0. If the canonical form of (M², H) contains a block of the form

    (0, ε) ⊆ (J_3, Z_3),

then the canonical form of (M, H) contains a block of the form

    (0, ε).

Proof. See [BMRRR1, Theorem 4.4, Lemma 7.8], [BMRRR3, Errata] and Theorem 4.4.

Whereas Theorem 3.2 makes it possible to transfer results concerning complex H-polar decompositions to real decompositions, Theorem 3.3 (whose proof is based on Witt's theorem) and Theorem 3.4 constitute the essential results for the existence of H-polar decompositions. Note that there is an error in [BMRRR1, Theorem 4.4] which is pointed out by the following example.

Example 3.5. With the designations used in [BMRRR1], let

    H = [1 0; 0 −1],  X = (1/√(1−α²)) [1+α −(1+α); 1+α −(1+α)],  X^[*]X = [0 0; 0 0]

with −1 < α < 1. Then according to the statement (ii) of Theorem 4.4 the equation (X^[*]X, H) = (B_0, H_0) is satisfied and ker B_0 = span{e_1, e_2}. However, ker X = span{e_1 + e_2} ≠ ker B_0, so that according to the statement (iii) of the theorem the H-polar decomposition

    X = UA,  U = (1/√(1−α²)) [1 α; α 1],  A = [1 −1; 1 −1]

should not exist.
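The example can be confirmed numerically; a small sketch with the illustrative choice α = 0.5:

```python
import numpy as np

# Numeric check of Example 3.5: the decomposition X = U A exists even
# though ker X differs from ker B_0.
a = 0.5
H = np.diag([1.0, -1.0])
U = np.array([[1.0, a], [a, 1.0]]) / np.sqrt(1 - a**2)
A = np.array([[1.0, -1.0], [1.0, -1.0]])
X = U @ A

assert np.allclose(U.T @ H @ U, H)               # U is H-orthogonal
assert np.allclose(H @ A, (H @ A).T)             # A is H-symmetric
assert np.allclose(A @ A, 0)                     # A^2 = X^[*]X = 0
assert np.allclose(X @ np.array([1.0, 1.0]), 0)  # e1 + e2 spans ker X
```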

This error is corrected by making a change in the second condition of Theorem 3.4 for the size of the blocks from (J_1, Z_1), so that now the condition p_i ≥ 1 is imposed instead of the original condition p_i > 1. This correction is also made in [BMRRR3, Errata], and Theorem 4.4 contained in the next chapter, too, emphasises the need for making this change. (Since Theorem 4.4 is proved independently of Theorem 3.4, the forward reference given here is not problematic.)

A further approach for showing the existence of H-polar decompositions will now be pointed out. It will be investigated only for complex matrices with regard to Theorem 3.2 and is based on the following observation. (Since H- as well as Z-hermitian matrices will be encountered in the following discussion, the abbreviating notation A^{*H} = A^{[*]_H} for the H-adjoint will be used from now on.)

Lemma 3.6. If A ∈ C^{n×n} admits an H-polar decomposition, then the canonical forms of the pairs (A^{*H}A, H) and (AA^{*H}, H) are identical.

Proof. Let A = UM be an H-polar decomposition of A. Then U^{*H} = U^{-1} and M^{*H} = M imply

    U^{-1}AA^{*H}U = U^{-1}(UM)(M^{*H}U^{*H})U = M² = (M^{*H}U^{*H})(UM) = A^{*H}A,

so that A^{*H}A and AA^{*H} are H-unitarily similar. If now (R^{-1}A^{*H}AR, R*HR) = (J, Z) is the canonical form of the pair (A^{*H}A, H) and if S = UR, then (S^{-1}AA^{*H}S, S*HS) = (R^{-1}U^{-1}AA^{*H}UR, R*U*HUR) = (J, Z) also gives the canonical form of the pair (AA^{*H}, H).


The question now arises whether the converse of this statement is also true. It can be answered immediately as follows for regular matrices.

Theorem 3.7. Let A ∈ C^{n×n} be regular and let the canonical forms of the pairs (A^{*H}A, H) and (AA^{*H}, H) be identical. Then A admits an H-polar decomposition.

Proof. Let R and S be regular matrices from C^{n×n} so that

    (J, Z) = (R^{-1}A^{*H}AR, R*HR) = (S^{-1}AA^{*H}S, S*HS)

gives the canonical form of the pairs (A^{*H}A, H) and (AA^{*H}, H). Then the regular matrix defined by

    B = S^{-1}AR

is Z-normal, as is given with Z^{-1} = Z* = Z from

    (ZB*Z)B = Z(R*A*S^{-*})Z(S^{-1}AR) = ZR*A*HAR = R^{-1}H^{-1}A*HAR = J,
    B(ZB*Z) = (S^{-1}AR)Z(R*A*S^{-*})Z = S^{-1}AH^{-1}A*S^{-*}Z = S^{-1}AH^{-1}A*HS = J.

Let f(B) denote an arbitrary polynomial in B. Then the commutability of B and ZB*Z furthermore yields (ZB*Z)f(B) = f(B)(ZB*Z) or (Zf(B)*Z)B = B(Zf(B)*Z) and

    (Zf(B)*Z)f(B) = f(B)(Zf(B)*Z) or (Zf(B)*Z)^{-1}f(B) = f(B)(Zf(B)*Z)^{-1}

respectively. The last one of these equations is obtained from the last but one equation and the fact that the commutability of two matrices P and Q also implies the commutability of P^{-1} and Q, provided that the inverse exists.

If now X^{-1}BX = J_{p_1}(λ_1) ⊕ ... ⊕ J_{p_k}(λ_k) is the Jordan normal form of B, a matrix √B with (√B)² = B can be chosen such that it can be expressed as a polynomial f(B) = √B, namely

    √B = X ( √(J_{p_1}(λ_1)) ⊕ ... ⊕ √(J_{p_k}(λ_k)) ) X^{-1}

with

    √(J_p(λ)) = [ f(λ)  f'(λ)/1!  f''(λ)/2!  ...  f^(p−1)(λ)/(p−1)!
                  0     f(λ)      f'(λ)/1!   ...  f^(p−2)(λ)/(p−2)!
                  ...
                  0     0         0          ...  f(λ)             ],   f(λ) = √λ,    (3.2)

whereby in all blocks belonging to the same eigenvalue λ ∈ σ(B) ⊆ C\{0} the same branch (!) of the multivalent function √· must be chosen [WED, Chapter VII].
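For a single Jordan block, formula (3.2) can be evaluated directly from the Taylor coefficients of f(λ) = √λ. A small numpy sketch (block size and eigenvalue are arbitrary illustrative choices):

```python
import numpy as np

def sqrt_jordan_block(lam, p):
    # Upper triangular Toeplitz matrix from (3.2): superdiagonal k holds
    # f^(k)(lam)/k! for f = sqrt, via the recursion
    # c_0 = sqrt(lam), c_k = c_{k-1} * (1/2 - (k-1)) / (k * lam).
    c = [np.sqrt(complex(lam))]
    for k in range(1, p):
        c.append(c[-1] * (0.5 - (k - 1)) / (k * lam))
    S = np.zeros((p, p), dtype=complex)
    for k, ck in enumerate(c):
        S += ck * np.eye(p, k=k)
    return S

J = 4.0 * np.eye(3) + np.eye(3, k=1)   # Jordan block J_3(4)
S = sqrt_jordan_block(4.0, 3)
assert np.allclose(S @ S, J)           # (sqrt J)^2 = J
```

Choosing the other branch of √λ simply negates every coefficient, giving the second square root of the block.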


is the wanted H-polar decomposition of A.

Remark 3.10. In addition to the H-polar decomposition of A given in the proof,

    A^{*H} = U'M' with M' = UMU^{-1} = YKY^{-1} and U' = U^{-1} = XY^{-1}

is an H-polar decomposition of A^{*H} and, furthermore, A^{*H}Y = XK, as can easily be verified using A^{*H} = H^{-1}A*H. Thus the basis vectors can be assigned to the subspaces as follows:

    x_1, ..., x_{p+q}, x'_1, ..., x'_r span im A^{*H}, with x'_1, ..., x'_r spanning ker A, complemented by x''_1, ..., x''_r;
    y_1, ..., y_{p+q}, y'_1, ..., y'_r span im A, with y'_1, ..., y'_r spanning ker A^{*H}, complemented by y''_1, ..., y''_r,

as is also evident from the equations

    im A^{*H} ⊇ ker A = (im A^{*H})^[⊥] and (ker A^{*H})^[⊥] = im A ⊇ ker A^{*H}

which were not used in the proof but are valid too.

Example 3.11.

1. If λ ∈ C\{0} and

    H = [0 I_p; I_p 0],  A = [0_p 0_p; λI_p λI_p],  A^{*H} = [λ̄I_p 0_p; λ̄I_p 0_p],

then A^{*H}A = 0 but AA^{*H} ≠ 0. Therefore no H-polar decomposition of A exists.

2. If λ ∈ C\{0} and

    H = [0 I_p; I_p 0],  A = 0_p ⊕ λI_p,  A^{*H} = λ̄I_p ⊕ 0_p,

then A^{*H}A = 0 and AA^{*H} = 0. Therefore an H-polar decomposition of A exists, namely

    A = UM with U = (1/|λ|) [0 λI_p; λI_p 0],  M = [0_p |λ|I_p; 0_p 0_p].

3. If λ ∈ R and

    H = [0 Z_p; Z_p 0],  A = 0_p ⊕ J_p(λ),  A^{*H} = J_p(λ) ⊕ 0_p,

then A^{*H}A = 0 and AA^{*H} = 0. Therefore an H-polar decomposition of A exists, namely

    A = UM with U = [0 I_p; I_p 0],  M = [0_p J_p(λ); 0_p 0_p].

4. If λ ∈ C\R, J_{2p}(λ, λ̄) = J_p(λ) ⊕ J_p(λ̄) and

    H = [0 Z_{2p}; Z_{2p} 0],  A = 0_{2p} ⊕ J_{2p}(λ, λ̄),  A^{*H} = J_{2p}(λ, λ̄) ⊕ 0_{2p},


then A^{*H}A = 0 and AA^{*H} = 0. Therefore an H-polar decomposition of A exists, namely

    A = UM with U = [0 I_{2p}; I_{2p} 0],  M = [0_{2p} J_{2p}(λ, λ̄); 0_{2p} 0_{2p}].

These examples are here listed in detail because the cases 3. and 4. are important in connection with normal forms of H-normal matrices, but this will not be considered further here [LMMR, Theorem 10, Example 11]. The following sufficient condition for the existence of H-polar decompositions can now be proved with the help of Lemma 3.9.
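Case 3 can be checked numerically; a small sketch with the illustrative values p = 2 and λ = 3:

```python
import numpy as np

p, lam = 2, 3.0
Z = np.fliplr(np.eye(p))                         # sip matrix Z_p
J = lam * np.eye(p) + np.eye(p, k=1)             # Jordan block J_p(lam)
O = np.zeros((p, p))
H = np.block([[O, Z], [Z, O]])
A = np.block([[O, O], [O, J]])                   # A = 0_p (+) J_p(lam)
U = np.block([[O, np.eye(p)], [np.eye(p), O]])
M = np.block([[O, J], [O, O]])

assert np.allclose(U.T @ H @ U, H)               # U is H-orthogonal
assert np.allclose(H @ M, (H @ M).T)             # M is H-symmetric
assert np.allclose(U @ M, A)                     # A = U M
assert np.allclose(M @ M, 0)                     # M^2 = A^{*H}A = 0
```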

Theorem 3.12. Let A ∈ C^{n×n} and let the canonical forms of the pairs (A^{*H}A, H) and (AA^{*H}, H) be identical. Furthermore, let all blocks of the canonical form belonging to the eigenvalue 0 be of size 1. Then A admits an H-polar decomposition.

Proof. Let R, S, J, Z and B be as in the proof of Theorem 3.7, so that B^{*Z}B = BB^{*Z} = J can be assumed. Furthermore, let

    J = J_1 ⊕ J_0, J_0 ∈ C^{m×m} and Z = Z_1 ⊕ Z_0, Z_0 ∈ C^{m×m},

whereby J_0, Z_0 designates the part of the canonical form belonging to the eigenvalue 0. Then the spectra of the blocks J_1 and J_0 are disjoint and also the matrices J and B commute, so that also B must take the form

    B = B_1 ⊕ B_0, B_0 ∈ C^{m×m}.

Therefore B_1 is regular because of B_1^{*Z_1}B_1 = B_1B_1^{*Z_1} = J_1, and it consequently admits a Z_1-polar decomposition constructed according to Theorem 3.7

    B_1 = T_1K_1 (= K_1T_1) with T_1*Z_1T_1 = Z_1 and K_1*Z_1 = Z_1K_1.

Moreover, the prerequisites of the theorem yield^5 B_0^{*Z_0}B_0 = B_0B_0^{*Z_0} = J_0 = 0_m and therefore B_0 admits a Z_0-polar decomposition

    B_0 = T_0K_0 with T_0*Z_0T_0 = Z_0 and K_0*Z_0 = Z_0K_0

constructed according to Lemma 3.9. Finally, if

    T = T_1 ⊕ T_0, U = STR^{-1} and K = K_1 ⊕ K_0, M = RKR^{-1},

then TK is a Z-polar decomposition of B and UM is an H-polar decomposition of A.

Corollary 3.13. Let A ∈ C^{n×n} be H-normal and let all blocks of the canonical form of the pair (A^{*H}A, H) belonging to the eigenvalue 0 be of size 1. Then A admits an H-polar decomposition.

A further criterion for the existence of H-polar decompositions of H-normal matrices is given in Theorem 34 of [LMMR]. In Theorem 35, also contained there, H-polar decompositions of singular H-normal matrices X = UA ∈ C^{n×n} with U^{*H} = U^{-1}, A^{*H} = A are presented for all possible nontrivial cases in which H has exactly two

^5 If it could be proved that every H-normal matrix A with (A^{*H}A)^k = 0 for a k from N admits an H-polar decomposition, then the present restrictions regarding the block sizes for the eigenvalue 0 would no longer be needed.


negative eigenvalues and whose existence is not guaranteed by Theorem 34. Thereby, in the listed cases, the index of nilpotency k of the matrices X^{*H}X = XX^{*H} is

    k = 1 in the cases (I), (VI)-(VII),
    k = 2 in the cases (II)-(III), (VIII)-(XII),
    k = 3 in the cases (IV)-(V),

as can easily be verified. Thus, the existence of the given H-polar decompositions in the cases (I), (VI)-(VII) is ensured by Corollary 3.13 and, moreover, the hypothesis expressed in the footnote on the last page is supported, too. On the other hand, for λ ∈ R and

    H = [0 0 1; 0 1 0; 1 0 0],  X = [0 1 i; 0 0 1; 0 0 0] with X^{*H}X = XX^{*H} = [0 0 1; 0 0 0; 0 0 0],

the existence of the H-polar decomposition

    X = UA with U = [1 i −1/2 + iλ; 0 1 i; 0 0 1],  A = [0 1 0; 0 0 1; 0 0 0]  (λ ∈ R)

is guaranteed by Theorem 34(ii) but not by Corollary 3.13, so that the two criteria are mutually supplementary.
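This 3×3 decomposition can be verified numerically; a small sketch with the illustrative choice λ = 1:

```python
import numpy as np

lam = 1.0
H = np.fliplr(np.eye(3))                        # H = Z_3
X = np.array([[0, 1, 1j], [0, 0, 1], [0, 0, 0]])
U = np.array([[1, 1j, -0.5 + 1j * lam],
              [0, 1,  1j],
              [0, 0,  1]])
A = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=complex)

assert np.allclose(U.conj().T @ H @ U, H)       # U is H-unitary
assert np.allclose(H @ A, (H @ A).conj().T)     # A is H-hermitian
assert np.allclose(U @ A, X)                    # X = U A
```

Every real λ gives another H-unitary factor, illustrating that H-polar decompositions are in general far from unique.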

4. Numerical computation of H-polar decompositions of a matrix A for which A^[*]A is diagonalisable. The practical computation of H-polar decompositions of a matrix A ∈ C^{n×n} is a tedious task consisting of the following steps^6:

1. Computation of the canonical form of the pair (A^[*]A, H),
2. Computation of an H-selfadjoint matrix M such that M² = A^[*]A and ker M = ker A,
3. Computation of an H-isometry U such that A = UM.
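For the regular case in which A^[*]A has only positive eigenvalues, the three steps collapse to an eigendecomposition. A minimal numpy sketch (H and A below are illustrative choices, not taken from the text):

```python
import numpy as np

H = np.diag([1.0, -1.0])
L = np.array([[np.cosh(0.5), np.sinh(0.5)],     # an H-orthogonal factor
              [np.sinh(0.5), np.cosh(0.5)]])
A = L @ np.diag([2.0, 3.0])                     # test matrix to decompose

B = H @ A.T @ H @ A                             # B = A^[*]A (here H^{-1} = H)
w, R = np.linalg.eig(B)                         # B diagonalisable, w > 0
M = R @ np.diag(np.sqrt(w)) @ np.linalg.inv(R)  # H-selfadjoint square root
U = A @ np.linalg.inv(M)                        # H-isometric factor

assert np.allclose(U.T @ H @ U, H)              # step 3: U is H-orthogonal
assert np.allclose(H @ M, (H @ M).T)            # step 2: M is H-symmetric
assert np.allclose(U @ M, A)                    # A = U M
```

When negative or zero eigenvalues occur, the canonical form of the pair is needed as described below before a square root can be chosen.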

This chapter specifies a numerical method for the case that the matrix A^[*]A is diagonalisable. This requires a simplified canonical form for a pair (A, H), where A is a diagonalisable H-selfadjoint matrix and H is a regular hermitian matrix, which is derived based on the following facts taken from [GLR, Chapter I.2.2]:

If A ∈ C^{n×n} is H-selfadjoint, then for every non-real eigenvalue λ ∈ σ(A) it follows that also λ̄ ∈ σ(A), and the Jordan normal form of A contains blocks of the same size for both eigenvalues. Let

    E_A(λ) = {x ∈ C^n : (A − λI)^k x = 0 for a k ∈ N}

be the generalised eigenspace for the eigenvalue λ. If λ_1, ..., λ_r are the real and λ_{r+1}, ..., λ_s are the non-real eigenvalues with positive imaginary parts, and if X_i = E_A(λ_i)

^6 In the case H = I the polar decomposition of a square matrix A is obtained from its singular value decomposition. Namely if A = UΣV*, then UV* is the isometric and VΣV* is the selfadjoint factor of a polar decomposition of A. This suggests that in the case of an indefinite scalar product, too, a similar approach can be adopted to obtain an H-polar decomposition. An analogy to the singular value decomposition is examined in the paper [BR]. In Chapter 8 thereof the statements regarding H-polar decomposition are derived, on which the present chapter is based: The existence of an H-polar decomposition of A is inferred there, too, from the canonical form of the pair (A^[*]A, H).
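Footnote 6 as code: the ordinary (H = I) polar decomposition read off from an SVD, with a randomly generated test matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

W, s, Vt = np.linalg.svd(A)       # A = W diag(s) Vt
U = W @ Vt                        # isometric (orthogonal) factor
M = Vt.T @ np.diag(s) @ Vt        # selfadjoint positive semidefinite factor

assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(M, M.T)
assert np.allclose(U @ M, A)
```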


for 1 ≤ i ≤ r and X_{i,1} = E_A(λ_i), X_{i,2} = E_A(λ̄_i) for r+1 ≤ i ≤ s, then C^n can be expressed as the direct sum of the non-degenerate generalised eigenspaces

    C^n = X_1 ∔ ... ∔ X_r ∔ X_{r+1} ∔ ... ∔ X_s with X_i = X_{i,1} ∔ X_{i,2} for r+1 ≤ i ≤ s

for which the following equations are satisfied

    [x_k, x_l] = 0 for x_k ∈ X_k, x_l ∈ X_l and 1 ≤ k ≠ l ≤ s,
    [x_k, y_k] = 0 for x_k, y_k ∈ X_{k,1} or x_k, y_k ∈ X_{k,2} and r+1 ≤ k ≤ s.

Therefore, if R ∈ C^{n×n} is a matrix whose column vectors are bases of the subspaces X_i, then R is regular and

    R^{-1}AR = A_1 ⊕ ... ⊕ A_r ⊕ [A_{r+1,1} 0; 0 A_{r+1,2}] ⊕ ... ⊕ [A_{s,1} 0; 0 A_{s,2}],
    R*HR = H_1 ⊕ ... ⊕ H_r ⊕ [0 H_{r+1}; H*_{r+1} 0] ⊕ ... ⊕ [0 H_s; H*_s 0],

whereby the size of the blocks is given by p_i = dim X_i for 1 ≤ i ≤ r and p_i = dim X_{i,1} = dim X_{i,2} for r+1 ≤ i ≤ s. In the special case of a diagonalisable matrix A the generalised eigenspaces contain exclusively eigenvectors and we get the simplified form

    R^{-1}AR = λ_1I_{p_1} ⊕ ... ⊕ λ_rI_{p_r} ⊕ [λ_{r+1}I_{p_{r+1}} 0; 0 λ̄_{r+1}I_{p_{r+1}}] ⊕ ... ⊕ [λ_sI_{p_s} 0; 0 λ̄_sI_{p_s}],
    R*HR = H_1 ⊕ ... ⊕ H_r ⊕ [0 H_{r+1}; H*_{r+1} 0] ⊕ ... ⊕ [0 H_s; H*_s 0].

The following result is obtained directly from this representation. Its proof will be given in two ways in order to, on the one hand, show the connection to Theorem 3.1 and, on the other hand, provide the foundation for the corresponding numerical method.

Theorem 4.1 (Simplified canonical form). Let A ∈ C^{n×n} be H-selfadjoint and diagonalisable. Then there exists a regular matrix S ∈ C^{n×n} such that

    S^{-1}AS = A_1 ⊕ ... ⊕ A_k and S*HS = H_1 ⊕ ... ⊕ H_k,    (4.1a)

whereby the blocks A_j and H_j are of equal size and the pairs (A_j, H_j) have one and only one of the following forms:

1. Pairs belonging to real eigenvalues

    A_j = λI_p and H_j = [I_{p−q} 0; 0 −I_q]    (4.1b)

with λ ∈ R and p, q ∈ N, q ≤ p.

2. Pairs belonging to non-real eigenvalues

    A_j = [λI_p 0; 0 λ̄I_p] and H_j = [0 I_p; I_p 0]    (4.1c)

with λ ∈ C\R, Im(λ) > 0 and p ∈ N.


Moreover, the form (S^{-1}AS, S*HS) of (A, H) is uniquely determined up to the permutation of blocks.

First proof. According to Theorem 3.1 a regular matrix S ∈ C^{n×n} exists such that the pair (A, H) is in the canonical form (3.1). Because of the assumed diagonalisability, the size of the blocks appearing therein is always p = 1. On combining all blocks of the form (3.1b) or (3.1c) which belong to the same eigenvalue λ ∈ R or λ ∈ C\R respectively, after a suitable permutation it is always possible to build one block of the form (4.1b) or (4.1c). From p − q blocks of (3.1b) with ε = +1 and q blocks of (3.1b) with ε = −1 this gives one block of the form (4.1b).

Second proof. For all real eigenvalues λ_ν ∈ σ(A), 1 ≤ ν ≤ r, let {u_1, ..., u_p} be an orthonormalised (according to Theorem 2.4) basis of eigenvectors of E_A(λ_ν), arranged such that (Hu_j, u_j) = 1 for 1 ≤ j ≤ p−q and (Hu_j, u_j) = −1 for p−q+1 ≤ j ≤ p. For all non-real eigenvalues λ_ν, λ̄_ν ∈ σ(A), r+1 ≤ ν ≤ s, let {u_1, ..., u_p} and {v_1, ..., v_p} be two orthonormalised (according to Theorem 2.5) bases of eigenvectors of E_A(λ_ν) and E_A(λ̄_ν). On now combining these bases as columns of the matrix S, the pair (S^{-1}AS, S*HS) takes on the assumed form.
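The second proof can be traced numerically for a small H-selfadjoint matrix with real eigenvalues (H and A below are illustrative choices): the eigenvectors are normalised by their H-scalar squares, whose signs produce the blocks (4.1b).

```python
import numpy as np

H = np.array([[0.0, 1.0], [1.0, 0.0]])
A = np.array([[2.0, 1.0], [1.0, 2.0]])          # HA hermitian => A H-selfadjoint
assert np.allclose(H @ A, (H @ A).T)

w, V = np.linalg.eig(A)                          # real eigenvalues 3 and 1
S = np.zeros((2, 2))
for j in range(2):
    v = V[:, j]
    nrm = v @ H @ v                              # H-scalar square [v, v]
    S[:, j] = v / np.sqrt(abs(nrm))              # H-orthonormalised column

# Up to block permutation: S^{-1}AS diagonal, S*HS = diag(+-1), i.e. (4.1b)
assert np.allclose(np.linalg.inv(S) @ A @ S, np.diag(w))
assert np.allclose(np.abs(S.T @ H @ S), np.eye(2))
```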

Example 4.2 (Non-defective matrix pencils). If λH − G is a non-defective matrix pencil with regular hermitian matrices H and G from C^{n×n}, then by definition regular matrices P and Q exist such that

    H' = P^{-1}HQ and G' = P^{-1}GQ

are diagonal matrices. Thus the matrix H^{-1}G = Q(H')^{-1}G'Q^{-1} is diagonalisable, and because (H^{-1}G)*H = H(H^{-1}G) it is H-hermitian. The canonical form (S^{-1}H^{-1}GS, S*HS) of the pair (H^{-1}G, H) can therefore be expressed according to Theorem 4.1 in the simplified form

    S^{-1}H^{-1}GS = ⊕_{j=1}^{r} λ_jI_{p_j} ⊕ ⊕_{j=r+1}^{s} [λ_jI_{p_j} 0; 0 λ̄_jI_{p_j}],
    S*HS = ⊕_{j=1}^{r} [I_{p_j−q_j} 0; 0 −I_{q_j}] ⊕ ⊕_{j=r+1}^{s} [0 I_{p_j}; I_{p_j} 0],
    S*GS = ⊕_{j=1}^{r} λ_j [I_{p_j−q_j} 0; 0 −I_{q_j}] ⊕ ⊕_{j=r+1}^{s} [0 λ̄_jI_{p_j}; λ_jI_{p_j} 0]

with λ_1, ..., λ_r ∈ R\{0} and λ_{r+1}, ..., λ_s ∈ C\R [MMX, Corollary 2.4].

From the second proof of Theorem 4.1 the following numerical procedure can be derived for computing a simplified canonical form.

Numerical Procedure 4.3 (Simplified canonical form). First of all the eigenvalues λ_1, ..., λ_n and a corresponding basis of C^n consisting of the eigenvectors r_1, ..., r_n of A must be computed. For example, the LR method [PW], the QR method [F] or a method based on Jacobi rotations and shears according to Eberlein [E] can be used for this purpose. Thereafter the eigenvalues must be combined in groups of numerical eigenvalues λ'_1, ..., λ'_k. This can be done with eigenvalue estimates via Gerschgorin circles [KR], with a modified cluster analysis algorithm [K] or simply by the following sorting method:

    for j = 1 ... n−1 do
        k = j
        for i = j+1 ... n do
            if |Re(λ_i) − Re(λ_k)| < ε then
                if |Im(λ_i) − Im(λ_k)| < ε or |Im(λ_i) + Im(λ_k)| < ε then
                    if Im(λ_i) > Im(λ_k) then
                        k = i
                    end if
                else if |Im(λ_i)| > |Im(λ_k)| then
                    k = i
                end if
            else if |Re(λ_i) + Re(λ_k)| < ε then
                if Re(λ_i) > Re(λ_k) then
                    k = i
                end if
            else if |Re(λ_i)| > |Re(λ_k)| then
                k = i
            end if
        end for
        if k ≠ j then
            SWAP(λ_j, λ_k)
            for i = 1 ... n do
                SWAP(r_{ij}, r_{ik})
            end for
        end if
    end for

Thereby ε > 0 is a control parameter with which the sorted eigenvalues are combined in groups whose mean value is interpreted as numerical eigenvalue:

    λ_1, ..., λ_{p_1} → λ'_1;  λ_{p_1+1}, ..., λ_{p_1+p_2} → λ'_2;  ...;  λ_{p_1+...+p_{k−1}+1}, ..., λ_{p_1+p_2+...+p_k} → λ'_k.

A group limit is reached when either |Re(λ_i) − Re(λ_{i+1})| ≥ ε or |Im(λ_i) − Im(λ_{i+1})| ≥ ε. Furthermore, the sorting method is devised such that pairs of non-real numerical eigenvalues always come consecutively, so that either λ'_i is a real eigenvalue or λ'_i, λ'_{i+1} = λ̄'_i is a pair of conjugate non-real eigenvalues. Thereby the parameter ε also decides whether an eigenvalue is to be considered as real (|Im(λ'_i)| < ε) or as non-real (|Im(λ'_i)| ≥ ε), and the eigenvalue 0 is additionally characterised by |Re(λ'_i)| < ε; it is always sorted to the end, entailing organisational advantages. The eigenvectors belonging to an eigenvalue λ'_i, or the eigenvectors belonging to a pair of eigenvalues λ'_i, λ'_{i+1} = λ̄'_i,

    r_{m_{i−1}+1}, ..., r_{m_{i−1}+p_i} or r_{m_{i−1}+1}, ..., r_{m_{i−1}+p_i} and r_{m_i+1}, ..., r_{m_i+p_{i+1}}

with m_i = p_1 + p_2 + ... + p_i can be orthonormalised by method 2.6, which is possible in the second case only when p_i = p_{i+1}. Otherwise an error situation has arisen which might possibly be correctable by choosing a more suitable value for ε (provided that the matrix A is H-hermitian, as is of course assumed here). Empirically it was found that in most cases a very good result can be achieved with ε = √(ε_machine) (ε_machine = machine accuracy). A generalised method for numerical computation of the canonical form of a pair (A, H) with a non-diagonalisable matrix A is described in [K].
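The grouping step can be sketched in a few lines. The following is a simplified version (plain proximity clustering of the sorted eigenvalues, not the full sorting method above):

```python
import numpy as np

def group_eigenvalues(vals, eps):
    """Combine eigenvalues lying within eps of each other into groups and
    return the group means (numerical eigenvalues) with multiplicities."""
    vals = sorted(vals, key=lambda z: (round(z.real / eps), round(z.imag / eps)))
    groups = [[vals[0]]]
    for z in vals[1:]:
        if abs(z - groups[-1][-1]) < eps:
            groups[-1].append(z)
        else:
            groups.append([z])
    return [(np.mean(g), len(g)) for g in groups]

# A double eigenvalue perturbed at rounding level is merged into one group.
vals = np.linalg.eigvals(np.diag([3.0, 2.0, 2.0 + 1e-12]))
out = group_eigenvalues(vals, eps=1e-8)
assert sorted((round(m.real), p) for m, p in out) == [(2, 2), (3, 1)]
```

As in the text, the tolerance eps plays the role of the control parameter ε, and √ε_machine is a reasonable default.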

The existence of an H-hermitian square root of the matrix A^[*]A can also be shown in simplified form if it is diagonalisable. The complete proof of the specialisation of Theorem 3.4 is given for clarity and in order to make the following numerical procedure comprehensible.

Theorem 4.4 (Existence of H-selfadjoint square roots). Let A ∈ C^{n×n} and let B = A^[*]A be diagonalisable. Then there exists an H-selfadjoint matrix M such that M² = B and ker M = ker A if and only if the following conditions are satisfied:

1. The part of the (simplified) canonical form of the pair (B, H) belonging to negative real eigenvalues λ = −α² consists of blocks of the form

    B_j = −α²I_{2p} and H_j = [I_p 0; 0 −I_p]

with α > 0 and p ∈ N.

2. The part of the (simplified) canonical form of the pair (B, H) belonging to the eigenvalue 0 consists of the blocks

    B_j = 0_p and H_j = [I_{r+s} 0; 0 −I_{r+t}]

with p, r, s, t ∈ N, 2r + s + t = p, and if {e_1, ..., e_{r+s}, f_1, ..., f_{r+t}} stand for the basis in which these blocks exist, then two permutations (ρ_1, ..., ρ_{r+s}) of (1, ..., r+s) and (σ_1, ..., σ_{r+t}) of (1, ..., r+t) exist, so that

    ker A = span{e_{ρ_1} + f_{σ_1}, ..., e_{ρ_r} + f_{σ_r}} ⊕ span{e_{ρ_{r+1}}, ..., e_{ρ_{r+s}}} ⊕ span{f_{σ_{r+1}}, ..., f_{σ_{r+t}}}

is fulfilled.

Proof. [⇒]: Let B ∈ C^{n×n} be a diagonalisable matrix with σ(B) = {λ_1, ..., λ_k} and let

    R^{-1}BR = λ_1I_{p_1} ⊕ ... ⊕ λ_kI_{p_k}

be its Jordan normal form, whereby the columns of R form a basis of C^n consisting of eigenvectors of B. Then every matrix M with M² = B can be expressed in the form

    M = R(√(λ_1I_{p_1}) ⊕ ... ⊕ √(λ_kI_{p_k}))R^{-1}

on setting √(λI_p) = √λ X_p(I_{p−q} ⊕ −I_q)X_p^{-1} for λ ≠ 0 and

    √(0_p) = X_p ([0_r I_r; 0_r 0_r] ⊕ 0_{p−2r}) X_p^{-1} for λ = 0,

where X_p ∈ C^{p×p} designates an arbitrary regular matrix. The simplified canonical form of an H-selfadjoint matrix M whose square is diagonalisable,

    R^{-1}MR = M_1 ⊕ ... ⊕ M_k, R*HR = H_1 ⊕ ... ⊕ H_k,

therefore consists of blocks (M_j, H_j) of the form

    ([λI_p 0; 0 λ̄I_p], [0 I_p; I_p 0]) for λ ∈ C\R, or

    (λI_{p+q}, I_p ⊕ −I_q) for λ ∈ R\{0}, or

    ([0 I_{p+q}; 0 0] ⊕ 0_{s+t}, [0 (I_p ⊕ −I_q); (I_p ⊕ −I_q) 0] ⊕ I_s ⊕ −I_t) for λ = 0,


whereby the blocks belonging to the eigenvalue 0 have been combined in evident manner to the ordinary canonical form

    M_0 = ⊕_{i=1}^{p+q} [0 1; 0 0] ⊕ ⊕_{i=1}^{s+t} [0],
    H_0 = ⊕_{i=1}^{p} [0 1; 1 0] ⊕ ⊕_{i=1}^{q} [0 −1; −1 0] ⊕ ⊕_{i=1}^{s} [1] ⊕ ⊕_{i=1}^{t} [−1].

Now if such a matrix M is given, and if the basis in which the (simplified) canonical form exists is designated as {g_1, ..., g_n}, then the (simplified) canonical form of the pair (M², H) has the following properties:

1. If λ ∈ C\(R ∪ iR) and if the canonical form of (M, H) contains the blocks

    M_{j1} ⊕ M_{j2} = [λI_{p_1} 0; 0 λ̄I_{p_1}] ⊕ [−λI_{p_2} 0; 0 −λ̄I_{p_2}],
    H_{j1} ⊕ H_{j2} = [0 I_{p_1}; I_{p_1} 0] ⊕ [0 I_{p_2}; I_{p_2} 0],

then the canonical form of (M², H) contains a pair of blocks of the form

    M²_j = [λ²I_{p_1+p_2} 0; 0 λ̄²I_{p_1+p_2}],  H_j = [0 I_{p_1+p_2}; I_{p_1+p_2} 0]. [End of 1.]

2. If λ ∈ R\{0} and if the canonical form of (M, H) contains the blocks

    M_{j1} ⊕ M_{j2} = λI_{p_1+q_1} ⊕ −λI_{p_2+q_2},
    H_{j1} ⊕ H_{j2} = (I_{p_1} ⊕ −I_{q_1}) ⊕ (I_{p_2} ⊕ −I_{q_2}),

then the canonical form of (M², H) contains a pair of blocks of the form

    M²_j = λ²I_{(p_1+p_2)+(q_1+q_2)},  H_j = I_{p_1+p_2} ⊕ −I_{q_1+q_2}. [End of 2.]

3. If λ = iβ ∈ iR\{0} and if the canonical form of (M, H) contains a pair of blocks

    M_j = [iβI_p 0; 0 −iβI_p],  H_j = [0 I_p; I_p 0],

then the canonical form of (M², H) contains a pair of blocks

    M²_j = −β²I_{2p} and H_j.

If a new basis {e_1, ..., e_p, f_1, ..., f_p} is now chosen with

    e_k = (1/√2)(g_k + g_{k+p}),  f_k = (1/√2)(g_k − g_{k+p})  for 1 ≤ k ≤ p,

then

    (H_je_k, e_l) = δ_{kl}, (H_je_k, f_l) = 0, (H_jf_k, f_l) = −δ_{kl}  for 1 ≤ k, l ≤ p


and the following blocks appear:

    M²_j = −β²I_{2p} and H_j = I_p ⊕ −I_p. [End of 3.]

4. If λ = 0 and if the canonical form of (M, H) contains a pair of blocks

    M_j = [0 I_{p+q}; 0 0] ⊕ 0_{s+t},  H_j = [0 (I_p ⊕ −I_q); (I_p ⊕ −I_q) 0] ⊕ I_s ⊕ −I_t,

then the canonical form of (M², H) contains a pair of blocks

    M²_j = 0_{2p+2q+s+t} and H_j.

If a new basis {e_1, ..., e_{p+q+s}, f_1, ..., f_{p+q+t}} is now chosen with

    e_k = (1/√2)(g_k + g_{k+p+q}),  f_k = (1/√2)(g_k − g_{k+p+q})  for 1 ≤ k ≤ p,
    e_k = (1/√2)(g_k − g_{k+p+q}),  f_k = (1/√2)(g_k + g_{k+p+q})  for p+1 ≤ k ≤ p+q,
    e_{k+p+q} = g_{k+2p+2q}  for 1 ≤ k ≤ s,
    f_{k+p+q} = g_{k+2p+2q+s}  for 1 ≤ k ≤ t,

then

    (H_je_k, e_l) = δ_{kl}, (H_je_k, f_μ) = 0, (H_jf_μ, f_ν) = −δ_{μν}
    for 1 ≤ k, l ≤ p+q+s and 1 ≤ μ, ν ≤ p+q+t

and the following blocks appear:

    M²_j = 0_{2p+2q+s+t} and H_j = I_{p+q+s} ⊕ −I_{p+q+t}.

Moreover, it is true that

    ker M = span{g_1, ..., g_{p+q}, g_{2p+2q+1}, ..., g_{2p+2q+s+t}}
          = span{e_1 + f_1, ..., e_{p+q} + f_{p+q}} ⊕ span{e_{p+q+1}, ..., e_{p+q+s}} ⊕ span{f_{p+q+1}, ..., f_{p+q+t}}.

[⇐]: Let B be diagonalisable and let R^{-1}BR = B_1 ⊕ ... ⊕ B_k, R*HR = H_1 ⊕ ... ⊕ H_k be the (simplified) canonical form of the pair (B, H). Furthermore, let the conditions 1. and 2. be satisfied, and let Δ_p be diagonal matrices with diagonal elements of {+1, −1}. Then the matrix M can be constructed as follows:

1. If λ = μ² ∈ C\R and if the canonical form contains a pair of blocks of the form

    B_j = [λI_p 0; 0 λ̄I_p] and H_j = [0 I_p; I_p 0],

then for

    M_j = [μΔ_p 0; 0 μ̄Δ_p]


Using the notation of Theorem 3.4, the part of the canonical form belonging to the eigenvalue 0 in the basis {e_1, f_1, ..., e_r, f_r, e_{r+1}, ..., e_{r+s}, f_{r+1}, ..., f_{r+t}} can be expressed as

    ( ⊕_{i=1}^{r} (N_1 ⊕ N_1) ⊕ ⊕_{i=1}^{s} [0] ⊕ ⊕_{i=1}^{t} [0],  ⊕_{i=1}^{r} (Z_1 ⊕ −Z_1) ⊕ ⊕_{i=1}^{s} [1] ⊕ ⊕_{i=1}^{t} [−1] ).

This confirms again the correction of Theorem 3.4 explained with Example 3.5. Furthermore, the diagonal matrices Δ_p used in the proof given above comply with the relationships between the canonical forms of (M², H) and (M, H) listed in Theorem 3.4(a-f).

Summing up, the following procedure can now be described using the designation X ∔ Y = [x_1 ... x_p y_1 ... y_q] ∈ C^{n×(p+q)} for given matrices X = [x_1 ... x_p] ∈ C^{n×p} and Y = [y_1 ... y_q] ∈ C^{n×q} having the specified columns.

Numerical Procedure 4.5 (H-polar decomposition). Let A ∈ C^{n×n} and let A^[*]A be diagonalisable. Then an H-polar decomposition of A can be constructed as follows.

1st step: First of all the (simplified) canonical form of the pair (A^[*]A, H) must be determined^7 using Procedure 4.3, which is given as example by

    R^{-1}A^[*]AR = J = J_3 ⊕ J_2 ⊕ J_1 ⊕ J_0,
    J = μ²I_{p_3} ⊕ μ̄²I_{p_3} ⊕ α²I_{p_2} ⊕ α²I_{q_2} ⊕ −β²I_{p_1} ⊕ −β²I_{p_1} ⊕ 0_{r+s} ⊕ 0_{r+t},

    R*HR = Z_J = Z_{J,3} ⊕ Z_{J,2} ⊕ Z_{J,1} ⊕ Z_{J,0},
    Z_J = [0 I_{p_3}; I_{p_3} 0] ⊕ I_{p_2} ⊕ −I_{q_2} ⊕ I_{p_1} ⊕ −I_{p_1} ⊕ I_{r+s} ⊕ −I_{r+t},

    R = R_3 ∔ R_2 ∔ R_1 ∔ R_0

with μ ∈ C\(R ∪ iR) and α, β ∈ (0, ∞). Thereby the (rectangular) blocks R_j, 0 ≤ j ≤ 3, which belong to the corresponding eigenvalues contain the (normalised) eigenvectors.

2nd step: Now an H-hermitian square root M of the matrix A^[*]A can be determined with

    S^{-1}MS = K = K_3 ⊕ K_2 ⊕ K_1 ⊕ K_0,
    K = μI_{p_3} ⊕ μ̄I_{p_3} ⊕ αI_{p_2} ⊕ αI_{q_2} ⊕ [0 iβI_{p_1}; iβI_{p_1} 0] ⊕ [0_r I_r; 0_r 0_r] ⊕ 0_s ⊕ 0_t,

    S*HS = Z_K = Z_{K,3} ⊕ Z_{K,2} ⊕ Z_{K,1} ⊕ Z_{K,0},

^7 If the matrix A^[*]A has only non-real and positive eigenvalues, then the canonical form does not necessarily have to be determined. If in this case K is a diagonal matrix with square roots of the eigenvalues, then M = RKR^{-1} and U = AM^{-1} are given directly as H-polar decomposition of A. The canonical form is required in the general case considered here, in order to decide whether an H-hermitian square root exists in the case of negative eigenvalues, and in order to find a suitable kernel transformation in the case of the eigenvalue 0.


    Z_K = [0 I_{p_3}; I_{p_3} 0] ⊕ I_{p_2} ⊕ −I_{q_2} ⊕ I_{p_1} ⊕ −I_{p_1} ⊕ [0 I_r; I_r 0] ⊕ I_s ⊕ −I_t,

    S = R_3 ∔ R_2 ∔ R_1 ∔ R'_0 (R'_0 according to the kernel transformation).

In the case of the negative eigenvalue −β² this construction is possible only if condition 1. of Theorem 4.4 is fulfilled, i.e. only if the number of negative eigenvectors is the same as the number of positive eigenvectors from R_1. It would then also be possible to use the blocks

    K_1 = [iβI_{p_1} 0; 0 −iβI_{p_1}],  Z_{K,1} = [0 I_{p_1}; I_{p_1} 0]  and  R'_1 = (1/√2) R_1 [I_{p_1} I_{p_1}; I_{p_1} −I_{p_1}],

but this is found to be less convenient when constructing the isometry in the third step. Furthermore, in the blocks of K diagonal matrices Δ_{p_3}, Δ_{p_2}, Δ_{q_2}, Δ_{p_1}, Δ_r with diagonal elements from {+1, −1} could also be used instead of the identity matrices, which would then produce another H-polar decomposition of A. Finally, the required treatment of the eigenvalue 0 in this step is based on the following procedure.
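Both square-root block choices for the negative eigenvalue −β² can be verified directly. A small sketch with the illustrative values p_1 = 1 and β = 2:

```python
import numpy as np

beta = 2.0
J1 = -beta**2 * np.eye(2)                       # block for the eigenvalue -beta^2

# First pairing: Z diagonal with the antidiagonal square root
Z1 = np.diag([1.0, -1.0])
K1 = np.array([[0, 1j * beta], [1j * beta, 0]])

# Alternative pairing: Z antidiagonal with the diagonal square root
Z2 = np.array([[0.0, 1.0], [1.0, 0.0]])
K2 = np.diag([1j * beta, -1j * beta])

for Z, K in [(Z1, K1), (Z2, K2)]:
    assert np.allclose(K @ K, J1)                # K^2 = -beta^2 I
    assert np.allclose(Z @ K, (Z @ K).conj().T)  # K is Z-hermitian
```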

Kernel transformation: Let R_0 = R_+ ∔ R_−, where R_+ contains the r + s positive and R_− contains the r + t negative vectors from ker A^[*]A, so that on the one hand

    R_+*HR_+ = I_{r+s} and R_−*HR_− = −I_{r+t}

and, because of Z_JJ = (R*HR)(R^{-1}A^[*]AR) = R*A*HAR = (AR)*H(AR), on the other hand

    (AR_+)*H(AR_+) = 0_{r+s} and (AR_−)*H(AR_−) = 0_{r+t}.

Furthermore, let U_+*(AR_+)V_+ = Σ_+ and U_−*(AR_−)V_− = Σ_− be singular value decompositions of the matrices AR_+ and AR_−. Then if the number r of non-vanishing singular values σ_i (see Procedure 4.3) in Σ_+ and Σ_− is the same, i.e.

    (AR_+)V_+ = U_+Σ_+ = [a_1^+ ... a_r^+ 0_1 ... 0_s],
    (AR_−)V_− = U_−Σ_− = [a_1^− ... a_r^− 0_1 ... 0_t],

and if the transformation defined by

    (U_+Σ_+)_r W = (U_−Σ_−)_r,  W = (Σ_+^{-1}U_+*U_−Σ_−)_r  (the index r stands for submatrices)

is unitary, W*W = WW* = I_r, then

    R_+ → R'_+ = R_+V_+(W ⊕ I_s) and R_− → R'_− = R_−V_−

can be set, where R̃'_± ∈ C^{n×r} denote the first r columns of R'_± and R̂'_± the remaining columns. The matrix given by

    R'_0 = R̃'_+ ∔ R̃'_− ∔ R̂'_+ ∔ R̂'_−

then satisfies the equations

    AR'_0 = [a_1 ... a_r a_1 ... a_r 0_1 ... 0_{s+t}], (AR'_0)*H(AR'_0) = 0_{2r+s+t}
    and (R'_0)*H(R'_0) = (I_r ⊕ −I_r) ⊕ (I_s ⊕ −I_t),


and for

    R_0' = [ (1/√2)(R̃_+ − R̃_−)  (1/√2)(R̃_+ + R̃_−)  R̂_+  R̂_− ] = [ (1/√2)[ R̃_+  R̃_− ][ I_r  I_r ; −I_r  I_r ]  R̂_+  R̂_− ]

we get

    AR_0' = √2 [ 0_1 … 0_r  a_1 … a_r  0_1 … 0_{s+t} ],   (AR_0')^*H(AR_0') = 0_{2r+s+t}

and

    (R_0')^*H(R_0') = [ 0  I_r ; I_r  0 ] ⊕ I_s ⊕ (−I_t) = Z_{K,0}.

If the number of non-vanishing singular values in Σ_+ and Σ_− differs or the transformation W is not unitary, then ker A cannot be expressed in the just constructed form

    ker A = span{ e_1 − f_1, …, e_r − f_r, e_{r+1}, …, e_{r+s}, f_{r+1}, …, f_{r+t} }

with

    R̃_+ = [ e_1 … e_r ],  R̂_+ = [ e_{r+1} … e_{r+s} ],  R̃_− = [ f_1 … f_r ],  R̂_− = [ f_{r+1} … f_{r+t} ].

In that case condition 2. of Theorem 4.4 is violated, and an H-polar decomposition of A then does not exist.
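Before turning to the third step, the special case noted in the footnote above (A^{[*]}A diagonalisable with only non-real and positive eigenvalues) can be sketched numerically. This is an illustrative Python sketch, not the author's C implementation, and the function name is hypothetical:

```python
import numpy as np

def h_polar_diagonalizable(A, H):
    """Sketch of the footnote's special case: if A^[*]A = H^{-1} A^* H A is
    diagonalisable with positive (or non-real) eigenvalues, then M = R K R^{-1}
    with K the square roots of the eigenvalues is an H-hermitian square root,
    and U = A M^{-1} completes an H-polar decomposition A = U M."""
    B = np.linalg.inv(H) @ A.conj().T @ H @ A   # A^[*] A
    w, R = np.linalg.eig(B)                     # assumed diagonalisable
    K = np.diag(np.sqrt(w.astype(complex)))     # principal square roots
    M = R @ K @ np.linalg.inv(R)                # H-hermitian square root
    U = A @ np.linalg.inv(M)                    # candidate H-isometry
    return U, M
```

No canonical form is needed here precisely because no negative eigenvalues and no eigenvalue 0 occur in this case.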

3rd step: After the second step M = SKS^{-1} is the H-hermitian factor, and in the regular case, in which no blocks of the form J_0, Z_{J,0} and K_0, Z_{K,0} exist, U = AM^{-1} = ASK^{-1}S^{-1} is the H-unitary factor of an H-polar decomposition of A. The inverse of the matrix S which thereby appears can be obtained in the numerical evaluation of the equations using S^{-1} = Z_KS^*H, which follows from S^*HS = Z_K. In the singular case let

    K^{-1} = K_3^{-1} ⊕ K_2^{-1} ⊕ K_1^{-1} ⊕ K_0^{-1} with K_0^{-1} := [ 0_r  I_r ; I_r  0_r ] ⊕ I_{s+t}

(a formal inverse, since K_0 itself is singular) and also let

    S_0 = [ S_0'  S_0''  S_0''' ] with S_0', S_0'' ∈ C^{n×r} and S_0''' ∈ C^{n×(s+t)}

be a partitioning of the part of the matrix S belonging to the eigenvalue 0. Then on the one hand AS_0 = [ 0_{n,r}  A_0  0_{n,s+t} ] with A_0 = √2 [ a_1 … a_r ], because the columns of S_0' and S_0''' constitute a basis of ker A after the kernel transformation, and on the other hand S_0K_0 = [ 0_{n,r}  S_0'  0_{n,s+t} ]. Therefore the matrices ASK^{-1} and MSK^{-1} take on the form

    ASK^{-1} = [ AS_3K_3^{-1}  AS_2K_2^{-1}  AS_1K_1^{-1}  AS_0K_0^{-1} ] with AS_0K_0^{-1} = [ A_0  0_{n,r+s+t} ],
    MSK^{-1} = [ S_3K_3K_3^{-1}  S_2K_2K_2^{-1}  S_1K_1K_1^{-1}  S_0K_0K_0^{-1} ] with S_0K_0K_0^{-1} = [ S_0'  0_{n,r+s+t} ],

and their respective first m = n − r − s − t columns

    (ASK^{-1})_m = [ AS_3K_3^{-1}  AS_2K_2^{-1}  AS_1K_1^{-1}  A_0 ],
    (MSK^{-1})_m = [ S_3  S_2  S_1  S_0' ],

are bases of im A and im M, respectively, for which

    (ASK^{-1})_m^*H(ASK^{-1})_m = (MSK^{-1})_m^*H(MSK^{-1})_m = (Z_K)_m = [ 0  I_{p_3} ; I_{p_3}  0 ] ⊕ ( I_{p_2} ⊕ (−I_{q_2}) ) ⊕ [ 0  I_{p_1} ; I_{p_1}  0 ] ⊕ 0_r.


Now both bases can be extended according to Theorem 2.7 to bases of the C^n such that

    (ASK^{-1})_n = [ AS_3K_3^{-1}  AS_2K_2^{-1}  AS_1K_1^{-1}  ( A_0  Ã_0  Â_0 ) ],
    (MSK^{-1})_n = [ S_3  S_2  S_1  ( S_0'  S_0''  S_0''' ) ],

so that (ASK^{-1})_n^*H(ASK^{-1})_n = (MSK^{-1})_n^*H(MSK^{-1})_n = Z_K is fulfilled. This is obviously trivial in the case of im M, because there (MSK^{-1})_n = S can be chosen. In the case of im A it is convenient to start with the matrix

    C_m = [ (1/√2) AS_3K_3^{-1} [ I_{p_3}  I_{p_3} ; I_{p_3}  −I_{p_3} ]  AS_2K_2^{-1}  AS_1K_1^{-1}  A_0 ]

with

    C_m^*HC_m = ( I_{p_3} ⊕ (−I_{p_3}) ) ⊕ ( I_{p_2} ⊕ (−I_{q_2}) ) ⊕ [ 0  I_{p_1} ; I_{p_1}  0 ] ⊕ 0_r,

whose columns already contain an orthonormal basis of im A. Finally

    U = (ASK^{-1})_n (MSK^{-1})_n^{-1} = (ASK^{-1})_n S^{-1} = (ASK^{-1})_n (Z_KS^*H)

gives the wanted H-isometry for which UM is an H-polar decomposition of A.

In order to be able to assess the numerical properties of the described procedure, a corresponding implementation was tested using the programming language C. For this purpose the canonical forms (K, Z_K) were given according to step 2 with

    p_3 = p_2 = q_2 = p_1 = r = s = t = 2, i.e. n = 20,

and random values for γ, μ, ν. Using randomly chosen transformations S and randomly chosen H-isometries U, test examples of the kind

    M = SKS^{-1},   H = (S^{-1})^*Z_KS^{-1},   U^*HU = H and A = UM

were constructed, always based on normally distributed random numbers from the interval [−2, 2]. Finally H-polar decompositions UM of the test matrices A were computed, whose numerical accuracy can be estimated via the residuals and the condition number, respectively,

    r_A = ‖A − UM‖,   r_M = ‖M^*H − HM‖,   r_U = ‖U^*HU − H‖,   c_U = ‖U‖ ‖U^{-1}‖,

where ‖A‖ = √(tr(A^*A)) is the Frobenius norm. The results of two statistical experiments with 50 repetitions are shown in the following table, in which μ̄ is the (empirical) mean value, σ² the (empirical) variance and min/max specify the respective minimum and maximum value. The machine accuracy and the parameter ε were thereby given as

    ε_machine = 2.2204460492503131 · 10^{−16} and ε = 10^{−6}.

For the first variant the Gauss-Jordan method of matrix inversion [ST, Kapitel 4.2] was used to compute the matrix S^{-1}; for the second variant the inversion was accomplished with the equation S^{-1} = Z_KS^*H. The eigenvalues were computed in the depicted cases using a generalisation of the Jacobi method according to Eberlein [E]. In other experiments the LR or the QR method was used. It was found that all methods produce nearly the same numerical results.
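The residuals just defined are straightforward to evaluate. The following sketch (illustrative Python with hypothetical function names, not the C test program described above) computes r_A, r_M, r_U and c_U in the Frobenius norm:

```python
import numpy as np

def fro(X):
    """Frobenius norm ||X|| = sqrt(tr(X^* X))."""
    return np.sqrt(np.trace(X.conj().T @ X).real)

def residuals(A, U, M, H):
    """Accuracy measures for a computed H-polar decomposition A = U M."""
    r_A = fro(A - U @ M)                  # reconstruction error
    r_M = fro(M.conj().T @ H - H @ M)     # deviation of M from H-hermitian
    r_U = fro(U.conj().T @ H @ U - H)     # deviation of U from H-unitary
    c_U = fro(U) * fro(np.linalg.inv(U))  # condition number of U
    return r_A, r_M, r_U, c_U
```

For an exact decomposition all three residuals vanish up to rounding, while c_U measures how far the H-isometry is from being norm-preserving in the Euclidean sense.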


    Table 4.1
    Results of two statistical experiments

    Variant (k = 50)            μ̄        σ²       min       max      10^μ̄
    Jacobi     log r_A      -10.444    1.623   -12.249    -6.954   3.597e-11
    with       log r_M       -8.696    1.681   -10.565    -5.674   2.014e-09
    inverse    log r_U       -8.303    1.919   -10.225    -5.303   4.977e-09
               log c_U        4.498    0.868     2.986     6.452   31 477
    Jacobi     log r_A       -8.890    2.560   -11.260    -5.040   1.288e-09
    without    log r_M      -12.296    0.097   -12.805   -11.216   5.058e-13
    inverse    log r_U       -8.297    1.904   -10.140    -5.287   5.047e-09
               log c_U        4.498    0.868     2.986     6.452   31 477

5. Factor analysis and procrustes problems. This chapter shows how procrustes problems in indefinite scalar product spaces are solved by application of H-polar decompositions. As motivation for this task, it will first of all be shown how to construct points from given distances. This is a generalisation of the work [YH] for complex vector spaces and indefinite scalar products.

Let F = R or F = C and let [., .] be an indefinite scalar product of the F^n with the underlying regular symmetric or hermitian matrix H ∈ F^{n×n}. Then for arbitrary vectors x, y ∈ F^n in the case F = R

    [x, y] = (1/2)([x, x] + [y, y] − [x − y, x − y])   (5.1a)

and in the case F = C

    Re[x, y] = (1/2)([x, x] + [y, y] − [x − y, x − y])
             = (1/2)([x, x] + [y, y] − [iy − ix, iy − ix]),

    Im[x, y] = (1/2)([x, x] + [y, y] − [x − iy, x − iy])
             = −(1/2)([x, x] + [y, y] − [y − ix, y − ix]),   (5.1b)

so that the scalar products of the vectors can be expressed in terms of their norm and distance squares. Now let the vectors x_1, …, x_N ∈ F^n be given and let X = [x_1 … x_N]^* ∈ F^{N×n} be a matrix whose rows are the conjugate transposed vectors. Then

    W = XHX^*

is the Gramian matrix of the x_k, and for the dimension of the set of points which they set up we have m = rank X = rank W. In particular, if N ≥ n and span{x_1, …, x_N} = F^n, then the number of positive and the number of negative eigenvalues of H and W are equal, and furthermore the eigenvalue 0 appears in σ(W) with the multiplicity N − n (Sylvester's law of inertia). Moreover, the elements w_kl = [x_l, x_k] of the matrix W according to (5.1) can be expressed in the form

    w_kl = (1/2)(σ_k² + σ_l² − δ_kl²)   if F = R, or   (5.2a)

    w_kl = (1/2)(σ_k² + σ_l² − δ_kl²) + (i/2)(σ_k² + σ_l² − ε_kl²)   if F = C,   (5.2b)


where

    σ_k² = [x_k, x_k],   δ_kl² = [x_l − x_k, x_l − x_k],   ε_kl² = [x_l − ix_k, x_l − ix_k]   (5.3)

with σ_k², δ_kl², ε_kl² ∈ R and δ_kl² = δ_lk², δ_kk² = 0, respectively ε_kl² + ε_lk² = 2(σ_k² + σ_l²), for 1 ≤ k, l ≤ N.

Conversely, let the norm and distance squares be given such that (5.3) is satisfied, and let the elements of a matrix W be defined by (5.2). Then this matrix is symmetric or hermitian and can therefore be written in the form

    W = RΛR^*.

Thereby Λ is a diagonal matrix of the real eigenvalues λ_1, …, λ_N of W, and R = [r_1 … r_N] is a matrix whose columns form a basis of the F^N consisting of orthonormalised eigenvectors. Now if p is the number of positive and n − p is the number of negative eigenvalues, and if it is assumed that

    λ_1, …, λ_p > 0,   λ_{p+1}, …, λ_n < 0 and λ_{n+1} = … = λ_N = 0,

then the matrices defined by

    Λ_1 = diag(λ_1, …, λ_p, λ_{p+1}, …, λ_n) and R_1 = [r_1 … r_n]

satisfy W = R_1Λ_1R_1^* too. Consequently, if we make

    D = diag(√λ_1, …, √λ_p, √(−λ_{p+1}), …, √(−λ_n)) and H = diag(+1, …, +1, −1, …, −1)

with p entries +1 and n − p entries −1, then the matrix

    X = R_1D ∈ F^{N×n}

fulfils on the one hand rank X = n and on the other hand XHX^* = R_1DHDR_1^* = R_1Λ_1R_1^* = W. Therefore the conjugate transposed rows x_1, …, x_N ∈ F^n of the matrix X constitute a generating system of the F^n, and for the indefinite scalar product defined by [x, y] = (Hx, y) we have w_kl = [x_l, x_k]. This means that also

    [x_k, x_k] = w_kk = σ_k²,
    [x_l − x_k, x_l − x_k] = w_kk + w_ll − w_kl − w_lk = δ_kl²,
    [x_l − ix_k, x_l − ix_k] = w_kk + w_ll + iw_kl − iw_lk = ε_kl²   (if F = C),

so that the constructed points correspond to the given norm and distance squares. We thus have proved the following theorem.

Theorem 5.1 (Construction of vectors from norm and distance squares). Let F = R or F = C and let σ_k², δ_kl² be real numbers such that δ_kl² = δ_lk² and δ_kk² = 0 for all k, l from {1, …, N}. Furthermore, for the case F = C let ε_kl² be real numbers such that ε_kl² + ε_lk² = 2(σ_k² + σ_l²) for all k, l from {1, …, N}. Then the following statements are equivalent:

1. There exist vectors x_1, …, x_N ∈ F^n constituting a generating system for the F^n, for which [x_k, x_k] = σ_k² as well as [x_l − x_k, x_l − x_k] = δ_kl², and in the case F = C also [x_l − ix_k, x_l − ix_k] = ε_kl², is satisfied. Thereby [., .] is an indefinite scalar product of the F^n with underlying regular symmetric or hermitian matrix H ∈ F^{n×n} which has p positive eigenvalues.


2. The symmetric or hermitian matrix W ∈ F^{N×N} whose elements w_kl are defined by (5.2) has p positive and n − p negative eigenvalues, and the eigenvalue 0 appears with multiplicity N − n.
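The constructive direction of the proof can be sketched numerically. The following Python sketch (hypothetical function name) computes X and H from a hermitian Gram matrix W via the eigendecomposition used in the proof, so that XHX^* = W:

```python
import numpy as np

def points_from_gram(W, tol=1e-10):
    """From a hermitian Gram matrix W, construct X = R_1 D (rows x_k^*) and
    H = diag(+1, ..., -1, ...) with X H X^* = W, as in the proof of Theorem 5.1."""
    lam, R = np.linalg.eigh(W)               # real eigenvalues, orthonormal columns
    keep = np.abs(lam) > tol                 # drop eigenvalue 0 (multiplicity N - n)
    lam1, R1 = lam[keep], R[:, keep]
    order = np.argsort(-np.sign(lam1))       # positive eigenvalues first
    lam1, R1 = lam1[order], R1[:, order]
    H = np.diag(np.sign(lam1))               # p entries +1, n - p entries -1
    X = R1 @ np.diag(np.sqrt(np.abs(lam1)))  # X = R_1 D
    return X, H
```

The rows of the returned X are the conjugate transposed vectors x_k^*, so that all prescribed norm and distance squares are realised in the indefinite scalar product defined by H.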

A real vector space provided with an indefinite scalar product defined by a matrix

    H_q = diag(+1, …, +1, −1, …, −1)

with n − q entries +1 and q entries −1 is also called a pseudo-Euclidean space; a corresponding complex vector space is called a pseudo-unitary space. In both cases q is termed the index of inertia [GR, Chapter IX, §4]. In the case of q = 0 we have a Euclidean or unitary space, for which we immediately get the following corollary.

Corollary 5.2. The vectors which exist according to Theorem 5.1 are vectors of a Euclidean or unitary space if and only if the matrix W is positive semidefinite.

In the case N = 2 of a real space, the vectors

    x_1 = OP_1 and x_2 = OP_2

describe a triangle with the corner points O, P_1, P_2. Furthermore

    det W = (1/2)(σ_1²σ_2² + σ_1²δ_12² + σ_2²δ_12²) − (1/4)(σ_1⁴ + σ_2⁴ + δ_12⁴)
          = (1/4) [δ_12² − (σ_1 − σ_2)²] [(σ_1 + σ_2)² − δ_12²],

and for σ_1, σ_2, δ_12 ≥ 0 this determinant is non-negative if and only if

    |σ_1 − σ_2| ≤ δ_12 and δ_12 ≤ σ_1 + σ_2

and the relations obtained by cyclic exchange of the variables are fulfilled. But this is just the triangle inequality, so that Corollary 5.2 contains a generalisation of this

essential property of Euclidean geometry.

Remark 5.3 (Factor analysis and multidimensional scaling). If the coordinates origin is designated x_0 and one writes δ_k0² = δ_0k² instead of σ_k², then Theorem 5.1 shows how N + 1 points (objects) must be arranged in a coordinate system so that they take up given distances. This constitutes the basis for an entire discipline in psychology which is there called factor analysis [H] or multidimensional scaling [D]. Essentially this is a procedure for geometric modelling of cognitive processes and analysing the resulting constellations of objects with regard to the geometric invariants (dimension, signature, lengths, volumes, angles). It would thereby also be possible in principle to interpret physical invariants, because the matrix

    T = X^*X = Σ_{k=1}^N x_k x_k^*,   T^* = T,   with elements T_{αβ} for 1 ≤ α, β ≤ n,

can be interpreted as (contravariant) tensor of inertia in the sense of Hermann Weyl, who discovered in this connection an interesting confusion of an antisymmetric tensor with a vector product, which is possible only in R³ [WEY, §6]. Unfortunately this error has so far not been corrected in the textbooks of classical physics, so that the dynamics of rotational motion could partly be presented in a simplified form.

At any rate, for the points X = R_1D constructed in Theorem 5.1, T = DR_1^*R_1D = D² is a diagonal matrix, so that the x_k are oriented along its inertial axes and the absolute


values of the eigenvalues give the associated (contravariant) moments of inertia. It must also be mentioned here that the scalar products of N points x_1, …, x_N ∈ F^n whose centroid lies at the coordinates origin, i.e.

    Σ_{k=1}^N x_k = 0,

are determined by the equations

    [x_k, x_l] = (1/2) [ (1/N) Σ_{j=1}^N ‖x_j − x_k‖_H² + (1/N) Σ_{i=1}^N ‖x_l − x_i‖_H²
                         − ‖x_l − x_k‖_H² − (1/N²) Σ_{i=1}^N Σ_{j=1}^N ‖x_j − x_i‖_H² ]   if F = R,

    [x_k, x_l] = (1/2) [ (1/N) Σ_{j=1}^N (‖x_j − x_k‖_H² + i‖x_j − ix_k‖_H²)
                         + (1/N) Σ_{i=1}^N (‖x_l − x_i‖_H² + i‖x_l − ix_i‖_H²)
                         − (‖x_l − x_k‖_H² + i‖x_l − ix_k‖_H²)
                         − (1/N²) Σ_{i=1}^N Σ_{j=1}^N (‖x_j − x_i‖_H² + i‖x_j − ix_i‖_H²) ]   if F = C,

whereby for brevity ‖x‖_H² = [x, x] has been set [T]. From the given distance squares δ_kl² = ‖x_l − x_k‖_H² and ε_kl² = ‖x_l − ix_k‖_H² it is therefore possible to calculate the elements w_kl = [x_l, x_k] of a matrix W whose row and column sums vanish. Using the construction of Theorem 5.1 this produces points whose centroid lies at the origin, so that the resulting set of points is oriented along its central main inertial axes. However, in the complex case this is possible only when the ε_kl² satisfy a rather complicated condition arising from the centroid situation.
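In the real case, the centroid formula above is precisely the double-centering construction of classical multidimensional scaling. A small Python sketch (hypothetical function name), with D2[k, l] = ‖x_l − x_k‖_H²:

```python
import numpy as np

def gram_from_distances(D2):
    """Real case of the centroid formula: recover W with w_kl = [x_k, x_l]
    from the squared H-distances of points whose centroid is the origin."""
    m = D2.mean(axis=0)   # (1/N) sum_j ||x_j - x_k||_H^2 for each k
    t = D2.mean()         # (1/N^2) double sum over all pairs
    # w_kl = (1/2)(m_k + m_l - d_kl^2 - t); rows and columns of W sum to zero
    return 0.5 * (m[:, None] + m[None, :] - D2 - t)
```

Note that in the indefinite setting the squared distances may be negative; the formula applies unchanged.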

After this brief excursion, the main result of this chapter will now be derived. For this purpose let x_1, …, x_N still be the vectors, constructed according to Theorem 5.1, of a F^n provided with an indefinite scalar product [., .] = (H., .). For every H-isometry U ∈ F^{n×n} it then follows that

    [Ux_l, Ux_k] = [x_l, x_k] = w_kl for U^*HU = H,

which can also be expressed in matrix equation form

    (XV)H(XV)^* = XHX^* = W for VHV^* = H

by making V = U^*. Thus the conjugate transposed rows x_k' = Ux_k contained in the matrix X' = XV take on the specified distances, too.

Now let two N-tuples of vectors (x_1, …, x_N) and (y_1, …, y_N) of the F^n be given, which one can consider as having arisen, for example, by measuring the distances of a dynamical system of objects at two different times. Then, on comparing the two constellations, the question arises, what part of the observed differences is due to the different position in space, and what part is due to actual differences in the inner


structure of the constellations. Expressed mathematically, the task is to determine an H-isometry U ∈ F^{n×n} which solves the optimising problem

    f(U) = Σ_{k=1}^N [Ux_k − y_k, Ux_k − y_k] → min,   h(U) = U^*HU − H = 0.   (5.4a)

The sum of distance squares arising therein can be expressed in the form of a trace, so that an alternative expression with

    f(V) = tr[(XV − Y)^*(XV − Y)H] → min,   h(V) = VHV^* − H = 0   (5.4b)

is given, whereby as above X = [x_1 … x_N]^*, Y = [y_1 … y_N]^* and V = U^* were set. Within the scope of Euclidean vector spaces a solution of this problem was found in the paper [S], where it was called the orthogonal procrustes problem. In the present context of indefinite scalar products it is furthermore called the H-orthogonal or H-unitary procrustes problem. By introducing a matrix of the (unknown) Lagrange multipliers L ∈ F^{n×n}, the side condition can then be stated in the form

    h(V) = tr[L(VHV^* − H)]   (5.5)

and the necessary first order condition for solving the problem is

    ∂(f + h)/∂V = 0.

Differentiation of the trace [DP] gives

    ∂f/∂V = 2X^*XVH − 2X^*YH and ∂h/∂V = (L + L^*)VH,

so that V must satisfy the equation

    X^*XVH + ΛVH = X^*YH for Λ = (L + L^*)/2 = Λ^*

and U must satisfy the equation

    HUX^*X + HUΛ = HY^*X or UX^*XH + UΛH = Y^*XH.   (5.6)

Now defining

    M = (X^*X + Λ)H and A = Y^*XH,

it follows that M^*H = H(X^*X + Λ)H = HM, and the necessary condition takes the form

    A = UM with A = Y^*XH and U^*HU = H, M^*H = HM.   (5.7)

Thus the solution of the problem can be determined by an H-polar decomposition of the matrix A. Strictly speaking, the trace should not be differentiated in the complex case, because the complex derivatives do not exist. But there the necessity of the last equation can be derived too by determining the real derivatives.
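In the Euclidean special case H = I, the matrix A = Y^*X is decomposed by an ordinary polar decomposition, which can be obtained from a singular value decomposition and recovers the classical solution of [S]. A Python sketch of this special case only (the general indefinite case requires the H-polar machinery developed above; the function name is hypothetical):

```python
import numpy as np

def procrustes_euclidean(X, Y):
    """H = I: solve min ||X V - Y||_F over unitary V via the polar
    decomposition A = U M of A = Y^* X; the minimiser is V = U^*."""
    A = Y.conj().T @ X
    P, s, Qh = np.linalg.svd(A)
    U = P @ Qh                            # unitary polar factor of A
    M = Qh.conj().T @ np.diag(s) @ Qh     # positive semidefinite factor
    return U, M
```

Here M is automatically positive semidefinite, which is why no eigenvalue case distinctions arise; in the indefinite case the square roots must be chosen as discussed below.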

If A = A_1 + iA_2 ∈ C^{m×n} is a complex matrix, then its real and imaginary parts are designated below with A_1 and A_2, respectively. The same applies to complex scalars λ = λ_1 + iλ_2 ∈ C. Furthermore, let the real representations of a matrix of the first and second kind be defined by

    A_φ = [ A_1  −κA_2 ; κA_2  A_1 ] ∈ R^{2m×2n} and A_ψ = [ A_2  κA_1 ; −κA_1  A_2 ] ∈ R^{2m×2n},

whereby κ ∈ {+1, −1} is called the characteristic of the depiction. For the operators φ and ψ introduced this way, the following calculation rules apply.

Lemma 5.4 (Real depiction of complex matrix equations). Let X, Y ∈ C^{m×n}, Z ∈ C^{n×k}, A, B ∈ C^{n×n} and λ ∈ C. Then

    1. X_φ = (iX)_ψ, X_ψ = −(iX)_φ,
    2. (λX)_φ = λ_1X_φ − λ_2X_ψ, (λX)_ψ = λ_1X_ψ + λ_2X_φ,
    3. (X + Y)_φ = X_φ + Y_φ, (X + Y)_ψ = X_ψ + Y_ψ,
    4. (XZ)_φ = X_φZ_φ = −X_ψZ_ψ, (XZ)_ψ = X_φZ_ψ = X_ψZ_φ,
    5. (X^*)_φ = (X_φ)^T, (X^*)_ψ = −(X_ψ)^T,
    6. A_φ = (A_φ)^T, A_ψ = −(A_ψ)^T, if A^* = A,
    7. B_φ = −(B_φ)^T, B_ψ = (B_ψ)^T, if B^* = −B,
    8. (A^{−1})_φ = (A_φ)^{−1}, (A^{−1})_ψ = −(A_ψ)^{−1}, if det(A) ≠ 0,
    9. 2 tr(A) = tr(A_φ) + i tr(A_ψ),
    10. |det(A)|² = det(A_φ) = det(A_ψ).

Proof. The proofs are obtained by simple verification, demonstrated by some

examples. Proof of 2 and 3:

    (λX)_φ = [(λ_1X_1 − λ_2X_2) + i(λ_1X_2 + λ_2X_1)]_φ
           = [ λ_1X_1 − λ_2X_2   −κ(λ_1X_2 + λ_2X_1) ; κ(λ_1X_2 + λ_2X_1)   λ_1X_1 − λ_2X_2 ]
           = λ_1 [ X_1  −κX_2 ; κX_2  X_1 ] − λ_2 [ X_2  κX_1 ; −κX_1  X_2 ]
           = λ_1X_φ − λ_2X_ψ, and analogously (λX)_ψ = λ_1X_ψ + λ_2X_φ,

    (X + Y)_φ = [(X_1 + Y_1) + i(X_2 + Y_2)]_φ
              = [ X_1 + Y_1   −κ(X_2 + Y_2) ; κ(X_2 + Y_2)   X_1 + Y_1 ]
              = [ X_1  −κX_2 ; κX_2  X_1 ] + [ Y_1  −κY_2 ; κY_2  Y_1 ]
              = X_φ + Y_φ, and analogously (X + Y)_ψ = X_ψ + Y_ψ.

Proof of 8: Let B = A^{−1}. Then I + i0 = AA^{−1} = AB = (A_1 + iA_2)(B_1 + iB_2) = (A_1B_1 − A_2B_2) + i(A_1B_2 + A_2B_1), and therefore

    A_φ(A^{−1})_φ = A_φB_φ = [ A_1  −κA_2 ; κA_2  A_1 ][ B_1  −κB_2 ; κB_2  B_1 ]
                  = [ A_1B_1 − κ²A_2B_2   −κ(A_1B_2 + A_2B_1) ; κ(A_1B_2 + A_2B_1)   A_1B_1 − κ²A_2B_2 ] = [ I  0 ; 0  I ],

    A_ψ(A^{−1})_ψ = A_ψB_ψ = [ A_2  κA_1 ; −κA_1  A_2 ][ B_2  κB_1 ; −κB_1  B_2 ]
                  = [ A_2B_2 − κ²A_1B_1   κ(A_1B_2 + A_2B_1) ; −κ(A_1B_2 + A_2B_1)   A_2B_2 − κ²A_1B_1 ] = [ −I  0 ; 0  −I ].

Proof of 10 (shown for κ = +1): On the one hand |det(A)|² = det(A) det(Ā) = det(A_1 + iA_2) det(A_1 − iA_2), and on the other hand

    det(A_φ) = det [ A_1  −A_2 ; A_2  A_1 ] = det [ A_1 + iA_2  −A_2 ; A_2 − iA_1  A_1 ] = det [ A_1 + iA_2  −A_2 ; 0  A_1 − iA_2 ]
             = det(A_1 + iA_2) det(A_1 − iA_2),

    det(A_ψ) = det [ A_2  A_1 ; −A_1  A_2 ] = det [ A_2 + iA_1  A_1 ; −A_1 + iA_2  A_2 ] = det [ A_2 + iA_1  A_1 ; 0  A_2 − iA_1 ]
             = det(A_2 + iA_1) det(A_2 − iA_1) = det(i(A_1 − iA_2)) det(−i(A_1 + iA_2)) = det(A_1 + iA_2) det(A_1 − iA_2).
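The calculation rules of Lemma 5.4 are easy to check numerically. The following sketch implements the two representations with the sign conventions used in the statement above (the characteristic κ is passed as a parameter; the function names are illustrative) and is verified against rules 4, 9 and 10:

```python
import numpy as np

def phi(A, kappa=1):
    """Real representation of the first kind, characteristic kappa."""
    A1, A2 = A.real, A.imag
    return np.block([[A1, -kappa * A2], [kappa * A2, A1]])

def psi(A, kappa=1):
    """Real representation of the second kind, characteristic kappa."""
    A1, A2 = A.real, A.imag
    return np.block([[A2, kappa * A1], [-kappa * A1, A2]])
```

Such checks on random matrices are a cheap safeguard against sign errors when translating complex matrix equations into real ones.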

If now the abbreviation Z = XV − Y is used, the equations (5.4b) and (5.5) in the case F = C can be brought with Lemma 5.4 into the equivalent real representation

    2f(V) = 2 tr(Z^*ZH) = tr[(Z^*ZH)_φ] + i tr[(Z^*ZH)_ψ]
          = tr[(Z^*)_φ(ZH)_φ] + i tr[(Z^*)_φ(ZH)_ψ]
          = tr[(Z_φ)^T Z_φ H_φ] + i tr[(Z_φ)^T Z_φ H_ψ]
          = tr[(X_φV_φ − Y_φ)^T (X_φV_φ − Y_φ) H_φ]
          = f̃(V_φ)

and

    2h(V) = tr[L_φ(VHV^* − H)_φ] + i tr[L_φ(VHV^* − H)_ψ]
          = tr[L_φ(V_φH_φ(V_φ)^T − H_φ)] + i tr[L_φ(V_φH_ψ(V_φ)^T − H_ψ)]
          = h̃_1(V_φ) + i h̃_2(V_φ) = h̃(V_φ).

Thereby the antisymmetry of H_ψ (rule 6 of Lemma 5.4, since H^* = H) was taken into account in the first transformation, from which the vanishing of the imaginary part follows. The necessary first order condition for an optimum is

    ∂(f̃ + h̃_1)/∂V_φ = 0 and ∂h̃_2/∂V_φ = 0

and the differentiation of the trace gives

    ∂f̃/∂V_φ = 2(X_φ)^T X_φ V_φ H_φ − 2(X_φ)^T Y_φ H_φ = 2(X^*XVH − X^*YH)_φ,
    ∂h̃_1/∂V_φ = [L_φ + (L_φ)^T] V_φ H_φ = [(L + L^*)VH]_φ,
    ∂h̃_2/∂V_φ = [L_φ − (L_φ)^T] V_φ H_ψ = [(L − L^*)VH]_ψ,

so that the equations, now written again in complex form,

    X^*XVH − X^*YH + ((L + L^*)/2)VH = 0 and ((L − L^*)/2)VH = 0

must be satisfied. The second equation demands that the antihermitian part of the Lagrange multipliers vanishes, and the first can be expressed with the abbreviation Λ for the hermitian part and with V = U^* in the form

    HUX^*X + HUΛ = HY^*X.

This is just (5.6), so that in the complex case too the necessary condition (5.7) must be fulfilled.

Lastly, to also get a sufficient criterion for the minimum of the function f, let A = UM be an H-polar decomposition of the matrix A = Y^*XH and let

    (R^{-1}A^{[*]}AR, R^*HR) = (J, Z_J) with J = ⊕_{i=1}^k J_{p_i}(λ_i),
    (S^{-1}MS, S^*HS) = (K, Z_K) with K = ⊕_{j=1}^m J_{p_j'}(μ_j)

be the canonical forms of the pairs (A^{[*]}A, H) and (M, H), respectively. Returning therewith to the initial equation (5.4b), we find that

    f(V) = tr[(XV − Y)^*(XV − Y)H]
         = tr(V^*X^*XVH − V^*X^*YH − Y^*XVH + Y^*YH)
         = tr(X^*XH) + tr(Y^*YH) − 2 Re[tr[(Y^*XH)(H^{-1}U^*H)]]
         = α − 2 Re[tr(AU^{-1})] = α − 2 Re[tr(UMU^{-1})] = α − 2 Re[tr(SKS^{-1})]
         = α − 2 Re[tr(K)] for α = tr(X^*XH) + tr(Y^*YH).

Thus the value of f(V) is minimised when in the calculation of the H-polar decomposition of A the square roots μ of the positive real eigenvalues λ of A^{[*]}A are chosen positive (Theorem 3.4, case b),

    μ = +√λ for λ > 0,   (5.8a)

and when the square roots μ, μ̄ of the non-real eigenvalues λ, λ̄ of A^{[*]}A are chosen such that their real parts are positive (Theorem 3.4, case a),

    μ = +√ρ (cos(φ/2) + i sin(φ/2)) with ρ = |λ|, φ = arg(λ) for λ ∈ C\R.   (5.8b)

The roots of the negative real eigenvalues and of the eigenvalue 0, which are fixed by specification anyway, make no contribution to the optimised value. Thus altogether the following theorem applies.

Theorem 5.5 (Existence of solutions of the H-isometric procrustes problem). A solution of the H-orthogonal or H-unitary procrustes problem (5.4) exists if