Gelfand - Lectures on Linear Algebra


LECTURES ON LINEAR ALGEBRA

I. M. GEL'FAND
Academy of Sciences, Moscow, U.S.S.R.

Translated from the Revised Second Russian Edition
by A. SHENITZER
Adelphi College, Garden City, New York

INTERSCIENCE PUBLISHERS, INC., NEW YORK
INTERSCIENCE PUBLISHERS LTD., LONDON


COPYRIGHT © 1961 BY INTERSCIENCE PUBLISHERS, INC.

ALL RIGHTS RESERVED

LIBRARY OF CONGRESS CATALOG CARD NUMBER 61-8630

SECOND PRINTING 1963

PRINTED IN THE UNITED STATES OF AMERICA


PREFACE TO THE SECOND EDITION

The second edition differs from the first in two ways. Some of the material was substantially revised and new material was added. The major additions include two appendices at the end of the book dealing with computational methods in linear algebra and the theory of perturbations, a section on extremal properties of eigenvalues, and a section on polynomial matrices (§§ 17 and 21). As for major revisions, the chapter dealing with the Jordan canonical form of a linear transformation was entirely rewritten and Chapter IV was reworked. Minor changes and additions were also made. The new text was written in collaboration with Z. Ja. Shapiro.

I wish to thank A. G. Kurosh for making available his lecture notes on tensor algebra. I am grateful to S. V. Fomin for a number of valuable comments. Finally, my thanks go to M. L. Tzeitlin for assistance in the preparation of the manuscript and for a number of suggestions.

September 1950    I. GEL'FAND

Translator's note: Professor Gel'fand asked that the two appendices be left out of the English translation.


PREFACE TO THE FIRST EDITION

This book is based on a course in linear algebra taught by the author in the department of mechanics and mathematics of the Moscow State University and at the Byelorussian State University.

S. V. Fomin participated to a considerable extent in the writing of this book. Without his help this book could not have been written.

The author wishes to thank Assistant Professor A. E. Turetski of the Byelorussian State University, who made available to him notes of the lectures given by the author in 1945, and D. A. Raikov, who carefully read the manuscript and made a number of valuable comments.

The material in fine print is not utilized in the main part of the text and may be omitted in a first perfunctory reading.

January 1948 I. GEL'FAND


TABLE OF CONTENTS

Page

Preface to the second edition
Preface to the first edition  vii

I. n-Dimensional Spaces. Linear and Bilinear Forms  1
 § 1. n-Dimensional vector spaces  1
 § 2. Euclidean space  14
 § 3. Orthogonal basis. Isomorphism of Euclidean spaces  21
 § 4. Bilinear and quadratic forms  34
 § 5. Reduction of a quadratic form to a sum of squares  42
 § 6. Reduction of a quadratic form by means of a triangular transformation  46
 § 7. The law of inertia  55
 § 8. Complex n-dimensional space  60

II. Linear Transformations  70
 § 9. Linear transformations. Operations on linear transformations  70
 § 10. Invariant subspaces. Eigenvalues and eigenvectors of a linear transformation  81
 § 11. The adjoint of a linear transformation  90
 § 12. Self-adjoint (Hermitian) transformations. Simultaneous reduction of a pair of quadratic forms to a sum of squares  97
 § 13. Unitary transformations  103
 § 14. Commutative linear transformations. Normal transformations  107
 § 15. Decomposition of a linear transformation into a product of a unitary and self-adjoint transformation
 § 16. Linear transformations on a real Euclidean space  114
 § 17. Extremal properties of eigenvalues  126

III. The Canonical Form of an Arbitrary Linear Transformation  132
 § 18. The canonical form of a linear transformation  132
 § 19. Reduction to canonical form  137
 § 20. Elementary divisors  142
 § 21. Polynomial matrices  149

IV. Introduction to Tensors  164
 § 22. The dual space  164
 § 23. Tensors  171


CHAPTER I

n-Dimensional Spaces. Linear and Bilinear Forms

§ 1. n-Dimensional vector spaces

1. Definition of a vector space. We frequently come across objects which are added and multiplied by numbers. Thus

1. In geometry objects of this nature are vectors in three-dimensional space, i.e., directed segments. Two directed segments are said to define the same vector if and only if it is possible to translate one of them into the other. It is therefore convenient to measure off all such directed segments beginning with one common point which we shall call the origin. As is well known the sum of two vectors x and y is, by definition, the diagonal of the parallelogram with sides x and y. The definition of multiplication by (real) numbers is equally well known.

2. In algebra we come across systems of n numbers x = (ξ₁, ξ₂, ⋯, ξₙ) (e.g., rows of a matrix, the set of coefficients of a linear form, etc.). Addition and multiplication of n-tuples by numbers are usually defined as follows: by the sum of the n-tuples x = (ξ₁, ξ₂, ⋯, ξₙ) and y = (η₁, η₂, ⋯, ηₙ) we mean the n-tuple x + y = (ξ₁ + η₁, ξ₂ + η₂, ⋯, ξₙ + ηₙ). By the product of the number λ and the n-tuple x = (ξ₁, ξ₂, ⋯, ξₙ) we mean the n-tuple λx = (λξ₁, λξ₂, ⋯, λξₙ).

3. In analysis we define the operations of addition of functions and multiplication of functions by numbers. In the sequel we shall consider all continuous functions defined on some interval [a, b].

In the examples just given the operations of addition and multiplication by numbers are applied to entirely dissimilar objects. To investigate all examples of this nature from a unified point of view we introduce the concept of a vector space.

DEFINITION 1. A set R of elements x, y, z, ⋯ is said to be a vector space over a field F if:


With every two elements x and y in R there is associated an element z in R which is called the sum of the elements x and y. The sum of the elements x and y is denoted by x + y.

With every element x in R and every number λ belonging to the field F there is associated an element λx in R; λx is referred to as the product of x by λ.

The above operations must satisfy the following requirements (axioms):

I. 1. x + y = y + x (commutativity)
   2. (x + y) + z = x + (y + z) (associativity)
   3. R contains an element 0 such that x + 0 = x for all x in R. 0 is referred to as the zero element.
   4. For every x in R there exists (in R) an element denoted by −x with the property x + (−x) = 0.

II. 1. 1 · x = x
    2. α(βx) = (αβ)x.

III. 1. (α + β)x = αx + βx
     2. α(x + y) = αx + αy.

It is not an oversight on our part that we have not specified how elements of R are to be added and multiplied by numbers. Any definitions of these operations are acceptable as long as the axioms listed above are satisfied. Whenever this is the case we are dealing with an instance of a vector space.

We leave it to the reader to verify that the examples 1, 2, 3 above are indeed examples of vector spaces.

Let us give a few more examples of vector spaces.

4. The set of all polynomials of degree not exceeding some natural number n constitutes a vector space if addition of polynomials and multiplication of polynomials by numbers are defined in the usual manner.

We observe that under the usual operations of addition and multiplication by numbers the set of polynomials of degree n does not form a vector space since the sum of two polynomials of degree n may turn out to be a polynomial of degree smaller than n. Thus

(tⁿ + t) + (−tⁿ + t) = 2t.

5. We take as the elements of R matrices of order n. As the sum


of the matrices ||aᵢₖ|| and ||bᵢₖ|| we take the matrix ||aᵢₖ + bᵢₖ||. As the product of the number λ and the matrix ||aᵢₖ|| we take the matrix ||λaᵢₖ||. It is easy to see that the above set R is now a vector space.

It is natural to call the elements of a vector space vectors. The fact that this term was used in Example 1 should not confuse the reader. The geometric considerations associated with this word will help us clarify and even predict a number of results.

If the numbers λ, μ, ⋯ involved in the definition of a vector space are real, then the space is referred to as a real vector space. If the numbers λ, μ, ⋯ are taken from the field of complex numbers, then the space is referred to as a complex vector space.

More generally it may be assumed that λ, μ, ⋯ are elements of an arbitrary field K. Then R is called a vector space over the field K. Many concepts and theorems dealt with in the sequel and, in particular, the contents of this section apply to vector spaces over arbitrary fields. However, in chapter I we shall ordinarily assume that R is a real vector space.

2. The dimensionality of a vector space. We now define the notions of linear dependence and independence of vectors which are of fundamental importance in all that follows.

DEFINITION 2. Let R be a vector space. We shall say that the vectors x, y, z, ⋯, v are linearly dependent if there exist numbers α, β, γ, ⋯, θ, not all equal to zero, such that

(1) αx + βy + γz + ⋯ + θv = 0.

Vectors which are not linearly dependent are said to be linearly independent. In other words, a set of vectors x, y, z, ⋯, v is said to be linearly independent if the equality

αx + βy + γz + ⋯ + θv = 0

implies that α = β = γ = ⋯ = θ = 0.

Let the vectors x, y, z, ⋯, v be linearly dependent, i.e., let x, y, z, ⋯, v be connected by a relation of the form (1) with at least one of the coefficients, α say, unequal to zero. Then

αx = −βy − γz − ⋯ − θv.

Dividing by α and putting


−(β/α) = λ,  −(γ/α) = μ,  ⋯,  −(θ/α) = ζ,

we have

(2) x = λy + μz + ⋯ + ζv.

Whenever a vector x is expressible through vectors y, z, ⋯, v in the form (2) we say that x is a linear combination of the vectors y, z, ⋯, v.

Thus, if the vectors x, y, z, ⋯, v are linearly dependent then at least one of them is a linear combination of the others. We leave it to the reader to prove that the converse is also true, i.e., that if one of a set of vectors is a linear combination of the remaining vectors then the vectors of the set are linearly dependent.
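For numerical vectors this criterion is easy to test with a computer: a set of vectors is linearly dependent exactly when the rank of the matrix formed from them is smaller than the number of vectors. The following short Python sketch (the sample vectors are chosen only for illustration) carries out that check with numpy.

    import numpy as np

    # Rows of the matrix are the vectors x, y, z.
    vectors = np.array([[1.0, 0.0, 2.0],
                        [0.0, 1.0, 1.0],
                        [2.0, 1.0, 5.0]])   # third row = 2*first + second

    # The vectors are dependent exactly when the rank is smaller
    # than the number of vectors.
    rank = np.linalg.matrix_rank(vectors)
    print("rank =", rank,
          "-> dependent" if rank < len(vectors) else "-> independent")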

EXERCISES. 1. Show that if one of the vectors x, y, z, ⋯, v is the zero vector then these vectors are linearly dependent.

2. Show that if the vectors x, y, z, ⋯ are linearly dependent and u, v, ⋯ are arbitrary vectors then the vectors x, y, z, ⋯, u, v, ⋯ are linearly dependent.

We now introduce the concept of dimension of a vector space.

Any two vectors on a line are proportional, i.e., linearly dependent. In the plane we can find two linearly independent vectors but any three vectors are linearly dependent. If R is the set of vectors in three-dimensional space, then it is possible to find three linearly independent vectors but any four vectors are linearly dependent.

As we see the maximal number of linearly independent vectors on a straight line, in the plane, and in three-dimensional space coincides with what is called in geometry the dimensionality of the line, plane, and space, respectively. It is therefore natural to make the following general definition.

DEFINITION 3. A vector space R is said to be n-dimensional if it contains n linearly independent vectors and if any n + 1 vectors in R are linearly dependent.

If R is a vector space which contains an arbitrarily large number of linearly independent vectors, then R is said to be infinite-dimensional.

Infinite-dimensional spaces will not be studied in this book. We shall now compute the dimensionality of each of the vector spaces considered in the Examples 1, 2, 3, 4, 5.


1. As we have already indicated, the space R of Example 1 contains three linearly independent vectors and any four vectors in it are linearly dependent. Consequently R is three-dimensional.

2. Let R denote the space whose elements are n-tuples of real numbers. This space contains n linearly independent vectors. For instance, the vectors

x₁ = (1, 0, ⋯, 0),
x₂ = (0, 1, ⋯, 0),
⋯⋯⋯⋯⋯
xₙ = (0, 0, ⋯, 1)

are easily seen to be linearly independent. On the other hand, any m vectors in R, m > n, are linearly dependent. Indeed, let

y₁ = (η₁₁, η₁₂, ⋯, η₁ₙ),
y₂ = (η₂₁, η₂₂, ⋯, η₂ₙ),
⋯⋯⋯⋯⋯
yₘ = (ηₘ₁, ηₘ₂, ⋯, ηₘₙ)

be m vectors and let m > n. The number of linearly independent rows in the matrix

η₁₁  η₁₂  ⋯  η₁ₙ
η₂₁  η₂₂  ⋯  η₂ₙ
⋯⋯⋯⋯⋯
ηₘ₁  ηₘ₂  ⋯  ηₘₙ

cannot exceed n (the number of columns). Since m > n, our m rows are linearly dependent. But this implies the linear dependence of the vectors y₁, y₂, ⋯, yₘ. Thus the dimension of R is n.

3. Let R be the space of continuous functions. Let N be any natural number. Then the functions f₁(t) = 1, f₂(t) = t, ⋯, f_N(t) = t^(N−1) form a set of linearly independent vectors (the proof of this statement is left to the reader). It follows that our space contains an arbitrarily large number of linearly independent functions or, briefly, R is infinite-dimensional.

4. Let R be the space of polynomials of degree not exceeding n − 1. In this space the n polynomials 1, t, ⋯, t^(n−1) are linearly independent. It can be shown that any m elements of R, m > n, are linearly dependent. Hence R is n-dimensional.


5. We leave it to the reader to prove that the space of n × n matrices ||aᵢₖ|| is n²-dimensional.

3. Basis and coordinates in n-dimensional space

DEFINITION 4. Any set of n linearly independent vectors e₁, e₂, ⋯, eₙ of an n-dimensional vector space R is called a basis of R.

Thus, for instance, in the case of the space considered in Example 1 any three vectors which are not coplanar form a basis.

By definition of the term "n-dimensional vector space" such a space contains n linearly independent vectors, i.e., it contains a basis.

THEOREM 1. Every vector x belonging to an n-dimensional vector space R can be uniquely represented as a linear combination of basis vectors.

Proof: Let e₁, e₂, ⋯, eₙ be a basis in R. Let x be an arbitrary vector in R. The set x, e₁, e₂, ⋯, eₙ contains n + 1 vectors. It follows from the definition of an n-dimensional vector space that these vectors are linearly dependent, i.e., that there exist n + 1 numbers α₀, α₁, ⋯, αₙ, not all zero, such that

(3) α₀x + α₁e₁ + ⋯ + αₙeₙ = 0.

Obviously α₀ ≠ 0. Otherwise (3) would imply the linear dependence of the vectors e₁, e₂, ⋯, eₙ. Using (3) we have

x = −(α₁/α₀)e₁ − (α₂/α₀)e₂ − ⋯ − (αₙ/α₀)eₙ.

This proves that every x ∈ R is indeed a linear combination of the vectors e₁, e₂, ⋯, eₙ.

To prove uniqueness of the representation of x in terms of the basis vectors we assume that

x = ξ₁e₁ + ξ₂e₂ + ⋯ + ξₙeₙ
and
x = ξ′₁e₁ + ξ′₂e₂ + ⋯ + ξ′ₙeₙ.

Subtracting one equation from the other we obtain

0 = (ξ₁ − ξ′₁)e₁ + (ξ₂ − ξ′₂)e₂ + ⋯ + (ξₙ − ξ′ₙ)eₙ.


Since e₁, e₂, ⋯, eₙ are linearly independent, it follows that

ξ₁ − ξ′₁ = ξ₂ − ξ′₂ = ⋯ = ξₙ − ξ′ₙ = 0,

i.e.,

ξ₁ = ξ′₁, ξ₂ = ξ′₂, ⋯, ξₙ = ξ′ₙ.

This proves uniqueness of the representation.

DEFINITION 5. If e₁, e₂, ⋯, eₙ form a basis in an n-dimensional space and

(4) x = ξ₁e₁ + ξ₂e₂ + ⋯ + ξₙeₙ,

then the numbers ξ₁, ξ₂, ⋯, ξₙ are called the coordinates of the vector x relative to the basis e₁, e₂, ⋯, eₙ.

Theorem 1 states that given a basis e₁, e₂, ⋯, eₙ of a vector space R every vector x ∈ R has a unique set of coordinates.

If the coordinates of x relative to the basis e₁, e₂, ⋯, eₙ are ξ₁, ξ₂, ⋯, ξₙ and the coordinates of y relative to the same basis are η₁, η₂, ⋯, ηₙ, i.e., if

x = ξ₁e₁ + ξ₂e₂ + ⋯ + ξₙeₙ,
y = η₁e₁ + η₂e₂ + ⋯ + ηₙeₙ,

then

x + y = (ξ₁ + η₁)e₁ + (ξ₂ + η₂)e₂ + ⋯ + (ξₙ + ηₙ)eₙ,

i.e., the coordinates of x + y are ξ₁ + η₁, ξ₂ + η₂, ⋯, ξₙ + ηₙ. Similarly the vector λx has as coordinates the numbers λξ₁, λξ₂, ⋯, λξₙ.

Thus the coordinates of the sum of two vectors are the sums of the appropriate coordinates of the summands, and the coordinates of the product of a vector by a scalar are the products of the coordinates of that vector by the scalar in question.

It is clear that the zero vector is the only vector all of whose coordinates are zero.

EXAMPLES. 1. In the case of three-dimensional space our definition of the coordinates of a vector coincides with the definition of the coordinates of a vector in a (not necessarily Cartesian) coordinate system.

2. Let R be the space of n-tuples of numbers. Let us choose as basis the vectors


e₁ = (1, 1, 1, ⋯, 1),
e₂ = (0, 1, 1, ⋯, 1),
⋯⋯⋯⋯⋯
eₙ = (0, 0, 0, ⋯, 1),

and then compute the coordinates η₁, η₂, ⋯, ηₙ of the vector x = (ξ₁, ξ₂, ⋯, ξₙ) relative to the basis e₁, e₂, ⋯, eₙ. By definition

x = η₁e₁ + η₂e₂ + ⋯ + ηₙeₙ;

i.e.,

(ξ₁, ξ₂, ⋯, ξₙ) = η₁(1, 1, ⋯, 1) + η₂(0, 1, ⋯, 1) + ⋯ + ηₙ(0, 0, ⋯, 1)
                = (η₁, η₁ + η₂, ⋯, η₁ + η₂ + ⋯ + ηₙ).

The numbers η₁, η₂, ⋯, ηₙ must satisfy the relations

η₁ = ξ₁,
η₁ + η₂ = ξ₂,
⋯⋯⋯⋯⋯
η₁ + η₂ + ⋯ + ηₙ = ξₙ.

Consequently,

η₁ = ξ₁,  η₂ = ξ₂ − ξ₁,  ⋯,  ηₙ = ξₙ − ξₙ₋₁.

Let us now consider a basis for R in which the connection between the coordinates of a vector x = (ξ₁, ξ₂, ⋯, ξₙ) and the numbers ξ₁, ξ₂, ⋯, ξₙ which define the vector is particularly simple. Thus, let

e₁ = (1, 0, ⋯, 0),
e₂ = (0, 1, ⋯, 0),
⋯⋯⋯⋯⋯
eₙ = (0, 0, ⋯, 1).

Then

x = (ξ₁, ξ₂, ⋯, ξₙ) = ξ₁(1, 0, ⋯, 0) + ξ₂(0, 1, ⋯, 0) + ⋯ + ξₙ(0, 0, ⋯, 1)
  = ξ₁e₁ + ξ₂e₂ + ⋯ + ξₙeₙ.

It follows that in the space R of n-tuples (ξ₁, ξ₂, ⋯, ξₙ) the numbers ξ₁, ξ₂, ⋯, ξₙ may be viewed as the coordinates of the vector x = (ξ₁, ξ₂, ⋯, ξₙ) relative to the basis e₁ = (1, 0, ⋯, 0), e₂ = (0, 1, ⋯, 0), ⋯, eₙ = (0, 0, ⋯, 1).
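For the first basis above, finding the coordinates η₁, ⋯, ηₙ amounts to solving a triangular linear system. A small Python sketch (the 4-tuple below is an arbitrary sample) makes the computation explicit with numpy.

    import numpy as np

    xi = np.array([3.0, 5.0, 6.0, 10.0])     # the vector x = (xi_1, ..., xi_4)
    n = len(xi)
    # Columns of B are the basis vectors e_1 = (1,1,1,1), e_2 = (0,1,1,1), ...
    B = np.tril(np.ones((n, n)))
    eta = np.linalg.solve(B, xi)             # coordinates of x in this basis
    print(eta)                               # [3. 2. 1. 4.], i.e. eta_k = xi_k - xi_(k-1)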


EXERCISE. Show that in an arbitrary basis

e₁ = (a₁₁, a₁₂, ⋯, a₁ₙ),
e₂ = (a₂₁, a₂₂, ⋯, a₂ₙ),
⋯⋯⋯⋯⋯
eₙ = (aₙ₁, aₙ₂, ⋯, aₙₙ)

the coordinates η₁, η₂, ⋯, ηₙ of a vector x = (ξ₁, ξ₂, ⋯, ξₙ) are linear combinations of the numbers ξ₁, ξ₂, ⋯, ξₙ.

3. Let R be the vector space of polynomials of degree not exceeding n − 1. A very simple basis in this space is the basis whose elements are the vectors e₁ = 1, e₂ = t, ⋯, eₙ = t^(n−1). It is easy to see that the coordinates of the polynomial P(t) = a₀t^(n−1) + a₁t^(n−2) + ⋯ + aₙ₋₁ in this basis are the coefficients aₙ₋₁, aₙ₋₂, ⋯, a₀.

Let us now select another basis for R:

e′₁ = 1, e′₂ = t − a, e′₃ = (t − a)², ⋯, e′ₙ = (t − a)^(n−1).

Expanding P(t) in powers of (t − a) we find that

P(t) = P(a) + P′(a)(t − a) + ⋯ + [P^(n−1)(a)/(n − 1)!](t − a)^(n−1).

Thus the coordinates of P(t) in this basis are

P(a), P′(a), ⋯, P^(n−1)(a)/(n − 1)!.
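For a concrete polynomial these coordinates are simply its Taylor coefficients at the point a. A short sympy sketch (the polynomial t² + t + 1 and the point a = 1 are chosen only as an example) confirms the statement.

    import sympy as sp

    t = sp.symbols('t')
    P = t**2 + t + 1          # a sample polynomial of degree n - 1 = 2
    a = 1                     # the point about which we expand

    # Coordinates in the basis 1, (t - a), (t - a)^2 are P(a), P'(a), P''(a)/2!.
    coords = [sp.diff(P, t, k).subs(t, a) / sp.factorial(k) for k in range(3)]
    print(coords)             # [3, 3, 1], i.e. P(t) = 3 + 3(t - 1) + (t - 1)^2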

4. Isomorphism of n-dimensional vector spaces. In the examples considered above some of the spaces are identical with others when it comes to the properties we have investigated so far. One instance of this type is supplied by the ordinary three-dimensional space R considered in Example 1 and the space R′ whose elements are triples of real numbers. Indeed, once a basis has been selected in R we can associate with a vector in R its coordinates relative to that basis; i.e., we can associate with a vector in R a vector in R′. When vectors are added their coordinates are added. When a vector is multiplied by a scalar all of its coordinates are multiplied by that scalar. This implies a parallelism between the geometric properties of R and appropriate properties of R′.

We shall now formulate precisely the notion of "sameness" or of "isomorphism" of vector spaces.


DEFINITION 6. Two vector spaces R and R′ are said to be isomorphic if it is possible to establish a one-to-one correspondence x ↔ x′ between the elements x ∈ R and x′ ∈ R′ such that if x ↔ x′ and y ↔ y′, then

1. the vector which this correspondence associates with x + y is x′ + y′,
2. the vector which this correspondence associates with λx is λx′.

There arises the question as to which vector spaces are isomorphic and which are not.

Two vector spaces of different dimensions are certainly not isomorphic.

Indeed, let us assume that R and R′ are isomorphic. If x, y, ⋯ are vectors in R and x′, y′, ⋯ are their counterparts in R′ then in view of conditions 1 and 2 of the definition of isomorphism the equation λx + μy + ⋯ = 0 is equivalent to the equation λx′ + μy′ + ⋯ = 0. Hence the counterparts in R′ of linearly independent vectors in R are also linearly independent and conversely. Therefore the maximal number of linearly independent vectors in R is the same as the maximal number of linearly independent vectors in R′. This is the same as saying that the dimensions of R and R′ are the same. It follows that two spaces of different dimensions cannot be isomorphic.

THEOREM 2. All vector spaces of dimension n are isomorphic.

Proof: Let R and R′ be two n-dimensional vector spaces. Let e₁, e₂, ⋯, eₙ be a basis in R and let e′₁, e′₂, ⋯, e′ₙ be a basis in R′. We shall associate with the vector

(5) x = ξ₁e₁ + ξ₂e₂ + ⋯ + ξₙeₙ

the vector

x′ = ξ₁e′₁ + ξ₂e′₂ + ⋯ + ξₙe′ₙ,

i.e., a linear combination of the vectors e′ᵢ with the same coefficients as in (5).

This correspondence is one-to-one. Indeed, every vector x ∈ R has a unique representation of the form (5). This means that the ξᵢ are uniquely determined by the vector x. But then x′ is likewise uniquely determined by x. By the same token every x′ ∈ R′ determines one and only one vector x ∈ R.


It should now be obvious that if x ↔ x′ and y ↔ y′, then x + y ↔ x′ + y′ and λx ↔ λx′. This completes the proof of the isomorphism of the spaces R and R′.

In § 3 we shall have another opportunity to explore the concept of isomorphism.

5. Subspaces of a vector space

DEFINITION 7. A subset R′ of a vector space R is called a subspace of R if it forms a vector space under the operations of addition and scalar multiplication introduced in R.

In other words, a set R′ of vectors x, y, ⋯ in R is called a subspace of R if x ∈ R′, y ∈ R′ implies x + y ∈ R′, λx ∈ R′.

EXAMPLES. 1. The zero or null element of R forms a subspace of R.

2. The whole space R forms a subspace of R.

The null space and the whole space are usually referred to as improper subspaces. We now give a few examples of non-trivial subspaces.

3. Let R be the ordinary three-dimensional space. Consider any plane in R going through the origin. The totality R′ of vectors in that plane form a subspace of R.

4. In the vector space of n-tuples of numbers all vectors x = (ξ₁, ξ₂, ⋯, ξₙ) for which ξ₁ = 0 form a subspace. More generally, all vectors x = (ξ₁, ξ₂, ⋯, ξₙ) such that

a₁ξ₁ + a₂ξ₂ + ⋯ + aₙξₙ = 0,

where a₁, a₂, ⋯, aₙ are arbitrary but fixed numbers, form a subspace.

5. The totality of polynomials of degree not exceeding n form a subspace of the vector space of all continuous functions.

It is clear that every subspace R′ of a vector space R must contain the zero element of R.

Since a subspace of a vector space is a vector space in its own right we can speak of a basis of a subspace as well as of its dimensionality. It is clear that the dimension of an arbitrary subspace of a vector space does not exceed the dimension of that vector space.

EXERCISE. Show that if the dimension of a subspace R′ of a vector space R is the same as the dimension of R, then R′ coincides with R.


A general method for constructing subspaces of a vector space R is implied by the observation that if e, f, g, ⋯ are a (finite or infinite) set of vectors belonging to R, then the set R′ of all (finite) linear combinations of the vectors e, f, g, ⋯ forms a subspace R′ of R. The subspace R′ is referred to as the subspace generated by the vectors e, f, g, ⋯. This subspace is the smallest subspace of R containing the vectors e, f, g, ⋯.

The subspace R′ generated by the linearly independent vectors e₁, e₂, ⋯, eₖ is k-dimensional and the vectors e₁, e₂, ⋯, eₖ form a basis of R′. Indeed, R′ contains k linearly independent vectors (i.e., the vectors e₁, e₂, ⋯, eₖ). On the other hand, let x₁, x₂, ⋯, xₗ be l vectors in R′ and let l > k. If

x₁ = ξ₁₁e₁ + ξ₁₂e₂ + ⋯ + ξ₁ₖeₖ,
x₂ = ξ₂₁e₁ + ξ₂₂e₂ + ⋯ + ξ₂ₖeₖ,
⋯⋯⋯⋯⋯
xₗ = ξₗ₁e₁ + ξₗ₂e₂ + ⋯ + ξₗₖeₖ,

then the l rows in the matrix

ξ₁₁  ξ₁₂  ⋯  ξ₁ₖ
ξ₂₁  ξ₂₂  ⋯  ξ₂ₖ
⋯⋯⋯⋯⋯
ξₗ₁  ξₗ₂  ⋯  ξₗₖ

must be linearly dependent. But this implies (cf. Example 2, page 5) the linear dependence of the vectors x₁, x₂, ⋯, xₗ. Thus the maximal number of linearly independent vectors in R′, i.e., the dimension of R′, is k, and the vectors e₁, e₂, ⋯, eₖ form a basis in R′.

EXERCISE. Show that every n-dimensional vector space contains subspaces of dimension l, l = 1, 2, ⋯, n.

If we ignore null spaces, then the simplest vector spaces are one-dimensional vector spaces. A basis of such a space is a single vector e₁ ≠ 0. Thus a one-dimensional vector space consists of all vectors αe₁, where α is an arbitrary scalar.

Consider the set of vectors of the form x = x₀ + αe₁, where x₀ and e₁ ≠ 0 are fixed vectors and α ranges over all scalars. It is natural to call this set of vectors, by analogy with three-dimensional space, a line in the vector space R.


Similarly, all vectors of the form αe₁ + βe₂, where e₁ and e₂ are fixed linearly independent vectors and α and β are arbitrary numbers, form a two-dimensional vector space. The set of vectors of the form

x = x₀ + αe₁ + βe₂,

where x₀ is a fixed vector, is called a (two-dimensional) plane.

EXERCISES. 1. Show that in the vector space of n-tuples (ξ₁, ξ₂, ⋯, ξₙ) of real numbers the set of vectors satisfying the relation

a₁ξ₁ + a₂ξ₂ + ⋯ + aₙξₙ = 0

(a₁, a₂, ⋯, aₙ are fixed numbers not all of which are zero) form a subspace of dimension n − 1.

2. Show that if two subspaces R₁ and R₂ of a vector space R have only the null vector in common then the sum of their dimensions does not exceed the dimension of R.

3. Show that the dimension of the subspace generated by the vectors e, f, g, ⋯ is equal to the maximal number of linearly independent vectors among the vectors e, f, g, ⋯.

6. Transformation of coordinates under change of basis. Let e₁, e₂, ⋯, eₙ and e′₁, e′₂, ⋯, e′ₙ be two bases of an n-dimensional vector space. Further, let the connection between them be given by the equations

(6)
e′₁ = a₁₁e₁ + a₂₁e₂ + ⋯ + aₙ₁eₙ,
e′₂ = a₁₂e₁ + a₂₂e₂ + ⋯ + aₙ₂eₙ,
⋯⋯⋯⋯⋯
e′ₙ = a₁ₙe₁ + a₂ₙe₂ + ⋯ + aₙₙeₙ.

The determinant of the matrix 𝒜 in (6) is different from zero (otherwise the vectors e′₁, e′₂, ⋯, e′ₙ would be linearly dependent).

Let ξᵢ be the coordinates of a vector x in the first basis and ξ′ᵢ its coordinates in the second basis. Then

x = ξ₁e₁ + ξ₂e₂ + ⋯ + ξₙeₙ = ξ′₁e′₁ + ξ′₂e′₂ + ⋯ + ξ′ₙe′ₙ.

Replacing the e′ᵢ with the appropriate expressions from (6) we get

x = ξ₁e₁ + ξ₂e₂ + ⋯ + ξₙeₙ
  = ξ′₁(a₁₁e₁ + a₂₁e₂ + ⋯ + aₙ₁eₙ) + ξ′₂(a₁₂e₁ + a₂₂e₂ + ⋯ + aₙ₂eₙ) + ⋯ + ξ′ₙ(a₁ₙe₁ + a₂ₙe₂ + ⋯ + aₙₙeₙ).


Since the eᵢ are linearly independent, the coefficients of the eᵢ on both sides of the above equation must be the same. Hence

(7)
ξ₁ = a₁₁ξ′₁ + a₁₂ξ′₂ + ⋯ + a₁ₙξ′ₙ,
ξ₂ = a₂₁ξ′₁ + a₂₂ξ′₂ + ⋯ + a₂ₙξ′ₙ,
⋯⋯⋯⋯⋯
ξₙ = aₙ₁ξ′₁ + aₙ₂ξ′₂ + ⋯ + aₙₙξ′ₙ.

Thus the coordinates of the vector x in the first basis are expressed through its coordinates in the second basis by means of the matrix 𝒜′ which is the transpose of 𝒜.

To rephrase our result we solve the system (7) for ξ′₁, ⋯, ξ′ₙ. Then

ξ′₁ = b₁₁ξ₁ + b₁₂ξ₂ + ⋯ + b₁ₙξₙ,
ξ′₂ = b₂₁ξ₁ + b₂₂ξ₂ + ⋯ + b₂ₙξₙ,
⋯⋯⋯⋯⋯
ξ′ₙ = bₙ₁ξ₁ + bₙ₂ξ₂ + ⋯ + bₙₙξₙ,

where the bᵢₖ are the elements of the inverse of the matrix 𝒜′. Thus, the coordinates of a vector are transformed by means of a matrix ℬ which is the inverse of the transpose of the matrix 𝒜 in (6) which determines the change of basis.
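Numerically, one convenient convention is to store the coefficients aᵢₖ of (6) so that the kth column of a matrix A expresses e′ₖ in the old basis; then (7) reads ξ = Aξ′, and ξ′ is recovered with the inverse matrix. The Python sketch below (with an arbitrary 2×2 example) is only an illustration of this rule.

    import numpy as np

    # Columns of A express the new basis vectors e'_k in the old basis:
    # e'_1 = 1*e_1 + 1*e_2,  e'_2 = 0*e_1 + 1*e_2   (an arbitrary example).
    A = np.array([[1.0, 0.0],
                  [1.0, 1.0]])

    xi_new = np.array([2.0, 3.0])        # coordinates xi'_k in the new basis
    xi_old = A @ xi_new                  # xi_i = sum_k a_ik xi'_k, as in (7)
    print(xi_old)                        # [2. 5.]

    # Conversely, the new coordinates are obtained with the inverse matrix.
    print(np.linalg.solve(A, xi_old))    # recovers [2. 3.]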

§ 2. Euclidean space

1. Definition of Euclidean space. In the preceding section a vector space was defined as a collection of elements (vectors) for which there are defined the operations of addition and multiplication by scalars.

By means of these operations it is possible to define in a vector space the concepts of line, plane, dimension, parallelism of lines, etc. However, many concepts of so-called Euclidean geometry cannot be formulated in terms of addition and multiplication by scalars. Instances of such concepts are: length of a vector, angles between vectors, the inner product of vectors. The simplest way of introducing these concepts is the following.

We take as our fundamental concept the concept of an inner product of vectors. We define this concept axiomatically. Using the inner product operation in addition to the operations of addition and multiplication by scalars we shall find it possible to develop all of Euclidean geometry.

DEFINITION 1. If with every pair of vectors x, y in a real vector space R there is associated a real number (x, y) such that

1. (x, y) = (y, x),
2. (λx, y) = λ(x, y)  (λ real),
3. (x₁ + x₂, y) = (x₁, y) + (x₂, y),
4. (x, x) ≥ 0 and (x, x) = 0 if and only if x = 0,

then we say that an inner product is defined in R.

A vector space in which an inner product satisfying conditions 1 through 4 has been defined is referred to as a Euclidean space.

EXAMPLES. 1. Let us consider the (three-dimensional) space R of vectors studied in elementary solid geometry (cf. Example 1, § 1). Let us define the inner product of two vectors in this space as the product of their lengths by the cosine of the angle between them. We leave it to the reader to verify the fact that the operation just defined satisfies conditions 1 through 4 above.

2. Consider the space R of n-tuples of real numbers. Let x = (ξ₁, ξ₂, ⋯, ξₙ) and y = (η₁, η₂, ⋯, ηₙ) be in R. In addition to the definitions of addition

x + y = (ξ₁ + η₁, ξ₂ + η₂, ⋯, ξₙ + ηₙ)

and multiplication by scalars

λx = (λξ₁, λξ₂, ⋯, λξₙ)

with which we are already familiar from Example 2, § 1, we define the inner product of x and y as

(x, y) = ξ₁η₁ + ξ₂η₂ + ⋯ + ξₙηₙ.

It is again easy to check that properties 1 through 4 are satisfied by (x, y) as defined.

3. Without changing the definitions of addition and multiplication by scalars in Example 2 above we shall define the inner product of two vectors in the space of Example 2 in a different and more general manner.

Thus let ||aᵢₖ|| be a real n × n matrix. Let us put


(1)  (x, y) = a₁₁ξ₁η₁ + a₁₂ξ₁η₂ + ⋯ + a₁ₙξ₁ηₙ
            + a₂₁ξ₂η₁ + a₂₂ξ₂η₂ + ⋯ + a₂ₙξ₂ηₙ
            + ⋯⋯⋯⋯⋯
            + aₙ₁ξₙη₁ + aₙ₂ξₙη₂ + ⋯ + aₙₙξₙηₙ.

We can verify directly the fact that this definition satisfies Axioms 2 and 3 for an inner product regardless of the nature of the real matrix ||aᵢₖ||. For Axiom 1 to hold, that is, for (x, y) to be symmetric relative to x and y, it is necessary and sufficient that

(2) aᵢₖ = aₖᵢ,

i.e., that ||aᵢₖ|| be symmetric.

Axiom 4 requires that the expression

(3) (x, x) = Σᵢ,ₖ₌₁ⁿ aᵢₖξᵢξₖ

be non-negative for every choice of the n numbers ξ₁, ξ₂, ⋯, ξₙ and that it vanish only if ξ₁ = ξ₂ = ⋯ = ξₙ = 0.

The homogeneous polynomial or, as it is frequently called, quadratic form in (3) is said to be positive definite if it takes on non-negative values only and if it vanishes only when all the ξᵢ are zero. Thus for Axiom 4 to hold the quadratic form (3) must be positive definite.

In summary, for (1) to define an inner product the matrix ||aᵢₖ|| must be symmetric and the quadratic form associated with ||aᵢₖ|| must be positive definite.

If we take as the matrix ||aᵢₖ|| the unit matrix, i.e., if we put aᵢᵢ = 1 and aᵢₖ = 0 (i ≠ k), then the inner product (x, y) defined by (1) takes the form

(x, y) = ξ₁η₁ + ξ₂η₂ + ⋯ + ξₙηₙ

and the result is the Euclidean space of Example 2.

EXERCISE. Show that the matrix

(0 1)
(1 0)

cannot be used to define an inner product (the corresponding quadratic form is not positive definite), and that the matrix

(1 1)
(1 2)

can be used to define an inner product satisfying the axioms 1 through 4.
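In coordinates the expression (1) is simply xᵀAy, so the requirements on the matrix are easy to check numerically. The Python sketch below (using the second matrix of the exercise, with arbitrarily chosen sample vectors) verifies symmetry and positive definiteness and evaluates one inner product.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 2.0]])            # the second matrix of the exercise

    def inner(x, y):
        # (x, y) = sum_{i,k} a_ik * xi_i * eta_k
        return x @ A @ y

    # Axiom 1 (symmetry) holds because A equals its transpose.
    print(np.allclose(A, A.T))
    # Axiom 4 (positive definiteness): all eigenvalues of A are positive.
    print(np.all(np.linalg.eigvalsh(A) > 0))
    print(inner(np.array([1.0, -1.0]), np.array([2.0, 0.0])))   # sample value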


In the sequel (§ 6) we shall give simple criteria for a quadratic form to be positive definite.

4. Let the elements of a vector space be all the continuous functions on an interval [a, b]. We define the inner product of two such functions as the integral of their product

(f, g) = ∫ₐᵇ f(t)g(t) dt.

It is easy to check that the Axioms 1 through 4 are satisfied.

5. Let R be the space of polynomials of degree not exceeding n − 1. We define the inner product of two polynomials as in Example 4:

(P, Q) = ∫ₐᵇ P(t)Q(t) dt.

2. Length of a vector. Angle between two vectors. We shall now make use of the concept of an inner product to define the length of a vector and the angle between two vectors.

DEFINITION 2. By the length of a vector x in Euclidean space we mean the number

(4) √(x, x).

We shall denote the length of a vector x by the symbol |x|.

It is quite natural to require that the definitions of length of a vector, of the angle between two vectors and of the inner product of two vectors imply the usual relation which connects these quantities. In other words, it is natural to require that the inner product of two vectors be equal to the product of the lengths of these vectors times the cosine of the angle between them. This dictates the following definition of the concept of angle between two vectors.

DEFINITION 3. By the angle between two vectors x and y we mean the number

φ = arc cos [(x, y)/(|x| |y|)];

i.e., we put

(5) cos φ = (x, y)/(|x| |y|).


The vectors x and y are said to be orthogonal if (x, y) = 0. The angle between two non-zero orthogonal vectors is clearly π/2. The concepts just introduced permit us to extend a number of theorems of elementary geometry to Euclidean spaces.

The following is an example of such extension. If x and y are orthogonal vectors, then it is natural to regard x + y as the diagonal of a rectangle with sides x and y. We shall show that

|x + y|² = |x|² + |y|²,

i.e., that the square of the length of the diagonal of a rectangle is equal to the sum of the squares of the lengths of its two non-parallel sides (the theorem of Pythagoras).

Proof: By definition of length of a vector

|x + y|² = (x + y, x + y).

In view of the distributivity property of inner products (Axiom 3),

(x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y).

Since x and y are supposed orthogonal,

(x, y) = (y, x) = 0.

Thus

|x + y|² = (x, x) + (y, y) = |x|² + |y|²,

which is what we set out to prove.

This theorem can be easily generalized to read: if x, y, z, ⋯ are pairwise orthogonal, then

|x + y + z + ⋯|² = |x|² + |y|² + |z|² + ⋯.

3. The Schwarz inequality. In para. 2 we defined the angle φ between two vectors x and y by means of the relation

cos φ = (x, y)/(|x| |y|).

If φ is to be always computable from this relation we must show that

¹ We could have axiomatized the notions of length of a vector and angle between two vectors rather than the notion of inner product. However, this course would have resulted in a more complicated system of axioms than that associated with the notion of an inner product.

−1 ≤ (x, y)/(|x| |y|) ≤ 1

or, equivalently, that

(x, y)²/(|x|² |y|²) ≤ 1,

which, in turn, is the same as

(6) (x, y)² ≤ (x, x)(y, y).

Inequality (6) is known as the Schwarz inequality.

Thus, before we can correctly define the angle between two vectors by means of the relation (5) we must prove the Schwarz inequality.²

To prove the Schwarz inequality we consider the vector x − ty where t is any real number. In view of Axiom 4 for inner products,

(x − ty, x − ty) ≥ 0;

i.e., for any t,

t²(y, y) − 2t(x, y) + (x, x) ≥ 0.

This inequality implies that the polynomial cannot have two distinct real roots. Consequently, the discriminant of the equation

t²(y, y) − 2t(x, y) + (x, x) = 0

cannot be positive; i.e.,

(x, y)² − (x, x)(y, y) ≤ 0,

which is what we wished to prove.

EXERCISE. Prove that a necessary and sufficient condition for (x, y)² = (x, x)(y, y) is the linear dependence of the vectors x and y.

EXAMPLES. We have proved the validity of (6) for an axiomatically defined Euclidean space. It is now appropriate to interpret this inequality in the various concrete Euclidean spaces in para. 1.

1. In the case of Example 1, inequality (6) tells us nothing new (cf. the remark preceding the proof of the Schwarz inequality).

² Note, however, that in para. 1, Example 1, of this section there is no need to prove this inequality. Namely, in vector analysis the inner product of two vectors is defined in such a way that the quantity (x, y)/(|x| |y|) is the cosine of a previously determined angle between the vectors. Consequently, |(x, y)|/(|x| |y|) ≤ 1.


2. In Example 2 the inner product was defined as

(x, y) = Σᵢ₌₁ⁿ ξᵢηᵢ.

It follows that

(x, x) = Σᵢ₌₁ⁿ ξᵢ²,   (y, y) = Σᵢ₌₁ⁿ ηᵢ²,

and inequality (6) becomes

(Σᵢ₌₁ⁿ ξᵢηᵢ)² ≤ (Σᵢ₌₁ⁿ ξᵢ²)(Σᵢ₌₁ⁿ ηᵢ²).

3. In Example 3 the inner product was defined as

(1) (x, y) = Σᵢ,ₖ₌₁ⁿ aᵢₖξᵢηₖ,

where

(2) aᵢₖ = aₖᵢ

and

(3) Σᵢ,ₖ₌₁ⁿ aᵢₖξᵢξₖ ≥ 0

for any choice of the ξᵢ. Hence (6) implies that if the numbers aᵢₖ satisfy conditions (2) and (3), then the following inequality holds:

(Σᵢ,ₖ₌₁ⁿ aᵢₖξᵢηₖ)² ≤ (Σᵢ,ₖ₌₁ⁿ aᵢₖξᵢξₖ)(Σᵢ,ₖ₌₁ⁿ aᵢₖηᵢηₖ).

EXERCISE. Show that if the numbers aᵢₖ satisfy conditions (2) and (3), then aᵢₖ² ≤ aᵢᵢaₖₖ. (Hint: Assign suitable values to the numbers ξ₁, ξ₂, ⋯, ξₙ and η₁, η₂, ⋯, ηₙ in the inequality just derived.)

4. In Example 4 the inner product was defined by means of the integral ∫ₐᵇ f(t)g(t) dt. Hence (6) takes the form

(∫ₐᵇ f(t)g(t) dt)² ≤ ∫ₐᵇ [f(t)]² dt · ∫ₐᵇ [g(t)]² dt.

This inequality plays an important role in many problems of analysis.
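Each of these forms of (6) is easy to test on concrete data. The following Python sketch (with two arbitrarily chosen 4-tuples) checks the n-tuple version of the Schwarz inequality numerically.

    import numpy as np

    x = np.array([1.0, 2.0, -1.0, 3.0])
    y = np.array([0.5, -1.0, 4.0, 2.0])

    lhs = np.dot(x, y) ** 2                # (sum xi_i eta_i)^2
    rhs = np.dot(x, x) * np.dot(y, y)      # (sum xi_i^2)(sum eta_i^2)
    print(lhs <= rhs, lhs, rhs)            # the Schwarz inequality (6)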

We now give an example of an inequality which is a consequence of the Schwarz inequality.

If x and y are two vectors in a Euclidean space R then

(7) |x + y| ≤ |x| + |y|.


Proof:

|x + y|² = (x + y, x + y) = (x, x) + 2(x, y) + (y, y).

Since 2(x, y) ≤ 2|x| |y|, it follows that

|x + y|² = (x + y, x + y) ≤ (x, x) + 2|x| |y| + (y, y) = (|x| + |y|)²,

i.e., |x + y| ≤ |x| + |y|, which is the desired conclusion.

EXERCISE. Interpret inequality (7) in each of the concrete Euclidean spaces considered in the beginning of this section.

In geometry the distance between two points x and y (note the use of the same symbol to denote a vector drawn from the origin and a point, the tip of that vector) is defined as the length of the vector x − y. In the general case of an n-dimensional Euclidean space we define the distance between x and y by the relation

d = |x − y|.

§ 3. Orthogonal basis. Isomorphism of Euclidean spaces

1. Orthogonal basis. In § 1 we introduced the notion of a basis (coordinate system) of a vector space. In a vector space there is no reason to prefer one basis to another.³ Not so in Euclidean spaces. Here there is every reason to prefer so-called orthogonal bases to all other bases. Orthogonal bases play the same role in Euclidean spaces which rectangular coordinate systems play in analytic geometry.

DEFINITION 1. The non-zero vectors e₁, e₂, ⋯, eₙ of an n-dimensional Euclidean vector space are said to form an orthogonal basis if they are pairwise orthogonal, and an orthonormal basis if, in addition, each has unit length. Briefly, the vectors e₁, e₂, ⋯, eₙ form an orthonormal basis if

³ Careful reading of the proof of the isomorphism of vector spaces given in § 1 will show that in addition to proving the theorem we also showed that it is possible to construct an isomorphism of two n-dimensional vector spaces which takes a specified basis in one of these spaces into a specified basis in the other space. In particular, if e₁, e₂, ⋯, eₙ and e′₁, e′₂, ⋯, e′ₙ are two bases in R, then there exists an isomorphic mapping of R onto itself which takes the first of these bases into the second.


(1) (eᵢ, eₖ) = 1 if i = k,  (eᵢ, eₖ) = 0 if i ≠ k.

For this definition to be correct we must prove that the vectors e₁, e₂, ⋯, eₙ of the definition actually form a basis, i.e., are linearly independent.

Thus, let

(2) λ₁e₁ + λ₂e₂ + ⋯ + λₙeₙ = 0.

We wish to show that (2) implies λ₁ = λ₂ = ⋯ = λₙ = 0. To this end we multiply both sides of (2) by e₁ (i.e., form the inner product of each side of (2) with e₁). The result is

λ₁(e₁, e₁) + λ₂(e₁, e₂) + ⋯ + λₙ(e₁, eₙ) = 0.

Now, the definition of an orthogonal basis implies that

(e₁, e₁) ≠ 0,  (e₁, eₖ) = 0 for k ≠ 1.

Hence λ₁ = 0. Likewise, multiplying (2) by e₂ we find that λ₂ = 0, etc. This proves that e₁, e₂, ⋯, eₙ are linearly independent.

We shall make use of the so-called orthogonalization procedure to prove the existence of orthogonal bases. This procedure leads from any basis f₁, f₂, ⋯, fₙ to an orthogonal basis e₁, e₂, ⋯, eₙ.

THEOREM 1. Every n-dimensional Euclidean space contains orthogonal bases.

Proof: By definition of an n-dimensional vector space (§ 1, para. 2) such a space contains a basis f₁, f₂, ⋯, fₙ. We put e₁ = f₁. Next we put e₂ = f₂ + αe₁, where α is chosen so that (e₂, e₁) = 0; i.e., (f₂ + αe₁, e₁) = 0. This means that

α = −(f₂, e₁)/(e₁, e₁).

Suppose that we have already constructed non-zero pairwise orthogonal vectors e₁, e₂, ⋯, eₖ₋₁. To construct eₖ we put

eₖ = fₖ + λ₁e₁ + ⋯ + λₖ₋₁eₖ₋₁,

where the λᵢ are determined from the orthogonality conditions


(eₖ, e₁) = (fₖ + λ₁e₁ + ⋯ + λₖ₋₁eₖ₋₁, e₁) = 0,
(eₖ, e₂) = (fₖ + λ₁e₁ + ⋯ + λₖ₋₁eₖ₋₁, e₂) = 0,
⋯⋯⋯⋯⋯
(eₖ, eₖ₋₁) = (fₖ + λ₁e₁ + ⋯ + λₖ₋₁eₖ₋₁, eₖ₋₁) = 0.

Since the vectors e₁, e₂, ⋯, eₖ₋₁ are pairwise orthogonal, the latter equalities become:

(fₖ, e₁) + λ₁(e₁, e₁) = 0,
(fₖ, e₂) + λ₂(e₂, e₂) = 0,
⋯⋯⋯⋯⋯
(fₖ, eₖ₋₁) + λₖ₋₁(eₖ₋₁, eₖ₋₁) = 0.

It follows that

λ₁ = −(fₖ, e₁)/(e₁, e₁),  λ₂ = −(fₖ, e₂)/(e₂, e₂),  ⋯,  λₖ₋₁ = −(fₖ, eₖ₋₁)/(eₖ₋₁, eₖ₋₁).

So far we have not made use of the linear independence of the vectors f₁, f₂, ⋯, fₙ, but we shall make use of this fact presently to prove that eₖ ≠ 0. The vector eₖ is a linear combination of the vectors e₁, e₂, ⋯, eₖ₋₁, fₖ. But eₖ₋₁ can be written as a linear combination of the vector fₖ₋₁ and the vectors e₁, e₂, ⋯, eₖ₋₂. Similar statements hold for eₖ₋₂, eₖ₋₃, ⋯, e₁. It follows that

(5) eₖ = α₁f₁ + α₂f₂ + ⋯ + αₖ₋₁fₖ₋₁ + fₖ.

In view of the linear independence of the vectors f₁, f₂, ⋯, fₖ we may conclude on the basis of eq. (5) that eₖ ≠ 0.

Just as e₁, e₂, ⋯, eₖ₋₁ and fₖ were used to construct eₖ, so e₁, e₂, ⋯, eₖ and fₖ₊₁ can be used to construct eₖ₊₁, etc.

By continuing the process described above we obtain n non-zero, pairwise orthogonal vectors e₁, e₂, ⋯, eₙ, i.e., an orthogonal basis. This proves our theorem.
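The orthogonalization procedure of the proof translates directly into code. The Python sketch below (for the ordinary dot product on n-tuples, with an arbitrarily chosen starting basis) carries out the construction and then divides each eₖ by its length to obtain an orthonormal set.

    import numpy as np

    def orthogonalize(f):
        # Turn linearly independent f_1, ..., f_n into orthogonal e_1, ..., e_n.
        e = []
        for fk in f:
            ek = fk.copy()
            # e_k = f_k + lambda_1 e_1 + ... with lambda_i = -(f_k, e_i)/(e_i, e_i)
            for ei in e:
                ek -= (np.dot(fk, ei) / np.dot(ei, ei)) * ei
            e.append(ek)
        return e

    f = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]),
         np.array([0.0, 1.0, 1.0])]
    e = orthogonalize(f)
    e_normed = [v / np.linalg.norm(v) for v in e]       # orthonormal vectors
    # The matrix of pairwise inner products should be the identity.
    print(np.round([[np.dot(a, b) for b in e_normed] for a in e_normed], 10))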

It is clear that the vectors

e′ₖ = eₖ/|eₖ|  (k = 1, 2, ⋯, n)

form an orthonormal basis.

EXAMPLES OF ORTHOGONALIZATION. 1. Let R be the three-dimensional space with which we are familiar from elementary geometry. Let f₁, f₂, f₃ be three linearly independent vectors in R. Put e₁ = f₁. Next select a vector e₂ perpendicular to e₁ and lying in the plane determined by e₁ = f₁


and f₂. Finally, choose e₃ perpendicular to e₁ and e₂ (i.e., perpendicular to the previously constructed plane).

2. Let R be the three-dimensional vector space of polynomials of degree not exceeding two. We define the inner product of two vectors in this space by the integral

∫₋₁¹ P(t)Q(t) dt.

The vectors 1, t, t² form a basis in R. We shall now orthogonalize this basis. We put e₁ = 1. Next we put e₂ = t + α · 1. Since

0 = (t + α · 1, 1) = ∫₋₁¹ (t + α) dt = 2α,

it follows that α = 0, i.e., e₂ = t. Finally we put e₃ = t² + βt + γ · 1. The orthogonality requirements imply β = 0 and γ = −1/3, i.e., e₃ = t² − 1/3. Thus 1, t, t² − 1/3 is an orthogonal basis in R. By dividing each basis vector by its length we obtain an orthonormal basis for R.

3. Let R be the space of polynomials of degree not exceeding n − 1. We define the inner product of two vectors in this space as in the preceding example.

We select as basis the vectors 1, t, ⋯, t^(n−1). As in Example 2 the process of orthogonalization leads to the sequence of polynomials

1, t, t² − 1/3, t³ − (3/5)t, ⋯.

Apart from multiplicative constants these polynomials coincide with the Legendre polynomials

[1/(2ᵏ k!)] dᵏ(t² − 1)ᵏ/dtᵏ.

The Legendre polynomials form an orthogonal, but not orthonormal basis in R. Multiplying each Legendre polynomial by a suitable constant we obtain an orthonormal basis in R. We shall denote the kth element of this basis by Pₖ(t).
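The same orthogonalization can be carried out symbolically. A short sympy sketch (restricted to the powers 1, t, t², t³ purely for illustration) uses the inner product ∫₋₁¹ P(t)Q(t) dt and reproduces the polynomials listed above.

    import sympy as sp

    t = sp.symbols('t')
    inner = lambda p, q: sp.integrate(p * q, (t, -1, 1))

    basis = [sp.Integer(1), t, t**2, t**3]
    e = []
    for f in basis:
        ek = f
        for ei in e:
            # subtract the component of f along the already built e_i
            ek -= inner(f, ei) / inner(ei, ei) * ei
        e.append(sp.expand(ek))
    print(e)        # [1, t, t**2 - 1/3, t**3 - 3*t/5]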

Let e₁, e₂, ⋯, eₙ be an orthonormal basis of a Euclidean space R. If

x = ξ₁e₁ + ξ₂e₂ + ⋯ + ξₙeₙ,   y = η₁e₁ + η₂e₂ + ⋯ + ηₙeₙ,

then

(x, y) = (ξ₁e₁ + ξ₂e₂ + ⋯ + ξₙeₙ, η₁e₁ + η₂e₂ + ⋯ + ηₙeₙ).

Since

(eᵢ, eₖ) = 1 if i = k and (eᵢ, eₖ) = 0 if i ≠ k,

it follows that

(x, y) = ξ₁η₁ + ξ₂η₂ + ⋯ + ξₙηₙ.

Thus, the inner product of two vectors relative to an orthonormal basis is equal to the sum of the products of the corresponding coordinates of these vectors (cf. Example 2, § 2).

EXERCISES. 1. Show that if f₁, f₂, ⋯, fₙ is an arbitrary basis, then

(x, y) = Σᵢ,ₖ₌₁ⁿ aᵢₖξᵢηₖ,

where aᵢₖ = aₖᵢ and ξ₁, ξ₂, ⋯, ξₙ and η₁, η₂, ⋯, ηₙ are the coordinates of x and y respectively.

2. Show that if in some basis f₁, f₂, ⋯, fₙ

(x, y) = ξ₁η₁ + ξ₂η₂ + ⋯ + ξₙηₙ

for every x = ξ₁f₁ + ⋯ + ξₙfₙ and y = η₁f₁ + ⋯ + ηₙfₙ, then this basis is orthonormal.

We shall now find the coordinates of a vector x relative to an orthonormal basis e₁, e₂, ⋯, eₙ.

Let

x = ξ₁e₁ + ξ₂e₂ + ⋯ + ξₙeₙ.

Multiplying both sides of this equation by e₁ we get

(x, e₁) = ξ₁(e₁, e₁) + ξ₂(e₂, e₁) + ⋯ + ξₙ(eₙ, e₁) = ξ₁

and, similarly,

(7) ξ₂ = (x, e₂), ⋯, ξₙ = (x, eₙ).

Thus the kth coordinate of a vector relative to an orthonormal basis is the inner product of this vector and the kth basis vector.

It is natural to call the inner product of a vector x and a vector e of length 1 the projection of x on e. The result just proved may be stated as follows: The coordinates of a vector relative to an orthonormal basis are the projections of this vector on the basis vectors. This is the exact analog of a statement with which we are familiar from analytic geometry, except that there we speak of projections on the coordinate axes rather than on the basis vectors.


EXAMPLES. 1. Let P₀(t), P₁(t), ⋯, Pₙ(t) be the normed Legendre polynomials of degree 0, 1, ⋯, n. Further, let Q(t) be an arbitrary polynomial of degree n. We shall represent Q(t) as a linear combination of the Legendre polynomials. To this end we note that all polynomials of degree not exceeding n form an (n + 1)-dimensional vector space with orthonormal basis P₀(t), P₁(t), ⋯, Pₙ(t). Hence every polynomial Q(t) of degree n can be represented in the form

Q(t) = c₀P₀(t) + c₁P₁(t) + ⋯ + cₙPₙ(t).

It follows from (7) that

cᵢ = ∫₋₁¹ Q(t)Pᵢ(t) dt.

2. Consider the system of functions

(8) 1, cos t, sin t, cos 2t, sin 2t, ⋯, cos nt, sin nt

on the interval (0, 2π). A linear combination

P(t) = (a₀/2) + a₁ cos t + b₁ sin t + a₂ cos 2t + ⋯ + bₙ sin nt

of these functions is called a trigonometric polynomial of degree n. The totality of trigonometric polynomials of degree n form a (2n + 1)-dimensional space R₁. We define an inner product in R₁ by the usual integral

(P, Q) = ∫₀^(2π) P(t)Q(t) dt.

It is easy to see that the system (8) is an orthogonal basis. Indeed

∫₀^(2π) cos kt cos lt dt = 0  if k ≠ l,
∫₀^(2π) sin kt cos lt dt = 0,
∫₀^(2π) sin kt sin lt dt = 0  if k ≠ l.

Since

∫₀^(2π) sin² kt dt = ∫₀^(2π) cos² kt dt = π

and

∫₀^(2π) 1 dt = 2π,

it follows that the functions

(8′) 1/√(2π), (1/√π) cos t, (1/√π) sin t, ⋯, (1/√π) cos nt, (1/√π) sin nt

are an orthonormal basis for R₁.

2. Perpendicular from a point to a subspace. The shortest distance from a point to a subspace. (This paragraph may be left out in a first reading.)

DEFINITION 2. Let R₁ be a subspace of a Euclidean space R. We shall say that a vector h ∈ R is orthogonal to the subspace R₁ if it is orthogonal to every vector x ∈ R₁.


If h is orthogonal to the vectors e₁, e₂, ⋯, eₘ then it is also orthogonal to any linear combination of these vectors. Indeed,

(h, eᵢ) = 0  (i = 1, 2, ⋯, m)

implies that for any numbers λ₁, λ₂, ⋯, λₘ

(h, λ₁e₁ + λ₂e₂ + ⋯ + λₘeₘ) = 0.

Hence, for a vector h to be orthogonal to an m-dimensional subspace of R it is sufficient that it be orthogonal to m linearly independent vectors in R₁, i.e., to a basis of R₁.

Let R₁ be an m-dimensional subspace of a (finite or infinite dimensional) Euclidean space R and let f be a vector not belonging to R₁. We pose the problem of dropping a perpendicular from the point f to R₁, i.e., of finding a vector f₀ in R₁ such that the vector h = f − f₀ is orthogonal to R₁. The vector f₀ is called the orthogonal projection of f on the subspace R₁. We shall see in the sequel that this problem has always a unique solution. Right now we shall show that, just as in Euclidean geometry, |h| is the shortest distance from f to R₁. In other words, we shall show that if f₁ ∈ R₁ and f₁ ≠ f₀, then

|f − f₁| > |f − f₀|.

Indeed, as a difference of two vectors in R₁, the vector f₀ − f₁ belongs to R₁ and is therefore orthogonal to h = f − f₀. By the theorem of Pythagoras

|f − f₀|² + |f₀ − f₁|² = |f − f₀ + f₀ − f₁|² = |f − f₁|²,

so that

|f − f₁| > |f − f₀|.

We shall now show how one can actually compute the orthogonal projection f₀ of f on the subspace R₁ (i.e., how to drop a perpendicular from f on R₁). Let e₁, e₂, ⋯, eₘ be a basis of R₁. As a vector in R₁, f₀ must be of the form

(9) f₀ = c₁e₁ + c₂e₂ + ⋯ + cₘeₘ.

To find the cᵢ we note that f − f₀ must be orthogonal to R₁, i.e., (f − f₀, eₖ) = 0 (k = 1, 2, ⋯, m), or

(f₀, eₖ) = (f, eₖ).


Replacing f₀ by the expression in (9) we obtain a system of m equations for the cᵢ:

(11) c₁(e₁, eₖ) + c₂(e₂, eₖ) + ⋯ + cₘ(eₘ, eₖ) = (f, eₖ)  (k = 1, 2, ⋯, m).

We first consider the frequent case when the vectors e₁, e₂, ⋯, eₘ are orthonormal. In this case the problem can be solved with ease. Indeed, in such a basis the system (11) goes over into the system

(12) cᵢ = (f, eᵢ).

Since it is always possible to select an orthonormal basis in an m-dimensional subspace, we have proved that for every vector f there exists a unique orthogonal projection f₀ on the subspace R₁.

We shall now show that for an arbitrary basis e₁, e₂, ⋯, eₘ the system (11) must also have a unique solution. Indeed, in view of the established existence and uniqueness of the vector f₀, this vector has uniquely determined coordinates c₁, c₂, ⋯, cₘ with respect to the basis e₁, e₂, ⋯, eₘ. Since the cᵢ satisfy the system (11), this system has a unique solution.

Thus, the coordinates cᵢ of the orthogonal projection f₀ of the vector f on the subspace R₁ are determined from the system (12) or from the system (11) according as the cᵢ are the coordinates of f₀ relative to an orthonormal basis of R₁ or a non-orthonormal basis of R₁.

A system of m linear equations in m unknowns can have a unique solution only if its determinant is different from zero. It follows that the determinant of the system (11)

(e₁, e₁)  (e₂, e₁)  ⋯  (eₘ, e₁)
(e₁, e₂)  (e₂, e₂)  ⋯  (eₘ, e₂)
⋯⋯⋯⋯⋯
(e₁, eₘ)  (e₂, eₘ)  ⋯  (eₘ, eₘ)

must be different from zero. This determinant is known as the Gramm determinant of the vectors e₁, e₂, ⋯, eₘ.
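For numerical vectors the projection can be computed directly from the system (11). The Python sketch below (with an arbitrary two-dimensional subspace of four-dimensional space) builds the matrix of inner products (eᵢ, eₖ), solves for the cᵢ, and checks that h = f − f₀ is orthogonal to the subspace.

    import numpy as np

    e = [np.array([1.0, 0.0, 1.0, 0.0]),      # basis of the subspace R_1
         np.array([0.0, 1.0, 0.0, 1.0])]
    f = np.array([1.0, 2.0, 3.0, 4.0])

    G = np.array([[np.dot(a, b) for b in e] for a in e])   # matrix (e_i, e_k)
    rhs = np.array([np.dot(f, a) for a in e])              # right sides (f, e_k)
    c = np.linalg.solve(G, rhs)                            # system (11)
    f0 = sum(ci * ei for ci, ei in zip(c, e))              # orthogonal projection

    print(c, f0)
    print([round(np.dot(f - f0, a), 10) for a in e])       # both inner products are 0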

EXAMPLES. 1. The method of least squares. Let y be a linear function of x₁, x₂, ⋯, xₘ; i.e., let

y = c₁x₁ + c₂x₂ + ⋯ + cₘxₘ,


where the cᵢ are fixed unknown coefficients. Frequently the cᵢ are determined experimentally. To this end one carries out a number of measurements of x₁, x₂, ⋯, xₘ and y. Let x₁ₖ, x₂ₖ, ⋯, xₘₖ, yₖ denote the results of the kth measurement. One could try to determine the coefficients c₁, c₂, ⋯, cₘ from the system of equations

(13)
x₁₁c₁ + x₂₁c₂ + ⋯ + xₘ₁cₘ = y₁,
x₁₂c₁ + x₂₂c₂ + ⋯ + xₘ₂cₘ = y₂,
⋯⋯⋯⋯⋯
x₁ₙc₁ + x₂ₙc₂ + ⋯ + xₘₙcₘ = yₙ.

However usually the number n of measurements exceeds the number m of unknowns and the results of the measurements are never free from error. Thus, the system (13) is usually incompatible and can be solved only approximately. There arises the problem of determining c₁, c₂, ⋯, cₘ so that the left sides of the equations in (13) are as "close" as possible to the corresponding right sides.

As a measure of "closeness" we take the so-called mean deviation of the left sides of the equations from the corresponding free terms, i.e., the quantity

(14) Σₖ₌₁ⁿ (x₁ₖc₁ + x₂ₖc₂ + ⋯ + xₘₖcₘ − yₖ)².

The problem of minimizing the mean deviation can be solved directly. However, its solution can be immediately obtained from the results just presented.

Indeed, let us consider the n-dimensional Euclidean space of n-tuples and the following vectors: e₁ = (x₁₁, x₁₂, ⋯, x₁ₙ), e₂ = (x₂₁, x₂₂, ⋯, x₂ₙ), ⋯, eₘ = (xₘ₁, xₘ₂, ⋯, xₘₙ), and f = (y₁, y₂, ⋯, yₙ) in that space. The right sides of (13) are the components of the vector f and the left sides, of the vector

c₁e₁ + c₂e₂ + ⋯ + cₘeₘ.

Consequently, (14) represents the square of the distance from f to c₁e₁ + c₂e₂ + ⋯ + cₘeₘ and the problem of minimizing the mean deviation is equivalent to the problem of choosing m numbers c₁, c₂, ⋯, cₘ so as to minimize the distance from f to f₀ = c₁e₁ + c₂e₂ + ⋯ + cₘeₘ. If R₁ is the subspace spanned by


the vectors e₁, e₂, ⋯, eₘ (supposed linearly independent), then our problem is the problem of finding the projection of f on R₁. As we have seen (cf. formula (11)), the numbers c₁, c₂, ⋯, cₘ which solve this problem are found from the system of equations

(15)
(e₁, e₁)c₁ + (e₂, e₁)c₂ + ⋯ + (eₘ, e₁)cₘ = (f, e₁),
(e₁, e₂)c₁ + (e₂, e₂)c₂ + ⋯ + (eₘ, e₂)cₘ = (f, e₂),
⋯⋯⋯⋯⋯
(e₁, eₘ)c₁ + (e₂, eₘ)c₂ + ⋯ + (eₘ, eₘ)cₘ = (f, eₘ),

where

(f, eₖ) = Σᵢ₌₁ⁿ xₖᵢyᵢ,   (eₗ, eₖ) = Σᵢ₌₁ⁿ xₗᵢxₖᵢ.

The system of equations (15) is referred to as the system of normal equations. The method of approximate solution of the system (13) which we have just described is known as the method of least squares.

EXERCISE. Use the method of least squares to solve the system of equations

2c = 3,
3c = 4,
4c = 5.

Solution: e₁ = (2, 3, 4), f = (3, 4, 5). In this case the normal system consists of the single equation

(e₁, e₁)c = (e₁, f),

i.e., 29c = 38; c = 38/29.

When the system (13) consists of n equations in one unknown,

(13′)
x₁c = y₁,
x₂c = y₂,
⋯⋯
xₙc = yₙ,

the (least squares) solution is

c = (x, y)/(x, x) = Σₖ₌₁ⁿ xₖyₖ / Σₖ₌₁ⁿ xₖ².

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 39: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 31

In this case the geometric significance of c is that of the slope of aline through the origin which is "as close as possible" to thepoints (x1, y1), (x2, y2), , (x, y).

2. Approximation of functions by means of trigonometric polynomials. Let(t) be a continuous function on the interval [0, 2n]. It is frequently

necessary to find a trigonometric polynomial P(t) of given degree v,Michdiffers from f(t) by as little as possible. We shall measure the proximity of1(t) and P (t) by means of the integral

u(t) - 13(1)i2 dl.

Thus, we are to find among all trigonometric polynom als of degree n,

P(t) = (a012) + a, cos t b, sin t + + an cos nt b sin nt,

that polynomial for which the mean deviation from f (t) is a minimum.Let us consider the space R of continuous functions on the interval

10, 2:r] in which the inner product is defined, as usual, by means of theintegral

(I, g) = 21,f(t)g(t) dt.

Then the length of a vector f(t) in R is given by

= 6atr EN)? dt.

Consequently, the mean deviation (16) is simply the square of the distancefrom j(t) to P(t). The trigonometric polynomials (17) form a subspace R,of R of dimension 2n + 1. Our problem is to find that vector of R, which isclosest to fit), and this problem is solved by dropping a perpendicular fromf(t) to R1.

Since the functions

1 cos t sin teo e, e, ,

V2,7 ' Nhz 1 Orcos nt sin nte,_, ; e,

Ahr

form an orthonormal basis in R, (cf. para. 1, Example 2), the requiredelement P(t) of R, is

2nP(t) = E ce1,

k=0

where

ck = (t, el),or

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 40: Gelfand - Lectures on Linear Algebra

32 n-DIMENSIONAL SPACES

I 27 1 J.2"e,,....1c,= -- f(t)dt; tik., f(t) cos kt dt;

Vart o Vn o1

27

C2k - , 1(t) sin kt dt.' 7( /0

ThuS, for the mean deviation of the trigonometric polynomial

a, nP(t) := * Ea,. cos kt + b sin kt

2 k=1

from f(t) to be a minimum the coefficients a and bk must have the values

127 1 2=7a, 5 fit) dt; a =- 5 1(1) cos kt dt;X o x o

127b = 5 f(t) sin kt dt."Jo

The numbers a,. and bk defined above are called the Fourier coefficients ofthe function fit).

3. Isomorphism of Euclidean spaces. We have investigated anumber of examples of n-dimensional Euclidean spaces. In each ofthem the word "vector" had a different meaning. Thus in § 2,

Example 2, "vector" stood for an n-tuple of real numbers, in § 2,Example 5, it stood for a polynomial, etc.

The question arises which of these spaces are fundamentallydifferent and which of them differ only in externals. To be morespecific:

DEFINITION 2. Two Euclidean spaces R and R', are said to beisomorphic if it is possible to establish a one-to-one correspondencex 4> x' (x e R, x' e R') such that

I. If X 4> X' and y 4> y', then x + y X' + y', i.e., if OWcorrespondence associates with X E R the vector X' E R' and withy e R the vector y' e R', then it associates with the sum x + y thesum x' y'.

If x x', then Axe> Ax'.If x 4> x' and y 4--> y', then (x, y) = (x', y'); i.e., the inner

products of corresponding pairs of vectors are to have the same value.

We observe that if in some n-dimensional Euclidean space R atheorem stated in terms of addition, scalar multiplication andinner multiplication of vectors has been proved, then the same

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 41: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 33

theorem is valid in every Euclidean space R', isomorphic to thespace R. Indeed, if we replaced vectors from R appearing in thestatement and in the proof of the theorem by corresponding vec-tors from R', then, in view of the properties 1, 2, 3 of the definitionof isomorphism, all arguments would remain unaffected.

The following theorem settles the problem of isomorphism ofdifferent Euclidean vector spaces.

THEOREM 2. All Euclidean spaces of dimension n are isomorphic.We shall show that all n-dimensional Euclidean spaces are

isomorphic to a selected "standard" Euclidean space of dimensionn. This will prove our theorem.

As our standard n-dimensional space R' we shall take the spaceof Example 2, § 2, in which a vector is an n-tuple of real numbersand in which the inner product of two vectors x' = (E1, ¿2, , e)and y' = (7? na , nn) is defined to be

(x', = Eft?' + $2n2 + +Now let R be any n-dimensional Euclidean space. Let el,

e2, , en be an orthonormal basis in R (we showed earlier thatevery Euclidean space contains such a basis). We associate withthe vector

x = e2e2 + + enein R the vector

= (81, 82, , 8n)in R'.

We now show that this correspondence is an isomorphism.The one-to-one nature of this correspondence is obvious.

Conditions 1 and 2 are also immediately seen to hold. It remainsto prove that our correspondence satisfies condition 3 of the defini-tion of isomorphism, i.e., that the inner products of correspondingpairs of vectors have the same value. Clearly,

(x, Y) = $1ni + $2n2 + + Ennn,

because of the assumed orthonormality of the e,. On the otherhand, the definition of inner multiplication in R' states that

(x', y') = El% + $2n2 + + $nn.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 42: Gelfand - Lectures on Linear Algebra

34 LECTURES ON LINEAR ALGEBRA

Thus

(x', y') = (x, Y):

i.e., the inner products of corresponding pairs of vectors haveindeed the same value.

This completes the proof of our theorem.

EXERCISE. Prove this theorem by a method analogous to that used inpara. 4, § 1.

The following is an interesting consequence of the isomorphismtheorem. Any "geometric" assertion (i.e., an assertion stated interms of addition, inner multiplication and multiplication ofvectors by scalars) pertaining to two or three vectors is true if it istrue in elementary geometry of three space. Indeed, the vectors inquestion span a subspace of dimension at most three. This sub-space is isomorphic to ordinary three space (or a subspace of it),and it therefore suffices to verify the assertion in the latter space.In particular the Schwarz inequality a geometric theoremabout a pair of vectors is true in any vector space because it istrue in elementary geometry. We thus have a new proof of theSchwarz inequality. Again, inequality (7) of § 2

Ix + yl ixl

is stated and proved in every textbook of elementary geometry asthe proposition that the length of the diagonal of a parallelogramdoes not exceed the sum of the lengths of its two non-parallel sides,and is therefore valid in every Euclidean space. To illustrate, theinequality,

(f(t) g(t))2 dtbb

VSa [f (t)]2 dt VSa [g(t)]2 dt,

which expresses inequality (7), § 2, in the space of continuous func-tions on [a, b], is a direct consequence, via the isomorphism theo-rem, of the proposition of elementary geometry just mentioned.

§ 4. Bilinear and quadratic forms

In this section we shall investigate the simplest real valuedfunctions defined on vector spaces.

vr

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 43: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 35

1. Linear functions. Linear functions are the simplest functionsdefined on vector spaces.

DEFINITION I. A linear function (linear form) f is said to bedefined on a vector space if with every vector x there is associated anumber f(x) so that the following conditions hold:

_fix Y) f(x) +AY),!(Aa) = 1f (x).

Let et, e2, , en be a basis in an n-dimensional vector space.Since every vector x can be represented in the form

x = enen,

the properties of a linear function imply that

f(x) = f&ie, + E2e2 + + ene) eifie,) ¿j(e2) ++ enf(e.)-

Thus, if et, e2, , en is a basis of an n-dimensional vector spaceR, x a vector whose coordinates in the given basis are E1, $2, , E,and f a linear function defined on R, then

(1) f(x) = aiel a252+ -in amen,

where f(e) = = 1, 2, , n).The definition of a linear function given above coincides with

the definition of a linear function familiar from algebra. Whatmust be remembered, however, is the dependence of the a, on thechoice of a basis. The exact nature of this dependence is easilyexplained.

Thus let et, e2, , e and e'1, e'2, , e'n be two bases in R.Let the e', be expressed in terms of the basis vectors et, e2, , eby means of the equations

e', = acne, + ac21e2 + + ocnIenre'2 acne, + 122e2 + + (Xn2en,

e' 2,2,e2 + + 2,e.Further, let

f(X) = a252 + anen

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 44: Gelfand - Lectures on Linear Algebra

36 LECTURES ON LINEAR ALGEBRA

relative to the basis el, e2, , en, and

f(x) = a'le', + a'2E12 + + a'e'relative to the basis e'1, e'2, e'.

Since a, = f(e,) and a' k = f(e'), it follows that

ai = f(xikei cc2ke2 + + C(nk en) = Xlki(ei) ac22f(e2)

+ + akf(e.) = ctik + c(2k az + + anka.This shows that the coefficients of a linear form transform under achange of basis like the basis vectors (or, as it is sometimes said,cogrediently).

2. Bilinear forms. In what follows an important role is played bybilinear and quadratic forms (functions).

DEFINITION 2. A (x; y) is said to be a bilinear function (bilinearform) of the vectors x and y if

for any fixed y, A (x; y) is a linear function of x,for any fixed x, A (x; y) is a linear function of y.

In other words, noting the definition of a linear function,conditions 1 and 2 above state that

A (xi + x2; y) = A (x,; y) + A (x2; y), A (Ax; y) = (x; y),A (x; + y2) = A (x; yi) + A (x; yz), A (x; pty) = yA(x; y).

EXAMPLES. 1. Consider the n-dimensional space of n-tuples ofreal numbers. Let x = ($1, bt2, , E), y = n2, , nk), anddefine

A (x; y) = a12e072 ' ' +(2) + an 27/1 anE27/2 ' ' a2 nne2n

anlennl an2enn2 + + annennn

A (x; y) is a bilinear function. Indeed, if we keep y fixed, i.e., if

we regard ni , n2, n as constants, ae ink depends linearlyi, k1

on the $,; A (x; y) is a linear function of x =- (E1, ey, en).

Again, if $1, $2, E are kept constant, A (x; y) is a linearfunction of y.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 45: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 37

2. Let K(s, t) be a (fixed) continuous function of two variabless, t. Let R be the space of continuous functions f(t). If we put

b b

A (f; g) = Is

K(s, t)f(s)g(t) ds dt,

then A (f; g) is a bilinear function of the vectors f and g. Indeed,the first part of condition 1 of the definition of a bilinear formmeans, in this case, that the integral of a sum is the sum of theintegrals and the second part of condition 1 that the constant A.may be removed from under the integral sign. Conditions 2have analogous meaning.

If K(s, t) 1, thenb

A (f; g) =jib

f(s)g(t) ds dt = f(s) ds g(t) dt,

i.e., A( f; g) is the product of the linear functions!: f(s) ds and

bJag(t) dt.

EXERCISE. Show that if 1(x) and g(y) are linear functions, then theirproduct 1(x) g(y) is a bilinear function.

DEFINITION 3. A bilinear function (bill ear form) is calledsymmetric 21

A (x; y) = A (y x)

for arbitrary vectors x and y.

In Example / above the bilinear form A (x; y) defined by (2) issymmetric if and only if aik= aid for all i and k.

The inner product (x, y) in a Euclidean space is an example of asymmetric bilinear form.

Indeed, Axioms I, 2, 3 in the definition of an inner product(§ 2) say that the inner product is a symmetric, bilinear form.

3. The matrix of a bilinear form. We defined a bilinear formaxiomatically. Now let el, e2, en be a basis in n-dimensionalspace. We shall express the bilinear form A (x; y) using thecoordinates ei, e2, e of x and the coordinates ni, n,, , ri ofy relative to the basis e1, e2, e. Thus,A (x; y) = A (ei ei $2e2 + + e en; /he, )72e2 + + 71e).In view of the properties 1 and 2 of bilinear forms

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 46: Gelfand - Lectures on Linear Algebra

38 LECTURES ON LINEAR ALGEBRA

A (x; y) = I A (ei;k=1

or, if we denote the constants A (ei; e1) by a,,

A (x; y) = ai,e

To sum up: Every bilinear form in n-dimensional space can bewritten as

A (x: Y) = azkink,i, 1=1

where X -r- $1e1 + 4- e, y = ?he,. + + tine, anda, = A (ei; ek).

The matrix a/ = is called the matrix of the bilinear formA (x; y) relative to the basis el, e2, , en.

Thus given a basis e1, e2, , e the form A (x; y) is determinedby its matrix at= Ha11.

EXAMPLE. Let R be the three-dimensional vector space of triples(El. EP 59) of real numbers. We define a bilinear form in R by means of theequation

A (x; y) = El% + 2eon + 3eana.Let us choose as a basis of R the vectors

e, = (1, 1, 1); e, = (1, 1, 1); e, = (1, 1, 1),

6 0 4= 0 6 d.4 2 6

It follows that if the coordinates of x and y relative to the basis e,, e2, e,are denoted by 5',, 5',, and n'2, TA, respectively, thenA (x; y) = 65'oy, 4E', n'a 6E', + + 2 3.17' +

and compute the matrix .91 of the bilinear form A (x; y). Nlaking use of (4)we find that:

an 1 1 + 2 1 1 + 3 1 1 = 6,a = an 1 1 + 2 1 1 + 3 1 (-1) = 0,a = 11 + 2 1 1 + 3 (-1) (-1) = 6,a= a 1 1 + 2 U (-1) + 3 1 (-1) = 4,a = a= 1 1 1- 2 1 (-1) + 3 (-1)(-1) = 2,a 1 - 1 + 2 (-1) (-1) + 3 (-1) (I) = 6,

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 47: Gelfand - Lectures on Linear Algebra

(5)

n-DIMENSIONAL SPACES 39

4. Transformation of the matrix of a bilinear form under a changeof basis. Let el, e2, e and f, f2, f be two bases of ann-dimensional vector space. Let the connection between thesebases be described by the relations

= cue' c21e2 d- + ce,c22e2 + +f2 = cue].

In = crne + c2e2 + +which state that the coordinates of the vector fn relat ve to thebasis el., e2, , e are cm, c22, - , c. The matrix

1[clici2w C21C22 . C2n

cn2c2 cn

is referred to as the matrix of transition from the basis e,, e2,to the basis f1, f2, , f.

Let si = I laid I be the matrix of a bilinear form A (x; y) relativeto the basis e1, e2, , e and .4 = I ibikl 1, the matrix of that formrelative to the basis f1, f2, L. Our problem consists in findingthe matrix I IbI I given the matrix I kJ.

By definition [eq. (4)] b = A (f,,, fe), i.e., bp, is the value of ourbilinear form for x f, y = fe. To find this value we make useof (3) where in place of the ei and ni we put the coordinates of fp

and fe relative to the basis e1, e2, , e, i.e., the numbersc, ca, , en and c, c, , c,. It follows that

(6) by, = A (fp; fe) = acic..k=-1

We shall now express our result in matrix form. To this endwe put e1,, = c' The e',,1 are, of course, the elements of thetranspose W' of W. Now b becomes 4

4 As is well known, the element c of a matrix 55' which is the product oftwo matrices at = 11a0,11

cik E ai,zbk.c/-1

Using this definition twice one can show that i = d.Vt, then

= E abafic Aft.a.ß=1

and a = is defined as

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 48: Gelfand - Lectures on Linear Algebra

40

(7*)

Using

(7)

LECTURES ON LINEAR ALGEBRA

Icriaikc,.t, k=1

matrix notation we can state that

=wi sr.Thus, if s is the matrix of a bilinear form A (x; y) relative to the

basis el, e2, , e and [,.:W its matrix relative to the basis f1, f2, , f,then PI = W' dW, where W is the matrix of transition from e1,e2, , e to f1, f2, , f and W' is the transpose of W.

5. Quadratic forms

DEFINITION 4. Let A (x; y) be a symmetric bilinear form. Thefunction A (x; x) obtained from A (x; y) by putting y = x is calleda quadratic form.

A (x; y) is referred to as the bilinear form polar to the quadraticform A (x; x).

The requirement of Definition 4 that A (x; y) be a symmetricform is justified by the following result which would be invalid ifthis requirement were dropped.

THEOREM 1. The polar form A (x; y) is u iquely determined by i squadratic form.

Proof: The definition of a bilinear form implies thatA (x 4-- y; x + y) = A (x; x) + A (x; y) + A (y; x) + A (y; y).Hence in view of the symmetry of A (x; y) (i.e., in view of theequality A (x; y) = A (y; x)),

A (x; y) -- (x + y; x + y) A (x; x) A (y; y)].

Since the right side of the above equation involves only values ofthe quadratic form A (x; x), it follows that A (x; y) is indeeduniquely determined by A (x; x).

To show the essential nature of the symmetry requirement inthe above result we need only observe that if A (x; y) is any (notnecessarily symmetric) bilinear form, then A (x; y) as well as thesymmetric bilinear form

A ; (x; A(y; x)i

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 49: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 41

give rise to the same quadratic form A (x; x).We have already shown that every symmetric bilinear form

A (x; y) can be expressed in terms of the coordinates E, of x andnk of y as follows:

A (X; 3) I atkeznk,

where a, = a,. It follows that relative to a given basis everyquadratic form A (x; x) can be expressed as follows:

A (x; x) = E aikeik, au, =k=1

We introduce another importantDEFINITION 5. A quadratic form A (x; x) is called positive definite

if for every vector x

A (x; x) > O.

EXAMPLE. It is clear that A (x; x) = + $22 + -in $2 is apositive definite quadratic form.

Let A (x; x) be a positive definite quadratic form and A (x; y)its polar form. The definitions formulated above imply that

A (x; y) = A (y; x).A (x, + x2; y) = A (xl; y) + A (x2; y).A (ibc; y) = (x; y).A (x; x) 0 and A (x; x) > 0 for x O.

These conditions are seen to coincide with the axioms for aninner product stated in § 2. Hence,

an inner product is a bilinear form corresponding to a positivedefinite quadratic form. Conversely, such a bilinear form alwaysdefines an inner product.

This enables us to give the following alternate definition ofEuclidean space:

A vector space is called Euclidean if there is defined in it a positivedefinite quadratic form A (x; x). In such a space the value of theinner product (x, y) of two vectors is taken as the value A (x; y) of the(uniquely determined) bilinear form A (x; y) associated with A (x; x).

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 50: Gelfand - Lectures on Linear Algebra

42 LECTURES ON LINEAR ALGEBRA

+ + 2ainnin,

(k = 3, ,n)

§ S. Reduction of a quadratic form to a sum of squaresVVre know by now that the expression for a quadratic form

A (x; x) in terms of the coordinates of the vector x depends on thechoice of basis. We now show how to select a basis (coordinatesystem) in which the quadratic form is represented as a sum ofsquares, i.e.,

A (x; x) = Al$12 + 12E22 . . , $n2.

Thus let f1, f3, ,f ,,, be a basis of our space and let

A (x; X) = a zo in

where ni, )72, nn are the coordinates of the vector x relative tothis basis. We shall now carry out a succession of basis transfor-mations aimed at eliminating the terms in (2) containing productsof coordinates with different indices. In view of the one-to-onecorrespondence between coordinate transformations and basistransformations (cf. para. 6, § 1) we may write the formulas forcoordinate transformations in place of formulas for basis trans-formations.

To reduce the quadratic form A (x; x) to a sum of squares it isnecessary to begin with an expression (2) for A (x; x) in which atleast one of the a (a is the coefficient of )7,2) is not zero. If theform A (x; x) (supposed not identically zero) does not containany square of the variables n2, , nn , it contains one productsay, 2a1277072. Consider the coordinate transformation defined by

= nil + 7/'2n2 = 7711 niank = n'k

Under this transformation 2a12771)72 goes over into 2a12(n1 771).Since an = an = 0, the coefficient of )7'2, stays different from zero.

We shall assume slightly more than we may on the basis of theabove, namely, that in (2) a O. If this is not the case it can bebrought about by a change of basis consisting in a suitable changeof the numbering of the basis elements. We now single out allthose terms of the form which contain n2

ann12 + 2annIn2

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 51: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 43

and "complete the square,i.e., write

an/h2 24112971712 ' ' 2a/nn1 (ann, + + a1)2 B.

It is clear that B contains only squares and products of the terms

al2172, " (Jinn. so that upon substitution of the right side of (3)

in (2) the quadratic form under consideration becomes

1(x; x) = (a11711 + ' + a1)2 +

a

where the dots stand for a sum of terms in the variables t)2' 4.If we put

711* = aniD a12n2 "1)2* ,

(3)

11

71: =then our quadratic form goes over into

n *, *, *si ik 'A (x; x) *2 +all

n ,

The expression a ik* n i* n k* is entirely analogous to thei, k=2

right side of (2) except for the fact that it does not contain thefirst coordinate. If we assume that a22* 0 0 (which can be achiev-

ed, if necessary, by auxiliary transformations discussed above)and carry out another change of coordinates defined by

** ni*ni ,

272** = (122* n2* + a23* n3* + +n3** = n3*,

nn*,nn**

our form becomesfi

A (x; x) th**2 **2 + a ** ** **n2ik,,hThc.au a22* t,1=3

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 52: Gelfand - Lectures on Linear Algebra

44 LECTURES ON LINEAR ALGEBRA

After a finite number of steps of the type just described our ex-pression will finally take the form

A (x; x) , 21E18 ¿2$22 27n em2,

where m n.We leave it as an exercise for the reader to write out the basis

transformation corresponding to each of the coordinate transfor-mations utilized in the process of reduction of A (x; x) (cf. para. 6,§ 1) and to see that each change leads from basis to basis, i.e., ton linearly independent vectors.

If m < n, we put 4+1= = An = O. We may now sum upour conclusions as follows:

THEOREM 1. Let A (x; x) be a quadratic form in an n-dimensionalspace R. Then there exists a basis el, e2, , en of R relative towhich A (x; x) has the form

A (x; x) = 21E12 + 22E22 An en2,

where E1, E2, , E are the coordinates of x relative to e1, e2, , e.We shall now give an example illustrating the above method of reducing

a quadratic form to a sum of squares. Thus let A (x; x) be a quadratic formin three-dimensional space which is defined, relative to some basis f1,f2,f3,by the equation

SI = rits,es = 712. + 27 s*e3 = '73'

If

then

A (x; x) 271,7/2 4flo, 7122 8%2.

Vi =

77. =

A (x; x) = 71' ,8 + 27j + 41)/ 2?y, 8712.Again, if

171* +=

thenn.s =

A (x; x) =_ n1.2 +722.2 + 4,j* ,j*Finally, if

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 53: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 45

then A (2c; x) assumes the canonical form

A (x; x) = _e,2 e22 12e22

If we have the expressions for ni*, 712*, , n* in terms ofni, 712, n, for nj**, 112", , n** in terms of ni*, 712*, retc., we can express el, e2, , en in terms of ni, ni, ,ìj in theform

= ciini + Cl2712 Clnnn

E2 = C21711 C22n2 C2nnn

$n = enini + cn2n2 + + Cnnn.

Thus in the example just given

1/1 7/2,

e2 2n3,

¿3= )73.

In view of the fact that the matrix of a coordinate transforma-tion is the inverse of the transpose of the matrix of the corre-sponding basis transformation (cf. para. 6, § 1) we can express thenew basis vectors ei, e2, , e, in terms of the old basis vectors

f2, ' f.e, = 6/12f2 + +e2 = d21f1 d2212 + + d2nf,

= dnif d2f2 + +If the form A (x; x) is such that at no stage of the reduction

process is there need to "create squares" or to change the number-ing of the basis elements (cf. the beginning of the description ofthe reduction process in this section), then the expressions for

El, E2, , e in terms of711, 712, , n take the form

Ej = cu.% c12n2 + "e2 C22n2 " + C2nnn

i.e. the matrix of the coordinate transformation is a so calledtriangular matrix. It is easy to check that in this case the matrix

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 54: Gelfand - Lectures on Linear Algebra

46 LECTURES ON LINEAR ALGEBRA

of the corresponding basis transformation is also a triangularmatrix:

=e2 dfif1 d22f2,

er, = difi c/n2f2 + +

§ 6. Reduction of a quadratic form by means of atriangular transformation

1 In this section we shall describe another method of construct-ing a basis in which the quadratic form becomes a sum of squares.In contradistinction to the preceding section we shall express thevectors of the desired basis directly in terms of the vectors of theinitial basis. However, this time we shall find it necessary toimpose certain restrictions on the form A (x; y) and the initialbasis f1, f2, , fn. Thus let a II be the matrix of the bilinearform A (x; y) relative to the basis f1, f2, , fn. We assume thatthe following determinants are different from zero:

(1)

ali 0; LI 2=at2

a21 a22

a11 412 mm. a1n

di, = a21 a22 2n 0 O.

0 0; ;

an2 am,

(It is worth noting that this requirement is equivalent to therequirement that in the method of reducing a quadratic form to asum of squares described in § 5 the coefficients an , a22*, etc., bedifferent from zero.

Now let the quadratic form A (x; x) be defined relative to thebasis f1, f2, f by the equation

A (x; x) = I a1k,4, where ai, = A (fi; fk).i, 2-1

It is our aim to define vectors el, e2, , en so that(2) A (ei; ek) 0 for i k (i , k --= 1, 2, , n).

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 55: Gelfand - Lectures on Linear Algebra

e = c22f2 + +We could now determine the coefficients ;k from the conditions(2) by substituting for each vector in (2) the expression for thatvector in (3). However, this scheme leads to equations of degreetwo in the cc°, and to obviate the computational difficulties involvedwe adopt a different approach.

We observe that if

A (ek; fi) = 0 for i = 1, 2, , k I,

then

(ek; ei) = O for

Indeed, if we replace e, by

ai1f1 oci2f2 '

then

A (ek; e1) = A (ek; + ;212 + + 2(f1)= ocii A (ek; fi) oci2A (ek; f2) + + aA(ek; fi).

Thus if A (ek; f1) = 0 for every k and for all i < k, thenA (e,; = 0 for i < k and therefore, in view of the symmetry ofthe bilinear form, also for i > k, i.e., e1, e2, , en is the requiredbasis. Our problem then is to find coefficients ,x, OCk2, Ctkk

such that the vector

ek = atk2f2 + + Mkkfk

satisfies the relations

A (ek; = 0, (i = 1, 2, , k I).

We assert that conditions (4) determine the vector ek to withina constant multiplier. To fix this multiplier we add the condition

A (ek; f2) = 1.

We claim that conditions (4) and (5) determine the vector ek

n-DIMENSIONAL SPACES 47

We shall seek these vectors in the form

=

(3)e2 c(21f1 22f2,

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 56: Gelfand - Lectures on Linear Algebra

48 LECTURES ON LINEAR ALGEBRA

uniquely. The proof is immediate. Substituting in (4) and (5)the expression for e, we are led to the following linear system forthe kg.

chkiA (f1; f1) 12A (f1; f2) ' + kkA (f1; f = °,111A (f2' f1) ac12A (f2; f2) + + akkA (f2; fk) = °,

(6)

acnA (f1-1; fl) 1.12A (f1-1; f2) + + akkA (LI.; f 0,

(flt; fl) oc12A (fk; f2) + + lickA (fk; fk) 14

The determinant of this system is equal to

A (fk; f1) A (fi, f2) A (f2; fk)A (f2; f1) A (f 2, f2 A (f2; f

A (fk; fk) A (fk, f2) A (fk; fk) I

and is by assumption (1) different from zero so that the system (6)has a unique solution. Thus conditions (4) and (5) determine ekuniquely, as asserted.

It remains to find the coefficients bt of the quadratic formA (x; x) relative to the basis e1, e2, en just constructed. Aswe already know

b11 A (ei; ek).

The basis of the ei is characterized by the fact that A (e1; ek) =for i k, i.e., bin = 0 for i k. It therefore remains to computeb11 = A (ek; ek). Now

A (ek; ek) = A (ek; oc11f1 12f2 + + °C1741)

= C(11A (e1; fl) C(12A (elc; f2) + + 11114 fek; r

which in view of (4) and (5) is the same as

A (ek; ek) = ockk'

The number x11 can be found from the system (6) Namely, byCramer's rule,

A 1,-1QC/0c =

where A1_, is a determinant of order k I analogous to (7) andAo = 1.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 57: Gelfand - Lectures on Linear Algebra

Thus

Further, let the determinants

,11=a11,42

n-DIMENSIONAL SPACES 49

A k-1blek = A ,e; e) =

A k

To sum up:THEOREM 1. Let A (x; x) be a quadratic form defined relative to

some basis f1, f2, , fr, by the equation

A (x; x) = aiknink a ik = A (fi; fk).l<=1

an2 (inn

be all different from zero Then there exists a basis el, e2, , erelative to which A (x; x) is expressed as a sum of squares,

A (x; x) = 40A+ I._AI 22 2.

AI A2 A

Here ,4Ç, are the coordinates of x in the basis el, e2, , en.

This method of reducing a quadratic form to a sum of squares is

known as the method of Jacobi.REMARK: The fact that in the proof of the above theorem we

were led to a definite basis el, e2, , en in which the quadraticform is expressed as a sum of squares does not mean that this basisis unique. In fact, if one were to start out with another basis

f2, , f (or if one were simply to permute the vectors f1,

f2, , fn) one would be led to another basis el, e2, , en.Also, it should be pointed out that the vectors el, e2, , en need

not have the form (3).EXAMPLE. Consider the quadratic form

2E1' + 3E1E2 + 4E1E3 + E22 +

in three-dimensional space with basis

f= (1, 0, 0), f, =-- (0, 1, 0), f, = (0, 0, 1).

an a12 ' aln

An = an a22 a2n

all a12

an an

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 58: Gelfand - Lectures on Linear Algebra

Or

50 LECTURES ON LINEAR ALGEBRA

The corresponding bilinear form is

A (x; y) =- 2e1n1 pi. + teh, + $012 + 2e3m1 + e3 7./3

The determinants A,, 43, 43 are 2, 1, --1-,&7, i.e., none of them vanishes.Thus our theorem may be applied to the quadratic form at hand. Let

el = ce,f, = (i, 0, 0),e, = 82313 a22f2 (823, 839, 0),e, = 83113 + 822f3 + ma, = (cc, 232, 833).

The coefficient cc, is found from the condition

A (e1; f2) 1,

i.e., 2a = I, or o( i ande, = if, =(j 0, 0).

Next a and 822 are determined from the equations

A (e2;f,) = 0 and A (e2, f2) = 1,o r,

whence

and

2M21 = 0;

121 = 6,

e, = 6f1 8f2 = (6, 8, 0).Finally, ct, 822, an are determined from the equations

A (ea; fi) = 0, A (es; 12) = 0, A (e3; fa) = 1

21ai 1832 + 2833 =jln + 832 = 0,28. 833 = 1,

whence

8318 12-y,' -33 133

and

e3-1871 12ea + 117 fa _(S 127, 117),

Relative to the basis e,, e2, e, our quadratic form becomes

1 Ai 42A(x;x) = C12 + C13 C32 Cl2 8C22 11-7C32.AH 43

Here C. C C2 are the coordinates of the vector x in the basis e,, e e

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 59: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 51

2. In proving Theorem I above we not only constructed a basisin which the given quadratic form is expressed as a sum of squaresbut we also obtained expressions for the coefficients that go withthese squares. These coefficients are

I A, An_,

A, A2 " 4so that the quadratic form is

(8)1 e22

,

2

It is clear that if A1_1 and A, have the same sign then the coefficientof E12 is positive and that if and A, have opposite signs, thenthis coefficient is negative. Hence,

THEOREM 2. The number of negative coefficients which appear inthe canonical form (8) of a quadratic form is equal to the number ofchanges of sign in the sequence

1, A1, Z12, , A.Actually all we have shown is how to compute the number of

positive and negative squares for a particular mode of reducing aquadratic form to a sum of squares. In the next section we shallshow that the number of positive and negative squares is independ-ent of the method used in reducing the form to a sum of squares.

Assume that d, > 0, /12 > 0, , A > O. Then there exists abasis e1, e2, , e7, in which A (x; x) takes the form

A (x; x) 4E12 + 22E22 Anen2,

where all the A, are positive. Hence A (x; x) 0 for all x and

A (x; x) = I 21E121=1

is equivalent to

E1= E2 = = En = O.

In other words,If A1> 0, A2 > 0, A,, > 0, then the quadrat e form A (x; x)

is positive definite.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 60: Gelfand - Lectures on Linear Algebra

52 LECTURES ON LINEAR ALGEBRA

Conversely, let A (x; x) be a positive definite quadratic form.We shall show that then

4k> 0 (k 1, 2, , n).

We first disprove the possibility that

A (f,; fi) A (fi; f2) A (f f,)A (f2; f,) A(f2;f2) A (f fie)

A (f,; fi) A (f; f2) A (f,; f,)

If A, = 0, then one of the rows in the above determinant would bea linear combination of the remaining rows, i.e., it would be possi-ble to find numbers y1,142, , y, not all zero such that

yiA(fi; fi) ,a,A (f2; fz) + + yk.A (fk; f,) 0,

1, 2, , k. But thenA (pifi p2f2 -F + phf; fi) ---- 0 (i 1, 2, k),

so thatA (yifi -F p2f2 + ' kikfk; g212 + + p,f,) = O.In view of the fact that pif, p2f2 + -F p,f, 0, the latterequality is incompatible with the assumed positive definite natureof our form.

The fact that A, (k 1, , n) combined with Theorem 1permits us to conclude that it is possible to express A (x; x) in theform

A (x; x) = A2E12+ A2E22 A2En2, Ak _ A k-1

Since for a positive definite quadratic form all 4> 0, it followsthat all A, > 0 (we recall that /10 = 1).

We have thus provedTHEOREM 3. Let A (x; y) be a symmetric bilinear form andf2, , f, a basis of the n-dimensional space R. For the quadratic

form A (x; x) to be positive definite it is necessary and sufficient that

> 0, 42 > 0, tl > O.This theorem is known as the Sylvester criterion for a quadrat c

form to be positive definite.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 61: Gelfand - Lectures on Linear Algebra

ti-DIMENSIONAL SPACES 53

It is clear that we could use an arbitrary basis of R to express theconditions for the positive definiteness of the form A (X; X). In particularif we used as another basis the vectors f1, f2. f in changed order, thenthe new A1, .4,, , An would be different principal minors of the matrixHaik11. This implies the following interesting

COROLLARY. I/the principal minors z1,, A2, , A,,o/ a matrix ilaikllof aquadratic form A (x; x) relative to some basis are positive, then all principalminors ok that matrix are positive.

Indeed, if 41, A2, , A. are all positive, then A (x; x) is positive definite.Now let A be a principal minor of jja1111 and let p 1,2, - p, be the num-bers of the rows and columns of jaj in A. If we permute the original basisvectors so that the pith vector occupies the ith position (i 1, k) andexpress the conditions for positive definiteness of A (x; x) relative to the newbasis, we see that A > O.

3. The Gramm determinant. The results of this section are validfor quadratic forms A (x; x) derivable from inner products, i.e.,for quadratic forms A (x; x) such that

A (x; x) '(x, x).

If A (x; y) is a symmetric bilinear form on a vector space R andA (x; x) is positive definite, then A (x; y) can be taken as an innerproduct in R, i.e., we may put (x, y) A (x; y). Conversely, if(x, y) is an inner product on R, then A (x; y) (x, y) is a bilinearsymmetric form on R such that A (x; x) is positive definite. Thusevery positive definite quadratic form on R may be identified withan inner product on R considered for pairs of equal vectors only,A (x; x) (x, x). One consequence of this correspondence is thatevery theorem concerning positive definite quadratic forms is atthe same time a theorem about vectors in Euclidean space.

Let e1, e2, , e, be k vectors in some Euclidean space. Thedeterminant

is known as the Gramm determinant of these vectors

THEOREM 4. The Gramm determinant of a system of vectorse1, e2, e, is always >_ O. This determinant is zero if and only ifthe vectors el, e2, , ek are linearly dependent.

el) (el, e2) (el, ek)e1) (e2, e2) (e2, ek)

(ek, (ek, e2) (ek, ek)

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 62: Gelfand - Lectures on Linear Algebra

54 LECTURES ON LINEAR ALGEBRA

Proof; Assume that e1, e2, , e, are linearly independent.Consider the bilinear form A (x; y) (x, y), where (x, y) is theinner product of X and y. Then the Gramm determinant ofel, e2, , ek coincides with the determinant 4, discussed in thissection (cf. (7)). Since A (x; y) is a symmetric bilinear form suchthat A (x; x) is positive definite it follows from Theorem 3 that47c >0.

We shall show that the Gramm determinant of a system oflinearly dependent vectors e1, e2, , ek is zero. Indeed, in thatcase one of the vectors, say e,, is a linear combination of the others,

ek Ale]. + 22e2 ' +It follows that the last row in the Gramm determinant of thevectors e1, e2, e, is a linear combination of the others and thedeterminant must vanish. This completes the proof.

As an example consider the Gramm determinant of two vectorsx and y

= (x, x) (x, y)(Y, x) (Y, Y)

The assertion that > 0 is synonymous with the Schwarzinequality.

4. has the following geometric sense: d2 is the square of the area of theEXAMPLES. 1. In Euclidean three-space (or in the plane) the determinant

parallelogram with sides x and y. Indeed,

(x, y) = (y, x) = Ix1 lyr cos ry,

where y) is the angle between x and y. Therefore,

= ixF2 ly12 1x12 13712 cos' 9, = 1x18 13,12 (1 cos2 99) = 1x12 13712 sin' 99,i.e., J. has indeed the asserted geometric meaning.

2. In three-dimensional Euclidean space the volume of a parallelepipedon the vectors x, y, z is equal to the absolute value of the determinant

xi 23 23V Ya Ya Ya

2, 23 23

where 3/4, y 2', are the Cartesian coordinates of x, y, z. Now,3/42 + 3/42 + x32 x1 y1 1- z.Y. 3/4 Y3 x121 + x222 -; 3/423

= Y1 X1 + Y2 x2 T Y3 X3 Y12 + Ya' + 1I32 y,z, + yrz, + y,z,z,x, + z,x, + zax, 2,y, + z,y, + z,y, z,, z28 + za2

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 63: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 55

(x, y)(3', Y)

y)

Thus the Gramm determinant of three vectors x, y, z is the square of thevolume of the parallelepiped on these vectors.

Similarly, it is possible to show that the Gramm determinant of k vectorsx, y, , w in a k-dimenional space R is the square of the determinant

X1 22 " Xfr

(9)Y1 Y2 Yk

(x, z)(37, z)(z, z)

W1 W2 " Wk

where the xi are coordinates of x in some orthogonal basis, the yi are thecoordinates of y in that basis, etc.

(It is clear that the space R need not be k-dimensional. R may, indeed,be even infinite-dimensional since our considerations involve only thesubspace generated by the k vectors x, y, , w.)

By analogy with the three-dimensional case, the determinant (9) isreferred to as the volume of the k-dimensional parallelepiped determined bythe vectors x, y, w.

3. In the space of functions (Example 4, § 2) the Gramm determinanttakes the form

r b rb "bI 10 (t)de .1. /1(012(t)dt .1 11(1)1k(i)di

Pb12(t)f1(t)dt Pba 122(t)dt f 12(t)1(t)dt

a

r b r b Pbtic(1)11(1)dt 1k(t)12(t)dt " f (t)dt

a . a

and the theorem just proved implies that:The Gramm determinant of a system of functions is always 0. For a

system of functions to be linearly dependent it is necessary and sufficient thattheir Gramm determinant vanish.

§ 7. The law of inertia

1. T he law of inertia . There are different bases relative towhich a quadratic form A (x; x) is a sum of squares,

(1) A (x; x) = 2,e,21-1

By replacing those basis vectors (in such a basis) which corre-spond to the non-zero A, by vectors proportional to them we obtain a

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 64: Gelfand - Lectures on Linear Algebra

56 LECTURES ON LINEAR ALGEBRA

representation of A (x; x) by means of a sum of squares in whichthe A, are 0, 1, or 1. It is natural to ask whether the number ofcoefficients whose values are respectively 0, I, and lis dependenton the choice of basis or is solely dependent on the quadraticform A (x; x).

To illustrate the nature of the question consider a quadraticform A (x; x) which, relative to some basis el, e2, , en, isrepresented by the matrix

where a = A (ei; ek) and all the determinants

41 = an; ''2 =

z1n

all a12

a22

an anan an ... a2

a, an

are different from zero. Then, as was shown in para. 2, § 6, all ).;in formula (1) are different from zero and the number of positivecoefficients obtained after reduction of A (x; x) to a sum of squaresby the method described in that section is equal to the number ofchanges of sign in the sequence 1, z11, .(12, , A,,.

Now, suppose some other basis e'1, e'2, , e' were chosen.Then a certain matrix I ja'11 would take the place of I laikl andcertain determinants

would replace the determinants z11, ZI2, A,,. There arises thequestion of the connection (if any) between the number of changesof sign in the squences 1, , z1'2, , A',, and I, A1, , 2, , zl.

The following theorem, known as the law of inertia of quadratictorms, answers the question just raised.

THEOREM 1. If a quadratic form is reduced by two differentmethods (i.e., in two different bases) to a sum of squares, then thenumber of positive coefficients as well as the number of negativecoefficients is the same in both cases.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 65: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 57

Theorem 1 states that the number of positive A. in (1) and thenumber of negative A, in (1) are invariants of the quadratic form.Since the total number of the A, is n, it follows that the number ofcoefficients A, which vanish is also an invariant of the form.

We first prove the following lemma:

LEMMA. Let R' and R" be two subspaces of an n-dimensionalspace R of dimension k and 1, respectively, and let k 1 > n. Thenthere exists a vector x 0 contained in R' n R".

Proof: Let e, e2, , e, be a basis of R' and f,, f2, , fi,basis of R". The vectors el, e2, e,, f,, f2, , f, are linearlydependent (k 1 > n). This means that there exist numbersAl, A2, " Ak, kt2, pi not all zero such that

2.2e2 + + Akek p2f, + + pit = 0,i.e.,

Ale 2.2e2 + + Atek = !IA

Let us put

ei A2e2 + + Akek = Pelf' /42f2 ' !lift = x.It is clear that x is in R' n R". It remains to show that x O.

If x = 0, Al, A2, , 2, and pi., n2, p would all be zero, whichis impossible. Hence x O.

We can now prove Theorem 1.

Proof: Let e,, e2, , e be a basis in which the quadratic formA (x; x) becomes

A (x; x) 22 $2,2 $223+1 E2p+2 $2,±Q.

(Here E,, E2, , En are the coordinates of the vector x, i.e.,X 52e2 + + $e -F e, + + erk,e, +-Fene.) Let f,, f2, , f be another basis relative to which thequadratic form becomes

A (x; x) = ni2 n22 ... )72p, n2p, . . . _.

(Here )7,, n2 , , ti are the coordinates of x relative to the basisf, f2, , fm.) We must show that p = p' and q =-. q'. Assumethat this is false and that p > p', say.

Let R' be the subspace spanned by the vectors el, e2, ep.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 66: Gelfand - Lectures on Linear Algebra

58 LECTURES ON LINEAR ALGEBRA

R' has dimension p. The subspace R" spanned by the vectorsfil, , f has dimension n p'. Since n p > n

(we assumed 1) > p'), there exists a vector x 0 in R' n R"(cf. Lemma), i.e.,

x = E, e, + e eand

X = np fil + + nil-Fa' fil+qt + + nnfn

The coordinates of the vector x relative to the basis e, e2, , eare E1, E2, 0, 0 and its coordinates relative to the basis

f,, , f, are 0, 0, , Q nil+,, , n. Substituting thesecoordinates in (2) and (3) respectively we get, on the one hand,

A (x; x) = + $22 >

(since not all the E, vanish) and, on the other hand,

A (x; x) = - eil+1- n22,-+2 -c 0-

(Note that it is not possible to replace < in (5) with <, for, whilenot all the numbers il.±, , are zero, it is possible thatnil+, = n, +2 = = nil+0. = O.) The resulting contradictionshows that fi = p'. Similarly one can show that q = q'. Thiscompletes the proof of the law of inertia of quadratic forms.

2. Rank of a quadratic formDEFINITION 1. By the rank of a quadratic form we mean the

number of non-zero coefficients 2, in one of its canonical forms.

The reasonableness of the above definition follows from the lawof inertia just proved. We shall now investigate the problem ofactually finding the rank of a quadratic form. To this end weshall define the rank of a quadratic form without recourse to itscanonical form.

DEFINITION 2. By the null space of a given bilinear form A (x; y)we mean the set R, of all vectors y such that A (x; y) = 0 for everyx e R.

It is easy to see that R, is a subspace of R. Indeed, let y,y, e Et, , i.e., A (x; y,) = 0 and A (x; y2) = 0 for all x e R. ThenA (x; yi y,) = 0 and A (x; 0 for all x e R. But thismeans that y, y, e R, and 41 e R,.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 67: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 59

We shall now try to get a better insight into the space Ro.If f, f,, , f is a basis of R, then for a vector

Y = n2f2 + + nnf.to belong to the null space of A (x; y) it suffices that

A (ft; y) 0 for i= 1, 2, n.

Replacing y in (7) by (6) we obtain the following system ofequations:

A (f1; 71f1 + n2f2 + + nnf,,) = QA (f2; 702 + n2f2 + + mit,) = Q

A (fn; nifi /02 + + ni,) = O.If we put A (fi; f,e) = aik, the above system goes over into

ant), + an% + ainnn = 0,a22n2 = 0,

anini + 12.202 + ' + = O.Thus the null space R, consists of all vectors y whose coordinates2h, )72, , 2,17, are solutions of the above system of linear equations.As is well known, the dimension of this subspace is n r, where ris the rank of the matrix Ikza

We can now argue thatThe rank of the matrix ra11 of the bilinear form A (x; y) is

independent of the choice of basis in R (although the matrix la ,,,[1

does depend on the choice of basis; cf. § 5).Indeed, the rank of the matrix in question is n ro, where ro

is the dimension of the null space, and the null space is completelyindependent of the choice of basis.

We shall now connect the rank of the matrix of a quadraticform with the rank of the quadratic form. We defined the rankof a quadratic form to be the number of (non-zero) squares in anyof its canonical forms. But relative to a canonical basis the matrixof a quadratic form is diagonal

[1.1

0 Ao 0

O An

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 68: Gelfand - Lectures on Linear Algebra

60 LECTURES ON LINEAR ALGEBRA

and its rank r is equal to the number of non-zero coefficients, i.e.,the rank of the quadratic form. Since we have shown that therank of the matrix of a quadratic form does not depend on thechoice of basis, the rank of the matrix associated with a quadraticform in any basis is the same as the rank of the quadratic form. 5

To sum up:THEOREM 2. The matrices which represent a quadratic form in

different coordinate systems all have the same rank r. This rank isequal to the number of squares with non-zero multipliers in anycanonical form of the quadratic form.

Thus, to find the rank of a quadratic form we must computethe rank of its matrix relative to an arbitrary basis.

§ 8. Complex n-dimensional space

In the preceding sections we dealt essentially with vector spacesover the field of real numbers. Many of the results presented sofar remain in force for vector spaces over arbitrary fields. Inaddition to vector spaces over the field of real numbers, vectorspaces over the field of complex numbers will play a particularlyimportant role in the sequel. It is therefore reasonable to discussthe contents of the preceding sections with this case in mind.

Complex vector spaces. We mentioned in § 1 that all of theresults presented in that section apply to vector spaces overarbitrary fields and, in particular, to vector spaces over the fieldof complex numbers.

Complex Euclidean vector spaces. By a complex Euclideanvector space we mean a complex vector space in which there isdefined an inner product, i.e., a function which associates withevery pair of vectors x and y a complex number (x, y) so that thefollowing axioms hold:

1. (x, y) = (y, x) [(y, x) denotes the complex conjugate of(Y, x)i;

6 We could have obtained the same result by making use of the well-known fact that the rank of a matrix is not changed if we multiply it byany non-singular matrix and by noting that the connection between twomatrices st and .41 which represent the same quadratic form relative to twodifferent bases is .4 ,---- (6' non-singular.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 69: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 61

(2x, y) 2(x, y);(x, -1- x2, y) = (x1, y) -H (x2, y);(x, x) is a non-negative real number which becomes zero

only if x = O.Complex Euclidean vector spaces are referred to as unitary

spaces.Axioms 1 and 2 imply that (x, 2y) = il(x, y) In fact,

(x, 2y) = (2y, x) --= .1(y, x) i"(x, y).

Also, (x, y, y2) = y) (x, y2). Indeed,

yi + Y2) = (Y1 + Y2, x) = x) + (Y2, x) = (x, + (x, Y2)Axiom 1 above differs from the corresponding Axiom 1 for a real Euclidean

vector space. This is justified by the fact that in unitary spaces it is notpossible to retain Axioms 1, 2 and 4 for inner products in the form in whichthey are stated for real Euclidean vector spaces. Indeed,

(x, Y) = (Y, x)would imply

(x, 2.(x, y).But then

(Ax, ).x) x).In particular,

(ix, ix) (x, x),

i.e., the numbers (x, x) and (y, y) with y = tx would have different signsthus violating Axiom 4.

EXAMPLES OF UNITARY SPACES. /. Let R be the set of n-tuplesof complex numbers with the usual definitions of addition andmultiplications by (complex) numbers. If

= (E1 E2 En ) and 2, ' ", nn)are two elements of R, we define

(x, Y) -= $217/2 +

We leave to the reader the verification of the fact that with theabove definition of inner product R becomes a unitary space.

2. The set R of Example i above can be made into a unitaryspace by putting

y) = aikeifh,Ic--1

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 70: Gelfand - Lectures on Linear Algebra

62 LECTURES ON LINEAR ALGEBRA

where at, are given complex numbers satisfying the following twoconditions:

(a) a ,, =(Th azkez f,>. 0 for every n-tuple el, $2, , En and takes on

the value zero only if el = C2 = en = O.3. Let R be the set of complex valued functions of a real

variable t defined and integrable on an interval [a, b]. It is easy tosee that R becomes a unitary space if we put

(f(t), g(t)) = f(t)g(t) dt.

By the length of a vector x in a unitary space we shall mean thenumber \/(x, x). Axiom 4 implies that the length of a vector isnon-negative and is equal to zero only if the vector is the zerovector.

Two vectors x and y are said to be orthogonal if (x, y) = O.Since the inner product of two vectors is, in general, not a real

number, we do not introduce the concept of angle between twovectors.

3. Orthogonal basis. Isomorphism of unitary spaces. By anorthogonal basis in an n-dimensional unitary space we mean a setof n pairwise orthogonal non-zero vectors el, e2, , en. As in § 3we prove that the vectors el, e2, , en are linearly independent,i.e., that they form a basis.

The existence of an orthogonal basis in an n-dimensional unitaryspace is demonstrated by means of a procedure analogous to theorthogonalization procedure described in § 3.

If e, e2, , en is an orthonormal basis and

x = $2e2 + + $e, y = %el n2e2 + + nnen

are two vectors, then

(x, Y) = $2e2 + + ee, n2e2 + + nne.)= e2f/2 + + E71

(cf. Example / in this section).If e, e2, e is an orthonormal basis and

x = e,e, $2e2 + $en,

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 71: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 63

t hen

(x, ez) = e2e2 + + ez) = e,(ei, et)+ $2 (e2, et) + + e (e et),

SO that

(x, e,) = Et.

Using the method of § 3 we prove that all un tary spaces ofdimension n are isomorphic.

4. Bilinear and quadratic forms. With the exception of positivedefiniteness all the concepts introduced in § 4 retain meaning forvector spaces over arbitrary fields and in particular for complexvector spaces. However, in the case of complex vector spacesthere is another and for us more important way of introducingthese concepts.

Linear functions of the first and second kind. A complex valuedfunction f defined on a complex space is said to be a linear functionof the first kind if

f(x + Y) =f(x) ±f(y),f(Ax) = Af (x) ,

and a linear function of the second le.nd if1- f(x + y) --f(x) +f(Y),2. f (2x) = ;1./(x).Using the method of § 4 one can prove that every linear function

of the first kind can be written in the form

f(x) = a1e, a2 + ame,

where $, are the coordinates of the vector x relative to the basisel, e2, en and a, are constants, a, = f(e,), and that everylinear function of the second kind can be written in the form

f(x) = b2t, + + b&,

DEFINITION 1. W e shall say that A (x; y) is a bilinear form(function) of the vectors x and y if:

for any fixed y, A (x; y) is a linear function of the first kind of x,for any fixed x, A (x; y) is a linear function of the second kind

of y. In other words,1. A (xl + x2; y) = A (xi; y) A (x2; y),

A (2x; y) = 2,4 (x; y),

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 72: Gelfand - Lectures on Linear Algebra

64 LECTURES ON LINEAR ALGEBRA

2. A (X; Y1. + 3,2) = A (X; Yi) + A (X; Y2),A (x; Ay) )7.4 (x; y).

One example of a bilinear form is the inner product in a unitaryspace

A (x; y) = (x, y)

considered as a function of the vectors x and y. Another exampleis the expression

A (x; y) = aik$,Fiki, k=1

viewed as a function of the vectors

X El; E2e2 + + $ne,Y = ?he]. + n2e2 + + nmen

Let en e2, , en be a basis of an n-dimensional complex space.Let A (x; y) be a bilinear form. If x and y have the representations

x = 1e1 e2e2 + + enen, y = n1e1 n2e2 +then

A (X; y) = A (elei $2e2 ' fle.n.; //lei ?2e2 + + linen)

ed7kA (ei; ej.i, k1

The matrix IjaH with

ai, A (ei;

is called the matrix of the bilinear form A (x; y) relative to the basis;, e.If we put y = x in a bilinear form A (x; y) we obtain a function

A (x; x) called a quadratic form (in complex space). The connec-tion between bilinear and quadratic forms in complex space issummed up in the following theorem:

Every bilinear form is uniquely determined by its quadraticform. 6

6 We recall that in the case of real vector spaces an analogous statementholds only for symmetric bilinear forms (cf. § 4).

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 73: Gelfand - Lectures on Linear Algebra

n-DIMENSIONAL SPACES 65

Proof: Let A (x; x) be a quadratic form and let x and y be twoarbitrary vectors. The four identities 7:

A (x±y; x+y) = A (x; x) + A (y; x) + A (x; y) A (y; y),A (x+iy; x±iy)=A(x; x)-HiA (y; x)iA (x; y)± A (y; y),A (xy; xy) = A (x; x) A (y; x) A (x; y)± A (y; y),A (xiy; x iy)= A (x; x)iA (y; x)-HiA (x; y)+A(y; y),

enable us to compute A (x; y). Namely, if we multiply theequations (I), (II), (III), (IV) by 1, i, 1, i, respectively,and add the results it follows easily that

A (x; y) = ±{A (x y; x + y) + iA (x iy;x iy)A (x y; x y) iA(x iy; x iy)}.

Since the right side of (1) involves only the values of the quadraticform associated with the bilinear form under consideration ourassertion is proved.

If we multiply equations (I), (II), (III), (IV) by 1, i, 1, i,respectivly, we obtain similarly,

A (y; x) 1{A (x y; x + y) iA (x iy; x iy)A (x y; x y) + iA (x iy; x iy)}.

DEFINITION 2. A bilinear form is called Hermitian if

A (x; y) = A (y; x).

This concept is the analog of a symmetric bilinear form in a realEuclidean vector space.

For a form to be Hermitian it is necessary and sufficient that itsmatrix laikl I relative to some basis satisfy the condition

aIndeed, if the form A (x; y) is Hermitian, then

a = A (ei; ek) A (ek; e1) d,.Conversely, if a = aki, then

A (x; y) = a1kE111, I ankei ---- A (y x).

NOTE If the matrix of a bilinear form satisfies the condition

7 Note that A (x; 1.3) =M (x; Y)iA (x; y).

so that, in particular, A (x; iy)

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 74: Gelfand - Lectures on Linear Algebra

66 LECTURES ON LINEAR ALGEBRA

a = dkì, then the same must be true for the matrix of this formrelative to any other basis. Indeed, a, -- d relative to some basisimplies that A (x; y) is a Hermitian bilinear form, but then a--=drelative to any other basis.

If a bilinear form is Hermitian, then the associated quadraticform is also called Hermitian. The following result holds:

For a bilinear form A (x; y) to be Hermitian it is necessar yand sufficient that A (x; x) be real for every vector x.

Proof: Let the form A (x; y) be Hermitian; i.e., let A (x; y)= A (y; x). Then A (x; x) = A (x; x), so that the numberA (x; x) is real. Conversely, if A (x; x) is real for al x, then, inparticular, A (x + y; x + y), A (x iy; x iy), A (x y; xy),A (x iy; x iy) are all real and it is easy to see from formulas(1) and (2) that A (x; y) = (y; x).

COROLLARY. A quadratic form is Hermitian i f and only i f it is realvalued.

The proof is a direct consequence of the fact just proved that fora bilinear form to be Hermitian it is necessary and sufficient thatA (x; x) be real for all x.

One example of a Hermitian quadratic form is the form

A (x; x) = (x, x),

where (x, x) denotes the inner product of x with itself. In fact,axioms 1 through 3 for the inner product in a complex Euclideanspace say in effect that (x, y) is a Hermitian bilinear form so that(x, x) is a Hermitian quadratic form.

If, as in § 4, we call a quadratic form A (x; x) positive definitewhen

A (x; x) > 0 for x 0,

then a complex Euclidean space can be defined as a complexvector space with a positive definite Hermitian quadratic form.

If Al is the matrix of a bilinear form A (x; y) relative to thebasis e1, e2, , en and I the matrix of A (x; y) relative to the

tt

basis f1, f2, , f2 and if f; = coe, j 1, , n), then4-4

= %)* seW

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 75: Gelfand - Lectures on Linear Algebra

R.-DIMENSIONAL SPACES 67

Here g' flc[ and tt'* Ilc*011 is the conjugate transpose ofi.e., c* e51.

The proof is the same as the proof of the analogous fact in a realspace.

5. Reduction of a quadra ic form to a sum of squares

THEOREM 1, Let A (x; x) be a Hermitian quadratic form in acomplex vector space R. Then there is a basis e1, e1, , en of Rrelative to which the form in question is given by

A (X; X) = + A2 EJ2 + + EJwhere all the 2's are real.

One can prove the above by imitating the proof in § 5 of theanalogous theorem in a real space. We choose to give a version ofthe proof which emphasizes the geometry of the situation. Theidea is to select in succession the vectors of the desired basis.

We choose el so that A (el; el) O. This can be done for other-wise A (x; x) = 0 for all x and, in view of formula (1), A (x; y) O.

Now we select a vector e2 in the (n 1)-dimensional space Thuconsisting of all vectors x for which A (e1; x) = 0 so thatA (e2, e2) 0, etc. This process is continued until we reach thespace Itffi in which A (x; y) O (Mr) may consist of the zerovector only). If W.) 0, then we choose in it some basis er÷,,er+2, , en. These vectors and the vectors el, e2, , e, form abasis of R.

Our construction implies

A (ei; ek) = 0 for i < k.

On the other hand, the Hermitian nature of the form A (x; y)implies

A (ei; ) = 0 for > k.

It follows tha

x = + E2e2 + + enen

is an arbitrary vector, then

A (x; X) = A (ei; el) + E2(2A (e2; e2) + + EThfA (en; e),

where the numbers A (e,; ei) are real in view of the Hermitian

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 76: Gelfand - Lectures on Linear Algebra

68 LECTURES ON LINEAR ALGEBRA

nature of the quadratic form. If we denote A (e,; e1) by ), then

A (x; x) + 22M2 + + 2E& = 41E112 + 221e2I2+ + 2nleta2

6. Reduction of a Hermitian quadratic form to a sum of squaresby means of a triangular transformation. Let A (x; x) be a Hermitianquadratic form in a complex vector space and e, e2, , e abasis. We assume that the determinants

A, =-- a,, 42 --= au a12a21 a2^

a, a.2 an

where a, --- A (ei; ek), are all different from zero. Then just as in§ 6, we can write down formulas for finding a basis relative towhich the quadratic form is represented by a sum of squares.These formulas are identical with (3) and (6) of § 6. Relative tosuch a basis the quadratic form is given by

A (x; x) = 1$112 1E21' + I ler2,A2 zlwhere A, = 1. This implies, among others, that the determinants/11 , 42, , are real. To see this we recall that if a Hermitianquadratic form is reduced to the canonical form (3), then thecoefficients are equal to A (e1; e1) and are thus real.EXERCISE. Prove directly that if the quadratic form A (x; x) is Hermitian,

then the determinants /1 4,, , 4 are real.Just as in § 6 we find that for a Hermitian quadratic form to be

positive definite it is necessary and sufficient that the determinants, A2 , ,A,, be positive.The number of negative multipliers of the squares in the canonical

form of a Hermitian quadratic form equals the number of changes ofsign in the sequence

I, Al, A2, , An7. The law of inertia

THEOREM 2. If a Hermitian quadratic form has canonical fo

all a12 aln

An = a22 a2n

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 77: Gelfand - Lectures on Linear Algebra

It-DIMENSIONAL SPACES 69

relative to two bases, then the number of positive, negative and zerocoefficients is the same in both cases.

The proof of this theorem is the same as the proof of the corre-sponding theorem in § 7.

The concept of rank of a quadratic form introduced in § 7 for realspaces can be extended without change to complex spaces.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 78: Gelfand - Lectures on Linear Algebra

CHAPTER II

Linear Transformations

§ 9. Linear transformations. Operations on lineartransformations

1. Fundamental definitions. In the preceding chapter we stud-ied functions which associate numbers with points in an n-dimensional vector space. In many cases, however, it is necessaryto consider functions which associate points of a vector space withpoints of that same vector space. The simplest functions of thistype are linear transformations.

DEFINITION I. If with every vector x of a vector space R there isassociated a (unique) vector y in R, then the mapping y = A(x) iscalled a transformation of the space R.

This transformation is said to be linear if the following two condi-tions hold:

A (x + x2) = A(x1) + A (x,),A (dlx ) = (x).

Whenever there is no danger of confusion the symbol A (x) isreplaced by the symbol Ax.

EXAMPLES. 1. Consider a rotation of three-dimensional Eucli-dean space R about an axis through the origin. If x is any vectorin R, then Ax stands for the vector into which x is taken by thisrotation. It is easy to see that conditions 1 and 2 hold for thismapping. Let us check condition 1, say. The left side of 1 is theresult of first adding x and x, and then rotating the sum. Theright side of 1 is the result of first rotating x, and x, and thenadding the results. Clearly, both procedures yield the same vector.

2. Let R' be a plane in the space R (of Example 1) passingthrough the origin. We associate with x in R its projectionx' = Ax on the plane R'. It is again easy to see that conditions1 and 2 hold.

70

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 79: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 71

for all x.

3. Consider the vector space of n-tuples of real numbers.Let liaikH be a (square) matrix. With the vector

x= &.] e2 , en)we associate the vector

Y = Ax (ni, n2, n.),where

aike kk=1

This mapping is another instance of a linear transformation.4. Consider the n-dimensional vector space of polynomials of

degree n 1.

If we putAP(1) P1(1),

where P'(t) is the derivative of P(t), then A is a linear transforma-tion. Indeed

[P1 (t)Pa(t)i' a(t) P 2 (t),[AP (t)]' AP' (t).

5. Consider the space of continuous funct ons f(t) defined onthe interval [0, 1]. If we put

Af(t) = f(r) dr,

then Af(t) is a continuous function and A is linear. Indeed,

A (fi + /2) = Jo I.41(t) f2(T)] dr

.fi(r) dr To f2er) tit = Afi Af2;

A (.) --= /If (r) dr Jo f(r) dr 2AI

Among linear transformations the following simple transforma-tions play a special role.

The identity mapping E defined by the equation

Ex x

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 80: Gelfand - Lectures on Linear Algebra

72 LECTURES ON LINEAR ALGEBRA

The null transformation 0 defined by the equation

Ox =for all x.

2. Connection between matrices and linear transformations. Letel, e2, , en be a basis of an n-dimensional vector space R andlet A denote a linear transformation on R. We shall show that

Given n arbitrary vectors g1, g2, , g there exists a uniquelinear transformation A such that

Ae, = g1, Ae2 = g2, Ae = g.We first prove that the vectors Ae,, Ae2, , Ae determine A

uniquely. In fact, ifx = e2e2 + + ene

is an arbitrary vector in R, thenAx = A(eie, E2e2 + &,e) = EiAel E2Ae2

+ -Hen Ae,

so that A is indeed uniquely determined by the Ae,.It remains to prove the existence of A with the desired proper-

ties. To this end we consider the mapping A which associateswith x = ele E2e2 + + E'e. the vector Ax = e,g, e,g2+ + es,,. This mapping is well defined, since x has a uniquerepresentation relative to the basis e1, e2, , e,,. It is easily seenthat the mapping A is linear.

Now let the coordinates of g relative to the basis el, e2, ,be a1, a2, , a, i.e.,

g,, Aek = aikei.

The numbers ao, (i, k = 1, 2, n) form a matrixJ1 = Haikl!

which we shall call the matrix of the linear transformation A relativeto the basis e1, e2, , e.

We have thus shown that relative to a given basis el, e2, , eevery linear transformation A determines a unique matrix Maji and,conversely, every matriz determines a unique linear transformationgiven by means of the formulas (3), (1), (2).

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 81: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 73

Linear transformations can thus be described by means ofmatrices and matrices are the analytical tools for the study oflinear transformations on vector spaces.

EXAMPLES. 1. Let R be the three-dimensional Euclidean spaceand A the linear transformation which projects every vector on theXY-plane. We choose as basis vectors of R unit vectors el, e2, e,directed along the coordinate axes. Then

Ael = el, Ae, = e,, Ae3 = 0,

i.e., relative to this basis the mapping A is represented by thematrix

EXERCISE Find the matrix of the above transformation relative to thebasis e',, e'2, e'3, where

e', = e,, ei2 = e2, e'a = e2 ea.

Let E be the identity mapping and e, e2, e any basisin R. Then

Aei = e (1= 1, 2, n),

i.e. the matrix which represents E relative to any basis is

= 1,

00It is easy to see that the null transformation is always representedby the matrix all of whose entries are zero.

Let R be the space of polynomials of degree n 1. Let Abe the differentiation transformation, i.e.,

AP(t) = P'(t).

We choose the following basis in R:

t2 Ene2 = t, e e -=3 2! (n I)!

[1 0 0

0 1 o.0 0 0

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 82: Gelfand - Lectures on Linear Algebra

74 LECTURES ON LINEAR ALGEBRA

Then

Ael = 1' = 0, Ae2 = e 1 Ae3 (2)' t e2

Ae =(n 1) !

tn-2

(n 2) !

Hence relative to our basis, A is represented by the matrix

Let A be a linear transformation, el, e2, , en a basis in R andMakH the matrix which represents A relative to this basis. Let

(4) x = $1e, $2e2 + + $nen,

(4') Ax = 121e1+ n2; + + nen.We wish to express the coordinates ni of Ax by means of the coor-dinates ei of x. Now

Ax = A (e,e, E2e2 + + Een)= ei(ael a21e2 + + anien)

$2(a12e1 a22e2 + + an2e2)

5(aiei a2e2 + + ae)= (a111 a12e2 + + a$n)e,

((file, + C122E2 + + a2en)e2

(aie, an22 + + ct,,)e.Hence, in v ew of (4'),

= arier a12E2 + alnn2 = aizEi a22 E2 + + az. en,

tin --= an1$1 a2 + + anEn,

or, briefly,

(5) n, a ace lek=1

0 1 0 0001 0

0 0 0 1

0 0 0 0

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 83: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 75

Thus, if J]20represents a linear transformation A relative tosome basis e1, e2, , e, then transformation of the basis vectorsinvolves the columns of lIctocH [formula (3)] and transformation ofthe coordinates of an arbitrary vector x involves the rows of Haikil[formula (5)].

3. Addition and multiplication of linear transformations. Weshall now define addition and multiplication for linear transforma-tions.

DEFINITION 2. By the product of two linear transformationsA and B we mean the transformation C defined by the equationCx = A (Bx) for all x.

If C is the product of A and B, we write C = AB.The product of linear transformations is itself linear,i.e., it

satisfies conditions 1 and 2 of Definition I. Indeed,

C (x, x2) -= A [B (x x,)] = A (Bx, Bx,)= ABx, ABx, = Cx, Cx2

The first equality follows from the definition of multiplication oftransformations, the second from property 1 for B, the third fromproperty 1 for A and the fourth from the definition of multiplica-tion of transformations. That C (2x) = ,ICx is proved just as easily.

If E is the identity transformation and A is an arbitrary trans-formation, then it is easy to verify the relations

AE = EA = A.Next we define powers of a transformation A:

A2 = A A, A3 = A2 A, etc.,

and, by analogy with numbers, we define A° = E. Clearly,

An" = Am A".EXAMPLE. Let R be the space of polynomials of degree n 1.

Let D be the differentiation operator,

D P(t) = P' (t).

Then D2P(t) = D(DP(t)) = (P'(t))/ P"(t). Likewise, D3P(t)P"(t). Clearly, in this case D" = O.

Ex ERCISE. Se/ect in R of the above example a basis as in Example 3 of para.3 of this section and find the matrices of ll, IP, relative to this basis.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 84: Gelfand - Lectures on Linear Algebra

76 LECTURES ON LINEAR ALGEBRA

We know that given a basis e1, e2, , e every linear transfor-mation determines a matrix. If the transformation A determinesthe matrix jjaikj! and B the matrix 1lb j], what is the matrix jcjjdetermined by the product C of A and B. To answer this questionwe note that by definition of Hcrj

C; = IckeI.

Further

AB; = A( bike]) == biAeiJ=1

Comparison of (7) and (6) yields

cika15 blk.

We see that the element c of the matrix W is the sum of the pro-ducts of the elements of the ith row of the matrix sit and thecorresponding elements of the kth column of the matrix Re?. Thematrix W with entries defined by (8) is called the product of thematrices S and M in this order. Thus, if the (linear) transforma-tion A is represented by the matrix I jaikj j and the (linear) trans-formation B by the matrix jjbjj, then their product is representedby the matrix j[c1! which is the product of the matrices Hai,j1and j I b, j j

DEFINITION 3. By the sum of tzew linear transformations A and Bwe mean the transformation C defined by the equation Cx Ax Bxfor all x.

If C is the sum of A and B we write C = A + B. It is easy tosee that C is linear.

Let C be the sum of the transformations A and B. If j jazkl j and

Ilkkl I represent A and B respectively (relative to some basise1, e2, , e) and I C al! represents the sum C of A and B (relativeto the same basis), then, on the one hand,

A; = a ake,, = I b,ke C; = czkei,

and, on the other hand,Ce, A; + B; = (ac ?Wei,

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 85: Gelfand - Lectures on Linear Algebra

so thatc = a b,.

The matrix + bIl is called the sum of the matrices Ila,[1 and111)0,11. Thus the matrix of the sum of two linear transformations is thesum of the matrices associated with the summands.

Addition and multiplication of linear transformations havesome of the properties usually associated vvith these operations.Thus

A+B=B±A;(A B) C = A + (B C);

A (BC) = (AB)C;

f (A B)C = AC -1- BC,1 C(A B) = CA + CB.

We could easily prove these equalities directly but this is unnec-essary. We recall that we have established the existence of aone-to-one correspondence between linear transformations andmatrices which preserves sums and products. Since properties1 through 4 are proved for matrices in a course in algebra, the iso-morphism between matrices and linear transformations justmentioned allows us to claim the validity of 1 through 4 for lineartransformations.

We now define the product of a number A and a linear transfor-mation A. Thus by 2A we mean the transformation which associ-ates with every vector x the vector il(Ax). It is clear that if A isrepresented by the matrix then 2A is represented by thematrix rj2a2,11

If P (t) aot'n + + a, is an arbitrary polynomialand A is a transformation, we define the symbol P(A) by theequation

P(A) = (Om + a, 21,16-1 + + a,E.EXAMPLE. Consider the space R of functions defined and

infinitely differentiable on an interval (a, b). Let D be the linearmapping defined on R by the equation

Df (t) = f(1).

LINEAR TRANSFORMATIONS 77

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 86: Gelfand - Lectures on Linear Algebra

78 LECTURES ON LINEAR ALGEBRA

0 0 0 0000 o_

It is possible to give reasonable definitions not only for apolynomial in a matrix at but also for any function of a matrix dsuch as exp d, sin d, etc.

As was already mentioned in § 1, Example 5, all matrices oforder n with the usual definitions of addition and multiplicationby a scalar form a vector space of dimension n2. Hence anyn2 + 1 matrices are linearly dependent. Now consider thefollowing set of powers of some matrix sl

If P (t) is the polynomial P (t) = cior + airn-1+ + am,then P(D) is the linear mapping which takes f (I) in R into

P(D)f(t) = aor)(t) + a, f (m-1) (t) + -1- am f (t).

Analogously, with P (t) as above and al a matrix we definea polynomial in a matrix, by means of the equation

P(d) = arelm + a1stm-1 + + a, e .EXAMPLE Let a be a diagonal matrix, i.e., a matrix of the form

.2/ =

[A, 0 0 - - 0

0 0 ).,,

2.2 0 ' 0

We wish to find P(d). Since

d2 =

[AL, 0 01

., dm =

rim 0 - - 0]2.22 Oi 0 2.2" - - 0

0 At,' 0 0 ;.,^

it follows thatP (0). , ) 1 0; i :

0 - P(2..)

EXERCISE. Find P(.91) for

0 1 0 0 00 010 0

si =O 0 0 1 0

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 87: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 79

Since the number of matrices is n2 -H 1, they must be linearlydependent, that is, there exist numbers a, a1, a2, ,a, (not allzero) such that

clog' Jr a1d + a2ia/2 + + a,dn2 = 0.It follows that for every matrix of order n there exists a polyno-mial P of degree at most n2 such that P(s1) = C. This simpleproof of the existence of a polynomial P (t) for which P(d ) = 0 isdeficient in two respects, namely, it does not tell us how to con-struct P (t) and it suggests that the degree of P (t) may be as highas n2. In the sequel we shall prove that for every matrix sif thereexists a polynomial P(t) of degree n derivable in a simple mannerfrom sit and having the property P(si) = C.

4. Inverse transformationDEFINITION 4. The transformation B is said to be the inverse of

A if AB = BA = E, where E is the identity mapping.The definition implies that B(Ax) = x for all x, i.e., if A takes

x into Ax, then the inverse B of A takes Ax into x. The inverse ofA is usually denoted by A-1.

Not every transformation possesses an inverse. Thus it is clearthat the projection of vectors in three-dimensional Euclideanspace on the KV-plane has no inverse.

There is a close connection between the inverse of a transforma-tion and the inverse of a matrix. As is well-known for every matrixst with non-zero determinant there exists a matrix sil-1 such that

(9)sisti af_id _

si-1 is called the inverse of sit To find se we must solve a systemof linear equations equivalent to the matrix equation (9). Theelements of the kth column of sl-1 turn out to be the cofactors ofthe elements of the kth row of sit divided by the determinant ofIt is easy to see that d-1 as just defined satisfies equation (9).

We know that choice of a basis determines a one-to-one corre-spondence between linear transformations and matrices whichpreserves products. It follows that a linear transformation A hasan inverse if and only if its matrix relative to any basis has a non-zero determinant, i.e., the matrix has rank n. A transformationwhich has an inverse is sometimes called non-singular.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 88: Gelfand - Lectures on Linear Algebra

80 LECTURES ON LINEAR ALGEBRA

If A is a singular transformation, then its matrix has rank < n.We shall prove that the rank of the matrix of a linear transformation is

independent of the choice of basis.THEOREM. Let A be a linear transformation on a space R. The set of

vectors Ax (x varies on R) forms a subspace R' of R. The dimension of R'equals the rank of the nzatrix of A relative to any basis e2, e2, ,

Proof: Let y, e R' and y, e R', i.e., y, = Ax, and y, = Ax,. Theny, y, Ax, Ax, = A (x,

i.e., y, --F y, e R'. Likewise, if y = Ax, thenAy 2Ax - A (2x),

i.e., Ay e R'. Hence R' is indeed a subspace of R.Now any vector x is a linear combination of the vectors el, e2,

Hence every vector Ax, i.e., every vector in R', is a linear combination ofthe vectors Ae,, Ae,, Ae. If the maximal number of linearly independentvectors among the Ae, isk, then the other Ae, are linear combinations of the kvectors of such a maximal set. Since every vector in R' is a linear combinationof the vectors Ae,, Ae2, , Ae, it is also a linear combination of the h vectorsof a maximal set. Hence the dimension of R' is h. Let I represent Arelative to the basis e,, e , e. To say that the maximal number oflinearly independent Ae, is h is to say that the maximal number of linearlyindependent columns of the matrix Ila,1H is h, i.e., the dimension of R' is thesame as the rank of the matrix ra11.

5. Connection between the matrices of a linear transformationrelative to different bases. The matrices which represent a lineartransformation in different bases are usually different. We nowshow how the matrix of a linear transformation changes under achange of basis.

Let e1, e2, , en and f, , f , f be two bases in R. Let W bethe matrix connecting the two bases. More specifically, let

= C21e2 +f, cei c22e2 + + c02en,

(10)f cei c,e, + +

If C is the linear transformation defined by the equations

Cei =- 1, 2, , n),then the matrix of C relative to the basis e1, e2, , e is W(cf. formulas (2) and (3) of para. 3).

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 89: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 81

Let sit = Ilai,11 be the matrix of A relative to e1, e2, , e anda 11),!1 its matrix relative to f1, f2, , f. In other words,

(10') Aek =

tt

(10") Afk = bat.i=1

We wish to express the matrix ,R in terms of the matrices si and W.To this end we rewrite (10") as

ACe, = Ce,.

Premultiplying both sides of this equation by C-1- (which exists inview of the linear independence of the fi) we get

C-'ACe, = bzker

It follows that the matrix jbikl represents C-'AC relative to thebasis e1, e2, , e. However, relative to a given basis matrix(C-1AC) = matrix (C-9 matrix (A) matrix (C)., so that

(11) =To sum up: Formula (11) gives the connection between the matrix

.4 of a transformation A relative to a basis f,, f2, , f and thematrix <91 which represents A relative to the basis e,, e2, , e.The matrix 462 in (11) is the matrix of transition from the basise,, e2, , en to the basis f1, f2, , ft, (formula (10)).

§ 10. Invariant subspaces. Eigenvalues and eigenvectorsof a linear transformation

1. Invariant subs paces. In the case of a scalar valued functiondefined on a vector space R but of interest only on a subspace 12,of R we may, of course, consider the function on the subspace R,only.

Not so in the case of linear transformations. Here points in R,may be mapped on points not in R, and in that case it is notpossible to restrict ourselves to R, alone.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 90: Gelfand - Lectures on Linear Algebra

82 LECTURES ON LINEAR ALGEBRA

DEFIN/TION 1. Let A be a linear transformation on a space R.A subs pace R, of R is called invariant under A if x e R, impliesAx e R

If a subspace R1 is invariant under a linear transformation Awe may, of course, consider A on R, only.

Trivial examples of invariant subspaces are the subspace con-sisting of the zero element only and the whole space.

EXAMPLES. 1. Let R be three-dimensional Euclidean space andA a rotation about an axis through the origin. The invariantsubspaces are: the axis of rotation (a one-dimensional invariantsubspace) and the plane through the origin and perpendicular tothe axis of rotation (a two-dimensional invariant subspace).

Let R be a plane. Let A be a stretching by a factor A1 alongthe x-axis and by a factor A, along the y-axis, i.e., A is the mappingwhich takes the vector z = e, e, into the vector Az= Ai ei e, + A22e2 (here e, and e, are unit vectors along thecoordinate axes). In this case the coordinate axes are one-dimensional invariant subspaces. If A, = A, = A, then A is asimilarity transformation with coefficient A. In this case every linethrough the origin is an invariant subspace.

EXERCISE. Show that if A, /1.2, then the coordinate axes are the onlyinvariant one-dimensional subspaces.

Let R be the space of polynomials of degree n I and Athe differentiation operator on R, i.e.,

AP (t) --= P' (1).

The set of polynomials of degree .k.<n-1. is an invariantsubspace.

EXERCISE. Show that R in Example 3 contains no other subspacesinvariant under A.

Let R be any n-dimensional vector space. Let A be a lineartransformation on R whose matrix relative to some basis el, e2,

en is of the form

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 91: Gelfand - Lectures on Linear Algebra

LINEAR TRANsFormarioNs 83

an ' avc

all a17,,+1 ' ' a,a1+11+1 ak+in

0 O a1,1 aIn this case the subspace generated by the vectors e , e2, ek isinvariant under A. The proof is left to the reader. If

(1 _< k),= =then the subspace generated by ek-Flp e1+2 en would also beinvariant under A.

2. Eigenvectors and eigenvalues. In the sequel one-dimensionalinvariant subspaces will play a special role.

Let R1 be a one-dimensional subspace generated by some vectorx O. Then R, consists of all vectors of the form ax. It is clearthat for R1 to be invariant it is necessary and sufficient that thevector Ax be in R1, i.e., that

Ax = 2x.

DEFINITION 2. A vector x 0 satisfying the relation Ax Ax

is called an eigenvector of A. The number A is called an eigenvalueof A.

Thus if x is an eigenvector, then the vectors ax form a one-dimensional invariant subspace.

Conversely, all non-zero vectors of a one-dimensional invariantsubspace are eigenvectors.

THEOREM 1. If A is a linear transformation on a complex i spaceR, then A has at least one eigenvector.

Proof: Let e1, e2, en be a basis in R. Relative to this basis Ais represented by some matrix IctikrI Let

x = elei E2e, + + Eebe any vector in R. Then the coordinates ni, n2, , n of thevector Ax are given by

The proof holds for a vector space over any algebraically closed fieldsince it makes use only of the fact that equation (2) has a solution.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 92: Gelfand - Lectures on Linear Algebra

= an1e1+ ct,i02+ + ctE(Cf. para. 3 of § 9).The equation

Ax = Ax,

which expresses the condition for x to be an eigenvector, is equiv-alent to the system of equations:

a111 a12$2 + + ainE=-- A¿,,a2151 a22 + + a2¿,, = A2

an11 an2$2 + + ae--= A¿,,,Or

(an A)ei an$, + + a1--- 0,Ei (a22 A)E, + + a2,, 0,

ani¿i a2$2 + + (anTh O.

Thus to prove the theorem we must show that there exists anumber A and a set of numbers ¿I), $2, , e not all zero satisfyingthe system (1).

For the system (1) to have a non-trivial solution ¿1, ¿,, - ,it is necessary and sufficient that its determinant vanish, i.e., that

A a12 aian a22 A a,an, a2 a A

This polynomial equation of degree n in A has at least one (ingeneral complex) root A.

With A, in place of A, (1) becomes a homogeneous system oflinear equations with zero determinant. Such a system has anon-trivial solution ¿,(0), E2(0), , ¿n(0). If we put

xon Elm) $2(0) e2 . . . Eno) en,

thenAxo) = Aoco),

84 LECTURES ON LINEAR ALGEBRA

= 1111E1 + a122 + ' al. ,

= a2111 a22 e2 + + a2¿,,,/12 4

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 93: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 85

i.e., 30°) is an eigenvector and 2 an eigenvalue of A.This completes the proof of the theorem.NOTE: Since the proof remains valid when A is restricted to any

subspace invariant under A, we can claim that every invariantsubspace contains at least one eigenvector of A.

The polynomial on the left side of (2) is called the character sticpolynomial of the matrix of A and equation (2) the characteristicequation of that matrix. The proof of our theorem shows that theroots of the characteristic polynomial are eigenvalues of thetransformation A and, conversely, the eigenvalues of A are roots ofthe characteristic polynomial.

Since the eigenvalues of a transformation are defined withoutreference to a basis, it follows that the roots of the characteristicpolynomial do not depend on the choice of basis. In the sequel weshall prove a stronger result 2, namely, that the characteristicpolynomial is itself independent of the choice of basis. We maythus speak of the characteristic polynomial of the transformation Arather than the characteristic polynomial of the matrix of thetransformation A.

3. Linear transformations with n linearly independent eigen-vectors are, in a way, the simplest linear transformations. Let A besuch a transformation and e1, e2, , en its linearly independenteigenvectors, i.e.,

Ae, e (i = 1, 2, , n).Relative to the basis e1, e2, , en the matrix of A is

o

o

Lo o 22,_

Such a matrix is called a diagonal matrix. We thus have

THEOREM 2. If a linear transformation A has n linearly independ-ent eigenvectors then these vectors form a basis in which A is represent-ed by a diagonal matrix. Conversely, if A is represented in some

2 The fact that the roots of the characteristic polynomial do not dependon the choice of basis does not by itself imply that the polynomial itself isindependent of the choice of basis. It is a priori conceivable that themultiplicity of the roots varies with the basis.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 94: Gelfand - Lectures on Linear Algebra

86 LECTURES ON LINEAR ALGEBRA

basis by a diagonal matrix, then the vectors of this basis are eigen-values of A.

NOTE: There is one important case in which a linear transforma-tion is certain to have n linearly independent eigenvectors. Welead up to this case by observing that

If e1, e2, , ek are eigenvectors of a transformation A and thecorresponding eigenvalues 2,, A ' , A, are distinct, then e1, e2,e, are linearly independent.

For k = 1 this assertion is obviously true. We assume itsvalidity for k 1 vectors and prove it for the case of k vectors.If our assertion were false in the case of k vectors, then therewould exist k numbers ai , 0( , a, with al 0 0, say, such that

(3) ei a2 e2 e, = O.

Apply ng A to both sides of equation (3) we get

A (al ek + x2e2 + + a, e,) = 0,Or

1121e1 1222e2 ock2kek = O.

Subtracting from this equation equation (3) multiplied by A, weare led to the relation

2k)e1 + 12(22 2k)e2+ ' ' 1k--1(1ki Ak)eki =with 21 2, 0 0 (by assumption Ak for i k). This contra-dicts the assumed linear independence of e1, e2, ,The following result is a direct consequence of our observation:

If the characteristic polynomial of a transformation A has n distinctroots, then the matrix of A is diagonable.

Indeed, a root Ak of the characteristic equation determines atleast one eigenvector. Since the A, are supposed distinct, it followsby the result just obtained that A has n linearly independenteigenvectors e1, e2, , e. The matrix of A relative to the basise1, e2, , en is diagonal.

If the characteristic polynomial has multiple roots, then the number oflinearly independent eigenvectors may be less than n. For instance, thetransformation A which associates with every polynomial of degree < n 1

its derivative has only one eigenvalue A = 0 and (to within a constantmultiplier) one eigenvector P = constant. For if P (t) is a polynomial of

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 95: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 87

degree k > 0, then P'(t) is a poly-nomial of degree k 1. HenceP'(t) = AP(t) implies A -= 0 and P(t) = constant, as asserted. It followsthat regardless of the choice of basis the matrix of A is not diagonal.

We shall prove in chapter III that if A is a root of multiplicity mof the characteristic polynomial of a transformation then themaximal number of linearly independent eigenvectors correspond-ing to A is m.

In the sequel (§§ 12 and 13) we discuss a few classes of diagonablelinear transformations (i.e., linear transformations which in somebases can be represented by diagonal matrices). The problem ofthe "simplest" matrix representation of an arbitrary linear trans-formation is discussed in chapter III.

4. Characteristic fiolynomial.In para. 2 we defined the characteris-tic polynomial of the matrix si of a linear transformation A as thedeterminant of the matrix si Ae and mentioned the fact thatthis polynomial is determined by the linear transformation Aalone, i.e., it is independent of the choice of basis. In fact, if si and

represent A relative to two bases then %'-'sn' for some W.But

Ati Ir-11 1st Ael

This proves our contention. Hence we can speak of the character-istic polynomial of a linear transformation (rather than thecharacteristic polynomial of the matrix of a linear transformation).

EXERCISES. 1. Find the characteristic polynomial of the matrix

A, 0 0 0 0

I A, 0 0 0

0 1 A o0 0 1 A

2. Find the characteristic polynomial of the matrix

a, a, a, an_, an-1 0 0 0 0010 0 0

0 0 0 1 0

Solution: (-1)"(A" a11^-1 a2A^-2 a ).

We shall now find an explicit expression for the characteristicpolynomial in terms of the entries in some representation sal of A.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 96: Gelfand - Lectures on Linear Algebra

88 LECTURES ON LINEAR ALGEBRA

Ab, a2 1b7,2 a 2band can (by the addition theorem on determinants) be writtenas the sum of determinants. The free term of Q(A) is

all anan an

a,,1 a, a,,The coefficient of ( A)'' in the expression for Q(A) is the sum ofdeterminants obtained by replacing in (4) any k columns of thematrix by the corresponding columns of the matrix II b1,11.

In the case at hand a = e and the determinants which add upto the coefficient of (A') are the principal minors of order n kof the matrix Ha,,11. Thus, the characteristic polynomial P(2) ofthe matrix si has the form

P(A) ( 1)4 (An fi12n-1 P22"' '

where p, is the sum of the diagonal entries of si p2 the sum of theprincipal minors of order two, etc. Finally, p,, is the determinant of si

We wish to emphasize the fact that the coefficients pi, p2, ,p are independent of the particular representation a of thetransformation A. This is another way of saying that the charac-teristic polynomial is independent of the particular representationsi of A.

The coefficients p and p, are of particular importance. p is thedeterminant of the matrix si and pi, is the sum of the diagonalelements of sí. The sum of the diagonal elements of sal is called itstrace. It is clear that the trace of a matrix is the sum of all theroots of its characteristic polynomial each taken with its propermultiplicity.

To compute the eigenvectors of a linear transformation we mustknow its eigenvalues and this necessitates the solution of a poly-nomial equation of degree n. In one important case the roots of

We begin by computing a more general polynomial, namely,Q (A) ¡ , where a and are two arbitrary matrices.

an Abu an /1.1)12 al b1,,

Q(A) =Abn a22 21)22 ' a2 2b2

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 97: Gelfand - Lectures on Linear Algebra

the characteristic polynomial can be read off from the matrixrepresenting the transformation; namely,

If the matrix of a transformation A is triangular, i.e., if it has theform

an a12 al, al(5)

O a22 am a,

LINEAR TRANSFORMATIONS 89

a' if a a ,V0 ==(Lot.

0 0 0 athen the eigenvalues of A are the numbers an, a22, , ann.

The proof is obvious since the characteristic polynomial of thematrix (5) is

P(2) = 2) fan 2) (an,' A)

and its roots are an, a22, a.EXERCISE. Find the eigenvectors corresponding to the eigenvalues

an, an, a of the matrix (5).We conclude with a discussion of an interesting property of the charac-

teristic polynomial. As 'vas pointed out in para. 3 of § 9, for every matrix a/there exists a polynomial P(t) such that P(d) is the zero matrix. We nowshow that the characteristic polynomial is just such a polynomial. First weprove the following

LEMMA 1. Let the polynomial

P(A) = ad." + + + am

and the matrix se be connected by the relationP(A)g = (se --,12)%v)

where ?(A) is a polynomial in A with matrix coefficie s, e

CA) = W01?-1 ' +Then P(.09) = C.

(We note that this lemma is an extension of the theorem of Bezout topolynomials with matrix coefficients.)

Prool: We have

Ae)r(A) sér., + (sn.-2(d?.-3 W.-2) + WoAm.

Now (6) and (7) yield the equationsam e,

w,_3= a,_, e,= a,_2 e,

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 98: Gelfand - Lectures on Linear Algebra

90 LECTURES ON LINEAR ALGEBRA

If we multiply the first of these equations on the left by t, the second by al,the third by Sr, ' ' the last by dm and add the resulting equations, we get0 on the left, and P(.4) = a, t + a,_, d + + a, dm on the right.Thus P( 31) = 0 and our lemma is proved 3.THEOREM 3. If P(A) is the characteristic polynomial of Al, then P(d) = O.Proof: Consider the inverse of the matrix d At. We have

A t)(d A t)-1 e. As is well known, the inverse matrix can bewritten in the form

1AS)-/ =

P(A)

where 5 (A) is the matrix of the cofactors of the elements of a/ At andP(A) the determinant of d ite, i.e., the characteristic polynomial of S.Hence

(.29 ,i.e)w(A) = P(A)e.Since the elements of IS(A) are polynomials of degree . n I in A, weconclude on the basis of our lemma that

P ( 31) = C.

This completes the proof.

We note that if the characteristic polynomial of the matrix d has nomultiple roots, then there exists no polynomial Q(A) of degree less than nsuch that Q(.0) = 0 (cf. the exercise below).

EXERCISE. Let d be a diagonal matrix

=[A,Oi

0 A, 0

A

where all the A; are distinct. Find a polynomial P(t) of lowest degree forwhich P(d) = 0 (cf. para. 3, § 9).

§ II. The adjodnt of a linear hmasforrnation

1. Connection between transfornudions and bilinear forms inEuclidean space. VVe have considered under separate headingslinear transfornaations and bilinear forrns on vector spaces. In

3 In algebra the theorem of Bezout is proved by direct substitution of Ain (6). Here this is not an admissible procedure since A is a number and a' isa matrix. However, we are doing essentially the same thing. In fact, thekth equation in (8) is obtained by equating the coefficients of A* in (6).Subsequent multiplication by St and addition of the resulting equations istantamount to the substitution of al in place of A.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 99: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 91

A (x; y) (6711E1 + an 52 +(171251 + 422E2 +

(a25e1 a2e2 +

Now we introduce the vector z with

= aide]. -F a21 52 '

C2 = a12e1 + 422E2 +

+ an' 52)771-F a,,2 j2

+ anE)77.coordinates

+ ani$,+ an2$,

= a2ne2 + a$,.It is clear that z is obtained by applying to x a linear transforma-tion whose matrix is the transpose of the matrix Haikil of thebilinear form A (x; y). We shall denote this linear transformation

4 Relative to a given basis both linear transformations and bilinear formsare given by matrices. One could therefore try to associate with a givenlinear transformation the bilinear form determined by the same matrix asthe transformation in question. However, such correspondence would bewithout significance. In fact, if a linear transformation and a bilinear formare represented relative to some basis by a matrix at, then, upon change ofbasis, the linear transformation is represented by Se-1 are (cf. § 9) and thebilinear form is represented by raw (cf. § 4). Here re is the transpose of rer

The careful reader will notice that the correspondence between bilinearforms and linear transformations in Euclidean space considered belowassociates bilinear forms and linear transformations whose matrices relativeto an orthonormal basis are transposes of one another. This correspondenceis shown to be independent of the choice of basis.

the case of Euclidean spaces there exists a close connectionbetween bilinear forms and linear transformations 4.

Let R be a complex Euclidean space and let A (x; y) be a bilinearform on R. Let el, e2, , e be an orthonormal basis in R. Ifx + ¿2e2 + + eandy = n1e1 212e2 + +mien,then A (x; y) can be written in the form

A (x; y) = anEi Th. a151 î72 + -1- afiin(1) + 421E2771 + 422E2772 + + 42nE217n

ani En an2Eni12 + +We shall now try to represent the above expression as an inner

product. To this end we rewrite it as follows:

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 100: Gelfand - Lectures on Linear Algebra

92 LECTURES ON LINEAR ALGEBRA

by the letter A, i.e., we shall put z = Ax. Then

A (x; y) = Eh CS72 d- + Cn Tin = (z, Y) = (Ax, Y)-Thus, a bilinear form A (x; y) on Euclidean vector space determines

a linear transformation A such that

A (x; y) (Ax, y).

The converse of this proposition is also true, namely:A linear transformation A on a Euclidean vector space determines

a bilinear form A (x; y) defined by the relation

A (x; y) (Ax, y).

The bilinearity of A (x; y) (Ax, y) is easily proved:

(A (x, x,), y) = (Ax, Ax2, y) = (Ax,, y) + (Ax,, y) ,

(Mx, y) = (2Ax, y) = 2(Ax, y).(x, A (y + y2)) = (x, Ay, + AY2) = (x, AY1) (x, Ay,),(x, An) = (x, pAy) = /2(x, Ay).

We now show that the bilinear form A (x y) determines thetransformation A uniquely. Thus, let

A (x; y) = (Ax, y)and

A (x; y) = (Bx, y).Then

(Ax, y) (Bx, y),

(Ax Bx, y) =

for all y. But this means that Ax Ex = 0 for all x HenceAx = Ex for all x, which is the same as saying that A = B. Thisproves the uniqueness assertion.

We can now sum up our results in the following

THEonEm 1. The equation

(2) A (x; y) = (Ax, y)

establishes a one-to-one correspondence between bilinear forms andlinear transformations on a Euclidean vector space.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 101: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 93

The one-oneness of the correspondence established by eq. (2)implies its independence from choice of basis.

There is another way of establishing a connection betweenbilinear forms and linear transformations. Namely, every bilinearform can be represented as

A (x; y) (x, A*y).

This representation is obtained by rewriting formula (1) above inthe following manner:

A (x; y) = 6/12172 + + am /7n)2(4121171 a22 772 + + a2n77)

$ n(an1Fn a2772 + += + d12n2 + + din)+ + a2272 + + a2nn.)

$7,(c1,21% d,z2n2 + + dnom) = (x, A*y).

Relative to an orthogonal basis the matrix la*,] of A* and thematrix I laiklt of A are connected by the relation

a*, = dn.For a non-orthogonal basis the connection between the twomatrices is more complicated.

2. Transition from A to its adjoint (the operation *)

DEFINITION 1. Let A be a linear transformation on a complexEuclidean space. The transformation A* defined by

(Ax, y) = (x, A*y)

is called the adjoint of A.

THEOREM 2. In a Euclidean space there is a one-to-one correspond-ence between linear transformations and their adjoints.

Proof: According to Theorem 1 of this section every lineartransformation determines a unique bilinear form A (x; y)

(Ax, y). On the other hand, by the result stated in the conclu-sion of para. 1, every bilinear form can be uniquely represented as(x, My). Hence

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 102: Gelfand - Lectures on Linear Algebra

94 LECTURES ON LINEAR ALGEBRA

(Ax, y) = A (x; y) = (x, A*y).

The connection between the matrices of A and A* relative to anorthogonal matrix was discussed above.

Some of the basic properties of the operation * are

(AB)* = B*A*.(A*)* = A.(A + B)* = A* + B*.(2A)* = a*.E* = E.

We give proofs of properties 1 and 2.(ABx, y) = (Bx, A*y) = (x, B*A*y).

On the other hand, the definition of (AB)* implies

(ABx, y) = (x, (AB)* y).

If we compare the right sides of the last two equations and recallthat a linear transformation is uniquely determined by the corre-sponding bilinear form we conclude that

(AB)* .= B* A*.

By the definition of A*,

(Ax, y) = (x, A* 3).

Denote A* by C. Then

(Ax, y) = (x, Cy),whence

(y, Ax) = (Cy, x).

Interchange of x and y gives

(Cx, y) -= (x, Ay).

But this means that C* A, i.e., (A*)* = A.EXERC/SES. 1. Prove properties 3 through 5 of the operation *.2. Prove properties 1 through 5 of the operation * by making use of the

connection between the matrices of A and A* relative to an orthogonalbasis.

Self-adjoint, unitary and normal linear transformations. Theoperation * is to some extent the analog of the operation of

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 103: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 95

conjugation which takes a complex number a into the complexnumber et. This analogy is not accidental. Indeed, it is clear thatfor matrices of order one over the field of complex numbers, i.e.,for complex numbers, the two operations are the same.

The real numbers are those complex numbers for which Cc =The class of linear transformations which are the analogs of thereal numbers is of great importance. This class is introduced by

DEFINITION 2. A linear transformation is called self-adjoint(Hermitian) if A* = A.

We now show that for a linear transformation A to be self-adjointit is necessary and sufficient that the bilinear form (Ax, y) beHermitian.

Indeed, to say that the form (Ax, y) is Hermitian is to say that

(Ax, y) = (Ay, x).Again, to say that A is self-adjoint is to say that

(Ax, y) = (x, Ay).

Clearly, equations (a) and (b) are equivalent.Every complex number is representable in the form= a + iß, a, /3 real. Similarly,Every linear transformation A can be written as a sum

(3) A = iA,,

where Al and A, are self-adjoint transformations.In fact, let A, = (A + A*)/ 2 and A2 (A A*)/2i. Then

A = A, + iA, and

A,* (A + A*)* (A + A*)* = + (A* + A**)2

+ (A* + A) = A1,/A AT

2A* - - (A A*)* = 7-, A* A**)2i 2i1=

2i(A* A) = A2,

i.e., A, and A, are self-adjoint.This brings out the analogy between real numbers and self-

adjoint transformations.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 104: Gelfand - Lectures on Linear Algebra

96 LECTURES ON LINEAR ALGEBRA

EXERCISES, I. Prove the uniqueness of the representation (3) of A.Prove that a linear combination with real coefficients of self-adjoint

transformations is again self-adjoint.Prove that if A is an arbitrary linear transformation then AA* and

A*A are self-adjoint.NOTE: In contradistinction to complex numbers AA* is, in general,

different from A*A.

The product of two self-adjoint transformations is, in general,not self-adjoint. However:

THEOREM 3. For the product AB of two self-adjoint transforma-tions A and B to be self-adjoint it is necessary and sufficient thatA and B commute.

Proof: We know thatA* = A and B* = B.

We wish to find a condition which is necessary and sufficient for(4) (AB)* = AB.Now,

(AB)* = B*A* = BA.Hence (4) is equivalent to the equation

AB = BA.This proves the theorem.

EXERCISE. Show that if A and B are self-adjoint, then AB + BA andi (AB BA) are also self-adjoint.

The analog of complex numbers of absolute value one areunitary transformations.

DEFINITION 3. A linear transformation U is called unitary ifUU* = U*15 = E. 5 In other words for a unitary transformationsU. = Ul.

In § 13 we shall become familiar with a very simple geometricinterpretation of unitary transformations.

EXERCISES. 1. Show that the product of two unitary transformations is aunitary transformation.

2. Show that if 15 is unitary and A self-adjoint, then AU is againself-adjoint.

5 In n-dimensional spaces TIE* = E and 1:*ti = E are equivalentstatements. This is not the case in infinite dimensional spaces.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 105: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 97

In the sequel (§ 15) we shall prove that every linear transforma-tion can be written as the product of a self-adjoint transformationand a unitary transformation. This result can be regarded as ageneralization of the result on the trigonometric form of a complexnumber.

DEFINITION 4. A linear transformation A is called normal ifAA* = A* A.

There is no need to introduce an analogous concept in the fieldof complex numbers since multiplication of complex numbers iscommutative.

It is easy to see that unitary transformations and self-adjointtransformations are normal.

The subsequent sections of this chapter are devoted to a moredetailed study of the various classes of linear transformations justintroduced. In the course of this study we shall become familiarwith very simple geometric characterizations of these classes oftransformations.

§ 12. Self-adjoint (Hermitian) transformations.Simultaneous reduction of a pair of quadratic forms to a

sum of squares

1. Self-adjoint transformations. This section is devoted to amore detailed study of self-adjoint transformations on n-dimen-sional Euclidean space. These transformations are frequentlyencountered in different applications. (Self-adjoint transformationson infinite dimensional space play an important role in quantummechanics.)

LEMMA 1. The eigenvalues of a self-adjoint transformation are real.Proof: Let x be an eigenvector of a self-adjoint transformation

A and let A be the eigenvalue corresponding to x, i.e.,

Ax A.x; x O.

Since A* -= A,

that is,(Ax, x) = (x, Ax),

(2x, x) = (x, Ax),

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 106: Gelfand - Lectures on Linear Algebra

98 LECTURES ON LINEAR ALGEBRA

Or,

2(x, x) = 71(x, x).

Since (x, x) 0, it follows that A = 1, which proves that A is real.

LEMMA 2. Let A be a self-adjoint transformation on an n-dimen-sional Euclidean vector space R and let e be an eigenvector of A.The totality R, of vectors x orthogonal to e form an (n 1)-dimen-sional subspace invariant under A.

Proof: The totality R1 of vectors x orthogonal to e form an(n 1)-dimensional subspace of R.

We show that R, is invariant under A. Let x e R. This meansthat (x, e) = 0. We have to show that Ax e R1, that is, (Ax, e)= O. Indeed,

(Ax, e) = (x, A*e) = (x, Ae) (x, 2e) = 2(x, e) = 0.

THEOREM 1. Let A be a self-adjoint transformation on an n-dimensional Euclidean space. Then there exist n pairwise orthogonaleigenvectors of A. The corresponding eigenvalues of A are all real.

Proof: According to Theorem 1, § 10, there exists at least oneeigenvector el of A. By Lemma 2, the totality of vectors orthogo-nal to e, form an (n 1)-dimensional invariant subspaceWe now consider our transformation A on R, only. In R, thereexists a vector e2 which is an eigenvector of A (cf. note to Theorem1, § 10). The totality of vectors of R, orthogonal to e, form an(n 2)-dimensional invariant subspace R2. In R, there exists aneigenvector e, of A, etc.

In this manner we obtain n pairwise orthogonal eigenvectorse1, e2, , en. By Lemma 1, the corresponding eigenvalues arereal. This proves Theorem 1.

Since the product of an eigenvector by any non-zero number isagain an eigenvector, we can select the vectors e. that each ofthem is of length one.

THEOREM 2. Let A be a linear transformation on an n-dimensionalEuclidean space R. For A to be self-adjoint it is necessary andsufficient that there exists an orthogonal basis relative to which thematrix of A is diagonal and real.

Necessity: Let A be self-adjoint. Select in R a basis consisting of

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 107: Gelfand - Lectures on Linear Algebra

the n pairwise orthogonal eigenvectors e1, e2, , e of A con-structed in the proof of Theorem 1.

Since

Ael = 22e,,A;

LINEAR TRANSFORMATIONS 99

Ae = Anen,

it follows that relative to this basis the matrix of the transforma-tion A is of the form

[A,. o o

o A, 0

0 0 An

where the Ai are real.Sufficiency: Assume now that the matrix of the transformation

A has relative to an orthogonal basis the form (1). The matrix ofthe adjoint transformation A* relative to an orthonormal basis isobtained by replacing all entries in the transpose of the matrix ofA by their conjugates (cf. § 11). In our case this operation has noeffect on the matrix in question. Hence the transformations A andA* have the same matrix, i.e., A A*. This concludes the proofof Theorem 2.

We note the following property of the eigenvectors of a self-adj oint transformation: the eigenvectors corresponding to differenteigenvalues are orthogonal.

Indeed, let

Ael = 22 , Ae, = 2.2e2, 21 22.

(1)

Then

that is

or

(Ael, e2) = (e1, A*e2) = (e1, A;),

¿1(e1, e2) = 22(e1, ez),

(2, 22) (e1, e2) = O.

Since Ai rf 4, it follows that

(e1, e2) = O.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 108: Gelfand - Lectures on Linear Algebra

100 LECTURES ON LINEAR ALGEBRA

NOTE: Theorem 2 suggests the following geometric interpretation of aself-adjoint transformation: We select in our space n pairwise orthogonaldirections (the directions determined by the eigenvectors) and associate witheach a real number Ai (eigenvalue). Along each one of these directions weperform a stretching by ¡2,1 and, in addition, if 2.; happens to be negative, areflection in the plane orthogonal to the corresponding direction.

Along with the notion of a self-adjoint transformation weintro-duce the notion of a Hermitian matrix.

The matrix Irai,11 is said to be Hermitian if ai,Clearly, a necessary and sufficient condition for a linear trans-

formation A to be self-adjoint is that its matrix relative to someorthogonal basis be Hermitian.

EXERCISE. Raise the matrix( 0 A/2)

A/2 1

to the 28th power. Hint: Bring the matrix to its diagonal form, raise it tothe proper power, and then revert to the original basis.

2. Reduction to principal axes. Simultaneous reduction of a pairof quadratic forms to a sum of squares. We now apply the resultsobtained in para. 1 to quadratic forms.

We know that we can associate with each Hermitian bilinearform a self-adjoint transformation. Theorem 2 permits us now tostate the importantTHEOREM 3. Let A (x; y) be a Hermitian bilinear form defined on

an n-dimensional Euclidean space R. Then there exists an orthonor-mal basis in R relative to which the corresponding quadratic form canbe written as a sum of squares,

A (x; x) = ili[e ii2,

where the Xi are real, and the $1 are the coordi ales of the vectorx. 6

Proof: Let A( y) be a Hermitian bilinear form, i.e.,

A (x; y) = A (y; X),

We have shown in § 8 that in any vector space a Hermitian quadraticform can be written in an appropriate basis as a sum of squares. In the caseof a Euclidean space we can state a stronger result, namely, we can assertthe existence of an orthonnal basis relative to which a given Hermitianquadratic form can be reduced to a sum of squares.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 109: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 101

then there exists (cf. § 11) a self-adjoint linear transformation Asuch that

A (x; y) (Ax, y).

As our orthonormal basis vectors we select the pairwise orthogo-nal eigenvectors e1, e2, en of the self-adjoint transformation A(cf. Theorem 1). Then

Ael = 21e1, Ae2 = 12e2, Aen An en.

Let

x = ei e2e, + + e , y e, + n2 e2 + + nn. .

Since

I 1 for i = k0 for i k,

we get

A (x; y) (Ax, y)= e2Ae2 + + en Aen , n1e1 /12e2 + + ?)en)= 22e2e2 + -I- An enen, %el n2e2 + + nnen)= 1E11 + 225 + + fin

In particular

A (x; x) = (Ax, x) 211$112,121 212 + + Arisni2.

This proves the theorem.

The process of finding an orthonormal basis in a Euclideanspace relative to which a given quadratic form can be representedas a sum of squares is called reduction to principal axes.

THEOREM 4. Let A (x; x) and B(x; x) be two Hermitian quadraticforms on an n-dimensional vector space R and assume B(x; x) to bepositive definite. Then there exists a basis in R relative to whicheach form can be written as a sum of squares.

Proof: We introduce in R an inner product by putting (x, y)B(x; y), where B(x; y) is the bilinear form corresponding to

B(x; x). This can be done since the axioms for an inner productstate that (x, y) is a Hermitian bilinear form corresponding to apositive definite quadratic form (§ 8). With the introduction of aninner product our space R becomes a Euclidean vector space. By

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 110: Gelfand - Lectures on Linear Algebra

102 LECTURES ON LINEAR ALGEBRA

theorem 3 R contains an orthonormal basis el, e2, , erelative to which the form A (x; x) can be written as a sum ofsquares,

A (x; x) = 211E112 1215212 + + 41E7,12.

Now, with respect to an orthonormal basis an inner producttakes the form

(x, x) = ei I2 + 1E212 + + [EF2Since B(x x) (x, x), it follows that

B(x; x

We have thus found a basis relative to which both quadraticforms A (x; x) and B(x; x) are expressible as sums of squares.

We now show how to find the numbers AI, 22, , Ar, whichappear in (2) above.

The matrices of the quadratic forms A and B have the followingcanonical form:

[AI 1

d=0 22 0

0 [0

0 0

,q =

A

Consequently,

Det (id AR) (A1 A) (22 2) (2 A).

Under a change of basis the matrices of the Hermitian quadraticforms A and B go over into the matrices Jill = (t* d%' and

= %)* . Hence, if el, e2, , en is an arbitrary basis, thenwith respect to this basis

Det 141) Det V* Det (at Al) Det C,

i.e., Det 141) differs from (4) by a multiplicative constant.It follows that the numbers A, 22, , A are the roots of the equation

a Ab aan 2b21 a22 Ab22 a2n 21)2n

) + 1E21' + + le.12.

Abu ' al,, /bin

a, Abni an2 2.1)2 a,, Ab,,,,

Orthonormal relative to the inner product (x, y) = B(x; y).

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 111: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 103

where Haikl F and 0i/A are the matrices of the quadratic formsA (x; x) and B(x; x) in some basis e, e2, , en.

NOTE: The following example illustrates that the requirement that one ofthe two forms be positive definite is essential. The two quadratic forms

A (X; X) = let12 142, B(x; x) =neither of which is positive definite, cannot be reduced simultaneously to asum of squares. Indeed, the matrix of the first form is

[1 010 1and the matrix of the second form is

a, ro 11Li oJ

Consider the matrix a RR, where A is a real parameter. Its determinantis equal to (A2 + 1) and has no real roots. Therefore, in accordance withthe preceding discussion, the two forms cannot be reduced simultaneouslyto a sum of squares.

§ 13. Unitary transformations

In § 11 we defined a unitary transformation by the equation

(1) UU* U*U E.

This definition has a simple geometric interpretation, namely:A unitary transformation U on an n-dimensional Euclidean

space R preserves inner products, i.e.,

(Ux, Uy) (x, y)

for all x, y E R. Conversely, any linear transformation U whichpreserves inner products is unitary (i.e., it satisfies condition (1)).

Indeed, assume U*U = E. Then

(Ux, Uy) = (x, U*Uy) = (x, y).

Conversely, if for any vectors x and y

(Ux, Uy) = (x, y),then

(U*Ux, y) = (x, y),that is

(U*Ux, y) = (Ex, y).

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 112: Gelfand - Lectures on Linear Algebra

104 LECTURES ON LINEAR ALGEBRA

Since equality of bilinear forms implies equality of correspondingtransformations, it follows that U*LI = E, i.e., U is unitary.

In particular, for x = y we have

(Ux, Ux) = (x, x),

i.e., a unitary transformation preserves the length of a vector.EXERCISE. Prove that a linear transformation which preserves length is

unitary.

We shall now characterize the matrix of a unitary transforma-tion. To do this, we select an orthonormal basis el, e2, , en.Let

[all 1E12

a21 a22 aa

a1 a2 abe the matr x of the transformation U relative to this basis. Then

dn dnd12 d22

an]]dn2

al,, a2,, ann

is the matrix of the adjoint U* of U.The condition UU* = E implies that the product of the matrices

(2) and (3) is equal to the unit matrix, that is,

aiti, = 1, aak = O (i k).a=1 a-1

Thus, relative to an orthonormal basis, the matrix of a unitarytransformation U has the following properties: the sum of the productsof the elements of any YOW by the conjugates of the correspondingelements of any other YOW is equal to zero; the sum of the squares ofthe moduli of the elements of any row is equal to one.

Making use of the condition U*U = E we obtain, in addition,

a2d,, = 1, a(T = O (i k).a=1 a=1

This condition is analogous to the preceding one, but refers to thecolumns rather than the rows of the matrix of U.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 113: Gelfand - Lectures on Linear Algebra

(6)

LINEAR TRANSFORMATIONS 105

Condition (5) has a simple geometric meaning. Indeed, theinner product of the vectors

Uei = ai, + a2e2 + + aand

akk a2k e2 + + ankeis equal to axid (since we assumed el, e2, , en to be anorthonormal basis). Hence

f 1 for i = k,(Uei, Uek) 1

0 for i k.

It follows that a necessary and sufficient condition for a lineartransformation U to be unitary is that it take an orthonormal basise1, e2, en into an orthonormal basis Uek , Ue2, , Uen.

A matrix I laall whose elements satisfy condition (4) or, equiva-lently, condition (5) is called unitary. As we have shown unitarymatrices are matrices of unitary transformations relative to anorthonormal basis. Since a transformation which takes anorthonormal basis into another orthonormal basis is unitary, thematrix of transition from an orthonormal basis to another ortho-normal basis is also unitary.

We shall now try to find the simplest form of the matrix of aunitary transformation relative to some suitably chosen basis.

LEMMA 1. The eigenvalues of a unitary transformation are inabsolute value equal to one.

Proof: Let x be an eigenvector of a unitary transformation U andlet A be the corresponding eigenvalue, i.e.,

Ux = Ax, x O.

Then(x, x) = (Ux, Ux) = (2x, 2x) = 22(x, x),

that is, Ai = 1 or 121 = 1.

LEMMA 2. Let U be a unitary transfor ation on an n-di ensionalspace R and e its eigenvector, i.e.,

Ue = 2e, e O.

Then the (n 1)-d mensional subspace R, of R consisting of allvectors x orthogonal to e is invariant under U.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 114: Gelfand - Lectures on Linear Algebra

106 LECTURES ON LINEAR ALGEBRA

Proof: Let x E R, , i.e., (x, e) = 0. We shall show that Ux e R1,i.e., (Ux, e) O. Indeed,

(Ux, Ue) = (U*Ux, e) = (x, e) --- O.

Since Ue = ae, it follows that i(Ux, e) = 0. By Lemma 1,0 0, hence (Ux, e) = 0, i.e., Ux E . Thus, the subspace R1

is indeed invariant under U.THEOREM 1. Let U be a unitary transformation defined on an

n-dimensional Euclidean space R. Then U has n pairwise orthogo-nal eigenvectors. The corresponding eigenvalues are in absolute valueequal to one.

Proof: In view of Theorem 1, § 10, the transformation U as alinear transformation has at least one eigenvector. Denote thisvector by el. By Lemma 2, the (n 1)-dimensional subspace R,of all vectors of R which are orthogonal to e, is invariant under U.Hence R, contains at least one eigenvector e2 of U. Denote by R2the invariant subspace consisting of all vectors of R1 orthogonalto e2. R2 contains at least one eigenvector e3 of U, etc. Proceedingin this manner we obtain n pairwise orthogonal eigenvectorse,, , en of the transformation U. By Lemma 1 the eigenvaluescorresponding to these eigenvectors are in absolute value equal toone.

THEOREM 2. Let U be a unitary transformation on an n-dimen-sional Euclidean space R. Then there exists an orthonormal basis inR relative to which the matrix of the transformation U is diagonal,i.e., has the form

[2, o

(7) o 22 oi.o

The numbers 4, 4, , A are in absolute value equal to one.Proof: Let U be a unitary transformation. We claim that the n

pairwise orthogonal eigenvectors constructed in the precedingtheorem constitute the desired basis. Indeed,

Ue, =Ue, = 22e2,

Ue =

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 115: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 107

and, therefore, the matrix of U relative to the basis e1, e2, ,has form (7). By Lemma 1 the numbers Ai, 22, , An are inabsolute value equal to one. This proves the theorem.

EXERCISES. 1. Prove the converse of Theorem 2, i.e., if the matrix of Uhas form (7) relative to some orthogonal basis then U is unitary.

2. Prove that if A is a self-adjoint transformation then the transforma-tion (A iE)-1 (A + iE) exists and is unitary.

Since the matrix of transition from one orthonormal basis toanother is unitary we can give the following matrix interpretationto the result obtained in this section.

Let all be a unitary matrix. Then there exists a unitary matrix'V such that

Pi= rigr,where is a diagonal matrix whose non-zero elements are equal inabsolute value to one.

Analogously, the main result of para. 1, § 12, can be given thefollowing matrix interpretation.

Let sal be a Hermitian matrix. Then sat can be represented inthe form

sit =

where ir is a unitary matrix and g a diagonal matrix whose non-zero elements are real.

§ 14. Commutative linear transformations. Normaltransformations

1. Commutative transformations. We have shown (§ 12) that foreach self-adjoint transformation there exists an orthonormal basisrelative to which the matrix of the transformation is diagonal. Itmay turn out that given a number of self-adjoint transformations,we can find a basis relative to which all these transformations arerepresented by diagonal matrices. We shall now discuss conditionsfor the existence of such a basis. VVe first consider the case of twotransformations.

LEMMA 1. Let A and B be two commutative linear transformations,i.e., let

AB = BA.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 116: Gelfand - Lectures on Linear Algebra

108 LECTURES ON LINEAR ALGEBRA

Then the eigenvectors of A which correspond to a given eigenvalue A ofA form (together with the null vector) a subspace RA invariant underthe transformation B.

Proof: We have to show that if

x ERA, i.e., Ax = 2x,then

Bx e Ra, i.e., ABx = 2Bx.Since AB -= BA, we have

ABx = BAx = B2x = 2.13x,

which proves our lemma.LEMMA 2. Any two commutative transformations have a common

eigenvector.Proof: Let AB = BA and let RA be the subspace consisting of

all vectors x for which Ax ---- 2x, where A is an eigenvalue of A.By Lemma 1, RA is invariant under B. Hence RA contains a vectorx, which is an eigenvector of B. xo is also an eigenvector of A,since by assumption all the vectors of RA are eigenvectors of A.

NOTE: If AB = BA we cannot claim that every eigenvector of Ais also an eigenvector of B. For instance, if A is the identity trans-formation E, B a linear transformation other than E and x avector which is not an eigenvector of B, then x is an eigenvector ofE, EB BE and x is not an eigenvector of B.

THEOREM 1. Let A and B be two linear self-adjoint transformationsdefined on a complex n-dimensional vector space R. A necessary andsufficient condition for the existence of an orthogonal basis in Rrelative to which the transformations A and B are represented bydiagonal matrices is that A and B commute.

Sufficiency: Let AB EA. Then, by Lemma 2, there ex sts avector e, which is an eigenvector of both A and B, i.e.,

Ae, = 21e1, Be, =The (n 1)-dimensional subspace R, orthogonal to e, is invariantunder A and B (cf. Lemma 2, § 12). Now consider A and B on R,only. By Lemma 2, there exists a vector e, in R, which is an eigen-vector of A and B:

Ae, = 22e2, Be2 = u2e2.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 117: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 109

All vectors of R, which are orthogonal to e2 form an (n 2)-dimensional subspace invariant under A and B, etc. Proceeding inthis way we get n pairwise orthogonal eigenvectors e1, e2, ,of A and B:

Aei 2,e1 , Bei = pie, (i = 1, , n).

Relative to e1, e2, e the matrices of A and B are diagonal.This completes the sufficiency part of the proof.

Necessity: Assume that the matrices of A and B are diagonalrelative to some orthogonal basis. It follows that these matricescommute. But then the transformations themselves commute.

EXERCISE. Let U, and U, be two commutative unitary transformations.Prove that there exists a basis relative to which the matrices of U, and U,are diagonal.

NOTE: Theorem I can be generalized to any set of pairwise commutativeself-adjoint transformations. The proof follows that of Theorem 1 butinstead of Lemma 2 the following Lemma is made use of :

LEMMA 2'. The elements of any set of pairwise commutative transformationson a vector space R have a common eigenvector.

Proof: The proof is by induction on the dimension of the space R. In thecase of one-dimensional space (n I ) the lemma is obvious. We assumethat it is true for spaces of dimension < n and prove it for an n-dimensionalspace.

If every vector of R is an eigenvector of all the transformations A, B,C, in our set Sour lemma is proved. Assume therefore that there exists avector in R which is not an eigenvector of the transformation A, say.

Let R, be the set of all eigenvectors of A corresponding to some eigenvalueA of A. By Lemma 1, R, is invariant under each of the transformationsB, C, (obviously, R, is also invariant under A). Furthermore, R, is asubspace different from the null space and the whole space. Hence R, is ofdimension n 1. Since, by assumption, our lemma is true for spaces ofdimension < n, R1 must contain a vector which is an eigenvector of thetransformations A, B, C, . This proves our lemma.

2. Normal transformations. In §§ 12 and 13 we considered twoclasses of linear transformations which are represented in asuitable orthonormal basis by a diagonal matrix. We shall nowcharacterize all transformations with this property.

THEOREM 2. A necessary and sufficient condition for the existence

This means that the transformations A, B, C, are multiples of theidentity transformation.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 118: Gelfand - Lectures on Linear Algebra

110 LECTURES ON LINEAR ALGEBRA

of an orthogonal basis relative to which a transformation A is represent-ed by a diagonal matrix is

AA* = A*A

(such transformations are said to be normal, cf. § 11).Necessity: Let the matrix of the transformation A be diagonal

relative to some orthonormal basis, i.e., let the matrix be of theform

[2, 0 0

0 22 0

0 0 IL,

Relative to such a basis the matrix of the transformation A* hasthe form

[Al

0 0

0 i,

0 0

Since the matrices of A and A* are diagonal they commute. Itfollows that A and A* commute.

Sufficiency: Assume that A and A* commute. Then by Lemma 2there exists a vector el which is an eigenvector of A and A*, i.e.,

Ae1=21e1, Ate1=p1e1.9The (n 1)-dimensional subspace R1 of vectors orthogonal to e,is invariant under A as well as under A*. Indeed, let x E 141, i.e.,(x, e1) = 0. Then

(Ax, e1) = (x, Ate) (x, pled = [71(x, el) = 0,

that is, Ax e R. This proves that R, is invariant under A. Theinvariance of R, under A* is proved in an analogous manner.

Applying now Lemma 2 to R.1, we can claim that R1 contains avector e, which is an eigenvector of A and A*. Let R2 be the(n 2)-dimensional subspace of vectors from R2 orthogonal toe2, etc. Continuing in this manner we construct n pairwise ortho-gonal vectors e, e,, , e which are eigenvectors of A and A*.

9 EXERCISE. Prove that pi =

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 119: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 111

The vectors e1, e2, e form an orthogonal basis relative towhich both A and A* are represented by diagonal matrices.

An alternative sufficiency proof. Let

A + A* A A*A1= , A2

2 2i

The transformations A1 and A, are self-adjoint. If A and A*commute then so do A, and A2. By Theorem I, there exists anorthonormal basis in which A, and A, are represented by diagonalmatrices. But then the same is true of A = A, + iA2.

Note that if A is a self-adjoint transformation then

AA* A*A = A2,

i.e., A is normal. A unitary transformation U is also normal sinceUU* U*U = E. Thus some of the results obtained in para. 1,§ 12 and § 13 are special cases of Theorem 2.

EXERCISES. 1. Prove that the matrices of a set of normal transformationsany two of which commute are simultaneously diagonable.

Prove that a normal transformation A can be written in the form

A = HU UH,

where H is self-adjoint, U unitary and where H and U commuteHint: Select a basis relative to which A and A* are diagonable.

Prove that if A HU, where H and U commute, H is self-adjointand U unitary, then A is normal.

§ IS. Decomposition of a linear transformation into aproduct of a unitary and self-adjoint transformation

Every complex number can be written as a product of a positivenumber and a number whose absolute value is one (the so-calledtrigonometric form of a complex number). We shall now derive ananalogous result for linear transformations.

Unitary transformations are the analog of numbers of absolutevalue one. The analog of positive numbers are the so-called positivedefinite linear transformations.

DEFINITION 1. A linear transformation H is called positivedefinite if it is self-adjoint and if (Hx, x) 0 for all x.

THEOREM 1. Every non-singular linear transformation A can be

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 120: Gelfand - Lectures on Linear Algebra

112 LECTURES ON LINEAR ALGEBRA

represented in the form

A = HU (or A = U,H,),

where H(H1) is a non-singular positive definite transformation andU(U1) a unitary transformation.

We shall first assume the theorem true and show how to findthe necessary H and U. This will suggest a way of proving thetheorem.

Thus, let A = HU, where U is unitary and H is a non-singularpositive definite transformation. H is easily expressible in terms ofA. Indeed,

so thatAA* -= H2.

Consequently, in order to find H one has to "extract the squareroot" of AA*. Having found H, we put U = H-1A.

Before proving Theorem 1 we establish three lemmas.

LEMMA 1. Given any linear transformation A, the transformationAA* is positive definite. If A is non-singular then so is AA*.

Proof: The transformation AA* is positive definite. Indeed,

(AA*)* = A**A* = AA*,

that is, AA* is self-adjoint. Furthermore,

(AA* x, x) = (A*x, A*x) 0,

for all x. Thus AA* is positive definite.If A is non-singular, then the determinant of the matrix ilai,11 of

the transformation A relative to any orthogonal basis is differentfrom zero. The determinant of the matrix I fri,211 of the transfor-mation A* relative to the same basis is the complex conjugate ofthe determinant of the matrix 11(4,0. Hence the determinant ofthe matrix of AA* is different from zero, which means that AA* isnon-singular.

LEMMA 2. The eigenvalues of a positive definite transformation Bare non-negative. Conversely, if all the eigenvalues of a self-adjointtransformation B are non-negative then B is positive definite.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 121: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 113

Proof. Let B be positive definite and let Be = 2e. Then

(Be, e) = 2(e, e).

Since (Be, e) >. 0 and (e, e) > 0, it follows that A O.

Conversely, assume that all the eigenvalues of a self-adjointtransformation B are non-negative. Let e1, e2, , e be anorthonormal basis consisting of the eigenvectors of B. Let

x = E2e2 + +e,be any vector of R. Then

(Bx, x)= (el Bel E2 Be2 + + $Be, E2e2 + -Fe)

(I) (E121e1+$222e2+ +$/1,en, E1e1fe2e2+ ±e)221E2 ... An env.

S nce all the 1 are non-negative it follows that (Bx, x) O.

NOTE: It iS clear from equality (1) that if all the A, are positivethen the transformation B is non-singular and, conversely, if B ispositive definite and non-singular then the are positive.

LEMMA 3. Given any positive definite transformation B, thereexists a positive definite transformation H such that H2 = B (inthis case we write H = Bi). In addition, if B is non-singularthen H is non-singular.

Proof: We select in R an orthogonal basis relative to which B isof the form

[Al O 0 1

B=0 A,

0 0 2where 21, 22, , Ar, are the eigenvalues of B. By Lemma 2 allA,>. O. Put

[V21. O 0

H= VA2 '

O 0 \/2App y ng Lemma 2 again we conclude that H is positive definite.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 122: Gelfand - Lectures on Linear Algebra

114 LECTURES ON LINEAR ALGEBRA

Furthermore, if B is non-singular, then (cf. note to Lemma 2)> O. Hence A/2i > 0 and H is non-singular

We now prove Theorem 1. Let A be a non-singular lineartransformation. Let

H = (AA*).

In view of Lemmas 1 and 3, H is a non-singular positive definitetransformation. If(2) U =then U is unitary. Indeed.

UU* = H--1A (H-1A)* = H-1AA* H-' = H-1112H-' = E.Making use of eq. (2) we get A = HU. This completes the proof ofTheorem 1.

The operat on of extracting the square root of a transformationcan be used to prove the following theorem:

THEOREM. Let A be a non-singular positive definite transforma-tion and let B be a self-adjoint transformation. Then the eigenvaluesof the transformation AB are real.

Proof: We know that the transformationsX = AB and C-1 XC

have the same characteristic polynomials and therefore the sameeigenvalues. If we can choose C so that C-i XC is self-adjoint,then C-1 XC and X = AB will both have real eigenvalues. Asuitable choice for C is C Ai. Then

C-1XC = A1ABA1 Ai BA+,

which is easily seen to be self-adjoint. Indeed,(Ai 132Ai )* = (Ai )* B* (Ai)* = A1 BA'.

This completes the proof.EXERCISE. Prove that if A and B are positive definite transformations, at

least one of which is non-singular, then the transformation AB has non-negative eigenvalues.

§ 16. Linear transformations on a real Euclidean spaceThis section will be devoted to a discussion of linear transfor-

mations defined on a real space. For the purpose of this discussion

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 123: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 115

the reader need only be familiar with the material of §§ 9 through 11of this chapter.

1. The concepts of invariant subspace, eigenvector, and eigen-value introduced in § 10 were defined for a vector space over anarbitrary field and are therefore relevant in the case of a realvector space. In § 10 we proved that in a complex vector spaceevery linear transformation has at least one eigenvector (one-dimensional invariant subspace). This result which played afundamental role in the development of the theory of complexvector spaces does not apply in the case of real spaces. Thus, arotation of the plane about the origin by an angle different fromhat is a linear transformation which does not have any one-dimen-sional invariant subspace. However, we can state the following

THEOREM 1. Every linear transformation in a real vector space Rhas a one-dimensional or two-dimensional invariant subspace.

Proof: Let e1, e2, , en be a basis in R and let I la ,2( be thematrix of A relative to this basis.

Consider the system of equations

(1)

6112E2 + 4222 e2 + T ainE 2E2,(4221 + a22E2 T + a2$ = 2E2,

ar1E2 T a2$2 T + annen = 2$7,.

The system 1) has a non-trivial solution if and only if

an 2 0112 ala22 2 a2

an,a2 a AThis equation is an nth order polynomial equation in A with realcoefficients. Let A be one of its roots. There arise two possibilities:

a. Ao is a real root. Then we can find numbers E1°, $20, ,not all zero which are a solution of (1). These numbers are thecoordinates of some vector x relative to the basis e1, e2, , e.We can thus rewrite (1) in the form

Ax = 2,x,

i.e., the vector x spans a one-dimensional invariant subspace.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 124: Gelfand - Lectures on Linear Algebra

1 16 LECTURES ON LINEAR ALGEBRA

b. + O. Let

+ inn E2 1.1/2, E 1.)7.

be a solution of (1 ). Replacing $i, $2, , $ in ( 1) by thesenumbers and separating the real and imaginary parts we get

(2)

and

(2)'

cJane,.

+ a12e2 = + amen --= ace' Pni,anEi + 022E2 + + azii en Cte2 A2,

ani$, a2$2 + + a7,$ = a&2

an .,1 + a12 -i-n2 ' ' ' + alniin = °U71. 4- ßE1,fra21n1 a22n2 + + a2nnii = 15t/12 /3E2,

+ 02072 + ' annyin = Gobi + ß,.

The numbers Eib e2 " en (ni, n2, n) are the coordi-nates of some vector x (y) in R. Thus the relations (2) and (2')can be rewritten as follows

Ax acx fly; Ay = + t3x.

Equations (3) imply that the two dimensional subspace spannedby the vectors x and y is invariant under A.

In the sequel we shall make use of the fact that in a two-dimen-sional invariant subspace associated with the root 2 = oc 43 thetransformation has form (3).

EXERCISE. Show that in an odd-dimensional space (in particular, three-dimensional) every transformation has a one-dimensional invariant sub-space.

2. Self-adjoint transformations

DEFINITION 1. A linear transformation A defi ed on a realEuclidean space R is said to be self-adjoint if

(Ax, y) = (x, Ay)

for any vectors x and y.

Let el, e2, e be an orthonormal basis in R and let

x e2 e2 + + enen, Y = Ghei n2e2 ' ' +Furthermore, let Ci be the coordinates of the vector z Ax, i.e.,

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 125: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 117

=a/M,k=1

where jaiklj is the matrix of A relative to the basis el, e2, , en.It follows that

(Ax, Y) = (z, Y) = E :ini = aikk?h1=1 i, k=1

Similarly,

(x, Ay) = aikeink.k=1

Thus, condition (4) is equivalent to

aik aki.

To sum up, for a linear transformation to be self-adjoint it isnecessary and sufficient that its matrix relative to an orthonormal basisbe symmetric.

Relative to an arbitrary basis every symmetric b 1 near formA (x; y) is represented by

A (X; 3r) = aikeinkk=1

where aik ak.i. Comparing (5) and (6) we obta n the followingresult:

Given a symmetric bilinear form A (x; y) there ex sts a self-adjointtransformation A such that

A (x; y) = (Ax, y).

VVe shall make use of this result in the proof of Theorem 3 ofthis section.

We shall now show that given a self-adjoint transformationthere exists an orthogonal basis relative to which the matrix ofthe transformation is diagonal. The proof of this statement will bebased on the material of para. 1. A different proof which does notdepend on the results of para. I and is thus independent of thetheorem asserting the existence of the root of an algebraic equationis given in § 17.

We first prove two lemmas.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 126: Gelfand - Lectures on Linear Algebra

118 LECTURES ON LINEAR ALGEBRA

LEMMA 1. Every self-adjoint transformation has a one-di ensionalinvariant subspace.

Proof: According to Theorem 1 of this section, to every realroot A of the characteristic equation there corresponds a one-dimensional invariant subspace and to every complex root A, atwo-dimensional invariant subspace. Thus, to prove Lemma 1we need only show that all the roots of a self-adjoint transforma-tion are real.

Suppose that A = + O. In the proof of Theorem 1 weconstructed two vectors x and y such that

Ax = ax fiy,Ay = fix + ay.

But then

(Ax, Y) = ix(x, Y) y)(x, Ay) = /3(x, x) (x, y).

Subtracting the first equation from the second we get [note that(Ax, y) = (x, Ay)]

O = 2/3[(x, x) + (y, y)].

S nce (x, x) + (y, y) = 0, it follows that 13 = O. Contradiction.

LEMMA 2. Let A be a self-adjoint transformation and el aneigenvector of A. Then the totality R' of vectors orthogonal to elforms an (n 1)-dimensional invariant subspace.

Proof: It is clear that the totality R' of vectors x, x e R,

orthogonal to e, forms an (n 1)-dimensional subspace. Weshow that R' is invariant under A.

Thus, let x e R', i.e., (x, e1) = O. Then

(Ax, = (x, Aei) = (x, 2e1) = 2(x, el) = 0,

i.e., Ax E R'.

THEOREM 2. There exists an orthonormal basis relative to whichthe matrix of a self-adjoint transformation A is diagonal.

Proof: By Lemma 1, the transformation A has at least oneeigenvector e,.

Denote by R' the subspace consisting of vectors orthogonal to e,.Since R' is invariant under A, it contains (again, by Lemma 1)

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 127: Gelfand - Lectures on Linear Algebra

LINEAR TRA.'SFORMATIONS 119

an eigenvector e, of A, etc. In this manner we obta n n pairwiseorthogonal eigenvectors e1, e2, , e .

Since

Aei = 2,e, (i = 1, 2, -, n),

the matr x of A relative to the e, is of the form

2, o - - - o[ 1

o A, o

oo

3. Reduction of a quadratic form to a sum of squares relative to anorthogonal basis (reduction to principal axes). Let A (x; y) be asymmetric bilinear form on an n-dimensional Euclidean space.We showed earlier that to each symmetric bilinear form A (x; y)there corresponds a linear self-adjoint transformation A such thatA (x; y) = (Ax, y). According to Theorem 2 of this section thereexists an orthonormal basis e1, e2, , e consisting of theeigenvectors of the transformation A (i.e., of vectors such that

Aei 2ei). With respect to such a basisA (x; y) = (Ax, y)= (A($jel $2e2 + En e), /he,. ri2e2 + nen)= -H 22 ' 2e2 + the,. + /2e2 + ' -{--Iien)

21e17l+ 22E2T2 + + 2e22,,Putting y = x we obtain the following

THEOREM 3. Let A (x; x) be a quadratic fornt on an n-dimensionalEuclidean space. T hen there exists an orthonormal basis relative towhich the quadratic form can be represented as

A (x; x)

Here the 2, are the eigenvalues of the transformation A or, equiv-alently, the roots of the characteristic equation of the matrixHaitl

For n 3 the above theorem is a theorem of solid analytic geometry.Indeed, in this case the equation

A (x; x) 1

is the equation of a central conic of order two. The orthonorrnal basis

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 128: Gelfand - Lectures on Linear Algebra

120 LECTURES ON LINEAR ALGEBRA

discussed in Theorem 3 defines in this case the coordinate system relativeto which the surface is in canonicid form. The basis vectors e1, e2, e3, aredirected along the principal axes of the surface.

4. Simultaneous reduction of a pair of quadratic forms to a sumof squares

THEORENI 4. Let A (x; x) and B(x; x) be two quadratic forms onan n-dimensional space R, and let B(x; x) be positive definite. Thenthere exists a basis in R relative to which each fornt is expressed asa sum of squares.

Proof: Let B(x; y) be the bilinear form corresponding to thequadratic form B(x; x). We define in R an inner product bymeans of the formula

(x, y) = B(x; y).

By Theorem 3 of this section there exists an orthonormal basise1, e2, ea relative to which the form A (x; x) is expressed as asum of squares, i.e.,

A (x; x) =27=1.

Relative to an orthonormal basis an inner product takes the form

(x, x) = B(x; x) = I E2.

Thus, relative to the basis e1, e2, e each quadratic formcan be expressed as a sum of squares.

5. Orthogonal transformations

DF:FINITION. A linear transformation A defined on a real n-dimen-sional Euclidean space is said to be orthogonal if it preserves innerproducts, i.e., if

(Ax, Ay) = (x, y)for all x, y E R.

Putting x =- y in (9) we getlAx12 IxJ2,

that is, an orthogonal transformation is length preservEXE RC 'SE. Prove that condition (10) is sufficient for a transformation

to be orthogonal.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 129: Gelfand - Lectures on Linear Algebra

Since(x, y)

cos 99 =ix)

and since neither the numerator nor the denominator in theexpression above is changed under an orthogonal transformation,it follows that an orthogonal transformation preserves the anglebetween two vectors.

Let e1, e2, , en be an orthonormal basis. Since an orthogonaltransformation A preserves the angles between vectors and thelength of vectors, it follows that the vectors Aei, Ae , Aelikewise form an orthonormal basis, i.e.,

{I for i k(A;, A;)0 for i k.

Now let Ila11 be the matrix of A relative to the basis e1, e2, ,en. Since the columns of this matrix are the coordinates of thevectors Ae conditions (11) can be rewritten as follows:

{1 for i = kanan =0 for i k.a-1

EXERCISE. Show that conditions (I1) and, consequently, conditions (12)are sufficient for a transformation to be orthogonal.

Conditions (12) can be written in matrix form. Indeed,

I axian are the elements of the product of the transpose of thea=1

matrix of A by the matrix of A. Conditions (12) imply thatthis product is the unit matrix. Since the determinant of the pro-duct of two matrices is equal to the product of the determinants,it follows that the square of the determinant of a matrix of anorthogonal transformation is equal to one, i.e., the determinant of amatrix of an orthogonal transformation is equal to + 1.

An orthogonal transformation whose determinant is equal to+ lis called a proper orthogonal transformation, whereas an ortho-gonal transforMation whose determinant is equal to 1 is calledimproper.

EXERCISE. Show that the product of two proper or two improperorthogonal transformations is a proper orthogonal transformation and theproduct of a proper by an improper orthogonal transformation is animproper orthogonal transformation.

LINEAR TRANSFORMATIONS 121

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 130: Gelfand - Lectures on Linear Algebra

122 LECTURES ON LINEAR ALGEBRA

NOTE: What motivates the division of orthogonal transformations intoproper and improper transformations is the fact that any orthogonal trans-formation which can be obtained by continuous deformation from theidentity transformation is necessarily proper. Indeed, let A, be an orthogo-nal transformation which depends continuously on the parameter t (thismeans that the elements of the matrix of the transformation relative to somebasis are continuous functions of t) and let An = E. Then the determinantof this transformation is also a continuous function of t. Since a continuousfunction which assumes the values ± I only is a constant and since fort 0 the determinant of A, is equal to 1, it follows that for t 0 thedeterminant of the transformation is equal to 1. Making use of Theorem 5of this section one can also prove the converse, namely, that every properorthogonal transformation can be obtained by continuous deformation ofthe identity transformation.

We now turn to a discussion of orthogonal transformat ons inone-dimensional and tviro-dimensional vector spaces. In the sequelwe shall show that the study of orthogonal transformations in aspace of arbitrary dimension can be reduced to the study of thesetwo simpler cases.

Let e be a vector generating a one-dimensional space and A anorthogonal transformation defined on that space. Then Ae Ae

and since (Ae, Ae) = (e, e), we have 2.2(e, e) = (e, e), i.e., A = 1.

Thus we see that in a one-dimensional vector space there existtwo orthogonal transformations only: the transformation Ax xand the transformation Ax an x. The first is a proper and thesecond an improper transformation.

Now, consider an orthogonal transformation A on a two-dimensional vector space R. Let e1, e2 be an orthonormal basis inR and let

[7/ /be the matrix of A relative to that basis.

We first study the case when A is a proper orthogonal trans-formation, i.e., we assume that acó ßy -= 1.

The orthogonality condition implies that the product of thematrix (13) by its transpose is equal to the unit matrix, i.e., that

(14) Fa )51-1

Ly JFa vlLß fit

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 131: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 123

Since the determinant of the matrix (13) is equal to one, we have

(15)fi'br --13.1.

It follows from (14) and (15) that in this case the matrix of thetransformation is

rwhere a2 + ß2 = 1. Putting x = cos q», ß sin qi we find thatthe matrix of a proper orthogonal transformation on a two dimensionalspace relative to an orthogonal basis is of the form

[cos 9)

sinsin 92-1

cos 9'I

(a rotation of the plane by an angle go).Assume now that A is an improper orthogonal transformation,

that is, that GO ßy = 1. In this case the characteristicequation of the matrix (13) is A2 (a + 6)2 1 = O and, thus,has real roots. This means that the transformation A has aneigenvector e, Ae = /le. Since A is orthogonal it follows thatAe ±e. Furthermore, an orthogonal transformation preservesthe angles between vectors and their length. Therefore any vectore, orthogonal to e is transformed by A into a vector orthogonal toAe ±e, i.e., Ae, +e,. Hence the matrix of A relative to thebasis e, e, has the form

F±IL o +1j.

Since the determinant of an improper transformation is equal to-- 1, the canonical form of the matrix of an improper orthogonaltransformation in two-dimensional space is

a reflection in one of the axes).We now find the simplest form of the matrix of an orthogonal

transformation defined on a space of arbitrary dimension.

(

HE oi 1 01Or

L o o +1

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 132: Gelfand - Lectures on Linear Algebra

124 LECTURES ON LINEAR ALGEBRA

THEOREM 5. Let A be an orthogonal transforma/ion defined on ann-dimensional Euclidean space R. Then there exists an orthonormalbasis el, e,, , e ofR relative to which the matrix of the transforma-tion is

1cos 92, sin 921

sin 921 cos ch.

COS 92k -sin 92, cos 99,_

where the unspecified entries have value zero.Proof: According to Theorem 1 of this section R contains a

one-or two-dimensional invariant subspace Ru). If there exists aone-dimensional invariant subspace WI) we denote by el a vectorof length one in that space. Otherwise Wu is two dimensional andwe choose in it an orthonormal basis e1, e,. Consider A onIn the case when R(') is one-dimensional, A takes the form Ax= x. If Wu is two dimensional A is a proper orthogonal trans-formation (otherwise R") would contain a one-dimensionalinvariant subspace) and the matrix of A in Rn) is of the form

rcos sin wi

Lsin cos (pi

The totality 11 of vectors orthogonal to all the vectors of Rn)forms an invariant subspace.

Indeed, consider the case when Rn) is a two-dimensional space,say. Let x e ft., i.e.,

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 133: Gelfand - Lectures on Linear Algebra

1

LINEAR TRANSFORMATIONS 125

(x, y) = O for all y e R(1).

Since (Ax, Ay) = (x, y), it follows that (Ax, Ay) = O. As yvaries over all of W1, z = Ay likewise varies over all of 14(1.Hence (Ax, z) = 0 for all z e ml), i.e., Ax e it, We reason analo-gously if Wn is one-dimensional. If WI) is of dimension one, it isof dimension n 1. Again, if Wu is of dimension two, it is ofdimension n 2. Indeed, in the former case, it is the totalityof vectors orthogonal to the vector el, and in the latter case, R isthe totality of vectors orthogonal to the vectors el and e2.

We now find a one-dimensional or two-dimensional invariantsubspace of R, select a basis in it, etc.

In this manner we obtain n pairwise orthogonal vectors of lengthone which form a basis of R. Relative to this basis the matrix ofthe transformation is of the form

1

1cos qpi sin 921

sin go, cos q),,

cos qik sin w,sin 92, cos q)k_

where the +1 on the principal diagonal correspond to one-dimen-sional invariant subspaces and the "boxes"

[cos Ti sin T.]sin T., cos q),

correspond to two-dimensional invariant subspaces This com-pletes the proof of the theorem.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 134: Gelfand - Lectures on Linear Algebra

126 LECTURES ON LINEAR ALGEBRA

NOTE: A proper orthogonal transformation which represents a rotationof a two-dimensional plane and which leaves the (n 2)-dimensionalsubspace orthogonal to that plane fixed is called a simple rotation. Relativeto a suitable basis its matrix is of the form

1

cos q sin 9)sin yo cos w

1

An improper orthogonal transformation which reverses all vectors ofsome one-dimensional subspace and leaves all the vectors of the (n 1)-dimensional complement fixed is called a simple reflection. Relative to asuitable basis its matrix takes the form

1

Making use of Theorem 5 one can easily show that every orthogonaltransformation can be written as the product of a number of simple rota-tions and simple reflections. The proof is left to the reader.

§ 17. Extremal properties of eigenvalues

In this section we show that the eigenvalues of a self-adjointlinear transformation defined on an n-dimensional Euclideanspace can be obtained by considering a certain minimum problemconnected with the corresponding quadratic form (Ax, x). Thisapproach win, in particular permit us to prove the existence ofeigenvalues and eigenvectors without making use of the theorem

1

1

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 135: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 127

on the existence of a root of an nth order equation. The extremalproperties are also useful in computing eigenvalues. We shallfirst consider the case of a real space and then extend our resultsto the case of a complex space.

We first prove the following lemma:

LEMMA 1. Let B be a self-adjoint linear transformation on a realspace such that the quadratic form (Bx, x) is non-negative, i.e.,such that

(Bx, x) for all X.If for some vector x = e

(Be, e) = 0,then Be = O.

Proof: Let x = e + th, where t is an arb trary number and h avector. We have(B(e th), e + th) = (Be, e) + t(Be, h) t(Bh, e) + t2(Bh, h)

> O.

Since (Bh, e) = (h, Be) = (Be, h) and (Be, e) -= 0, then 2t(Be, h)t2(Bh, h) 0 for all t. But this means that (Be, h) = O.Indeed, the function at + bt2 with a 0 changes sign at t = O.

However, in our case the expression

2t(Be, h) t2(Bh, h)

is non-negative for all t. It follows that

(Be, h) = O.

Since h was arbitrary, Be = O. This proves the lemma.Let A be a self-adjoint linear transformation on an n-dimensional

real Euclidean space. We shall consider the quadratic form(Ax, x) which corresponds to A on the unit sphere, i.e., on the setof vectors x such that

(x, x) = 1.THEOREM 1. Let A be a selpadjoint linear transformation. Then

the quadratic form (Ax, x) corresponding to A assumes its minimumon the unit sphere. The vector e, at which the minimum

is assumed is an eigenvector of A and A, is the corresponding eigen-value.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 136: Gelfand - Lectures on Linear Algebra

128 LECTURES ON LINEAR ALGEBRA

Proof: The unit sphere is a closed and bounded set in n-dimen-sional space. Since (Ax, x) is continuous on that set it mustassume its minimum 2, at some point e,. We have

(Ax, x) 2, for (x, x) = 1,

and

(Aei, el) = 2, where (e1, e1) = 1.

Inequality (1) can be rewritten as follows

(Ax, x) 21(x, x), where (x, x) = 1.

This inequality holds for vectors of unit length. Note that if wemultiply x by some number a, then both sides of the inequalitybecome multiplied by a2. Since any vector can be obtained from avector of unit length by multiplying it by some number a, itfollows that inequality (2) holds for vectors of arbitrary length.

We now rewrite (2) in the form

(Ax x) O for all x.

In particular, for x el, we have

(Ae, 21e,, e) = O.

This means that the transformation B = A 21E satisfies theconditions of Lemma 1. Hence

(A 21E)e1 = 0, i.e., Ae, = 21e1.

We have shown that el is an eigenvector of the transformationA corresponding to the eigenvalue 2,. This proves the theorem.

To find the next eigenvalue of A we consider all vectors of Rorthogonal to the eigenvector e,. As was shown in para. 2, § 16(Lemma 2), these vectors form an (n 1)-dimensional subspaceR, invariant under A. The required second eigenvalue A, of A isthe minimum of (Ax, x) on the unit sphere in It. The corre-sponding eigenvector e, is the point in R, at which the minimumis assumed.

Obviously, A, A, since the minimum of a function consideredon the whole space cannot exceed the minimum of the function in asubspace.

We obtain the next eigenvector by solving the same problem in

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 137: Gelfand - Lectures on Linear Algebra

the (n 2)-dimensional subspace consisting of vectors orthogonalto both e, and e,. The third eigenvalue of A is equal to theminimum of (Ax, x) in that subspace.

Continuing in this manner we find all the n eigenvalues and thecorresponding eigenvectors of A.

It is sometimes convenient to determine the second, third, etc., eigen-vector of a transformation from the extremum problem without referenceto the preceding eigenvectors.

Let A be a self-adjoint transformation. Denote by

A, < A, < 5 An

its eigenvalues and by e eo, , e the corresponding orthonormaleigenvectors.

We shall show that if S is the subs pace spanned by the first k eigenvectors

e1, e2, , ek

then for each x e S the lollowing inequality holds:

A, (x, x) (Ax, x) (x, x).Indeed, let

x = ekek eoeo + + ekeo.

Since Aek = 2,e,, (e ek) = 1 and (ek, e,) O for i k, it follows that

(Ax, x) (A (Eke/ eze, + ¿kek), Eke, + eke, + ' exek)= (Akeke, -L + + Ake ke k) ek + + Ekek)= 4E1' + A2E22 + + Ake k2.

Furthermore, since e,, e ek are orthonormal,

(x, x) 812 + 8,2 + ' ' ek2

and therefore

(Ax, x) = A 2E 22 + + AkEk2 4($12 82' + -E =-

= Adx, x).Similarly,

(Ax, x) 2.(x, x).It follows that

;t,(x, x) (Ax, x) 1.k(x, x).

Now let Rk be a subspace of dimension n k + 1. In § 7 (Lemma ofpara. 1) we showed that if the sum of the dimensions of two subspaces of ann-dimensional space is greater than n, then there exists a vector differentfrom zero belonging to both subspaces. Since the sum of the dimensions ofRk and S is (n k + 1) + k it follows that there exists a vector x,common to both Roc and S. We can assume that xo has unit length, that is,

LINEAR TRANSFORMATIONS 129

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 138: Gelfand - Lectures on Linear Algebra

130 LECTURES ON LINEAR ALGEBRA

(x,, x,,) = 1. Since (Ax, x) 4 (x, x) for x e S, it follows that(Axo, xo) 2.

We have thus shown that there exists a vector xo E Rk of unit lengthsuch that

(AX0, Ro) Ak

But then the minimum of (Ax, x) for x on the unit sphere in Rk must beequal to or less than Ak.

To sum up: If Rk is an k 1)-dimensional subspace and x variesover all vectors in R, for which (x, x) = 1, then

min (Ax, x) A,.

Note that among all the subspaces of dimension n k 1 there existsone for which min (Ax, x), (x, x) = I, x e 12.0, is actually equal to Ak.This is the subspace consisting of all vectors orthogonal to the first keigenvectors et, e, , e. Indeed, we showed in this section that min(Ax, x), (x, x) = 1, taken over all vectors orthogonal to et, et, , et,is equal to ;I.,.

We have thus proved the following theorem:

THEOREM. Let R be a (n k + 1)-dimensional subspace of the space R.Then min (Ax, x) for all x elt,, (x, x) = 1, is less than or equal to A,. Thesubspace Rk can be chosen so that min (Ax, x) is equal to A,.

Our theorem can be expressed by the formula

(3) max min (Ax, x) -= 4.Rk (x,

xe Rk

In this formula the minimum is taken over all x e R,, (x, x) = 1, andthe maximum over all subspaces Rk of dimension n k + 1.

As a consequence of our theorem we have:Let A be a sell-adjoint linear transformation and B a postive definite linear

transformation. Let A, A, A be the eigenvalues of A and lel" ,u be the eigenvalues of A -7 B. Then A, f

Indeed(Ax, x) ((A + 13)x, x),

for all x. Hence for any (n k + 1)-dimensional subspace Rk we have

min (Ax, x) min ((A B)x, x).(x, xi=1 X)=-1xe Rk xeRk

It follows that the maximum of the expression on the left side taken overall subspaces Rk does not exceed the maximum of the right side. Since, byformula (3), the maximum of the left side is equal to A, and the maximumof the right side is equal to 1.4, we have 2,

We now extend our results to the case of a complex space.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 139: Gelfand - Lectures on Linear Algebra

LINEAR TRANSFORMATIONS 131

To this end we need only substitute for Lemma I the follovvinglemma.

LEMMA 2. Let B be a self-adjoint transformation on a complexspace and let the Hermitian form (Bx, x) corresponding to B benon-negative, i.e., let

(Bx, x) 0 foy all x.

If for some vector e, (Be, e) = 0, then Be = O.Proof: Let t be an arbitrary real number and h a vector. Then

(B (e th), e -r- th) 0,

or, since (Be, e) = 0,

t[(Be, h) (Eh, e)] + t2(Bh, h)

for all t. It follows that

(Be, h) (Bh, e) = O.

Since h was arbitrary, we get, by putting ih in place of h,

i(Be, h) i(Bh, e) O.

It follows from (4) and (5) that

(Be, h) = 0,

and therefore Be = O. This proves the lemma.

All the remaining results of this section as well as their proofscan be carried over to complex spaces without change.

PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor

Page 140: Gelfand - Lectures on Linear Algebra

CHAPTER III

The Canonical Form of an ArbitraryLinear Transformation

§ 18. The canonical form of a linear transformation

In chapter II we discussedvarious classes of linear transformationson an n-dimensional vector space which have n linearly independ-ent eigenvectors. We found that relative to the basis consistingof the eigenvectors the matrix of such a transformation had aparticularly simple form, namely, the so-called diagonal form.

However, the number of linearly independent eigenvectors ofa linear transformation can be less than n. i (An example of such atransformation is given in the sequel; cf. also § 10, para. 1, Example3). Clearly, such a transformation is not diagonable since, asnoted above, any basis relative to which the matrix of a transfor-mation is diagonal consists of linearly independent eigenvectorsof the transformation. There arises the question of the simplestform of such a transformation.

In this chapter we shall find for an arbitrary transformation abasis relative to which the matrix of the transformation has acomparatively simple form (the so-called Jordan canonical form).In the case when the number of linearly independent eigenvectorsof the transformation is equal to the dimension of the space thecanonical form will coincide with the diagonal form. We nowformulate the definitive result which we shall prove in § 19.

Let A be an arbitrary linear transformation on a complex n-dimensional space and let A have k (k ≤ n) linearly independent eigenvectors ¹

e_1, f_1, ···, h_1

corresponding to the eigenvalues λ_1, λ_2, ···, λ_k. Then there exists a basis consisting of k sets of vectors ²

(1)  e_1, ···, e_p;  f_1, ···, f_q;  ···;  h_1, ···, h_s

relative to which the transformation A has the form:

(2)  Ae_1 = λ_1 e_1,  Ae_2 = e_1 + λ_1 e_2,  ···,  Ae_p = e_{p−1} + λ_1 e_p;
     Af_1 = λ_2 f_1,  Af_2 = f_1 + λ_2 f_2,  ···,  Af_q = f_{q−1} + λ_2 f_q;
     ··································································
     Ah_1 = λ_k h_1,  Ah_2 = h_1 + λ_k h_2,  ···,  Ah_s = h_{s−1} + λ_k h_s.

¹ We recall that if the characteristic polynomial has n distinct roots, then the transformation has n linearly independent eigenvectors. Hence for the number of linearly independent eigenvectors of a transformation to be less than n it is necessary that the characteristic polynomial have multiple roots. Thus, this case is, in a sense, exceptional.

² Clearly, p + q + ··· + s = n. If k = n, then each set consists of one vector only, namely an eigenvector.

We see that the linear transformation A described by (2) takes the basis vectors of each set into linear combinations of vectors in the same set. It therefore follows that each set of basis vectors generates a subspace invariant under A. We shall now investigate A more closely.

Every subspace generated by each one of the k sets of vectors contains an eigenvector. For instance, the subspace generated by the set e_1, ···, e_p contains the eigenvector e_1. We show that each subspace contains only one (to within a multiplicative constant) eigenvector. Indeed, consider the subspace generated by the vectors e_1, e_2, ···, e_p, say. Assume that some vector of this subspace, i.e., some linear combination of the form

c_1 e_1 + ··· + c_p e_p,

where not all the c's are equal to zero, is an eigenvector, that is,

A(c_1 e_1 + ··· + c_p e_p) = λ(c_1 e_1 + ··· + c_p e_p).

Substituting the appropriate expressions of formula (2) on the left side we obtain

c_1 λ_1 e_1 + c_2(e_1 + λ_1 e_2) + ··· + c_p(e_{p−1} + λ_1 e_p) = λc_1 e_1 + ··· + λc_p e_p.

Equating the coefficients of the basis vectors we get a system of equations for the numbers λ, c_1, c_2, ···, c_p:

c_1 λ_1 + c_2 = λc_1,
c_2 λ_1 + c_3 = λc_2,
·····················
c_{p−1} λ_1 + c_p = λc_{p−1},
c_p λ_1 = λc_p.

We first show that λ = λ_1. Indeed, if λ ≠ λ_1, then it would follow from the last equation that c_p = 0 and from the remaining equations that c_{p−1} = c_{p−2} = ··· = c_2 = c_1 = 0. Hence λ = λ_1. Substituting this value for λ we get from the first equation c_2 = 0, from the second, c_3 = 0, ···, and from the last, c_p = 0. This means that the eigenvector is equal to c_1 e_1 and, therefore, coincides (to within a multiplicative constant) with the first vector of the corresponding set.

We now write down the matrix of the transformation (2). Since the vectors of each set are transformed into linear combinations of vectors of the same set, it follows that in the first p columns the row indices of possible non-zero elements are 1, 2, ···, p; in the next q columns the row indices of possible non-zero elements are p + 1, p + 2, ···, p + q, and so on. Thus, the matrix of the transformation relative to the basis (1) has k boxes along the main diagonal. The elements of the matrix which are outside these boxes are equal to zero.

To find out what the elements in each box are it suffices to note how A transforms the vectors of the appropriate set. We have

Ae_1 = λ_1 e_1,
Ae_2 = e_1 + λ_1 e_2,
·····················
Ae_{p−1} = e_{p−2} + λ_1 e_{p−1},
Ae_p = e_{p−1} + λ_1 e_p.

Recalling how one constructs the matrix of a transformation relative to a given basis we see that the box corresponding to the set of vectors e_1, e_2, ···, e_p has the form

(3)  [ λ_1  1    0   ···  0    0
       0    λ_1  1   ···  0    0
       ··························
       0    0    0   ···  λ_1  1
       0    0    0   ···  0    λ_1 ].


The matrix of A consists of similar boxes of orders p, q, ···, s, that is, it has the form

(4)  [ 𝒜_1  0    ···  0
       0    𝒜_2  ···  0
       ··················
       0    0    ···  𝒜_k ],

where 𝒜_1 denotes the box (3) of order p with λ_1 along its principal diagonal, 𝒜_2 the analogous box of order q with λ_2 along its principal diagonal, ···, and 𝒜_k the box of order s with λ_k along its principal diagonal. Here all the elements outside of the boxes are zero.

Although a matrix in the canonical form described above seems more complicated than a diagonal matrix, say, one can nevertheless perform algebraic operations on it with relative ease. We show, for instance, how to compute a polynomial in the matrix (4). The matrix (4) has the form

𝒜 = [ 𝒜_1  0    ···  0
      0    𝒜_2  ···  0
      ··················
      0    0    ···  𝒜_k ],

where the 𝒜_i are square boxes and all other elements are zero. Then

𝒜² = [ 𝒜_1²  0     ···  0
       0     𝒜_2²  ···  0
       ····················
       0     0     ···  𝒜_k² ],   ···,   𝒜^m = [ 𝒜_1^m  0  ···  0; 0  𝒜_2^m  ···  0; ···; 0  0  ···  𝒜_k^m ],

that is, in order to raise the matrix 𝒜 to some power all one has to do is raise each one of the boxes to that power. Now let P(t) = a_0 + a_1 t + ··· + a_m t^m be any polynomial. It is easy to see that

P(𝒜) = [ P(𝒜_1)  0        ···  0
         0        P(𝒜_2)  ···  0
         ··························
         0        0        ···  P(𝒜_k) ].

We now show how to compute P(𝒜_1), say. First we write the matrix 𝒜_1 in the form

𝒜_1 = λ_1 ℰ + 𝒥,

where ℰ is the unit matrix of order p and where the matrix 𝒥 has the form

𝒥 = [ 0 1 0 ··· 0
      0 0 1 ··· 0
      ·············
      0 0 0 ··· 1
      0 0 0 ··· 0 ].

We note that the matrices 𝒥², 𝒥³, ···, 𝒥^{p−1} are of the form ²

𝒥² = [ 0 0 1 0 ··· 0
       0 0 0 1 ··· 0
       ················
       0 0 0 0 ··· 0 ],   etc.,

and 𝒥^p = 𝒥^{p+1} = ··· = 0.

It is now easy to compute P(𝒜_1). In view of Taylor's formula a polynomial P(t) can be written as

P(t) = P(λ_1) + (t − λ_1)P'(λ_1) + (t − λ_1)²/2! · P''(λ_1) + ··· + (t − λ_1)^n/n! · P^{(n)}(λ_1),

where n is the degree of P(t). Substituting for t the matrix 𝒜_1 we get

P(𝒜_1) = P(λ_1)ℰ + (𝒜_1 − λ_1 ℰ)P'(λ_1) + (𝒜_1 − λ_1 ℰ)²/2! · P''(λ_1) + ··· + (𝒜_1 − λ_1 ℰ)^n/n! · P^{(n)}(λ_1).

But 𝒜_1 − λ_1 ℰ = 𝒥. Hence

P(𝒜_1) = P(λ_1)ℰ + P'(λ_1)𝒥 + P''(λ_1)/2! · 𝒥² + ··· + P^{(n)}(λ_1)/n! · 𝒥^n.

Recalling that 𝒥^p = 𝒥^{p+1} = ··· = 0, we get

P(𝒜_1) = [ P(λ_1)  P'(λ_1)/1!  P''(λ_1)/2!  ···  P^{(p−1)}(λ_1)/(p−1)!
           0       P(λ_1)      P'(λ_1)/1!   ···  P^{(p−2)}(λ_1)/(p−2)!
           ··························································
           0       0           0            ···  P(λ_1) ].

Thus in order to compute P(𝒜_1) where 𝒜_1 has order p it suffices to know the value of P(t) and its first p − 1 derivatives at the point λ_1, where λ_1 is the eigenvalue of 𝒜_1. It follows that if the matrix 𝒜 has canonical form (4) with boxes of order p, q, ···, s, then to compute P(𝒜) one has to know the values of P(t) at the points t = λ_1, λ_2, ···, λ_k as well as the values of the first p − 1 derivatives at λ_1, the first q − 1 derivatives at λ_2, ···, and the first s − 1 derivatives at λ_k.

² The powers of the matrix 𝒥 are most easily computed by observing that 𝒥e_1 = 0, 𝒥e_2 = e_1, ···, 𝒥e_p = e_{p−1}. Hence 𝒥²e_1 = 0, 𝒥²e_2 = 0, 𝒥²e_3 = e_1, ···, 𝒥²e_p = e_{p−2}. Similarly, 𝒥³e_1 = 𝒥³e_2 = 𝒥³e_3 = 0, 𝒥³e_4 = e_1, ···, 𝒥³e_p = e_{p−3}.
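As a quick numerical illustration of this rule, the following sketch (in Python with numpy, assumed available; the block size, the eigenvalue and the polynomial are arbitrary choices, not taken from the text) computes a polynomial of a single box both directly and from the derivative formula and checks that the two results agree.

```python
import numpy as np
from math import factorial

def jordan_block(lam, p):
    # a single box of order p with eigenvalue lam, as in matrix (3)
    return lam * np.eye(p) + np.diag(np.ones(p - 1), 1)

# P(t) = 2 + t + 3t^2 + t^4, coefficients in increasing powers of t
coeffs = [2.0, 1.0, 3.0, 0.0, 1.0]

def P(t):
    return sum(c * t**k for k, c in enumerate(coeffs))

def P_derivative(t, m):
    # m-th derivative of P at the point t
    return sum(factorial(k) // factorial(k - m) * c * t**(k - m)
               for k, c in enumerate(coeffs) if k >= m)

lam, p = 2.0, 4
A1 = jordan_block(lam, p)

# direct evaluation of the polynomial in the matrix A1
direct = sum(c * np.linalg.matrix_power(A1, k) for k, c in enumerate(coeffs))

# evaluation by the formula: the m-th superdiagonal carries P^(m)(lam)/m!
by_formula = np.zeros((p, p))
for m in range(p):
    by_formula += P_derivative(lam, m) / factorial(m) * np.diag(np.ones(p - m), m)

assert np.allclose(direct, by_formula)
print(by_formula)
```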

§ 19. Reduction to canonical form

In this section we prove the following theorem ³:

THEOREM 1. Let A be a linear transformation on a complex n-dimensional space. Then there exists a basis relative to which the matrix of the linear transformation has canonical form. In other words, there exists a basis relative to which A has the form (2) (§ 18).

We prove the theorem by induction, i.e., we assume that the required basis exists in a space of dimension n and show that such a basis exists in a space of dimension n + 1. We need the following lemma:

LEMMA. Every linear transformation A on an n-dimensional complex space R has at least one (n − 1)-dimensional invariant subspace R'.

Proof: Consider the adjoint A* of A. Let e be an eigenvector of A*,

A*e = λe.

We claim that the (n − 1)-dimensional subspace R' consisting of all vectors x orthogonal ⁴ to e, that is, all vectors x for which (x, e) = 0, is invariant under A. Indeed, let x ∈ R', i.e., (x, e) = 0. Then

(Ax, e) = (x, A*e) = (x, λe) = 0,

that is, Ax ∈ R'. This proves the invariance of R' under A.

³ The main idea for the proof of this theorem is due to I. G. Petrovsky. See I. G. Petrovsky, Lectures on the Theory of Ordinary Differential Equations, chapter 6.

⁴ We assume here that R is Euclidean, i.e., that an inner product is defined on R. However, by changing the proof slightly we can show that the Lemma holds for any vector space R.

We now turn to the proof of Theorem 1.

Let A be a linear transformation on an (n + 1)-dimensional space R. According to our lemma there exists an n-dimensional subspace R' of R, invariant under A. By the induction assumption we can choose a basis in R' relative to which A is in canonical form. Denote this basis by

e_1, e_2, ···, e_p;  f_1, f_2, ···, f_q;  ···;  h_1, h_2, ···, h_s,

where p + q + ··· + s = n. Considered on R' alone, the transformation A has relative to this basis the form

(1)  Ae_1 = λ_1 e_1,
     Ae_2 = e_1 + λ_1 e_2,
     ······················
     Ae_p = e_{p−1} + λ_1 e_p,
     Af_1 = λ_2 f_1,
     Af_2 = f_1 + λ_2 f_2,
     ······················
     Af_q = f_{q−1} + λ_2 f_q,
     ······················
     Ah_1 = λ_k h_1,
     Ah_2 = h_1 + λ_k h_2,
     ······················
     Ah_s = h_{s−1} + λ_k h_s.

We now pick a vector e which together with the vectors

e_1, e_2, ···, e_p;  f_1, f_2, ···, f_q;  ···;  h_1, h_2, ···, h_s

forms a basis in R.

Applying the transformation A to e we get

Ae = α_1 e_1 + ··· + α_p e_p + β_1 f_1 + ··· + β_q f_q + ··· + δ_1 h_1 + ··· + δ_s h_s + τe. ⁵

⁵ The linear transformation A has in the (n + 1)-dimensional space R the eigenvalues λ_1, ···, λ_k and τ. Indeed, the matrix of A relative to the basis e_1, e_2, ···, e_p; f_1, f_2, ···, f_q; ···; h_1, h_2, ···, h_s, e is triangular with the numbers λ_1, λ_2, ···, λ_k, τ on the principal diagonal. Since the eigenvalues of a triangular matrix are equal to the entries on the diagonal (cf. for instance, § 10, para. 4) it follows that λ_1, ···, λ_k and τ are the eigenvalues of A considered on the (n + 1)-dimensional space R. Thus, as a result of the transition from the n-dimensional invariant subspace R' to the (n + 1)-dimensional space R the number of eigenvalues is increased by one, namely, by the eigenvalue τ.

We can assume that τ = 0. Indeed, if relative to some basis A is in canonical form then relative to the same basis A − τE is also in canonical form and conversely. Hence if τ ≠ 0 we can consider the transformation A − τE instead of A.

This justifies our putting

(2)  Ae = α_1 e_1 + ··· + α_p e_p + β_1 f_1 + ··· + β_q f_q + ··· + δ_1 h_1 + ··· + δ_s h_s.

We shall now try to replace the vector e by some vector e' so that the expression for Ae' is as simple as possible. We shall seek e' in the form

e' = e − χ_1 e_1 − ··· − χ_p e_p − μ_1 f_1 − ··· − μ_q f_q − ··· − ω_1 h_1 − ··· − ω_s h_s.

We have

Ae' = Ae − A(χ_1 e_1 + ··· + χ_p e_p) − A(μ_1 f_1 + ··· + μ_q f_q) − ··· − A(ω_1 h_1 + ··· + ω_s h_s),

or, making use of (1),

(3)  Ae' = α_1 e_1 + ··· + α_p e_p + β_1 f_1 + ··· + β_q f_q + ··· + δ_1 h_1 + ··· + δ_s h_s
           − A(χ_1 e_1 + ··· + χ_p e_p) − A(μ_1 f_1 + ··· + μ_q f_q) − ··· − A(ω_1 h_1 + ··· + ω_s h_s).

The coefficients χ_1, ···, χ_p; μ_1, ···, μ_q; ···; ω_1, ···, ω_s can be chosen arbitrarily. We will choose them so that the right side of (3) has as few terms as possible.

We know that to each set of basis vectors in the n-dimensional space R' relative to which A is in canonical form there corresponds one eigenvalue.

These eigenvalues may or may not be all different from zero. We consider first the case when all the eigenvalues are different from zero. We shall show that in this case we can choose a vector e' so that Ae' = 0, i.e., we can choose χ_1, ···, ω_s so that the right side of (3) becomes zero. Assume this to be feasible. Then since the transformation A takes the vectors of each set into a linear combination of vectors of the same set it must be possible to select χ_1, ···, ω_s so that the linear combination of each set of vectors vanishes. We show how to choose the coefficients χ_1, χ_2, ···, χ_p so that the linear combination of the vectors e_1, ···, e_p in (3) vanishes. The terms containing the vectors e_1, e_2, ···, e_p are of the form

α_1 e_1 + ··· + α_p e_p − A(χ_1 e_1 + ··· + χ_p e_p)
   = α_1 e_1 + ··· + α_p e_p − [χ_1 λ_1 e_1 + χ_2(e_1 + λ_1 e_2) + ··· + χ_p(e_{p−1} + λ_1 e_p)]
   = (α_1 − χ_1 λ_1 − χ_2)e_1 + (α_2 − χ_2 λ_1 − χ_3)e_2 + ··· + (α_{p−1} − χ_{p−1} λ_1 − χ_p)e_{p−1} + (α_p − χ_p λ_1)e_p.

We put the coefficient of e_p equal to zero and determine χ_p (this can be done since λ_1 ≠ 0); next we put the coefficient of e_{p−1} equal to zero and determine χ_{p−1}, etc. In this way the linear combination of the vectors e_1, ···, e_p in (3) vanishes. The coefficients of the other sets of vectors are computed analogously.

We have thus determined e' so that

Ae' = 0.

By adding this vector to the basis vectors of R' we obtain a basis

e';  e_1, e_2, ···, e_p;  f_1, f_2, ···, f_q;  ···;  h_1, h_2, ···, h_s

in the (n + 1)-dimensional space R relative to which the transformation is in canonical form. The vector e' forms a separate set. The eigenvalue associated with e' is zero (or τ if we consider the transformation A rather than A − τE).

Consider now the case when some of the eigenvalues of the transformation A on R' are zero. In this case the summands on the right side of (3) are of two types: those corresponding to sets of vectors associated with an eigenvalue different from zero and those associated with an eigenvalue equal to zero. The sets of the former type can be dealt with as above; i.e., for such sets we can choose coefficients so that the appropriate linear combinations of vectors in each set vanish.

Let us assume that we are left with, say, three sets of vectors, e_1, e_2, ···, e_p; f_1, f_2, ···, f_q; g_1, g_2, ···, g_r, whose eigenvalues are equal to zero, i.e., λ_1 = λ_2 = λ_3 = 0. Then

(4)  Ae' = α_1 e_1 + ··· + α_p e_p + β_1 f_1 + ··· + β_q f_q + γ_1 g_1 + ··· + γ_r g_r
           − A(χ_1 e_1 + ··· + χ_p e_p) − A(μ_1 f_1 + ··· + μ_q f_q) − A(ν_1 g_1 + ··· + ν_r g_r).

Since λ_1 = λ_2 = λ_3 = 0, it follows that

Ae_1 = 0,  Ae_2 = e_1,  ···,  Ae_p = e_{p−1};
Af_1 = 0,  Af_2 = f_1,  ···,  Af_q = f_{q−1};
Ag_1 = 0,  Ag_2 = g_1,  ···,  Ag_r = g_{r−1}.

Therefore the linear combination of the vectors e_1, e_2, ···, e_p appearing on the right side of (4) will be of the form

(α_1 − χ_2)e_1 + (α_2 − χ_3)e_2 + ··· + (α_{p−1} − χ_p)e_{p−1} + α_p e_p.

By putting χ_2 = α_1, χ_3 = α_2, ···, χ_p = α_{p−1} we annihilate all vectors except α_p e_p. Proceeding in the same manner with the sets f_1, ···, f_q and g_1, ···, g_r we obtain a vector e' such that

Ae' = α_p e_p + β_q f_q + γ_r g_r.

It might happen that α_p = β_q = γ_r = 0. In this case we arrive at a vector e' such that

Ae' = 0,

and just as in the first case, the transformation A is already in canonical form relative to the basis e'; e_1, ···, e_p; f_1, ···, f_q; ···; h_1, ···, h_s. The vector e' forms a separate set and is associated with the eigenvalue zero.

Assume now that at least one of the coefficients α_p, β_q, γ_r is different from zero. Then, in distinction to the previous cases, it becomes necessary to change some of the basis vectors of R'. We illustrate the procedure by considering the case α_p ≠ 0, β_q ≠ 0, γ_r ≠ 0 and p > q > r. We form a new set of vectors by putting e'_{p+1} = e', e'_p = Ae'_{p+1}, e'_{p−1} = Ae'_p, ···, e'_1 = Ae'_2. Thus

e'_{p+1} = e',
e'_p = Ae'_{p+1} = α_p e_p + β_q f_q + γ_r g_r,
e'_{p−1} = Ae'_p = α_p e_{p−1} + β_q f_{q−1} + γ_r g_{r−1},
······························
e'_{p−r+1} = Ae'_{p−r+2} = α_p e_{p−r+1} + β_q f_{q−r+1} + γ_r g_1,
e'_{p−r} = Ae'_{p−r+1} = α_p e_{p−r} + β_q f_{q−r},
······························
e'_1 = Ae'_2 = α_p e_1.

We now replace the basis vectors e', e_1, e_2, ···, e_p by the vectors

e'_1, e'_2, ···, e'_p, e'_{p+1}

and leave the other basis vectors unchanged. Relative to the new basis the transformation A is in canonical form. Note that the order of the first box has been increased by one. This completes the proof of the theorem.

While constructing the canonical form of A we had to distinguish two cases:

1. The case when the additional eigenvalue τ (we assumed τ = 0) did not coincide with any of the eigenvalues λ_1, ···, λ_k. In this case a separate box of order 1 was added.

2. The case when τ coincided with one of the eigenvalues λ_1, ···, λ_k. Then it was necessary, in general, to increase the order of one of the boxes by one. If α_p = β_q = ··· = γ_r = 0, then, just as in the first case, we added a new box.

§ 20. Elementary divisors

In this section we shall describe a method for finding the Jordan canonical form of a transformation. The results of this section will also imply the (as yet unproved) uniqueness of the canonical form.

DEFINITION 1. The matrices 𝒜 and 𝒜_1 = 𝒞⁻¹𝒜𝒞, where 𝒞 is an arbitrary non-singular matrix, are said to be similar.

If the matrix 𝒜_1 is similar to the matrix 𝒜_2, then 𝒜_2 is also similar to 𝒜_1. Indeed, let

𝒜_1 = 𝒞⁻¹𝒜_2𝒞.

Then

𝒜_2 = 𝒞𝒜_1𝒞⁻¹.

If we put 𝒞⁻¹ = 𝒞_1, we obtain

𝒜_2 = 𝒞_1⁻¹𝒜_1𝒞_1,

i.e., 𝒜_2 is similar to 𝒜_1.

It is easy to see that if two matrices 𝒜_1 and 𝒜_2 are similar to some matrix 𝒜, then 𝒜_1 is similar to 𝒜_2. Indeed, let

𝒜 = 𝒞_1⁻¹𝒜_1𝒞_1,   𝒜 = 𝒞_2⁻¹𝒜_2𝒞_2.

Then 𝒞_1⁻¹𝒜_1𝒞_1 = 𝒞_2⁻¹𝒜_2𝒞_2, i.e.,

𝒜_1 = 𝒞_1𝒞_2⁻¹𝒜_2𝒞_2𝒞_1⁻¹.

Putting 𝒞_2𝒞_1⁻¹ = 𝒞', we get

𝒜_1 = 𝒞'⁻¹𝒜_2𝒞',

i.e., 𝒜_1 is similar to 𝒜_2.

Let 𝒜 be the matrix of a transformation A relative to some basis. If 𝒞 is the matrix of transition from this basis to a new basis (§ 9), then 𝒞⁻¹𝒜𝒞 is the matrix which represents A relative to the new basis. Thus similar matrices represent the same linear transformation relative to different bases.

We now wish to obtain invariants of a transformation from its matrix, i.e., expressions depending on the transformation alone. In other words, we wish to construct functions of the elements of a matrix which assume the same values for similar matrices.

One such invariant was found in § 10 where we showed that the characteristic polynomial of a matrix 𝒜, i.e., the determinant of the matrix 𝒜 − λℰ,

D(λ) = det (𝒜 − λℰ),

is the same for 𝒜 and for any matrix similar to 𝒜. We now construct a whole system of invariants which will include the characteristic polynomial. This will be a complete system of invariants in the sense that if the invariants in question are the same for two matrices then the matrices are similar.

Let 𝒜 be a matrix of order n. The kth order minors of the matrix 𝒜 − λℰ are certain polynomials in λ. We denote by D_k(λ) the greatest common divisor of those minors ⁶; we also put D_0(λ) = 1.

⁶ The greatest common divisor is determined to within a numerical multiplier. We choose D_k(λ) to be a monic polynomial. In particular, if the kth order minors are pairwise coprime we take D_k(λ) to be 1.

In particular D_n(λ) is the determinant of the matrix 𝒜 − λℰ. In the sequel we show that all the D_k(λ) are invariants.

We observe that D_{n−1}(λ) divides D_n(λ). Indeed, the definition of D_{n−1}(λ) implies that all minors of order n − 1 are divisible by D_{n−1}(λ). If we expand the determinant D_n(λ) by the elements of any row we obtain a sum each of whose summands is a product of an element of the row in question by its cofactor. It follows that D_n(λ) is indeed divisible by D_{n−1}(λ). Similarly, D_{n−1}(λ) is divisible by D_{n−2}(λ), etc.

EXERCISE. Find D_k(λ) (k = 1, 2, 3) for the matrix

[ λ_0  1    0
  0    λ_0  1
  0    0    λ_0 ].

Answer: D_3(λ) = (λ − λ_0)³, D_2(λ) = D_1(λ) = 1.
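The answer can be checked mechanically. The sketch below (Python with sympy, assumed available; the symbols and the 3 × 3 matrix are those of the exercise) computes D_k(λ) literally as the monic greatest common divisor of all kth order minors of 𝒜 − λℰ.

```python
from functools import reduce
from itertools import combinations
import sympy as sp

lam, lam0 = sp.symbols('lambda lambda0')

A = sp.Matrix([[lam0, 1, 0],
               [0, lam0, 1],
               [0, 0, lam0]])
M = A - lam * sp.eye(3)          # the matrix A - lambda*E of the exercise

def D(k):
    """Monic gcd of all k-th order minors of M."""
    n = M.rows
    minors = [M[list(r), list(c)].det()
              for r in combinations(range(n), k)
              for c in combinations(range(n), k)]
    nonzero = [m for m in minors if sp.simplify(m) != 0]
    g = reduce(sp.gcd, nonzero)
    return sp.Poly(g, lam).monic().as_expr()

for k in (1, 2, 3):
    print(k, sp.factor(D(k)))    # expected: 1, 1, (lambda - lambda0)**3
```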

LEMMA 1. If 𝒞 is an arbitrary non-singular matrix then the greatest common divisors of the kth order minors of the matrices 𝒜 − λℰ, 𝒞(𝒜 − λℰ) and (𝒜 − λℰ)𝒞 are the same.

Proof: Consider the pair of matrices 𝒜 − λℰ and (𝒜 − λℰ)𝒞. If a_{ik} are the entries of 𝒜 − λℰ and a'_{ik} are the entries of (𝒜 − λℰ)𝒞, then

a'_{ik} = Σ_α a_{iα} c_{αk},

i.e., every column of (𝒜 − λℰ)𝒞 is a linear combination of the columns of 𝒜 − λℰ with coefficients from 𝒞, i.e., independent of λ. It follows that every minor of (𝒜 − λℰ)𝒞 is the sum of minors of 𝒜 − λℰ each multiplied by some number. Hence every divisor of the kth order minors of 𝒜 − λℰ must divide every kth order minor of (𝒜 − λℰ)𝒞. To prove the converse we apply the same reasoning to the pair of matrices (𝒜 − λℰ)𝒞 and [(𝒜 − λℰ)𝒞]𝒞⁻¹ = 𝒜 − λℰ. This proves that the greatest common divisors of the kth order minors of 𝒜 − λℰ and (𝒜 − λℰ)𝒞 are the same.

LEMMA 2. For similar matrices the polynomials D_k(λ) are identical.

Proof: Let 𝒜 and 𝒜_1 = 𝒞⁻¹𝒜𝒞 be two similar matrices. By Lemma 1 the greatest common divisor of the kth order minors of 𝒜 − λℰ is the same as the corresponding greatest common divisor for (𝒜 − λℰ)𝒞. An analogous statement holds for the matrices (𝒜 − λℰ)𝒞 and 𝒞⁻¹(𝒜 − λℰ)𝒞 = 𝒜_1 − λℰ. Hence the D_k(λ) for 𝒜 and 𝒜_1 are identical.

In view of the fact that the matrices which represent a transformation in different bases are similar, we conclude on the basis of Lemma 2 that

THEOREM 1. Let A be a linear transformation. Then the greatest common divisor D_k(λ) of the kth order minors of the matrix 𝒜 − λℰ, where 𝒜 represents the transformation A in some basis, does not depend on the choice of basis.

We now compute the polynomials D_k(λ) for a given linear transformation A. Theorem 1 tells us that in computing the D_k(λ) we may use the matrix which represents A relative to an arbitrarily selected basis. We shall find it convenient to choose the basis relative to which the matrix of the transformation is in Jordan canonical form. Our task is then to compute the polynomials D_k(λ) for the matrix 𝒜 in Jordan canonical form.

We first find the D_k(λ) for an nth order matrix of the form

(1)  [ λ_0 − λ  1        0   ···  0
       0        λ_0 − λ  1   ···  0
       ·····························
       0        0        0   ···  1
       0        0        0   ···  λ_0 − λ ],

i.e., for one "box" of the canonical form. Clearly D_n(λ) = (λ − λ_0)^n. If we cross out in (1) the first column and the last row we obtain a matrix with ones on the principal diagonal and zeros above it. Hence D_{n−1}(λ) = 1. If we cross out in this matrix like numbered rows and columns we find that D_{n−2}(λ) = ··· = D_1(λ) = 1. Thus for an individual "box" [matrix (1)] the D_k(λ) are

(λ − λ_0)^n, 1, 1, ···, 1.

We observe further that if ℬ is a matrix of the form

ℬ = [ ℬ_1  0
      0    ℬ_2 ],

where ℬ_1 and ℬ_2 are of orders n_1 and n_2, then the mth order non-zero minors of the matrix ℬ are of the form

Δ_m = Δ_{m_1}^{(1)} · Δ_{m_2}^{(2)},   m_1 + m_2 = m.

Here Δ_{m_1}^{(1)} are the minors of ℬ_1 of order m_1 and Δ_{m_2}^{(2)} the minors of ℬ_2 of order m_2. ⁷ Indeed, if one singles out those of the first n_1 rows which enter into the minor in question and expands it by these rows (using the theorem of Laplace), the result is zero or is of the form Δ_{m_1}^{(1)} · Δ_{m_2}^{(2)}.

⁷ Of course, a non-zero kth order minor of ℬ may have the form Δ_k^{(1)}, i.e., it may be entirely made up of elements of ℬ_1. In this case we shall write it formally as Δ_k = Δ_k^{(1)} Δ_0^{(2)}, where Δ_0^{(2)} = 1.

We shall now find the polynomials D_k(λ) for an arbitrary matrix 𝒜 which is in Jordan canonical form. We assume that 𝒜 has p boxes corresponding to the eigenvalue λ_1, q boxes corresponding to the eigenvalue λ_2, etc. We denote the orders of the boxes corresponding to the eigenvalue λ_1 by n_1, n_2, ···, n_p (n_1 ≥ n_2 ≥ ··· ≥ n_p).

Let ℬ_i denote the ith box in ℬ = 𝒜 − λℰ. Then ℬ_1, say, is of the form

ℬ_1 = [ λ_1 − λ  1        0   ···  0
        0        λ_1 − λ  1   ···  0
        ·····························
        0        0        0   ···  1
        0        0        0   ···  λ_1 − λ ].

We first compute D_n(λ), i.e., the determinant of ℬ. This determinant is the product of the determinants of the ℬ_i, i.e.,

D_n(λ) = (λ − λ_1)^{n_1+n_2+···+n_p} (λ − λ_2)^{m_1+m_2+···+m_q} ···.

We now compute D_{n−1}(λ). Since D_{n−1}(λ) is a factor of D_n(λ), it must be a product of the factors λ − λ_1, λ − λ_2, ···. The problem now is to compute the degrees of these factors. Specifically, we compute the degree of λ − λ_1 in D_{n−1}(λ). We observe that any non-zero minor of order n − 1 of ℬ = 𝒜 − λℰ is of the form

Δ_{n−1} = Δ_{t_1}^{(1)} Δ_{t_2}^{(2)} ··· Δ_{t_k}^{(k)},

where t_1 + t_2 + ··· + t_k = n − 1 and Δ_{t_i}^{(i)} denotes a t_i-th order minor of the matrix ℬ_i. Since the sum of the orders of the minors Δ_{t_1}^{(1)}, Δ_{t_2}^{(2)}, ···, Δ_{t_k}^{(k)} is n − 1, exactly one of these minors is of order one lower than the order of the corresponding matrix ℬ_i, i.e., it is obtained by crossing out a row and a column in a box of the matrix ℬ. As we saw above, crossing out an appropriate row and column in a box may yield a minor equal to one. Therefore it is possible to select Δ_{n−1} so that some Δ_{t_i}^{(i)} is one and the remaining minors are equal to the determinants of the appropriate boxes. It follows that in order to obtain a minor of lowest possible degree in λ − λ_1 it suffices to cross out a suitable row and column in the box of maximal order corresponding to λ_1. This is the box of order n_1. Thus the greatest common divisor D_{n−1}(λ) of minors of order n − 1 contains λ − λ_1 raised to the power n_2 + n_3 + ··· + n_p.

Likewise, to obtain a minor Δ_{n−2} of order n − 2 with lowest possible power of λ − λ_1 it suffices to cross out an appropriate row and column in the boxes of order n_1 and n_2 corresponding to λ_1. Thus D_{n−2}(λ) contains λ − λ_1 to the power n_3 + n_4 + ··· + n_p, etc. The polynomials D_{n−p}(λ), D_{n−p−1}(λ), ···, D_1(λ) do not contain λ − λ_1 at all.

Similar arguments apply in the determination of the degrees of λ − λ_2, λ − λ_3, ··· in D_k(λ).

We have thus proved the following result.

If the Jordan canonical form of the matrix of a linear transformation A contains p boxes of order n_1, n_2, ···, n_p (n_1 ≥ n_2 ≥ ··· ≥ n_p) corresponding to the eigenvalue λ_1, q boxes of order m_1, m_2, ···, m_q (m_1 ≥ m_2 ≥ ··· ≥ m_q) corresponding to the eigenvalue λ_2, etc., then

D_n(λ) = (λ − λ_1)^{n_1+n_2+···+n_p} (λ − λ_2)^{m_1+m_2+m_3+···+m_q} ···,
D_{n−1}(λ) = (λ − λ_1)^{n_2+n_3+···+n_p} (λ − λ_2)^{m_2+m_3+···+m_q} ···,
D_{n−2}(λ) = (λ − λ_1)^{n_3+···+n_p} (λ − λ_2)^{m_3+···+m_q} ···,
···································

Beginning with D_{n−p}(λ) the factor (λ − λ_1) is replaced by one. Beginning with D_{n−q}(λ) the factor (λ − λ_2) is replaced by one, etc.

In the important special case when there is exactly one box of order n_1 corresponding to the eigenvalue λ_1, exactly one box of order m_1 corresponding to the eigenvalue λ_2, exactly one box of order k_1 corresponding to the eigenvalue λ_3, etc., the D_k(λ) have the following form:

D_n(λ) = (λ − λ_1)^{n_1} (λ − λ_2)^{m_1} (λ − λ_3)^{k_1} ···,
D_{n−1}(λ) = 1,
D_{n−2}(λ) = 1,
······················

The expressions for the D_k(λ) show that in place of the D_k(λ) it is more convenient to consider their ratios

E_k(λ) = D_k(λ) / D_{k−1}(λ).

The E_k(λ) are called elementary divisors. Thus if the Jordan canonical form of a matrix 𝒜 contains p boxes of order n_1, n_2, ···, n_p (n_1 ≥ n_2 ≥ ··· ≥ n_p) corresponding to the eigenvalue λ_1, q boxes of order m_1, m_2, ···, m_q (m_1 ≥ m_2 ≥ ··· ≥ m_q) corresponding to the eigenvalue λ_2, etc., then the elementary divisors E_k(λ) are

E_n(λ) = (λ − λ_1)^{n_1} (λ − λ_2)^{m_1} ···,
E_{n−1}(λ) = (λ − λ_1)^{n_2} (λ − λ_2)^{m_2} ···,
E_{n−2}(λ) = (λ − λ_1)^{n_3} (λ − λ_2)^{m_3} ···,
······················

Prescribing the elementary divisors E_n(λ), E_{n−1}(λ), ···, determines the Jordan canonical form of the matrix 𝒜 uniquely. The eigenvalues λ_i are the roots of the equation E_n(λ) = 0. The orders n_1, n_2, ···, n_p of the boxes corresponding to the eigenvalue λ_1 coincide with the powers of (λ − λ_1) in E_n(λ), E_{n−1}(λ), ···.

We can now state necessary and sufficient conditions for the existence of a basis in which the matrix of a linear transformation is diagonal.

A necessary and sufficient condition for the existence of a basis in which the matrix of a transformation is diagonal is that the elementary divisors have simple roots only.

Indeed, we saw that the multiplicities of the roots λ_1, λ_2, ··· of the elementary divisors determine the order of the boxes in the Jordan canonical form. Thus the simplicity of the roots of the elementary divisors signifies that all the boxes are of order one, i.e., that the Jordan canonical form of the matrix is diagonal.

THEOREM 2. For two matrices to be similar it is necessary and sufficient that they have the same elementary divisors.

Proof: We showed (Lemma 2) that similar matrices have the same polynomials D_k(λ) and therefore the same elementary divisors E_k(λ) (since the latter are quotients of the D_k(λ)).

Conversely, let two matrices 𝒜 and ℬ have the same elementary divisors. 𝒜 and ℬ are similar to Jordan canonical matrices. Since the elementary divisors of 𝒜 and ℬ are the same, their Jordan canonical forms must also be the same. This means that 𝒜 and ℬ are similar to the same matrix. But this means that 𝒜 and ℬ are similar matrices.

THEOREM 3. The Jordan canonical form of a linear transformation is uniquely determined by the linear transformation.

Proof: The matrices of A relative to different bases are similar. Since similar matrices have the same elementary divisors and these determine uniquely the Jordan canonical form of a matrix, our theorem follows.

We are now in a position to find the Jordan canonical form of a matrix of a linear transformation. For this it suffices to find the elementary divisors of the matrix of the transformation relative to some basis. When these are represented as products of the form (λ − λ_1)^{n_1} (λ − λ_2)^{m_1} ··· we have the eigenvalues as well as the order of the boxes corresponding to each eigenvalue.
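As an illustration of this recipe, the following sketch (Python with sympy, assumed available; the 3 × 3 matrix is an arbitrary example, not one from the text) computes the D_k(λ) of 𝒜 − λℰ, forms the elementary divisors as their successive quotients, and compares the result with the Jordan canonical form obtained directly.

```python
from functools import reduce
from itertools import combinations
import sympy as sp

lam = sp.symbols('lambda')

# an arbitrary illustrative matrix; its Jordan form has, for the
# eigenvalue 2, one box of order 2 and one box of order 1
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 2]])
M = A - lam * sp.eye(3)

def D(k, M):
    """Monic gcd of all k-th order minors of M."""
    n = M.rows
    minors = [M[list(r), list(c)].det()
              for r in combinations(range(n), k)
              for c in combinations(range(n), k)]
    nonzero = [m for m in minors if sp.simplify(m) != 0]
    return sp.Poly(reduce(sp.gcd, nonzero), lam).monic().as_expr()

Ds = [sp.Integer(1)] + [D(k, M) for k in range(1, 4)]
Es = [sp.factor(sp.cancel(Ds[k] / Ds[k - 1])) for k in range(1, 4)]
print(Es)                     # elementary divisors E_1, E_2, E_3 (in increasing order)
print(A.jordan_form()[1])     # Jordan canonical form, for comparison
```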

§ 21. Polynomial matrices

1. By a polynomial matrix we mean a matrix whose entries are polynomials in some letter λ. By the degree of a polynomial matrix we mean the maximal degree of its entries. It is clear that a polynomial matrix of degree n can be written in the form

A_0 λ^n + A_1 λ^{n−1} + ··· + A_n,

where the A_i are constant matrices. ⁸ The matrices A − λE which we considered on a number of occasions are of this type. The results to be derived in this section contain as special cases many of the results obtained in the preceding sections for matrices of the form A − λE.

⁸ In this section matrices are denoted by printed Latin capitals.

Polynomial matrices occur in many areas of mathematics. Thus, for example, in solving a system of first order homogeneous linear differential equations with constant coefficients

(1)  dy_i/dx = Σ_{k=1}^{n} a_{ik} y_k   (i = 1, 2, ···, n)

we seek solutions of the form

(2)  y_k = c_k e^{λx},

where λ and c_k are constants. To determine these constants we substitute the functions in (2) in the equations (1) and divide by e^{λx}. We are thus led to the following system of linear equations:

λ c_i = Σ_{k=1}^{n} a_{ik} c_k.

The matrix of this system of equations is A − λE, with A the matrix of coefficients in the system (1). Thus the study of the system of differential equations (1) is closely linked to polynomial matrices of degree one, namely, those of the form A − λE.

Similarly, the study of higher order systems of differential equations leads to polynomial matrices of degree higher than one. Thus the study of the system

Σ_{k=1}^{n} a_{ik} d²y_k/dx² + Σ_{k=1}^{n} b_{ik} dy_k/dx + Σ_{k=1}^{n} c_{ik} y_k = 0

is synonymous with the study of the polynomial matrix Aλ² + Bλ + C, where A = ‖a_{ik}‖, B = ‖b_{ik}‖, C = ‖c_{ik}‖.
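For the first order system (1) the connection can be checked numerically. In the sketch below (Python with numpy, assumed available; the 2 × 2 matrix is an arbitrary example) the substitution y_k = c_k e^{λx} reduces the system to (A − λE)c = 0, so that λ must be an eigenvalue of A and c a corresponding eigenvector.

```python
import numpy as np

# system dy/dx = A y for an arbitrary illustrative matrix A
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])

# eigenvalues and eigenvectors of A give the admissible lambda and c
eigvals, eigvecs = np.linalg.eig(A)
for lam, c in zip(eigvals, eigvecs.T):
    # the residual of (A - lambda*E) c should vanish
    print(lam, np.linalg.norm((A - lam * np.eye(2)) @ c))
```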

We now consider the problem of the canonical form of polynomial matrices with respect to so-called elementary transformations.

The term "elementary" applies to the following classes of transformations:

1) Permutation of two rows or columns.
2) Addition to some row of another row multiplied by some polynomial φ(λ) and, similarly, addition to some column of another column multiplied by some polynomial.
3) Multiplication of some row or column by a non-zero constant.

DEFINITION 1. Two polynomial matrices are called equivalent if it is possible to obtain one from the other by a finite number of elementary transformations.

The inverse of an elementary transformation is again an elementary transformation. This is easily seen for each of the three types of elementary transformations. Thus, e.g., if the polynomial matrix B(λ) is obtained from the polynomial matrix A(λ) by a permutation of rows then the inverse permutation takes B(λ) into A(λ). Again, if B(λ) is obtained from A(λ) by adding the ith row multiplied by φ(λ) to the kth row, then A(λ) can be obtained from B(λ) by adding to the kth row of B(λ) the ith row multiplied by −φ(λ).

The above remark implies that if a polynomial matrix K(λ) is equivalent to L(λ), then L(λ) is equivalent to K(λ). Indeed, if L(λ) is the result of applying a sequence of elementary transformations to K(λ), then by applying the inverse transformations in reverse order to L(λ) we obtain K(λ).

If two polynomial matrices K_1(λ) and K_2(λ) are equivalent to a third matrix K(λ), then they must be equivalent to each other. Indeed, by applying to K_1(λ) first the transformations which take it into K(λ) and then the elementary transformations which take K(λ) into K_2(λ), we will have taken K_1(λ) into K_2(λ). Thus K_1(λ) and K_2(λ) are indeed equivalent.

The main result of para. 1 of this section asserts the possibility of diagonalizing a polynomial matrix by means of elementary transformations. We precede the proof of this result with the following lemma:

LEMMA. If the element a_11(λ) of a polynomial matrix A(λ) is not zero and if not all the elements a_{ik}(λ) of A(λ) are divisible by a_11(λ), then it is possible to find a polynomial matrix B(λ) equivalent to A(λ) and such that b_11(λ) is also different from zero and its degree is less than that of a_11(λ).

Proof: Assume that the element of A(λ) which is not divisible by a_11(λ) is in the first row. Thus let a_{1k}(λ) not be divisible by a_11(λ). Then a_{1k}(λ) is of the form

a_{1k}(λ) = a_11(λ)φ(λ) + b(λ),

where b(λ) ≠ 0 and of degree less than a_11(λ). Multiplying the first column by φ(λ) and subtracting the result from the kth column, we obtain a matrix with b(λ) in place of a_{1k}(λ), where the degree of b(λ) is less than that of a_11(λ). Permuting the first and kth columns of the new matrix puts b(λ) in the upper left corner and results in a matrix with the desired properties. We can proceed in an analogous manner if the element not divisible by a_11(λ) is in the first column.

Now let all the elements of the first row and column be divisible by a_11(λ) and let a_{ik}(λ) be an element not divisible by a_11(λ). We will reduce this case to the one just considered. Since a_{i1}(λ) is divisible by a_11(λ), it must be of the form a_{i1}(λ) = φ(λ)a_11(λ). If we subtract from the ith row the first row multiplied by φ(λ), then a_{i1}(λ) is replaced by zero and a_{ik}(λ) is replaced by a'_{ik}(λ) = a_{ik}(λ) − φ(λ)a_{1k}(λ), which again is not divisible by a_11(λ) (this because we assumed that a_{1k}(λ) is divisible by a_11(λ)). We now add the ith row to the first row. This leaves a_11(λ) unchanged and replaces a_{1k}(λ) with a_{1k}(λ) + a'_{ik}(λ) = a_{1k}(λ)(1 − φ(λ)) + a_{ik}(λ). Thus the first row now contains an element not divisible by a_11(λ) and this is the case dealt with before. This completes the proof of our lemma.

In the sequel we shall make use of the following observation. If all the elements of a polynomial matrix B(λ) are divisible by some polynomial E(λ), then all the entries of a matrix equivalent to B(λ) are again divisible by E(λ).

We are now in a position to reduce a polynomial matrix to diagonal form.

We may assume that a_11(λ) ≠ 0. Otherwise suitable permutation of rows and columns puts a non-zero element in place of a_11(λ). If not all the elements of our matrix are divisible by a_11(λ), then, in view of our lemma, we can replace our matrix with an equivalent one in which the element in the upper left corner is of lower degree than a_11(λ) and still different from zero. Repeating this procedure a finite number of times we obtain a matrix B(λ) all of whose elements are divisible by b_11(λ).

Since b_12(λ), ···, b_1n(λ) are divisible by b_11(λ), we can, by subtracting from the second, third, etc. columns suitable multiples of the first column, replace the second, third, ···, nth element of the first row with zero. Similarly, the second, third, ···, nth element of the first column can be replaced with zero. The new matrix inherits from B(λ) the property that all its entries are divisible by b_11(λ). Dividing the first row by the leading coefficient of b_11(λ) replaces b_11(λ) with a monic polynomial E_1(λ) but does not affect the zeros in that row.

We now have a matrix of the form

(3)  [ E_1(λ)  0         0         ···  0
       0       c_22(λ)   c_23(λ)   ···  c_2n(λ)
       0       c_32(λ)   c_33(λ)   ···  c_3n(λ)
       ·········································
       0       c_n2(λ)   c_n3(λ)   ···  c_nn(λ) ],

all of whose elements are divisible by E_1(λ).

We can apply to the matrix of the c_{ik}(λ), of order n − 1, the same procedure which we just applied to the matrix of order n. Then c_22(λ) is replaced by a monic polynomial E_2(λ) and the other c_{ik}(λ) in the first row and first column are replaced with zeros. Since the entries of the larger matrix other than E_1(λ) are zeros, an elementary transformation of the matrix of the c_{ik}(λ) can be viewed as an elementary transformation of the larger matrix. Thus we obtain a matrix whose "off-diagonal" elements in the first two rows and columns are zero and whose first two diagonal elements are monic polynomials E_1(λ), E_2(λ), with E_2(λ) a multiple of E_1(λ). Repetition of this process obviously leads to a diagonal matrix. This proves

THEOREM 1. Every polynomial matrix can be reduced by elementary transformations to the diagonal form

(4)  [ E_1(λ)  0       0       ···  0
       0       E_2(λ)  0       ···  0
       0       0       E_3(λ)  ···  0
       ·······························
       0       0       0       ···  E_n(λ) ].

Here the diagonal elements E_k(λ) are monic polynomials and E_1(λ) divides E_2(λ), E_2(λ) divides E_3(λ), etc. This form of a polynomial matrix is called its canonical diagonal form.

It may, of course, happen that

E_{r+1}(λ) = E_{r+2}(λ) = ··· = E_n(λ) = 0

for some value of r.

REMARK: We have brought A(λ) to a diagonal form in which every diagonal element is divisible by its predecessor. If we dispense with the latter requirement the process of diagonalization can be considerably simplified.

Indeed, to replace the off-diagonal elements of the first row and column with zeros it is sufficient that these elements (and not all the elements of the matrix) be divisible by a_11(λ). As can be seen from the proof of the lemma this requires far fewer elementary transformations than reduction to canonical diagonal form. Once the off-diagonal elements of the first row and first column are all zero we repeat the process until we reduce the matrix to diagonal form. In this way the matrix can be reduced to various diagonal forms; i.e., the diagonal form of a polynomial matrix is not uniquely determined. On the other hand we will see in the next section that the canonical diagonal form of a polynomial matrix is uniquely determined.

EXERCISE. Reduce the polynomial matrix

[ λ − λ_1   0
  0         λ − λ_2 ]    (λ_1 ≠ λ_2)

to canonical diagonal form.

Answer:

[ 1   0
  0   (λ − λ_1)(λ − λ_2) ].
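One way to arrive at the answer is to carry out the elementary transformations explicitly. The sketch below (Python with sympy, assumed available; the sequence of steps is one possible choice, not the only one) reduces the matrix of the exercise to its canonical diagonal form using only permutations, additions of polynomial multiples of rows or columns, and multiplication by non-zero constants.

```python
import sympy as sp

lam, l1, l2 = sp.symbols('lambda lambda1 lambda2')

M = sp.Matrix([[lam - l1, 0],
               [0, lam - l2]])

# 1) add column 2 to column 1
M.col_op(0, lambda v, i: v + M[i, 1])
# 2) subtract row 1 from row 2: entry (2,1) becomes the constant lambda1 - lambda2
M.row_op(1, lambda v, j: sp.expand(v - M[0, j]))
# 3) permute the two rows
M.row_swap(0, 1)
# 4) divide row 1 by the non-zero constant lambda1 - lambda2
M.row_op(0, lambda v, j: sp.cancel(v / (l1 - l2)))
# 5) subtract (lambda - lambda1) times row 1 from row 2
M.row_op(1, lambda v, j: sp.expand(v - (lam - l1) * M[0, j]))
# 6) subtract a suitable polynomial multiple of column 1 from column 2
c = M[0, 1]
M.col_op(1, lambda v, i: sp.cancel(v - c * M[i, 0]))
# 7) multiply column 2 by the non-zero constant -(lambda1 - lambda2)
M.col_op(1, lambda v, i: sp.expand(-(l1 - l2) * v))

print(M.applyfunc(sp.factor))
# Matrix([[1, 0], [0, (lambda - lambda1)*(lambda - lambda2)]])
```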

2. In this paragraph we prove that the canonical diagonal form of a given matrix is uniquely determined. To this end we shall construct a system of polynomials connected with the given polynomial matrix which are invariant under elementary transformations and which determine the canonical diagonal form completely.

Let there be given an arbitrary polynomial matrix. Let D_k(λ) denote the greatest common divisor of all kth order minors of the given matrix. As before, it is convenient to put D_0(λ) = 1. Since D_k(λ) is determined to within a multiplicative constant, we take its leading coefficient to be one. In particular, if the greatest common divisor of the kth order minors is a constant, we take D_k(λ) = 1.

We shall prove that the polynomials D_k(λ) are invariant under elementary transformations, i.e., that equivalent matrices have the same polynomials D_k(λ).

In the case of elementary transformations of type 1 which permute rows or columns this is obvious, since such transformations either do not affect a particular kth order minor at all, or change

its sign or replace it with another kth order minor. In all these cases the greatest common divisor of all kth order minors remains unchanged. Likewise, elementary transformations of type 3 do not change D_k(λ) since under such transformations the minors are at most multiplied by a constant. Now consider elementary transformations of type 2. Specifically, consider addition of the jth column multiplied by φ(λ) to the ith column. If some particular kth order minor contains none of these columns or if it contains both of them it is not affected by the transformation in question. If it contains the ith column but not the jth column we can write it as a combination of minors each of which appears in the original matrix. Thus in this case, too, the greatest common divisor of the kth order minors remains unchanged.

If all kth order minors and, consequently, all minors of order higher than k are zero, then we put D_k(λ) = D_{k+1}(λ) = ··· = D_n(λ) = 0. We observe that equality of the D_k(λ) for all equivalent matrices implies that equivalent matrices have the same rank.

We compute the polynomials D_k(λ) for a matrix in canonical form

(5)  [ E_1(λ)  0       ···  0
       0       E_2(λ)  ···  0
       ·······················
       0       0       ···  E_n(λ) ].

We observe that in the case of a diagonal matrix the only non-zero minors are the principal minors, that is, minors made up of like numbered rows and columns. These minors are of the form

E_{i_1}(λ) E_{i_2}(λ) ··· E_{i_k}(λ).

Since E_2(λ) is divisible by E_1(λ), E_3(λ) is divisible by E_2(λ), etc., it follows that the greatest common divisor D_1(λ) of all minors of order one is E_1(λ). Since all the polynomials E_k(λ) are divisible by E_1(λ) and all polynomials other than E_1(λ) are divisible by E_2(λ), the product E_i(λ)E_j(λ) (i < j) is always divisible by the minor E_1(λ)E_2(λ). Hence D_2(λ) = E_1(λ)E_2(λ). Since all E_i(λ) other than E_1(λ) and E_2(λ) are divisible by E_3(λ), the product E_i(λ)E_j(λ)E_k(λ) (i < j < k) is divisible by the minor E_1(λ)E_2(λ)E_3(λ) and so D_3(λ) = E_1(λ)E_2(λ)E_3(λ).

Thus for the matrix (5)

(6)  D_k(λ) = E_1(λ)E_2(λ) ··· E_k(λ)   (k = 1, 2, ···, n).

Clearly, if beginning with some value of r

E_{r+1}(λ) = E_{r+2}(λ) = ··· = E_n(λ) = 0,

then

D_{r+1}(λ) = D_{r+2}(λ) = ··· = D_n(λ) = 0.

Thus the diagonal entries of a polynomial matrix in canonical diagonal form (5) are given by the quotients

E_k(λ) = D_k(λ) / D_{k−1}(λ).

Here, if D_{r+1}(λ) = ··· = D_n(λ) = 0 we must put E_{r+1}(λ) = ··· = E_n(λ) = 0.

The polynomials E_k(λ) are called elementary divisors. In § 20 we defined the elementary divisors of matrices of the form A − λE.

THEOREM 2. The canonical diagonal form of a polynomial matrix A(λ) is uniquely determined by this matrix. If D_k(λ) (k = 1, 2, ···, r) is the greatest common divisor of all kth order minors of A(λ) and D_{r+1}(λ) = ··· = D_n(λ) = 0, then the elements of the canonical diagonal form (5) are defined by the formulas

E_k(λ) = D_k(λ) / D_{k−1}(λ)   (k = 1, 2, ···, r),
E_{r+1}(λ) = ··· = E_n(λ) = 0.

Proof: We showed that the polynomials D_k(λ) are invariant under elementary transformations. Hence if the matrix A(λ) is equivalent to a diagonal matrix (5), then both have the same D_k(λ). Since in the case of the matrix (5) we found that

D_k(λ) = E_1(λ) ··· E_k(λ)   (k = 1, 2, ···, r; r ≤ n)

and that

D_{r+1}(λ) = D_{r+2}(λ) = ··· = D_n(λ) = 0,

the theorem follows.

COROLLARY. A necessary and sufficient condition for two polynomial matrices A(λ) and B(λ) to be equivalent is that the polynomials D_1(λ), D_2(λ), ···, D_n(λ) be the same for both matrices.

Indeed, if the polynomials D_k(λ) are the same for A(λ) and B(λ), then both of these matrices are equivalent to the same canonical diagonal matrix and are therefore equivalent (to one another).

3. A polynomial matrix P(λ) is said to be invertible if the matrix [P(λ)]⁻¹ is also a polynomial matrix. If det P(λ) is a constant other than zero, then P(λ) is invertible. Indeed, the elements of the inverse matrix are, apart from sign, the (n − 1)st order minors divided by det P(λ). In our case these quotients would be polynomials and [P(λ)]⁻¹ would be a polynomial matrix. Conversely, if P(λ) is invertible, then det P(λ) = const ≠ 0. Indeed, let [P(λ)]⁻¹ = P_1(λ). Then det P(λ) · det P_1(λ) = 1 and a product of two polynomials equals one only if the polynomials in question are non-zero constants. We have thus shown that a polynomial matrix is invertible if and only if its determinant is a non-zero constant.

All invertible matrices are equivalent to the unit matrix. Indeed, the determinant of an invertible matrix is a non-zero constant, so that D_n(λ) = 1. Since D_n(λ) is divisible by D_k(λ), D_k(λ) = 1 (k = 1, 2, ···, n). It follows that all the elementary divisors E_k(λ) of an invertible matrix are equal to one and the canonical diagonal form of such a matrix is therefore the unit matrix.

THEOREM 3. Two polynomial matrices A(λ) and B(λ) are equivalent if and only if there exist invertible polynomial matrices P(λ) and Q(λ) such that

(7)  A(λ) = P(λ)B(λ)Q(λ).

Proof: We first show that if A(λ) and B(λ) are equivalent, then there exist invertible matrices P(λ) and Q(λ) such that (7) holds. To this end we observe that every elementary transformation of a polynomial matrix A(λ) can be realized by multiplying A(λ) on the right or on the left by a suitable invertible polynomial matrix, namely, by the matrix of the elementary transformation in question.

We illustrate this for all three types of elementary transformations. Thus let there be given a polynomial matrix A(λ),

A(λ) = [ a_11(λ)  a_12(λ)  ···  a_1n(λ)
         a_21(λ)  a_22(λ)  ···  a_2n(λ)
         ·······························
         a_n1(λ)  a_n2(λ)  ···  a_nn(λ) ].

To permute the first two columns (rows) of this matrix, we must multiply it on the right (left) by the matrix

(8)  [ 0 1 0 ··· 0
       1 0 0 ··· 0
       0 0 1 ··· 0
       ·············
       0 0 0 ··· 1 ],

obtained by permuting the first two columns (or, what amounts to the same thing, rows) of the unit matrix.

To multiply the second column (row) of the matrix A(λ) by some number α we must multiply it on the right (left) by the matrix

(9)  [ 1 0 0 ··· 0
       0 α 0 ··· 0
       0 0 1 ··· 0
       ·············
       0 0 0 ··· 1 ],

obtained from the unit matrix by multiplying its second column (or, what amounts to the same thing, row) by α.

Finally, to add to the first column of A(λ) the second column multiplied by φ(λ) we must multiply A(λ) on the right by the matrix

(10)  [ 1     0  0 ··· 0
        φ(λ)  1  0 ··· 0
        0     0  1 ··· 0
        ·················
        0     0  0 ··· 1 ],

obtained from the unit matrix by just such a process. Likewise to add to the first row of A(λ) the second row multiplied by φ(λ) we must multiply A(λ) on the left by the matrix

(11)  [ 1  φ(λ)  0 ··· 0
        0  1     0 ··· 0
        0  0     1 ··· 0
        ·················
        0  0     0 ··· 1 ],

obtained from the unit matrix by just such an elementary transformation.

As we see, the matrices of elementary transformations are obtained by applying an elementary transformation to E. To effect an elementary transformation of the columns of a polynomial matrix A(λ) we must multiply it by the matrix of the transformation on the right and to effect an elementary transformation of the rows of A(λ) we must multiply it by the appropriate matrix on the left.

Computation of the determinants of the matrices (8) through (11) shows that they are all non-zero constants and the matrices are therefore invertible. Since the determinant of a product of matrices equals the product of the determinants, it follows that the product of matrices of elementary transformations is an invertible matrix.

Since we assumed that A(λ) and B(λ) are equivalent, it must be possible to obtain A(λ) by applying a sequence of elementary transformations to B(λ). Every elementary transformation can be effected by multiplying B(λ) by an invertible polynomial matrix. Consequently, A(λ) can be obtained from B(λ) by multiplying the latter by some sequence of invertible polynomial matrices on the left and by some sequence of invertible polynomial matrices on the right. Since the product of invertible matrices is an invertible matrix, the first part of our theorem is proved.

It follows that every invertible matrix is the product of matrices of elementary transformations. Indeed, every invertible matrix Q(λ) is equivalent to the unit matrix and can therefore be written in the form

Q(λ) = P_1(λ)EP_2(λ),

where P_1(λ) and P_2(λ) are products of matrices of elementary transformations. But this means that Q(λ) = P_1(λ)P_2(λ) is itself a product of matrices of elementary transformations.

This observation can be used to prove the second half of our theorem. Indeed, let

A(λ) = P(λ)B(λ)Q(λ),

where P(λ) and Q(λ) are invertible matrices. But then, in view of our observation, A(λ) is obtained from B(λ) by applying to the latter a sequence of elementary transformations. Hence A(λ) is equivalent to B(λ), which is what we wished to prove.
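The action of the matrices of elementary transformations can be checked directly. The sketch below (Python with sympy, assumed available; the matrix A(λ) and the polynomial φ(λ) are arbitrary illustrations) multiplies a polynomial matrix on the left and on the right by matrices of the forms (8) and (10).

```python
import sympy as sp

lam = sp.symbols('lambda')

# an arbitrary 2x2 polynomial matrix A(lambda)
A = sp.Matrix([[lam, 1],
               [lam**2, lam + 1]])

# matrix (8) for order 2: permutation of the first two rows/columns
S = sp.Matrix([[0, 1],
               [1, 0]])

# matrix (10): add to the first column the second column times phi(lambda)
phi = lam + 3                      # an arbitrary polynomial
T = sp.Matrix([[1, 0],
               [phi, 1]])

print(S * A)        # multiplication on the left permutes the rows
print(A * S)        # multiplication on the right permutes the columns
print(A * T)        # the first column becomes  col_1 + phi * col_2
print(S.det(), T.det())   # non-zero constants, so these matrices are invertible
```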

4. ⁹ In this paragraph we shall study polynomial matrices of the form A − λE, A constant. The main problem solved here is that of the equivalence of polynomial matrices A − λE and B − λE of degree one. ¹⁰

It is easy to see that if A and B are similar, i.e., if there exists a non-singular constant matrix C such that B = C⁻¹AC, then the polynomial matrices A − λE and B − λE are equivalent. Indeed, if

B = C⁻¹AC,

then

B − λE = C⁻¹(A − λE)C.

Since a non-singular constant matrix is a special case of an invertible polynomial matrix, Theorem 3 implies the equivalence of A − λE and B − λE.

Later we show the converse of this result, namely, that the equivalence of the polynomial matrices A − λE and B − λE implies the similarity of the matrices A and B. This will yield, among others, a new proof of the fact that every matrix is similar to a matrix in Jordan canonical form.

We begin by proving the following lemma:

LEMMA. Every polynomial matrix

P(λ) = P_0 λ^n + P_1 λ^{n−1} + ··· + P_n

can be divided on the left by a matrix of the form A − λE (A any constant matrix); i.e., there exist matrices S(λ) and R (R constant) such that

P(λ) = (A − λE)S(λ) + R.

The process of division involved in the proof of the lemma differs from ordinary division only in that our multiplication is non-commutative.

⁹ This paragraph may be omitted since it contains an alternate proof, independent of § 19, of the fact that every matrix can be reduced to Jordan canonical form.

¹⁰ Every polynomial matrix A_0 + λA_1 with det A_1 ≠ 0 is equivalent to a matrix of the form A − λE. Indeed, in this case A_0 + λA_1 = −A_1(−A_1⁻¹A_0 − λE), and if we denote −A_1⁻¹A_0 by A we have A_0 + λA_1 = −A_1(A − λE), which implies (Theorem 3) the equivalence of A_0 + λA_1 and A − λE.

Let

P(λ) = P_0 λ^n + P_1 λ^{n−1} + ··· + P_n,

where the P_i are constant matrices.

It is easy to see that the polynomial matrix

P(λ) + (A − λE)P_0 λ^{n−1}

is of degree not higher than n − 1. If

P(λ) + (A − λE)P_0 λ^{n−1} = P'_0 λ^{n−1} + P'_1 λ^{n−2} + ··· + P'_{n−1},

then the polynomial matrix

P(λ) + (A − λE)P_0 λ^{n−1} + (A − λE)P'_0 λ^{n−2}

is of degree not higher than n − 2. Continuing this process we obtain a polynomial matrix

P(λ) + (A − λE)(P_0 λ^{n−1} + P'_0 λ^{n−2} + ···)

of degree not higher than zero, i.e., independent of λ. If R denotes the constant matrix just obtained, then

P(λ) = −(A − λE)(P_0 λ^{n−1} + P'_0 λ^{n−2} + ···) + R,

or, putting S(λ) = −(P_0 λ^{n−1} + P'_0 λ^{n−2} + ···),

P(λ) = (A − λE)S(λ) + R.

This proves our lemma.

A similar proof holds for the possibility of division on the right; i.e., there exist matrices S_1(λ) and R_1 such that

P(λ) = S_1(λ)(A − λE) + R_1.

We note that in our case, just as in the ordinary theorem of Bezout, we can claim that

R = R_1 = P(A).
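The division process described in the proof is easy to carry out numerically. The sketch below (Python with numpy, assumed available; the degree, the size and the coefficient matrices are arbitrary) builds S(λ) by the recursion used above and checks that the remainder R is constant; with the convention that the powers of A multiply the coefficients P_i on the left — which is what this recursion produces — R equals the value of P at A.

```python
import numpy as np

rng = np.random.default_rng(0)
n_deg, size = 3, 2

# P(lam) = P[0]*lam^3 + P[1]*lam^2 + P[2]*lam + P[3], with constant matrix coefficients
P = [rng.integers(-3, 4, (size, size)).astype(float) for _ in range(n_deg + 1)]
A = rng.integers(-3, 4, (size, size)).astype(float)
E = np.eye(size)

# left division P(lam) = (A - lam*E) S(lam) + R:
# S(lam) = -(Q[0]*lam^2 + Q[1]*lam + Q[2]),  Q[0] = P[0],  Q[i] = P[i] + A @ Q[i-1]
Q = [P[0]]
for i in range(1, n_deg):
    Q.append(P[i] + A @ Q[-1])
R = P[n_deg] + A @ Q[-1]

S = lambda lam: -sum(Q[i] * lam**(n_deg - 1 - i) for i in range(n_deg))
Pval = lambda lam: sum(P[i] * lam**(n_deg - i) for i in range(n_deg + 1))

# the identity holds for every value of lam
for lam in (0.0, 1.5, -2.0):
    assert np.allclose(Pval(lam), (A - lam * E) @ S(lam) + R)

# the remainder equals P evaluated at A, powers of A written on the left
assert np.allclose(R, sum(np.linalg.matrix_power(A, n_deg - i) @ P[i]
                          for i in range(n_deg + 1)))
print(R)
```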

THEOREM 4. The polynomial matrices A − λE and B − λE are equivalent if and only if the matrices A and B are similar.

Proof: The sufficiency part of the proof was given in the beginning of this paragraph. It remains to prove necessity. This means that we must show that the equivalence of A − λE and B − λE implies the similarity of A and B. By Theorem 3 there exist invertible polynomial matrices P(λ) and Q(λ) such that

(12)  B − λE = P(λ)(A − λE)Q(λ).

We shall first show that P(λ) and Q(λ) in (12) may be replaced by constant matrices.

To this end we divide P(λ) on the left by B − λE and Q(λ) by B − λE on the right. Then

(13)  P(λ) = (B − λE)P_1(λ) + P_0,
(14)  Q(λ) = Q_1(λ)(B − λE) + Q_0,

where P_0 and Q_0 are constant matrices.

If we insert these expressions for P(λ) and Q(λ) in the formula (12) and carry out the indicated multiplications we obtain

B − λE = (B − λE)P_1(λ)(A − λE)Q_1(λ)(B − λE)
       + (B − λE)P_1(λ)(A − λE)Q_0
       + P_0(A − λE)Q_1(λ)(B − λE)
       + P_0(A − λE)Q_0.

If we transfer the last summand on the right side of the above equation to its left side and denote the sum of the remaining terms by K(λ), i.e., if we put

K(λ) = (B − λE)P_1(λ)(A − λE)Q_1(λ)(B − λE)
     + (B − λE)P_1(λ)(A − λE)Q_0
     + P_0(A − λE)Q_1(λ)(B − λE),

then we get

(15)  B − λE − P_0(A − λE)Q_0 = K(λ).

Since Q_1(λ)(B − λE) + Q_0 = Q(λ), the first two summands in K(λ) can be written as follows:

(B − λE)P_1(λ)(A − λE)Q_1(λ)(B − λE) + (B − λE)P_1(λ)(A − λE)Q_0
     = (B − λE)P_1(λ)(A − λE)Q(λ).

We now add and subtract from the third summand in K(λ) the expression (B − λE)P_1(λ)(A − λE)Q_1(λ)(B − λE) and find

K(λ) = (B − λE)P_1(λ)(A − λE)Q(λ) + P(λ)(A − λE)Q_1(λ)(B − λE)
     − (B − λE)P_1(λ)(A − λE)Q_1(λ)(B − λE).

But in view of (12)

(A − λE)Q(λ) = P⁻¹(λ)(B − λE),   P(λ)(A − λE) = (B − λE)Q⁻¹(λ).

Using these relations we can rewrite K(λ) in the following manner:

(16)  K(λ) = (B − λE)[P_1(λ)P⁻¹(λ) + Q⁻¹(λ)Q_1(λ) − P_1(λ)(A − λE)Q_1(λ)](B − λE).

We now show that K(λ) = 0. Since P(λ) and Q(λ) are invertible, the expression in square brackets is a polynomial in λ. We shall prove this polynomial to be zero. Assume that this polynomial is not zero and is of degree m. Then it is easy to see that K(λ) is of degree m + 2 and since m ≥ 0, K(λ) is at least of degree two. But (15) implies that K(λ) is at most of degree one. Hence the expression in the square brackets, and with it K(λ), is zero.

We have thus found that

(17)  B − λE = P_0(A − λE)Q_0,

where P_0 and Q_0 are constant matrices; i.e., we may indeed replace P(λ) and Q(λ) in (12) with constant matrices.

Equating coefficients of λ in (17) we see that

P_0Q_0 = E,

which shows that the matrices P_0 and Q_0 are non-singular and that

P_0 = Q_0⁻¹.

Equating the free terms we find that

B = P_0AQ_0,

i.e., that A and B are similar. This completes the proof of our theorem.

Since equivalence of the matrices A − λE and B − λE is synonymous with identity of their elementary divisors it follows from the theorem just proved that two matrices A and B are similar if and only if the matrices A − λE and B − λE have the same elementary divisors. We now show that every matrix A is similar to a matrix in Jordan canonical form.

To this end we consider the matrix A − λE and find its elementary divisors. Using these we construct as in § 20 a matrix B in Jordan canonical form. B − λE has the same elementary divisors as A − λE, but then B is similar to A.

As was indicated in the footnote at the beginning of this paragraph, this paragraph gives another proof of the fact that every matrix is similar to a matrix in Jordan canonical form. Of course, the contents of this paragraph can be deduced directly from §§ 19 and 20.


CHAPTER IV

Introduction to Tensors

§ 22. The dual space

1. Definition of the dual space. Let R be a vector space. Together with R one frequently considers another space called the dual space which is closely connected with R. The starting point for the definition of a dual space is the notion of a linear function introduced in para. 1, § 4.

We recall that a function f(x), x ∈ R, is called linear if it satisfies the following conditions:

f(x + y) = f(x) + f(y),
f(λx) = λf(x).

Let e_1, e_2, ···, e_n be a basis in an n-dimensional space R. If

x = ξ^1 e_1 + ξ^2 e_2 + ··· + ξ^n e_n

is a vector in R and f is a linear function on R, then (cf. § 4) we can write

(1)  f(x) = f(ξ^1 e_1 + ξ^2 e_2 + ··· + ξ^n e_n) = a_1 ξ^1 + a_2 ξ^2 + ··· + a_n ξ^n,

where the coefficients a_1, a_2, ···, a_n which determine the linear function are given by

(2)  a_1 = f(e_1),  a_2 = f(e_2),  ···,  a_n = f(e_n).

It is clear from (1) that given a basis e_1, e_2, ···, e_n every n-tuple a_1, a_2, ···, a_n determines a unique linear function.

Let f and g be linear functions. By the sum h of f and g we mean the function which associates with a vector x the number f(x) + g(x). By the product of f by a number α we mean the function which associates with a vector x the number αf(x).

Obviously the sum of two linear functions and the product of a function by a number are again linear functions. Also, if f is determined by the numbers a_1, a_2, ···, a_n and g by the numbers b_1, b_2, ···, b_n, then f + g is determined by the numbers a_1 + b_1, a_2 + b_2, ···, a_n + b_n and αf by the numbers αa_1, αa_2, ···, αa_n.

Thus the totality of linear functions on R forms a vector space.

DEFINITION 1. Let R be an n-dimensional vector space. By the dual space R̃ of R we mean the vector space whose elements are linear functions defined on R. Addition and scalar multiplication in R̃ follow the rules of addition and scalar multiplication for linear functions.

In view of the fact that relative to a given basis e_1, e_2, ···, e_n in R every linear function f is uniquely determined by an n-tuple a_1, a_2, ···, a_n and that this correspondence preserves sums and products (of vectors by scalars), it follows that R̃ is isomorphic to the space of n-tuples of numbers.

One consequence of this fact is that the dual space R̃ of the n-dimensional space R is likewise n-dimensional.

The vectors in R are said to be contravariant, those in R̃, covariant. In the sequel x, y, ··· will denote elements of R and f, g, ··· elements of R̃.

2. Dual bases. In the sequel we shall denote the value of a linear function f at a point x by (f, x). Thus with every pair f ∈ R̃ and x ∈ R there is associated a number (f, x) and the following relations hold:

1. (f, x_1 + x_2) = (f, x_1) + (f, x_2),
2. (f, λx) = λ(f, x),
3. (λf, x) = λ(f, x),
4. (f_1 + f_2, x) = (f_1, x) + (f_2, x).

The first two of these relations stand for f(x_1 + x_2) = f(x_1) + f(x_2) and f(λx) = λf(x) and so express the linearity of f. The third defines the product of a linear function by a number and the fourth, the sum of two linear functions. The form of the relations 1 through 4 is like that of Axioms 2 and 3 for an inner product (§ 2). However, an inner product is a number associated with a pair of vectors from the same Euclidean space whereas (f, x) is a number associated with a pair of vectors belonging to two different vector spaces R and R̃.

Two vectors x ∈ R and f ∈ R̃ are said to be orthogonal if

(f, x) = 0.

In the case of a single space R orthogonality is defined for Euclidean spaces only. If R is an arbitrary vector space we can still speak of elements of R̃ being orthogonal to elements of R.

DEFINITION 2. Let e_1, e_2, ···, e_n be a basis in R and f^1, f^2, ···, f^n a basis in R̃. The two bases are said to be dual if

(3)  (f^i, e_k) = { 1 when i = k,
                    0 when i ≠ k }    (i, k = 1, 2, ···, n).

In terms of the symbol δ_k^i, defined by

δ_k^i = { 1 when i = k,
          0 when i ≠ k }    (i, k = 1, 2, ···, n),

condition (3) can be rewritten as

(f^i, e_k) = δ_k^i.

If e_1, e_2, ..., e_n is a basis in R, then the numbers (f, e_k) = f(e_k) are precisely the numbers a_k which determine the linear function f ∈ R̄ (cf. formula (2)). This remark implies that

if e_1, e_2, ..., e_n is a basis in R, then there exists a unique basis f^1, f^2, ..., f^n in R̄ dual to e_1, e_2, ..., e_n.

The proof is immediate: The equations

(f^1, e_1) = 1,  (f^1, e_2) = 0,  ...,  (f^1, e_n) = 0

define a unique vector (linear function) f^1 ∈ R̄. The equations

(f^2, e_1) = 0,  (f^2, e_2) = 1,  ...,  (f^2, e_n) = 0

define a unique vector (linear function) f^2 ∈ R̄, etc. The vectors f^1, f^2, ..., f^n are linearly independent since the corresponding n-tuples of numbers are linearly independent. Thus f^1, f^2, ..., f^n constitute the unique basis of R̄ dual to the basis e_1, e_2, ..., e_n of R.
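As a computational illustration (an assumption of this sketch, not of the text: the basis vectors are given by their components in some fixed background coordinate system, as the columns of a matrix E), the dual basis can be read off from the inverse matrix:

    import numpy as np

    E = np.array([[1.0, 1.0, 0.0],          # columns are the basis vectors e_1, e_2, e_3
                  [0.0, 1.0, 1.0],          # (an arbitrary invertible example)
                  [0.0, 0.0, 1.0]])

    # A linear function is a coefficient row acting on background coordinates.
    # The dual basis f^1, ..., f^n must satisfy (f^i, e_k) = delta_k^i, i.e. F @ E = I
    # where the rows of F are the f^i; hence F is the inverse of E.
    F = np.linalg.inv(E)

    assert np.allclose(F @ E, np.eye(3))    # (f^i, e_k) = delta_k^i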

In the sequel we shall follow a familiar convention of tensor analysis according to which one leaves out summation signs and sums over any index which appears once as a superscript and once as a subscript. Thus ξ^i η_i stands for ξ^1 η_1 + ξ^2 η_2 + ... + ξ^n η_n.

Given dual bases e_i and f^k one can easily express the coordinates of any vector. Thus, if x ∈ R and

x = ξ^1 e_1 + ξ^2 e_2 + ... + ξ^n e_n,

then

(f^k, x) = (f^k, ξ^i e_i) = ξ^i (f^k, e_i) = ξ^i δ_i^k = ξ^k.

Hence, the coordinates ξ^k of a vector x in the basis e_1, e_2, ..., e_n can be computed from the formulas

ξ^k = (f^k, x),

where the f^k form the basis dual to the basis e_i. Similarly, if f ∈ R̄ and

f = η_k f^k,

then

η_i = (f, e_i).

To repeat: If e_1, e_2, ..., e_n is a basis in R and f^1, f^2, ..., f^n its dual basis in R̄, then

(f^k, x) = ξ^k,   (f, e_i) = (η_k f^k, e_i) = η_k (f^k, e_i) = η_k δ_i^k = η_i.

express the number (f, x) in terms of the coordinates of thevectors f and x with respect to the bases e1, e2, , en andfi, p, . . respectively. Thus let

x = El ei 52e2 + + e, and f = ntf' + n2f2++Thifn.

Then

(4) (if x) = niE' + 172E2 + + nnen,where $1, E2, . , are the coordinates of x c R relative to the basise1, e2, , en and Th, n ,i, are the coordinates off E R relativeto the basis in, f2, , fn.

NOTE. For arbitrary bases e_1, e_2, ..., e_n and f^1, f^2, ..., f^n in R and R̄ respectively,

(f, x) = a_k^i η_i ξ^k,   where a_k^i = (f^i, e_k).
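A short numerical check of formula (4), under the same illustrative conventions as above (basis vectors as columns of a matrix E, dual basis functionals as the rows of its inverse; the particular numbers are arbitrary):

    import numpy as np

    E = np.array([[2.0, 0.0, 1.0],           # basis vectors e_i as columns (arbitrary, invertible)
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 3.0]])
    F = np.linalg.inv(E)                     # rows are the dual basis functionals f^i

    x_bg = np.array([1.0, 2.0, -1.0])        # a vector x, in background coordinates
    f_bg = np.array([3.0, 0.0, 1.0])         # a linear function f, as a coefficient row

    xi  = F @ x_bg                           # xi^k  = (f^k, x)
    eta = E.T @ f_bg                         # eta_i = (f, e_i)

    # Formula (4): with dual bases, (f, x) = eta_1*xi^1 + ... + eta_n*xi^n.
    assert np.isclose(f_bg @ x_bg, eta @ xi)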

3. Interchangeability of R and R̄. We now show that it is possible to interchange the roles of R and R̄ without affecting the theory developed so far.

R̄ was defined as the totality of linear functions on R.


We wish to show that if φ is a linear function on R̄, then φ(f) = (f, x_0) for some fixed vector x_0 in R.

To this end we choose a basis e_1, e_2, ..., e_n in R and denote its dual by f^1, f^2, ..., f^n. If the coordinates of f relative to the basis f^1, f^2, ..., f^n are η_1, η_2, ..., η_n, then we can write

φ(f) = a_1 η_1 + a_2 η_2 + ... + a_n η_n.

Now let x_0 be the vector a_1 e_1 + a_2 e_2 + ... + a_n e_n. Then, as we saw in para. 2,

(f, x_0) = a_1 η_1 + a_2 η_2 + ... + a_n η_n,

and

(5)    φ(f) = (f, x_0).

This formula establishes the desired one-to-one correspondence between the linear functions φ on R̄ and the vectors x_0 ∈ R and permits us to view R as the space of linear functions on R̄, thus placing the two spaces on the same footing.

We observe that the only operations used in the simultaneous study of a space and its dual space are the operations of addition of vectors and multiplication of a vector by a scalar in each of the spaces involved and the operation (f, x) which connects the elements of the two spaces. It is therefore possible to give a definition of a pair of dual spaces R and R̄ which emphasizes the parallel roles played by the two spaces. Such a definition runs as follows: a pair of dual spaces R and R̄ is a pair of n-dimensional vector spaces and an operation (f, x) which associates with f ∈ R̄ and x ∈ R a number (f, x) so that conditions 1 through 4 above hold and, in addition,

5. (f, x) = 0 for all x implies f = 0, and (f, x) = 0 for all f implies x = 0.

NOTE: In para. 2 above we showed that for every basis in R there exists a unique dual basis in R̄. In view of the interchangeability of R and R̄, for every basis in R̄ there exists a unique dual basis in R.

4. Transformation of coordinates in R and R̄. If we specify the coordinates of a vector x ∈ R relative to some basis e_1, e_2, ..., e_n, then, as a rule, we specify the coordinates of a vector f ∈ R̄ relative to the dual basis f^1, f^2, ..., f^n of e_1, e_2, ..., e_n.

Now let e'_1, e'_2, ..., e'_n be a new basis in R whose connection with the basis e_1, e_2, ..., e_n is given by

(6)    e'_i = c_i^k e_k.


Let f^1, f^2, ..., f^n be the dual basis of e_1, e_2, ..., e_n and f'^1, f'^2, ..., f'^n the dual basis of e'_1, e'_2, ..., e'_n. We wish to find the matrix ||b_k^i|| of transition from the f^i basis to the f'^i basis. We first find its inverse, the matrix of transition from the basis f'^1, f'^2, ..., f'^n to the basis f^1, f^2, ..., f^n:

(6')    f^k = u_i^k f'^i.

To this end we compute (f^k, e'_i) in two ways:

(f^k, e'_i) = (f^k, c_i^α e_α) = c_i^α (f^k, e_α) = c_i^k,
(f^k, e'_i) = (u_j^k f'^j, e'_i) = u_j^k (f'^j, e'_i) = u_i^k.

Hence c_i^k = u_i^k, i.e., the matrix in (6') is the transpose ¹ of the transition matrix in (6). It follows that the matrix of the transition from f^1, f^2, ..., f^n to f'^1, f'^2, ..., f'^n is equal to the inverse of the transpose of the matrix ||c_i^k|| which is the matrix of transition from e_1, e_2, ..., e_n to e'_1, e'_2, ..., e'_n.

We now discuss the effect of a change of basis on the coordinates of vectors in R and R̄. Thus let ξ^i be the coordinates of x ∈ R relative to a basis e_1, e_2, ..., e_n and ξ'^i its coordinates in a new basis e'_1, e'_2, ..., e'_n. Then

(f^k, x) = (f^k, ξ^i e_i) = ξ^k,   (f'^k, x) = (f'^k, ξ'^i e'_i) = ξ'^k.

Now

ξ'^i = (f'^i, x) = (b_k^i f^k, x) = b_k^i (f^k, x) = b_k^i ξ^k,

so that

ξ'^i = b_k^i ξ^k.

It follows that the coordinates of vectors in R transform like the vectors of the dual basis in R̄. Similarly, the coordinates of vectors in R̄ transform like the vectors of the dual basis in R, i.e.,

η'_i = c_i^k η_k.

¹ This is seen by comparing the matrices in (6) and (6'). We say that the matrix ||u_i^k|| in (6') is the transpose of the transition matrix in (6) because the summation indices in (6) and (6') are different.


We summarize our findings in the following rule: when we change from an "old" coordinate system to a "new" one, objects with a lower index transform in one way and objects with an upper index transform in a different way. Of the matrices ||c_i^k|| and ||b_i^k|| involved in these transformations one is the inverse of the transpose of the other.

The fact that ||b_i^k|| is the inverse of the transpose of ||c_i^k|| is expressed in the relations

c_i^α b_α^k = δ_i^k,   b_i^α c_α^k = δ_i^k.
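The rule can be checked numerically. In the sketch below (an illustration, not the book's construction) the old basis vectors are the columns of a matrix E in some background coordinates, C holds the entries c_i^k, and the coordinates of a fixed vector are computed in both bases:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    E = rng.normal(size=(n, n))              # old basis e_i as columns (background coordinates)
    C = rng.normal(size=(n, n))              # c_i^k : new basis e'_i = c_i^k e_k
    E_new = E @ C.T                          # column i of E_new is e'_i

    x = rng.normal(size=n)                   # a fixed vector in background coordinates
    xi = np.linalg.solve(E, x)               # xi^k  : coordinates in the old basis
    xi_new = np.linalg.solve(E_new, x)       # xi'^i : coordinates in the new basis

    B = np.linalg.inv(C)                     # b_k^i : c_i^a b_a^k = delta_i^k
    assert np.allclose(xi_new, np.einsum('ki,k->i', B, xi))   # xi'^i = b_k^i xi^k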

5. The dual of a Euclidean space. For the sake of simplicity we restrict our discussion to the case of real Euclidean spaces.

LEMMA. Let R be an n-dimensional Euclidean space. Then every linear function f on R can be expressed in the form

f(x) = (x, y),

where y is a fixed vector uniquely determined by the linear function f. Conversely, every vector y determines a linear function f such that f(x) = (x, y).

Proof: Let e_1, e_2, ..., e_n be an orthonormal basis of R. If x = ξ^i e_i, then f(x) is of the form

f(x) = a_1 ξ^1 + a_2 ξ^2 + ... + a_n ξ^n.

Now let y be the vector with coordinates a_1, a_2, ..., a_n. Since the basis e_1, e_2, ..., e_n is orthonormal,

(x, y) = a_1 ξ^1 + a_2 ξ^2 + ... + a_n ξ^n.

This shows the existence of a vector y such that for all x

f(x) = (x, y).

To prove the uniqueness of y we observe that if

f(x) = (x, y_1) and f(x) = (x, y_2),

then (x, y_1) = (x, y_2), i.e., (x, y_1 − y_2) = 0 for all x. But this means that y_1 − y_2 = 0.

The converse is obvious.

Thus in the case of a Euclidean space every f in R̄ can be replaced with the appropriate y in R and instead of writing (f, x) we can write (y, x).


Since the simultaneous study of a vector space and its dual space involves only the usual vector operations and the operation (f, x) which connects elements f ∈ R̄ and x ∈ R, we may, in the case of a Euclidean space, replace f by y, R̄ by R, and (f, x) by (y, x), i.e., we may identify a Euclidean space R with its dual space R̄. ² This situation is sometimes described as follows: in Euclidean space one can replace covariant vectors by contravariant vectors.

When we identify R and its dual R̄ the concept of orthogonality of a vector x ∈ R and a vector f ∈ R̄ (introduced in para. 2 above) reduces to that of orthogonality of two vectors of R.

Let e_1, e_2, ..., e_n be an arbitrary basis in R and f^1, f^2, ..., f^n its dual basis in R̄. If R is Euclidean, we can identify R̄ with R and so look upon the f^i as elements of R. It is natural to try to find expressions for the f^i in terms of the given e_i. Let

e_i = g_{iα} f^α.

We wish to find the coefficients g_{iα}. Now

(e_i, e_k) = (g_{iα} f^α, e_k) = g_{iα} (f^α, e_k) = g_{iα} δ_k^α = g_{ik}.

Thus if the basis of the f^i is dual to that of the e_k, then

(10)    e_k = g_{kα} f^α,

where

g_{ik} = (e_i, e_k).

Solving equation (10) for f^i we obtain the required result

f^i = g^{iα} e_α,

where the matrix ||g^{ik}|| is the inverse of the matrix ||g_{ik}||, i.e.,

g_{iα} g^{αk} = δ_i^k.

EXERCISE. Show that g^{ik} = (f^i, f^k).
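A small numerical illustration of the last formulas (assuming, for the sketch only, a basis of R^3 given by the columns of a matrix E and the ordinary dot product as the inner product):

    import numpy as np

    E = np.array([[1.0, 1.0, 0.0],           # a non-orthonormal basis, columns are e_i
                  [0.0, 2.0, 1.0],
                  [0.0, 0.0, 1.0]])
    g = E.T @ E                              # metric tensor g_ik = (e_i, e_k)
    g_inv = np.linalg.inv(g)                 # g^ik

    # f^i = g^{i a} e_a : the dual basis, viewed as vectors of R itself.
    F = E @ g_inv.T                          # column i is f^i (g_inv is symmetric; .T for clarity)

    # Duality (f^i, e_k) = delta^i_k, and the exercise g^{ik} = (f^i, f^k).
    assert np.allclose(F.T @ E, np.eye(3))
    assert np.allclose(F.T @ F, g_inv)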

§ 23. Tensors

1. Multilinear functions. In the first chapter we studied linear and bilinear functions on an n-dimensional vector space.

² If R is an n-dimensional vector space, then R̄ is also n-dimensional and so R and R̄ are isomorphic. If we were to identify R and R̄ we would have to write (y, x) in place of (f, x), y, x ∈ R. But this would have the effect of introducing an inner product in R.


A natural generalization of these concepts is the concept of a multilinear function of an arbitrary number of vectors some of which are elements of R and some of which are elements of R̄.

DEFINITION 1. A function

l(x, y, ...; f, g, ...)

is said to be a multilinear function of p vectors x, y, ... ∈ R and q vectors f, g, ... ∈ R̄ (the dual of R) if l is linear in each of its arguments.

Thus, for example, if we fix all vectors but the first then

l(x' + x'', y, ...; f, g, ...) = l(x', y, ...; f, g, ...) + l(x'', y, ...; f, g, ...),
l(λx, y, ...; f, g, ...) = λl(x, y, ...; f, g, ...).

Again,

l(x, y, ...; f' + f'', g, ...) = l(x, y, ...; f', g, ...) + l(x, y, ...; f'', g, ...),
l(x, y, ...; μf, g, ...) = μl(x, y, ...; f, g, ...).

A multilinear function of p vectors in R (contravariant vectors) and q vectors in R̄ (covariant vectors) is called a multilinear function of type (p, q).

The simplest multilinear functions are those of type (1, 0) and (0, 1).

A multilinear function of type (1, 0) is a linear function of one vector in R, i.e., a vector in R̄ (a covariant vector).

Similarly, as was shown in para. 3, § 22, a multilinear function of type (0, 1) defines a vector in R (a contravariant vector).

There are three types of multilinear functions of two vectors (bilinear functions):

(α) bilinear functions on R (considered in § 4),
(β) bilinear functions on R̄,
(γ) functions of one vector in R and one in R̄.

There is a close connection between functions of type (γ) and linear transformations. Indeed, let

y = Ax

be a linear transformation on R.


The bilinear function of type (γ) associated with A is the function

(f, Ax),

which depends linearly on the vectors x ∈ R and f ∈ R̄. As in § 11 of chapter II one can prove the converse, i.e., that one can associate with every bilinear function of type (γ) a linear transformation on R.

2. Expressions for multilinear functions in a given coordinate system. Coordinate transformations. We now express a multilinear function in terms of the coordinates of its arguments. For simplicity we consider the case of a multilinear function l(x, y; f), x, y ∈ R, f ∈ R̄ (a function of type (2, 1)).

Let e_1, e_2, ..., e_n be a basis in R and f^1, f^2, ..., f^n its dual in R̄. Let

x = ξ^i e_i,   y = η^j e_j,   f = ζ_k f^k.

Then

l(x, y; f) = l(ξ^i e_i, η^j e_j; ζ_k f^k) = ξ^i η^j ζ_k l(e_i, e_j; f^k),

or

l(x, y; f) = a_{ij}^k ξ^i η^j ζ_k,

where the coefficients a_{ij}^k which determine the function l(x, y; f) are given by the relations

a_{ij}^k = l(e_i, e_j; f^k).

This shows that the a_{ij}^k depend on the choice of bases in R and R̄. A similar formula holds for a general multilinear function

(1)    l(x, y, ...; f, g, ...) = a_{ij...}^{rs...} ξ^i η^j ... λ_r μ_s ...,

where the numbers a_{ij...}^{rs...} which define the multilinear function are given by

(2)    a_{ij...}^{rs...} = l(e_i, e_j, ...; f^r, f^s, ...).

We now show how the system of numbers which determine a multilinear form changes as a result of a change of basis. Thus let e_1, e_2, ..., e_n be a basis in R and f^1, f^2, ..., f^n its dual basis in R̄. Let e'_1, e'_2, ..., e'_n be a new basis in R and f'^1, f'^2, ..., f'^n its dual in R̄. If

(3)    e'_α = c_α^i e_i,


then (cf. para. 4, § 22)

(4)    f'^β = b_i^β f^i,

where the matrix ||b_i^β|| is the transpose of the inverse of ||c_α^i||. For a fixed α the numbers c_α^i in (3) are the coordinates of the vector e'_α relative to the basis e_1, e_2, ..., e_n. Similarly, for a fixed β the numbers b_i^β in (4) are the coordinates of f'^β relative to the basis f^1, f^2, ..., f^n.

We shall now compute the numbers a'_{ij...}^{rs...} which define our multilinear function relative to the bases e'_1, e'_2, ..., e'_n and f'^1, f'^2, ..., f'^n. We know that

a'_{ij...}^{rs...} = l(e'_i, e'_j, ...; f'^r, f'^s, ...).

Hence to find a'_{ij...}^{rs...} we must put in (1) in place of ξ^i, η^j, ...; λ_r, μ_s, ... the coordinates of the vectors e'_i, e'_j, ...; f'^r, f'^s, ..., i.e., the numbers c_i^α, c_j^β, ...; b_γ^r, b_δ^s, .... In this way we find that

(5)    a'_{ij...}^{rs...} = c_i^α c_j^β ... b_γ^r b_δ^s ... a_{αβ...}^{γδ...}.

To sum up: If a_{ij...}^{rs...} define a multilinear function l(x, y, ...; f, g, ...) relative to a pair of dual bases e_1, e_2, ..., e_n and f^1, f^2, ..., f^n, and a'_{ij...}^{rs...} define this function relative to another pair of dual bases e'_1, e'_2, ..., e'_n and f'^1, f'^2, ..., f'^n, then

a'_{ij...}^{rs...} = c_i^α c_j^β ... b_γ^r b_δ^s ... a_{αβ...}^{γδ...}.

Here ||c_i^α|| is the matrix defining the transformation of the e basis and ||b_γ^r|| is the matrix defining the transformation of the f basis.

This situation can be described briefly by saying that the lower indices of the numbers a_{ij...}^{rs...} are affected by the matrix ||c_i^α|| and the upper by the matrix ||b_γ^r|| (cf. para. 4, § 22).
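In index notation this is conveniently carried out with einsum. The sketch below (illustrative only; C is an arbitrary invertible matrix playing the role of ||c_i^α||, and B = inv(C) holds the numbers b_γ^k satisfying c_i^α b_α^k = δ_i^k) transforms the coefficient array of a function of type (2, 1):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 3
    a = rng.normal(size=(n, n, n))           # a_{ij}^k : coefficients of a type (2, 1) function
    C = rng.normal(size=(n, n))              # c_i^a    : new basis e'_i = c_i^a e_a
    B = np.linalg.inv(C)                     # b_g^k    : c_i^a b_a^k = delta_i^k

    # Transformation law (5): a'_{ij}^k = c_i^a c_j^b b_g^k a_{ab}^g
    a_new = np.einsum('ia,jb,gk,abg->ijk', C, C, B, a)

    # Consistency check: transforming back (with the roles of C and B exchanged)
    # recovers the original system of numbers.
    a_back = np.einsum('ia,jb,gk,abg->ijk', B, B, C, a_new)
    assert np.allclose(a_back, a)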

3. Definition of a tensor. The objects which we have studied in this book (vectors, linear functions, linear transformations, bilinear functions, etc.) were defined relative to a given basis by an appropriate system of numbers. Thus relative to a given basis a vector was defined by its n coordinates, a linear function by its n coefficients, a linear transformation by the n² entries in its matrix, and a bilinear function by the n² entries in its matrix. In the case of each of these objects the associated system of numbers would, upon a change of basis, transform in a manner peculiar to each object, and to characterize the object one had to prescribe the values of these numbers relative to some basis as well as their law of transformation under a change of basis.


In para. 1 and 2 of this section we introduced the concept of a multilinear function. Relative to a definite basis this object is defined by n^(p+q) numbers (2) which under change of basis transform in accordance with (5). We now define a closely related concept which plays an important role in many branches of physics, geometry, and algebra.

DEFINITION 2. Let R be an n-dimensional vector space. We say that a p times covariant and q times contravariant tensor is defined if with every basis in R there is associated a set of n^(p+q) numbers a_{ij...}^{rs...} (there are p lower indices and q upper indices) which under the change of basis defined by some matrix ||c_i^k|| transform according to the rule

(6)    a'_{ij...}^{rs...} = c_i^α c_j^β ... b_γ^r b_δ^s ... a_{αβ...}^{γδ...},

with ||b_i^k|| the transpose of the inverse of ||c_i^k||. The number p + q is called the rank (valence) of the tensor. The numbers a_{ij...}^{rs...} are called the components of the tensor.

Since the system of numbers defining a multilinear function of p vectors in R and q vectors in R̄ transforms under change of basis in accordance with (6), the multilinear function determines a unique tensor of rank p + q, p times covariant and q times contravariant. Conversely, every tensor determines a unique multilinear function. This permits us to deduce properties of tensors and of the operations on tensors using the "model" supplied by multilinear functions. Clearly, multilinear functions are only one of the possible realizations of tensors.

We now give a few examples of tensors.

Scalar. If we associate with every coordinate system the same constant a, then a may be regarded as a tensor of rank zero. A tensor of rank zero is called a scalar.

Contravariant vector. Given a basis in R every vector in R determines n numbers, its coordinates relative to this basis. These transform according to the rule

ξ'^i = b_k^i ξ^k

and so represent a contravariant tensor of rank 1.

Linear function (covariant vector).


The numbers a_i defining a linear function transform according to the rule

a'_i = c_i^k a_k

and so represent a covariant tensor of rank 1.

Bilinear function. Let A(x; y) be a bilinear form on R. With every basis we associate the matrix of the bilinear form relative to this basis. The resulting tensor is of rank two, twice covariant. Similarly, a bilinear form of vectors x ∈ R and y ∈ R̄ defines a tensor of rank two, once covariant and once contravariant, and a bilinear form of vectors f, g ∈ R̄ defines a twice contravariant tensor.

Linear transformation. Let A be a linear transformation on R. With every basis we associate the matrix of A relative to this basis. We shall show that this matrix is a tensor of rank two, once covariant and once contravariant.

Let ||a_i^k|| be the matrix of A relative to some basis e_1, e_2, ..., e_n, i.e.,

Ae_i = a_i^k e_k.

Define a change of basis by the equations

e'_i = c_i^α e_α.

Then

e_i = b_i^α e'_α,   where b_i^α c_α^k = δ_i^k.

It follows that

Ae'_i = A(c_i^α e_α) = c_i^α Ae_α = c_i^α a_α^β e_β = c_i^α a_α^β b_β^k e'_k = a'_i^k e'_k.

This means that the matrix of A relative to the e'_i basis takes the form

a'_i^k = c_i^α a_α^β b_β^k,

which proves that the matrix of a linear transformation is indeed a tensor of rank two, once covariant and once contravariant.
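A quick numerical check of this transformation law (an assumption of the sketch: the array M stores a_i^k with the row index i labelling the basis vector being mapped, which is the transpose of the usual column-vector convention):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3
    M = rng.normal(size=(n, n))              # M[i, k] = a_i^k, so A e_i = a_i^k e_k
    C = rng.normal(size=(n, n))              # e'_i = c_i^a e_a
    B = np.linalg.inv(C)                     # b_i^a c_a^k = delta_i^k

    # Mixed-tensor law: a'_i^k = c_i^a a_a^b b_b^k, i.e. M' = C M B in matrix form.
    M_new = C @ M @ B

    # The same change of basis in the usual column-vector convention: if P = C.T is the
    # matrix whose columns are the new basis vectors, then M'.T = inv(P) M.T P.
    P = C.T
    assert np.allclose(M_new.T, np.linalg.inv(P) @ M.T @ P)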

In particular the matrix of the identity transformation E relative to any basis is the unit matrix, i.e., the system of numbers

δ_i^k = 1 if i = k,   δ_i^k = 0 if i ≠ k.

Thus δ_i^k is the simplest tensor of rank two, once covariant and once contravariant.


One interesting feature of this tensor is that its components do not depend on the choice of basis.

EXERCISE. Show directly that the system of numbers

δ_i^k = 1 if i = k,   δ_i^k = 0 if i ≠ k,

associated with every basis is a tensor.

We now prove two simple properties of tensors.

A sufficient condition for the equality of two tensors of the same type is the equality of their corresponding components relative to some basis. (This means that if the components of these two tensors relative to some basis are equal, then their components relative to any other basis must be equal.) For proof we observe that since the two tensors are of the same type they transform in exactly the same way, and since their components are the same in some coordinate system they must be the same in every coordinate system. We wish to emphasize that the assumption about the two tensors being of the same type is essential. Thus, given a basis, both a linear transformation and a bilinear form are defined by a matrix. Coincidence of the matrices defining these objects in one basis does not imply coincidence of the matrices defining these objects in another basis.

Given p and q it is always possible to construct a tensor of type (p, q) whose components relative to some basis take on n^(p+q) prescribed values. The proof is simple. Thus let a_{ij...}^{rs...} be the numbers prescribed in some basis. These numbers define a multilinear function l(x, y, ...; f, g, ...) as per formula (1) in para. 2 of this section. The multilinear function, in turn, defines a unique tensor satisfying the required conditions.

4. Tensors in Euclidean space. If R is a (real) n-dimensional Euclidean space, then, as was shown in para. 5 of § 22, it is possible to establish an isomorphism between R and R̄ such that if y ∈ R corresponds under this isomorphism to f ∈ R̄, then

(f, x) = (y, x)

for all x ∈ R. Given a multilinear function l of p vectors x, y, ... in R and q vectors f, g, ... in R̄ we can replace the latter by the corresponding vectors u, v, ... in R and so obtain a multilinear function l(x, y, ...; u, v, ...) of p + q vectors in R.


We now propose to express the coefficients of l(x, y, ...; u, v, ...) in terms of the coefficients of l(x, y, ...; f, g, ...).

Thus let a_{ij...}^{rs...} be the coefficients of the multilinear function l(x, y, ...; f, g, ...), i.e.,

a_{ij...}^{rs...} = l(e_i, e_j, ...; f^r, f^s, ...),

and let b_{ij...rs...} be the coefficients of the multilinear function l(x, y, ...; u, v, ...), i.e.,

b_{ij...rs...} = l(e_i, e_j, ...; e_r, e_s, ...).

We showed in para. 5 of § 22 that in Euclidean space the vectors e_i of a basis dual to the f^i are expressible in terms of the vectors f^i in the following manner:

e_r = g_{rα} f^α,

where

g_{ik} = (e_i, e_k).

It follows that

b_{ij...rs...} = l(e_i, e_j, ...; e_r, e_s, ...) = l(e_i, e_j, ...; g_{rα} f^α, g_{sβ} f^β, ...)
             = g_{rα} g_{sβ} l(e_i, e_j, ...; f^α, f^β, ...) = g_{rα} g_{sβ} a_{ij...}^{αβ...}.

In view of the established connection between multilinear functions and tensors we can restate our result for tensors:

If a_{ij...}^{rs...} is a tensor in Euclidean space p times covariant and q times contravariant, then this tensor can be used to construct a new tensor b_{ij...rs...} which is p + q times covariant. This operation is referred to as lowering of indices. It is defined by the equation

b_{ij...rs...} = g_{rα} g_{sβ} ... a_{ij...}^{αβ...}.

Here g_{ik} is a twice covariant tensor. This is obvious if we observe that the g_{ik} = (e_i, e_k) are the coefficients of a bilinear form, namely, the inner product relative to the basis e_1, e_2, ..., e_n. In view of its connection with the inner product (metric) in our space, the tensor g_{ik} is called a metric tensor.

The equation

a_{ij...}^{rs...} = g^{rα} g^{sβ} ... b_{ij...αβ...}

defines the analog of the operation just discussed.


The new operation is referred to as raising the indices. Here g^{ik} has the meaning discussed in para. 5 of § 22.

EXERCISE. Show that g^{ik} is a twice contravariant tensor.
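Lowering and raising of indices are again contractions that einsum expresses directly. The sketch below (illustrative; the metric is the Gram matrix of an arbitrarily chosen basis) lowers the contravariant index of a tensor a_{ij}^r and raises it back:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 3
    E = rng.normal(size=(n, n))              # basis vectors as columns (background coordinates)
    g = E.T @ E                              # metric tensor g_ik = (e_i, e_k)
    g_inv = np.linalg.inv(g)                 # g^ik

    a = rng.normal(size=(n, n, n))           # a_{ij}^r : contravariant in its last index

    # Lowering the contravariant index:  b_{ijr} = g_{r a} a_{ij}^a
    b = np.einsum('ra,ija->ijr', g, a)

    # Raising it again with g^{ik} recovers the original components.
    a_again = np.einsum('ra,ija->ijr', g_inv, b)
    assert np.allclose(a_again, a)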

5. Operations on tensors. In view of the connection between tensors and multilinear functions it is natural first to define operations on multilinear functions and then express these definitions in the language of tensors relative to some basis.

Addition of tensors. Let

l'(x, y, ...; f, g, ...),   l''(x, y, ...; f, g, ...)

be two multilinear functions of the same number of vectors in R and the same number of vectors in R̄. We define their sum l(x, y, ...; f, g, ...) by the formula

l(x, y, ...; f, g, ...) = l'(x, y, ...; f, g, ...) + l''(x, y, ...; f, g, ...).

Clearly this sum is again a multilinear function of the same number of vectors in R and R̄ as the summands l' and l''. Consequently addition of tensors is defined by means of the formula

a_{ij...}^{rs...} = a'_{ij...}^{rs...} + a''_{ij...}^{rs...}.

Multiplication of tensors. Let

l'(x, y, ...; f, g, ...) and l''(z, ...; h, ...)

be two multilinear functions of which the first depends on p' vectors in R and q' vectors in R̄ and the second on p'' vectors in R and q'' vectors in R̄. We define the product l(x, y, ..., z, ...; f, g, ..., h, ...) of l' and l'' by means of the formula

l(x, y, ..., z, ...; f, g, ..., h, ...) = l'(x, y, ...; f, g, ...) l''(z, ...; h, ...);

l is a multilinear function of p' + p'' vectors in R and q' + q'' vectors in R̄. To see this we need only vary in l one vector at a time keeping all other vectors fixed.

We shall now express the components of the tensor corresponding to the product of the multilinear functions l' and l'' in terms of the components of the tensors corresponding to l' and l''. Since

a'_{ij...}^{rs...} = l'(e_i, e_j, ...; f^r, f^s, ...)

and

a''_{kl...}^{tu...} = l''(e_k, e_l, ...; f^t, f^u, ...),

it follows that

a_{ij...kl...}^{rs...tu...} = a'_{ij...}^{rs...} a''_{kl...}^{tu...}.

This formula defines the product of two tensors.

Contraction of tensors. Let l(x, y, ...; f, g, ...) be a multilinear function of p vectors in R (p ≥ 1) and q vectors in R̄ (q ≥ 1). We use l to define a new multilinear function of p − 1 vectors in R and q − 1 vectors in R̄. To this end we choose a basis e_1, e_2, ..., e_n in R and its dual basis f^1, f^2, ..., f^n in R̄ and consider the sum

(7)    l'(y, ...; g, ...) = l(e_1, y, ...; f^1, g, ...) + l(e_2, y, ...; f^2, g, ...) + ... + l(e_n, y, ...; f^n, g, ...).

Since each summand is a multilinear function of y, ... and g, ..., the same is true of the sum l'. We now show that whereas each summand depends on the choice of basis, the sum does not. Let us choose a new basis e'_1, e'_2, ..., e'_n and denote its dual basis by f'^1, f'^2, ..., f'^n. Since the vectors y, ... and g, ... remain fixed, we need only prove our contention for a bilinear form A(x; f). Specifically, we must show that

A(e_α; f^α) = A(e'_α; f'^α).

We recall that if

e'_i = c_i^k e_k,

then

f^k = c_i^k f'^i.

Therefore

A(e'_α; f'^α) = A(c_α^k e_k; f'^α) = c_α^k A(e_k; f'^α) = A(e_k; c_α^k f'^α) = A(e_k; f^k),

i.e., A(e_α; f^α) is indeed independent of the choice of basis.

We now express the coefficients of the form (7) in terms of the coefficients of the form l(x, y, ...; f, g, ...). Since

a'_{j...}^{s...} = l'(e_j, ...; f^s, ...)

and

l'(e_j, ...; f^s, ...) = l(e_α, e_j, ...; f^α, f^s, ...),

it follows that

(8)    a'_{j...}^{s...} = a_{αj...}^{αs...}.

The tensor a'_{j...}^{s...} obtained from the tensor a_{ij...}^{rs...} as per (8) is called a contraction of the tensor a_{ij...}^{rs...}.

It is clear that the summation in the process of contraction may involve any covariant index and any contravariant index. However, if one tried to sum over two covariant indices, say, the resulting system of numbers would no longer form a tensor (for upon change of basis this system of numbers would not transform in accordance with the prescribed law of transformation for tensors).

We observe that contraction of a tensor of rank two leads to a tensor of rank zero (scalar), i.e., to a number independent of coordinate systems.

The operation of lowering indices discussed in para. 4 of this section can be viewed as contraction of the product of some tensor by the metric tensor g_{ik} (repeated as a factor an appropriate number of times). Likewise the raising of indices can be viewed as contraction of the product of some tensor by the tensor g^{ik}.

Another example. Let a_{ij}^k be a tensor of rank three and b_l^m a tensor of rank two. Their product c_{ijl}^{km} = a_{ij}^k b_l^m is a tensor of rank five. The result of contracting this tensor over the indices i and m, say, would be a tensor of rank three. Another contraction, over the indices j and k, say, would lead to a tensor of rank one (vector).

Let a_i^j and b_k^l be two tensors of rank two. By multiplication and contraction these yield a new tensor of rank two:

c_i^l = a_i^α b_α^l.

If the tensors a_i^j and b_k^l are looked upon as matrices of linear transformations, then the tensor c_i^l is the matrix of the product of these linear transformations.

With any tensor a_i^j of rank two we can associate a sequence of invariants (i.e., numbers independent of the choice of basis, simply scalars)

a_α^α,   a_β^α a_α^β,   a_γ^α a_β^γ a_α^β, ....
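The following sketch (illustrative data only) carries out multiplication followed by contraction for two mixed tensors of rank two, and checks that the full contractions a_α^α and a_β^α a_α^β are indeed unchanged by a change of basis:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 3
    a = rng.normal(size=(n, n))              # a_i^j : rank two, once covariant, once contravariant
    b = rng.normal(size=(n, n))              # b_k^l

    # Product followed by contraction over the middle pair of indices:
    # c_i^l = a_i^alpha b_alpha^l -- the matrix of the composite linear transformation.
    c = np.einsum('ia,al->il', a, b)
    assert np.allclose(c, a @ b)

    # Full contraction gives scalars invariant under a'_i^k = c_i^a a_a^b b_b^k.
    C = rng.normal(size=(n, n))
    B = np.linalg.inv(C)
    a_new = C @ a @ B
    assert np.isclose(np.trace(a_new), np.trace(a))                 # a_alpha^alpha
    assert np.isclose(np.trace(a_new @ a_new), np.trace(a @ a))     # a_beta^alpha a_alpha^beta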


The operations on tensors permit us to construct from given tensors new tensors invariantly connected with the given ones. For example, by multiplying vectors we can obtain tensors of arbitrarily high rank. Thus, if ξ^i are the coordinates of a contravariant vector and η_j of a covariant vector, then ξ^i η_j is a tensor of rank two, etc. We observe that not all tensors can be obtained by multiplying vectors. However, it can be shown that every tensor can be obtained from vectors (tensors of rank one) using the operations of addition and multiplication.

By a rational integral invariant of a given system of tensors we mean a polynomial function of the components of these tensors whose value does not change when one system of components of the tensors in question computed with respect to some basis is replaced by another system computed with respect to some other basis.

In connection with the above concept we quote without proof the following result:

Any rational integral invariant of a given system of tensors can be obtained from these tensors by means of the operations of tensor multiplication, addition, multiplication by a number and total contraction (i.e., contraction over all indices).

6. Symmetric and skew symmetric tensors

DEFINITION. A tensor is said to be symmetric with respect to a given set of indices ¹ if its components are invariant under an arbitrary permutation of these indices.

For example, if

a_{i_1 i_2 ...}^{j...} = a_{i_2 i_1 ...}^{j...},

then the tensor is said to be symmetric with respect to the first two (lower) indices.

If l(x, y, ...; f, g, ...) is the multilinear function corresponding to the tensor a_{ij...}^{rs...}, i.e., if

(9)    l(x, y, ...; f, g, ...) = a_{ij...}^{rs...} ξ^i η^j ... λ_r μ_s ...,

then, as is clear from (9), symmetry of the tensor with respect to some group of indices is equivalent to symmetry of the corresponding multilinear function with respect to an appropriate set of vectors.

¹ It goes without saying that we have in mind indices in the same (upper or lower) group.


Since for a multilinear function to be symmetric with respect to a certain set of vectors it is sufficient that the corresponding tensor a_{ij...}^{rs...} be symmetric with respect to an appropriate set of indices in some basis, it follows that if the components of a tensor are symmetric relative to one coordinate system, then this symmetry is preserved in all coordinate systems.

DEFINITION. A tensor is said to be skew symmetric if it changes sign every time two of its indices are interchanged. Here it is assumed that we are dealing with a tensor all of whose indices are of the same nature, i.e., either all covariant or all contravariant.

The definition of a skew symmetric tensor implies that an even permutation of its indices leaves its components unchanged and an odd permutation multiplies them by −1.

The multilinear functions associated with skew symmetric tensors are themselves skew symmetric in the sense of the following definition:

DEFINITION. A multilinear function l(x, y, ...) of p vectors x, y, ... in R is said to be skew symmetric if interchanging any pair of its vectors changes the sign of the function.

For a multilinear function to be skew symmetric it is sufficient that the components of the associated tensor be skew symmetric relative to some coordinate system. This much is obvious from (9). On the other hand, skew symmetry of a multilinear function implies skew symmetry of the associated tensor (in any coordinate system). In other words, if the components of a tensor are skew symmetric in one coordinate system then they are skew symmetric in all coordinate systems, i.e., the tensor is skew symmetric.

We now count the number of independent components of a skew symmetric tensor. Thus let a_{ik} be a skew symmetric tensor of rank two. Then a_{ik} = −a_{ki}, so that the number of different components is n(n − 1)/2. Similarly, the number of different components of a skew symmetric tensor a_{ijk} is n(n − 1)(n − 2)/3!, since components with repeated indices have the value zero and components which differ from one another only in the order of their indices can be expressed in terms of each other. More generally, the number of independent components of a skew symmetric tensor with k indices (k ≤ n) is the binomial coefficient (n choose k). (There are no non-zero skew symmetric tensors with more than n indices. This follows from the fact that a component with two or more repeated indices vanishes and k > n implies that at least two of the indices of each component coincide.)

We consider in greater detail skew symmetric tensors with n indices. Since two sets of n different indices differ from one another in order alone, it follows that such a tensor has only one independent component. Consequently, if i_1, i_2, ..., i_n is any permutation of the integers 1, 2, ..., n and if we put a_{12...n} = a, then

(10)    a_{i_1 i_2 ... i_n} = ±a,

depending on whether the permutation i_1 i_2 ... i_n is even (+ sign) or odd (− sign).

EXERCISE. Show that as a result of a coordinate transformation the number a_{12...n} = a is multiplied by the determinant of the matrix associated with this coordinate transformation.

In view of formula (10) the multilinear function associated with a skew symmetric tensor with n indices has the form

l(x, y, ..., z) = a_{i_1 i_2 ... i_n} ξ^{i_1} η^{i_2} ... ζ^{i_n} = a ·
    | ξ^1  ξ^2  ...  ξ^n |
    | η^1  η^2  ...  η^n |
    | .................. |
    | ζ^1  ζ^2  ...  ζ^n |.

This proves the fact that apart from a multiplicative constant the only skew symmetric multilinear function of n vectors in an n-dimensional vector space is the determinant of the coordinates of these vectors.

The operation of symmetrization. Given a tensor one can always construct another tensor symmetric with respect to a preassigned group of indices. This operation is called symmetrization and consists in the following.

Let the given tensor be a_{i_1 i_2 ... i_k i_{k+1} ...}, say. To symmetrize it with respect to the first k indices, say, is to construct the tensor

a_{(i_1 i_2 ... i_k) i_{k+1} ...} = (1/k!) Σ a_{j_1 j_2 ... j_k i_{k+1} ...},

where the sum is taken over all permutations j_1, j_2, ..., j_k of the indices i_1, i_2, ..., i_k. For example,

a_{(i_1 i_2)} = (1/2)(a_{i_1 i_2} + a_{i_2 i_1}).
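Symmetrization is easy to realize directly from the definition; the sketch below (illustrative, for an arbitrary rank-three array) sums over the permutations of the first k indices and checks the k = 2 example above:

    import itertools, math
    import numpy as np

    def symmetrize_first_k(t, k):
        # a_(i1...ik)i_{k+1}... = (1/k!) * sum over permutations j1...jk of a_{j1...jk i_{k+1}...}
        out = np.zeros_like(t)
        for perm in itertools.permutations(range(k)):
            out += np.transpose(t, list(perm) + list(range(k, t.ndim)))
        return out / math.factorial(k)

    a = np.random.default_rng(5).normal(size=(3, 3, 3))
    s = symmetrize_first_k(a, 2)
    assert np.allclose(s, np.swapaxes(s, 0, 1))                        # symmetric in i_1, i_2
    assert np.allclose(s, (a + np.swapaxes(a, 0, 1)) / 2)              # the k = 2 example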


The operation of alternation is analogous to the operation of symmetrization and permits us to construct from a given tensor another tensor skew symmetric with respect to a preassigned group of indices. The operation is defined by the equation

a_{[i_1 i_2 ... i_k] ...} = (1/k!) Σ ± a_{j_1 j_2 ... j_k ...},

where the sum is taken over all permutations j_1, j_2, ..., j_k of the indices i_1, i_2, ..., i_k and the sign depends on the even or odd nature of the permutation involved. For instance,

a_{[i_1 i_2]} = (1/2)(a_{i_1 i_2} − a_{i_2 i_1}).

The operation of alternation is indicated by the square bracket symbol [ ]. The brackets contain the indices involved in the operation of alternation.

Given k vectors ξ^i, η^i, ..., ζ^i we can construct their tensor product a^{i_1 i_2 ... i_k} = ξ^{i_1} η^{i_2} ... ζ^{i_k} and then alternate it to get a^{[i_1 i_2 ... i_k]}. It is easy to see that the components of this tensor are all the kth order minors of the following matrix:

    ξ^1  ξ^2  ...  ξ^n
    η^1  η^2  ...  η^n
    ..................
    ζ^1  ζ^2  ...  ζ^n

The tensor a^{[i_1 i_2 ... i_k]} does not change when we add to one of the vectors ξ, η, ..., ζ any linear combination of the remaining vectors.
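The sketch below (illustrative) implements alternation over all indices, forms the alternated product of two vectors, and checks the invariance property just stated; with the 1/k! normalization of the alternation formula above, the components of a^{[i_1 i_2]} are the 2 × 2 minors divided by 2:

    import itertools, math
    import numpy as np

    def alternate(t):
        # a_[i1...ik] = (1/k!) * sum of signed permutations of the indices of t
        out = np.zeros_like(t)
        k = t.ndim
        for perm in itertools.permutations(range(k)):
            inversions = sum(1 for i in range(k) for j in range(i + 1, k) if perm[i] > perm[j])
            out += (-1) ** inversions * np.transpose(t, perm)
        return out / math.factorial(k)

    rng = np.random.default_rng(6)
    xi, eta = rng.normal(size=3), rng.normal(size=3)
    a = alternate(np.einsum('i,j->ij', xi, eta))          # alternated product of two vectors

    # a^[i1 i2] = (xi^{i1} eta^{i2} - xi^{i2} eta^{i1}) / 2 : half the 2x2 minors.
    assert np.allclose(a, (np.outer(xi, eta) - np.outer(eta, xi)) / 2)

    # Adding to eta a multiple of xi leaves the alternated product unchanged.
    a2 = alternate(np.einsum('i,j->ij', xi, eta + 5.0 * xi))
    assert np.allclose(a, a2)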

Consider a k-dimensional subspace of an n-dimensional space R. We wish to characterize this subspace by means of a system of numbers, i.e., we wish to coordinatize it.

A k-dimensional subspace is generated by k linearly independent vectors ξ^i, η^i, ..., ζ^i. Different systems of k linearly independent vectors may generate the same subspace. However, it is easy to show (the proof is left to the reader) that if two such systems of vectors generate the same subspace, the tensors a^{[i_1 i_2 ... i_k]} constructed from each of these systems differ by a non-zero multiplicative constant only.

Thus the skew symmetric tensor a^{[i_1 i_2 ... i_k]} constructed on the generators ξ^i, η^i, ..., ζ^i of the subspace defines this subspace.
