

Linear Algebra

David B. Leep
Department of Mathematics

University of Kentucky
Lexington, KY 40506-0027


Contents

Introduction

1 Vector Spaces
1.1 Basics of Vector Spaces
1.2 Systems of Linear Equations
1.3 Appendix
1.4 Proof of Theorems 1.20 and 1.21

2 Linear Transformations
2.1 Basic Results
2.2 The vector space of linear transformations f : V → W
2.3 Quotient Spaces
2.4 Applications

3 Matrix Representations of a Linear Transformation
3.1 Basic Results
3.2 The Transpose of a Matrix
3.3 The Row Space, Column Space, and Null Space of a Matrix
3.4 Elementary Matrices
3.5 Permutations and Permutation Matrices
3.6 The Rank of a Matrix
3.7 Systems of Linear Equations
3.8 Bases of Subspaces

4 The Dual Space
4.1 Basic Results
4.2 Exact sequences and more on the rank of a matrix
4.3 The double dual of a vector space
4.4 Annihilators of subspaces
4.5 Subspaces of V and V ∗
4.6 A calculation

5 Inner Product Spaces
5.1 The Basics
5.2 Orthogonality
5.3 The Matrix of an Inner Product on V
5.4 The Adjoint of a Linear Transformation
5.5 Normal Transformations

6 Determinants
6.1 The Existence of n-Alternating Forms
6.2 Further Results and Applications
6.3 The Adjoint of a matrix
6.4 The vector space of alternating forms
6.5 The adjoint as a matrix of a linear transformation
6.6 The Vandermonde Determinant and Applications

7 Canonical Forms of a Linear Transformation
7.1 Preliminaries
7.2 The Cayley-Hamilton Theorem
7.3 Eigenvectors, eigenvalues, diagonalizability
7.4 More results on T-Invariant Subspaces
7.5 Primary Decomposition
7.6 Further Decomposition


Introduction

These notes are for a graduate course in linear algebra. It is assumed that the reader has already studied matrix algebra or linear algebra; however, these notes are completely self-contained. The material is developed completely from scratch, but at a faster pace than a beginning linear algebra course. Less motivation is presented than in a beginning course. For example, it is assumed that the reader knows that vector spaces are important to study, even though many details about vector spaces might have been forgotten from an earlier course.

I have tried to present material efficiently without sacrificing any of the details. Examples are often given after stating definitions or proving theorems, instead of beforehand. There are many occasions when a second method or approach to a result is useful to see. I have often given a second approach in the text or in the exercises. I try to emphasize a basis-free approach to results in this text. Mathematically this is often the best approach, but pedagogically it is not always the best route to follow. For this reason, I have often indicated a second approach, either in the text or in the exercises, that uses a basis in order to make results clearer.

Linear algebra is most conveniently developed over an arbitrary field k. For readers not comfortable with such generality, very little is lost if one always thinks of k as the field of real numbers R or the field of complex numbers C. It will be clearly pointed out in the text if particular properties of a field are used or assumed.


Chapter 1

Vector Spaces

1.1 Basics of Vector Spaces

We begin by giving the definition of a vector space and deriving the most basic properties. They are fundamental to all that follows. The main result of this chapter is that all finitely generated vector spaces have a basis and that any two bases of a vector space have the same cardinality. On the way to proving this result, we introduce the concept of subspaces, linear combinations of vectors, and linearly independent vectors. These results lead to the concept of the dimension of a vector space. We close the chapter with a brief discussion of direct sums of vector spaces.

Let k denote an arbitrary field. We begin with the definition of a vector space. Example 1 (just after Proposition 1.2) gives the most important example of a vector space.

Definition 1.1. A vector space V over a field k is a nonempty set V together with two binary operations, called addition and scalar multiplication, which satisfy the following ten axioms.

1. Addition is given by a function

V × V → V

(v, w) ↦ v + w.

2. v + w = w + v for all v, w ∈ V .

3. (v + w) + z = v + (w + z) for all v, w, z ∈ V .


4. There exists an element 0 ∈ V such that v + 0 = v for all v ∈ V .

5. Given v ∈ V , there exists w ∈ V such that v + w = 0.

6. Scalar multiplication is given by a function

k × V → V

(a, v) ↦ av.

7. a(v + w) = av + aw for all a ∈ k, v, w ∈ V .

8. (a + b)v = av + bv for all a, b ∈ k, v ∈ V .

9. a(bv) = (ab)v for all a, b ∈ k, v ∈ V .

10. 1v = v for all v ∈ V , where 1 is the multiplicative identity of k.

Axioms (1)-(5) state that V is an abelian group under the operation of addition. The following result gives some simple consequences of axioms (1)-(5).

Proposition 1.2. The following statements hold in a vector space V .

1. The element 0 in axiom (4) is uniquely determined.

2. (Cancellation property) If v, y, z ∈ V and y + v = z + v, then y = z.

3. The element w in axiom (5) is uniquely determined.

Proof. 1. Suppose 0₁ and 0₂ are elements in V that satisfy axiom (4). Then axioms (4) and (2) give 0₁ = 0₁ + 0₂ = 0₂ + 0₁ = 0₂.

2. Assume v + w = 0. Then y = y + 0 = y + (v + w) = (y + v) + w = (z + v) + w = z + (v + w) = z + 0 = z.

3. If v + w1 = 0 = v + w2, then w1 = w2 by (2).

The notation “−v” will stand for the unique element w in axiom (5). Thus, v + (−v) = 0. The notation “v + (−w)” is usually shortened to “v − w”.


Example 1. This is the most basic and most important example of a vector space. Let n ≥ 1, n an integer. Let

k(n) = {(a1, . . . , an) | ai ∈ k},

the set of all n-tuples of elements of k. Two elements (a1, . . . , an) and (b1, . . . , bn) are equal if and only if a1 = b1, . . . , an = bn. Define addition and scalar multiplication by

(a1, . . . , an) + (b1, . . . , bn) = (a1 + b1, . . . , an + bn)

and

c(a1, . . . , an) = (ca1, . . . , can).

Then k(n) is a vector space under these two operations. The zero element of k(n) is (0, . . . , 0) and −(a1, . . . , an) = (−a1, . . . , −an). Check that axioms (1)-(10) are valid and conclude that k(n) is a vector space over k (Exercise 1).
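For readers who like to experiment, here is a short Python sketch, added as an illustration only, that models elements of k(n) as tuples with k = Q (via the standard fractions module) and implements the two operations of Example 1; the specific vectors are made up, and checking a few axioms this way is a reasonable warm-up for Exercise 1.

    from fractions import Fraction as F

    def add(v, w):
        # coordinatewise addition in k^(n), k = Q
        return tuple(a + b for a, b in zip(v, w))

    def scale(c, v):
        # scalar multiplication c(a1, ..., an) = (c*a1, ..., c*an)
        return tuple(c * a for a in v)

    v = (F(1), F(2), F(-3))
    w = (F(4), F(0), F(1, 2))
    zero = (F(0),) * 3

    assert add(v, w) == add(w, v)                    # axiom (2)
    assert add(v, zero) == v                         # axiom (4)
    assert add(v, scale(F(-1), v)) == zero           # axiom (5) with w = -v
    assert scale(F(2), add(v, w)) == add(scale(F(2), v), scale(F(2), w))  # axiom (7)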

We will show in Proposition 2.8 that every finitely generated vector spaceover k is isomorphic to k(n) for some n. This is why Example 1 is so important.

Proposition 1.3. The following computations are valid in a vector space Vover k.

1. 0v = 0 for all v ∈ V . (Note that the first “0” lies in k and the second“0” lies in V .)

2. a0 = 0 for all a ∈ k.

3. a(−v) = −(av) for all a ∈ k, v ∈ V .

4. −v = (−1)v, for all v ∈ V .

5. If a ∈ k, a ≠ 0, v ∈ V , and av = 0, then v = 0.

6. If a1v1 + · · · + anvn = 0 and a1 ≠ 0, then v1 = (−a2/a1)v2 + · · · + (−an/a1)vn.

Proof. 1. 0v + 0 = 0v = (0 + 0)v = 0v + 0v, so 0 = 0v by the cancellation property (Proposition 1.2(2)).

2. a0 + 0 = a0 = a(0 + 0) = a0 + a0, so 0 = a0 by the cancellation property.


3. a(−v) + av = a(−v + v) = a0 = 0, by (2). Since −(av) + av = 0 by axiom (5), we have a(−v) = −(av) by Proposition 1.2(3).

4. (−1)v + v = (−1)v + 1v = (−1 + 1)v = 0v = 0, and so (−1)v = −v by Proposition 1.2(3).

5. v = 1v = ((1/a) · a)v = (1/a)(av) = (1/a)(0) = 0, by (2).

6. a1v1 = −(a2v2 + · · · + anvn) = (−1)(a2v2 + · · · + anvn) = −a2v2 − · · · − anvn, which gives

v1 = 1 · v1 = ((1/a1) · a1)v1 = (1/a1)(a1v1) = (1/a1)(−a2v2 − · · · − anvn) = (−a2/a1)v2 + · · · + (−an/a1)vn.

Definition 1.4. A subset W of a vector space V is a subspace of V if W ⊆ V and W is a vector space over k with respect to the operations of V . W is a proper subspace of V if W is a subspace of V and W ⊊ V .

Example 2. (0) and V are subspaces of V .

We will use the following convention. The subspace of V consisting of only the vector 0 will be written (0). A set consisting of only the zero vector will be written {0}. Although the difference is small, it is useful to distinguish the two notions.

Proposition 1.5. Let W be a nonempty subset of a vector space V over k.Then W is a subspace of V if and only if the following two statements hold.

1. If v, w ∈ W , then v + w ∈ W .

2. If a ∈ k, v ∈ W , then av ∈ W .

Proof. If W is a subspace of V , then (1) and (2) hold because of axioms (1) and (6). Now suppose that (1) and (2) hold. Let v ∈ W (W is nonempty). Then 0 = 0v ∈ W and −v = (−1)v ∈ W by (2). Now it is easy to check that axioms (1)-(10) hold in W . (Most of the axioms are obvious since they are already valid in V .)

Definition 1.6. Let S be a nonempty subset of V . A linear combination of elements of S is an expression ∑_{v∈S} avv, av ∈ k, where only finitely many of the av's are nonzero. (It is often convenient to think of a linear combination as a finite sum ∑_{i=1}^{n} aivi where v1, . . . , vn are distinct elements of S.) A nontrivial linear combination of elements of S is a linear combination as above in which at least one of the coefficients av is nonzero.


Let W be the set of all linear combinations of elements of S. Then W is a subspace of V by Proposition 1.5, since if ∑_{v∈S} avv, ∑_{v∈S} bvv ∈ W and c ∈ k, then

∑_{v∈S} avv + ∑_{v∈S} bvv = ∑_{v∈S} (av + bv)v ∈ W

and

c ∑_{v∈S} avv = ∑_{v∈S} (cav)v ∈ W.

The set W is called the subspace generated by S, or spanned by S, and we write W = 〈S〉.

Definition 1.7.

1. A vector space V is finitely generated if V = 〈S〉 for some finite subsetS of V .

2. A subset S of V is linearly independent over k if every equation ∑_{i=1}^{n} aivi = 0, where ai ∈ k and v1, . . . , vn are distinct elements of S, implies that ai = 0, 1 ≤ i ≤ n. In this situation, one also says that the elements of S are linearly independent over k.

3. A subset S of V is linearly dependent over k if S is not linearly independent over k. That is, there exist distinct elements v1, . . . , vn in S, elements a1, . . . , an ∈ k, and an equation ∑_{i=1}^{n} aivi = 0 where some ai ≠ 0.

4. A subset S of V is a basis of V if S is a linearly independent subset ofV and V = 〈S〉.

The following convention is used. If S = ∅, the empty set, then 〈S〉 = (0). The empty set is a linearly independent set.

Example 3. Let k(n) be the vector space defined in Example 1. Let e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1). Thus ei is the n-tuple with zero in each coordinate except for a one in the ith coordinate. Then e1, . . . , en is a basis of k(n) (Exercise 2). This basis is called the standard basis of k(n).

Here are two more examples of vector spaces over a field k that are usefulto remember.


Example 4. Let k∞ denote the set of all sequences (a1, a2, a3, . . .) with each ai ∈ k. Then k∞ is a vector space where addition is performed coordinatewise and c(a1, a2, a3, . . .) = (ca1, ca2, ca3, . . .).

Example 5. Let k(∞) denote the set of all sequences (a1, a2, a3, . . .) with each ai ∈ k such that only finitely many of the ai's are nonzero. Then k(∞) is a subspace of k∞.

Here is a basis of k(∞) that is easy to describe. Let ei = (0, . . . , 0, 1, 0, . . .), where the ith coordinate is 1 and every other coordinate is zero. Then ε∞ = {e1, e2, e3, . . .} is a basis of k(∞) called the standard basis of k(∞). (The vector space k∞ also has a basis, but it is not so easy to describe.)

Proposition 1.8. The following statements are equivalent for a nonemptyset S.

1. S is a basis of V .

2. Every element of V can be written uniquely as a linear combination ∑_{v∈S} avv, av ∈ k.

Proof. (1) ⇒ (2). Since V = 〈S〉, each element v ∈ V can be written as a linear combination as in (2), and the uniqueness follows from Exercise 10.

(2) ⇒ (1). The hypothesis in (2) implies that V = 〈S〉. Suppose that ∑_{v∈S} avv = 0. Since 0 = ∑_{v∈S} 0v, the uniqueness part of (2) implies av = 0 for all v ∈ S. This shows S is a linearly independent subset and so S is a basis of V .

Theorem 1.9. Let V be a vector space over k and let V = 〈T 〉 for some nonempty finite subset T of V . Let S be a subset of T that is linearly independent. Then there exists a basis B of V such that S ⊆ B ⊆ T .

Proof. Since T is finite, there is a maximal subset B of T containing S that is linearly independent over k. Let W = 〈B〉. If W ≠ V , then there exists some x ∈ T such that x ∉ W (since if T ⊆ W , then V = 〈T 〉 ⊆ W ). Then B ⊊ B ∪ {x} and so the maximality of B implies that B ∪ {x} is a linearly dependent set. It follows that x ∈ 〈B〉 = W , a contradiction. (See Exercise 13.) Therefore, W = V and B is a basis of V .

Theorem 1.9 also holds if T is an infinite set. The proof is similar but requires the use of Zorn's Lemma.
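When T is finite the proof above is essentially an algorithm: scan the vectors of T and keep a vector exactly when it is not in the span of the vectors kept so far. The following Python sketch is an added illustration of that procedure (it is not part of the original notes); it uses numpy's matrix_rank as a stand-in for the independence test of Exercise 12, and the vectors are invented for the example.

    import numpy as np

    def extend_to_basis(S, T):
        # Given independent vectors S contained in a spanning set T,
        # return a basis B of span(T) with S <= B <= T (Theorem 1.9).
        B = list(S)
        for x in T:
            candidate = B + [x]
            # keep x only if it is independent of the vectors chosen so far
            if np.linalg.matrix_rank(np.array(candidate, dtype=float)) == len(candidate):
                B = candidate
        return B

    T = [(1, 0, 1), (2, 0, 2), (0, 1, 1), (1, 1, 2), (0, 0, 1)]
    S = [(1, 0, 1)]
    print(extend_to_basis(S, T))   # keeps the 1st, 3rd, and 5th vectors of T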


Corollary 1.10. Let V be a finitely generated vector space.

1. V has a basis.

2. Every finite linearly independent subset of V can be extended to a basisof V .

3. Every finite generating set of V contains a basis of V .

Proof. Suppose that V = 〈T 〉 for some finite set T . Let S equal the empty set and apply Theorem 1.9 to conclude that there exists a basis B of V such that S ⊆ B ⊆ T . This proves (1) and (3).

To prove (2), let S be a finite linearly independent subset of V . Then S ∪ T is a finite set, S ⊆ S ∪ T , and V = 〈S ∪ T 〉. Theorem 1.9 implies that V has a basis B such that S ⊆ B ⊆ S ∪ T . Thus S has been extended to a basis of V . This proves (2).

Corollary 1.10 also holds for vector spaces that are not finitely generated.The proof uses the general case of Theorem 1.9.

Lemma 1.11 (Replacement Lemma). Suppose v1, . . . , vm is a basis of V . Let w ∈ V , c1, . . . , cm ∈ k, and suppose that w = c1v1 + · · · + cmvm with c1 ≠ 0. Then w, v2, . . . , vm is a basis of V .

Proof. Since v1 = c1^{-1}(w − c2v2 − · · · − cmvm) ∈ 〈w, v2, . . . , vm〉, it follows that 〈w, v2, . . . , vm〉 = V . (See Exercise 11.)

Now suppose that a1w + a2v2 + · · · + amvm = 0, with each ai ∈ k. Then a1c1v1 + ∑_{i=2}^{m} (a1ci + ai)vi = 0. Since v1, . . . , vm is a linearly independent set, we have that a1c1 = 0. Thus a1 = 0 because c1 ≠ 0. This gives a2v2 + · · · + amvm = 0. We conclude from the linear independence of v2, . . . , vm that ai = 0 for 2 ≤ i ≤ m. Therefore w, v2, . . . , vm is a linearly independent set, and so w, v2, . . . , vm is a basis of V .

Theorem 1.12. Any two bases of a vector space V have the same cardinality.

Proof. We shall assume that V is finitely generated. The case when V is not finitely generated will not be needed.

Since V is finitely generated, V contains a finite basis v1, . . . , vm by Theorem 1.9. Thus, V = 〈v1, . . . , vm〉. Let w1, . . . , wn be any elements of V that are linearly independent over k. We will show that n ≤ m. Assuming this has been done, we may conclude that any other basis of V has at most


m elements. If a second basis of V has l elements, l ≤ m, then the symmetry of this argument lets us conclude that m ≤ l also. Thus, any basis of V has exactly m elements.

It remains to show that n ≤ m. Suppose that n > m. Since V = 〈v1, . . . , vm〉, we have w1 = c1v1 + · · · + cmvm, ci ∈ k. As w1 ≠ 0 (see Exercise 9), we may relabel the vi's to assume that c1 ≠ 0. Then w1, v2, . . . , vm is a basis of V by Lemma 1.11. Suppose by induction, and after relabeling, that w1, . . . , wr, vr+1, . . . , vm is a basis of V , 1 ≤ r < m. Then

wr+1 = c1w1 + · · · + crwr + cr+1vr+1 + · · · + cmvm, ci ∈ k.

Now cr+1, . . . , cm cannot all equal zero since w1, . . . , wr+1 are linearly independent over k. Thus, we may relabel to assume that cr+1 ≠ 0. Lemma 1.11 implies that w1, . . . , wr+1, vr+2, . . . , vm is a basis of V . Since n > m, we may continue this process and conclude that w1, . . . , wm is a basis of V . Then w1, . . . , wm+1 would be a linearly dependent set, since wm+1 would be contained in V = 〈w1, . . . , wm〉. This is impossible since w1, . . . , wn is a linearly independent set. Therefore n ≤ m and the proof is complete.

Definition 1.13. The dimension of a finitely generated vector space V over k is the number m of elements in any, and hence all, bases of V . This is written dim_k V = m.

If V = (0), then dim_k V = 0 since the cardinality of the empty set equals zero. If there is no confusion, we will usually write dim V instead of dim_k V . It follows from Exercise 2 that dim(k(n)) = n.

Proposition 1.14. Let S = {v1, . . . , vn} and assume that dim V = n. The following statements are equivalent.

1. S is a basis of V .

2. S is a linearly independent set.

3. V = 〈S〉. That is, V is spanned by S.

Proof. (1) ⇒ (2) is obvious.

(2) ⇒ (3). Extend S to a basis T of V using Corollary 1.10. Since |T | = n by Theorem 1.12, it follows that S = T and V = 〈S〉.

(3) ⇒ (1). Use Corollary 1.10 to choose R ⊆ S such that R is a basis of V . Again, |R| = n by Theorem 1.12, so R = S and it follows that S is a basis of V .


The intersection of two subspaces of V is another subspace of V . (See Exercise 4.) The sum of two subspaces of a vector space V is defined in Exercise 6 and is shown there to be a subspace of V . The next result gives information on the dimensions of these subspaces in terms of the original two subspaces.

Proposition 1.15. Let U,W be subspaces of a finitely generated vector spaceV . Then

dimU + dimW = dim(U +W ) + dim(U ∩W ).

Proof. Let R = {y1, . . . , yl} be a basis of U ∩ W . Use Corollary 1.10 to choose bases S = {y1, . . . , yl, ul+1, . . . , um} of U and T = {y1, . . . , yl, wl+1, . . . , wn} of W . Our goal is to show that

S ∪ T = {y1, . . . , yl, ul+1, . . . , um, wl+1, . . . , wn}

is a basis of U + W . Assuming that this has been shown, then

dim(U + W ) + dim(U ∩ W ) = [l + (m − l) + (n − l)] + l = m + n = dim U + dim W.

Let v ∈ U + W . Then v = v1 + v2 where v1 ∈ U and v2 ∈ W . Since v1 ∈ 〈S〉 and v2 ∈ 〈T 〉, it follows that v ∈ 〈S ∪ T 〉 and so U + W = 〈S ∪ T 〉.

Suppose that ∑_{i=1}^{l} aiyi + ∑_{i=l+1}^{m} biui + ∑_{i=l+1}^{n} ciwi = 0 where ai, bi, ci ∈ k. Then

∑_{i=l+1}^{n} ciwi = −∑_{i=1}^{l} aiyi − ∑_{i=l+1}^{m} biui ∈ U ∩ W.

Thus, ∑_{i=l+1}^{n} ciwi = ∑_{i=1}^{l} diyi, since R is a basis of U ∩ W . Since T is a basis of W , it follows that ci = 0, l + 1 ≤ i ≤ n, and di = 0, 1 ≤ i ≤ l. Now we have ∑_{i=1}^{l} aiyi + ∑_{i=l+1}^{m} biui = 0. Since S is a basis of U , it follows that ai = 0, 1 ≤ i ≤ l, and bi = 0, l + 1 ≤ i ≤ m. This shows S ∪ T is a linearly independent set and so S ∪ T is a basis of U + W .
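Proposition 1.15 is easy to test on examples. The following sympy sketch is an added numerical check (not part of the notes): dim(U + W ) is the rank of the juxtaposed spanning vectors, and when the chosen spanning vectors of U and of W are each independent, dim(U ∩ W ) equals the nullity of the block matrix [Bu | −Bw]. The specific subspaces of Q^4 are made up for the example.

    from sympy import Matrix

    # columns of Bu span U, columns of Bw span W (subspaces of Q^4)
    Bu = Matrix([[1, 0], [0, 1], [1, 1], [0, 0]])     # dim U = 2
    Bw = Matrix([[1, 0], [0, 0], [1, 0], [0, 1]])     # dim W = 2

    dim_U = Bu.rank()
    dim_W = Bw.rank()
    dim_sum = Bu.row_join(Bw).rank()                  # dim(U + W)
    # pairs (a, b) with Bu*a = Bw*b form the nullspace of [Bu | -Bw]; since the
    # columns of Bu (and of Bw) are independent, a -> Bu*a identifies this
    # nullspace with U ∩ W, so its dimension equals dim(U ∩ W)
    dim_int = len(Bu.row_join(-Bw).nullspace())

    assert dim_U + dim_W == dim_sum + dim_int         # Proposition 1.15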

The next result will be used in the definition of a direct sum of vector spaces.

Proposition 1.16. Let W1, . . . ,Wm be subspaces of V . The following statements are equivalent.

1. V = W1 + · · ·+Wm and

(W1 + · · ·+Wi−1 +Wi+1 + · · ·+Wm) ∩Wi = (0), 1 ≤ i ≤ m.


2. Each v ∈ V has a unique expression v = w1 + · · ·+wm where wi ∈ Wi,1 ≤ i ≤ m.

Proof. (1) ⇒ (2). Statement (1) implies that each v ∈ V has an expression as in (2). Suppose that w1 + · · · + wm = y1 + · · · + ym, where wi, yi ∈ Wi. Then

(w1 − y1) + · · · + (wi−1 − yi−1) + (wi+1 − yi+1) + · · · + (wm − ym) = yi − wi ∈ (W1 + · · · + Wi−1 + Wi+1 + · · · + Wm) ∩ Wi = (0), 1 ≤ i ≤ m.

Therefore, yi − wi = 0, so wi = yi, 1 ≤ i ≤ m.

(2) ⇒ (1). Let v ∈ (W1 + · · · + Wi−1 + Wi+1 + · · · + Wm) ∩ Wi. Then we have v = w1 + · · · + wi−1 + wi+1 + · · · + wm = wi, where wj ∈ Wj , 1 ≤ j ≤ m. The uniqueness part of (2) implies that each wj = 0, so it follows that v = 0. The remaining part of (1) is obvious from (2).

Definition 1.17. If W1, . . . ,Wm are subspaces of a vector space V that satisfy the statements of Proposition 1.16, then we say that V is the (internal) direct sum of W1, . . . ,Wm and write V = W1 ⊕ · · · ⊕ Wm.

There is a slightly different notion of direct sum that is given in Exercise 7. These are distinguished by referring to the direct sum in Definition 1.17 as the internal direct sum and the direct sum in Exercise 7 as the external direct sum. The distinction is not large since we will show in Chapter 2 that the two direct sums are essentially the same.

Notes to Section 1.1

We have proved that all finitely generated vector spaces V have a basis, and that all such bases have the same cardinality. There remains the question of finding or describing all bases of V . In addition, we need computational methods to find bases of vector spaces and/or calculate the dimensions of vector spaces. The following is a specific problem that is important to solve. Let vi = (ai1, ai2, . . . , ain) ∈ k(n), 1 ≤ i ≤ m. Let W = 〈v1, . . . , vm〉, the subspace of k(n) spanned by v1, . . . , vm. What is dim W and how does one find a basis of W ? Some methods will be described in Chapter 3.

There are other approaches to this material. See, for example, the text by Hoffman and Kunze.

An exposition of Zorn's Lemma can be found in Hungerford's Algebra book.


Exercises

1. Check that k(n), as defined in Example 1, satisfies the ten axioms of avector space.

2. Check that e1, . . . , en defined in Example 3 is a basis of k(n).

3. Check the details of Examples 4 and 5.

4. Let W1, . . . ,Wn be subspaces of a vector space V . Then W1 ∩ · · · ∩ Wn is also a subspace of V .

5. Suppose S is a subset of a vector space V and W = 〈S〉. Then W is the smallest subspace of V that contains S, and W is the intersection of all subspaces of V that contain S.

6. Let W1, . . . ,Wn be subspaces of a vector space V . Define W1 + · · · + Wn to be {w1 + · · · + wn | wi ∈ Wi, 1 ≤ i ≤ n}. Then W1 + · · · + Wn is also a subspace of V . Verify that it is the smallest subspace of V that contains W1, . . . ,Wn. That is, W1 + · · · + Wn = 〈W1 ∪ · · · ∪ Wn〉.

7. Let V1, . . . , Vn be vector spaces over k. Define V1 ⊕ · · · ⊕ Vn to be {(v1, . . . , vn) | vi ∈ Vi, 1 ≤ i ≤ n}. Define addition and scalar multiplication on this set according to the following rules.

(v1, . . . , vn) + (y1, . . . , yn) = ((v1 + y1), . . . , (vn + yn))

c(v1, . . . , vn) = (cv1, . . . , cvn)

Then V1 ⊕ · · · ⊕ Vn is a vector space over k. This vector space is called the (external) direct sum of V1, . . . , Vn.

8. (a) Let W1, W2 be two subspaces of a vector space V . Suppose V = W1 ∪ W2. Then either V = W1 or V = W2. That is, V is not the union of two proper subspaces.

(b) (harder) If k is an infinite field, show that V is not a finite union of proper subspaces. What can be said if k is a finite field?

9. Every element of a linearly independent subset is nonzero. The set containing only the zero vector is a linearly dependent set.


10. If S is a linearly independent subset of V and ∑_{v∈S} avv = ∑_{v∈S} bvv, av, bv ∈ k, then av = bv for all v ∈ S.

11. Let v1, . . . , vn be a generating set of V and let w1, . . . , wm ∈ V . If v1, . . . , vn ∈ 〈w1, . . . , wm〉, then V = 〈w1, . . . , wm〉.

12. Let S be a linearly independent subset of a vector space V and let x ∈ V . Then S ∪ {x} is a linearly independent subset if and only if x ∉ 〈S〉.

13. Let V be a finitely generated vector space and let W be a subspace ofV . Then

(a) W is finitely generated.

(b) dimW ≤ dimV .

(c) If dimW = dimV , then W = V .

14. A subset S of V containing only nonzero vectors is linearly dependent if and only if some vector v in S can be expressed as a nontrivial linear combination of vectors in S distinct from v. This statement is false if the phrase “some vector v” is replaced by “each vector v”. The statement is also false if S is allowed to contain the zero vector.

15. Let S ⊆ T be subsets of a vector space V . If T is a linearly independentset, then so is S. If S is a linearly dependent set, then so is T .

16. Assume dim V = n. Any set S ⊆ V that contains more than n elements must be a linearly dependent set. Any set S ⊆ V that contains fewer than n elements does not generate V .

17. Let W1, . . . ,Wn be subspaces of V . Suppose that V = W1 ⊕ · · · ⊕ Wn (as in Definition 1.17) and that V is finitely generated. Then dim V = ∑_{i=1}^{n} dim Wi. In fact, if Si is a basis for Wi, then ⋃ Si is a basis of V .

18. Let V1, . . . , Vn be finitely generated vector spaces over k. Then

dim(V1 ⊕ · · · ⊕ Vn) = ∑_{i=1}^{n} dim(Vi).

Describe a basis of V1 ⊕ · · · ⊕ Vn in terms of bases of V1, . . . , Vn. (Note that this is the external direct sum as defined in Exercise 7.)


19. Let W be a subspace of V , with dim V finite. Then there exists a subspace Y of V such that V = W ⊕ Y .

20. Show that {(2, 3), (3, 4)} is a basis of R(2).

21. Let a, b, c, d ∈ R. Show that {(a, b), (c, d)} is a basis of R(2) if and only if ad − bc ≠ 0.

1.2 Systems of Linear Equations

We introduce in this section the main subspaces associated with a system of linear equations. Studying systems of linear equations gives a lot of motivation for most of the concepts introduced in later chapters. Many of the topics introduced here will be introduced in more detail later in this text.

Consider the following system of m linear equations in n variables withcoefficients in a field k.

a11x1 + a12x2 + · · · + a1nxn = c1
a21x1 + a22x2 + · · · + a2nxn = c2
⋮
am1x1 + am2x2 + · · · + amnxn = cm

We have aij , ci ∈ k for all i and j.

A system of linear equations is called a system of homogeneous linear equations if c1 = · · · = cm = 0.

The matrix of coefficients of a system of linear equations is the m × n matrix A given by

A =
[ a11  a12  · · ·  a1n ]
[ a21  a22  · · ·  a2n ]
[  ⋮     ⋮            ⋮  ]
[ am1  am2  · · ·  amn ]

We let Mm×n(k) denote the set of all m× n matrices with entries in k.

The rows of A are the m vectors

(a11, a12, . . . , a1n)
(a21, a22, . . . , a2n)
⋮
(am1, am2, . . . , amn)

in k(n).

The columns of A are the n vectors

(a11, a21, . . . , am1), (a12, a22, . . . , am2), . . . , (a1n, a2n, . . . , amn)

in k(m).

We will write such vectors vertically or horizontally, whichever is more con-venient.

The null set of A is the set of vectors in k(n) that are solutions to thehomogeneous system of linear equations

a11x1 + a12x2 + · · ·+ a1nxn = 0

a21x1 + a22x2 + · · ·+ a2nxn = 0

...

am1x1 + am2x2 + · · ·+ amnxn = 0.

It is straightforward to verify that the null set of A is a subspace of k(n).(See Exercise 1.)

Definition 1.18. Let A ∈Mm×n(k).

1. The row space of A, denoted R(A), is the subspace of k(n) spanned bythe m rows of A.

2. The column space of A, denoted C(A), is the subspace of k(m) spannedby the n columns of A.

3. The null space of A, denoted N (A), is the subspace of vectors in k(n) that are solutions to the above homogeneous system of linear equations.
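These three subspaces are easy to compute for a concrete matrix. The sketch below is an added illustration using sympy (exact arithmetic over Q, with a made-up matrix); columnspace and nullspace are sympy's own methods, and a basis of the row space is read off from the nonzero rows of the reduced row echelon form.

    from sympy import Matrix

    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 1],
                [1, 3, 1, 2]])        # A in M_{3x4}(Q); row 3 = row 1 + row 2

    rref, pivots = A.rref()
    row_basis = [rref.row(i) for i in range(len(pivots))]   # basis of R(A)
    col_basis = A.columnspace()                             # basis of C(A)
    null_basis = A.nullspace()                              # basis of N(A)

    print(len(row_basis), len(col_basis), len(null_basis))  # 2 2 2
    # every null space vector is sent to 0 by A, i.e. it is orthogonal to each row
    assert all(all(x == 0 for x in A * v) for v in null_basis)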


The product of a matrix A ∈ Mm×n(k) and a vector b ∈ k(n) is a particular vector c ∈ k(m) defined below. We follow the traditional notation and write b and c as column vectors. Thus we are defining Ab = c and writing the following expression.

[ a11  a12  · · ·  a1n ]  [ b1 ]   [ c1 ]
[ a21  a22  · · ·  a2n ]  [ b2 ] = [  ⋮ ]
[  ⋮     ⋮            ⋮  ]  [  ⋮ ]   [ cm ]
[ am1  am2  · · ·  amn ]  [ bn ]

The vector c is defined by setting

ci = ∑_{j=1}^{n} aij bj , for 1 ≤ i ≤ m.

We will see in Chapter 3 a conceptual reason for this seemingly arbitrary definition.
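To make the definition concrete, here is a small added Python sketch (the matrix and vector are invented for the example) that computes Ab directly from the formula ci = ∑_{j} aij bj and checks the result against numpy's built-in product; it only illustrates the definition above.

    import numpy as np

    A = np.array([[1, 2, 0],
                  [3, -1, 4]])          # A in M_{2x3}(k)
    b = np.array([2, 1, -1])            # b in k^(3)

    # c_i = sum_j a_ij * b_j  (the defining formula)
    c = np.array([sum(A[i, j] * b[j] for j in range(A.shape[1]))
                  for i in range(A.shape[0])])

    print(c)                            # [4 1]
    assert np.array_equal(c, A @ b)     # agrees with the usual matrix product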

It is convenient to introduce additional notation to deal with sums of this type. Given two vectors v, w ∈ k(n), we define the dot product of v and w as follows. Let v = (a1, a2, . . . , an) and w = (b1, b2, . . . , bn). The dot product of v and w is defined by

v · w = ∑_{j=1}^{n} ajbj .

The definition of Ab above can be expressed in terms of dot products of vectors. Let v1, . . . , vm denote the m row vectors of A in k(n). Then Ab = c where

c =
[ v1 · b ]
[ v2 · b ]
[    ⋮   ]
[ vm · b ]

There is a second way to express Ab = c. Let w1, . . . , wn denote the n column vectors of A in k(m). We continue to write b = (b1, b2, . . . , bn) and c = (c1, . . . , cm) as column vectors.


Then it is straightforward to verify that Ab = c where

c = b1w1 + b2w2 + · · · + bnwn.

(See Exercise 9.)

The dot product satisfies the following properties. Let v, w, y ∈ k(n) be arbitrary vectors and let a, b ∈ k.

1. v · w = w · v

2. v · (w + y) = v · w + v · y

3. v · aw = a(v · w)

Thus v · (aw + by) = a(v · w) + b(v · y). (See Exercise 2.)

Motivated by results on dot products of vectors in vector spaces over the real numbers, we say that two vectors v, w ∈ k(n) are orthogonal if v · w = 0. We write v ⊥ w to denote that v · w = 0.

Let W be an arbitrary subspace of k(n). We define the orthogonal complement of W , denoted W⊥, by

W⊥ = {v ∈ k(n) | v · w = 0 for all w ∈ W }.

Using the properties of dot products, it is straightforward to verify that W⊥ is a subspace of k(n). (See Exercise 3.)

Putting together all of our definitions and notation, it follows that

N (A) = (R(A))⊥.

A system of linear equations can be expressed by the single matrix equation Ax = c, where x = (x1, x2, . . . , xn) is written as a column vector.

For a matrix A ∈ Mm×n(k), define the function

LA : k(n) → k(m) by LA(b) = Ab.

The function LA is an example of a linear transformation. Linear transformations will be studied in detail in Chapter 2. In particular, for all v, v1, v2 ∈ k(n) and a ∈ k, the following properties hold.

19

Page 21: LinAlg Notes - Leep

1. LA(v1 + v2) = LA(v1) + LA(v2)

2. LA(av) = aLA(v)

We see that im(LA) = {Ab | b ∈ k(n)} = C(A). Note that the system of linear equations Ax = c has a solution if and only if c ∈ C(A).

Suppose we know that b is a solution of the matrix equation Ax = c, and so LA(b) = c. Then all solutions of the equation Ax = c are given by

b + N (A) = {b + y | y ∈ N (A)}.

(See Exercise 4.) For the special case that c = 0, we may certainly take b = 0, and then this result simply reduces to the original definition of N (A).

Thus the existence of solutions of a system of linear equations and the characterization of all such solutions can be described in terms of the function LA and the subspaces N (A) and C(A) = im(LA).

Theorem 1.19. Using the notations from above, we have

dim(N (A)) + dim(C(A)) = n.

Proof. Let w1, . . . , ws be a basis of N (A). As N (A) ⊆ k(n), we can extend w1, . . . , ws to a basis w1, . . . , ws, v1, . . . , vn−s of k(n). We have Awi = 0, 1 ≤ i ≤ s.

We will now show that Av1, . . . , Avn−s is a basis of C(A). This will let us conclude that dim(C(A)) = n − s, which will finish the proof. Since w1, . . . , ws, v1, . . . , vn−s is a basis of k(n), we have that C(A) = im(LA) is spanned by Aw1, . . . , Aws, Av1, . . . , Avn−s. Since Awi = 0 for 1 ≤ i ≤ s, it follows that C(A) is spanned by Av1, . . . , Avn−s.

To show that Av1, . . . , Avn−s is a linearly independent set, suppose that c1Av1 + · · · + cn−sAvn−s = 0, where each ci ∈ k. Then

A(c1v1 + · · · + cn−svn−s) = 0.

Thus c1v1 + · · · + cn−svn−s ∈ N (A) = Span{w1, . . . , ws}. It follows that c1 = · · · = cn−s = 0 because v1, . . . , vn−s, w1, . . . , ws is a basis of k(n). Thus Av1, . . . , Avn−s is a linearly independent set, and so is also a basis of C(A).


The equation dim(N (A)) + dim(C(A)) = n allows us to recover many of the standard results about systems of linear equations. We need a method to compute a basis of N (A) and a basis of C(A) in order to completely describe the set of solutions to a system of linear equations. This is one of the main applications of matrix methods developed in Chapter 3.

There are three important results about dim(C(A)), dim(R(A)), and dim(N (A)). Theorem 1.19 is the first result. We also have the following two theorems.

Theorem 1.20. For any subspace V ⊆ k(n), we have

dim(V ) + dim(V ⊥) = n.

In particular, since N (A) = R(A)⊥, we have

dim(R(A)) + dim(N (A)) = n.

Theorem 1.21. dim(R(A)) = dim(C(A)).

For any subspace V ⊆ k(n), there exists a matrix A ∈ Mm×n(k) with m ≥ dim(V ) such that V = R(A). (See Exercise 5.) From this it follows that any two of Theorems 1.19, 1.20, and 1.21 easily imply the third. (See Exercise 6.) There are several proofs of Theorems 1.20 and 1.21 in these notes that use different methods. Each of the proofs is trickier than our previous proofs.
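All three theorems are easy to sanity-check on examples. The following added sympy sketch (with a made-up matrix; a check, not a proof) verifies dim N (A) + dim C(A) = n, dim R(A) + dim N (A) = n, and dim R(A) = dim C(A) for one concrete A.

    from sympy import Matrix

    A = Matrix([[1, 0, 2, 1],
                [0, 1, 1, 0],
                [2, 1, 5, 2]])           # 3 x 4; row 3 = 2*row 1 + row 2

    n = A.cols
    dim_col = len(A.columnspace())       # dim C(A)
    dim_row = len(A.T.columnspace())     # dim R(A), via the columns of A^t
    dim_null = len(A.nullspace())        # dim N(A)

    assert dim_null + dim_col == n       # Theorem 1.19
    assert dim_row + dim_null == n       # Theorem 1.20 (with V = R(A))
    assert dim_row == dim_col            # Theorem 1.21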

Exercises

1. For A ∈Mm×n(k), prove that the null set of A is a subspace of k(n).

2. Verify the properties of the dot product of two vectors as given in thetext.

3. For a subspace W ⊆ k(n), verify that W⊥ is a subspace of k(n).

4. If b is a solution of the system Ax = c, prove that all solutions are given by b + N (A) = {b + y | y ∈ N (A)}.

5. For any subspace V ⊆ k(n), prove that for m ≥ dim(V ) there exists amatrix A ∈Mm×n(k) such that V = R(A).


6. Using the previous exercise, prove that any two of Theorems 1.19, 1.20, 1.21 imply the third.

7. For any v ∈ k(n), let 〈v〉 denote the subspace of k(n) generated by v.

(a) For any nonzero a ∈ k, prove that 〈av〉⊥ = 〈v〉⊥.

(b) Prove that dim(〈v〉⊥) = n − 1 if v ≠ 0, and dim(〈v〉⊥) = n if v = 0.

Note that this exercise gives a proof of the special case of Theorem 1.20 when dim(V ) = 0 or 1.

8. Consider a system of homogeneous linear equations where m < n. Use Theorem 1.19 to prove that there exists a nonzero solution to the system.

9. Let e1, . . . , en be the standard basis of k(n).

(a) Show that Aej is the jth column of A.

(b) Suppose that b = b1e1 + · · · + bnen where each bj ∈ k. Show that Ab = ∑_{j=1}^{n} bj (column j of A).

10. Let V,W ⊆ k(n) be subspaces. Prove the following statements.

(a) If W ⊆ V , then V ⊥ ⊆ W⊥.

(b) (V +W )⊥ = V ⊥ ∩W⊥

(c) (V ∩W )⊥ = V ⊥ +W⊥

In (c), the inclusion (V ∩ W )⊥ ⊆ V ⊥ + W⊥ is difficult to prove at this point. One strategy is to prove that (V ∩ W )⊥ ⊇ V ⊥ + W⊥ and to use Theorem 1.20 (which has not yet been proved) to prove that dim((V ∩ W )⊥) = dim(V ⊥ + W⊥). Later we will be able to give a complete proof of this inclusion.

1.3 Appendix

Let V be a vector space defined over a field k. In this appendix, we show that the axiom stating that addition is commutative is actually redundant.


That is, the other axioms for a vector space already imply that addition is commutative.

(The following argument is actually valid for an arbitrary R-module Mdefined over a commutative ring R with a multiplicative identity.)

We assume that V is a group, not necessarily commutative, and we as-sume the following three facts concerning scalar multiplication.

1. r · (~v + ~w) = r · ~v + r · ~w for all ~v, ~w ∈ V and for all r ∈ k.

2. (r + s) · ~v = r · ~v + s · ~v for all v ∈ V and for all r, s ∈ k.

3. 1 · ~v = ~v for all v ∈ V .

On the one hand, using (1) and then (2) we have

(r + s) · (~v + ~w) = (r + s) · ~v + (r + s) · ~w = r · ~v + s · ~v + r · ~w + s · ~w.

On the other hand, using (2) and then (1) we have

(r + s) · (~v + ~w) = r · (~v + ~w) + s · (~v + ~w) = r · ~v + r · ~w + s · ~v + s · ~w.

Since V is a group, we can cancel on both sides to obtain

s · ~v + r · ~w = r · ~w + s · ~v.

We let r = s = 1 and use (3) to conclude that ~v + ~w = ~w + ~v. Therefore,addition in V is commutative.

1.4 Proof of Theorems 1.20 and 1.21

We let Mm×n(k) denote the set of m × n matrices with entries in a field k,and we let e1, . . . , en denote the standard basis of k(n).

Lemma 1.22. Let A ∈ Mn×n(k). Then A is an invertible matrix if andonly if Ae1, . . . , Aen is a basis of k(n).

Proof. Assume that A is an invertible matrix. We show that Ae1, . . . , Aen is a linearly independent spanning set of k(n). Suppose that c1Ae1 + · · · + cnAen = 0, where each ci ∈ k. Then A(c1e1 + · · · + cnen) = 0. Since A is invertible, it follows that c1e1 + · · · + cnen = 0, and so c1 = · · · = cn = 0 because e1, . . . , en is a linearly independent set. Thus Ae1, . . . , Aen


is a linearly independent set in k(n). To show that Ae1, . . . , Aen spans k(n), consider v ∈ k(n). Since e1, . . . , en is a basis of k(n), we may write A^{-1}v = c1e1 + · · · + cnen where each ci ∈ k. Applying A to both sides gives v = A(c1e1 + · · · + cnen) = c1Ae1 + · · · + cnAen. Therefore, Ae1, . . . , Aen is a basis of k(n).

Now assume that Ae1, . . . , Aen is a basis of k(n). There exist bij ∈ k, 1 ≤ i ≤ n and 1 ≤ j ≤ n, such that ∑_{i=1}^{n} bij (Aei) = ej . Let B = (bij)n×n, an n × n matrix. Using the fact that

∑_{i=1}^{n} bij (column i of A) = ∑_{i=1}^{n} bij (Aei) = ej ,

we see that AB = In, where In denotes the n × n identity matrix. Therefore, A is an invertible matrix.

Proposition 1.23. Let v1, . . . , vn be a basis of k(n). Then there is a unique basis w1, . . . , wn of k(n) such that vi · wj = 0 if i ≠ j and vi · wj = 1 if i = j.

Proof. Let e1, . . . , en be the standard basis of k(n). Let A ∈ Mn×n(k) be the (unique) matrix such that Aei = vi, 1 ≤ i ≤ n. (The columns of A are the vectors v1, . . . , vn.) Then A is invertible by Lemma 1.22. Let wi = (A^t)^{-1}ei, 1 ≤ i ≤ n. Then w1, . . . , wn is a basis of k(n) by Lemma 1.22 because (A^t)^{-1} is invertible. We have

vi · wj = vi^t wj = (Aei)^t (A^t)^{-1}ej = ei^t A^t (A^t)^{-1}ej = ei^t ej = ei · ej ,

which equals 0 if i ≠ j and 1 if i = j.

Suppose that w′1, . . . , w′n is another basis of k(n) such that vi · w′j = 0 if i ≠ j and vi · w′j = 1 if i = j. Then vi · (wj − w′j) = vi · wj − vi · w′j = 0 for all i and j. Then wj − w′j ∈ (k(n))⊥ for all j. Since (k(n))⊥ = (0) (see Exercise 1), it follows that wj = w′j for all j. This proves the uniqueness statement.
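The construction in this proof is completely explicit: put the basis vectors v1, . . . , vn into the columns of A and take wj to be the jth column of (A^t)^{-1}. The short added sympy check below (with a made-up basis of Q^3, an illustration only) verifies that vi · wj = 0 for i ≠ j and vi · wi = 1.

    from sympy import Matrix, eye

    # columns of A are the basis v1, v2, v3 of Q^(3)
    A = Matrix([[1, 1, 0],
                [0, 1, 1],
                [1, 0, 1]])
    W = (A.T).inv()           # columns of W are the dual vectors w1, w2, w3

    # v_i . w_j is the (i, j) entry of A^t * W = A^t * (A^t)^{-1} = I
    assert A.T * W == eye(3)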

Lemma 1.24. Suppose that v1, . . . , vn and w1, . . . , wn are bases of k(n) such that vi · wj = 0 if i ≠ j and vi · wj = 1 if i = j. Let V = Span{v1, . . . , vl}, where 1 ≤ l ≤ n. Then V ⊥ = Span{wl+1, . . . , wn}.


Proof. We have Span{wl+1, . . . , wn} ⊆ V ⊥ because if 1 ≤ i ≤ l and l + 1 ≤ j ≤ n, then i ≠ j and so vi · wj = 0.

Take w ∈ V ⊥. We may write w = a1w1 + · · · + anwn. Since vi · w = 0 for 1 ≤ i ≤ l, it follows that ai = 0 for 1 ≤ i ≤ l. Then w = al+1wl+1 + · · · + anwn and thus w ∈ Span{wl+1, . . . , wn}. Therefore V ⊥ = Span{wl+1, . . . , wn}.

The next result recovers Theorem 1.20.

Proposition 1.25. Any subspace V ⊆ k(n) satisfies dim(V ) + dim(V ⊥) = n.

Proof. Extend a basis v1, . . . , vl of V to a basis v1, . . . , vn of k(n). Let w1, . . . , wn be a basis of k(n) such that vi · wj = 0 if i ≠ j and vi · wj = 1 if i = j. Then Lemma 1.24 implies that wl+1, . . . , wn is a basis of V ⊥. Thus dim(V ) + dim(V ⊥) = l + (n − l) = n.

We can now prove Theorem 1.21.

Theorem 1.26. Let A ∈Mm×n(k). Then dim(R(A)) = dim(C(A)).

Proof. Since N (A) = R(A)⊥, Theorems 1.19 and 1.20 imply that

dim(N (A)) + dim(C(A)) = n = dim(R(A)) + dim(R(A)⊥)

= dim(R(A)) + dim(N (A)).

Therefore dim(C(A)) = dim(R(A)).

The next proposition develops a bit further the ideas in the proof ofProposition 1.23 and Lemma 1.24.

Proposition 1.27. Assume that A ∈ Mn×n(k) is an invertible matrix. Let W be a subspace of k(n). Then (A^t)^{-1}W⊥ = (AW )⊥.

Proof. Take v ∈ AW and w ∈ (A^t)^{-1}W⊥. Then v = Av1 where v1 ∈ W and w = (A^t)^{-1}w1 where w1 ∈ W⊥. Then

v · w = v^t w = (v1^t A^t)(A^t)^{-1}w1 = v1^t w1 = v1 · w1 = 0,

because v1 ∈ W and w1 ∈ W⊥. Since v, w are arbitrary, it follows that (A^t)^{-1}W⊥ ⊆ (AW )⊥.


Now take w ∈ (AW )⊥. Let w1 = A^t w. We will show that w1 ∈ W⊥. Let v ∈ W . Then

v · w1 = v^t w1 = v^t A^t w = (Av)^t w = Av · w = 0,

because w ∈ (AW )⊥. Thus w1 ∈ W⊥, and so w = (A^t)^{-1}w1 ∈ (A^t)^{-1}W⊥. Therefore, (AW )⊥ ⊆ (A^t)^{-1}W⊥, and so (A^t)^{-1}W⊥ = (AW )⊥.

Exercises

1. Prove that (k(n))⊥ = (0).

2. Assume that A ∈Mn×n(k) is an invertible matrix. Show for any basisv1, . . . , vn of k(n) that Av1, . . . , Avn is a basis of k(n).

3. Let V ⊆ k(n) be a subspace. Show the following.

(a) Show that V ⊆ (V ⊥)⊥.

(b) Compute dim((V ⊥)⊥) and conclude that V = (V ⊥)⊥.


Chapter 2

Linear Transformations

2.1 Basic Results

The most important functions between vector spaces are called linear transformations. In this chapter we will study the main properties of these functions.

Let V , W be vector spaces over a field k. Let 0V , 0W denote the zero vectors of V , W respectively. We will write 0 instead, if the meaning is clear from context.

Definition 2.1. A function f : V → W is a linear transformation if

1. f(v1 + v2) = f(v1) + f(v2), for all v1, v2 ∈ V and

2. f(av) = af(v), for all a ∈ k, v ∈ V .

For the rest of section 2.1, let f : V → W be a fixed linear transformation.

Lemma 2.2. f(0V ) = 0W and f(−v) = −f(v).

Proof. Using Proposition 1.3, parts (1) and (4), we see that f(0V ) = f(0 · 0V ) = 0f(0V ) = 0W , and f(−v) = f((−1)v) = (−1)f(v) = −f(v).

For any vector space V and any v, y ∈ V , recall from Chapter 1 that v − y means v + (−y). For any linear transformation f : V → W , and v, y ∈ V , Lemma 2.2 lets us write f(v − y) = f(v) − f(y) because

f(v − y) = f(v + (−y)) = f(v) + f(−y) = f(v) + (−f(y)) = f(v)− f(y).

Examples. Here are some examples of linear transformations f : V → W .


1. If f(v) = 0 for all v ∈ V , then f is a linear transformation called thezero transformation.

2. If V = W and f(v) = v for all v ∈ V , then f is a linear transformationcalled the identity transformation. We denote the identity transforma-tion on V by 1V : V → V .

3. If U is a subspace of V , then the inclusion mapping ι : U → V is alinear transformation, where for u ∈ U we have ι(u) = u ∈ V .

4. Let V = W ⊕ Y . Each v ∈ V can be written uniquely as v = w + y where w ∈ W and y ∈ Y . Let f(v) = w. Then f is a linear transformation from V to W called the projection from V to W .

5. Let V = W = k[x] where k[x] denotes the ring of polynomials with coefficients in k. Thus each element of V can be written a0 + a1x + a2x^2 + · · · + amx^m where each ai ∈ k and m ≥ 0. Then V is a vector space over k where addition in V is the usual addition of polynomials and scalar multiplication is the usual multiplication in k[x] of an element in k by a polynomial in k[x]. Let f : V → V where f(a0 + a1x + a2x^2 + · · · + amx^m) = a1 + a2x + a3x^2 + · · · + amx^{m−1}. Let g : V → V where g(a0 + a1x + a2x^2 + · · · + amx^m) = a0x + a1x^2 + a2x^3 + · · · + amx^{m+1}. Then f and g are both linear transformations. (A small computational sketch of f and g appears after this list of examples.)

6. Let V = {f : R → R | f is C∞}. That is, V is the set of infinitely differentiable real-valued functions defined on the set of real numbers. Using standard facts about differentiable functions, one can show that V is a vector space over the field of real numbers. Let d : V → V be defined by d(f) = f ′, where f ′ is the derivative of f . Then d is a linear transformation. Note that the hypothesis of infinite differentiability is needed since a function f might be differentiable at each real number but its derivative f ′ might not be differentiable at each real number. For example, let f(x) = x^2 if x ≥ 0 and f(x) = −x^2 if x < 0. Then f ′(x) = 2|x|, but f ′(x) is not differentiable at x = 0.
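Returning to Example 5, the maps f and g are simply shifts on the list of coefficients. The added Python sketch below (an illustration only) represents a polynomial a0 + a1x + · · · + amx^m by its coefficient list [a0, a1, . . . , am] and makes the failure of injectivity of f, and of surjectivity of g, easy to see.

    def f(p):
        # f(a0 + a1 x + ... + am x^m) = a1 + a2 x + ... + am x^(m-1)
        return p[1:] if len(p) > 1 else [0]

    def g(p):
        # g(a0 + a1 x + ... + am x^m) = a0 x + a1 x^2 + ... + am x^(m+1)
        return [0] + p

    # f is surjective but not injective: two different polynomials, same image
    assert f([5, 1, 2]) == f([7, 1, 2]) == [1, 2]

    # g is injective but not surjective: nothing maps onto the constant polynomial 1,
    # since every polynomial in the image of g has constant term 0
    assert g([1, 2])[0] == 0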

Definition 2.3.

1. The kernel of f , written ker(f), is defined as {v ∈ V | f(v) = 0}.


2. The image of f , written im(f), is defined as {w ∈ W | w = f(v) for some v ∈ V }.

3. f is injective if f(v1) = f(v2) implies v1 = v2, where v1, v2 ∈ V .

4. f is surjective if im(f) = W .

5. f is bijective if f is injective and surjective.

Proposition 2.4.

1. ker(f) is a subspace of V .

2. im(f) is a subspace of W .

3. f is injective if and only if ker(f) = (0).

Proof. (1) If v1, v2 ∈ ker(f) and a ∈ k, then f(v1 + v2) = f(v1) + f(v2) = 0 + 0 = 0, and f(av1) = af(v1) = a · 0 = 0. Since ker(f) is nonempty (0 ∈ ker(f)), the result follows from Proposition 1.5.

(2) Let w1, w2 ∈ im(f) and a ∈ k. Then w1 = f(v1) and w2 = f(v2) for some v1, v2 ∈ V . Then w1 + w2 = f(v1) + f(v2) = f(v1 + v2) ∈ im(f), and aw1 = af(v1) = f(av1) ∈ im(f). Since im(f) is nonempty, the result follows from Proposition 1.5.

(3) Suppose that f is injective and let v ∈ ker(f). Since f(v) = 0 = f(0), it follows that v = 0 and thus ker(f) = (0). Conversely, suppose that ker(f) = (0) and let f(v1) = f(v2). Then f(v1 − v2) = f(v1) + f(−v2) = f(v1) − f(v2) = 0. Thus, v1 − v2 ∈ ker(f) = (0) and so v1 = v2. Therefore f is injective.

For any function f : V → W , if w ∈ W , recall that f^{-1}(w) denotes the inverse image (or preimage) of w in V . That is, f^{-1}(w) is the set of elements x ∈ V such that f(x) = w.

Proposition 2.5. Let f : V → W be a linear transformation and let f(v) = w where v ∈ V and w ∈ W . Then f^{-1}(w) = v + ker(f).

Proof. Let y ∈ ker(f). Then f(v + y) = f(v) + f(y) = f(v) = w. Thus, v + y ∈ f^{-1}(w), and so v + ker(f) ⊆ f^{-1}(w).

Now let u ∈ f^{-1}(w). Then f(u) = w = f(v). This gives f(u − v) = f(u) − f(v) = 0, so u − v ∈ ker(f). Then u = v + (u − v) ∈ v + ker(f), so f^{-1}(w) ⊆ v + ker(f).


Proposition 2.6. Let f be as above and assume that dimV is finite. Then

dimV = dim(ker(f)) + dim(im(f)).

Proof. Let v1, . . . , vn be a basis of V . Then f(v1), . . . , f(vn) spans im(f). (See Exercise 3(a).) It follows from Corollary 1.10(3) that im(f) has a finite basis. Since ker(f) is a subspace of V , Exercise 13 of Chapter 1 implies that ker(f) has a finite basis.

Let w1, . . . , wm be a basis of im(f) and let u1, . . . , ul be a basis of ker(f). Choose yi ∈ V , 1 ≤ i ≤ m, such that f(yi) = wi. We will show that u1, . . . , ul, y1, . . . , ym is a basis of V . This will show that dim V = l + m = dim(ker(f)) + dim(im(f)).

Let v ∈ V and let f(v) = ∑_{j=1}^{m} cjwj where each cj ∈ k. Then v − ∑_{j=1}^{m} cjyj ∈ ker(f) because

f(v − ∑_{j=1}^{m} cjyj) = f(v) − f(∑_{j=1}^{m} cjyj) = f(v) − ∑_{j=1}^{m} cjf(yj) = f(v) − ∑_{j=1}^{m} cjwj = 0.

Thus, v − ∑_{j=1}^{m} cjyj = ∑_{i=1}^{l} aiui where each ai ∈ k, and so v = ∑_{i=1}^{l} aiui + ∑_{j=1}^{m} cjyj. Therefore V is spanned by u1, . . . , ul, y1, . . . , ym.

To show linear independence, suppose that ∑_{i=1}^{l} aiui + ∑_{j=1}^{m} cjyj = 0. Since ui ∈ ker(f), we have

0W = f(0V ) = f(∑_{i=1}^{l} aiui + ∑_{j=1}^{m} cjyj) = ∑_{i=1}^{l} aif(ui) + ∑_{j=1}^{m} cjf(yj) = 0W + ∑_{j=1}^{m} cjf(yj) = ∑_{j=1}^{m} cjwj.

Since w1, . . . , wm is a basis of im(f), we see that cj = 0, 1 ≤ j ≤ m. We now have that ∑_{i=1}^{l} aiui = 0. Then ai = 0, 1 ≤ i ≤ l, because u1, . . . , ul is a basis of ker(f). Therefore, u1, . . . , ul, y1, . . . , ym is a linearly independent set and forms a basis of V .

Definition 2.7. A linear transformation f : V → W is an isomorphism if f is injective and surjective. We say that V is isomorphic to W , written V ≅ W , if there exists an isomorphism f : V → W .


Proposition 2.8. Let dim V = n, n ≥ 1, and let β = {v1, . . . , vn} be a basis of V . Then there exists an isomorphism φβ : V → k(n), where k(n) is defined in Example 1 in Chapter 1.

Proof. Define the function φβ : V → k(n) by φβ(v) = (a1, . . . , an), where v = ∑_{i=1}^{n} aivi. Then φβ is well defined by Proposition 1.8.

To show that φβ is a linear transformation, let v = ∑_{i=1}^{n} aivi and w = ∑_{i=1}^{n} bivi. Since v + w = ∑_{i=1}^{n} (ai + bi)vi, we have

φβ(v + w) = (a1 + b1, . . . , an + bn) = (a1, . . . , an) + (b1, . . . , bn) = φβ(v) + φβ(w).

If c ∈ k, then cv = ∑_{i=1}^{n} caivi and hence

φβ(cv) = (ca1, . . . , can) = c(a1, . . . , an) = cφβ(v).

If v ∈ ker(φβ), then φβ(v) = (0, . . . , 0). Then v = ∑_{i=1}^{n} 0vi = 0 and therefore φβ is injective by Proposition 2.4(3). If (a1, . . . , an) ∈ k(n) is given, then for v = ∑_{i=1}^{n} aivi we have φβ(v) = (a1, . . . , an). Thus φβ is surjective and therefore φβ is an isomorphism.

The isomorphism φβ constructed in Proposition 2.8 depends on the choice of basis β of V . This observation will be used in Chapter 3.
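In k(n) itself, computing φβ(v) amounts to solving a linear system: write the basis β into the columns of a matrix B and solve Bx = v for the coordinate vector x. The added sympy sketch below illustrates this for one made-up basis of Q^3; it is not part of the original text.

    from sympy import Matrix

    # basis beta = {v1, v2, v3} of Q^(3), written as the columns of B
    B = Matrix([[1, 1, 0],
                [0, 1, 1],
                [0, 0, 1]])
    v = Matrix([3, 4, 5])

    x = B.solve(v)            # coordinate vector phi_beta(v) = (4, -1, 5)
    assert B * x == v         # v = 4*v1 - 1*v2 + 5*v3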

Proposition 2.9. Let f : V → W be a linear transformation and assume that dim V = dim W is finite. Then the following statements are equivalent.

1. f is an isomorphism.

2. f is injective.

3. f is surjective.

Proof. We have dim V = dim(ker(f)) + dim(im(f)) by Proposition 2.6. Thus, f is injective ⇐⇒ ker(f) = (0) ⇐⇒ dim(ker(f)) = 0 ⇐⇒ dim(im(f)) = dim V = dim W ⇐⇒ im(f) = W ⇐⇒ f is surjective. (See Exercise 13.) Thus (2) and (3) are equivalent. If (1) is true, then both (2) and (3) are true by the definition of isomorphism. If either (2) or (3) is true then both are true, and then (1) is true. This completes the proof.

Proposition 2.9 fails to hold if dim V is not finite. For example, consider Example 5 above. Using the notation of that example, it is easy to check that f is surjective, but f is not injective. Also, g is injective, but not surjective. (See Exercise 19.)


2.2 The vector space of linear transformations f : V → W

Notation. Let L(V,W ) be the set of linear transformations from V to W . We will define addition and scalar multiplication in L(V,W ) so as to make L(V,W ) a vector space over k with respect to these operations.

Let f, g ∈ L(V,W ). Define f + g to be the function f + g : V → W , where (f + g)(v) = f(v) + g(v), v ∈ V . For a ∈ k, define af to be the function af : V → W , where (af)(v) = af(v).

Proposition 2.10. L(V,W ) is a vector space with respect to the operations defined above.

Proof. Let f, g ∈ L(V,W ), and let v, w ∈ V , a, b ∈ k. Then

(f + g)(v + w) = f(v + w) + g(v + w) = f(v) + f(w) + g(v) + g(w)

= f(v) + g(v) + f(w) + g(w) = (f + g)(v) + (f + g)(w),

and

(f + g)(av) = f(av) + g(av) = af(v) + ag(v) = a(f(v) + g(v))

= a(f + g)(v).

Therefore, f + g ∈ L(V,W ). As for af , we have

(af)(v + w) = af(v + w) = a(f(v) + f(w)) = af(v) + af(w)

= (af)(v) + (af)(w),

and (af)(bv) = af(bv) = abf(v) = baf(v) = b(af)(v).

Therefore, af ∈ L(V,W ).

We have now verified axioms (1) and (6) in the definition of a vector space. The 0 element of L(V,W ) is the zero transformation f where f(v) = 0 for all v ∈ V . If f ∈ L(V,W ), then −f is the element (−1)f . Now it is easy to verify that L(V,W ) satisfies the ten axioms of a vector space over k. (See Exercise 8.)

If g : U → V and f : V → W , then fg (or f ∘ g) denotes the composite function fg : U → W .


Proposition 2.11. Let U, V,W be vector spaces over k. Let g ∈ L(U, V ) and let f ∈ L(V,W ). Then the composite function fg ∈ L(U,W ). (If u ∈ U , (fg)(u) is defined to be f(g(u)).)

Proof. Let u1, u2, u ∈ U and let a ∈ k. Then

(fg)(u1 + u2) = f(g(u1 + u2)) = f(g(u1) + g(u2))

= f(g(u1)) + f(g(u2)) = (fg)(u1) + (fg)(u2),

and (fg)(au) = f(g(au)) = f(ag(u)) = af(g(u)) = a(fg)(u).

Proposition 2.12. Let U, V,W, Y be vector spaces over k. Let f, f1, f2 ∈ L(V,W ), let g, g1, g2 ∈ L(U, V ), let h ∈ L(W,Y ), and let a ∈ k. Then the following properties hold.

1. (f1 + f2)g = f1g + f2g

2. f(g1 + g2) = fg1 + fg2

3. a(fg) = (af)g = f(ag)

4. (hf)g = h(fg)

Proof. See Exercise 11.

Suppose now that U = V = W = Y . Then the composition of linear transformations in L(V, V ) acts like a type of multiplication in L(V, V ). The identity linear transformation 1V has the property that 1V f = f1V = f for all f ∈ L(V, V ). The properties in Proposition 2.12 let us regard L(V, V ) as an associative algebra. That is, L(V, V ) is not only a vector space but also a ring whose multiplication is compatible with scalar multiplication.

Let f ∈ L(V,W ) and assume that f is bijective. Then the inverse function f^{-1} : W → V is defined by f^{-1}(w) = v ⇐⇒ f(v) = w. The function f^{-1} is also bijective.

Proposition 2.13. If f ∈ L(V,W ) and f is bijective, then f^{-1} ∈ L(W,V ). Thus, if f is an isomorphism, then f^{-1} is an isomorphism.

Proof. Let w1, w2 ∈ W , and let f^{-1}(w1) = v1 and f^{-1}(w2) = v2. Then f(v1) = w1 and f(v2) = w2. Since f(v1 + v2) = f(v1) + f(v2) = w1 + w2, it follows that f^{-1}(w1 + w2) = v1 + v2 = f^{-1}(w1) + f^{-1}(w2).

If a ∈ k, then f(av1) = af(v1) = aw1. Thus, f^{-1}(aw1) = av1 = af^{-1}(w1).


2.3 Quotient Spaces

Let V be a vector space over a field k and let W be a subspace of V . There is an important vector space formed from V and W called the quotient space V mod W , written V/W .

We begin the construction by defining the following equivalence relation on V . Let v, w ∈ V . We say that v ≡ w mod W if v − w ∈ W . Then the following three properties are easily verified.

1. v ≡ v mod W for all v ∈ V . (reflexive)

2. If v ≡ w mod W , then w ≡ v mod W . (symmetric)

3. If v ≡ w mod W and w ≡ y mod W , then v ≡ y mod W . (transitive)

To see that the transitive property holds, observe that if v ≡ w mod W and w ≡ y mod W , then v − w ∈ W and w − y ∈ W . Then v − y = (v − w) + (w − y) ∈ W , and so v ≡ y mod W .

These three properties show that ≡ mod W is an equivalence relation.

Recall that v + W denotes the set {v + w | w ∈ W }. Such a set is called a coset of W in V (or more precisely, a left coset of W in V ).

Lemma 2.14. Let v ∈ V . Then {y ∈ V | y ≡ v mod W } = v + W . That is, the equivalence classes of ≡ mod W are the cosets of W in V .

Proof. ⊆: If y ≡ v mod W , then y − v = w ∈ W and so y = v + w ∈ v + W .

⊇: Let y = v + w ∈ v + W , where w ∈ W . Then y ≡ v mod W because y − v = w ∈ W .

Lemma 2.15. Let v, y ∈ V . The following four statements are equivalent.

1. v ≡ y mod W .

2. v − y ∈ W .

3. v +W = y +W .

4. (v + W ) ∩ (y + W ) ≠ ∅.

In particular, two cosets of W are either equal or disjoint.


Proof. The equivalence of statements 1 and 2 comes from the definition of ≡ mod W .

2 ⇒ 3: If v − y ∈ W , then v = y + w where w ∈ W . Then v + W = (y + w) + W ⊆ y + W . Similarly, y + W ⊆ v + W because y = v − w. Thus v + W = y + W .

3 ⇒ 4: This is obvious.

4 ⇒ 2: Let z ∈ (v + W ) ∩ (y + W ). Then z = v + w1 and z = y + w2 where w1, w2 ∈ W . Then v − y = w2 − w1 ∈ W .

The equivalence of statements 3 and 4 implies that two cosets of W are either equal or disjoint.

The equivalence classes of ≡ mod W partition V into disjoint sets. Thus, the cosets of W in V partition V into disjoint sets.
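For a finite field the cosets can be listed exhaustively. The added Python sketch below (an illustration, with k = GF(2) = {0, 1} chosen for convenience) takes V = k(3) and W spanned by (1, 1, 0), and groups the 8 vectors of V into the 4 cosets of W; each coset has |W| = 2 elements and the cosets partition V, as Lemma 2.15 predicts.

    from itertools import product

    # V = GF(2)^3 ; W = span{(1, 1, 0)} = {(0,0,0), (1,1,0)}
    V = list(product((0, 1), repeat=3))
    W = [(0, 0, 0), (1, 1, 0)]

    def coset(v):
        # v + W, with coordinatewise addition mod 2
        return frozenset(tuple((a + b) % 2 for a, b in zip(v, w)) for w in W)

    cosets = {coset(v) for v in V}
    print(len(cosets))                            # 4 = |V| / |W|
    assert sum(len(c) for c in cosets) == len(V)  # the cosets partition V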

Let V/W denote the set of equivalence classes of ≡ mod W (or the set of cosets of W in V ). The set V/W can be given the structure of a vector space over k. We must define addition and scalar multiplication and then check that the ten axioms of a vector space hold.

Addition: (v + W ) + (y + W ) = (v + y) + W , for all v, y ∈ V .

Scalar multiplication: a(v + W ) = (av) + W , for all a ∈ k, v ∈ V .

It is necessary to check that these definitions are well defined. That is, we must show that the two operations do not depend on the choice of v or y to represent the equivalence class. Thus, it must be shown that if v1 + W = v2 + W and y1 + W = y2 + W , then (v1 + y1) + W = (v2 + y2) + W . Similarly, it must be shown that (av1) + W = (av2) + W . (See Exercise 13.)

Proposition 2.16. The set V/W under the two operations defined above is a vector space over k.

Proof. The zero element 0V/W of V/W is 0 + W and −(v + W ) = (−v) + W . Now it is straightforward to check the ten axioms of a vector space. Note that checking the axioms in V/W involves using the corresponding axioms in V . (See Exercise 13.)

Proposition 2.17. The function π : V → V/W defined by π(v) = v + W is a linear transformation with ker(π) = W and im(π) = V/W .

Proof. π(v + y) = (v + y) + W = (v + W ) + (y + W ) = π(v) + π(y), and π(av) = (av) + W = a(v + W ) = aπ(v) for a ∈ k. Thus π is a linear transformation. The image of π is clearly all of V/W since v + W = π(v). For the kernel of π, v ∈ ker(π) ⇐⇒ v + W = 0 + W ⇐⇒ v ∈ W .


Corollary 2.18. Assume that dim V is finite and let W be a subspace of V . Then dim(V/W ) = dim V − dim W .

Proof. Using the notation from Proposition 2.17, we have dim(V/W ) = dim(im(π)) = dim V − dim(ker(π)) = dim V − dim W , by Proposition 2.6.

Theorem 2.19 (First Isomorphism Theorem). Let f : V → Z be a linear transformation of vector spaces over k. Then V/ ker(f) ≅ im(f). There exists a unique isomorphism f̄ : V/ ker(f) → im(f) such that f̄ ∘ π = f , where π : V → V/ ker(f).

Proof. Let Y = im(f) and let W = ker(f). Define f̄ : V/W → Y by f̄(v + W ) = f(v). We have to check five properties of f̄.

f̄ is well defined: Suppose v + W = y + W . Then v − y ∈ W = ker(f) and so f(v) = f(y) because f(v) − f(y) = f(v) + f(−y) = f(v − y) = 0. Now, f̄(v + W ) = f(v) = f(y) = f̄(y + W ).

f̄ is a linear transformation: f̄((v + W ) + (y + W )) = f̄((v + y) + W ) = f(v + y) = f(v) + f(y) = f̄(v + W ) + f̄(y + W ). Also, f̄(a(v + W )) = f̄(av + W ) = f(av) = af(v) = af̄(v + W ).

f̄ is injective: Let f̄(v + W ) = 0. Then f(v) = 0 and so v ∈ ker(f) = W . Therefore v + W = 0 + W and therefore ker(f̄) = (0 + W ).

f̄ is surjective: Let y ∈ Y . Then y = f(v) for some v ∈ V and f̄(v + W ) = f(v) = y.

This shows that f̄ : V/ ker(f) → im(f) is an isomorphism.

f̄ ∘ π = f : This is obvious.

Finally, f̄ is unique since f̄(v + W ) must equal f(v).

Theorem 2.20 (Second Isomorphism Theorem). Let W, Y be subspaces of V. Then (W + Y)/W ∼= Y/(W ∩ Y).

Proof. Let f : Y → W + Y be the linear transformation defined by f(y) = y for all y ∈ Y, and let g : W + Y → (W + Y)/W be the linear transformation defined by g(w + y) = (w + y) + W for all w ∈ W and y ∈ Y. Then g ∘ f : Y → (W + Y)/W is a linear transformation. We now compute ker(g ∘ f) and im(g ∘ f). We have

y ∈ ker(g ∘ f) ⇐⇒ y ∈ Y and y + W = 0 + W ⇐⇒ y ∈ W ∩ Y.


Thus ker(g ∘ f) = W ∩ Y. Next we have that g ∘ f is surjective because if (w + y) + W ∈ (W + Y)/W, then (g ∘ f)(y) = g(y) = y + W = (w + y) + W. Thus im(g ∘ f) = (W + Y)/W. Then the First Isomorphism Theorem (Theorem 2.19) implies that Y/ker(g ∘ f) ∼= im(g ∘ f). Therefore, Y/(W ∩ Y) ∼= (W + Y)/W.

2.4 Applications

Direct Sums.

We will now show that the internal direct sum defined in Definition 1.17 and the external direct sum defined in Exercise 7 of Chapter 1 are essentially the same.

Let V = V1 ⊕ · · · ⊕ Vn be an external direct sum of the vector spaces V1, . . . , Vn, and let

V′i = {(0, . . . , 0, vi, 0, . . . , 0) | vi ∈ Vi}.

Then V′i is a subspace of V and it is easy to check that V′i is isomorphic to Vi. It follows from Definition 1.17 and Proposition 1.16 that V equals the internal direct sum V′1 ⊕ · · · ⊕ V′n.

Now suppose that V is the internal direct sum of subspaces W1, . . . , Wm as in Definition 1.17. Let V′ equal the external direct sum W1 ⊕ · · · ⊕ Wm. Consider the function f : V → V′ defined as follows. If v ∈ V, Proposition 1.16 implies there is a unique expression v = w1 + · · · + wm where each wi ∈ Wi. Set f(v) = (w1, . . . , wm). It is easy to check that f is a bijective linear transformation and so f is an isomorphism. We will no longer distinguish between the internal and external direct sum.

Exact Sequences.

Definition 2.21. Let U --f--> V --g--> W be a sequence of linear transformations. (This means f ∈ L(U, V) and g ∈ L(V, W).) We say U --f--> V --g--> W is an exact sequence at V if im(f) = ker(g). A sequence

V1 --f1--> V2 --f2--> · · · --fn−2--> Vn−1 --fn−1--> Vn

is an exact sequence if im(fi−1) = ker(fi), 2 ≤ i ≤ n − 1, that is, if the sequence is exact at V2, . . . , Vn−1.


When writing sequences of linear transformations, it is standard to write 0 for the vector space (0). Note that the linear transformations 0 → V and V → 0 are uniquely defined and hence we will never give a name to these functions. The following lemma gives some of the basic results.

Lemma 2.22.

1. The sequence 0 → V → 0 is exact if and only if V = (0).

2. The sequence 0 → U --f--> V is exact if and only if f is injective.

3. The sequence V --g--> W → 0 is exact if and only if g is surjective.

4. The sequence 0 → V --f--> W → 0 is exact if and only if f is an isomorphism.

5. The sequence 0 → U --f--> V --g--> W → 0 is exact if and only if f is injective, g is surjective, and im(f) = ker(g).

Proof. The proofs are immediate from the definitions upon noting that im(0 → V) = 0 and ker(V → 0) = V.

Proposition 2.23. Assume that 0 → V1 --f1--> V2 --f2--> · · · --fn−1--> Vn → 0 is an exact sequence of finite dimensional vector spaces, n ≥ 1. Then

∑_{i=1}^{n} (−1)^i dim Vi = 0.

Proof. We prove the result by induction on n. If n = 1, then V1 = (0) and so dim V1 = 0. If n = 2, then V1 ∼= V2 by Lemma 2.22(4) and so dim V1 = dim V2 (see Exercise 7). Now assume that n = 3. In this case, 0 → V1 --f1--> V2 --f2--> V3 → 0 is an exact sequence. Then

dim V2 = dim(ker(f2)) + dim(im(f2)) = dim(im(f1)) + dim V3 = dim V1 + dim V3.

(Note that ker(f1) = (0) since f1 is injective and so dim(ker(f1)) = 0. Therefore dim V1 = dim(im(f1)) by Proposition 2.6.) Thus, −dim V1 + dim V2 − dim V3 = 0 and the result holds for n = 3.


For the general case, suppose that n ≥ 4 and that we have proved the result for smaller values of n. Then the given exact sequence is equivalent to the following two exact sequences because im(fn−2) = ker(fn−1).

0 → V1 --f1--> · · · --fn−3--> Vn−2 --fn−2--> im(fn−2) → 0,

0 → im(fn−2) --i--> Vn−1 --fn−1--> Vn → 0.

By induction, we have ∑_{i=1}^{n−2} (−1)^i dim Vi + (−1)^{n−1} dim(im(fn−2)) = 0 and (−1)^{n−2} dim(im(fn−2)) + (−1)^{n−1} dim Vn−1 + (−1)^n dim Vn = 0. Adding these two equations gives us the result.

Systems of Linear Equations.

Our last application is studying systems of linear equations. This motivates much of the material in Chapter 3.

Consider the following system of m linear equations in n variables with coefficients in k.

a11x1 + a12x2 + · · · + a1nxn = c1
a21x1 + a22x2 + · · · + a2nxn = c2
...
am1x1 + am2x2 + · · · + amnxn = cm

Now consider the following vectors in k(m). We will write such vectors vertically or horizontally, whichever is more convenient.

u1 = (a11, a21, . . . , am1), u2 = (a12, a22, . . . , am2), . . . , un = (a1n, a2n, . . . , amn).

Define the function f : k(n) → k(m) by f(b1, . . . , bn) = b1u1 + · · · + bnun. Then f is a linear transformation. Note that f(b1, . . . , bn) = (c1, . . . , cm) if and only if (b1, . . . , bn) is a solution for (x1, . . . , xn) in the system of linear equations above. That is, (c1, . . . , cm) ∈ im(f) if and only if the system of linear equations above has a solution. Let us abbreviate the notation by letting b = (b1, . . . , bn) and c = (c1, . . . , cm). Then f(b) = c means that b is a solution of the system of linear equations.


Suppose we know that b is a solution of the system and so f(b) = c. Then all solutions are given by b + ker(f) = {b + y | y ∈ ker(f)} by Proposition 2.5.

Thus the existence of solutions of a system of linear equations and the characterization of all such solutions can be described in terms of a linear transformation f and the subspaces ker(f) and im(f).

The equations n = dim(k(n)) = dim(ker(f)) + dim(im(f)) allow us to recover many of the usual results about systems of linear equations. We need a method to compute a basis of ker(f) and a basis of im(f) in order to completely describe the set of solutions to a system of linear equations. This is one of the main applications of matrix methods developed in Chapter 3.
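As a quick numerical illustration of this description (not part of the original notes; the matrix and right-hand side below are made up), the following Python/numpy sketch produces one solution b of Ax = c together with a basis of ker(f) = N(A), so that the full solution set is b + ker(f).

```python
import numpy as np

# Hypothetical 2x3 system Ax = c over the reals (illustrative data only).
A = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 0.0]])
c = np.array([3.0, 2.0])

# One particular solution b; lstsq returns an exact solution here because
# c lies in the column space (the image) of A.
b, *_ = np.linalg.lstsq(A, c, rcond=None)

# A basis of ker(f) = N(A): right singular vectors whose singular values vanish.
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[np.sum(s > 1e-12):]        # each row lies in (and here spans) N(A)

print(np.allclose(A @ b, c))               # True: b solves the system
print(np.allclose(null_basis @ A.T, 0))    # True: the rows lie in N(A)
# Every solution of Ax = c has the form b + (a combination of the rows of null_basis).
```

Here n = 3, dim(im(f)) = 2, and dim(ker(f)) = 1, matching n = dim(ker(f)) + dim(im(f)).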

Notes to Chapter 2. The proof of Proposition 2.10 contains a step where we need to know that ab = ba in a field. This is an interesting point since much of linear algebra can be developed over skew fields. (A skew field is a ring where each nonzero element is invertible, but multiplication need not be commutative.) One defines left and right vector spaces over a skew field D and other related concepts. A good reference for this is the book by E. Artin. In this setting, Proposition 2.10 does not carry over completely.

A common problem that students have is knowing when it is necessary to show that a function is well defined. If the domain of a function consists of equivalence classes, and if a function is defined in terms of an element of the equivalence class, then it is necessary to show that the definition of the function does not depend on the chosen element of the equivalence class.

Exercises

1. Give a second proof of Lemma 2.2 using the equations 0 + 0 = 0 and −v + v = 0.

2. Verify that Examples 1-5 following Lemma 2.2 are linear transformations.

3. Let f : V → W be a linear transformation. Let v1, . . . , vn ∈ V and let wi = f(vi), 1 ≤ i ≤ n.

(a) If {v1, . . . , vn} spans V, then {w1, . . . , wn} spans im(f). That is, 〈w1, . . . , wn〉 = im(f).

(b) If {v1, . . . , vn} is a linearly independent set in V and f is injective, then {w1, . . . , wn} is a linearly independent set in W.


(c) If {w1, . . . , wn} is a linearly independent set in W, then {v1, . . . , vn} is a linearly independent set in V.

(d) If {w1, . . . , wn} is a linearly independent set in W and {v1, . . . , vn} is a basis of V, then f is injective.

(e) Assume that {v1, . . . , vn} is a basis of V. Then f is an isomorphism if and only if {w1, . . . , wn} is a basis of W.

4. Let W be a subspace of V, dim V finite. Define the codimension of W in V, written codim_k W (or codim W if k is understood from context), by the equation codim W = dim V − dim W. More generally, we can use Corollary 2.18 as motivation to define codim W by codim W = dim(V/W).

(a) If W1, W2 are subspaces of V with codim W1 = codim W2 = 1, then codim(W1 ∩ W2) ≤ 2. If W1 ≠ W2, then codim(W1 ∩ W2) = 2.

(b) Let W1, W2 be subspaces of V. If each codim Wi is finite, then

codim(W1 ∩ W2) ≤ codim W1 + codim W2.

We have codim(W1 ∩ W2) = codim W1 + codim W2 if and only if W1 + W2 = V.

(c) Let W1, . . . , Wn be subspaces of V. If each codim Wi is finite, then

codim(W1 ∩ · · · ∩ Wn) ≤ ∑_{i=1}^{n} codim Wi.

5. Isomorphism is an equivalence relation on vector spaces.

6. Let f : V → W be an isomorphism of vector spaces. Let Y be a subspace of V. Then Y ∼= f(Y).

7. Two finitely generated vector spaces, V, W, over k are isomorphic if and only if dim V = dim W. (One can use Proposition 2.8 and Exercise 5 for one implication, and Exercise 3(e) for the other implication.)

8. Finish the proof of Proposition 2.10 that L(V, W) is a vector space over k.


9. Let U, V, W be arbitrary sets and let f : U → V and g : V → W be arbitrary functions.

(a) If f and g are injective, then g ∘ f is injective.

(b) If g ∘ f is injective, then f is injective, but g may not be injective. Give such an example.

(c) If f and g are surjective, then g ∘ f is surjective.

(d) If g ∘ f is surjective, then g is surjective, but f may not be surjective. Give such an example.

10. Let V,W be vector spaces, and let f ∈ L(V,W ) and g ∈ L(W,V ).

(a) If fg = 1W, then g is injective and f is surjective. Similarly, if gf = 1V, then f is injective and g is surjective. (This part depends only on sets and functions and has nothing to do with Linear Algebra.)

(b) Assume that dim V = dim W is finite. Then fg = 1W if and only if gf = 1V.

(c) If dim V ≠ dim W, give an example where gf = 1V but fg ≠ 1W.

11. Prove Proposition 2.12.

12. Let V = V1 ⊕ · · · ⊕ Vn and W = W1 ⊕ · · · ⊕ Wm. Then

(a) L(V, W) ∼= ⊕_{i=1}^{n} L(Vi, W)

(b) L(V, W) ∼= ⊕_{j=1}^{m} L(V, Wj)

(c) L(V, W) ∼= ⊕_{i=1}^{n} ⊕_{j=1}^{m} L(Vi, Wj)

(d) If dim V = dim W = 1, then L(V, W) ∼= k.

13. Show that the definitions of addition and scalar multiplication on V/W are well defined. Finish the proof of Proposition 2.16 that V/W is a vector space.

14. Let W be a subspace of V and assume dim V is finite. Let {v1, . . . , vl} be a basis of W. Extend this basis to a basis {v1, . . . , vl, vl+1, . . . , vn} of V. Then {vl+1 + W, . . . , vn + W} is a basis of V/W.


15. Use the previous exercise to give a different proof of Corollary 2.18. Now use the First Isomorphism Theorem, Exercise 7, and Corollary 2.18 to give a different proof of Proposition 2.6.

16. Show that Proposition 1.15 is a consequence of the Second Isomorphism Theorem (Theorem 2.20), Exercise 7, and Corollary 2.18.

17. (Strong form of the First Isomorphism Theorem) Let f : V → Z be a linear transformation of vector spaces over k. Suppose W ⊆ ker(f). Then show there exists a unique linear transformation f̄ : V/W → im(f) such that f̄ ∘ π = f, where π : V → V/W. Show f̄ is surjective and ker(f̄) ∼= ker(f)/W. Conclude that (V/W)/(ker(f)/W) ∼= im(f).

18. (Third Isomorphism Theorem) Let V be a vector space and let W ⊆ Y ⊆ V be subspaces. Then

(V/W)/(Y/W) ∼= V/Y.

(Hint: Consider the linear transformation f : V → V/Y and apply the strong form of the First Isomorphism Theorem from Exercise 17.)

19. Check the assertion following Proposition 2.9 that in Example 5, the linear transformation f is surjective, but not injective, while g is injective, but not surjective.

20. Let V be a vector space and let W ⊆ Y ⊆ V be subspaces. Assume that dim(V/W) is finite. Then dim(V/W) = dim(V/Y) + dim(Y/W). (Hint: We have (V/W)/(Y/W) ∼= V/Y from Problem 18. Then apply Corollary 2.18.)

21. Suppose that 0 → U --f--> V --g--> W → 0 is an exact sequence. Prove that V ∼= U ⊕ W.

22. Recall k∞ from Example 5 preceding Proposition 1.8 in Chapter 1, and recall k[x] from Example 5 preceding Definition 2.3. Prove that k[x] and k∞ are isomorphic vector spaces over k.


Chapter 3

Matrix Representations of a Linear Transformation

3.1 Basic Results

Let V, W be vector spaces over a field k and assume that dim V = n and dim W = m. Let β = {v1, . . . , vn} be a basis of V and let γ = {w1, . . . , wm} be a basis of W. Let f ∈ L(V, W). We will construct an m × n matrix that is associated with the linear transformation f and the two bases β and γ.

Proposition 3.1. Assume the notations above.

1. f is completely determined by f(v1), . . . , f(vn).

2. If y1, . . . , yn ∈ W are chosen arbitrarily, then there exists a unique linear transformation f ∈ L(V, W) such that f(vi) = yi, 1 ≤ i ≤ n.

Proof. 1. Let v ∈ V. Then v = ∑_{i=1}^{n} ci vi. Each ci ∈ k is uniquely determined because β is a basis of V. We have f(v) = ∑_{i=1}^{n} ci f(vi), because f is a linear transformation.

2. If f exists, then it is certainly unique by (1). As for the existence of f, let v ∈ V, v = ∑_{i=1}^{n} ci vi. Define f : V → W by f(v) = ∑_{i=1}^{n} ci yi. Note that f is well defined because β is a basis of V and so v determines c1, . . . , cn uniquely. Clearly f(vi) = yi. Now we show that f ∈ L(V, W).


Let v′ = ∑_{i=1}^{n} di vi and let a ∈ k. Then

f(v + v′) = f(∑_{i=1}^{n} (ci + di)vi) = ∑_{i=1}^{n} (ci + di)yi = ∑_{i=1}^{n} ci yi + ∑_{i=1}^{n} di yi = f(v) + f(v′),

and

f(av) = f(a ∑_{i=1}^{n} ci vi) = ∑_{i=1}^{n} a ci yi = a ∑_{i=1}^{n} ci yi = af(v).

Definition 3.2. An ordered basis of a finite dimensional vector space V is a basis {v1, . . . , vn} where the order of the basis vectors v1, . . . , vn is fixed.

Let β = {v1, . . . , vn} be an ordered basis of V, let γ = {w1, . . . , wm} be an ordered basis of W, and let f ∈ L(V, W). Assume that f(vj) = ∑_{i=1}^{m} aij wi, 1 ≤ j ≤ n. Thus, the linear transformation f gives rise to an m × n matrix (aij)m×n whose (i, j)-entry is given by aij. Note that the jth column of this matrix gives the expression of f(vj) in terms of the basis w1, . . . , wm. We shall denote this matrix by [f]^γ_β and refer to it as the matrix of f with respect to the ordered bases β, γ. The notation shows the dependence of this matrix on the linear transformation f and on the ordered bases β and γ. The notation

[f]^γ_β = (aij)m×n

means that f(vj) = ∑_{i=1}^{m} aij wi, 1 ≤ j ≤ n.

Let Mm×n(k) denote the set of m × n matrices with entries in the field k. Let A = (aij), B = (bij) be elements in Mm×n(k) and let c ∈ k. We define addition in Mm×n(k) by setting A + B to be the matrix C = (cij), where cij = aij + bij. We define scalar multiplication in Mm×n(k) by setting cA to be the matrix C = (cij), where cij = c aij.
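As a concrete illustration of this construction (not part of the original notes), the following Python sketch computes [f]^γ_β for the differentiation map on low-degree polynomials; the bases and the map are chosen only for the example.

```python
import numpy as np

# Illustrative: V = polynomials of degree <= 2 with ordered basis beta = (1, x, x^2),
# W = polynomials of degree <= 1 with ordered basis gamma = (1, x), and f = d/dx.
# Column j of [f]^gamma_beta holds the gamma-coordinates of f(v_j).

def deriv_coords(j):
    # f(x^j) = j * x^(j-1); return its coordinates in the basis (1, x).
    coords = np.zeros(2)
    if j >= 1:
        coords[j - 1] = j
    return coords

A = np.column_stack([deriv_coords(j) for j in range(3)])
print(A)            # [[0. 1. 0.]
                    #  [0. 0. 2.]]

# Coordinates of f(p) are A @ [p]_beta (made precise later in Proposition 3.12).
p = np.array([5.0, 3.0, 4.0])       # p(x) = 5 + 3x + 4x^2
print(A @ p)                        # [3. 8.], i.e. p'(x) = 3 + 8x
```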

Proposition 3.3. Under the operations defined above, the set Mm×n(k) is a vector space over k that is isomorphic to k(mn). The dimension of Mm×n(k) is mn.

Proof. See Exercise 1.


Proposition 3.4. Let β, γ be ordered bases of V, W, as above. The function φ : L(V, W) → Mm×n(k) defined by φ(f) = [f]^γ_β is an isomorphism of vector spaces. In particular, L(V, W) ∼= Mm×n(k) ∼= k(mn) and dim L(V, W) = mn.

Proof. If φ(f) = φ(g), then f(vj) = g(vj), 1 ≤ j ≤ n. Thus f = g by Proposition 3.1(1). Therefore φ is injective. Now let (aij) ∈ Mm×n(k) be given. Let yj = ∑_{i=1}^{m} aij wi, 1 ≤ j ≤ n. There exists f ∈ L(V, W) such that f(vj) = yj, 1 ≤ j ≤ n, by Proposition 3.1(2). Then φ(f) = [f]^γ_β = (aij). Therefore φ is surjective and it follows that φ is a bijection.

Let f, g ∈ L(V, W) and let [f]^γ_β = (aij), [g]^γ_β = (bij). Then f(vj) = ∑_{i=1}^{m} aij wi, g(vj) = ∑_{i=1}^{m} bij wi, and (cf)(vj) = c f(vj) = ∑_{i=1}^{m} c aij wi. Since

(f + g)(vj) = f(vj) + g(vj) = ∑_{i=1}^{m} (aij + bij) wi,

it follows that

φ(f + g) = [f + g]^γ_β = (aij + bij) = (aij) + (bij) = [f]^γ_β + [g]^γ_β = φ(f) + φ(g).

Similarly, φ(cf) = [cf]^γ_β = (c aij) = c(aij) = c[f]^γ_β = cφ(f).

Thus φ is a linear transformation and so φ is an isomorphism. The rest follows easily from Proposition 3.3.

Note that the isomorphism φ depends on the choice of basis for V and W, although the notation doesn’t reflect this.

Another proof that dim L(V, W) = mn can be deduced from the result in Exercise 12 of Chapter 2 and the special case that dim L(V, W) = 1 when dim V = dim W = 1.

We have already defined addition and scalar multiplication of matrices in Mm×n(k). Next we define multiplication of matrices. We will see in Proposition 3.5 below that the product of two matrices corresponds to the matrix of a composition of two linear transformations.

Let A = (aij) be an m × n matrix and let B = (bij) be an n × p matrix with entries in k. The product AB is defined to be the m × p matrix C = (cij) whose (i, j)-entry is given by cij = ∑_{k=1}^{n} aik bkj. A way to think of cij is to note that cij is the usual “dot product” of the ith row of A with the jth column of B. The following notation summarizes this.

Am×n Bn×p = Cm×p, where cij = ∑_{k=1}^{n} aik bkj.


An important observation about matrix multiplication occurs in the special case that p = 1. In that case B is a “column matrix”. Suppose the entries of B are (b1, . . . , bn). Then a short calculation shows that AB = ∑_{j=1}^{n} bj (jth column of A).

Note that the definition of matrix multiplication puts restrictions on the sizes of the matrices. An easy way to remember this restriction is to remember (m × n)(n × p) = (m × p).
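A short numerical check of this observation (not in the original notes; the matrices are made up):

```python
import numpy as np

# The product Ab of a matrix with a column equals sum_j b_j * (j-th column of A).
A = np.array([[1.0, 2.0, 0.0],
              [3.0, 1.0, 4.0]])
b = np.array([2.0, -1.0, 1.0])

combo = sum(b[j] * A[:, j] for j in range(A.shape[1]))
print(np.allclose(A @ b, combo))   # True
```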

Proposition 3.5. Let U, V, W be finite dimensional vector spaces over k. Let α = {u1, . . . , up}, β = {v1, . . . , vn}, γ = {w1, . . . , wm} be ordered bases of U, V, W, respectively. Let g ∈ L(U, V) and f ∈ L(V, W) (and so fg ∈ L(U, W) by Proposition 2.11). Then [fg]^γ_α = [f]^γ_β [g]^β_α.

Proof. Let [f]^γ_β = (aij)m×n, [g]^β_α = (bij)n×p, and [fg]^γ_α = (cij)m×p. Then

(fg)(uj) = f(g(uj)) = f(∑_{k=1}^{n} bkj vk) = ∑_{k=1}^{n} bkj f(vk) = ∑_{k=1}^{n} bkj (∑_{i=1}^{m} aik wi) = ∑_{i=1}^{m} (∑_{k=1}^{n} aik bkj) wi.

This shows that cij = ∑_{k=1}^{n} aik bkj, which is precisely the (i, j)-entry of [f]^γ_β [g]^β_α.

Proposition 3.6. Let A ∈ Mm×n(k), B ∈ Mn×p(k), and C ∈ Mp×q(k). Then (AB)C = A(BC). In other words, multiplication of matrices satisfies the associative law for multiplication, as long as the matrices have compatible sizes.

Proof. See Exercise 2.

Exercise 19 gives some properties of matrices that are analogous to Proposition 2.12.

Let In denote the n × n matrix (aij) where aij = 0 if i ≠ j and aij = 1 if i = j. For any A ∈ Mm×n(k), it is easy to check that ImA = AIn = A. If m = n, then InA = AIn = A. For this reason, In is called the n × n identity matrix.

Definition 3.7. A matrix A ∈ Mn×n(k) is invertible if there exists a matrix B ∈ Mn×n(k) such that AB = In and BA = In. The matrix B is called the inverse of A and we write B = A−1.


Proposition 3.8.

1. The inverse of an invertible matrix in Mn×n(k) is uniquely determined and thus the notation A^{−1} is well defined.

2. Let A, B be invertible matrices in Mn×n(k). Then (AB)^{−1} = B^{−1}A^{−1}.

3. Let A1, . . . , Al be invertible matrices in Mn×n(k). Then (A1 · · · Al)^{−1} = Al^{−1} · · · A1^{−1}. In particular, the product of invertible matrices is invertible.

4. (A^{−1})^{−1} = A.

Proof. Suppose AB = BA = In and AC = CA = In. Then B = BIn = B(AC) = (BA)C = InC = C. This proves (1). The other statements are easy to check.

Proposition 3.9. Let β be an ordered basis of V and let γ be an ordered basis of W. Assume that dim V = dim W = n. Suppose that f : V → W is an isomorphism and let f^{−1} : W → V denote the inverse isomorphism. (See Proposition 2.13.) Then ([f]^γ_β)^{−1} = [f^{−1}]^β_γ.

Proof. [f]^γ_β [f^{−1}]^β_γ = [f f^{−1}]^γ_γ = [1W]^γ_γ = In. Similarly, [f^{−1}]^β_γ [f]^γ_β = [f^{−1} f]^β_β = [1V]^β_β = In. The result follows from this and the definition of the inverse of a matrix.

If U = V = W and α = β = γ in Proposition 3.5, then the result in Proposition 3.5 implies that the isomorphism in Proposition 3.4 is also a ring isomorphism. (See the remark preceding Proposition 2.13.) Next we investigate how different choices of bases of V, W in Proposition 3.4 affect the matrix [f]^γ_β.

Proposition 3.10. Let β, β′ be ordered bases of V and let γ, γ′ be ordered bases of W. Let f ∈ L(V, W), and let 1V : V → V and 1W : W → W be the identity maps on V, W, respectively. Then [f]^{γ′}_{β′} = [1W]^{γ′}_γ [f]^γ_β [1V]^β_{β′}.

Proof. Since f = 1W ∘ f ∘ 1V, the result follows immediately from Proposition 3.5.


An important special case of Proposition 3.10 occurs when V = W, dim V = n, β = γ, and β′ = γ′. Then [f]^{β′}_{β′} = [1V]^{β′}_β [f]^β_β [1V]^β_{β′}. Let A = [1V]^β_{β′} and B = [1V]^{β′}_β. Then AB = [1V]^β_{β′} [1V]^{β′}_β = [1V]^β_β = In and similarly BA = In. Thus B = A^{−1}.

We can summarize this with more compact notation. Let M = [f]^β_β and let M′ = [f]^{β′}_{β′}. Then M′ = A^{−1}MA.

In Proposition 2.7 we proved that if V has dimension n, then V is isomorphic to k(n). We will now extend this idea to help with computations involving linear transformations.

Let V be a vector space over k with basis β = {v1, . . . , vn} and let W be a vector space over k with basis γ = {w1, . . . , wm}. Let εn be the standard basis of k(n). That is, εn = {e1, . . . , en} where ei is the vector in k(n) with coordinates equal to zero everywhere except in the ith position where the coordinate equals 1. Similarly, let εm be the standard basis of k(m). Let φβ : V → k(n) be the isomorphism (as in Proposition 2.7) that takes v = ∑_{i=1}^{n} bi vi to (b1, . . . , bn). We will write [v]β for φβ(v). Thus [v]β denotes the coordinates of v with respect to the basis β. Similarly, let φγ : W → k(m) be the isomorphism that takes w to [w]γ.

Let A = (aij)m×n ∈ Mm×n(k). We let LA : k(n) → k(m) denote the function that takes a vector b (considered as an n × 1 column matrix) to the vector Ab ∈ k(m), where Ab is given by usual matrix multiplication. It is straightforward to check that LA ∈ L(k(n), k(m)).

Lemma 3.11. Using the notation from above, we have [LA]^{εm}_{εn} = A, [φβ]^{εn}_β = In, and [φγ]^{εm}_γ = Im.

Proof. LA(ej) = Aej = ∑_{i=1}^{m} aij ei, since Aej is the jth column of A. The result for LA now follows from the definition of [LA]^{εm}_{εn}.

The other two results follow easily upon noting that φβ(vj) = ej ∈ k(n) and φγ(wj) = ej ∈ k(m).

Proposition 3.12. Let f ∈ L(V, W) and let A = [f]^γ_β. Then the following diagram commutes. That is, φγ ∘ f = LA ∘ φβ. In particular, [f(v)]γ = [f]^γ_β [v]β.

V --f--> W
φβ ↓          ↓ φγ
k(n) --LA--> k(m)


Proof. The second statement follows from the first because

[f(v)]γ = (φγ ∘ f)(v) = (LA ∘ φβ)(v) = A[v]β = [f]^γ_β [v]β.

For the first statement, we have

[φγ ∘ f]^{εm}_β = [φγ]^{εm}_γ [f]^γ_β = ImA = A = AIn = [LA]^{εm}_{εn} [φβ]^{εn}_β = [LA ∘ φβ]^{εm}_β.

Therefore, φγ ∘ f = LA ∘ φβ by Proposition 3.4.

Here is another proof of the second statement of Proposition 3.12 that is a more concrete calculation.

Let v ∈ V, v = ∑_{j=1}^{n} bj vj. Then f(vj) = ∑_{i=1}^{m} aij wi, where A = (aij) = [f]^γ_β. We have

f(v) = f(∑_{j=1}^{n} bj vj) = ∑_{j=1}^{n} bj f(vj) = ∑_{j=1}^{n} bj (∑_{i=1}^{m} aij wi) = ∑_{i=1}^{m} (∑_{j=1}^{n} aij bj) wi.

Therefore, writing the vectors as columns,

[f(v)]γ = (∑_{j=1}^{n} a1j bj, . . . , ∑_{j=1}^{n} amj bj)^t = A (b1, . . . , bn)^t = A[v]β = [f]^γ_β [v]β.

Proposition 3.13. Let β = {v1, . . . , vn} be an ordered basis of a finite dimensional vector space V. If f ∈ L(V, V) is an isomorphism, define τ(f) to be the ordered basis f(β), where f(β) = {f(v1), . . . , f(vn)}. Then τ is a bijection between the set of isomorphisms f ∈ L(V, V) and the set of ordered bases of V.

Proof. See Exercise 4.

3.2 The Transpose of a Matrix

Definition 3.14. Let A ∈ Mm×n(k). The transpose of A, written At, is the n × m matrix in Mn×m(k) whose (i, j)-entry is the (j, i)-entry of A. That is, if A = (aij)m×n and if At = (bij)n×m, then bij = aji.


It follows that the ith row of A is the ith column of At, 1 ≤ i ≤ m, and the jth column of A is the jth row of At, 1 ≤ j ≤ n.

Proposition 3.15. 1. Let A, B ∈ Mm×n(k) and let a ∈ k. Then (A + B)t = At + Bt, (aA)t = aAt, and (At)t = A.

2. The map g : Mm×n(k) → Mn×m(k) given by A ↦ At is an isomorphism of vector spaces.

3. Let A ∈Mm×n(k), B ∈Mn×p(k). Then (AB)t = BtAt.

Proof. For (1) and (2), see Exercise 7.

(3) Let C = AB = (cij)m×p. Let At = (a′ij)n×m and Bt = (b′ij)p×n. Then

(i, j)-entry of BtAt = ∑_{l=1}^{n} b′il a′lj = ∑_{l=1}^{n} bli ajl = ∑_{l=1}^{n} ajl bli = cji = (j, i)-entry of AB = (i, j)-entry of (AB)t.

Therefore, (AB)t = BtAt.

Proposition 3.16. Let A ∈ Mn×n(k) and assume that A is invertible. Then At is invertible and (At)−1 = (A−1)t.

Proof. Let B = A−1. Then AtBt = (BA)t = (In)t = In and BtAt = (AB)t = (In)t = In. Therefore the inverse of At is Bt = (A−1)t.

A more conceptual description of the transpose of a matrix and the results in Propositions 3.15 and 3.16 will be given in Chapter 4.

3.3 The Row Space, Column Space, and Null Space of a Matrix

Definition 3.17. Let A ∈Mm×n(k).

1. The row space of A is the subspace of k(n) spanned by the rows of A.

2. The column space of A is the subspace of k(m) spanned by the columnsof A.


3. The null space of A is the set of vectors b ∈ k(n) such that Ab = 0.

Let A ∈Mm×n(k) and let

R(A) = row space of A,

C(A) = column space of A,

N (A) = null space of A.

Let LA : k(n) → k(m) be the linear transformation that takes b ∈ k(n) to Ab ∈ k(m).

The definition of matrix multiplication shows that im(LA) = C(A) and ker(LA) = N(A). Therefore N(A) is a subspace of k(n). Also, R(A) = C(At). The subspace N(At) is called the left null space of A. This is because if c ∈ k(m) is considered as a 1 × m matrix, then cA = 0 ⇐⇒ (cA)t = 0 ⇐⇒ Atct = 0. The dimension formula from Proposition 2.6 implies that n = dim(ker(LA)) + dim(im(LA)) = dim(N(A)) + dim(C(A)).

Proposition 3.18. Let A ∈ Mm×n(k), D ∈ Mp×m(k), and E ∈ Mn×p(k). Then

1. R(DA) ⊆ R(A).

2. C(AE) ⊆ C(A).

3. N(DA) ⊇ N(A).

If p = m and D is invertible, then equality occurs in (1) and (3). If p = n and E is invertible, then equality occurs in (2).

Proof. Let D = (dij). The ith row of DA is given by

di1(first row of A) + · · · + dim(mth row of A),

which is contained in the row space of A. This shows (1).

Let E = (eij). The jth column of AE is given by

e1j(first column of A) + · · · + enj(nth column of A),

which is contained in the column space of A. This shows (2).

If Ax = 0, then DAx = 0 and thus (3) holds.

If p = m and D is invertible, then apply (1) to A = D−1(DA). If p = n and E is invertible, then apply (2) to A = (AE)E−1. If D is invertible, then DAx = 0 implies D−1DAx = D−10 = 0, so Ax = 0.


Proposition 3.19. Let A ∈ Mm×n(k), D ∈ Mm×m(k), and E ∈ Mn×n(k). Assume that both D and E are invertible. Then

1. C(DA) = LD(C(A)) ∼= C(A),

2. C(AE) = C(A),

3. N(DA) = N(A),

4. N(AE) = L_{E−1}(N(A)) ∼= N(A),

5. R(DA) = R(A),

6. R(AE) = L_{Et}(R(A)) ∼= R(A).

Proof. Statements (2), (3), and (5) were proved in Proposition 3.18. We give alternative proofs below.

Since E is an invertible matrix, it follows that Et and E−1 are also invertible matrices. Since D is also invertible, it follows that LD, LE, L_{Et}, and L_{E−1} are isomorphisms. (See Exercise 5.) Since LD is an isomorphism, we have

C(DA) = im(LDA) = im(LD ∘ LA) = LD(im(LA)) = LD(C(A)) ∼= C(A).

Since LE is surjective, we have

C(AE) = im(LAE) = im(LA ∘ LE) = im(LA) = C(A).

Since LD is injective, we have

N(DA) = ker(LDA) = ker(LD ∘ LA) = ker(LA) = N(A).

We have

b ∈ N(AE) ⇐⇒ AEb = 0 ⇐⇒ Eb ∈ N(A) ⇐⇒ b ∈ L_{E−1}(N(A)).

Thus N(AE) = L_{E−1}(N(A)) ∼= N(A), because L_{E−1} is an isomorphism.

From (1) and (2), we now have

R(DA) = C((DA)t) = C(AtDt) = C(At) = R(A),

and

R(AE) = C((AE)t) = C(EtAt) ∼= C(At) = R(A).


3.4 Elementary Matrices

We will define elementary matrices in Mm×n(k) and study their properties.

Let eij denote the matrix in Mm×n(k) that has a 1 in the (i, j)-entry and zeros in all other entries. The collection of these matrices forms a basis of Mm×n(k) called the standard basis of Mm×n. Multiplication of these simple matrices satisfies the formula

eij ekl = δ(j, k) eil,

where δ(j, k) is defined by δ(j, k) = 0 if j ≠ k and δ(j, k) = 1 if j = k.

When m = n, there are three types of matrices that we single out. They are known as the elementary matrices.

Definition 3.20.

1. Let Eij(a) be the matrix In + a eij, where a ∈ k and i ≠ j.

2. Let Di(a) be the matrix In + (a − 1) eii, where a ∈ k, a ≠ 0.

3. Let Pij, i ≠ j, be the matrix In − eii − ejj + eij + eji.

Note that Eij(0) is the identity matrix In, and that Di(a) is the matrix obtained from In by replacing the (i, i)-entry of In with a. Also note that Pij can be obtained from the identity matrix by either interchanging the ith and jth rows, or by interchanging the ith and jth columns.

The notation we are using for the elementary matrices, which is quite standard, does not indicate that the matrices under consideration are n × n matrices. We will always assume that the elementary matrices under consideration have the appropriate size in our computations.

We now relate the basic row and column operations of matrices to the elementary matrices.

Proposition 3.21. Let A ∈ Mm×n(k). Let A1 = Eij(a)A, A2 = Di(a)A, A3 = PijA, where the elementary matrices lie in Mm×m(k). Then

A1 is obtained from A by adding a times row j of A to row i of A.
A2 is obtained from A by multiplying row i of A by a.
A3 is obtained from A by interchanging rows i and j of A.


Proposition 3.22. Let A ∈ Mm×n(k). Let A4 = AEij(a), A5 = ADi(a), A6 = APij, where the elementary matrices lie in Mn×n(k). Then

A4 is obtained from A by adding a times column i of A to column j of A.
A5 is obtained from A by multiplying column i of A by a.
A6 is obtained from A by interchanging columns i and j of A.

Exercise 8 asks for a proof of the last two propositions. The operations described in Propositions 3.21, 3.22 are called the elementary row and column operations.
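The following Python sketch (not part of the notes; the test matrix is made up and indices are 0-based) builds the three elementary matrices of Definition 3.20 over the reals and checks the row-operation description of Proposition 3.21.

```python
import numpy as np

def E(n, i, j, a):      # E_ij(a) = I_n + a*e_ij, with i != j
    M = np.eye(n); M[i, j] += a; return M

def D(n, i, a):         # D_i(a) = I_n + (a-1)*e_ii, with a != 0
    M = np.eye(n); M[i, i] = a; return M

def P(n, i, j):         # P_ij: interchange rows i and j of I_n
    M = np.eye(n); M[[i, j]] = M[[j, i]]; return M

A = np.arange(1.0, 10.0).reshape(3, 3)

B = A.copy(); B[0] += 2 * A[2]                      # add 2*(row 2) to row 0
print(np.allclose(E(3, 0, 2, 2) @ A, B))            # True
print(np.allclose(D(3, 1, 5) @ A, np.diag([1, 5, 1]) @ A))   # row 1 multiplied by 5
print(np.allclose(P(3, 0, 1) @ A, A[[1, 0, 2]]))    # rows 0 and 1 interchanged
```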

Proposition 3.23.

1. Eij(a)Eij(b) = Eij(a + b) for all a, b ∈ k. Therefore, Eij(a)−1 = Eij(−a).

2. Di(a)Di(b) = Di(ab) for all nonzero a, b ∈ k. Therefore, Di(a)−1 = Di(a−1) for nonzero a ∈ k.

3. Pij−1 = Pij.

Proof. We will prove these results by applying Proposition 3.21. We have Eij(a)Eij(b)I = Eij(a + b)I because each expression adds a + b times row j of I to row i of I. Since I = Eij(0) = Eij(a + (−a)) = Eij(a)Eij(−a), it follows that Eij(a)−1 = Eij(−a).

Similarly, Di(a)Di(b)I = Di(ab)I because each expression multiplies row i of I by ab.

Finally, PijPijI = I, because the left side interchanges rows i and j of I twice, which has the net effect of doing nothing. Therefore, PijPij = I and (3) follows from this.

Here are several useful cases when products of elementary matrices commute.

Proposition 3.24.

1. Eij(a)Eil(b) = Eil(b)Eij(a) for all a, b ∈ k.

2. Eil(a)Ejl(b) = Ejl(b)Eil(a) for all a, b ∈ k.

3. Di(a)Dj(b) = Dj(b)Di(a) for all nonzero a, b ∈ k.


Proof. (1) Since i ≠ j and i ≠ l, we have

Eij(a)Eil(b) = (In + a eij)(In + b eil) = In + a eij + b eil = (In + b eil)(In + a eij) = Eil(b)Eij(a).

(2) Since i ≠ l and j ≠ l, we have

Eil(a)Ejl(b) = (In + a eil)(In + b ejl) = In + a eil + b ejl = (In + b ejl)(In + a eil) = Ejl(b)Eil(a).

(3) Since eii ejj = ejj eii, we have

Di(a)Dj(b) = (In + (a − 1)eii)(In + (b − 1)ejj) = In + (a − 1)eii + (b − 1)ejj + (a − 1)(b − 1)eii ejj = (In + (b − 1)ejj)(In + (a − 1)eii) = Dj(b)Di(a).

3.5 Permutations and Permutation Matrices

This section is important for Chapter 6 as well as for further results on elementary matrices.

Definition 3.25. A function σ : {1, . . . , n} → {1, . . . , n} is called a permutation if σ is bijective. The set of all permutations σ : {1, . . . , n} → {1, . . . , n} is denoted Sn. If σ(i) = i, 1 ≤ i ≤ n, then σ is called the identity permutation and is written σ = id.

There are n! permutations in Sn. If σ ∈ Sm and m < n, then we can consider σ ∈ Sn by setting σ(i) = i for all i satisfying m + 1 ≤ i ≤ n. In this way we have S1 ⊆ S2 ⊆ · · · ⊆ Sn ⊆ · · · .

If σ1, σ2 ∈ Sn, then σ1 ∘ σ2 ∈ Sn. We call σ1 ∘ σ2 the product of σ1 and σ2.

Definition 3.26. A permutation σ ∈ Sn is called a transposition if there exist i, j ∈ {1, . . . , n}, i ≠ j, such that σ(i) = j, σ(j) = i, and σ(l) = l for all l ∈ {1, . . . , n} with l ≠ i, j. Let σij denote this transposition.

Proposition 3.27. Each σ ∈ Sn, n ≥ 2, is a product of transpositions in Sn.


Proof. If σ = id, then σ = σ12 ∘ σ12. We now prove the result by induction on n. Assume that n ≥ 2. Let σ(n) = i and let σ′ = σin ∘ σ. Then σ′(n) = σin(i) = n, so we can regard σ′ ∈ Sn−1. By induction, σ′ is a product of transpositions in Sn−1 ⊆ Sn. Therefore, σ = σin ∘ σin ∘ σ = σin ∘ σ′ is a product of transpositions in Sn.

In Theorem 6.4 of Chapter 6, we will prove that the number of transpositions in any representation of σ as a product of transpositions is uniquely determined modulo 2.

Definition 3.28. A matrix P = (pij) ∈ Mn×n(k) is called a permutation matrix if there exists a permutation σ ∈ Sn such that

pij = 0 if i ≠ σ(j), and pij = 1 if i = σ(j).

Denote this permutation matrix by Pσ.

Each column of a permutation matrix P ∈ Mn×n(k) consists of exactly one 1 and n − 1 zeros. The same holds for each row of P because if pij = pil = 1, then σ(j) = i = σ(l). Then j = l because σ is injective.

Conversely, if each row and each column of an n × n matrix has exactly one 1 and n − 1 zeros, then the matrix is a permutation matrix. (See Exercise 14.)

The elementary matrix Pij defined in Section 3.4 is the same as the permutation matrix Pσ where σ is the transposition σij. We call Pij an elementary permutation matrix.

Let Pn×n denote the set of permutation matrices.

Proposition 3.29. There is a bijection f : Sn → Pn×n given by σ ↦ Pσ with the property f(σ1 ∘ σ2) = f(σ1)f(σ2). In other words, P_{σ1∘σ2} = P_{σ1}P_{σ2}.

Proof. The definition of matrix multiplication shows that the (i, j)-entry of P_{σ1}P_{σ2} equals 1 if i = σ1(l) and l = σ2(j) for some l, and 0 otherwise; that is, it equals 1 if i = (σ1 ∘ σ2)(j) and 0 otherwise, which is the (i, j)-entry of P_{σ1∘σ2}.


Thus P_{σ1∘σ2} = P_{σ1}P_{σ2}.

Suppose f(σ1) = f(σ2). Then P_{σ1} = P_{σ2}. This implies σ1(j) = σ2(j) for all j = 1, 2, . . . , n. Therefore σ1 = σ2, so f is injective.

Finally, f is surjective because every permutation matrix has the form Pσ = f(σ). Therefore f is a bijection.
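A short computational check of Proposition 3.29 (and of the transpose property proved next) follows; it is not part of the notes, and the permutations are arbitrary examples written 0-based.

```python
import numpy as np

def perm_matrix(perm):
    # Definition 3.28 (0-based): p_ij = 1 exactly when i = sigma(j), where sigma(j) = perm[j].
    n = len(perm)
    P = np.zeros((n, n))
    for j in range(n):
        P[perm[j], j] = 1
    return P

def compose(s1, s2):
    # (s1 o s2)(j) = s1(s2(j))
    return tuple(s1[s2[j]] for j in range(len(s2)))

s1 = (1, 2, 0)                       # sigma1: 0 -> 1, 1 -> 2, 2 -> 0
s2 = (0, 2, 1)                       # sigma2: a transposition

print(np.allclose(perm_matrix(compose(s1, s2)),
                  perm_matrix(s1) @ perm_matrix(s2)))               # True (Prop. 3.29)
print(np.allclose(perm_matrix(s1).T @ perm_matrix(s1), np.eye(3)))  # True: P^t = P^(-1)
```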

Proposition 3.30. Let Pσ be a permutation matrix. Then

1. (Pσ)^{−1} = P_{σ^{−1}}.

2. (Pσ)^t = (Pσ)^{−1} = P_{σ^{−1}}, so (Pσ)^t is a permutation matrix.

3. Pσ is a product of elementary permutation matrices Pij.

4. If Pσ is an elementary permutation matrix, then Pσ = (Pσ)^t = (Pσ)^{−1}.

Proof. 1. Proposition 3.29 implies that P_{σ^{−1}}Pσ = Pid = PσP_{σ^{−1}}. Since Pid = In, it follows that (Pσ)^{−1} = P_{σ^{−1}}.

2. Direct matrix multiplication shows that Pσ(Pσ)^t = (Pσ)^t Pσ = In. Therefore, (Pσ)^t = (Pσ)^{−1}. Alternatively, it follows from Definition 3.28 that (Pσ)^t = (qij) where qij = 0 if j ≠ σ(i) and qij = 1 if j = σ(i); equivalently, qij = 0 if i ≠ σ^{−1}(j) and qij = 1 if i = σ^{−1}(j). It follows that (Pσ)^t = P_{σ^{−1}}.

3. This follows from Propositions 3.27 and 3.29.

4. If Pσ is an elementary permutation matrix, then σ is a transposition. Then σ^{−1} = σ and the result follows from (1) and (2).

The following result will be useful later.

Proposition 3.31. Let σ ∈ Sn and let (x1, . . . , xn)^t be a column vector. Then

Pσ (x1, . . . , xn)^t = (x_{σ^{−1}(1)}, . . . , x_{σ^{−1}(n)})^t.


Proof. We have

Pσ (x1, . . . , xn)^t = ∑_{j=1}^{n} xj (jth column of Pσ).

Suppose that σ(j) = i. Then the jth column of Pσ has a single 1, in the ith row. Thus the ith coordinate of Pσ (x1, . . . , xn)^t is xj = x_{σ^{−1}(i)}.

3.6 The Rank of a Matrix

Proposition 3.32. Let A ∈ Mm×n(k). Then there exists an invertible matrix D ∈ Mm×m(k), an invertible matrix E ∈ Mn×n(k), and an integer r ≥ 0 such that the matrix DAE has the block form

DAE = ( Ir           Br×(n−r)      )
      ( 0(m−r)×r     0(m−r)×(n−r)  ).

In addition we have

1. D is a product of elementary matrices.

2. E is a permutation matrix.

3. 0 ≤ r ≤ min{m, n}.

Proof. We will show that A can be put in the desired form by a sequence of elementary row operations and a permutation of the columns. It follows from Proposition 3.21 that performing elementary row operations on A results in multiplying A on the left by m × m elementary matrices. Each elementary matrix is invertible by Proposition 3.23. Let D be the product of these elementary matrices. Then D is an invertible m × m matrix. Similarly, it follows from Proposition 3.22 and Proposition 3.30(3) that permuting the columns of A results in multiplying A on the right by an n × n permutation matrix E.

If A = 0, then let r = 0, D = Im, and E = In. Now assume that A ≠ 0. Then aij ≠ 0 for some i, j.


Interchange rows 1 and i and then interchange columns 1 and j. (Replace A with P1iAP1j.) This lets us assume that a11 ≠ 0.

Multiply row 1 by a11^{−1}. (Replace A with D1(a11^{−1})A.) This lets us assume that a11 = 1.

Add −ai1 times row 1 to row i, 2 ≤ i ≤ m. (Replace A with

Em1(−am1) · · · E31(−a31)E21(−a21)A.)

This lets us assume that a11 = 1 and ai1 = 0 for 2 ≤ i ≤ m.

Suppose by induction that we have performed elementary row operations on A and permuted the columns of A such that for some l satisfying 1 ≤ l < min{m, n}, we have aij = 1 if i = j and aij = 0 if i ≠ j, for 1 ≤ i ≤ m and 1 ≤ j ≤ l.

If aij = 0 for all l + 1 ≤ i ≤ m and l + 1 ≤ j ≤ n, then let r = l and we are done. Otherwise assume that aij ≠ 0 for some i, j with i ≥ l + 1, j ≥ l + 1. Then interchange rows l + 1 and i, and then interchange columns l + 1 and j. This lets us assume that al+1,l+1 ≠ 0. Multiply row l + 1 by a_{l+1,l+1}^{−1}. This lets us assume that al+1,l+1 = 1. Add −ai,l+1 times row l + 1 to row i, for i with 1 ≤ i ≤ m, but i ≠ l + 1. This lets us assume that al+1,l+1 = 1 and ai,l+1 = 0 for all i satisfying 1 ≤ i ≤ m but i ≠ l + 1. Observe that columns j, where 1 ≤ j ≤ l, remain unchanged throughout this step.

By induction, this procedure either eventually ends as above or r = min{m, n}. This puts A in the required form.

The matrix D in Proposition 3.32 can be found by applying to Im the same sequence of elementary row operations that were used in the proof of the proposition. The matrix E in Proposition 3.32 can be found by applying to In the same sequence of column interchanges that were used in the proof of the proposition.
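As a hedged illustration (not part of the notes), the following Python sketch carries out the reduction of Proposition 3.32 with floating-point arithmetic, returning matrices D, E and the integer r with DAE in the block form above. The function name and test matrix are made up for the example.

```python
import numpy as np

def reduce_rank_form(A, tol=1e-12):
    # Return (D, E, r) with D @ A @ E = [[I_r, B], [0, 0]] (cf. Proposition 3.32):
    # D is a product of elementary matrices, E a permutation matrix.
    A = A.astype(float).copy()
    m, n = A.shape
    D = np.eye(m)                    # accumulates the row operations
    E = np.eye(n)                    # accumulates the column interchanges
    r = 0
    while r < min(m, n):
        rows, cols = np.where(np.abs(A[r:, r:]) > tol)   # find a pivot a_ij != 0
        if rows.size == 0:
            break
        i, j = rows[0] + r, cols[0] + r
        A[[r, i]] = A[[i, r]]; D[[r, i]] = D[[i, r]]               # swap rows r, i
        A[:, [r, j]] = A[:, [j, r]]; E[:, [r, j]] = E[:, [j, r]]   # swap columns r, j
        piv = A[r, r]
        A[r] /= piv; D[r] /= piv                                   # scale row r
        for s in range(m):                                         # clear column r
            if s != r and abs(A[s, r]) > tol:
                c = A[s, r]
                A[s] -= c * A[r]; D[s] -= c * D[r]
        r += 1
    return D, E, r

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
D, E, r = reduce_rank_form(A)
print(r)                             # 2
print(np.round(D @ A @ E, 10))       # [[1, 0, *], [0, 1, *], [0, 0, 0]]
```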

Corollary 3.33. Let A ∈ Mn×n(k) be an invertible matrix. Then A is a product of elementary matrices.

Proof. By Proposition 3.32, there exist invertible matrices D, E ∈ Mn×n(k) such that D and E are each products of elementary matrices and DAE has the form given in Proposition 3.32. Since A is invertible, DAE is also invertible, so we must have m − r = 0. Thus r = m = n. Thus DAE = In. Then A = D−1E−1, so A is a product of elementary matrices.


Corollary 3.34. Let A ∈ Mn×n(k) be an invertible matrix and suppose that D and E are as in Proposition 3.32 such that DAE = In. Then A−1 = ED.

Proof. If DAE = In, then A = D−1E−1, so A−1 = ED.

Proposition 3.35. Let A ∈Mm×n(k). Then dim(R(A)) = dim(C(A)).

Proof. We find invertible matrices D, E as in Proposition 3.32. Then Propositions 3.18 and 3.19 imply that dim(C(A)) = dim(C(DA)) = dim(C(DAE)) and dim(R(A)) = dim(R(DA)) = dim(R(DAE)). This shows that we can assume from the start that

A = ( Ir           Br×(n−r)      )
    ( 0(m−r)×r     0(m−r)×(n−r)  ).

The first r rows of A are linearly independent and the remaining m − r rows of A are zero. Therefore, dim(R(A)) = r. The first r columns of A are linearly independent. The columns of A can be regarded as lying in k(r). Therefore dim(C(A)) = r.

Definition 3.36. Let A ∈ Mm×n(k). The rank r(A) of A is the dimension of the row space or column space of A. Thus r(A) = dim(R(A)) = dim(C(A)).

It is a very surprising result that dim(R(A)) = dim(C(A)). Since R(A) ⊆ k(n) and C(A) ⊆ k(m), there seems to be no apparent reason why the two subspaces should have the same dimension. In Chapter 4, we will see additional connections between these two subspaces that will give more insight into the situation.

We end this section with a theorem that gives a number of equivalent conditions that guarantee an n × n matrix is invertible.

Theorem 3.37. Let A ∈Mn×n(k). The following statements are equivalent.

1. A is invertible.

2. R(A) = k(n).

3. The rows of A are linearly independent.

4. Dim(R(A)) = n.

5. C(A) = k(n).


6. The columns of A are linearly independent.

7. Dim(C(A)) = n.

8. There exists B ∈Mn×n(k) such that BA = In.

9. There exists C ∈Mn×n(k) such that AC = In.

10. LA : k(n) → k(n) is an isomorphism.

11. LA : k(n) → k(n) is an injection.

12. LA : k(n) → k(n) is a surjection.

13. A is a product of elementary matrices.

Proof. Statements (2)-(4) are equivalent by Proposition 1.14 and Exercise 13 of Chapter 1. The same reasoning shows that statements (5)-(7) are equivalent. Statements (10)-(12) are equivalent by Proposition 2.9.

(8) ⇒ (2): We have k(n) = R(In) = R(BA) ⊆ R(A) ⊆ k(n). Therefore equality holds throughout and (2) holds.

(2) ⇒ (8): Since R(A) = k(n), there exist constants bi1, . . . , bin in k such that bi1(first row of A) + · · · + bin(nth row of A) = ei, where ei is the ith vector in the standard basis of k(n). If we let B be the n × n matrix whose (i, j)-entry is bij, then BA = In.

Similarly, if we apply the same reasoning to columns and column operations as we did to rows and row operations in the proof that (2) and (8) are equivalent, then we easily see that (5) and (9) are equivalent.

Since [LA]^{εn}_{εn} = A by Lemma 3.11, it follows that (1) and (10) are equivalent by Exercise 5.

The next step is to prove that (1), (8), (10), (11) are equivalent.

(8) ⇒ (11): Lemma 3.11 and Propositions 3.4, 3.5 imply that LB ∘ LA = LBA = LIn = 1k(n). Then (11) follows from Exercise 9b in Chapter 2. (11) ⇒ (10) ⇒ (1) comes from above and (1) ⇒ (8) is trivial.

Statements (1), (9), (10), (12) are proved equivalent in the same way as the equivalence of (1), (8), (10), (11).

It remains to prove that (1) and (13) are equivalent. Since elementary matrices are invertible and products of invertible matrices are invertible by Proposition 3.8, it is clear that (13) implies (1). Finally, (1) implies (13) by Corollary 3.33.


3.7 Systems of Linear Equations

We begin by describing an algorithm to solve systems of linear equations. Then we will apply these results to finding bases of ker(f) and im(f) for a linear transformation f. This will extend the results given at the end of Chapter 2.

Consider the following system of m linear equations in n variables with coefficients in k.

a11x1 + a12x2 + · · ·+ a1nxn = c1

a21x1 + a22x2 + · · ·+ a2nxn = c2

...

am1x1 + am2x2 + · · ·+ amnxn = cm

Let A = (aij)m×n, x = (xj)n×1, and c = (ci)m×1. Then the system of equations above is equivalent to the matrix equation Ax = c. When c1 = · · · = cm = 0, this system is called a homogeneous system of linear equations. Otherwise, the system is called an inhomogeneous system of linear equations.

In Chapter 2, we constructed a linear transformation f : k(n) → k(m) with the property that f(b) = c if and only if b = (b1, . . . , bn) is a solution for x = (x1, . . . , xn) in the system of linear equations above. It was shown that if b is one solution, then all solutions are given by b + ker(f). It is easy to check that f is the linear transformation LA defined before Lemma 3.11. That is, f(b) = Ab. Thus, ker(f) = {b ∈ k(n) | Ab = 0} = ker(LA) = N(A). It follows that finding the solutions of a homogeneous system of linear equations is equivalent to finding either the null space of a matrix A or the kernel of the associated linear transformation LA. Also,

im(f) = {c ∈ k(m) | Ab = c for some b ∈ k(n)} = im(LA) = C(A).

Thus

n = dim(ker(f)) + dim(im(f)) = dim(N (A)) + dim(C(A)).

Proposition 3.38. Let A ∈ Mm×n(k) and c ∈ k(m). Suppose that D and E are invertible matrices as in Proposition 3.32 and suppose that r is the rank of A. Then Ax = c has a solution if and only if the last m − r coordinates of Dc are zero.


Proof. Since D is an invertible m × m matrix, the equation Ax = c has a solution if and only if the equation DAx = Dc has a solution, and this holds if and only if Dc ∈ C(DA) = C(DAE). The column space of DAE consists of those vectors in k(m) whose last m − r coordinates are zero. Therefore, Ax = c has a solution if and only if the last m − r coordinates of Dc are zero.

Proposition 3.39. Let A ∈ Mm×n(k) and c ∈ k(m). Suppose that D and E are invertible matrices as in Proposition 3.32 and suppose that r is the rank of A. Assume that the equation Ax = c has a solution. Let

Dc = (d1, . . . , dr, 0, . . . , 0)^t ∈ k(m) and d = (d1, . . . , dr, 0, . . . , 0)^t ∈ k(n),

where the last m − r coordinates of Dc equal zero and the last n − r coordinates of d equal zero. (The notation is valid since r = dim(R(A)) ≤ min{m, n}.) Then a solution of Ax = c is given by x = Ed.

Proof. We have

DAEd = d1(1st column of DAE) + · · · + dr(rth column of DAE) = (d1, . . . , dr, 0, . . . , 0)^t = Dc.

Multiply by D−1 on the left to conclude AEd = c. Thus, x = Ed is a solution of the equation Ax = c.

We now describe N(A). We have N(A) = N(DA) = LE(N(DAE)). (See Proposition 3.19.) We first compute a basis of N(DAE) and dim(N(DAE)). Since DAE is an m × n matrix and r(DAE) = dim(C(DAE)) = r, it follows that dim(N(DAE)) = n − r.


Let DAE = (aij)m×n. It follows from Proposition 3.32 that aij = 0 for r + 1 ≤ i ≤ m, and the upper r × r submatrix of DAE is the r × r identity matrix Ir. The null space of DAE corresponds to the solutions of the following homogeneous system of r linear equations in n variables.

x1 + a1,r+1xr+1 + · · ·+ a1nxn = 0

x2 + a2,r+1xr+1 + · · ·+ a2nxn = 0

...

xr + ar,r+1xr+1 + · · ·+ arnxn = 0

The vectors

(−a1,r+1, . . . , −ar,r+1, 1, 0, . . . , 0, 0)^t, (−a1,r+2, . . . , −ar,r+2, 0, 1, . . . , 0, 0)^t, . . . , (−a1n, . . . , −arn, 0, 0, . . . , 0, 1)^t

are n − r linearly independent vectors in N(DAE), which therefore form a basis of N(DAE).

Finally we have N(A) = LE(N(DAE)). Since E is a permutation matrix, the coordinates of the basis vectors of N(DAE) are permuted to obtain a basis of N(A). The complete solution for the original system is b + N(A), where b is any solution to Ax = c. If a solution exists, then we saw above that we may take b = Ed.
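Continuing the hedged reduce_rank_form sketch from Section 3.6 (again, not part of the notes; the system is made up), the following Python fragment carries out Propositions 3.38 and 3.39 and the null-space construction above.

```python
import numpy as np
# Assumes the hypothetical reduce_rank_form helper from the Section 3.6 sketch.

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
c = np.array([6.0, 12.0, 2.0])

D, E, r = reduce_rank_form(A)
Dc = D @ c
print(np.allclose(Dc[r:], 0))          # True: Ax = c is solvable (Proposition 3.38)

d = np.zeros(A.shape[1]); d[:r] = Dc[:r]
b = E @ d                               # a particular solution x = Ed (Proposition 3.39)
print(np.allclose(A @ b, c))            # True

DAE = D @ A @ E
n = A.shape[1]
null_DAE = np.column_stack([np.concatenate([-DAE[:r, j], np.eye(n - r)[j - r]])
                            for j in range(r, n)])
null_A = E @ null_DAE                   # columns form a basis of N(A)
print(np.allclose(A @ null_A, 0))       # True: the full solution set is b + N(A)
```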

3.8 Bases of Subspaces

A number of problems can be solved using the techniques above. Here are several types of problems and some general procedures based on elementary row operations.


1. Let V be a vector space with basis β = {v1, . . . , vn}. Let Y ⊂ V be a nonzero subspace of V and suppose Y = 〈y1, . . . , ym〉. Find a basis of Y.

Although Corollary 1.10(3) implies some subset of {y1, . . . , ym} contains a basis of Y, the Corollary does not produce a method to actually find the basis. Here is a method.

We follow the notation that precedes Lemma 3.11. Suppose that yi = ∑_{j=1}^{n} aij vj, 1 ≤ i ≤ m. Let A = (aij)m×n. Then

φβ(yi) = (ai1, ai2, . . . , ain),

the ith row of A. Thus φβ(Y) = R(A), the row space of A. Since Y is a nonzero subspace of V, it is clear that A is a nonzero matrix. Therefore a basis of Y is given by φβ^{−1}(a basis of R(A)).

Find invertible matrices D, E as in Proposition 3.32. Then R(A) = R(DA). The matrix DA has r nonzero rows which are linearly independent. These rows form a basis of R(A).

2. Let y1, . . . , ym ∈ k(n). Determine if these vectors are linearly independent.

Let A be an m × n matrix whose rows are y1, . . . , ym. Find invertible matrices D, E as in Proposition 3.32. The vectors y1, . . . , ym are linearly independent if and only if r(A) = r(DAE) = m, i.e., each row of DAE is nonzero. The rows of A are linearly dependent if and only if elementary row operations can be performed on A to obtain a row of zeros.

3. Let W ⊂ k(m) be a subspace. Let {u1, . . . , un} be a set of generators of W. Let c ∈ k(m). Determine whether c ∈ W. If c ∈ W, find b1, . . . , bn in k such that c = b1u1 + · · · + bnun.

Let S be a basis of W. Then c ∈ W if and only if S ∪ {c} is a linearly dependent set. (This statement would be false if we replaced S by an arbitrary set of generators of W. See Exercise 13.) To find b1, . . . , bn, it is more convenient to change the problem to one about solving a system of linear equations. Let A be the m × n matrix whose columns are the vectors u1, . . . , un. Then c ∈ W if and only if the equation Ax = c has a solution.


4. Let f : V → W be a linear transformation. Let β = {v1, . . . , vn} be a basis of V and let γ = {w1, . . . , wm} be a basis of W. Let A = [f]^γ_β = (aij). Find a basis of im(f).

Proposition 3.12 implies that

φγ(im(f)) = φγ(f(V)) = LA(φβ(V)) = LA(k(n)) = {Ab | b ∈ k(n)} = C(A).

Then a basis of im(f) is given by φγ^{−1}(a basis of C(A)). A basis of C(A) can be computed by applying elementary column operations on A in the same manner as was used to compute a basis of R(A) in (1). (A short computational sketch of procedures 1 and 2 follows this list.)
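As a hedged illustration of procedures 1 and 2 (not part of the notes; it reuses the hypothetical reduce_rank_form helper from the Section 3.6 sketch and made-up vectors):

```python
import numpy as np
# Assumes the hypothetical reduce_rank_form helper from the Section 3.6 sketch.

Y = np.array([[1.0, 2.0, 0.0, 1.0],      # generators y1, y2, y3 of a subspace of k^(4)
              [0.0, 1.0, 1.0, 1.0],
              [1.0, 3.0, 1.0, 2.0]])      # here y3 = y1 + y2

D, E, r = reduce_rank_form(Y)
row_basis = (D @ Y)[:r]                   # procedure 1: the r nonzero rows of DY are a basis of R(Y)
print(r)                                  # 2, so y1, y2, y3 are linearly dependent (procedure 2)
print(np.round(row_basis, 10))
```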

Exercises

1. Prove Proposition 3.3 and find a basis of Mm×n(k).

2. Prove Proposition 3.6. (An easy method is to interpret each matrix as the matrix of a linear transformation with respect to some basis, and then to recall that composition of functions is associative. For a second proof, compute the entries of (AB)C and A(BC) directly.)

3. Matrix multiplication is not commutative, even for square matrices.

4. Prove Proposition 3.13.

5. Using our usual notation, f ∈ L(V, W) is an isomorphism if and only if [f]^γ_β is an invertible matrix.

6. Let A ∈Mm×n(k) and let B ∈Mn×m(k).

(a) Suppose that AB = Im. Then m ≤ n.

(b) Suppose that AB = Im and m = n. Then BA = Im. Thus, if m = n, then AB = Im ⇐⇒ BA = Im.

(c) Suppose that AB = Im and m < n. Then BA ≠ In.

7. Prove Proposition 3.15 (1), (2).

8. Prove Propositions 3.21 and 3.22.


9. The elementary matrix Pij can be expressed as a product of the other two types of elementary matrices Eij(a) and Di(a). (First solve this problem for 2 × 2 matrices.)

10. Give another proof of Proposition 3.23 by using the formulas Eij(a) = In + a eij, Di(a) = In + (a − 1)eii, and Pij = In − eii − ejj + eij + eji.

11. Let β = {v1, . . . , vn} be a basis of V and let γ = {w1, . . . , wm} be a basis of W. Set Vj = 〈vj〉 and Wi = 〈wi〉 so that V = ⊕_{j=1}^{n} Vj and W = ⊕_{i=1}^{m} Wi. Let f ∈ L(V, W) and suppose [f]^γ_β = (aij). Then show f = ∑_{i=1}^{m} ∑_{j=1}^{n} fij where fij ∈ L(Vj, Wi) and fij(vj) = aij wi. Compare this exercise to Exercise 12 of Chapter 2.

12. Justify the procedures given in Section 3.8 to solve the problems that are posed.

13. Justify the statement about S ∪ {c} in problem 3 of Section 3.8.

14. Suppose that each row and each column of A ∈ Mn×n(k) has exactly one 1 and n − 1 zeros. Prove that A is a permutation matrix.

15. Let A be an m × n matrix and let r = rank(A). Then

(a) r ≤ min{m, n}.

(b) r = dim(im LA).

(c) dim(ker LA) = n − r.

16. In Proposition 3.12, show that φβ(ker(f)) = ker(LA) and φγ(im(f)) = im(LA).

17. Suppose that S and T are two sets and that f : S → T is a bijection. Suppose that S is a vector space. Then there is a unique way to make T a vector space such that f is an isomorphism of vector spaces. Similarly, if T is a vector space, then there is a unique way to make S a vector space such that f is an isomorphism of vector spaces.

18. Use Exercise 17 to develop a new proof of Proposition 3.4. Either start with the vector space structure of Mm×n(k) to develop the vector space structure of L(V, W), or start with the vector space structure of L(V, W) to develop the vector space structure of Mm×n(k).


19. Let A, A1, A2 ∈ Mm×n(k), B, B1, B2 ∈ Mn×p(k), C ∈ Mp×q(k), and let a ∈ k. Then the following statements hold. (Compare with Proposition 2.12.)

(a) (A1 + A2)B = A1B + A2B

(b) A(B1 +B2) = AB1 + AB2

(c) a(AB) = (aA)B = A(aB)

(d) (AB)C = A(BC)

Note that when m = n = p = q, statements (a)-(d) imply that Mn×n(k) is an associative algebra.

20. A diagonal matrix A ∈ Mn×n(k) is a matrix that can be written ∑_{i=1}^{n} ai eii. In other words, A = (aij) is a diagonal matrix if aij = 0 for all i, j where i ≠ j.

Let D1 = ∑_{i=1}^{n} ai eii and D2 = ∑_{i=1}^{n} bi eii be two diagonal matrices. Then D1D2 = ∑_{i=1}^{n} ai bi eii. In particular, D1D2 = D2D1 for any two diagonal n × n matrices.


Chapter 4

The Dual Space

4.1 Basic Results

Let V be a vector space over k, dim V = n. We shall consider k as a vector space of dimension one over k by identifying k with the one-dimensional vector space k(1). That is, the element a ∈ k is identified with the vector (a) ∈ k(1).

Definition 4.1. The vector space L(V, k) is called the dual space of V, or simply the dual of V, and is written V∗. The elements of V∗ are called linear functionals.

Let β = {v1, . . . , vn} be a basis of V. We use Proposition 3.1 to define elements φ1, . . . , φn ∈ L(V, k) = V∗ as follows. Let φi denote the element in V∗ such that

φi(vj) = 0 if i ≠ j, and φi(vj) = 1 if i = j.

Proposition 4.2. The elements φ1, . . . , φn form a basis of V ∗.

Proof. Let f ∈ V∗ and suppose that f(vj) = cj, cj ∈ k, 1 ≤ j ≤ n. Let g = ∑_{i=1}^{n} ci φi. Then g(vj) = ∑_{i=1}^{n} ci φi(vj) = cj. Thus f = g by Proposition 3.1(1), because both functions agree on the basis β. This shows that φ1, . . . , φn span V∗.

Suppose that ∑_{i=1}^{n} bi φi = 0. Evaluation at vj gives 0 = ∑_{i=1}^{n} bi φi(vj) = bj. Therefore each bj = 0 and so φ1, . . . , φn are linearly independent. Therefore, the elements φ1, . . . , φn form a basis of V∗.


The basis in Proposition 4.2 is called the dual basis of V∗ to β and we write β∗ = {φ1, . . . , φn} to denote this. Since dim V∗ = dim L(V, k) = n by Proposition 3.4, we could have shortened the proof of Proposition 4.2 by making use of Proposition 1.14.

Corollary 4.3. Let β = {v1, . . . , vn} be a basis of V and let β∗ = {φ1, . . . , φn} be the dual basis of V∗ to β. Let v ∈ V and let f ∈ V∗. Then v = ∑_{j=1}^{n} φj(v) vj and f = ∑_{i=1}^{n} f(vi) φi.

Proof. The proof of Proposition 4.2 shows that f = ∑_{i=1}^{n} f(vi) φi. Let v ∈ V. Then v = ∑_{i=1}^{n} ai vi, with each ai ∈ k uniquely determined. Then φj(v) = ∑_{i=1}^{n} ai φj(vi) = aj. Therefore v = ∑_{j=1}^{n} φj(v) vj.
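For a concrete feel for dual bases (not part of the notes; the basis below is made up), take V = k^(3) with basis vectors given by the columns of an invertible matrix B. Then the dual basis functional φi can be represented by the ith row of B^{−1}, since φi(vj) = (B^{−1}B)ij = δ(i, j).

```python
import numpy as np

B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])      # columns v1, v2, v3 form a basis of R^3
Binv = np.linalg.inv(B)

print(np.allclose(Binv @ B, np.eye(3)))   # phi_i(v_j) = delta(i, j)

# Corollary 4.3: v = sum_j phi_j(v) * v_j for every v.
v = np.array([2.0, -1.0, 3.0])
coords = Binv @ v                          # the scalars phi_j(v)
print(np.allclose(B @ coords, v))          # True
```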

We now begin to develop the connections between dual spaces, matrix representations of linear transformations, transposes of matrices, and exact sequences.

Let V, W be finite dimensional vector spaces over k. Let β = {v1, . . . , vn} be an ordered basis of V and let γ = {w1, . . . , wm} be an ordered basis of W. Let β∗ = {φ1, . . . , φn} and γ∗ = {ψ1, . . . , ψm} be the corresponding ordered dual bases of V∗ and W∗. Let f ∈ L(V, W). Define a function f∗ : W∗ → V∗ by the rule f∗(τ) = τ ∘ f. Then f∗(τ) lies in V∗ because τ ∘ f is the composite of the two linear transformations V --f--> W --τ--> k.

Proposition 4.4. Using the notation from above, the following statements hold.

1. f∗ ∈ L(W∗, V∗).

2. [f∗]^{β∗}_{γ∗} = ([f]^γ_β)t.

Proof. (1) f∗ ∈ L(W∗, V∗) because

f∗(τ1 + τ2) = (τ1 + τ2) ∘ f = τ1 ∘ f + τ2 ∘ f = f∗(τ1) + f∗(τ2)

and

f∗(aτ) = (aτ) ∘ f = a(τ ∘ f) = af∗(τ).

(2) Let [f]^γ_β = (aij)m×n and let [f∗]^{β∗}_{γ∗} = (bij)n×m. Then

f∗(ψj) = ψj ∘ f = ∑_{i=1}^{n} (ψj ∘ f)(vi) · φi,


by Corollary 4.3. This shows that bij = (ψj ∘ f)(vi). But,

(ψj ∘ f)(vi) = ψj(f(vi)) = ψj(∑_{l=1}^{m} ali wl) = ∑_{l=1}^{m} ali ψj(wl) = aji.

Therefore bij = aji and this proves (2).

Therefore bij = aji and this proves (2).

Let Uf→ V

g→ W be a sequence of linear transformations of vector spaces.The linear transformations f, g induce corresponding linear transformations

on the dual spaces. Namely, we have W ∗ g∗→ V ∗f∗→ U∗. Then (gf)∗ = f ∗g∗

because for every ψ ∈ W ∗, we have

(g f)∗(ψ) = ψ (g f) = (ψ g) f = g∗(ψ) f = f ∗ g∗(ψ).

Proposition 4.5. In the situation above, assume that dimU = p, dimV =n, and dimW = m. Let α, β, γ be ordered bases of U, V,W , respectively,and let α∗, β∗, γ∗ be the dual bases of U∗, V ∗,W ∗ to α, β, γ. Let A = [g]γβ,

B = [f ]βα, and C = [g f ]γα.Then (AB)t = BtAt. In fact, (AB)t = BtAt for any pair of matrices

having compatible sizes.

Proof. AB = [g]^γ_β [f]^β_α = [g ∘ f]^γ_α = C. Therefore, Proposition 4.4(2) implies that

(AB)t = Ct = ([g ∘ f]^γ_α)t = [(g ∘ f)∗]^{α∗}_{γ∗} = [f∗ ∘ g∗]^{α∗}_{γ∗} = [f∗]^{α∗}_{β∗} [g∗]^{β∗}_{γ∗} = BtAt.

Now let A, B be any pair of matrices having compatible sizes so that A, B can be multiplied. We may assume that A ∈ Mm×n(k) and B ∈ Mn×p(k). Then A = [g]^γ_β for some g ∈ L(V, W) and B = [f]^β_α for some f ∈ L(U, V). The rest of the proof follows from above.

Proposition 4.5 furnishes another proof of Proposition 3.15(3). Propositions 4.4 and 4.5 give a more intrinsic meaning of the transpose of a matrix.

4.2 Exact sequences and more on the rank of a matrix

Lemma 4.6. Let Y be a vector space and let W ⊆ Y be a subspace. Then there exists a subspace X ⊆ Y such that W ⊕ X = Y.


Proof. Let {v1, . . . , vl} be a basis of W. Extend this basis to a basis {v1, . . . , vl, vl+1, . . . , vm} of Y. Let X = 〈vl+1, . . . , vm〉, the span of vl+1, . . . , vm. Then Y = W + X and W ∩ X = (0), and so Y = W ⊕ X by Proposition 1.16 and Definition 1.17.

Proposition 4.7 (Fundamental Mapping Property of Linear Transformations). Let V, Y, Z be vector spaces over k and let φ : V → Z, g : V → Y be linear transformations. Assume that ker(g) ⊆ ker(φ).

1. If g is surjective, then there exists a unique linear transformation ψ : Y → Z such that φ = ψ ∘ g.

2. In general, there always exists at least one linear transformation ψ : Y → Z such that φ = ψ ∘ g.

Proof. (1) First assume that g is surjective. Define ψ : Y → Z as follows. Let y ∈ Y. Since g is surjective, we may choose v ∈ V such that g(v) = y. Define ψ(y) by ψ(y) = φ(v). We must show that ψ is well-defined, ψ is a linear transformation, φ = ψ ∘ g, and ψ is uniquely determined.

Suppose that v′ ∈ V and g(v′) = y. Then

v − v′ ∈ ker(g) ⊆ ker(φ).

Thus φ(v) = φ(v′), and this shows that ψ is well-defined.To show that ψ : Y → Z is a linear transformation, let y, y′ ∈ Y and

a ∈ k. Let v, v′ ∈ V such that g(v) = y and g(v′) = y′. Then g(v + v′) =g(v) + g(v′) = y + y′ and g(av) = ag(v) = ay. Thus,

ψ(y + y′) = φ(v + v′) = φ(v) + φ(v′) = ψ(y) + ψ(y′),

andψ(ay) = φ(av) = aφ(v) = aψ(y).

This shows that ψ is a linear transformation.Let v ∈ V . Then (ψ g)(v) = ψ(g(v)) = φ(v) from the definition of φ.To show that the linear transformation ψ : Y → Z is unique, suppose

that ψ′ : Y → Z is a linear transformation such that φ = ψ′ g. Let y ∈ Yand choose v ∈ V such that g(v) = y. Then

ψ(y) = φ(v) = ψ′(g(v)) = ψ′(y)

73

Page 75: LinAlg Notes - Leep

for all y ∈ Y . Therefore, ψ = ψ′.

(2) Now we remove the assumption that g is surjective. Let W = im(g).Then (1) implies that there exists a unique linear transformation ψ′ : W → Zsuch that φ = ψ′ g. Since Y = W

⊕X for some subspace X ⊆ Y , there

exists a linear transformation ψ : Y → Z such that ψ|W = ψ′. (For example,we can take ψ(X) = 0.) It follows that φ = ψ g.

Remark: Note that the statement and proof of (1) extends to the case ofmodules over a commutative ring R. The proof of (2) uses special propertiesof vector spaces, namely the existence of a basis.

Proposition 4.8. If Uf→ V

g→ W is an exact sequence of linear transfor-

mations of vector spaces, then W ∗ g∗→ V ∗f∗→ U∗ is also an exact sequence of

linear transformations.

Proof. First we show that im(g∗) ⊆ ker(f ∗). Let φ ∈ im(g∗). Then g∗(ψ) =φ, for some ψ ∈ W ∗. It follows that

f ∗(φ) = f ∗(g∗(ψ)) = (g f)∗(ψ) = ψ (g f) = 0,

because im(f) ⊆ ker(g) and so g f = 0. Thus φ ∈ ker(f ∗), so im(g∗) ⊆ker(f ∗).

Now let φ ∈ ker(f ∗). Then f ∗(φ) = 0 and so φ f = 0. Thus ker(g) =im(f) ⊆ ker(φ). Proposition 4.7 implies that there exists ψ : W → k suchthat ψ g = φ. Thus ψ ∈ W ∗ and g∗(ψ) = ψ g = φ. Thus φ ∈ im(g∗), andso ker(f ∗) ⊆ im(g∗). Therefore, im(g∗) = ker(f ∗).

Corollary 4.9. Let f ∈ L(V,W ) and let f ∗ ∈ L(W ∗, V ∗) be the inducedlinear transformation on dual spaces.

1. If f is injective, then f ∗ is surjective.

2. If f is surjective, then f ∗ is injective.

Proof. If f is injective, then 0→ V → W is exact at V and so W ∗ → V ∗ → 0is an exact sequence. Thus f ∗ is surjective.

If f is surjective, then V → W → 0 is exact at W and so 0→ W ∗ → V ∗

is an exact sequence. Thus f ∗ is injective.

74

Page 76: LinAlg Notes - Leep

Corollary 4.10. Suppose that W ⊆ V is a subspace and ι : W → V is theinclusion map. Let ι∗ : V ∗ → W ∗ denote the induced linear transformationon dual spaces. Let τ ∈ V ∗. Then ι∗(τ) = τ |W .

In particular, if ρ : V ∗ → W ∗ is the function given by restricting thedomain to W , then ρ = ι∗ and ρ is surjective.

Proof. Let τ ∈ V ∗. Then ι∗(τ) = τ ι = τ |W = ρ(τ). Therefore ρ = ι∗. TheCorollary 4.9 implies that ι∗ is surjective because ι is injective. Thus ρ issurjective.

Proposition 4.11. Let f ∈ L(V,W ) and let f ∗ ∈ L(W ∗, V ∗) be the inducedlinear transformation on dual spaces. Then dim(im(f)) = dim(im(f ∗)).

Proof. The linear transformation f : V → W induces the linear transforma-tions V

g→ im(f)ι→ W where ι is the inclusion map and g is the same as f

except that the range of f has been restricted to im(f). Then f = ι g.

This induces the linear transformations W ∗ ι∗→ im(f)∗g∗→ V ∗ where f ∗ =

g∗ι∗. We have that g∗ is injective because g is surjective, and ι∗ is surjectivebecause ι is injective. Since f ∗ : W ∗ → V ∗, this gives

dim(W ∗)− dim(im(f ∗)) = dim(ker(f ∗))

= dim(ker(ι∗)), because g∗ is injective,

= dim(W ∗)− dim(im(ι∗))

= dim(W ∗)− dim(im(f)∗), because ι∗ is surjective.

Therefore, dim(im(f ∗)) = dim(im(f)∗) = dim(im(f)).

Lemma 4.12. Let f : V → W be a linear transformation. Let β and γ beordered bases of V and W , respectively, and let A = [f ]γβ. Then dim(im(f)) =dim(C(A)).

Proof. Proposition 3.12 and Exercise 16 of Chapter 3 imply that

φγ(im(f)) = im(LA) = C(A).

Thus dim(im(f)) = dim(φγ(im(f))) = dim(C(A)).

We now can give a new proof of Proposition 3.35.

75

Page 77: LinAlg Notes - Leep

Theorem 4.13. Let A ∈Mm×n(k). Then dim(C(A)) = dim(R(A)).

Proof. Let LA : k(n) → k(m) be the linear transformation given by LA(b) =Ab. Let εn and εm be the standard bases of k(n) and k(m). Then [LA]εmεn = A,by Lemma 3.11.

Let ε∗n and ε∗m be the dual bases of (k(n))∗ and (k(m))∗ to εn and εm. Wehave an induced linear transformation

(LA)∗ : (k(m))∗ → (k(n))∗.

Proposition 4.4 (2) implies that

[(LA)∗]ε∗nε∗m

= ([LA]εmεn )t = At.

It follows from Lemma 4.12 and Proposition 4.11 that

dim(C(A)) = dim(im(LA)) = dim(im((LA)∗)) = dim(C(At)) = dim(R(A)).

Let f : V → W be a linear transformation, with dimV = n and dimW =m. Let f ∗ : W ∗ → V ∗ be the induced linear transformation of dual spaceswhere f ∗(τ) = τ f , for τ ∈ W ∗. We now construct bases of the subspacesker(f), im(f), ker(f ∗), and im(f ∗).

Let r = dim(im(f)). Then n− r = dim(ker(f)). Let vr+1, . . . , vn be abasis of ker(f). Extend this basis to a basis β = v1, . . . , vr, vr+1, . . . , vn ofV . Let wi = f(vi), 1 ≤ i ≤ r.

We show now that w1, . . . , wr is a basis of im(f). First, w1, . . . , wrspans im(f) because

im(f) = 〈f(v1), . . . , f(vn)〉 = 〈f(v1), . . . , f(vr)〉 = 〈w1, . . . , wr〉.

Since dim(im(f)) = r, it follows that w1, . . . , wr is a basis of im(f).Extend w1, . . . , wr to a basis γ = w1, . . . , wr, . . . , wm of W . Let

γ∗ = ψ1, . . . , ψm be the dual basis of W ∗ to γ. We will now show thatψr+1, . . . , ψm is a basis of ker(f ∗). If r + 1 ≤ j ≤ m then ψj ∈ ker(f ∗)because f ∗(ψj) = ψj f and

ψj f(V ) = ψj(im(f)) = ψj(〈w1, . . . , wr〉) = 0.

76

Page 78: LinAlg Notes - Leep

Now let ψ =∑m

i=1 ciψi ∈ ker(f ∗). Then 0 = f ∗(∑m

i=1 ciψi) = (∑m

i=1 ciψi) f .Therefore, if 1 ≤ j ≤ r, then

0 =

((m∑i=1

ciψi

) f

)(vj) =

(m∑i=1

ciψi

)(f(vj)) =

(m∑i=1

ciψi

)(wj) = cj.

Therefore ψ =∑m

i=r+1 ciψi. This shows that ψr+1, . . . , ψm spans ker(f ∗).Since ψr+1, . . . , ψm is part of a linearly independent set, it follows thatψr+1, . . . , ψm is a basis of ker(f ∗).

Let β∗ = φ1, . . . , φn be the dual basis of V ∗ to β. We will show thatφ1, . . . , φr is a basis of im(f ∗).

Since ψ1, . . . , ψm is a basis of W ∗ and since ψr+1, . . . , ψm ⊆ ker(f ∗),it follows that

im(f ∗) = 〈f ∗(ψ1), . . . , f∗(ψm)〉 = 〈f ∗(ψ1), . . . , f

∗(ψr)〉 = 〈ψ1 f, . . . , ψr f〉.

We will show that f ∗(ψi) = ψi f = φi for 1 ≤ i ≤ r. This will imply thatim(f ∗) = 〈φ1, . . . , φr〉. Since the set φ1, . . . , φr is a linearly independentset, being a subset of β∗, it will follow that φ1, . . . , φr is a basis of im(f ∗).

Assume that 1 ≤ i ≤ r. We have

(ψi f)(vj) = ψi(f(vj)) =

ψi(wj), if 1 ≤ j ≤ r

ψi(0), if r + 1 ≤ j ≤ n=

0, if j 6= i

1, if j = i

= φi(vj).

Therefore, f ∗(ψi) = ψi f = φi for 1 ≤ i ≤ r, because the two lineartransformations agree on a basis of V .

4.3 The double dual of a vector space

L(V ∗, k) is the dual space of V ∗. It is often written V ∗∗ and called the doubledual of V . Each element v ∈ V determines an element fv ∈ V ∗∗ as follows.Let fv : V ∗ → k be the function defined by fv(φ) = φ(v) for any φ ∈ V ∗.The function fv is a linear transformation because for φ, τ ∈ V ∗ and a ∈ k,we have

fv(φ+ τ) = (φ+ τ)(v) = φ(v) + τ(v) = fv(φ) + fv(τ) andfv(aφ) = (aφ)(v) = a(φ(v)) = afv(φ).Thus, fv ∈ L(V ∗, k). This gives us a function f : V → V ∗∗ where

f(v) = fv.

77

Page 79: LinAlg Notes - Leep

Proposition 4.14. The function f : V → V ∗∗ is an injective linear trans-formation. If dimV is finite, then f is an isomorphism.

Proof. First we show that f is a linear transformation. Let v, w ∈ V , let a ∈k, and let φ ∈ V ∗. Since fv+w(φ) = φ(v+w) = φ(v) +φ(w) = fv(φ) + fw(φ),it follows that fv+w = fv + fw in V ∗∗ Thus,

f(v + w) = fv+w = fv + fw = f(v) + f(w).

Since fav(φ) = φ(av) = aφ(v) = afv(φ), it follows that fav = afv in V ∗∗.Thus,

f(av) = fav = afv = af(v).

This shows that f is a linear transformation.Suppose that v ∈ ker f . Then fv = f(v) = 0. That means fv(φ) =

φ(v) = 0 for all φ ∈ V ∗. If v 6= 0, then V has a basis that contains v. Thenthere would exist an element φ ∈ V ∗ such that φ(v) 6= 0, because a lineartransformation can be defined by its action on basis elements, and this is acontradiction. Therefore v = 0 and f is injective.

If dimV = n, then n = dimV = dimV ∗ = dimV ∗∗. Therefore f is alsosurjective and so f is an isomorphism.

4.4 Annihilators of subspaces

Definition 4.15. Let S be a subset of a vector space V . The annihilator S

of S is the set S = τ ∈ V ∗ | τ(v) = 0 for all v ∈ S.

Thus S = τ ∈ V ∗ | S ⊆ ker(τ).

Lemma 4.16. Let S be a subset of V and let W = 〈S〉.

1. S is a subspace of V ∗.

2. S = W .

Proof. If τ1, τ2 ∈ S and a ∈ k, then it is easy to see that τ1 + τ2 ∈ S andaτ1 ∈ S. Therefore (1) holds.

Since S ⊆ W , it follows easily that W ⊆ S. Now let τ ∈ S. Thenτ(S) = 0 implies S ⊆ ker(τ). Since ker(τ) is a subspace, it follows thatW = 〈S〉 ⊆ ker(τ). Therefore, τ ∈ W and (2) holds.

78

Page 80: LinAlg Notes - Leep

Proposition 4.17. Let W be a subspace of V . Consider the function ρ :V ∗ → W ∗ given by restricting the domain to W . That is, ρ(τ) = τ |W .

1. ker(ρ) = W .

2. dim(W ) = dim(V )− dim(W ).

Proof. We know from Corollary 4.10 that ρ is a surjective linear transforma-tion. Then ker(ρ) = τ ∈ V ∗ | τ |W = 0 = W . Thus

dim(W ) = dim(ker(ρ)) = dimV ∗ − dim(im(ρ))

= dimV ∗ − dimW ∗ = dimV − dimW.

Here is a further look at the situation in Proposition 4.17. Let W be asubspace of V . The following sequence is exact.

0→ Wι→ V

π→ V/W → 0

Proposition 4.8 implies that the following sequence is exact.

0→ (V/W )∗π∗→ V ∗

ι∗→ W ∗ → 0

Since ι∗ = ρ (see Corollary 4.10), the exactness implies

π∗((V/W )∗) = im(π∗) = ker(ι∗) = ker(ρ) = W .

Thus π∗ : (V/W )∗ → W is an isomorphism and so

dim(W ) = dim((V/W )∗) = dim(V/W ) = dim(V )− dim(W ).

Since ker(ι∗) = W , we have V ∗/W ∼= W ∗.See Exercise 7 for additional proofs of some of these results.

Proposition 4.18. Let f : V → W be a linear transformation and letf ∗ : W ∗ → V ∗ be the corresponding linear transformation of dual spaces.Then

1. ker(f ∗) = (im(f))

2. (ker(f)) = im(f ∗)

79

Page 81: LinAlg Notes - Leep

Proof. (1) h ∈ ker(f ∗) ⇐⇒ h f = 0 ⇐⇒ h(im(f)) = 0 ⇐⇒ h ∈(im(f)).

(2) Let g ∈ im(f ∗). Then g = f ∗(h) = h f , for some h ∈ W ∗. Sinceg(ker(f)) = (h f)(ker(f)) = h(f(ker(f))) = 0, it follows that g ∈ (ker(f)).Thus, im(f ∗) ⊆ (ker(f)). Proposition 4.11 and Proposition 4.17 (2) implythat

dim(im(f ∗)) = dim(im(f)) = dimV − dim(ker(f)) = dim(ker(f)).

It follows that (ker(f)) = im(f ∗).

Here is a direct proof that (ker(f)) ⊆ im(f ∗) in Proposition 4.18 (2).Let φ ∈ (ker(f)). Then φ ∈ V ∗ and φ(ker(f)) = 0. Since ker(f) ⊆ ker(φ),Proposition 4.7 (2) implies that there exists ψ ∈ W ∗ such that f ∗(ψ) =ψ f = φ. Thus φ ∈ im(f ∗), so (ker(f)) ⊆ im(f ∗).

4.5 Subspaces of V and V ∗

The following definition includes Definition 4.15.

Definition 4.19. Let V be a vector space over k of dimension n. Let W bea subspace of V and let Y be a subspace of V ∗.

1. Let W = f ∈ V ∗ |W ⊆ ker(f).

2. Let Y =⋂f∈Y ker(f).

Thus, W ⊆ V ∗, Y ⊆ V , and Y = v ∈ V | f(v) = 0, for all f ∈ Y .From Proposition 4.17 (2), we have dim(W ) = dim(V )− dim(W ).

It is easy to see that if W1 ⊆ W2, then W 1 ⊇ W

2 , and if Y1 ⊆ Y2, then(Y1) ⊇ (Y2).

Proposition 4.20. W ⊆ (W ) and Y ⊆ (Y).

Proof. If x ∈ W , then f(x) = 0 for all f ∈ W . Thus,

x ∈⋂f∈W

ker(f) = (W ).

Therefore, W ⊆ (W ).If f ∈ Y , then f(v) = 0 for all v ∈ Y. Thus f(Y) = 0 and this implies

that f ∈ (Y). Therefore, Y ⊆ (Y)

.

80

Page 82: LinAlg Notes - Leep

Lemma 4.21. If Y is spanned by f1, . . . , fr, then Y =⋂ri=1 ker(fi).

Proof. We have Y =⋂f∈Y ker(f) ⊆

⋂ri=1 ker(fi), because fi ∈ Y . Now let

x ∈⋂ri=1 ker(fi). For each f ∈ Y , we have f = a1f1 + · · ·+ arfr with ai ∈ k.

It follows that f(x) =∑r

i=1 aifi(x) = 0. Then x ∈ ker(f) and it follows thatx ∈

⋂f∈Y ker(f) = Y. Therefore, Y =

⋂ri=1 ker(fi).

Proposition 4.22. dim(Y) = dim(V ∗)− dim(Y ).

Proof. We have dim(Y ) ≤ dim((Y)) = dim(V )−dim(Y). Thus, dim(Y) ≤

dim(V )− dim(Y ) = dim(V ∗)− dim(Y ).Suppose that dim(Y ) = r and let f1, . . . , fr be a basis of Y . Then

Lemma 4.21 implies that Y =⋂ri=1 ker(fi). Since fi 6= 0, it follows that

fi : V → k is surjective and so dim(ker(fi)) = n−1. Thus codim(ker(fi)) = 1and codim(

⋂ri=1 ker(fi)) ≤

∑ri=1 codim(ker(fi)) = r. (See Exercise 4 in

Chapter 2.) Then

dim(Y) = dim

(r⋂i=1

ker(fi)

)≥ dim(V )− r = dim(V )− dim(Y ) = dim(V ∗)− dim(Y ).

Therefore, dim(Y) = dim(V ∗)− dim(Y ).

Proposition 4.23. W = (W ) and Y = (Y).

Proof. dim((W )) = dim(V ∗)−dim(W ) = dim(V )−(dim(V )−dim(W )) =dim(W ). Since W ⊆ (W ), it follows that W = (W ).

dim((Y)) = dim(V ) − dim(Y) = dim(V ) − (dim(V ∗) − dim(Y )) =

dim(Y ). Since Y ⊆ (Y), it follows that Y = (Y)

.

Corollary 4.24. Let W1,W2 ⊆ V and Y1, Y2 ⊆ V ∗. If (W1) = (W2)

, thenW1 = W2. If (Y1) = (Y2), then Y1 = Y2.

Proof. If (W1) = (W2)

, then W1 = ((W1)) = ((W2)

) = W2. If (Y1) =(Y2), then Y1 = ((Y1))

= ((Y2)) = Y2.

The results above imply the following Theorem.

Theorem 4.25. Let ∆ : Subspaces of V → Subspaces of V ∗, whereW → W .

Let Ω : Subspaces of V ∗ → Subspaces of V , where Y → Y.Then ∆ and Ω are inclusion-reversing bijections, and ∆ and Ω are inverse

functions of each other.

81

Page 83: LinAlg Notes - Leep

4.6 A calculation

The next result gives a computation in V ∗ that doesn’t involve a dual basisof V ∗.

Let β = v1, . . . , vn be a basis of V and let γ = τ1, . . . , τn be a basisof V ∗ (not necessarily the dual basis of V ∗ to v1, . . . , vn). Let τi(vj) = bij,for 1 ≤ i ≤ n and 1 ≤ j ≤ n.

Let v ∈ V and τ ∈ V ∗. We wish to compute τ(v). Let τ =∑n

i=1 ciτi andlet v =

∑nj=1 djvj. Let

c =

c1...cn

, d =

d1...dn

, B = (bij)n×n.

Then

τ(v) =

(n∑i=1

ciτi

)(n∑j=1

djvj

)=

n∑i=1

n∑j=1

cidjτi(vj) =n∑i=1

n∑j=1

cibijdj

=n∑i=1

ci

(n∑j=1

bijdj

)= ctBd.

The last equality holds because the ith coordinate of Bd equals∑n

j=1 bijdj.If τ1, . . . , τn is the dual basis of V ∗ to v1, . . . , vn, then B = In because

we would have

bij =

0, if i 6= j

1, if i = j.

In this case, the formula reads τ(v) = ctd.

Exercises

1. Suppose that f(v) = 0 for all f ∈ V ∗. Then v = 0.

2. Suppose that V is infinite dimensional over k. Let β = vαα∈J be abasis of V where the index set J is infinite. For each α ∈ J , define alinear transformation φα : V → k by the rule φα(vλ) = 0 if α 6= λ andφα(vλ) = 1 if α = λ. Show that φαα∈J is a linearly independent setin V ∗ but is not a basis of V ∗.

82

Page 84: LinAlg Notes - Leep

3. In Proposition 4.14, show that f is not surjective if V is infinite dimen-sional over k. (As in the previous exercise assume that V has a basis,even though the proof of this fact in this case was not given in Chapter1.)

4. Let v1, . . . , vn be a basis of V and let φ1, . . . , φn be the dual basisof V ∗ to v1, . . . , vn. In the notation of Proposition 4.14, show thatfv1 , . . . , fvn is the dual basis of V ∗∗ to φ1, . . . , φn. Give a secondproof that the linear transformation f in Proposition 4.14 is an iso-morphism, in the case that dimV is finite, by observing that f takes abasis of V to a basis of V ∗∗.

5. Let V,W be finite dimensional vector spaces. Let f : V → W be alinear transformation. Let f ∗ : W ∗ → V ∗ and f ∗∗ : V ∗∗ → W ∗∗ bethe induced linear transformations on the dual spaces and double dualspaces. Let h : V → V ∗∗ and h′ : W → W ∗∗ be the isomorphisms fromProposition 4.14. Show that f ∗∗ h = h′ f .

6. Let Uf→ V

g→ W be a sequence of linear transformations of finite

dimensional vector spaces. Suppose that W ∗ g∗→ V ∗f∗→ U∗ is an exact

sequence of linear transformations. Then prove that the original se-quence is exact. (Hint: One method is to apply Proposition 4.8 to thegiven exact sequence and use Proposition 4.14 along with the previousexercise. One can also give a direct argument.)

7. Give a direct proof that ρ is surjective in Proposition 4.10 and give adirect proof of the dimension formula in Proposition 4.17 (2) as fol-lows. Let v1, . . . , vr be a basis of W . Extend this basis to a basisv1, . . . , vr, vr+1, . . . , vn of V . Let φ1, . . . , φn be the dual basis of V ∗

to v1, . . . , vn.

(a) Show that φ1|W , . . . , φr|W is the dual basis of W ∗ to v1, . . . , vr.Let τ ∈ W ∗. Show that τ =

∑ri=1 aiφi|W = ρ(

∑ri=1 aiφi). Con-

clude that ρ : V ∗ → W ∗ is surjective.

(b) Show that φr+1, . . . , φn ∈ W . If τ =∑n

i=1 aiφi ∈ W , then showaj = 0, 1 ≤ j ≤ r. Conclude that φr+1, . . . , φn spans W .Since φr+1, . . . , φn is a linearly independent set, conclude thatφr+1, . . . , φn is a basis of W . Therefore dim(W ) = dim(V )−dim(W ).

83

Page 85: LinAlg Notes - Leep

Chapter 5

Inner Product Spaces

5.1 The Basics

In this chapter, k denotes either the field of real numbers R or the field ofcomplex numbers C.

If α ∈ C, let α denote the complex conjugate of α. Thus if α = a + bi,where a, b ∈ R and i =

√−1, then α = a−bi. Recall the following facts about

complex numbers and complex conjugation. If α, β ∈ C, then α + β = α+β,αβ = αβ, and α = α. A complex number α is real if and only if α = α. Theabsolute value of α ∈ C is defined to be the nonnegative square root of αα.This is well defined because αα = a2 + b2 is a nonnegative real number. Wesee that this definition of the absolute value of a complex number extendsthe usual definition of the absolute value of a real number by considering thecase b = 0.

Definition 5.1. Let V be a vector space over k. An inner product on V isa function 〈 〉 : V × V → k, where (v, w) 7→ 〈v, w〉, satisfying the followingfour statements. Let v1, v2, v, w ∈ V and c ∈ k.

1. 〈v1 + v2, w〉 = 〈v1, w〉+ 〈v2, w〉.

2. 〈cv, w〉 = c〈v, w〉.

3. 〈w, v〉 = 〈v, w〉.

4. 〈v, v〉 > 0 for all v 6= 0.

84

Page 86: LinAlg Notes - Leep

Statement 3 in Definition 5.1 implies that 〈v, v〉 is a real number. State-ments 1 and 2 in the definition imply that for any fixed w ∈ V , the function〈 , w〉 : V → k, given by v 7→ 〈v, w〉, is a linear transformation. It followsthat 〈0, w〉 = 0, for all w ∈ V .

Proposition 5.2. Let 〈 〉 : V × V → k be an inner product on k. Letv, w, w1, w2 ∈ V and c ∈ k. Then

5. 〈v, w1 + w2〉 = 〈v, w1〉+ 〈v, w2〉.

6. 〈v, cw〉 = c〈v, w〉.

Proof. 〈v, w1 + w2〉 = 〈w1 + w2, v〉 = 〈w1, v〉+ 〈w2, v〉 = 〈w1, v〉 + 〈w2, v〉 =〈v, w1〉+ 〈v, w2〉.〈v, cw〉 = 〈cw, v〉 = c〈w, v〉 = c〈w, v〉 = c〈v, w〉.

If k = R, then 〈w, v〉 = 〈v, w〉 and 〈v, cw〉 = c〈v, w〉. If k = C, thenit is necessary to introduce complex conjugation in Definition 5.1 in orderstatements (1)-(4) to be consistent. For example, suppose 〈v, cw〉 = c〈v, w〉for all c ∈ C. Then −〈v, v〉 = 〈iv, iv〉 > 0 when v 6= 0. But 〈v, v〉 > 0 by (4)when v 6= 0.Example. Let V = k(n), where k is R or C, and let v, w ∈ k(n). Considerv, w to be column vectors in k(n) (or n×1 matrices). Define an inner product〈〉 : V×V → k by (v, w) 7→ vtw. If v is the column vector given by (a1, . . . , an)and w is the column vector given by (b1, . . . , bn), ai, bi ∈ k, then 〈v, w〉 =∑n

i=1 aibi. It is easy to check that the first two statements in Definition 5.1

are satisfied. The third statement follows from∑n

i=1 aibi =∑n

i=1 biai. Thefourth statement follows from

∑ni=1 aiai =

∑ni=1 |ai|2 > 0 as long as some ai

is nonzero. This inner product is called the standard inner product on k(n).

Definition 5.3. Let 〈 〉 be an inner product on V and let v ∈ V . The normof v, written ||v||, is defined to be the nonnegative square root of 〈v, v〉.

The norm on V extends the absolute value function on k in the followingsense. Suppose V = k(1) and let v ∈ V . Then v = (a), a ∈ k. Assume that〈 〉 is the standard inner product on V . Then ||v||2 = 〈v, v〉 = aa = |a|2.Thus ||v|| = |a|.

Proposition 5.4. Let 〈 〉 be an inner product on V and let v, w ∈ V , c ∈ k.Then the following statements hold.

85

Page 87: LinAlg Notes - Leep

1. ||cv|| = |c| ||v||

2. ||v|| > 0, if v 6= 0.

3. |〈v, w〉| ≤ ||v|| ||w|| (Cauchy-Schwarz inequality)

4. ||v + w|| ≤ ||v||+ ||w|| (triangle inequality)

Proof. 1. ||cv||2 = 〈cv, cv〉 = cc〈v, v〉 = |c|2||v||2.2. If v 6= 0, then ||v||2 = 〈v, v〉 > 0.3. Let y = ||w||2v − 〈v, w〉w. Then

0 ≤ 〈y, y〉 = 〈||w||2v − 〈v, w〉w, ||w||2v − 〈v, w〉w〉= ||w||4〈v, v〉 − 〈v, w〉||w||2〈w, v〉 − ||w||2〈v, w〉〈v, w〉+ 〈v, w〉〈v, w〉〈w,w〉= ||w||2(||v||2||w||2 − |〈v, w〉|2 − |〈v, w〉|2 + |〈v, w〉|2)= ||w||2(||v||2||w||2 − |〈v, w〉|2).

If w = 0, then 3. is clear. If w 6= 0, then this calculation shows that0 ≤ ||v||2||w||2 − |〈v, w〉|2 and this gives 3.

4. First we note that 3 implies that

〈v, w〉+ 〈w, v〉 = 〈v, w〉+ 〈v, w〉 = 2Re(〈v, w〉) ≤ 2|〈v, w〉| ≤ 2||v|| ||w||.

Therefore, 4 follows from

||v + w||2 = 〈v + w, v + w〉 = 〈v, v〉+ 〈v, w〉+ 〈w, v〉+ 〈w,w〉≤ ||v||2 + 2||v|| ||w||+ ||w||2 = (||v||+ ||w||)2.

Proposition 5.5. The inner product 〈 〉 is completely determined by || ||2.The following formulas hold.

1. If k = R, then 〈v, w〉 = 14||v + w||2 − 1

4||v − w||2.

2. If k = C, then 〈v, w〉 = 14||v+w||2− 1

4||v−w||2+ i

4||v+iw||2− i

4||v−iw||2.

Proof. See Exercise 2.

86

Page 88: LinAlg Notes - Leep

5.2 Orthogonality

In this section, let 〈 〉 denote a fixed inner product on V .

Definition 5.6. Let 〈 〉 be an inner product on V .

1. Vectors v, w ∈ V are orthogonal if 〈v, w〉 = 0.

2. A subset S of vectors in V is an orthogonal set if each pair of distinctvectors in S is orthogonal.

3. An orthonormal set S in V is an orthogonal set with the additionalproperty that ||v|| = 1 for each v ∈ S.

Note that the relation of orthogonality is symmetric since 〈v, w〉 = 0 ifand only if 〈w, v〉 = 0.

Proposition 5.7. An orthogonal set of nonzero vectors in V is linearly in-dependent.

Proof. Let S be an orthogonal set in V and let v1, . . . , vn be distinct nonzerovectors in S. Suppose c1v1 + · · · + cnvn = 0, ci ∈ k. Then for each j,0 = 〈0, vj〉 = 〈

∑ni=1 civi, vj〉 =

∑ni=1 ci〈vi, vj〉 = cj〈vj, vj〉. Therefore cj = 0

since 〈vj, vj〉 6= 0. Thus v1, . . . , vn is a linearly independent set and so S is alinearly independent set.

Definition 5.8. Let S ⊆ V be a subset in V . The orthogonal complementof S in V , written S⊥, is v ∈ V |〈v, w〉 = 0 for all w ∈ S.

Note that S⊥ depends on the given inner product on V , although thenotation does not indicate that.

Proposition 5.9. 1. 0⊥ = V , V ⊥ = 0.

2. S⊥ is a subspace of V .

3. If S1 ⊆ S2, then S⊥2 ⊆ S⊥1 .

4. Let W = Span(S). Then W⊥ = S⊥.

Proof. If v ∈ V ⊥, then 〈v, v〉 = 0 and so v = 0.2 and 3 are clear.4. Let v ∈ S⊥. If w ∈ W , then w =

∑ni=1 cisi, where ci ∈ k and si ∈ S.

Then 〈v, w〉 = 〈v,∑n

i=1 cisi〉 =∑n

i=1 ci〈v, si〉 = 0, since v ∈ S⊥. Thus,v ∈ W⊥ and so S⊥ ⊆ W⊥. The other inclusion follows easily from 3.

87

Page 89: LinAlg Notes - Leep

Lemma 5.10. Let Y ⊆ V be a subspace.

1. If Y is spanned by v1, . . . , vl, then Y ⊥ =⋂li=1〈vi〉⊥.

2. If dimV = n and v ∈ V , with v 6= 0, then dim(〈v〉⊥) = n− 1.

Proof. Since 〈vi〉 ⊆ Y , we have Y ⊥ ⊆ 〈vi〉⊥ and it follows Y ⊥ ⊆⋂li=1〈vi〉⊥.

Let w ∈⋂li=1〈vi〉⊥ and let y ∈ Y . Then y =

∑li=1 aivi and

〈y, w〉 =l∑

i=1

ai〈vi, w〉 = 0.

Therefore, w ∈ Y ⊥ and we have Y ⊥ =⋂li=1〈vi〉⊥.

Now suppose dimV = n and let v ∈ V , with v 6= 0. Then 〈v〉⊥ is thekernel of the linear transformation f : V → k given by f(w) = 〈w, v〉. Thelinear transformation f is nonzero since f(v) 6= 0. Therefore, f is surjective(as k is a one-dimensional vector space over k) and Proposition 2.5 impliesdim(ker(f)) = dim(V )− dim(im(f)) = n− 1.

Proposition 5.11. Let W ⊆ V be a subspace, dimV = n. Then

1. W ∩W⊥ = (0),

2. dim(W⊥) = dimV − dimW ,

3. V = W⊕

W⊥,

4. W⊥⊥ = W , where W⊥⊥ means (W⊥)⊥.

Proof. Let v ∈ W ∩W⊥. Then 〈v, v〉 = 0 and this implies v = 0. Therefore,W ∩W⊥ = (0). This proves (1).

Let v1, . . . , vl be a basis of W . Lemma 5.10 implies that W⊥ =⋂li=1〈vi〉⊥ and dim(〈vi〉)⊥ = n− 1, 1 ≤ i ≤ l. Then

codim(l⋂

i=1

〈vi〉⊥) ≤l∑

i=1

codim(〈vi〉⊥) = l.

This implies

dim(V )− dim(l⋂

i=1

〈vi〉⊥) ≤ l

88

Page 90: LinAlg Notes - Leep

and so

dim(l⋂

i=1

〈vi〉⊥) ≥ n− l.

Therefore, dim(W⊥) ≥ n− l. Since W ∩W⊥ = (0), we have

n = l + (n− l) ≤ dimW + dimW⊥ = dim(W +W⊥) + dim(W ∩W⊥)

= dim(W +W⊥) ≤ dimV = n.

Thus dim(W⊥) = n− l and dim(W +W⊥) = n. This implies W +W⊥ = V .This proves (2) and (3).

It is easy to check that W ⊆ W⊥⊥. We have equality since

dimW⊥⊥ = dimV − dimW⊥ = dimV − (dimV − dimW ) = dimW.

This proves (4).

The next two results give a computational procedure to find the orthog-onal complement of a subspace.

Proposition 5.12 (Gram-Schmidt Orthogonalization). Let 〈 〉 be an innerproduct on V and let v1, . . . , vn be a linearly independent set in V . Thenthere exists w1, . . . , wn ∈ V such that the following two statements hold.

1. w1, . . . , wn is an orthogonal set of nonzero vectors.

2. Spanv1, . . . , vl = Spanw1, . . . , wl, 1 ≤ l ≤ n.

In particular, if v1, . . . , vn is a basis of V , then w1, . . . , wn is an orthog-onal basis of V . Therefore, every finite dimensional inner product space hasan orthogonal basis.

Proof. The proof is by induction on n. If n = 1, let w1 = v1. Now assume byinduction that the result has been proved for all values m < n. That meanswe can assume there exist w1, . . . , wn−1 ∈ V such that w1, . . . , wn−1 is anorthogonal set of nonzero vectors and Spanv1, . . . , vl = Spanw1, . . . , wl,1 ≤ l ≤ n− 1.

Let

wn = vn −n−1∑j=1

〈vn, wj〉||wj||2

wj.

89

Page 91: LinAlg Notes - Leep

If wn = 0, then vn ∈ Spanw1, . . . , wn−1 = Spanv1, . . . , vn−1, which wouldcontradict the linear independence of v1, . . . , vn. Therefore, wn 6= 0.

If 1 ≤ i ≤ n− 1, then

〈wn, wi〉 = 〈vn, wi〉 −n−1∑j=1

〈vn, wj〉||wj||2

〈wj, wi〉

= 〈vn, wi〉 −〈vn, wi〉||wi||2

〈wi, wi〉 = 0.

Therefore, w1, . . . , wn is an orthogonal set of nonzero vectors and it iseasy to check that Spanv1, . . . , vn = Spanw1, . . . , wn. The remainingstatements now follow easily.

Corollary 5.13. Assume dimV = n.

1. If v1, . . . , vn is a basis of V , then an orthogonal basis w1, . . . , wnof V is given by the following formulas.

w1 = v1

w2 = v2 −〈v2, w1〉||w1||2

w1

w3 = v3 −〈v3, w1〉||w1||2

w1 −〈v3, w2〉||w2||2

w2

...

wi = vi −i−1∑j=1

〈vi, wj〉||wj||2

wj

...

wn = vn −n−1∑j=1

〈vn, wj〉||wj||2

wj

2. Let yi = 1||wi||wi. Then y1, . . . , yn is an orthonormal basis of V .

In particular, every finite dimensional inner product space has an or-thonormal basis.

Proof. 1. The wi’s are well defined since each wi is defined in terms ofvi, w1, . . . , wi−1. The proof of the previous proposition shows w1, . . . , wn isan orthogonal basis of V .

2. Proposition 5.4(1) implies that ||yi|| = 1.

90

Page 92: LinAlg Notes - Leep

5.3 The Matrix of an Inner Product on V

Definition 5.14. Let A ∈Mm×n(k), A = (aij)m×n.

1. The adjoint of A, written A∗, is defined to be At. That is, A∗ ∈Mn×m(k) and if A∗ = (bij)n×m, then bij = aji.

2. If A∗ = A, then A is called a hermitian matrix. Note that if A ishermitian, then m = n.

Observe that At =(A)t

.

Proposition 5.15. Let A,B ∈ Mm×n(k) and let C ∈ Mn×p(k). Let c ∈ k.Then

1. (A+B)∗ = A∗ +B∗ and (cA)∗ = cA∗,

2. (AC)∗ = C∗A∗,

3. A∗∗ = A, where A∗∗ means (A∗)∗.

Proof. (1) and (3) are easy using basic facts about complex conjugation andtransposes of matrices. Here is a proof of (2).

(AC)∗ = (AC)t = CtAt = Ct At = C∗A∗.

Since we sometimes work with more than one inner product at a time,we modify our notation for an inner product function in the following way.We denote an inner product by a function B : V × V → k, where B(x, y) =〈x, y〉B. If there is no danger of confusion, we write instead B(x, y) = 〈x, y〉.

Definition 5.16. Let B : V × V → k be an inner product on V and letβ = v1, . . . , vn be an ordered basis of V . Let [B]β ∈Mn×n(k) be the matrixwhose (i, j)-entry is given by 〈vi, vj〉B. Then [B]β is called the matrix of Bwith respect to the basis β.

Proposition 5.17. Let B : V × V → k be an inner product on V and letβ = v1, . . . , vn be an ordered basis of V . Let G = [B]β, the matrix of Bwith respect to the basis β.

Let x =∑n

i=1 aivi and y =∑n

j=1 bjvj. Then the following statementshold.

91

Page 93: LinAlg Notes - Leep

1. 〈x, y〉 = [x]tβG[y]β.

2. G is a hermitian matrix.

3. G is an invertible matrix.

Proof. 1.

〈x, y〉 = 〈n∑i=1

aivi,

n∑j=1

bjvj〉 =n∑i=1

n∑j=1

〈aivi, bjvj〉

=n∑i=1

n∑j=1

aibj〈vi, vj〉 =n∑i=1

ai(n∑j=1

〈vi, vj〉bj) = [x]tβG[y]β.

2. The (i, j)-entry of G∗ = (j, i)− entry of G = 〈vj, vi〉 = 〈vi, vj〉 =(i, j)-entry of G. Thus, G∗ = G.

3. Suppose G[y]β = 0 for some y ∈ V . Then G[x]β = 0 where [x]β = [y]β.

Now 〈x, x〉 = [x]tβG[x]β = 0. This implies x = 0 and thus y = 0. Therefore,the null space of G equals zero. This implies that ker(LG) = 0 and thus G isinvertible by Theorem 3.24.

Proposition 5.18. Let β = v1, . . . , vn be a basis of V . Let B denotethe set of all inner products B : V × V → k on V . Consider the functiongβ : B →Mn×n(k) given by gβ(B) = [B]β where B ∈ B. Then gβ is injective.The image of gβ is contained in the set of all hermitian matrices inMn×n(k).

Proof. The function gβ is injective since the entries of [B]β, 〈vi, vj〉, uniquelydetermine B. The image of gβ is contained in the set of hermitian matricesin Mn×n(k) by Proposition 5.17.

The precise image of gβ is determined in Corollary 5.21 below.If β and γ are two bases of V , the next result gives the relation between

[B]β and [B]γ.

Proposition 5.19. Let β = v1, . . . , vn and γ = w1, . . . , wn be two basesof V . Let G = [B]β and H = [B]γ. Let P = [1V ]βγ . Then H = P tGP .

Proof. The formula [x]β = [1V ]βγ [x]γ = P [x]γ and Proposition 5.17 imply for

any x, y ∈ V that [x]tγH[y]γ = 〈x, y〉 = [x]tβG[y]β = [x]tγPtGP [y]γ. Since this

holds for all x, y ∈ V , it follows that H = P tGP .

92

Page 94: LinAlg Notes - Leep

Exercise 5 asks for another proof of Proposition 5.19.

Theorem 5.20. Let V be a vector space over k of dimension n and letβ = v1, . . . , vn be a basis of V . Let B : V × V → k be an inner product onV . Then G = [B]β = P tP for some invertible matrix P ∈Mn×n(k).

Conversely, let P ∈ Mn×n(k) be an invertible matrix. Then there existsa unique inner product B : V × V → k such that [B]β = P tP .

Proof. First assume B : V × V → k is an inner product on V . Let γ =w1, . . . , wn be an orthonormal basis of V with respect to the inner productB. Let P = [1V ]γβ. Then [x]γ = [1V ]γβ[x]β = P [x]β. Thus, P is invertible since1V is invertible. We have [B]γ = In because γ is an orthonormal basis of Vwith respect to B. We now have

[x]tβ[B]β[y]β = 〈x, y〉B = [x]tγ[B]γ[y]γ = [x]tγIn[y]γ = [x]tβPtInP [y]β

= [x]tβPtP [y]β.

Since this holds for all x, y ∈ V , it follows that [B]β = P tP .Now let P ∈ Mn×n(k) be an arbitrary invertible matrix. We will show

that the function B : V × V → k defined by B(x, y) = [x]tβPtP [y]β is an

inner product on V . Statements (1),(2) of Definition 5.1 are easily seen tohold. For statement (3) of Definition 5.1, we have

B(y, x) = [y]tβPtP [x]β = [x]β

tPtP [y]β = [x]tβP

tP [y]β = B(x, y).

For statement (4), we have

B(x, x) = [x]tβPtP [x]β = (P [x]β)tP [x]β = [y]tβ[y]β,

for the unique vector y ∈ V satisfying [y]β = P [x]β. Thus, B(x, x) =

[y]tβ[y]β ≥ 0, with equality holding if and only if [y]β = 0. But [y]β = 0if and only if [x]β = 0 since P is invertible. Therefore B(x, x) > 0 if x 6= 0.This shows that statements (1)-(4) of Definition 5.1 hold, and thus B is aninner product.

To see that [B]β = P tP , we observe that the (i, j)-entry of [B]β =B(vi, vj) = etiP

tPej = the (i, j)-entry of P tP . The uniqueness follows fromProposition 5.18.

We have now proved the following Corollary.

93

Page 95: LinAlg Notes - Leep

Corollary 5.21. im(gβ) = P tP |P ∈ Mn×n(k), P invertible. Thus amatrix G has the form G = [B]β if and only if G = P tP for some invertiblematrix P ∈Mn×n(k).

Corollary 5.22. Let V = k(n) and let β = εn, the standard basis of k(n). LetG ∈Mn×n(k). Consider the function B : V × V → k, where (x, y) 7→ xtGy.Then B is an inner product on k(n) if and only if G = P tP for some invertiblematrix P ∈Mn×n(k).

Let us now check how P tP depends on the choice of the orthonormal basisγ of V in the proof of Theorem 5.20. Suppose δ = y1, . . . , yn is anotherorthonormal basis of V . Let Q = [1V ]δγ. Then [x]δ = [1V ]δγ[x]γ = Q[x]γ. Since

δ is an orthonormal basis of V , we have 〈x, y〉B = [x]tδIn[y]δ = [x]tγQtInQ [y]γ.

Since this equation holds for all x, y ∈ V , and we saw earlier that 〈x, y〉B =[x]tγIn[y]γ, it follows that QtQ = In.

Now we repeat the computation of 〈x, y〉B above with δ in place of γ. Wehave [1V ]δβ = [1V ]δγ[1V ]γβ = QP . Therefore, 〈x, y〉B = [x]tβ(QP )tQP [y]β. But

(QP )tQP = P tQtQ P = P tP . This shows that a different choice of γ wouldreplace P by QP where QtQ = In.

5.4 The Adjoint of a Linear Transformation

Lemma 5.23. Let (V, 〈 〉) be given and assume that dimV = n. Let T :V → V be a linear transformation. Let w ∈ V be given. Then

1. The function h : V → k, given by v 7→ 〈Tv, w〉, is a linear transforma-tion.

2. There is a unique w′ ∈ V such that h(v) = 〈v, w′〉 for all v ∈ V . Thatis, h = 〈 , w′〉.

Proof. 1. h(v1 + v2) = 〈T (v1 + v2), w〉 = 〈Tv1 + Tv2, w〉 = 〈Tv1, w〉 +〈Tv2, w〉 = h(v1) + h(v2).

h(cv) = 〈T (cv), w〉 = 〈cTv, w〉 = c〈Tv, w〉 = ch(v).2. Let y1, . . . , yn be an orthonormal basis of V . Let ci = h(yi), 1 ≤ i ≤ n,and let w′ =

∑ni=1 ciyi. Then 〈yj, w′〉 = 〈yj,

∑ni=1 ciyi〉 =

∑ni=1 ci〈yj, yi〉 =

cj = h(yj), 1 ≤ j ≤ n. Therefore h = 〈, w′〉, since both linear transformationsagree on a basis of V .

94

Page 96: LinAlg Notes - Leep

To show that w′ is unique, suppose that 〈v, w′〉 = 〈v, w′′〉 for all v ∈ V .Then 〈v, w′−w′′〉 = 0 for all v ∈ V . Let v = w′−w′′. Then w′−w′′ = 0 andso w′ = w′′.

Proposition 5.24. Let (V, 〈 〉) be given and assume that dimV = n. LetT : V → V be a linear transformation. Then there exists a unique lineartransformation T ∗ : V → V such that 〈Tv, w〉 = 〈v, T ∗w〉 for all v, w ∈ V .

Proof. Define T ∗ : V → V by w 7→ w′ where 〈Tv, w〉 = 〈v, w′〉, for all v ∈ V .Therefore, 〈Tv, w〉 = 〈v, T ∗w〉 for all v, w ∈ V . It remains to show that T ∗

is a linear transformation. The uniqueness of T ∗ is clear from the previousresult.

For all v ∈ V ,

〈v, T ∗(w1 + w2)〉 = 〈Tv, w1 + w2〉 = 〈Tv, w1〉+ 〈Tv, w2〉= 〈v, T ∗w1〉+ 〈v, T ∗w2〉 = 〈v, T ∗w1 + T ∗w2〉.

Therefore, T ∗(w1 + w2) = T ∗w1 + T ∗w2. For all v ∈ V ,

〈v, T ∗(cw)〉 = 〈Tv, cw〉 = c〈Tv, w〉 = c〈v, T ∗w〉 = 〈v, cT ∗w〉.

Therefore, T ∗(cw) = cT ∗w.

Definition 5.25. The linear transformation T ∗ constructed in Proposition5.24 is called the adjoint of T .

Proposition 5.26. Let T, U ∈ L(V, V ) and let c ∈ k. Then

1. (T + U)∗ = T ∗ + U∗.

2. (cT )∗ = cT ∗.

3. (TU)∗ = U∗T ∗.

4. T ∗∗ = T . (T ∗∗means(T ∗)∗ .)

Proof. We have

〈v, (T + U)∗w〉 = 〈(T + U)v, w〉 = 〈Tv + Uv,w〉 = 〈Tv, w〉+ 〈Uv,w〉= 〈v, T ∗w〉+ 〈v, U∗w〉 = 〈v, T ∗w + U∗w〉 = 〈v, (T ∗ + U∗)w〉.

Thus, (T + U)∗w = (T ∗ + U∗)w for all w ∈ V and so (T + U)∗ = T ∗ + U∗.

95

Page 97: LinAlg Notes - Leep

Next, we have

〈v, (cT )∗w〉 = 〈(cT )v, w〉 = 〈c(Tv), w〉 = c〈Tv, w〉 = c〈v, T ∗w〉 = 〈v, cT ∗w〉.

Thus, (cT )∗w = cT ∗w for all w ∈ V and so (cT )∗ = cT ∗.Since 〈v, (TU)∗w〉 = 〈(TU)v, w〉 = 〈Uv, T ∗w〉 = 〈v, U∗T ∗w〉, it follows

that (TU)∗w = U∗T ∗w for all w ∈ V . Thus (TU)∗ = U∗T ∗.Since 〈v, T ∗∗w〉 = 〈T ∗v, w〉 = 〈w, T ∗v〉 = 〈Tw, v〉 = 〈v, Tw〉, it follows

that T ∗∗w = Tw for all w ∈ V . Thus T ∗∗ = T .

Proposition 5.27. Let T ∈ L(V, V ) and assume that T is invertible. ThenT ∗ is invertible and (T ∗)−1 = (T−1)∗.

Proof. Let 1V : V → V be the identity linear transformation. Then (1V )∗ =1V , since 〈v, (1V )∗w〉 = 〈1V (v), w〉 = 〈v, w〉. Thus (1V )∗w = w for all w ∈ Vand so (1V )∗ = 1V .

Then (T−1)∗T ∗ = (TT−1)∗ = (1V )∗ = 1V , and T ∗(T−1)∗ = (T−1T )∗ =(1V )∗ = 1V . Therefore, (T ∗)−1 = (T−1)∗.

Proposition 5.28. Let (V, 〈 〉) be given and assume that dimV = n. Letβ = v1, . . . , vn be an ordered orthonormal basis of V . Let T : V → V be a

linear transformation. Let A = [T ]ββ and let B = [T ∗]ββ. Then B = A∗(= At).

Proof. Let A = (aij)n×n and let B = (bij)n×n Then T (vj) =∑n

i=1 aijvi,1 ≤ j ≤ n, and T ∗(vj) =

∑ni=1 bijvi, 1 ≤ j ≤ n. But

T (vj) =n∑i=1

〈T (vj), vi〉vi

by Exercise 3 at the end of this chapter, since β is an orthonormal basis.Thus, aij = 〈Tvj, vi〉. Similar reasoning applied to T ∗ implies that bij =

〈T ∗(vj), vi〉 = 〈vi, T ∗(vj)〉 = 〈Tvi, vj〉 = aji. Therefore B = A∗.

5.5 Normal Transformations

Definition 5.29. Let (V, 〈 〉) be given and let T ∈ L(V, V ).

1. T is normal if TT ∗ = T ∗T .

2. T is hermitian if T ∗ = T .

96

Page 98: LinAlg Notes - Leep

3. T is skew-hermitian if T ∗ = −T .

4. T is unitary if TT ∗ = T ∗T = 1V .

5. T is positive if T = SS∗ for some S ∈ L(V, V ).

Clearly, hermitian, skew-hermitian, and unitary transformations are spe-cial cases of normal transformations.

Definition 5.30. Let T ∈ L(V, V ).

1. An element α ∈ k is an eigenvalue of T if there is a nonzero vectorv ∈ V such that Tv = αv.

2. An eigenvector of T is a nonzero vector v ∈ V such that Tv = αv forsome α ∈ k.

3. For α ∈ k, let Vα = ker(T − α1V ). Then Vα is called the eigenspace ofV associated to α.

The definition of Vα depends on T although the notation doesn’t indicatethis. If α is not an eigenvalue of T , then Vα = 0.

The three special types of normal transformations above are distinguishedby their eigenvalues.

Proposition 5.31. Let α be an eigenvalue of T .

1. If T is hermitian, then α ∈ R.

2. If T is skew-hermitian, then α is pure imaginary. That is, α = riwhere r ∈ R and i =

√−1.

3. If T is unitary, then |α| = 1.

4. If T is positive, then α ∈ R and α is positive.

Proof. Let Tv = αv, v 6= 0.1. α〈v, v〉 = 〈αv, v〉 = 〈Tv, v〉 = 〈v, T ∗v〉 = 〈v, Tv〉 = 〈v, αv〉 = α〈v, v〉.Since 〈v, v〉 6= 0, it follows that α = α and so α ∈ R.2. Following the argument in (1), we have α〈v, v〉 = 〈v, T ∗v〉 = 〈v,−Tv〉 =〈v,−αv〉 = −α〈v, v〉. Therefore α = −α and so 2Re(α) = α + α = 0. Thisimplies α = ri where r ∈ R and i =

√−1.

97

Page 99: LinAlg Notes - Leep

3. 〈v, v〉 = 〈v, T ∗Tv〉 = 〈Tv, Tv〉 = 〈αv, αv〉 = αα〈v, v〉. Thus, αα = 1 andso |α| = 1.4. As in (1), we have α〈v, v〉 = 〈v, T ∗v〉 = 〈v, SS∗v〉 = 〈S∗v, S∗v〉 > 0. Since〈v, v〉 > 0, it follows that α > 0.

Definition 5.32. Let W ⊆ V be a subspace and let T ∈ L(V, V ). Then Wis a T -invariant subspace of V if T (W ) ⊆ W .

Proposition 5.33. Let T ∈ L(V, V ) and let W be a subspace of V . Assumethat dimV = n.

1. If W is a T -invariant subspace of V , then W⊥ is a T ∗-invariant sub-space of V .

2. If W is a T -invariant subspace of V and T is normal, then W⊥ is aT -invariant subspace of V .

3. Assume that W is a T -invariant subspace of V and T is normal. Using(2), let T |W : W → W and T |W⊥ : W⊥ → W⊥ be the linear transfor-mations induced by T by restriction of domains. Then T |W and T |W⊥are each normal.

Proof. 1. Let x ∈ W and let y ∈ W⊥. Then Tx ∈ W and 〈x, T ∗y〉 =〈Tx, y〉 = 0, since y ∈ W⊥. Since x is an arbitrary vector in W , it followsthat T ∗y ∈ W⊥ and so T ∗(W⊥) ⊆ W⊥.2. Let v1, . . . , vl be an orthonormal basis of W and let vl+1, . . . , vnbe an orthonormal basis of W⊥. Then β = v1, . . . , vl, vl+1, . . . , vn is anorthonormal basis of V . Let A = [T ]ββ. Then A∗ = [T ∗]ββ by Proposition5.28, since β is an orthonormal basis of V . We have AA∗ = A∗A, sinceTT ∗ = T ∗T . Since W is a T -invariant subspace of V , we have

A =

(C D0 E

),

where

C ∈Ml×l(k), D ∈Ml×(n−l)(k), 0 ∈M(n−l)×l(k), andE ∈M(n−l)×(n−l)(k).

Note that the submatrix 0 appears because W is T -invariant. The equationAA∗ = A∗A gives(

C D0 E

)(C∗ 0D∗ E∗

)=

(C∗ 0D∗ E∗

)(C D0 E

)98

Page 100: LinAlg Notes - Leep

(CC∗ +DD∗ DE∗

ED∗ EE∗

)=

(C∗C C∗DD∗C D∗D + E∗E

).

It is necessary to check carefully that the block matrix multiplicationshown here works as indicated. The upper l × l matrix shows that CC∗ +DD∗ = C∗C and so DD∗ = C∗C − CC∗. It follows from this that D = 0.(See Exercise 10 for the details of this conclusion.) Therefore,

A =

(C 00 E

).

Since A = [T ]ββ, it follows that W⊥ is T -invariant.3. Since D = 0, the matrix equations above show that CC∗ = C∗C andEE∗ = E∗E. If we let β

′= v1, . . . , vl and β

′′= vl+1, . . . , vn, then we

have C = [T |W ]β′

β′and E = [T |W⊥ ]β

′′

β′′. It follows that T |W and T |W⊥ are both

normal.

In order to apply Proposition 5.33 we require the following two facts.These two facts will be proved later in these notes.

Facts: Let T : V → V be a linear transformation. (There is no assump-tion here that V has an inner product.)

1. If k = C, then V contains a one-dimensional T -invariant subspace.

2. If k = R, then V contains a T -invariant subspace of dimension ≤ 2.

When k = C, statement 1 implies that V contains an eigenvector. Tosee this, let W be a one-dimensional T -invariant subspace generated by thenonzero vector v. Then Tv = αv for some α ∈ k since Tv ∈ W . Thus v isan eigenvector with eigenvalue α.

Theorem 5.34. Let (V, 〈 〉) be given and assume dimV = n. Let T : V →V be a linear transformation and assume that T is normal.

1. If k = C, then V can be written as an orthogonal direct sum of one-dimensional T -invariant subspaces. Thus, V has an orthonormal basisβ consisting of eigenvectors of T and [T ]ββ is a diagonal matrix.

2. If k = R, then V can be written as an orthogonal direct sum of T -invariant subspaces of dimension ≤ 2. Let

V = W1

⊕· · ·⊕

Wl

⊕Wl+1

⊕· · ·⊕

Wl+s,

99

Page 101: LinAlg Notes - Leep

where

dimWi =

1, if 1 ≤ i ≤ l

2, if l + 1 ≤ i ≤ l + s.

Each T |Wi: Wi → Wi is a normal transformation.

3. If k = R, then V can be written as an orthogonal direct sum of one-dimensional T -invariant subspaces if and only if T is hermitian (T ∗ =T ). That is, V has an orthonormal basis β consisting of eigenvectorsof T if and only if T ∗ = T .

Proof. 1. The proof is by induction on n. If n = 1, the result is clear. Nowassume the result has been proved if dimV < n and assume dimV = n.Using Fact 1, there exists a one-dimensional T -invariant subspace W1. ThenV = W1

⊕W⊥

1 , where W⊥1 is T -invariant and T |W⊥1 is normal. Since

dimW⊥1 = n − 1, the induction hypothesis implies that W⊥

1 can be writ-ten as an orthogonal direct sum of one-dimensional T |W⊥1 -invariant sub-

spaces. Thus, W⊥1 = W2

⊕· · ·⊕

Wn, where dimWi = 1, 2 ≤ i ≤ n, Wi

is T |W⊥1 -invariant, and the Wi’s are pairwise orthogonal. Therefore V =W1

⊕W2

⊕· · ·⊕

Wn where each Wi is a one-dimensional T -invariant sub-space and the Wi’s are pairwise orthogonal. Let Wi = 〈vi〉, where ||vi|| = 1,1 ≤ i ≤ n. Then β = v1, . . . , vn is an orthonormal basis of V and [T ]ββ is adiagonal matrix with (i, i)-entry equal to αi, where Tvi = αivi.

2. The proof of (2) is also by induction and is similar to the proof of(1) except we use Fact 2 in place of Fact 1. Let Wi = 〈vi〉, 1 ≤ i ≤ l,where ||vi|| = 1 and Tvi = αivi. Let yi, zi be an orthonormal basis ofWi, l + 1 ≤ i ≤ l + s. Then β = v1, . . . , vl, yl+1, zl+1, . . . , yl+s, zl+s is anorthonormal basis of V . We know that T |Wi

is normal by Proposition 5.33.It remains to describe normal transformations T : V → V when k = R

and dimV = 2. We will do that now before proving 3.

Proposition 5.35. Let k = R, dimV = 2, T ∈ L(V, V ), and assume thatT is normal. Then either T ∗ = T or T ∗ = λT−1 for some λ ∈ R, λ > 0.

Proof. Let β be an orthonormal basis of V and let A = [T ]ββ. Then A∗ =

[T ∗]ββ, since β is an orthonormal basis of V . Since k = R, we have A∗ = At.Let

A =

(a bc d

).

100

Page 102: LinAlg Notes - Leep

Since T is normal, we have TT ∗ = T ∗T and so AAt = AA∗ = A∗A = AtA.Then AAt = AtA gives(

a bc d

)(a cb d

)=

(a cb d

)(a bc d

)(a2 + b2 ac+ bdac+ bd c2 + d2

)=

(a2 + c2 ab+ cdab+ cd b2 + d2

).

Comparing the (1, 1)-entry gives b2 = c2. If b = c, then A = At = A∗ andso T ∗ = T . Now assume that b = −c 6= 0. Then ac + bd = ab + cd impliesac + (−c)d = a(−c) + cd and so 2c(a − d) = 0. Since 2c 6= 0, we must havea = d. We now have

A =

(a −cc a

).

It is easy to check that AAt = AtA = (a2 + c2)I2. Therefore, A∗ = At =(a2 + c2)A−1 and so T ∗ = λT−1 for some λ ∈ R.

Proposition 5.36. Let k = R, dimV = 2, T ∈ L(V, V ), and assume that Tis normal. Then T ∗ = T if and only if V has a one-dimensional T -invariantsubspace.

Proof. First assume that V has a one-dimensional T -invariant subspace W .Then V = W

⊕W⊥ and W⊥ is also a T -invariant subspace by Proposition

5.33. Let W = 〈v1〉 and let W⊥ = 〈v2〉, where ||vi|| = 1, i = 1, 2. Thenβ = v1, v2 is an orthonormal basis of V and A = [T ]ββ is a diagonal matrix,since v1, v2 are each eigenvectors of T . Since A∗ = At = A and β is anorthonormal basis of V , it follows that T ∗ = T by Proposition 5.28.

Now assume that T ∗ = T . Let β be an orthonormal basis of V and let

A = [T ]ββ =

(a bc d

).

Since β is an orthonormal basis of V , we have A = A∗ = At and so b = c.A straight forward calculation shows that A2 − (a + d)A + (ad − b2)I2 = 0.It follows that T 2 − (a + d)T + (ad − b2)1V = 0. Let f(x) = x2 − (a +d)x + (ad− b2). Then f(x) = 0 has two (not necessarily distinct) real rootssince the discriminant (a + d)2 − 4(ad − b2) = (a − d)2 + 4b2 ≥ 0. Thus,f(x) = (x−α)(x−γ), where α, γ ∈ R. It follows that (T−α1V )(T−γ1V ) = 0.We may assume that ker(T − α1V ) 6= 0. That means there exists a nonzero

101

Page 103: LinAlg Notes - Leep

vector v ∈ V such that Tv = αv. Then W = 〈v〉 is a one-dimensionalT -invariant subspace of V .

Now we can finish the proof of 3 in Theorem 5.34. Let β be the orthonor-mal basis from the proof of 2 and let A = [T ]ββ. Let βi = vi, if 1 ≤ i ≤ l,and let βi = yi, zi, if l + 1 ≤ i ≤ l + s. Then βi is an orthonormal basisof Wi, 1 ≤ i ≤ l + s and β = β1 ∪ · · · ∪ βl+s. Let Ai = [T |Wi

]βiβi . Since eachWi is a T-invariant subspace of V , βi is an orthonormal basis of Wi, β is anorthonormal basis of V , each T |Wi

is normal (by 2), Proposition 5.36, andk = R, we have the following conclusions.

T is hermitian ⇔ T ∗ = T ⇔ A∗ = A ⇔ At = A ⇔ Ati = Ai ⇔A∗i = Ai ⇔ (T |Wi

)∗ = T |Wi⇔ each Wi has a one-dimensional T |Wi

-invariantsubspace ⇔ each Wi can be written as an orthogonal direct sum of one-dimensional T |Wi

-invariant subspaces ⇔ V can be written as an orthogonaldirect sum of one-dimensional T -invariant subspaces.

Some special terminology for the case k = R:

Definition 5.37. Assume k = R and let T : V → V be a linear transforma-tion.

1. If T ∗ = T , then T is symmetric.

2. If T ∗ = −T , then T is skew-symmetric.

3. If TT ∗ = T ∗T = 1V , then T is orthogonal.

Some terminology for matrices over C and R:

Definition 5.38. Let A ∈Mn×n(k). Recall that A∗ = At

= At.

1. A is normal ⇔ AA∗ = A∗A⇔ AAt = AtA.

2. A is hermitian ⇔ A∗ = A ⇔ At = A. If k = R and A is hermitian,then At = A and A is called symmetric.

3. A is skew-hermitian ⇔ A∗ = −A ⇔ At = −A. If k = R and A isskew-hermitian, then At = −A and A is called skew-symmetric.

4. A is unitary ⇔ AA∗ = A∗A = In ⇔ AtA = AAt = In. If k = R andA is unitary, then AAt = AtA = In and A is called orthogonal.

102

Page 104: LinAlg Notes - Leep

If A is a unitary matrix, then the columns of A are orthonormal withrespect to the standard inner product (pairwise orthogonal and have norm1) because A∗A = In implies AtA = In. The rows of A are also orthonormalwith respect to the standard inner product because AA∗ = In. If k = Rand A is orthogonal, note that this requires the rows and columns to beorthonormal, not just orthogonal. Tradition requires us to live with thisconfusing terminology.

Theorem 5.39. Let A ∈Mn×n(k).

1. Assume k = C. Then A is normal if and only if there exists a unitarymatrix P such that P ∗AP is a diagonal matrix.

2. Assume k = R. Then A is a symmetric matrix if and only if thereexists an orthogonal matrix P such that P tAP is a diagonal matrix.

Proof. 1. First assume that P is a unitary matrix such that P ∗AP = D is adiagonal matrix. Then

(P ∗AP )(P ∗AP )∗ = DD∗ = D∗D = (P ∗AP )∗(P ∗AP )

P ∗APP ∗A∗P = P ∗A∗PP ∗AP.

Then AA∗ = A∗A, since PP ∗ = In and P, P ∗ are both invertible.Now assume that A is normal. We may assume that A = [T ]γγ for some

T ∈ L(V, V ) and some orthonormal basis γ of V . Since k = C, V containsan orthonormal basis β consisting of eigenvectors of T , by Theorem 5.34. LetP = [1V ]γβ. Then P is a unitary matrix since β and γ are both orthonormal

bases of V . (See Exercise 8(b).) Then [T ]ββ is a diagonal matrix and [T ]ββ =

[1V ]βγ [T ]γγ[1V ]γβ = P−1AP = P ∗AP .

2. First assume that P is an orthogonal matrix such that P tAP = D is adiagonal matrix. Then (P tAP )t = Dt = D = P tAP. Thus, P tAtP = P tAPand so At = A, since P is invertible. Therefore A is symmetric.

Now assume that A is symmetric. We may assume that A = [T ]γγ forsome T ∈ L(V, V ) and some orthonormal basis γ of V . Since A = At = A∗

and γ is an orthonormal basis of V , it follows that T = T ∗ and T is her-mitian. By Theorem 5.34(3), V contains an orthonormal basis β consist-ing of eigenvectors of T . Then [T ]ββ is a diagonal matrix. As in 1, let

103

Page 105: LinAlg Notes - Leep

P = [1V ]γβ. Then P is an orthogonal matrix since β and γ are both or-

thonormal bases of V . (See Exercise 8(b).) Then [T ]ββ is a diagonal matrix

and [T ]ββ = [1V ]βγ [T ]γγ[1V ]γβ = P−1AP = P tAP .

If k is an arbitrary field, char k different from 2, and A ∈ Mn×n(k) is asymmetric matrix, then it can be shown that there exists an invertible matrixP such that P tAP is a diagonal matrix. The field of real numbers R hasthe very special property that one can find an orthogonal matrix P such thatP tAP is diagonal.

Let A ∈ Mn×n(R), A symmetric. Here is a nice way to remember howto find the orthogonal matrix P in Theorem 5.39(2). Let εn be the standardbasis of R(n). Let β = v1, . . . , vn be an orthonormal basis of R(n) consistingof eigenvectors of A. Let P be the matrix whose jth column is the eigenvectorvj expressed in the standard basis εn. Then P = [1V ]εnβ . Recall that P is anorthogonal matrix since the columns of P are orthonormal. Let Avj = αjvj,1 ≤ j ≤ n. Let D ∈ Mn×n(R) be the diagonal matrix whose (j, j)-entry isαj. Then it is easy to see that AP = PD and so D = P−1AP = P tAP .

Exercises

1. In proposition 5.4(3), equality in the Cauchy-Schwarz inequality holdsif and only if v, w are linearly dependent.

2. Prove Proposition 5.5.

3. Let v1, . . . , vn be an orthogonal set of nonzero vectors. If w =∑ni=1 civi, ci ∈ k, then

cj =〈w, vj〉||vj||2

.

4. Let (V, 〈 〉) be given, with dimV finite. Let S1, S2 be subsets of V andlet W1 = Span(S1), W2 = Span(S2). Recall that S⊥i = W⊥

i , i = 1, 2.Assume that 0 ∈ Si, i = 1, 2. Show the following are true.

(a)(S1 + S2)

⊥ = (W1 +W2)⊥ = W⊥

1 ∩W⊥2 = S⊥1 ∩ S⊥2

(W1 ∩W2)⊥ = W⊥

1 +W⊥2 = S⊥1 + S⊥2 .

104

Page 106: LinAlg Notes - Leep

(b) It is possible that (S1 ∩ S2) ( (W1 ∩ W2) and (W1 ∩ W2)⊥ (

(S1 ∩ S2)⊥.

5. Give another proof of Proposition 5.19 by directly computing the (r, s)-entry of H, namely 〈wr, ws〉, and showing that it is equal to the (r, s)-entry of P tGP .

6. Let G,H ∈ Mn×n(k) and suppose xtGy = xtHy for all x, y ∈ k(n).Then show G = H.

7. Suppose RtR = P tP , where P,R are invertible matrices in Mn×n(k).Let Q = PR−1. Then QtQ = In.

8. (a) Suppose that β, γ are bases of V . Then [B]γ = ([1V ]βγ)t[B]β[1V ]βγ .

(b) Assume that β, γ are orthonormal bases of V . Let R = [1V ]γβ.

Then RtR = In.

9. Let T ∈ L(V, V ) and let W be a subspace of V . Assume that Wis a T -invariant subspace of V and that T is normal. Then W is aT ∗-invariant subspace of V .

10. Let A ∈ Mn×n(k), A = (aij)n×n. Define the trace of A to be trA =a11 + a22 + · · · ann, the sum of the diagonal entries of A.

(a) If A,B ∈Mn×n(k), then tr(A+B) = trA+ trB.

(b) If A,B ∈Mn×n(k), then tr(AB) = tr(BA).

(c) If A ∈Mn×n(k), compute tr(AA∗).

(d) If tr(AA∗) = 0, then A = 0.

(e) If BB∗ = A∗A− AA∗, then B = 0.

11. Let T : V → V be a normal linear transformation. Use the followingparts to show that if Tmv = 0 for some vector v ∈ V and some positiveinteger m, then Tv = 0. (Tm = T · · · T .)

(a) If R : V → V is a linear transformation and R∗Rv = 0, thenRv = 0.

(b) If S : V → V is hermitian and S2v = 0, then Sv = 0.

(c) If S : V → V is hermitian and Smv = 0, then Sv = 0.

105

Page 107: LinAlg Notes - Leep

(d) Let S = T ∗T . Then Sm = (T ∗)mTm.

(e) If Tmv = 0, then Tv = 0.

12. Suppose k = R, dimV = 2, T ∈ L(V, V ), and T is normal. Describeall such T ’s such that T ∗ = T and T ∗ = λT−1. In Proposition 5.35,if b = −c 6= 0, then V does not contain a one-dimensional T -invariantsubspace.

13. Describe all orthogonal matrices A ∈M2×2(R).

14. Let A ∈M2×2(R) be a symmetric matrix. Find an orthogonal matrixP ∈M2×2(R) such that P tAP is diagonal.

106

Page 108: LinAlg Notes - Leep

Chapter 6

Determinants

6.1 The Existence of n-Alternating Forms

In this chapter, k denotes an arbitrary field.

Definition 6.1. Let V1, . . . , Vn,W be vector spaces over k. A function

f : V1 × · · · × Vn → W

is multilinear (or n-multilinear) if f is linear in each variable. That is, ifj is given and vi ∈ Vi, i 6= j, then the function fj : Vj → W , given byfj(x) = f(v1, . . . , vj−1, x, vj+1, . . . , vn) is a linear transformation.

We will usually assume that V = V1 = · · · = Vn and W = k. In this case,a multilinear function (or n-multilinear function) is called a multilinear form(or n-multilinear form) on V .

Definition 6.2. An n-multilinear function f : V × · · · × V → W is calledan alternating function (or n-alternating function) if

1. f is a multilinear function and

2. f(v1, . . . , vn) = 0 whenever two adjacent arguments of f are equal. Thatis, if vi = vi+1, for some i, 1 ≤ i ≤ n− 1, then f(v1, . . . , vn) = 0.

If W = k, we call f an alternating form (or n-alternating form) on V .

If f : V ×· · ·×V → W is defined by f(v1, . . . , vn) = 0 for all v1, . . . , vn ∈V , then f is clearly an n-alternating form on V .

107

Page 109: LinAlg Notes - Leep

Proposition 6.3. Let f : V × · · · × V → W be an n-alternating function.Let v1, . . . , vn ∈ V . Then the following statements hold.

1. f(v1, . . . , vi+1, vi, . . . , vn) = −f(v1, . . . , vi−1, vi, vi+1, vi+2, . . . , vn). (In-terchanging two adjacent arguments of f multiplies f(v1, . . . , vn) by−1.)

2. If vi = vj, i 6= j, then f(v1, . . . , vi, . . . vj, . . . , vn) = 0.

3. If i < j, then

f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn).

(Interchanging any two distinct arguments of f multiplies the value off by −1.)

4. Let a ∈ k and let i 6= j. Then

f(v1, . . . , vn) = f(v1, . . . , vi−1, vi + avj, vi+1, . . . , vn).

Proof. (1) Let g(x, y) = f(v1, . . . , vi−1, x, y, vi+2, . . . , vn), x, y ∈ V . Then g isalso multilinear and alternating, and so we have

0 = g(x+ y, x+ y) = g(x, x) + g(x, y) + g(y, x) + g(y, y) = g(x, y) + g(y, x).

Thus, g(y, x) = −g(x, y), and so g(vi+1, vi) = −g(vi, vi+1). This gives (1).(2) We successively interchange adjacent arguments of f until two adja-

cent arguments are equal. Then (1) implies that this changes the value off(v1, . . . , vi, . . . vj, . . . , vn) by a factor of ±1. Since f is alternating, the newvalue of f is 0. Therefore f(v1, . . . , vi, . . . vj, . . . , vn) = 0.

(3) This follows immediately from (2) using the argument in (1).(4) The multilinearity of f and the result in (2) give

f(v1, . . . , vi−1,vi + avj, vi+1, . . . , vn)

= f(v1, . . . , vi, . . . , vn) + af(v1, . . . , vi−1, vj, vi+1, . . . , vn)

= f(v1, . . . , vi, . . . , vn).

108

Page 110: LinAlg Notes - Leep

If char k 6= 2, then condition (1) of Proposition (6.3) can be taken as thedefinition of an alternating function. To see this, suppose that

f : V × · · · × V → W

is an n-multilinear function and suppose that condition (1) of Proposition(6.3) holds. Assume that vi = vi+1, where 1 ≤ i ≤ n−1, and define g(x, y) asin the proof of condition (1) of Proposition 6.3. Then condition (1) impliesthat g(x, x) = −g(x, x) for all x ∈ V . Then 2g(x, x) = 0. If char k 6= 2,then g(x, x) = 0, and thus the original defining condition of an alternatingfunction holds.

If char k = 2, then there are n-multilinear functions that satisfy condition(1) of Proposition 6.3, but don’t satisfy the original defining condition of analternating function.

We have not yet given an example of a nonzero alternating function asit requires some work to produce such examples. We seemingly digress for amoment to discuss some results about permutations and then return to theproblem of producing nontrivial alternating functions. (Permutations wereintroduced in Chapter 3, Section 5.)

We now recall some concepts from Chapter 3, Section 5. Let Tn denote the set of all functions σ : {1, 2, . . . , n} → {1, 2, . . . , n}. Let Sn be the subset of Tn consisting of all bijective functions

σ : {1, 2, . . . , n} → {1, 2, . . . , n}.

The elements of Sn are called permutations of {1, 2, . . . , n} since each σ ∈ Sn corresponds to a rearrangement or reordering of 1, 2, . . . , n. An easy calculation shows that |Tn| = n^n and |Sn| = n!.

Let σ ∈ Sn. The ordered set σ(1), σ(2), . . . , σ(n) can be rearranged to the ordered set 1, 2, . . . , n by a sequence of steps, called transpositions, that only interchange two elements at a time. Let ε(σ) denote the number of interchanges required to do this. Although ε(σ) is not necessarily well defined, Theorem 6.4 below states that ε(σ) is well defined modulo 2. We'll see below that the result in Theorem 6.4 is essentially equivalent to the existence of a nonzero n-alternating function on V . Namely, in Proposition 6.6, we will show that if char k ≠ 2 and if there exists a nonzero n-alternating function on V , then ε(σ) is well defined modulo 2. Conversely, if ε(σ) is well defined modulo 2, then Exercise 1 shows that there is a nonzero n-alternating form on V . We will prove that ε(σ) is well defined modulo 2 as a consequence of Proposition 6.6 and Theorem 6.10.

Theorem 6.4. Using the notation above, ε(σ) is well defined modulo 2.

We will prove Theorem 6.4 later. Theorem 6.4 allows us to make thefollowing definition.

Definition 6.5. A permutation σ ∈ Sn is called even (or odd), if ε(σ) iseven (or odd).

Proposition 6.6. Let V,W be vector spaces defined over k and assume thatchar k 6= 2. Suppose that there exists a nonzero n-alternating function

f : V × · · · × V → W.

Then ε(σ) is well defined modulo 2 for every σ ∈ Sn.

Proof. Choose v1, . . . , vn ∈ V such that f(v1, . . . , vn) ≠ 0 and let σ ∈ Sn. We apply Proposition 6.3(3) repeatedly to see that f(vσ(1), . . . , vσ(n)) = (−1)^{ε(σ)} f(v1, . . . , vn). Since f(v1, . . . , vn) ≠ −f(v1, . . . , vn), it follows that for any sequence of transpositions bringing (vσ(1), . . . , vσ(n)) back to the original order v1, . . . , vn, the value (−1)^{ε(σ)} is always the same. Therefore, ε(σ) is well defined modulo 2.

The statement that ε(σ) is well defined modulo 2 has nothing to do withwhether a field has characteristic different from 2. Thus, although we willshow in Theorem 6.10 that there exist n-alternating functions over all fields,Proposition 6.6 implies that in order to show that ε(σ) is well defined modulo2, we need only show the existence of an n-alternating function over just onefield k having characteristic different from 2.

The following Proposition contains a calculation that is central to the restof this chapter.

Proposition 6.7. Assume that ε(σ) is well defined modulo 2 and let

f : V × · · · × V → W

be an n-alternating function on V . Let v1, . . . , vn ∈ V be arbitrary vectors and let wj = ∑_{i=1}^{n} aij vi, 1 ≤ j ≤ n, with aij ∈ k. Then

f(w1, . . . , wn) = ∑_{σ∈Sn} (−1)^{ε(σ)} aσ(1)1 · · · aσ(n)n f(v1, . . . , vn).


Proof.

f(w1, . . . , wn) = f(a11v1 + · · · + an1vn, . . . , a1nv1 + · · · + annvn)
  = ∑_{σ∈Tn} aσ(1)1 · · · aσ(n)n f(vσ(1), . . . , vσ(n))
  = ∑_{σ∈Sn} aσ(1)1 · · · aσ(n)n f(vσ(1), . . . , vσ(n))
  = ∑_{σ∈Sn} (−1)^{ε(σ)} aσ(1)1 · · · aσ(n)n f(v1, . . . , vn).

The restriction of the sum from Tn to Sn is justified because if σ is not injective then two arguments of f(vσ(1), . . . , vσ(n)) are equal, so the term vanishes by Proposition 6.3(2); the last equality follows from Proposition 6.3(3).

Notations. Let A ∈Mn×n(k).

1. Let Aj denote the jth column of A. Sometimes we will write A =(A1, . . . , An)

2. Let Aij denote the (n−1)×(n−1) submatrix of A obtained by deletingthe ith row and jth column of A.

Definition 6.8. Let A ∈ Mn×n(k) and consider the columns of A as elements of k(n). An n×n determinant is a function det : Mn×n(k) → k, where A ↦ detA, that satisfies the following two conditions.

1. det : k(n) × · · · × k(n) → k is an n-alternating form on the columns ofA.

2. det In = 1.

If A ∈Mn×n(k), we will write either detA or det(A1, . . . , An), whicheveris more convenient.

Proposition 6.9. Suppose that char k ≠ 2. If a determinant function det : Mn×n(k) → k exists, then it is unique. In particular, ε(σ) is well defined modulo 2, and if A = (aij)n×n, then

detA = ∑_{σ∈Sn} (−1)^{ε(σ)} aσ(1)1 · · · aσ(n)n.


Proof. The existence of a determinant function implies that ε(σ) is well de-fined modulo 2 by Definition 6.8 and Proposition 6.6.

Let e1, . . . , en be the standard basis of k(n) and let A ∈ Mn×n(k). Then Aj = ∑_{i=1}^{n} aij ei. Proposition 6.7 implies that

detA = det(A1, . . . , An) = ∑_{σ∈Sn} (−1)^{ε(σ)} aσ(1)1 · · · aσ(n)n det(e1, . . . , en) = ∑_{σ∈Sn} (−1)^{ε(σ)} aσ(1)1 · · · aσ(n)n,

because det(e1, . . . , en) = det In = 1. This shows that det is uniquely determined by the entries of A.
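The following Python sketch (not part of the original notes; it assumes numpy is available) illustrates the formula in Proposition 6.9 numerically: the determinant is computed as a sum over all permutations, with the parity ε(σ) obtained by counting inversions.

```python
from itertools import permutations
import numpy as np

def parity(sigma):
    """Return (-1)**eps(sigma), where eps(sigma) is counted via inversions."""
    inv = sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
              if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def det_leibniz(A):
    """Sum over sigma in S_n of (-1)^eps(sigma) * a_{sigma(1),1} ... a_{sigma(n),n}."""
    n = A.shape[0]
    return sum(parity(sigma) * np.prod([A[sigma[j], j] for j in range(n)])
               for sigma in permutations(range(n)))

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
print(det_leibniz(A), np.linalg.det(A))   # the two values agree
```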

We now show that a determinant function det : Mn×n(k) → k exists.One possibility would be to show that the formula given in Proposition 6.9satisfies the two conditions in Definition 6.8. (See Exercise 1.) Instead weshall follow a different approach that will yield more results in the long run.

Theorem 6.10. For each n ≥ 1 and every field k, there exists a determinantfunction det :Mn×n(k)→ k.

Proof. The proof is by induction on n. If n = 1, let det : M1×1(k) → k,where det(a)1×1 = a. Then det is linear (1-multilinear) on the column of(a)1×1 and det I1 = 1. The alternating condition is satisfied vacuously.

Now let n ≥ 2 and assume that we have proved the existence of a determinant function det : M(n−1)×(n−1)(k) → k. Let A ∈ Mn×n(k), A = (aij)n×n. Choose an integer i, 1 ≤ i ≤ n, and define

detA = ∑_{j=1}^{n} (−1)^{i+j} aij detAij.

This expression makes sense because detAij is defined by our induction hy-pothesis. Showing that detA is a multilinear form on the n columns of Ais the same as showing that detA is a linear function (or transformation)on the lth column of A, 1 ≤ l ≤ n. Since a sum of linear transformationsis another linear transformation, it is sufficient to consider separately eachterm (−1)i+jaij detAij.

If j 6= l, then aij doesn’t depend on the lth column of A and detAij is alinear function on the lth column of A, by the induction hypothesis, becauseAij is an (n − 1) × (n − 1) matrix. Therefore, (−1)i+jaij detAij is a linearfunction on the lth column of A.

If j = l, then aij = ail is a linear function on the lth column of A anddetAij = detAil doesn’t depend on the lth column of A because Ail is ob-tained from A by deleting the lth column of A. Therefore, (−1)i+jaij detAijis a linear function on the lth column of A in this case also.


We conclude that detA is a multilinear form on the n columns of A. Next we show that detA is an alternating form on the n columns of A. Suppose Al = Al+1. If j ≠ l, l + 1, then Aij has two adjacent columns that are equal. Therefore detAij = 0 by the induction hypothesis. The remaining two terms in the expression for detA are

(−1)^{i+l} ail detAil + (−1)^{i+(l+1)} ai(l+1) detAi(l+1).

We have Ail = Ai(l+1) and ail = ai(l+1) because the lth and (l + 1)th columns of A are equal. Therefore,

(−1)^{i+l} ail detAil + (−1)^{i+(l+1)} ai(l+1) detAi(l+1) = ail detAil((−1)^{i+l} + (−1)^{i+(l+1)}) = 0.

Thus detA = 0 and so detA is an alternating form on the n columns of A.

Now we show that det In = 1. Letting A = In in the formula for detA and noting that aij = 0 when i ≠ j and aii = 1, we have

det In = ∑_{j=1}^{n} (−1)^{i+j} aij detAij = (−1)^{i+i} aii detAii = detAii = 1,

by the induction hypothesis, because Aii = In−1. Therefore det is a determi-nant function.

The formula given for detA in Theorem 6.10 is called the expansion ofdetA along the ith row of A.

We have now established the existence of a nonzero n-alternating form on k(n) for an arbitrary field k. Since any vector space V of dimension n over k is isomorphic to k(n), it follows that we have established the existence of nonzero n-alternating forms on V for any n-dimensional vector space V over k. By Proposition 6.6, we have now proved that ε(σ) is well defined modulo 2 for any σ ∈ Sn.

The uniqueness part of Proposition 6.9 shows that our definition of detA does not depend on the choice of i given in the proof of Theorem 6.10. Therefore, we can compute detA by expanding along any row of A.
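As a small computational illustration (not from the notes; numpy is assumed), the inductive construction of Theorem 6.10 can be coded directly: expand along a chosen row using minors, and the value is independent of the chosen row, as remarked above.

```python
import numpy as np

def minor(A, i, j):
    """The (n-1)x(n-1) submatrix A_{ij}: delete row i and column j."""
    return np.delete(np.delete(A, i, axis=0), j, axis=1)

def det_expand(A, i=0):
    """detA = sum_j (-1)^{i+j} a_{ij} det A_{ij} (indices 0-based here)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    return sum((-1) ** (i + j) * A[i, j] * det_expand(minor(A, i, j))
               for j in range(n))

A = np.array([[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [1.0, 0.0, 6.0]])
# Expanding along different rows gives the same value.
print(det_expand(A, 0), det_expand(A, 2), np.linalg.det(A))
```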

We now restate Proposition 6.7 in the following Corollary using the resultfrom Proposition 6.9.


Corollary 6.11. Let f : V × · · · × V → W be an n-alternating function on V . Let v1, . . . , vn ∈ V be arbitrary vectors. Let wj = ∑_{i=1}^{n} aij vi, 1 ≤ j ≤ n, aij ∈ k, and let A = (aij)n×n ∈ Mn×n(k). Then

f(w1, . . . , wn) = (detA)f(v1, . . . , vn).

The next result summarizes the information we need for the function ε(σ),where σ ∈ Sn.

Proposition 6.12. 1. ε(σ) is well defined modulo 2 for all σ ∈ Sn.

2. ε(στ) ≡ ε(σ) + ε(τ) (mod 2), for all σ, τ ∈ Sn.

3. ε(σ−1) ≡ ε(σ) (mod 2), for all σ ∈ Sn.

Proof. (1) This follows from Proposition 6.6 and Theorem 6.10.(2) This follows from counting the number of interchanges needed to pass

through the sequences στ(1), . . . , στ(n), τ(1), . . . , τ(n), 1, 2, . . . , n.(3) ε(σ) + ε(σ−1) ≡ ε(σσ−1) ≡ ε(identity) ≡ 0 (mod 2). Therefore,

ε(σ−1) ≡ ε(σ) (mod 2), for all σ ∈ Sn.

Proposition 6.13. Let A ∈Mn×n(k). Then det(At) = detA.

Proof. Let A = (aij)n×n and let At = (bij)n×n. Then bij = aji and

det(At) = ∑_{σ∈Sn} (−1)^{ε(σ)} bσ(1)1 · · · bσ(n)n
  = ∑_{σ∈Sn} (−1)^{ε(σ)} a1σ(1) · · · anσ(n)
  = ∑_{σ∈Sn} (−1)^{ε(σ⁻¹)} aσ⁻¹(σ(1))σ(1) · · · aσ⁻¹(σ(n))σ(n)
  = ∑_{σ∈Sn} (−1)^{ε(σ⁻¹)} aσ⁻¹(1)1 · · · aσ⁻¹(n)n
  = ∑_{σ∈Sn} (−1)^{ε(σ)} aσ(1)1 · · · aσ(n)n
  = detA.

Proposition 6.14.


1. The function det : Mn×n(k) → k is an n-alternating form on the rows of n × n matrices.

2. The determinant of a matrix A ∈ Mn×n(k) can be computed by expanding along any row or any column of A.

Proof. (1) Since det(At) = detA and the rows of A are the columns of At,the result follows from the properties of det applied to the columns of At.

(2) We have already seen that detA can be computed by expanding along any row of A. Now let B = At, B = (bij)n×n. Then bij = aji and Bij = (Aji)^t. If we expand along the jth column of A, we have

∑_{i=1}^{n} (−1)^{i+j} aij detAij = ∑_{i=1}^{n} (−1)^{i+j} bji det((Bji)^t) = ∑_{i=1}^{n} (−1)^{i+j} bji detBji = detB = det(At) = detA,

where we have expanded along the jth row of B.

6.2 Further Results and Applications

Theorem 6.15. Let A,B ∈Mn×n(k). Then det(AB) = (detA)(detB).

Proof. Let A = (aij)n×n, let B = (bij)n×n and let C = AB. Denote the columns of A, B, C by Aj, Bj, Cj, respectively, 1 ≤ j ≤ n. Let e1, . . . , en denote the standard basis of k(n). Our results on matrix multiplication imply that Cj = ABj = ∑_{i=1}^{n} bij Ai, 1 ≤ j ≤ n. Since Aj = ∑_{i=1}^{n} aij ei, Corollary 6.11 implies that

det(AB) = detC = det(C1, . . . , Cn) = (detB) det(A1, . . . , An) = (detB)(detA) det(e1, . . . , en) = (detA)(detB) det In = (detA)(detB).


Here is a second proof of Theorem 6.15. Define f : Mn×n(k) → k by f(B) = det(AB). Then f(B1, . . . , Bn) = det(AB1, . . . , ABn), because the columns of AB are AB1, . . . , ABn. It is straightforward to check that f is an n-alternating form on k(n). Let B = (bij)n×n. Then Bj = ∑_{i=1}^{n} bij ei, 1 ≤ j ≤ n, where e1, . . . , en denotes the standard basis of k(n). Corollary 6.11 implies that

det(AB) = f(B) = f(B1, . . . , Bn) = (detB)f(e1, . . . , en) = (detB)f(In) = (detB) det(AIn) = (detA)(detB).

Proposition 6.16. Let A ∈ Mn×n(k). Then A is invertible if and only if detA ≠ 0.

Proof. If A is invertible, then there exists a matrix B ∈ Mn×n(k) suchthat AB = In. Then (detA)(detB) = det(AB) = det In = 1. Therefore,detA 6= 0.

Now suppose that A is not invertible. Then Theorem 3.37 implies thatthere is a linear dependence relation b1A1+· · ·+bnAn = 0 among the columnsof A with some bi 6= 0. We now use Proposition 6.3(4) to see that

bi detA = bi det(A1, . . . , Ai−1, Ai, Ai+1, . . . An)

= det(A1, . . . , Ai−1, biAi, Ai+1, . . . An)

= det(A1, . . . , Ai−1, b1A1 + · · ·+ bnAn, Ai+1, . . . An)

= det(A1, . . . , Ai−1, 0, Ai+1, . . . An)

= 0.

Since bi 6= 0, it follows that detA = 0.

Proposition 6.17 (Cramer's Rule). Let A ∈ Mn×n(k). Consider the system of equations represented by Ax = b, where x = (x1, . . . , xn)^t and b = (b1, . . . , bn)^t ∈ k(n).

1. If the system of equations has a solution, then

xi detA = det(A1, . . . , Ai−1, b, Ai+1, . . . , An).


2. If detA ≠ 0, then the system of equations has a unique solution given by

xi = det(A1, . . . , Ai−1, b, Ai+1, . . . , An) / detA,   1 ≤ i ≤ n.

Proof. (1) The system of linear equations represented by Ax = b is equivalent to the matrix equation x1A1 + · · · + xnAn = b. This gives

det(A1, . . . , Ai−1, b, Ai+1, . . . , An) = det(A1, . . . , Ai−1, ∑_{j=1}^{n} xjAj, Ai+1, . . . , An) = det(A1, . . . , Ai−1, xiAi, Ai+1, . . . , An) = xi det(A1, . . . , An) = xi detA.

(2) If detA ≠ 0, then A is invertible by Proposition 6.16. Then there is a unique solution given by x = A⁻¹b. Since detA ≠ 0, the result in the first part gives the desired formula for xi.
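A hedged numeric illustration of Cramer's Rule (not part of the notes; numpy is assumed): each xi is the determinant of A with its ith column replaced by b, divided by detA.

```python
import numpy as np

def cramer_solve(A, b):
    d = np.linalg.det(A)
    x = np.empty(A.shape[1])
    for i in range(A.shape[1]):
        Ai = A.copy()
        Ai[:, i] = b          # replace the i-th column of A by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[2.0, 1.0], [5.0, 3.0]])
b = np.array([4.0, 11.0])
print(cramer_solve(A, b), np.linalg.solve(A, b))   # both give [1., 2.]
```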

Definition 6.18. Let T ∈ L(V, V ), dimV = n. Let β = v1, . . . , vn be anordered basis of V . The determinant of T , detT , is defined to be det[T ]ββ.

Proposition 6.19. The determinant of a linear transformation is well de-fined. That is, detT does not depend on the choice of the ordered basis β.

Proof. Let β, γ be two ordered bases of V . Then [T ]γγ = [1V ]γβ [T ]ββ [1V ]βγ. Let P = [1V ]βγ, A = [T ]ββ, and B = [T ]γγ. Then [1V ]γβ = P⁻¹ since [1V ]βγ [1V ]γβ = [1V ]ββ = In. Thus B = P⁻¹AP and so

detB = det(P⁻¹)(detA)(detP ) = det(P⁻¹)(detP )(detA) = detA.

Therefore, detT is well defined.

6.3 The Adjoint of a matrix

Definition 6.20. Let A ∈ Mn×n(k). If n ≥ 2, the adjoint matrix of A, written Adj(A), is defined as the matrix (bij) ∈ Mn×n(k) where bij = (−1)^{i+j} det(Aji). If n = 1, we define Adj(A) = (1) (even if A = 0).


Note that the entries of Adj(A) are polynomial expressions of the entriesof A.

The adjoint matrix of A is sometimes called the classical adjoint matrixof A. It is natural to wonder if there is a connection between the adjoint of Ajust defined and the adjoint of a linear transformation T defined in Chapter5 in relation to a given inner product. There is a connection, but it requiresmore background than has been developed so far.

Lemma 6.21. Let A ∈Mn×n(k). Then (Adj(A))t = Adj(At).

Proof. The (i, j)-entry of (Adj(A))t is the (j, i)-entry of Adj(A), which is(−1)i+j detAij. The (i, j)-entry of Adj(At) equals

(−1)i+j det((At)ji) = (−1)i+j det((Aij)t) = (−1)i+j detAij.

Therefore, (Adj(A))t = Adj(At).

Proposition 6.22. Let A ∈Mn×n(k).

1. Adj(A)A = AAdj(A) = (detA)In.

2. If det(A) 6= 0, then A is invertible and

Adj(A) = (detA)A−1.

Proof. (1) The (i, l)-entry of A Adj(A) equals

∑_{j=1}^{n} aij bjl = ∑_{j=1}^{n} aij (−1)^{j+l} detAlj.

If i = l, this expression equals detA, because this is the formula for detA given in the proof of Theorem 6.10 if we expand along the ith row. If i ≠ l, then let A′ be the matrix obtained from A by replacing the lth row of A with the ith row of A. Then detA′ = 0 by Proposition 6.14(1) because A′ has two rows that are equal. Thus, if we compute detA′ by expanding along the lth row, we get 0 = ∑_{j=1}^{n} (−1)^{l+j} aij detAlj, because the (l, j)-entry of A′ is aij and (A′)lj = Alj. Therefore, the (i, l)-entry of A Adj(A) equals detA if i = l and equals 0 if i ≠ l. This implies that A Adj(A) = (detA)In.

Now we prove that Adj(A)A = (detA)In. We have

(Adj(A)A)^t = At(Adj(A))^t = At Adj(At) = det(At)In = (detA)In.

Thus, Adj(A)A = ((detA)In)^t = (detA)In.

(2) This is clear from (1) and Proposition 6.16.


The equality Adj(A)A = (detA)In in Proposition 6.22 (1) could alsohave been proved in a way similar to the proof of the equality AAdj(A) =(detA)In. In this case, we could have computed detA by expanding alongthe lth column of A and at the appropriate time replace the lth column of Awith the jth column of A. See Exercise 8.

Note that if detA ≠ 0, then the formula in Proposition 6.22 (1) could be used to give a different proof of the implication in Proposition 6.16 stating that A is invertible when detA ≠ 0.
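A minimal sketch (not from the notes; numpy is assumed) of Definition 6.20 and Proposition 6.22: the adjoint matrix built from cofactors satisfies Adj(A)A = A Adj(A) = (detA)In.

```python
import numpy as np

def adjugate(A):
    n = A.shape[0]
    adj = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            # b_{ij} = (-1)^{i+j} det(A_{ji}): delete row j and column i of A
            M = np.delete(np.delete(A, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(M)
    return adj

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(adjugate(A) @ A)                  # equals (detA) * I_2, here -2 * I_2
print(np.allclose(A @ adjugate(A), np.linalg.det(A) * np.eye(2)))
```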

Lemma 6.23. Let A,B ∈Mn×n(k). Then Adj(AB) = Adj(B) Adj(A).

Proof. First assume that A,B are both invertible. Then AB is invertible, soProposition 6.22 implies that

Adj(AB) = det(AB)(AB)−1 = det(A) det(B)B−1A−1

= (det(B)B−1)(det(A)A−1) = Adj(B) Adj(A).

Now assume that A,B ∈ Mn×n(k) are arbitrary. Let k(x) denote therational function field over k. Then A+xIn, B+xIn ∈Mn×n(k(x)) are bothinvertible (because det(A+ xIn) = xn + · · · 6= 0). Thus

Adj((A+ xIn)(B + xIn)) = Adj(B + xIn) Adj(A+ xIn).

Since each entry in these matrices lies in k[x], it follows that we may setx = 0 to obtain the result.

If A ∈Mn×n(k), let rk(A) denote the rank of A.

Proposition 6.24. Let A ∈Mn×n(k).

1. If rk(A) = n, then rk(Adj(A)) = n.

2. If rk(A) = n− 1, then rk(Adj(A)) = 1.

3. If rk(A) ≤ n− 2, then Adj(A) = 0, so rk(Adj(A)) = 0.

Proof. If n = 1, then (1) and (2) hold, while (3) is vacuous. Now assumethat n ≥ 2. If rk(A) = n, then A is invertible. Thus Adj(A) = det(A)A−1

has rank n. This proves (1). If rk(A) ≤ n − 2, then det(Aji) = 0 foreach (n− 1)× (n− 1) submatrix Aji. The definition of Adj(A) implies thatAdj(A) = 0, and so rk(Adj(A)) = 0. This proves (3).


Now assume that rk(A) = n−1. For any matrix A ∈Mn×n(k), there existinvertible matrices B,C ∈ Mn×n(k) such that BAC is a diagonal matrix.Let D = BAC. Then rk(D) = rk(A). Lemma 6.23 implies that

Adj(D) = Adj(BAC) = Adj(C) Adj(A) Adj(B).

Since Adj(B) and Adj(C) are invertible by (1), it follows that

rk(Adj(D)) = rk(Adj(A)).

For a diagonal matrix D, it is easy to check that if rk(D) = n − 1, then rk(Adj(D)) = 1. Thus the same holds for A and this proves (2).

Proposition 6.25. Let A ∈Mn×n(k).

1. Adj(cA) = cn−1 Adj(A) for all c ∈ k.

2. If A is invertible, then Adj(A−1) = (Adj(A))−1.

3. Adj(At) = (Adj(A))t.

4. Adj(Adj(A)) = (det(A))n−2A for n ≥ 3. If n = 1, this result holdswhen A 6= 0. If n = 2, then the result holds if we set (det(A))n−2 = 1,including when det(A) = 0.

Proof. (1) follows from the definition of the adjoint matrix. For (2), we have

In = Adj(In) = Adj(AA−1) = Adj(A−1) Adj(A).

Thus, Adj(A−1) = (Adj(A))−1.Although we already proved (3) in Lemma 6.21, here is a different proof.

First assume that A is invertible. Then

At Adj(At) = det(At)In = det(A)In

= Adj(A)A = (Adj(A)A)t = At(Adj(A))t.

Since At is also invertible, we have Adj(At) = (Adj(A))t.Now assume that A is arbitrary. Then consider A + xIn over the field

k(x). Since A+ xIn is invertible over k(x), we have

Adj(At + xIn) = Adj((A+ xIn)t) = (Adj(A+ xIn))t .


Since all entries lie in k[x], we may set x = 0 to obtain the result.(4) If n = 1 and A 6= 0, then

Adj(Adj(A)) = (1) and (det(A))n−2A = A/ det(A) = (1).

Now assume that n ≥ 2. First assume that A is invertible. Then Adj(A) isinvertible and we have

Adj(Adj(A)) = det(Adj(A))(Adj(A))⁻¹ = det(det(A)A⁻¹)(det(A)A⁻¹)⁻¹ = (det(A))^n (1/ det(A))(1/ det(A))A = (det(A))^{n−2}A.

Now assume that A is not invertible and n ≥ 3. Then det(A) = 0 and so(det(A))n−2A = 0. Since rk(A) ≤ n − 1, we have rk(Adj(A)) ≤ 1 ≤ n − 2,by Proposition 6.24. Thus Adj(Adj(A)) = 0 by Proposition 6.24 again.

Now assume that n = 2 and A is not invertible. One checks directly thatAdj(Adj(A)) = A when n = 2. We have (det(A))n−2A = A (because we set(det(A))n−2 = 1, including when det(A) = 0).

6.4 The vector space of alternating forms

Let V be a vector space over a field k and assume that dimV = n. Foran integer m ≥ 1, let Mulm(V ) denote the set of m-multilinear forms f :V × · · · × V → k. Then Mulm(V ) is a vector space over k and

dim(Mulm(V )) = (dimV )^m = n^m.

To see this, consider a basis β = v1, . . . , vn of V and let f : V ×· · ·×V → kbe an m-multilinear map. Then f is uniquely determined by

f(vi1 , . . . , vim) | i1, . . . , im ∈ 1, 2, . . . , n.

Thus a basis of Mulm(V ) consists of n^m elements.

Let Altm(V ) denote the subset of Mulm(V ) consisting of m-alternating forms f : V × · · · × V → k. Then Altm(V ) is a subspace of Mulm(V ) over k. We will compute the dimension of Altm(V ) below.

Proposition 6.26. If m ≥ 1, then dim(Altm(V )) = \binom{n}{m}. In particular, dim(Altm(V )) = 0 if m > n, and dim(Altn(V )) = 1.


Proof. Let v1, . . . , vn be a basis of V and let f ∈ Altm(V ). Since f ∈ Mulm(V ), we know that f is uniquely determined by

{f(vi1 , . . . , vim) | i1, . . . , im ∈ {1, 2, . . . , n}}.

Since f ∈ Altm(V ), Proposition 6.3 (2) and (3) implies that f is uniquely determined by

{f(vi1 , . . . , vim) | 1 ≤ i1 < i2 < · · · < im ≤ n}.

Thus Altm(V ) = (0) if m > n. If 1 ≤ m ≤ n, it follows that Altm(V ) is spanned by \binom{n}{m} elements. To show that these elements are linearly independent over k, it is sufficient to construct for each 1 ≤ i1 < i2 < · · · < im an element g ∈ Altm(V ) such that g(vi1 , . . . , vim) = 1 and g(vj1 , . . . , vjm) = 0 for all j1, . . . , jm satisfying 1 ≤ j1 < j2 < · · · < jm and (j1, . . . , jm) ≠ (i1, . . . , im).

We define g by

g(vj1 , . . . , vjm) = (−1)^{ε(σ)} if (j1, . . . , jm) = (iσ(1), . . . , iσ(m)) for some σ ∈ Sm, and g(vj1 , . . . , vjm) = 0 otherwise.

6.5 The adjoint as a matrix of a linear transformation

In this section, we identify the adjoint of a matrix as the matrix of a particularlinear transformation.

Let V, W be vector spaces over k and let T : V → W be a linear transformation. Then for each m ≥ 1, there exists a linear transformation T′m : Altm(W ) → Altm(V ) defined by T′m(g) = g ◦ T where g ∈ Altm(W ) and (g ◦ T )(u1, . . . , um) = g(T (u1), . . . , T (um)) for u1, . . . , um ∈ V . It is straightforward to check that g ◦ T ∈ Altm(V ).

Now let S : U → V and T : V → W be two linear transformations. Then for each m ≥ 1, we obtain linear transformations

T′m : Altm(W ) → Altm(V ) and S′m : Altm(V ) → Altm(U).


Lemma 6.27. With the notation above, we have

(T ◦ S)′m = S′m ◦ T′m.

Proof. Let g ∈ Altm(W ). Then

(S′m ◦ T′m)(g) = S′m(T′m(g)) = S′m(g ◦ T ) = (g ◦ T ) ◦ S = g ◦ (T ◦ S) = (T ◦ S)′m(g).

Assume now that dim(V ) = dim(W ) = n and m = n− 1. Let T ′ = T ′n−1where T ′ : Altn−1(W )→ Altn−1(V ) is the linear transformation from above.

Let β = {v1, . . . , vn} be a basis of V and let γ = {w1, . . . , wn} be a basis of W . Let T (vj) = ∑_{i=1}^{n} aij wi, 1 ≤ j ≤ n. Then [T ]γβ = (aij)n×n.

Let β′ = {f1, . . . , fn} be the basis of Altn−1(V ) where

fj(v1, . . . , vj−1, vj+1, . . . , vn) = (−1)^{j−1}

and fj(vi1 , . . . , vin−1) = 0 when 1 ≤ i1 < · · · < in−1 ≤ n and (i1, . . . , in−1) ≠ (1, 2, . . . , j − 1, j + 1, . . . , n).

Define γ′ = g1, . . . , gn to be a basis of Altn−1(W ) with similar propertieswith respect to the basis γ.

Let T′(gj) = ∑_{i=1}^{n} cij fi, 1 ≤ j ≤ n. Then [T′]β′γ′ = (cij)n×n.

Proposition 6.28. Using the notation above, we have [T′]β′γ′ = Adj([T ]γβ).

Proof. Let A = [T ]γβ and let C = [T′]β′γ′. We now make two computations. First,

(∑_{i=1}^{n} cij fi)(v1, . . . , vl−1, vl+1, . . . , vn) = clj fl(v1, . . . , vl−1, vl+1, . . . , vn) = (−1)^{l−1} clj.


Second,

T′(gj)(v1, . . . , vl−1, vl+1, . . . , vn) = gj(T (v1), . . . , T (vl−1), T (vl+1), . . . , T (vn))
  = gj(∑_{i=1}^{n} ai1 wi, . . . , ∑_{i=1}^{n} ai,l−1 wi, ∑_{i=1}^{n} ai,l+1 wi, . . . , ∑_{i=1}^{n} ain wi)
  = gj(∑_{i≠j} ai1 wi, . . . , ∑_{i≠j} ai,l−1 wi, ∑_{i≠j} ai,l+1 wi, . . . , ∑_{i≠j} ain wi)
  = det(Ajl) gj(w1, . . . , wj−1, wj+1, . . . , wn)
  = det(Ajl)(−1)^{j−1}.

Since T′(gj) = ∑_{i=1}^{n} cij fi, it follows that

(−1)^{l−1} clj = (−1)^{j−1} det(Ajl).

Thus clj = (−1)^{l+j} det(Ajl). Therefore, C = Adj(A) and so [T′]β′γ′ = Adj([T ]γβ).

Let S : U → V and T : V → W be two linear transformations and assume that dim(U) = dim(V ) = dim(W ) = n. Let β, β′, γ, γ′ be as before. Let α be a basis of U and let α′ be the corresponding basis of Altn−1(U).

Let A = [T ]γβ and B = [S]βα.

Proposition 6.29. Using the notations above, we have

Adj(AB) = Adj(B) Adj(A).

Proof.

Adj(AB) = Adj([T ]γβ[S]βα) = Adj([T ◦ S]γα) = [(T ◦ S)′]α′γ′ = [S′ ◦ T′]α′γ′ = [S′]α′β′ [T′]β′γ′ = Adj([S]βα) Adj([T ]γβ) = Adj(B) Adj(A).


6.6 The Vandermonde Determinant and Applications

Definition 6.30. Let a1, . . . , an ∈ k. The Vandermonde determinant ofa1, . . . , an, written V (a1, . . . , an), is defined to be detA where

A =
[ 1          1          · · ·   1          ]
[ a1         a2         · · ·   an         ]
[ a1^2       a2^2       · · ·   an^2       ]
[ ...                                       ]
[ a1^{n−1}   a2^{n−1}   · · ·   an^{n−1}   ]   (n × n).

The matrix A is called the Vandermonde matrix of a1, . . . , an.

Proposition 6.31. V (a1, . . . , an) = ∏_{1≤i<j≤n} (aj − ai).

Proof. We prove the result by induction on n. For n = 1 the result holds since an empty product is defined to be 1 and det(1) = 1. For n = 2, the result holds since

det
[ 1    1  ]
[ a1   a2 ] = a2 − a1.

Now assume that n ≥ 3. Perform the following row operations:

replace row n by (row n − a1 · row (n − 1)),
replace row (n − 1) by (row (n − 1) − a1 · row (n − 2)),
...
replace row 2 by (row 2 − a1 · row 1).

These row operations do not change V (a1, . . . , an) by Proposition 6.3(4)since det is an alternating function on the rows of A. Therefore,

V (a1, . . . , an) = det
[ 1    1                  · · ·   1                 ]
[ 0    a2 − a1            · · ·   an − a1           ]
[ 0    a2(a2 − a1)        · · ·   an(an − a1)       ]
[ ...                                                ]
[ 0    a2^{n−2}(a2 − a1)  · · ·   an^{n−2}(an − a1) ]   (n × n).


Now we expand the determinant along the first column to obtain

V (a1, . . . , an) = det
[ a2 − a1            · · ·   an − a1           ]
[ a2(a2 − a1)        · · ·   an(an − a1)       ]
[ ...                                           ]
[ a2^{n−2}(a2 − a1)  · · ·   an^{n−2}(an − a1) ]   ((n − 1) × (n − 1)).

Since det is multilinear on the columns, we can factor out common factorsfrom entries in each column to obtain

V (a1, . . . , an) = (a2 − a1) · · · (an − a1) det
[ 1          1          · · ·   1          ]
[ a2         a3         · · ·   an         ]
[ a2^2       a3^2       · · ·   an^2       ]
[ ...                                       ]
[ a2^{n−2}   a3^{n−2}   · · ·   an^{n−2}   ].

By induction on n, we conclude that

V (a1, . . . , an) = ∏_{1≤i<j≤n} (aj − ai).
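A quick numeric check of Proposition 6.31 (not part of the notes; numpy is assumed): compare the determinant of the Vandermonde matrix with the product of the differences aj − ai over i < j.

```python
import numpy as np
from itertools import combinations

a = np.array([1.0, 2.0, 4.0, 7.0])
n = len(a)
# Vandermonde matrix with rows 1, a, a^2, ..., a^{n-1}, as in Definition 6.30
V = np.vander(a, increasing=True).T
prod = np.prod([a[j] - a[i] for i, j in combinations(range(n), 2)])
print(np.linalg.det(V), prod)    # the two values agree
```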

We will now apply the Vandermonde determinant and related ideas tothe problem of finding polynomials in one variable which pass through givenpoints in the affine plane.

For each integer n ≥ 1, let Vn denote the vector space of polynomials ink[x] of degree ≤ n, including the zero polynomial. Thus

Vn = {g ∈ k[x] | g = c0 + c1x + · · · + cnx^n},

where ci ∈ k. Then β = {1, x, x^2, . . . , x^n} is a basis of Vn. It is easy to check that Vn ≅ k(n+1) by the isomorphism φ : Vn → k(n+1), where ∑_{j=0}^{n} cj x^j ↦ (c0, . . . , cn). Thus φ maps β to the standard basis of k(n+1).

We will shortly construct some other special bases of Vn.

Proposition 6.32. Let a0, a1, . . . , an be distinct elements in k and supposethat b0, b1, . . . , bn are arbitrary elements in k. Then there is a unique poly-nomial g ∈ Vn such that g(ai) = bi, 0 ≤ i ≤ n.


Proof. We will give two proofs of this result.

(1) The conditions g(ai) = bi, 0 ≤ i ≤ n, are equivalent to finding c0, c1, . . . , cn ∈ k such that g(x) = c0 + c1x + · · · + cnx^n and ∑_{j=0}^{n} cj ai^j = bi, 0 ≤ i ≤ n. Let

A =
[ 1   a0   a0^2   · · ·   a0^n ]
[ 1   a1   a1^2   · · ·   a1^n ]
[ ...                          ]
[ 1   an   an^2   · · ·   an^n ],    c = (c0, c1, . . . , cn)^t,    b = (b0, b1, . . . , bn)^t.

The stated conditions are equivalent to finding c such that Ac = b. Proposi-tions 6.31 and 6.13 imply that detA 6= 0 because the ai’s are distinct. Propo-sition 6.16 implies that A is invertible and therefore c = A−1b is the unique so-lution to the system of equations. This implies that g(x) = c0+c1x+· · ·+cnxnis the unique polynomial in Vn such that g(ai) = bi, 0 ≤ i ≤ n.

(2) Let

fj,a0,...,an(x) = [(x − a0) · · · (x − aj−1)(x − aj+1) · · · (x − an)] / [(aj − a0) · · · (aj − aj−1)(aj − aj+1) · · · (aj − an)],   0 ≤ j ≤ n.

Then fj,a0,...,an(x) ∈ Vn because deg fj,a0,...,an(x) = n. We have

fj,a0,...,an(ai) = 1 if i = j, and fj,a0,...,an(ai) = 0 if i ≠ j.

Let g(x) = ∑_{j=0}^{n} bj fj,a0,...,an(x). Then g(ai) = bi, 0 ≤ i ≤ n, and g ∈ Vn.

Suppose that h ∈ Vn and h(ai) = bi, 0 ≤ i ≤ n. Then (g − h)(ai) = 0, 0 ≤ i ≤ n. This implies that g − h ∈ Vn and g − h has n + 1 distinct zeros. A theorem from Algebra implies that g − h = 0 and so g = h. This proves the existence and uniqueness of g.
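The second proof is easy to carry out numerically. The sketch below (not from the notes; numpy is assumed, and the helper names are hypothetical) evaluates the polynomials fj,a0,...,an and the interpolant g built from them.

```python
import numpy as np

def lagrange_basis(a, j, x):
    """Evaluate f_{j,a_0,...,a_n}(x): equal to 1 at a_j and 0 at the other a_i."""
    terms = [(x - a[i]) / (a[j] - a[i]) for i in range(len(a)) if i != j]
    return np.prod(terms, axis=0)

def interpolate(a, b, x):
    """g(x) = sum_j b_j f_{j,a_0,...,a_n}(x), so that g(a_i) = b_i."""
    return sum(b[j] * lagrange_basis(a, j, x) for j in range(len(a)))

a = np.array([0.0, 1.0, 2.0, 3.0])     # distinct nodes a_0, ..., a_n
b = np.array([1.0, 2.0, 5.0, 10.0])    # prescribed values b_0, ..., b_n
print(interpolate(a, b, a))            # reproduces b at the nodes
```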

The first proof of Proposition 6.32 depended on knowledge of the Van-dermonde determinant, while the second proof used no linear algebra, butused a theorem from Algebra on polynomials. We will now develop a thirdapproach to Proposition 6.32 and the Vandermonde determinant using thepolynomials fj,a0,...,an(x) from above.

Proposition 6.33. 1. Let a ∈ k and consider the function a∗ : Vn → k,defined by a∗(h) = h(a). Then a∗ ∈ V ∗n .


2. Let a0, . . . , an be distinct elements in k. Then

γ = {fj,a0,...,an(x) | 0 ≤ j ≤ n}

is a basis of Vn. The set γ∗ = {a0∗, a1∗, . . . , an∗} is a dual basis of Vn∗ to γ.

Proof. (1) Let g, h ∈ Vn and let c ∈ k. Then a∗(g+ h) = (g+ h)(a) = g(a) +h(a) = a∗(g) + a∗(h), and a∗(ch) = (ch)(a) = ch(a) = ca∗(h). Thereforea∗ ∈ V ∗n .

(2) We first show that γ is a linearly independent set in Vn. Suppose that∑nj=0 bjfj,a0,...,an(x) = 0. Since

ai∗(fj,a0,...,an(x)) = fj,a0,...,an(ai) = 1 if i = j and 0 if i ≠ j,

we apply a∗i to both sides to obtain bifi,a0,...,an(ai) = 0. Thus, bi = 0, 0 ≤i ≤ n, and so γ is a linearly independent set in Vn. Since dimVn = n+ 1, itfollows that γ is a basis of Vn. Therefore, γ∗ is a dual basis of V ∗n to γ byProposition 4.2.

Proposition 6.34. Let a0, a1, . . . , an be distinct elements of k.

1. If g(x) ∈ Vn, then g(x) = ∑_{j=0}^{n} g(aj) fj,a0,...,an(x).

2. Suppose that b0, b1, . . . , bn are arbitrary elements in k. Then there is aunique polynomial g ∈ Vn such that g(ai) = bi, 0 ≤ i ≤ n.

3. V (a0, a1, . . . , an) 6= 0.

Proof. (1) Let g(x) ∈ Vn. Then g(x) = ∑_{j=0}^{n} cj fj,a0,...,an(x) where each cj ∈ k, because γ is a basis of Vn. For 0 ≤ i ≤ n, we have

g(ai) = ai∗(g(x)) = ai∗(∑_{j=0}^{n} cj fj,a0,...,an(x)) = ci fi,a0,...,an(ai) = ci.

(2) We saw above that g(x) = ∑_{j=0}^{n} bj fj,a0,...,an(x) has the property that g(ai) = bi, 0 ≤ i ≤ n. Now suppose that h ∈ Vn and h(ai) = bi, 0 ≤ i ≤ n. Let h(x) = ∑_{j=0}^{n} cj fj,a0,...,an(x). Then the proof of (1) implies that ci = h(ai) = bi for 0 ≤ i ≤ n. Therefore h = g, so g is uniquely determined.

(3) Apply (1) to the polynomials g(x) = x^j, 0 ≤ j ≤ n. Then

x^j = ∑_{i=0}^{n} ai^j fi,a0,...,an(x),   0 ≤ j ≤ n.

Let

A =
[ 1   a0   a0^2   · · ·   a0^n ]
[ 1   a1   a1^2   · · ·   a1^n ]
[ ...                          ]
[ 1   an   an^2   · · ·   an^n ].

Then A = [1Vn ]γβ. Therefore, A is an invertible matrix, being a change ofbasis matrix. In particular, V (a0, a1, . . . , an) = detA 6= 0, by Proposition6.16.

The next application of Vandermonde matrices is to the linear indepen-dence of exponential functions.

Proposition 6.35. Let a1, . . . , an be distinct complex numbers. Then the functions e^{a1x}, . . . , e^{anx} are linearly independent over C.

Proof. Suppose that c1e^{a1x} + · · · + cne^{anx} = 0, where ci ∈ C. Differentiate this equation n − 1 times to obtain the system of equations

c1e^{a1x} + · · · + cne^{anx} = 0
a1c1e^{a1x} + · · · + ancne^{anx} = 0
a1^2c1e^{a1x} + · · · + an^2cne^{anx} = 0
...
a1^{n−1}c1e^{a1x} + · · · + an^{n−1}cne^{anx} = 0.

This is equivalent to

A (c1e^{a1x}, c2e^{a2x}, . . . , cne^{anx})^t = (0, 0, . . . , 0)^t,

where A is the n × n matrix with rows (1, 1, . . . , 1), (a1, a2, . . . , an), (a1^2, a2^2, . . . , an^2), . . . , (a1^{n−1}, a2^{n−1}, . . . , an^{n−1}), that is, the Vandermonde matrix of a1, . . . , an.


Since a1, . . . , an are distinct, it follows that the Vandermonde determinant V (a1, . . . , an) ≠ 0. Therefore, the coefficient matrix is invertible and so we must have c1e^{a1x} = · · · = cne^{anx} = 0. Since e^{aix} ≠ 0, this implies c1 = · · · = cn = 0. Therefore e^{a1x}, . . . , e^{anx} are linearly independent functions over C.

Proposition 6.35 can be strengthened as follows.

Proposition 6.36. Let a1, . . . , an be distinct complex numbers. Then the functions e^{a1x}, . . . , e^{anx} are linearly independent over C(x), the field of rational functions in x. (C(x) = {g(x)/h(x) | g, h ∈ C[x], h ≠ 0}.)

Proof. Suppose that e^{a1x}, . . . , e^{anx} are linearly dependent over C(x). Then there exists an equation

∑_{i=1}^{n} fi(x)e^{aix} = 0,

where each fi(x) ∈ C(x) and at least one fi(x) ≠ 0. After multiplying by a common denominator of the fi(x)'s, we may assume that each fi(x) ∈ C[x].

Of all such equations, choose one with the minimal number of nonzero coefficients. Then after relabeling, we may assume that ∑_{i=1}^{n} fi(x)e^{aix} = 0, where each fi(x) ∈ C[x] is nonzero and no nontrivial linear dependence relation exists with fewer summands. Of all such linear dependence relations, choose one where ∑_{i=1}^{n} deg fi(x) is minimal.

Clearly n ≥ 2. Differentiate the equation to obtain

∑_{i=1}^{n} (aifi(x) + fi′(x))e^{aix} = 0.

From this equation, subtract the equation a1 ∑_{i=1}^{n} fi(x)e^{aix} = 0. This gives

f1′(x)e^{a1x} + ∑_{i=2}^{n} ((ai − a1)fi(x) + fi′(x)) e^{aix} = 0.

If this is a nontrivial linear dependence relation, then each coefficient must be nonzero by our assumptions. But we now show that the sum of the degrees of the coefficients has decreased, which is a contradiction. If i ≥ 2, then we have deg((ai − a1)fi(x) + fi′(x)) = deg fi(x) because ai − a1 ≠ 0 and deg fi′(x) < deg fi(x) (since fi(x) ≠ 0). For i = 1 we have deg f1′(x) < deg f1(x).


Therefore, we must have a trivial linear dependence relation. This implies that (ai − a1)fi(x) + fi′(x) = 0 for i ≥ 2. This is also impossible because (ai − a1)fi(x) is nonzero and has degree greater than that of fi′(x).

Therefore e^{a1x}, . . . , e^{anx} are linearly independent over C(x).

Exercises

1. Let A = (aij)n×n. Define

f(A) = f(A1, . . . , An) = ∑_{σ∈Sn} (−1)^{ε(σ)} aσ(1)1 · · · aσ(n)n.

Prove directly (without use of Theorem 6.10, for example) that f is ann-alternating form on k(n) such that f(e1, . . . , en) = 1. (Assume thefact that ε(σ) is well defined modulo 2.)

2. Compute the determinant of an arbitrary 2× 2 and 3× 3 matrix.

3. Suppose that A ∈Mn×n(k) with A = (aij)n×n. If A is either a diagonalmatrix, an upper triangular matrix, or a lower triangular matrix, thendetA = a11a22 · · · ann. (Try to find several solutions. Methods couldrely on Problem 3, or Proposition 6.9 or Theorem 6.10, for example.)

4. Let A ∈ Mn×n(k). Suppose that A has the following block form:

A =
[ B   C ]
[ 0   D ],

where B is r × r, C is r × s, 0 is the s × r zero matrix, and D is s × s. Then detA = (detB)(detD).

5. Consider the situation in Proposition 6.17 and assume that detA = 0in each of the following parts.

(a) If Ax = b has a solution, then det(A1, . . . , Ai−1, b, Ai+1, . . . , An) =0, 1 ≤ i ≤ n.

(b) Suppose that det(A1, . . . , Ai−1, b, Ai+1, . . . , An) = 0, 1 ≤ i ≤ n,and dim(column space of A) = n − 1. Then the system Ax = bhas a solution.

(c) If Ax = b has a solution and k is infinite, then there are infinitelymany solutions to the system.


6. Let A ∈ Mn×n(k) and let A = (aij)n×n. The trace of A is defined by trA = ∑_{i=1}^{n} aii. Verify the following statements for A, B ∈ Mn×n(k) and c ∈ k.

(a) tr(A+B) = trA+ trB

(b) tr(cA) = c trA

(c) tr :Mn×n(k)→ k is a linear transformation.

(d) dim(ker(tr)) = n2 − 1.

(e) tr(AB) = tr(BA)

(f) If B is invertible, then tr(B−1AB) = trA.

7. Let T ∈ L(V, V ), dimV = n. Let β = v1, . . . , vn be an ordered basisof V . The trace of T , trT , is defined to be tr[T ]ββ. Then trT is welldefined. That is, trT does not depend on the choice of the orderedbasis β.

8. Prove that Adj(A)A = (detA)In using the ideas in the proof of Propo-sition 6.22, but in this case expand along the lth column of A and atthe appropriate time replace the lth column of A with the jth columnof A.


Chapter 7

Canonical Forms of a Linear Transformation

7.1 Preliminaries

Let V be a vector space over k. We do not assume in this chapter that V isfinite dimensional over k, unless it is explicitly stated. Let T : V → V be alinear transformation.

We let T i denote the linear transformation T T · · · T , where T iscomposed with itself i times. Let T 0 = 1V . If

f(x) = amx^m + am−1x^{m−1} + · · · + a1x + a0 ∈ k[x],

then f(T ) denotes the linear transformation

amT^m + am−1T^{m−1} + · · · + a1T + a0 1V .

Let A ∈ Mn×n(k). The characteristic polynomial of A is defined to be the polynomial det(xIn − A), which lies in k[x]. The methods in Chapter 6 for computing a determinant show that det(xIn − A) is a monic polynomial in k[x] of degree n. (A polynomial in k[x] of degree n is monic if the coefficient of x^n is 1.) Thus the characteristic polynomial of A is a monic polynomial of degree n. We let fA(x) denote the characteristic polynomial of A, and so fA(x) = det(xIn − A).

Suppose that dimV = n. We define the characteristic polynomial of Tto be det(x1V − T ) by setting det(x1V − T ) = det(xIn −A) where A = [T ]ββand β is a basis of V . The characteristic polynomial of T is independent


of the choice of basis of V by Proposition 6.19. We let fT (x) denote thecharacteristic polynomial of T . Thus fT (x) = det(x1V − T ).

7.2 The Cayley-Hamilton Theorem

Let V be a vector space over k and let T : V → V be a linear transformation.

Proposition 7.1. If dimV is finite, then there exists a nonzero polynomialg ∈ k[x] such that g(T ) = 0.

Proof. Assume that dimV = n. Then the n² + 1 linear transformations 1V , T, T^2, T^3, . . . , T^{n²} are linearly dependent over k since dim(L(V, V )) = n². Thus there exist a0, a1, . . . , a_{n²} ∈ k, not all zero, such that

a_{n²}T^{n²} + · · · + a1T + a0 1V = 0.

Then g(T ) = 0 where g(x) = a_{n²}x^{n²} + · · · + a1x + a0 is a nonzero polynomial.

Proposition 7.1 shows that we can choose g such that deg g ≤ n2. Astronger theorem, known as the Cayley-Hamilton Theorem, states that thereexists a polynomial g with deg g ≤ n such that g(T ) = 0. We prove theCayley-Hamilton Theorem in Theorem 7.7 below.

Definition 7.2. Let v ∈ V . We define the T -cyclic subspace of V generatedby v to be the subspace k[T ]v = h(T )v | h ∈ k[x]. If V = k[T ]v, then v iscalled a cyclic vector for T .

Thus k[T ]v is the subspace of V spanned by v, Tv, T 2v, . . ..

Lemma 7.3. Suppose that dimV = n ≥ 1 and that V = k[T ]v for somev ∈ V . Then v, T (v), T 2(v), . . . , T n−1(v) is a basis of V .

Proof. Suppose that {v, T (v), T^2(v), . . . , T^{n−1}(v)} is a linearly dependent set over k. Then there is a dependence relation

aiT^i(v) + ai−1T^{i−1}(v) + · · · + a1T (v) + a0v = 0,

where ai ≠ 0 and 1 ≤ i ≤ n − 1. We can divide by ai so that we may assume from the start that ai = 1. Then

T^i(v) ∈ Span(v, T (v), T^2(v), . . . , T^{i−1}(v)).


It follows that T j(v) ∈ Span(v, T (v), T 2(v), . . . , T i−1(v)) for all j ≥ i. Thusk[T ]v = Span(v, T (v), T 2(v), . . . , T i−1(v)), and so n = dim k[T ]v ≤ i. Thisis impossible because i ≤ n − 1. Thus v, T (v), T 2(v), . . . , T n−1(v) is alinearly independent set over k. Since dimV = n, it follows that this set is abasis of V .

A subspace W ⊆ V is called a T -invariant subspace of V if T (W ) ⊆ W .For example, it is easy to see that k[T ]v is a T -invariant subspace of V .

Let W be a T -invariant subspace of V and assume that dimV = n. Then T induces a linear transformation T |W : W → W . Let β1 = {v1, . . . , vm} be a basis of W . Extend β1 to a basis β = {v1, . . . , vm, vm+1, . . . , vn} of V . Let [T |W ]β1β1 = A, where A ∈ Mm×m(k). Then

[T ]ββ =
[ A   B ]
[ 0   C ],

where B ∈ Mm×(n−m)(k), C ∈ M(n−m)×(n−m)(k), and 0 ∈ M(n−m)×m(k).Let fT |W denote the characteristic polynomial of T |W . Thus fT |W = det(x1W−T |W ).

Lemma 7.4. Suppose that V = V1 ⊕ V2 where V1 and V2 are each T -invariant subspaces of V . Let T1 = T |V1 and T2 = T |V2. Suppose that γ = {w1, . . . , wl} is a basis of V1 and that δ = {y1, . . . , ym} is a basis of V2. Let β = γ ∪ δ = {w1, . . . , wl, y1, . . . , ym}. Then

[T ]ββ =
[ A   0 ]
[ 0   B ],

where A = [T1]γγ and B = [T2]δδ.

Proof. We have that T (wj) = ∑_{i=1}^{l} aij wi and T (yj) = ∑_{i=1}^{m} bij yi. Then A = (aij)l×l = [T1]γγ and B = (bij)m×m = [T2]δδ.

Lemma 7.5. Let V be a finite dimensional vector space over k and let Wbe a T -invariant subspace of V . Then fT |W divides fT .

Proof. As above, we select a basis β1 of W and extend β1 to a basis β of Vso that β1 ⊆ β. Then

fT (x) = det( xIn −
[ A   B ]
[ 0   C ] )
= det
[ xIm − A      −B         ]
[ 0            xIn−m − C  ]
= det(xIm − A) det(xIn−m − C) = fT |W · det(xIn−m − C).

Thus fT |W divides fT .


For n ≥ 1, let f(x) = xn + an−1xn−1 + · · · + a1x + a0 ∈ k[x]. For n ≥ 2,

the companion matrix of f is defined to be the n× n matrix

Af =
[ 0   0   0   · · ·   0   0   −a0   ]
[ 1   0   0   · · ·   0   0   −a1   ]
[ 0   1   0   · · ·   0   0   −a2   ]
[ 0   0   1   · · ·   0   0   −a3   ]
[ ...                               ]
[ 0   0   0   · · ·   1   0   −an−2 ]
[ 0   0   0   · · ·   0   1   −an−1 ].

For n = 1 and f(x) = x+ a0, we let Af = (−a0).Let β = v1, . . . , vn be a basis of V . For n ≥ 2, let T : V → V be the

linear transformation defined by T (vi) = vi+1 for 1 ≤ i ≤ n− 1, and

T (vn) = −a0v1 − a1v2 − · · · − an−2vn−1 − an−1vn.

For n = 1, we define T by T (v1) = −a0v1. Then [T ]ββ = Af . Note that for

n ≥ 2, we have T i(v1) = vi+1 for 1 ≤ i ≤ n− 1.We now show that f(T )(v1) = 0.

f(T )(v1) = T^n(v1) + an−1T^{n−1}(v1) + · · · + a1T (v1) + a0v1
  = T (T^{n−1}(v1)) + an−1vn + an−2vn−1 + · · · + a1v2 + a0v1
  = T (vn) + an−1vn + an−2vn−1 + · · · + a1v2 + a0v1
  = 0,

because T (vn) = −a0v1 − a1v2 − · · · − an−2vn−1 − an−1vn.

Proposition 7.6. Let T : V → V be the linear transformation defined above.Then the characteristic polynomial of T is f .

Proof. Since [T ]ββ = Af , we must show that det(xIn−Af ) = f . The proof isby induction on n. If n = 1, then

det(xI1 − Af ) = det(xI1 + a0) = x+ a0 = f.

If n = 2, then

det(xI2 − Af ) = det
[ x     a0     ]
[ −1    x + a1 ]
= x^2 + a1x + a0 = f.


Now assume that n ≥ 3. We have

xIn − Af =
[ x     0     0    · · ·   0     0     a0       ]
[ −1    x     0    · · ·   0     0     a1       ]
[ 0     −1    x    · · ·   0     0     a2       ]
[ 0     0     −1   · · ·   0     0     a3       ]
[ ...                                           ]
[ 0     0     0    · · ·   −1    x     an−2     ]
[ 0     0     0    · · ·   0     −1    x + an−1 ].

Expanding along the first row and using induction gives

det(xIn − Af ) = x(x^{n−1} + an−1x^{n−2} + · · · + a2x + a1) + (−1)^{n+1}a0 · (−1)^{n−1}
  = x^n + an−1x^{n−1} + · · · + a2x^2 + a1x + a0 = f.
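A numeric check of Proposition 7.6 (not in the notes; numpy is assumed): build the companion matrix Af of a given monic polynomial and compare its characteristic polynomial with f.

```python
import numpy as np

def companion(coeffs):
    """coeffs = [a_0, a_1, ..., a_{n-1}]; returns A_f as defined in the text."""
    n = len(coeffs)
    A = np.zeros((n, n))
    A[1:, :-1] = np.eye(n - 1)          # subdiagonal of 1's
    A[:, -1] = -np.array(coeffs)        # last column: -a_0, ..., -a_{n-1}
    return A

coeffs = [6.0, 11.0, 6.0]               # f(x) = x^3 + 6x^2 + 11x + 6
Af = companion(coeffs)
# np.poly returns the coefficients of det(xI - A), leading coefficient first
print(np.poly(Af))                      # approximately [1., 6., 11., 6.]
```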

Theorem 7.7 (Cayley-Hamilton Theorem). Let V be a finite dimensional vector space over k and let T : V → V be a linear transformation. Let fT (x) be the characteristic polynomial of T . Then fT (T ) = 0 as a linear transformation.

Proof. It is sufficient to show that fT (T )(v) = 0 for each v ∈ V . This isclearly true if v = 0. Now assume that v 6= 0. Let W = k[T ]v be the cyclicsubspace generated by v. Since W is a T -invariant subspace of V , we knowthat fT |W divides fT . It is sufficient to show that fT |W (T )(v) = 0. Thuswe can assume from the beginning that V = k[T ]v. Let dimV = n. Thenv, T (v), T 2(v), . . . , T n−1(v) is a basis of V by Lemma 7.3.

Let T^n(v) = −a0v − a1T (v) − · · · − an−1T^{n−1}(v) and let f(x) = x^n + an−1x^{n−1} + · · · + a1x + a0. Then fT (x) = f by Proposition 7.6, and it follows that fT (T )(v) = f(T )(v) = 0, as desired.
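A hedged numeric illustration of the Cayley-Hamilton Theorem for matrices (not from the notes; numpy is assumed): evaluating the characteristic polynomial of A at A itself gives the zero matrix.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 1.0]])
coeffs = np.poly(A)                       # coefficients of f_A(x) = det(xI - A)

# Horner evaluation of the matrix polynomial f_A(A)
F = np.zeros_like(A)
for c in coeffs:
    F = F @ A + c * np.eye(3)
print(np.allclose(F, np.zeros((3, 3))))   # True, up to rounding error
```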

7.3 Eigenvectors, eigenvalues, diagonalizability

Let T : V → V be a linear transformation.

Definition 7.8. An element a ∈ k is an eigenvalue of T if there exists anonzero vector v ∈ V such that T (v) = av. A nonzero vector v ∈ V is aneigenvector of T with eigenvalue a, a ∈ k, if T (v) = av.


Lemma 7.9. An element a ∈ k is an eigenvalue of T if and only if ker(T −aIV ) is nonzero. The nonzero elements of ker(T − aIV ) are the set of eigen-vectors of T having eigenvalue a.

Proof. A vector v ∈ V satisfies T (v) = av if and only if (T − aIV )(v) = 0,which is equivalent to v ∈ ker(T − aIV ). Thus an element a ∈ k is aneigenvalue of T if and only if ker(T − aIV ) is nonzero. The second statementfollows easily from this.

Proposition 7.10. Assume that dimV is finite and let fT (x) = det(x1V −T )be the characteristic polynomial of T . An element a ∈ k is an eigenvalue ofT if and only if fT (a) = 0.

Proof. Assume first that a ∈ k and fT (a) = 0. Then det(a1V − T ) = 0, soT − a1V is not an invertible linear transformation. (See Exercise 1.) Thusker(T − a1V ) is nonzero, so a is an eigenvalue of T .

Now assume that a ∈ k is an eigenvalue of T . Then ker(T − a1V ) isnonzero. Thus T − a1V is not an invertible linear transformation, and sodet(a1V − T ) = 0. Therefore, fT (a) = 0.

Definition 7.11. A matrix A ∈ Mn×n(k) is diagonalizable (or, can be diagonalized) if there exists an invertible matrix C ∈ Mn×n(k) such that C⁻¹AC is a diagonal matrix.

If dimV is finite, a linear transformation T ∈ L(V, V ) is diagonalizable(or, can be diagonalized) if there exists a basis β of V such that [T ]ββ is adiagonal matrix.

If β is an arbitrary basis of V , dimV finite, then T is diagonalizable ifand only if [T ]ββ is diagonalizable. (See Exercise 2.)

Proposition 7.12. Let T : V → V be a linear transformation, dimV finite.The following statements are equivalent.

1. T is diagonalizable.

2. There exists a basis of V consisting of eigenvectors of T .

Proof. Assume that T is diagonalizable. Then there exists a basis β =v1, . . . , vn of V such that [T ]ββ is a diagonal matrix A = (aij). ThenT (vi) = aiivi for 1 ≤ i ≤ n. Thus β is a basis of V consisting of eigen-vectors of T .


Now assume that there exists a basis β = v1, . . . , vn of V consisting ofeigenvectors of T . Let T (vi) = civi for 1 ≤ i ≤ n. Then [T ]ββ = (aij) where

aij = 0 for i ≠ j and aij = ci if i = j. Thus [T ]ββ is a diagonal matrix, so T is diagonalizable.
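For matrices, Proposition 7.12 can be checked numerically. The sketch below (not part of the notes; numpy is assumed) assembles a basis of eigenvectors into the columns of C and verifies that C⁻¹AC is diagonal.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
eigvals, C = np.linalg.eig(A)           # columns of C are eigenvectors of A
D = np.linalg.inv(C) @ A @ C
print(np.round(D, 10))                  # diagonal, with the eigenvalues 5 and 2
```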

If T : V → V is a linear transformation with dimV finite, then the Cayley-Hamilton Theorem guarantees that there is a monic polynomial g ∈ k[x] such that g(T ) = 0. Namely, we can take g = fT . In general, for a linear transformation T : V → V , suppose that there exists a nonzero polynomial g ∈ k[x] such that g(T ) = 0. Then there is a nonzero polynomial f ∈ k[x] of least degree such that f(T ) = 0. We can multiply f by a nonzero scalar so that we may assume that f is also a monic polynomial. Suppose that h is another monic polynomial of least degree such that h(T ) = 0 with f ≠ h. Then (f − h)(T ) = f(T ) − h(T ) = 0. Since deg(f) = deg(h), both f and h are monic, and f − h ≠ 0, it follows that deg(f − h) < deg f . This is impossible because there is no nonzero polynomial h of degree less than deg(f) such that h(T ) = 0. Thus f = h. Therefore, there exists a unique monic polynomial f of least degree such that f(T ) = 0. This justifies the following definitions.

Definition 7.13. Let T : V → V be a linear transformation. Suppose thatthere exists a nonzero polynomial g ∈ k[x] such that g(T ) = 0. (This alwaysoccurs if dimV is finite.) The minimal polynomial of T is the unique monicpolynomial pT (x) of least degree such that pT (T ) = 0. Similarly, the minimalpolynomial of a matrix A ∈ Mn×n(k) is the unique monic polynomial pA(x)of least degree such that pA(A) = 0.

Lemma 7.14. Let T : V → V be a linear transformation. Suppose thatf ∈ k[x] and f(T ) = 0. Let pT (x) ∈ k[x] be the minimal polynomial of T .Then pT (x) | f(x). In particular, if dimV is finite, then pT (x) | fT (x).

A similar result holds for the minimal polynomial of a matrix A ∈Mn×n(k).

Proof. Use the division algorithm to write f(x) = pT (x)q(x) + r(x) whereq(x), r(x) ∈ k[x] and either r(x) = 0 or deg r(x) < deg pT (x). Then

0 = f(T ) = pT (T )q(T ) + r(T ) = r(T ).

If r(x) 6= 0, then we can write r(x) = cs(x) where s(x) is a monic poly-nomial with deg(s(x)) = deg(r(x)) and c ∈ k is nonzero. Then r(T ) = 0


implies s(T ) = 0. This is impossible because s(x) is a monic polynomial withdeg(s(x)) < deg(pT (x)). Therefore r(x) = 0 and so pT (x) | f(x).

If dimV is finite, then fT (T ) = 0 by the Cayley-Hamilton Theorem. Itfollows that pT (x) | fT (x).

An analogous proof holds for the minimal polynomial of a matrix A ∈Mn×n(k).
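As a computational aside (not a construction from the text, and the helper name is hypothetical), one way to find the minimal polynomial of a matrix numerically is to look for the first power A^m that is a linear combination of I, A, . . . , A^{m−1}; numpy is assumed.

```python
import numpy as np

def minimal_polynomial(A, tol=1e-9):
    n = A.shape[0]
    powers = [np.eye(n).ravel()]
    for m in range(1, n + 1):
        powers.append(np.linalg.matrix_power(A, m).ravel())
        B = np.column_stack(powers[:-1])            # columns I, A, ..., A^{m-1}
        c, _, _, _ = np.linalg.lstsq(B, powers[-1], rcond=None)
        if np.linalg.norm(B @ c - powers[-1]) < tol:
            # A^m = c_0 I + ... + c_{m-1} A^{m-1}; monic minimal polynomial:
            return np.concatenate(([1.0], -c[::-1]))  # coefficients, x^m first
    return None

A = np.diag([2.0, 2.0, 3.0])
print(minimal_polynomial(A))    # [ 1., -5.,  6.], i.e. (x-2)(x-3) = x^2 - 5x + 6
```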

7.4 More results on T-Invariant Subspaces

Let T : V → V be a linear transformation.

Lemma 7.15. Let g ∈ k[x] be a polynomial and suppose that deg(g) = m ≥1. If ker(g(T )) 6= 0, then V contains a nonzero T -invariant subspace W ofdimension ≤ m. If g is irreducible in k[x], then dimW = m.

Proof. Let v ∈ ker(g(T )) and assume that v 6= 0. Let

W = 〈v, Tv, T 2v, . . . , Tm−1v〉.

Let g = amxm + · · · + a1x + a0 with am 6= 0. Since g(T )(v) = 0, it follows

that Tm(v) = −a−1m (am−1Tm−1 + · · ·+a1T +a01V )(v). Then Tm(v) lies in W

and thus T (T iv) ∈ W for 0 ≤ i ≤ m − 1. Then W is a nonzero T -invariantsubspace of V and dim(W ) ≤ m.

Now assume that g is an irreducible polynomial. We will show thatdimW = m. Suppose that v, Tv, T 2v, . . . , Tm−1v are linearly dependentover k. Then there exist b0, . . . , bm−1 in k, not all zero, such that

b0v + b1Tv + · · · + bm−1T^{m−1}v = 0.

Then h(T )(v) = 0, where h(x) = bm−1x^{m−1} + · · · + b1x + b0 is a nonzero polynomial. Let f = gcd(g, h). Recall that there exist polynomials r(x) and s(x) in k[x] such that f = rg + sh. Then f(T )(v) = 0 because g(T )(v) = 0 and h(T )(v) = 0. We have f = 1 because g is irreducible in k[x] and deg(h) < deg(g). Therefore, v = f(T )(v) = 0, a contradiction. It follows that v, Tv, T^2v, . . . , T^{m−1}v are linearly independent over k and that dim(W ) = m.

Corollary 7.16. Suppose that there exists a nonzero polynomial f ∈ k[x] such that f(T ) = 0. Write f(x) = f1(x) · · · fm(x) where each fi(x) is an irreducible polynomial in k[x]. Then V contains a nonzero T -invariant subspace of dimension equal to deg fi(x) for some i.


Proof. First suppose that ker(fi(T )) = 0 for 1 ≤ i ≤ m. Then fi(T ) isinjective for each i and it follows that the composition f(T ) = f1(T ) · · · fm(T )is injective. This is a contradiction because f(T ) = 0. Therefore ker(fi(T )) 6=0 for some i. Then the result follows from Lemma 7.15.

Proposition 7.17. Suppose that there exists a nonzero polynomial f ∈ k[x]such that f(T ) = 0.

1. If k = C, the field of complex numbers (or any algebraically closedfield), then V contains a one-dimensional T -invariant subspace. Inother words, V contains an eigenvector for T .

2. If k = R, the field of real numbers (or any real closed field), then Vcontains a T -invariant subspace of dimension ≤ 2.

Proof. Let f(T ) = 0, where f ∈ k[x] is a nonzero polynomial of degree m.

(1) Since k is algebraically closed, f factors as a product of linear polynomials over k. Thus we have f(x) = b(x − c1)(x − c2) · · · (x − cm) where b and each ci lie in k. The result now follows from Corollary 7.16.

(2) Since k is the field of real numbers (or is real closed), f factors as a product of linear and irreducible quadratic polynomials over k. Thus we have f(x) = ∏_{i=1}^{m} gi(x), where deg gi ≤ 2 and each gi is an irreducible polynomial in k[x]. The result follows from Corollary 7.16.

In the next two sections, we attempt to find bases of V so that the matrix[T ]ββ is as simple as possible.

7.5 Primary Decomposition

Let T : V → V be a linear transformation.

Proposition 7.18. Suppose that f is a nonzero polynomial in k[x] such thatf(T ) = 0. Let f(x) = f1(x)f2(x), where f1, f2 ∈ k[x] and gcd(f1, f2) = 1.

Let Vi = ker(fi(T )), i = 1, 2. Then the following statements hold.

1. V = V1 ⊕ V2.

2. Vi is a T -invariant subspace of V for i = 1, 2.

3. Let Ti = T |Vi : Vi → Vi be the induced linear transformation from (2).Let pTi(x) denote the minimal polynomial of Ti. Then pTi(x) | fi(x).


4. If dimVi is finite, let fTi(x) denote the characteristic polynomial of Ti.Then pTi(x) | fTi(x).

5. We have pT (x) = pT1(x)pT2(x). If dimV1 and dimV2 are both finite,then fT (x) = fT1(x)fT2(x).

Proof. There exist polynomials g1, g2 ∈ k[x] such that f1g1 + f2g2 = 1.Let v ∈ V . Then v = 1V (v) = f1(T )g1(T )(v) + f2(T )g2(T )(v). Let v1 =f2(T )g2(T )(v) and let v2 = f1(T )g1(T )(v), so that v = v1 + v2. Then v1 ∈ V1because

f1(T )(v1) = f1(T )f2(T )g2(T )(v) = f(T )(g2(T )(v)) = 0.

Similarly, v2 ∈ V2. Thus V = V1 + V2.Now suppose that v1 ∈ V1, v2 ∈ V2 and v1 + v2 = 0. Then

0 = f2(T )(v1 + v2) = f2(T )(v1) + f2(T )(v2) = f2(T )(v1).

Since f1(T )(v1) = 0, it follows that

v1 = 1V (v1) = g1(T )f1(T )(v1) + g2(T )f2(T )(v1) = 0.

Similarly, it follows that v2 = 0. Therefore, V = V1 ⊕ V2, which proves (1).

For (2), let vi ∈ Vi. Then fi(T )(T (vi)) = Tfi(T )(vi) = T (0) = 0. Thus T (vi) ∈ ker(fi(T )) = Vi, so (2) holds.

For (3), let v ∈ Vi. Then fi(Ti)(v) = fi(T )(v) = 0 because v ∈ Vi = ker(fi(T )). Thus fi(Ti) = 0, so pTi(x) | fi(x) by Lemma 7.14.

If dimVi is finite, then pTi(x) | fTi(x) by Lemma 7.14 because fTi(Ti) = 0 by the Cayley-Hamilton Theorem. This proves (4).

Now we prove (5). Let v ∈ Vi. Then pT (Ti)(v) = pT (T )(v) = 0. Thus, for i = 1, 2, we have pTi(x) | pT (x) by Lemma 7.14. Since pTi(x) | fi(x) by (3) and gcd(f1, f2) = 1, it follows that gcd(pT1 , pT2) = 1, so pT1(x)pT2(x) | pT (x).

Now let v ∈ V and let v = v1 + v2 where vi ∈ Vi. Then

pT1(T )pT2(T )(v) = pT1(T )pT2(T )(v1 + v2)

= pT2(T )pT1(T )(v1) + pT1(T )pT2(T )(v2) = 0 + 0 = 0.

Thus pT (x) | pT1(x)pT2(x). Therefore, pT (x) = pT1(x)pT2(x).


Now assume that dimV1 and dimV2 are both finite. Then Lemma 7.4 (with its notation) implies that

fT (x) = det(x1V − T ) = det
[ xIl − A    0        ]
[ 0          xIm − B  ]
= det(xIl − A) det(xIm − B) = fT1(x)fT2(x).

Proposition 7.19. Suppose that f ∈ k[x] is a nonzero polynomial such thatf(T ) = 0. Let f(x) = f1(x) · · · fm(x), where f1, . . . , fm ∈ k[x] are pairwiserelatively prime polynomials.

Let Vi = ker(fi(T )), 1 ≤ i ≤ m. Then the following statements hold.

1. V = V1 ⊕ · · · ⊕ Vm.

2. Vi is a T -invariant subspace of V .

3. Let Ti = T |Vi : Vi → Vi be the induced linear transformation from (2). Let pTi(x) denote the minimal polynomial of Ti. Then pTi(x) | fi(x).

4. If dimVi is finite, let fTi(x) denote the characteristic polynomial of Ti. Then pTi(x) | fTi(x).

5. We have pT (x) = pT1(x) · · · pTm(x). If dimVi is finite, 1 ≤ i ≤ m, then fT (x) = fT1(x) · · · fTm(x).

Proof. We prove this by induction on m. The case m = 1 is trivial and thecase m = 2 follows from Proposition 7.18. Now assume that m ≥ 3. Letg = f1(x) · · · fm−1(x), and let W = ker(g(T )). Then gcd(g, fm) = 1 andV = W ⊕ Vm by Proposition 7.18(1). By induction, we have that W = V1 ⊕· · ·⊕Vm−1 and thus (1) holds. The remaining statements are easy conclusionsfrom the induction hypothesis and Proposition 7.18. (See Exercise 4.)

The next result gives the so-called Primary Decomposition of V with respect to T .

Corollary 7.20. Suppose that f ∈ k[x] is a monic polynomial such that f(T ) = 0. Let f(x) = f1(x)^{e1} · · · fm(x)^{em}, where f1, . . . , fm ∈ k[x] are distinct monic irreducible polynomials and each ei > 0.

Let Vi = ker(fi(T )^{ei}), 1 ≤ i ≤ m. Then the following statements hold.


1. V = V1 ⊕ · · · ⊕ Vm.

2. Vi is a T -invariant subspace of V .

3. Let Ti = T |Vi : Vi → Vi be the induced linear transformation from (2). Let pTi(x) denote the minimal polynomial of Ti. Then pTi(x) | fi(x)^{ei}.

4. If dimVi is finite, let fTi(x) denote the characteristic polynomial of Ti. Then pTi(x) | fTi(x).

5. We have pT (x) = pT1(x) · · · pTm(x). If dimVi is finite, 1 ≤ i ≤ m, then fT (x) = fT1(x) · · · fTm(x).

Proof. The result follows immediately from Proposition 7.19.

We now use a slightly more abstract way to obtain the primary decom-position of V with respect to T .

Lemma 7.21. Suppose that Ei : V → V , 1 ≤ i ≤ m, are linear transforma-tions satisfying the following statements.

1. E1 + · · ·+ Em = 1V ,

2. EiEj = 0 if i 6= j.

Let Vi = im(Ei), 1 ≤ i ≤ m. Then

i. Ei^2 = Ei, 1 ≤ i ≤ m.

ii. V = V1 ⊕ · · · ⊕ Vm.

iii. ker(Ei) = ∑_{j≠i} im(Ej) = ∑_{j≠i} Vj.

Proof. Since EiEj = 0 when i ≠ j, we have

Ei^2 = Ei(E1 + E2 + · · · + Ei + · · · + Em) = Ei 1V = Ei.

Let v ∈ V . Then v = 1V v = (E1 + · · · + Em)v ∈ V1 + · · · + Vm, sinceEjv ∈ im(Ej) = Vj. Thus V = V1 + · · ·+ Vm.

Suppose that v ∈ V1∩(V2+· · ·+Vm). Then there exist yi ∈ Vi, 1 ≤ i ≤ m,such that

v = E1y1 = E2y2 + · · ·+ Emym.


Applying E1 gives

E1v = E1^2 y1 = E1E2y2 + · · · + E1Emym = 0,

because E1Ej = 0 for j ≥ 2. Since E1^2 = E1, we have 0 = E1^2 y1 = E1y1 = v. Thus V1 ∩ (V2 + · · · + Vm) = 0. Similarly,

Vi ∩ (V1 + · · ·+ Vi−1 + Vi+1 + · · ·+ Vm) = 0,

for 1 ≤ i ≤ m. Therefore, V = V1 ⊕ · · · ⊕ Vm.

Since EiEj = 0 if i ≠ j, it follows that im(Ej) ⊆ ker(Ei) for i ≠ j. Thus ∑_{j≠i} im(Ej) ⊆ ker(Ei). Let v ∈ ker(Ei). Then

v = (E1 + · · · + Em)v = ∑_{j=1}^{m} Ej(v) = ∑_{j≠i} Ej(v) ∈ ∑_{j≠i} im(Ej).

Therefore, ker(Ei) = ∑_{j≠i} im(Ej).

We now use Lemma 7.21 to obtain a second proof of Proposition 7.19(1).

Proposition 7.22. Suppose that f ∈ k[x] is a nonzero monic polynomial such that f(T ) = 0. Let f(x) = f1(x) · · · fm(x), where f1, . . . , fm ∈ k[x] are pairwise relatively prime polynomials.

Let Vi = ker(fi(T )), 1 ≤ i ≤ m. Then V = V1 ⊕ · · · ⊕ Vm.

Proof. Let gj(x) = f(x)/fj(x), 1 ≤ j ≤ m. Then gcd(g1(x), . . . , gm(x)) = 1. Thus there exist hj(x) ∈ k[x], 1 ≤ j ≤ m, such that ∑_{j=1}^{m} gj(x)hj(x) = 1. If i ≠ j, then fi(x) | gj(x) and so f(x) | gi(x)gj(x). In other words,

gi(x)gj(x)/f(x) = gi(x)gj(x)/(gi(x)fi(x)) ∈ k[x].

Now we show that Lemma 7.21 can be applied. Let Ej = gj(T )hj(T ). Since ∑_{j=1}^{m} gj(x)hj(x) = 1, we have

∑_{j=1}^{m} Ej = ∑_{j=1}^{m} gj(T )hj(T ) = 1V .


Since f(x) | gi(x)gj(x) when i 6= j and f(T ) = 0, we have gi(T )gj(T ) = 0,and thus

EiEj = gi(T )hi(T )gj(T )hj(T ) = gi(T )gj(T )hi(T )hj(T ) = 0.

Therefore, the hypotheses of Lemma 7.21 hold.

Now we show that im(Ei) = ker(fi(T )). Since fi(T )gi(T ) = f(T ) = 0, we have im(gi(T )) ⊆ ker(fi(T )). Thus

im(Ei) = im(gi(T )hi(T )) ⊆ im(gi(T )) ⊆ ker(fi(T )).

Let v ∈ ker(fi(T )). If i ≠ j, then fi(x) | gj(x), and so fi(T )(v) = 0 implies that gj(T )(v) = 0. Hence Ej(v) = gj(T )hj(T )(v) = hj(T )gj(T )(v) = 0. Thus

v = (E1 + · · ·+ Em)(v) = Ei(v) ∈ im(Ei).

Therefore, im(Ei) = ker(fi(T )) = Vi, 1 ≤ i ≤ m. Now the result follows from Lemma 7.21.
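The construction in this proof can be carried out explicitly. The sketch below (SymPy assumed; the matrix and polynomials are hypothetical choices for the case m = 2) uses the extended Euclidean algorithm to produce h1, h2 with g1h1 + g2h2 = 1 and then checks that E1 = g1(T )h1(T ) and E2 = g2(T )h2(T ) behave as in Lemma 7.21.

    from sympy import Matrix, symbols, gcdex, eye

    x = symbols('x')

    # f(x) = f1(x) f2(x) with f1 = x - 1, f2 = x - 2 relatively prime, and f(A) = 0.
    A = Matrix([[1, 1],
                [0, 2]])
    f1, f2 = x - 1, x - 2
    g1, g2 = f2, f1                          # g_j = f / f_j

    h1, h2, h = gcdex(g1, g2, x)             # h1*g1 + h2*g2 = gcd(g1, g2) = 1
    assert h == 1

    def poly_of_matrix(p, M):
        # Evaluate the polynomial p(x) at the matrix M by Horner's rule.
        out = Matrix.zeros(M.rows, M.cols)
        for c in p.as_poly(x).all_coeffs():
            out = out * M + c * eye(M.rows)
        return out

    E1 = poly_of_matrix(g1 * h1, A)
    E2 = poly_of_matrix(g2 * h2, A)

    assert E1 + E2 == eye(2)                                  # E1 + E2 = 1_V
    assert E1 * E2 == Matrix.zeros(2, 2)                      # E1E2 = 0
    assert poly_of_matrix(f1, A) * E1 == Matrix.zeros(2, 2)   # im(E1) ⊆ ker(f1(A))
    assert poly_of_matrix(f2, A) * E2 == Matrix.zeros(2, 2)   # im(E2) ⊆ ker(f2(A))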

We can now extend Proposition 7.12 by giving another condition that characterizes when a linear transformation T is diagonalizable.

Proposition 7.23. Let T : V → V be a linear transformation. The following statements are equivalent.

1. T is diagonalizable.

2. There exists a basis of V consisting of eigenvectors of T .

3. The minimal polynomial pT (x) of T is square-free and factors completely over k.

Proof. We have already proved in Proposition 7.12 that (1) and (2) are equivalent. Assume that (2) holds and let β = v1, . . . , vn be a basis of V consisting of eigenvectors of T . Let A = [T ]ββ = (aij). Then A is a diagonal matrix and T (vi) = aiivi for 1 ≤ i ≤ n. Let c1, . . . , cm be the distinct elements that lie in a11, . . . , ann. Thus m ≤ n.

Let g(x) = (x− c1) · · · (x− cm). Let v ∈ β and suppose that T (v) = civ. Since all linear transformations in k[T ] commute with one another, we have

g(T )(v) = (T−c11V ) · · · (T−ci−11V )(T−ci+11V ) · · · (T−cm1V )(T−ci1V )(v).


Since (T − ci1V )(v) = T (v)− civ = 0, it follows that g(T )(v) = 0 for all v ∈ β. Then g(T )(v) = 0 for all v ∈ V . Therefore g(T ) = 0. Then pT (x) | g(x) by Lemma 7.14. Since g(x) is square-free and factors completely over k, it follows that the same holds for pT (x). Thus (3) holds.

Now assume that (3) holds, so that the minimal polynomial pT (x) of T is square-free and factors completely over k. Then pT (x) = (x− c1) · · · (x− cm), where c1, . . . , cm ∈ k are distinct elements. Proposition 7.19 implies that V = V1 ⊕ · · · ⊕ Vm, where Vi = ker(T − ci1V ), 1 ≤ i ≤ m. Each nonzero element of ker(T − ci1V ) is an eigenvector of T with eigenvalue ci. Let βi be a basis of ker(T − ci1V ). Then β = β1 ∪ · · · ∪ βm is a basis of V consisting of eigenvectors of T . Thus (2) holds.

In the proof that (2) implies (3) in Proposition 7.23, we can prove even more. We now prove that g(x) = pT (x). We have already proved that pT (x) | g(x). If g(x) ≠ pT (x), then we can relabel to assume that pT (x) | (x− c1) · · · (x− cm−1). There exists v ∈ β such that T (v) = cmv. We have

(T − c11V ) · · · (T − cm−11V )(v) = g(T )(v) = 0.

Since (T − ci1V )(v) = (cm − ci)v, it follows that

(T − c11V ) · · · (T − cm−11V )(v) = ∏_{i=1}^{m−1} (cm − ci) v ≠ 0.

This is impossible, and therefore g(x) = pT (x).
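When dimV is finite, Proposition 7.23(3) also yields a practical test for diagonalizability: since pT (x) and fT (x) have the same irreducible factors, T is diagonalizable exactly when every irreducible factor of fT (x) is linear and the product of the distinct linear factors already annihilates T . Below is a minimal sketch, assuming SymPy and working over the rationals as a stand-in for k; the function name is_diagonalizable and the test matrices are hypothetical choices for this illustration.

    from sympy import Matrix, symbols, factor_list, eye

    x = symbols('x')

    def is_diagonalizable(A):
        # Proposition 7.23(3) in computational form: the product of the distinct
        # irreducible factors of the characteristic polynomial must consist of
        # linear factors only and must annihilate A.
        _, factors = factor_list(A.charpoly(x).as_expr(), x)
        if any(f.as_poly(x).degree() > 1 for f, _ in factors):
            return False                               # p_T(x) does not split over Q
        g_of_A = eye(A.rows)
        for f, _ in factors:                           # each f has the form x - c
            c = -f.as_poly(x).all_coeffs()[-1]
            g_of_A = g_of_A * (A - c * eye(A.rows))    # build up g(A)
        return g_of_A == Matrix.zeros(A.rows, A.cols)  # g(A) = 0 iff p_T(x) is square-free

    assert is_diagonalizable(Matrix([[2, 0], [0, 3]]))        # distinct eigenvalues
    assert not is_diagonalizable(Matrix([[1, 1], [0, 1]]))    # p_T(x) = (x - 1)^2
    assert not is_diagonalizable(Matrix([[0, -1], [1, 0]]))   # x^2 + 1 irreducible over Q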

7.6 Further Decomposition

We begin with two definitions that are similar to Definition 7.13.

Definition 7.24. Let v ∈ V and let W ⊆ V be a subspace. Suppose that there exists a nonzero polynomial f ∈ k[x] such that f(T )(v) ∈ W .

1. We let p(v,W ) ∈ k[x] denote the unique monic polynomial g ∈ k[x] of least degree such that g(T )(v) ∈ W .

2. In the special case of (1) when W = (0) and f(T )(v) = 0, we let pv(x) denote the unique monic polynomial g ∈ k[x] of least degree such that g(T )(v) = 0. We call pv(x) the T -order of v.


The following lemma is proved as in Lemma 7.14.

Lemma 7.25. Let v ∈ V and let W ⊆ V be a T -invariant subspace. If f ∈ k[x] and f(T )(v) ∈ W , then p(v,W ) | f(x). If f(T )(v) = 0, then pv(x) | f(x).

Proof. Use the division algorithm to write f(x) = p(v,W )q(x) + r(x) where either r(x) = 0 or deg(r(x)) < deg(p(v,W )). We have that f(T )(v) ∈ W and p(v,W )(T )q(T )(v) = q(T )p(v,W )(T )(v) ∈ W because p(v,W )(T )(v) ∈ W and W is a T -invariant subspace. Then r(T )(v) ∈ W . The proof finishes as in Lemma 7.14.

Since (0) is a T -invariant subspace of V , it follows that if f(T )(v) = 0, then pv(x) | f(x).

Proposition 7.26. Let v ∈ V and suppose that deg pv(x) = m. Then v, Tv, T^2v, . . . , T^{m−1}v is a basis of k[T ]v and dim k[T ]v = deg pv(x).

Proof. Let pv(x) = x^m + a_{m−1}x^{m−1} + · · ·+ a1x+ a0. Since pv(T )(v) = 0, we have T^m v = −(a_{m−1}T^{m−1}v + · · · + a1Tv + a0v). It follows easily that k[T ]v is spanned by v, Tv, T^2v, . . . , T^{m−1}v.

Suppose that ∑_{i=0}^{m−1} bi T^i v = 0 is a nontrivial linear dependence relation. Then g(x) = b_{m−1}x^{m−1} + · · · + b1x + b0 is a nonzero polynomial with deg g(x) ≤ m − 1 < deg pv(x) such that g(T )(v) = 0. Since we may multiply by a scalar to obtain a monic polynomial with the same property, this contradicts the definition of pv(x). Therefore, v, Tv, T^2v, . . . , T^{m−1}v is a linearly independent set and thus forms a basis of k[T ]v. It is now clear that dim k[T ]v = deg pv(x).
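Proposition 7.26 also suggests how to compute deg pv(x) in coordinates: keep appending v, Tv, T^2v, . . . until the new vector depends on the previous ones; the number of independent vectors collected is dim k[T ]v = deg pv(x). A small sketch follows (SymPy assumed; the matrix and vector are hypothetical choices for this example).

    from sympy import Matrix

    A = Matrix([[2, 1, 0],
                [0, 2, 0],
                [0, 0, 5]])
    v = Matrix([0, 1, 1])

    krylov = [v]
    while True:
        nxt = A * krylov[-1]                         # the next power T^j v
        if Matrix.hstack(*krylov, nxt).rank() == len(krylov):
            break                                    # T^j v depends on v, ..., T^{j-1} v
        krylov.append(nxt)

    m = len(krylov)
    assert m == 3    # here p_v(x) = (x - 2)^2 (x - 5), so dim k[T]v = deg p_v(x) = 3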

Definition 7.27. Let W be a subspace of V . Then W is a T -admissible subspace of V if the following two statements hold.

1. W is a T -invariant subspace of V .

2. If y ∈ V , g ∈ k[x], and g(T )y ∈ W , then there exists z ∈ W such that g(T )z = g(T )y.

For example, (0) and V are T -admissible subspaces of V . For if W = (0), we may always take z = 0, while if W = V , then we may always take z = y.
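For a less trivial example, Theorem 7.30 below shows that any T -invariant subspace W with a T -invariant complement is T -admissible; in that case the W -component of y always serves as the required z. A tiny sketch of one such instance (SymPy assumed; the matrix, vector, and polynomial are hypothetical choices):

    from sympy import Matrix, eye

    A = Matrix([[2, 0],
                [0, 3]])            # T in the basis e1, e2; W = span{e1} is T-invariant
    y = Matrix([5, 7])

    g_of_A = A - 3 * eye(2)         # g(x) = x - 3 kills the span{e2}-component of y
    assert (g_of_A * y)[1] == 0     # so g(T)y lies in W (second coordinate is 0)

    z = Matrix([y[0], 0])           # the W-component of y
    assert g_of_A * z == g_of_A * y # the z demanded by Definition 7.27(2)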

The following lemma constitutes the key step in the proofs of Theorems 7.30 and 7.31 below.


Lemma 7.28. Let W ⊊ V be a T -admissible subspace. Suppose that f(x) ∈ k[x] is a nonzero polynomial such that f(T ) = 0. Then there exists v ∈ V , v ∉ W , such that

1. W ∩ k[T ]v = (0)

2. W + k[T ]v is a T -admissible subspace of V .

Proof. Choose y ∈ V such that deg p(y,W ) is maximal. This is possible because p(y,W ) | f(x) and hence deg p(y,W ) ≤ deg f(x). We have y ∉ W because W ⊊ V . (If y ∈ W , then deg p(y,W ) = 0, while any element of V not in W gives degree at least 1.) Let g(x) = p(y,W ). Then g(T )y ∈ W . Since W is T -admissible, there exists y′ ∈ W such that g(T )y = g(T )y′.

Let v = y − y′. Thus v ∉ W because y ∉ W and y′ ∈ W . We now show that W ∩ k[T ]v = (0). Suppose that h ∈ k[x] and h(T )v ∈ W . Then

h(T )y − h(T )y′ = h(T )(y − y′) = h(T )v ∈ W.

We have h(T )y′ ∈ W because y′ ∈ W and W is T -invariant. Thus h(T )y ∈ W . Therefore, g(x) | h(x). We have

g(T )v = g(T )(y − y′) = g(T )y − g(T )y′ = 0.

Thus h(T )v = 0 because g(x) | h(x). Therefore W ∩ k[T ]v = (0). This proves (1).

For (2), we first note that W + k[T ]v is T -invariant because W and k[T ]v are each T -invariant.

Suppose that u ∈ V and h(T )u ∈ W + k[T ]v. We must find u′ ∈ W + k[T ]v such that h(T )u = h(T )u′. We can assume that h(x) = p(u,W + k[T ]v) because if h(T )u = h(T )u′, then s(T )h(T )u = s(T )h(T )u′ for all s(x) ∈ k[x].

Let h(T )u = w + j(T )v, where w ∈ W and j(x) ∈ k[x]. We now prove that h(x) | j(x). Use the division algorithm to write j(x) = h(x)q(x) + r(x) where either r(x) = 0 or deg r(x) < deg h(x). Let z = u − q(T )v. Then z − u ∈ k[T ]v, and so p(z,W + k[T ]v) = p(u,W + k[T ]v) = h(x). We have

h(T )z = h(T )(u− q(T )v) = h(T )u− h(T )q(T )v

= w + j(T )v − h(T )q(T )v = w + (j(T )− h(T )q(T ))v = w + r(T )v.

Let n(x) = p(z,W ). Since h(x) = p(z,W + k[T ]v), it follows that h(x) | n(x). Let n(x) = h(x)m(x). Then

n(T )z = m(T )h(T )z = m(T )(w + r(T )v) = m(T )w +m(T )r(T )v.


Since n(T )z ∈ W (because n(x) = p(z,W )) and m(T )w ∈ W , we have m(T )r(T )v ∈ W . If r(x) ≠ 0, then

deg(m(x)r(x)) ≥ deg p(v,W ) = deg p(y,W ) ≥ deg p(z,W ) = deg n(x) = deg(h(x)m(x)).

(The y above is the y from the proof of (1). We also used that deg p(y,W ) is maximal.)

Thus deg r(x) ≥ deg h(x), a contradiction. Therefore, r(x) = 0 and h(x) | j(x). Thus j(x) = h(x)q(x).

From above, h(T )z = w + r(T )v = w. Since w = h(T )z and W is T -admissible, we have w = h(T )z = h(T )z′ for some z′ ∈ W . Thus,

h(T )u = w + j(T )v = h(T )z′ + h(T )q(T )v = h(T )(z′ + q(T )v) = h(T )u′,

where u′ = z′ + q(T )v ∈ W + k[T ]v. Therefore, W + k[T ]v is a T -admissible subspace of V .

Lemma 7.29. Let W be a subspace of V and suppose that W = W1 ⊕ · · · ⊕ Wm. Assume that W1, . . . ,Wm are each T -invariant subspaces of V . If W is a T -admissible subspace of V , then W1, . . . ,Wm are each T -admissible subspaces of V .

Proof. Let y ∈ V and suppose that g(T )y ∈ Wi. Since Wi ⊆ W , there exists z ∈ W such that g(T )y = g(T )z. Let z = z1 + · · · + zm where zj ∈ Wj. Since each Wj is T -invariant, we have g(T )y = g(T )z = g(T )z1 + · · · + g(T )zm with g(T )zj ∈ Wj, 1 ≤ j ≤ m. Since g(T )y ∈ Wi and the sum of the Wj is direct, the components g(T )zj with j ≠ i must vanish, so g(T )y = g(T )zi with zi ∈ Wi. Thus each Wi is a T -admissible subspace of V .

Theorem 7.30. Assume that f(T ) = 0 for some nonzero polynomial f ∈ k[x]. The following statements are equivalent for a subspace W of V .

1. W is a T -admissible subspace of V .

2. W is a T -invariant subspace of V and there exists a T -invariant subspace W ′ of V such that V = W ⊕W ′.

Proof. First assume that (2) holds. Since V is a T -admissible subspace, Lemma 7.29 implies that W is a T -admissible subspace of V and (1) holds.

Now assume that (1) holds. If W = V , then let W ′ = 0. We may now suppose that W ⊊ V . Let S denote the set of subspaces Y of V satisfying the following three conditions.


1. Y is a T -invariant subspace of V .

2. W ∩ Y = (0).

3. W + Y is a T -admissible subspace of V .

Then S is a nonempty set because (0) ∈ S. The set S is partially ordered by inclusion. Let W ′ be a maximal element of S. If dimV is finite, then it is clear that W ′ exists. Otherwise the existence of W ′ follows from Zorn’s Lemma.

We have W + W ′ = W ⊕W ′ by (2). We now show that V = W ⊕W ′. Suppose that W ⊕W ′ ⊊ V . Then Lemma 7.28 implies that there exists v ∈ V , v ∉ W ⊕W ′, such that (W ⊕W ′) ∩ k[T ]v = (0) and (W ⊕W ′) + k[T ]v is a T -admissible subspace of V . Then (W ⊕W ′) + k[T ]v = W ⊕W ′ ⊕ k[T ]v. We now check that W ′ ⊕ k[T ]v ∈ S. We have that W ′ ⊕ k[T ]v is T -invariant because W ′ and k[T ]v are each T -invariant. We have W ∩ (W ′ ⊕ k[T ]v) = 0 because of the direct sum decomposition. Finally, W + (W ′ ⊕ k[T ]v) is T -admissible. Thus W ′ ⊕ k[T ]v ∈ S, which contradicts the maximality of W ′ because W ′ ⊊ W ′ ⊕ k[T ]v. Therefore, V = W ⊕W ′ and (2) holds.

Theorem 7.31. Let V be a finite dimensional vector space over k. Then V is a direct sum of T -cyclic subspaces of V . One summand k[T ]v can be chosen such that k[T ]v is a maximal T -cyclic subspace of V .

Proof. Let W be a maximal subspace of V with the property that W is a T -admissible subspace of V and is also a direct sum of T -cyclic subspaces of V . (The set of such subspaces is nonempty because (0) is a T -cyclic subspace of V that is also T -admissible.) If W ⊊ V , then Lemma 7.28 and Proposition 7.1 imply that there exists v ∈ V , v ∉ W , such that W ∩ k[T ]v = (0) and W + k[T ]v is a T -admissible subspace of V . Then W + k[T ]v = W ⊕ k[T ]v and W ⊊ W ⊕ k[T ]v. This contradicts the maximality of W , and thus W = V .

Now we show that one summand k[T ]v can be chosen such that k[T ]v is a maximal T -cyclic subspace of V . Suppose that v ∈ V is such that k[T ]v is a maximal T -cyclic subspace of V . Let W = k[T ]v. The proof of Lemma 7.28 shows that W is a T -admissible subspace of V . (In the proof of Lemma 7.28, let W = (0), y = v, and y′ = 0 so that v = y − y′.) Theorem 7.30 implies that V = W ⊕W ′, where W ′ is a T -invariant subspace of V . The first part of this Theorem implies that W ′ is a direct sum of T -cyclic subspaces of V . Then V = k[T ]v ⊕W ′ is also a direct sum of T -cyclic subspaces of V .
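For a companion matrix the decomposition of Theorem 7.31 has a single summand, since the whole space is already T -cyclic. The following sketch (SymPy assumed; the polynomial x^3 − 2x + 1 is a hypothetical example) checks this for v = e1.

    from sympy import Matrix, eye

    # Companion matrix of p(x) = x^3 - 2x + 1.
    C = Matrix([[0, 0, -1],
                [1, 0,  2],
                [0, 1,  0]])
    v = Matrix([1, 0, 0])

    B = Matrix.hstack(v, C * v, C * C * v)
    assert B.det() != 0                              # v, Tv, T^2 v is a basis: V = k[T]v

    # p(C) = 0 and deg p = dim V, so p(x) is the T-order of v (and the minimal polynomial).
    assert C**3 - 2 * C + eye(3) == Matrix.zeros(3, 3)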


Exercises

1. Assume that dimV is finite and let T : V → V be a linear transformation. Then T is invertible if and only if det(T ) ≠ 0.

2. If β is an arbitrary basis of V , then T is diagonalizable if and only if [T ]ββ is diagonalizable.

3. Suppose [T ]ββ is the 2 × 2 diagonal matrix with diagonal entries a and b, where a ≠ b. Then pT (x) = (x− a)(x− b).

4. Fill in the details of the proof of Proposition 7.19.

5. Let V be the vector space over R consisting of all infinitely differentiable functions f : R→ R. Let T : V → V be the linear transformation given by differentiation. That is, T (f) = f ′. Show that there is no nonzero polynomial g ∈ R[x] such that g(T ) = 0.
