
Linear Algebra I
MAT 2141

Fall 2018

Alistair Savage

Department of Mathematics and Statistics

University of Ottawa

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License


Contents

Preface

1 Vector spaces
    1.1 Fields
    1.2 Vector spaces
    1.3 Some properties of vector spaces
    1.4 Linear combinations
    1.5 Subspaces

2 Linear maps
    2.1 Definition and examples
    2.2 Kernel and image
    2.3 Vector spaces of linear maps
    2.4 Isomorphisms

3 Structure of vector spaces
    3.1 Spans and generating sets
    3.2 Linear dependence/independence
    3.3 Finitely generated vector spaces
    3.4 Basis and dimension
    3.5 The Dimension Theorem
    3.6 Dimensions of spaces of linear maps
    3.7 Dual spaces

4 Matrices
    4.1 The matrix of a linear map
    4.2 Change of bases and similar matrices
    4.3 Gaussian elimination
    4.4 The rank of a matrix

5 Determinants and multilinear maps
    5.1 Multilinear maps
    5.2 The determinant
    5.3 Characterizing properties of the determinant
    5.4 Other properties of the determinant

6 Inner product spaces
    6.1 Definitions
    6.2 Orthogonality
    6.3 Adjoints

A A taste of abstract algebra
    A.1 Operations on sets
    A.2 Use of parentheses
    A.3 Identity elements
    A.4 Invertible elements
    A.5 Monoids
    A.6 Fields

B Quotient spaces and the First Isomorphism Theorem
    B.1 Equivalence relations and quotient sets
    B.2 Quotient vector spaces
    B.3 The First Isomorphism Theorem
    B.4 Another proof of the Dimension Theorem

Index


Preface

These are lecture notes for the course Linear Algebra I (MAT 2141) at the University of Ottawa. In this course, we will take a more abstract approach to linear algebra than the one taken in MAT 1341 (the prerequisite for this course). Instead of working only with real or complex numbers, we will generalize to the setting where our coefficients lie in a field. The real numbers and complex numbers are both examples of fields, but there are others as well. We will revisit familiar topics such as matrices, linear maps, determinants, and diagonalization, but now in this more general setting and at a deeper level. We will also discuss more advanced topics such as dual spaces, multilinear maps, and inner product spaces. Compared to MAT 1341, this course will concentrate more on justification of results and mathematical rigour, as opposed to computation. Almost all results will be accompanied by a proof, and students will be expected to do proofs on assignments and exams.

The appendices contain some introductory abstract algebra and discuss the topic of quotient vector spaces, including the important First Isomorphism Theorem. This material will not be covered in the course, and is included here only for the interested student who would like to delve further into the subject matter.

Notation: In this course, N = Z≥0 = {0, 1, 2, 3, . . . } denotes the set of nonnegative integers. If Y is a subset of a set X, then

X \ Y = {x ∈ X | x ∉ Y}.

Acknowledgement: Portions of these notes are based on lecture notes by Barry Jessup, Daniel Daigle, and Kirill Zainoulline.

Alistair Savage

Course website: http://alistairsavage.ca/mat2141


Chapter 1

Vector spaces

In this chapter we introduce vector spaces, which form the central subject of the course. You have seen vector spaces in MAT 1341, and so much of the material in this chapter should be familiar. The topics in this chapter roughly correspond to [Tre, §1.1, §1.2, §1.7].

1.1 Fields

In MAT 1341, you did linear algebra over the real numbers and complex numbers. In fact, linear algebra can be done in a much more general context. For much of linear algebra, one can take the “scalars” from any mathematical object known as a field. The topic of fields is an interesting subject in its own right. We will not undertake a detailed study of fields in this course; the interested student is referred to Appendix A.6 for a discussion of this topic.

For the purposes of this course, we will use the word field to mean a subfield of the complex numbers or a finite field, which we now explain.

Definition 1.1.1 (Subfield of C). A subfield of the complex numbers C is a subset F ⊆ C that

(a) contains 1,

(b) is closed under addition (that is, x + y ∈ F for all x, y ∈ F),

(c) is closed under subtraction (that is, x − y ∈ F for all x, y ∈ F),

(d) is closed under multiplication (that is, xy ∈ F for all x, y ∈ F),

(e) is closed under taking the multiplicative inverse of a nonzero element (that is, x^{−1} ∈ F for all x ∈ F, x ≠ 0).

Definition 1.1.2 (Finite fields). Let p be a prime number. The finite field with p elements is the set

Zp = {0, 1, . . . , p − 1}

of integers modulo p, together with its usual addition and multiplication:

x + y = remainder after dividing the integer sum x + y by p,

x · y = remainder after dividing the integer product x · y by p.


Strictly speaking, the elements of Zp are equivalence classes x̄ of integers x modulo p. As long as the context is clear, we omit the bars and, for instance, write 2 instead of 2̄.

Example 1.1.3 (The field F2). We have F2 = {0, 1}, where 0 ≠ 1. The operations + and · on F2 are defined by:

    + | 0 1          · | 0 1
    --+-----         --+-----
    0 | 0 1          0 | 0 0
    1 | 1 0          1 | 0 1

Example 1.1.4. In F5, we have

3 · 4 = 2, 2 + 4 = 1, −2 = 3, 2^{−1} = 3.
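The arithmetic of Definition 1.1.2 is easy to experiment with. Below is a minimal Python sketch of F5 (the helper names `add`, `mul`, `neg`, and `inv` are our own, not notation from the notes); it reproduces the computations of Example 1.1.4.

```python
# Arithmetic in F_p as "remainder after dividing by p" (Definition 1.1.2).
# The helper names are illustrative, not from the notes.
p = 5

def add(x, y):
    return (x + y) % p

def mul(x, y):
    return (x * y) % p

def neg(x):
    return (-x) % p

def inv(x):
    # Brute-force search for a multiplicative inverse. This always
    # succeeds for nonzero x because p is prime (cf. Remark 1.1.5).
    for y in range(1, p):
        if mul(x, y) == 1:
            return y
    raise ValueError(f"{x} has no inverse modulo {p}")

# The computations of Example 1.1.4:
assert mul(3, 4) == 2
assert add(2, 4) == 1
assert neg(2) == 3
assert inv(2) == 3
```

Replacing `p = 5` with a composite number such as 6 makes `inv` fail for some nonzero inputs, which is exactly the point of Exercise 1.1.3.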

Remark 1.1.5. We require p to be prime in Definition 1.1.2 since we want all nonzero elements of a field to have multiplicative inverses. See Exercise 1.1.3.

For the purposes of this course, the term field will mean either a subfield of C or a finite field. (See Definition A.6.1 for the more general definition of a field.)

Examples 1.1.6. (a) C, R, and Q are fields.

(b) The integers Z are not a field since they are not closed under taking the inverse of a nonzero element.

(c) The set R≥0 = {x ∈ R | x ≥ 0} is not a field because it is not closed under subtraction.

Example 1.1.7 (The field Q(√2)). Let

Q(√2) := {x + y√2 | x, y ∈ Q}.

Then

Q ⊊ Q(√2) ⊊ R.

The fact that Q ≠ Q(√2) follows from the fact that √2 is not a rational number. The fact that Q(√2) ≠ R follows, for example, from the fact that √3 ∉ Q(√2). We leave it as an exercise (Exercise 1.1.1) to show that Q(√2) is a field.

Examples 1.1.8. (a) Q(√3) is also a subfield of C.

(b) Q(i) = Q(√−1) is a subfield of C. We have Q ⊊ Q(i) ⊊ C and Q(i) ⊈ R.

We write F× for the set of nonzero elements of F, that is,

F× = F \ {0}.

We will see that most of the linear algebra that you saw in MAT 1341 can be done over any field. That is, fields can form the “scalars” in systems of equations and vector spaces. For example, we can solve the system

    x1 − 2x2 = 2
    x1 + x2 = 0        (1.1)

in R, C, Q, or F3 (Exercise 1.1.4).
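To see concretely that systems of linear equations make sense over a finite field, here is a hedged sketch of Cramer's rule for a 2 × 2 system over F_p (p prime). The function name and the sample system are our own inventions, and the example deliberately differs from system (1.1) so as not to give away Exercise 1.1.4.

```python
def solve_2x2_mod_p(a, b, c, d, e, f, p):
    """Solve a*x1 + b*x2 = e and c*x1 + d*x2 = f over F_p (p prime).

    Returns (x1, x2) when the determinant is nonzero in F_p, and None
    otherwise (no solution or infinitely many).
    """
    det = (a * d - b * c) % p
    if det == 0:
        return None
    # Fermat's little theorem: det^(p-2) is the inverse of det in F_p.
    det_inv = pow(det, p - 2, p)
    x1 = ((e * d - b * f) * det_inv) % p
    x2 = ((a * f - e * c) * det_inv) % p
    return (x1, x2)

# x1 + 2*x2 = 1 and 3*x1 + 4*x2 = 0, solved over F_5:
x1, x2 = solve_2x2_mod_p(1, 2, 3, 4, 1, 0, 5)
assert (x1 + 2 * x2) % 5 == 1 and (3 * x1 + 4 * x2) % 5 == 0
```

The same formulas work verbatim over Q or R with ordinary division in place of the modular inverse; only the meaning of the arithmetic changes.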


Exercises.

1.1.1. Prove that Q(√2), as defined in Example 1.1.7, is a subfield of C.

1.1.2. Show that √3 ∉ Q(√2), and so Q(√2) and Q(√3) are different fields.

1.1.3. Consider the set Z6 of integers modulo 6, together with the multiplication and addition of Definition 1.1.2. Show that 2 has no multiplicative inverse.

1.1.4. Find all the solutions to the system (1.1) over the fields R, C, Q, and F3.

1.2 Vector spaces

For the remainder of the chapter, F is a field.

Definition 1.2.1 (Vector space). A vector space over F is

• a set V (whose objects are called vectors),

• a binary operation + on V called vector addition, and

• scalar multiplication: for each c ∈ F and v ∈ V, an element cv ∈ V, called the scalar product of c and v,

such that the following axioms are satisfied:

(V1) For all u, v ∈ V, we have u + v = v + u. (commutativity of vector addition)

(V2) For all u, v, w ∈ V, we have (u + v) + w = u + (v + w). (associativity of vector addition)

(V3) There is an element 0 ∈ V such that, for all v ∈ V, v + 0 = 0 + v = v. The element 0 is unique and is called the zero vector.

(V4) For any v ∈ V, there exists an element −v ∈ V such that v + (−v) = 0. The element −v is uniquely determined by v and is called the additive inverse of v.

(V5) For all a ∈ F and u, v ∈ V, we have a(u + v) = au + av. (distributivity of scalar multiplication over vector addition)

(V6) For all a, b ∈ F and v ∈ V, we have (a + b)v = av + bv. (distributivity of scalar multiplication over field addition)

(V7) For all a, b ∈ F and v ∈ V, we have a(bv) = (ab)v. (compatibility of scalar multiplication with field multiplication)

(V8) For all v ∈ V, we have 1v = v, where 1 denotes the multiplicative identity in F. (unity law)


Remark 1.2.2 (Notation). In the setting of vector spaces, elements of F will be called scalars. Some references use boldface for vectors (e.g. v), while others use arrows over vectors (e.g. ~v). We will only use boldface for the zero vector, to distinguish it from the zero element 0 of the field F. In class, we will write ~0 for the zero vector (since boldface is hard to write on a blackboard).

Definition 1.2.3 (Real and complex vector spaces). When F = R in Definition 1.2.1, V is called a real vector space. When F = C in Definition 1.2.1, V is called a complex vector space.

Examples 1.2.4. (a) For each positive integer n, R^n is a real vector space and C^n is a complex vector space. Here the vector addition and scalar multiplication are the operations you learned in MAT 1341.

(b) Suppose F is a field. For each positive integer n,

F^n = {(x1, . . . , xn) | x1, . . . , xn ∈ F}

is a vector space over F with operations defined by

(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn),

c(x1, . . . , xn) = (cx1, . . . , cxn),

for c, x1, . . . , xn, y1, . . . , yn ∈ F. The previous examples of R^n and C^n are special cases of this example (where F = R and F = C).

(c) Suppose F is a field. Then for positive integers m and n,

Mm,n(F ) = {A | A is an m× n matrix with entries in F}

is a vector space over F with the usual operations of matrix addition and scalar multiplication. Note that matrix multiplication does not play a role when we consider Mm,n(F) as a vector space.

(d) In addition to being a complex vector space, C is also a real vector space. See Exercise 1.2.1. In addition, R and C are both vector spaces over Q.

For the next example, recall that two functions f, g : X → Y are equal, and we write f = g, if f(x) = g(x) for all x ∈ X (see Definition A.5.4).

Example 1.2.5 (Function spaces). Suppose X is a nonempty set and F is a field. Let F(X, F) or F^X denote the set of functions from X to F (i.e. F-valued functions on X). When X = F, we sometimes write F(F) instead of F(F, F). If f, g ∈ F(X, F) and c ∈ F, we define functions f + g, cf ∈ F(X, F) by

(f + g)(x) = f(x) + g(x), (cf)(x) = cf(x), ∀ x ∈ X.

So we define addition and scalar multiplication pointwise. Let 0 ∈ F(X, F) be the function defined by 0(x) = 0 for all x ∈ X (note the important difference between the zero function and the number zero). For f ∈ F(X, F), define −f ∈ F(X, F) by (−f)(x) = −f(x) for all x ∈ X. With these definitions, F(X, F) is a vector space over F.

For example, we can verify the distributivity axiom as follows: For all f, g ∈ F(X, F) and c ∈ F, we need to show that c(f + g) = cf + cg. These two functions are equal if they are equal at all points of X, that is, if

(c(f + g))(x) = (cf + cg)(x) ∀ x ∈ X.

Now, since we know that distributivity holds in the field F , for all x ∈ X we have

(c(f+g))(x) = c(f+g)(x) = c(f(x)+g(x)) = cf(x)+cg(x) = (cf)(x)+(cg)(x) = (cf+cg)(x).

Thus c(f + g) = cf + cg, and so the distributivity axiom in Definition 1.2.1 holds. We leave it as an exercise (Exercise 1.2.2) to verify the remaining axioms of Definition 1.2.1.
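The pointwise definitions of Example 1.2.5 translate almost verbatim into code. The sketch below models elements of F(X, F) as Python functions, with F approximated by the integers; `add` and `scale` are our own names for the two operations, not notation from the notes.

```python
# Pointwise operations on functions, as in Example 1.2.5.
def add(f, g):
    return lambda x: f(x) + g(x)   # (f + g)(x) = f(x) + g(x)

def scale(c, f):
    return lambda x: c * f(x)      # (cf)(x) = c * f(x)

zero = lambda x: 0                 # the zero function 0(x) = 0

f = lambda x: x * x
g = lambda x: 2 * x

# Spot-check the distributivity computation above, c(f + g) = cf + cg,
# at a handful of sample points (a check, not a proof).
c = 3
lhs = scale(c, add(f, g))
rhs = add(scale(c, f), scale(c, g))
assert all(lhs(x) == rhs(x) for x in [-2, 0, 1, 5])
assert all(add(f, zero)(x) == f(x) for x in [-2, 0, 1, 5])
```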

Example 1.2.6 (C^∞(R)). Consider the field F = R. Let

C^∞(R) = {f | f : R → R has derivatives of all orders} ⊆ F(R, R).

As in Example 1.2.5, we define addition and scalar multiplication pointwise. For example,

sin x, cos x, e^x, x^2 + x^3 ∈ C^∞(R),

since these functions have derivatives of all orders (they are infinitely differentiable). We claim that C^∞(R) is a vector space over R.

First of all, we must ask ourselves if the operations of addition and scalar multiplication are well-defined on the set C^∞(R). But we know from calculus that if the nth derivatives of f and g exist, then so do the nth derivatives of f + g and cf, for all c ∈ R (we have (f + g)^(n) = f^(n) + g^(n) and (cf)^(n) = cf^(n)). Thus, C^∞(R) is closed under addition and scalar multiplication, and so the operations are well-defined.

Next, we must check that the axioms of Definition 1.2.1 are satisfied. Since the zero function 0 is infinitely differentiable (0′ = 0, 0′′ = 0, etc.), we have 0 ∈ C^∞(R). The axioms of Definition 1.2.1 then hold since they hold in F(R, R). Thus C^∞(R) is a vector space over R.

Example 1.2.7. Let F = R and

V = {f ∈ C^∞(R) | f′′ + f = 0} ⊆ C^∞(R) ⊆ F(R, R).

For example, if f(x) = sin x, then f′(x) = cos x, f′′(x) = −sin x, and so

f′′(x) + f(x) = −sin x + sin x = 0,

and so sin x ∈ V. To show that V is a vector space over R, we need to show that it is closed under addition and scalar multiplication (then the rest of the axioms of Definition 1.2.1 will follow from the fact that F(R, R) is a vector space). If f, g ∈ V, then

(f + g)′′ + (f + g) = f ′′ + g′′ + f + g = (f ′′ + f) + (g′′ + g) = 0 + 0 = 0,

and so f + g ∈ V . Thus V is closed under addition. Now, if f ∈ V and c ∈ R, we have

(cf)′′ + cf = cf ′′ + cf = c(f ′′ + f) = c0 = 0,

and so cf ∈ V . Thus V is a vector space over R.


Example 1.2.8 (Polynomials). Suppose F is the field R, C or Q. Let

P(F) = {p | p is a polynomial with coefficients in F}
     = {p(t) = a_n t^n + a_{n−1} t^{n−1} + · · · + a_0 | n ∈ N, a_i ∈ F for all i}.

Here t is an “indeterminate” (a formal symbol that is manipulated as if it were a number). For n ∈ N, let

Pn(F) = {p ∈ P(F) | p = 0 or deg p ≤ n}
      = {a_n t^n + a_{n−1} t^{n−1} + · · · + a_0 | a_i ∈ F for all i} ⊆ P(F).

Note the difference between P(F) and Pn(F). The elements of P(F) are polynomials of any degree, while the elements of Pn(F) are polynomials of degree at most n.

Remark 1.2.9. A polynomial p(t) = a_n t^n + a_{n−1} t^{n−1} + · · · + a_0 defines a function p : F → F. Specifically, p(t) defines the function which maps c ∈ F to p(c) = a_n c^n + a_{n−1} c^{n−1} + · · · + a_0. There is a subtle difference between the polynomial p(t) and the polynomial function p.

Example 1.2.10 (Infinite sequences). Suppose F is a field. Let

V = {(a1, a2, . . . ) | ai ∈ F ∀ i}

be the set of all infinite sequences of elements of F. We define addition and scalar multiplication componentwise. That is,

(a1, a2, . . . ) + (b1, b2, . . . ) = (a1 + b1, a2 + b2, . . . )

and

c(a1, a2, . . . ) = (ca1, ca2, . . . )

for (a1, a2, . . . ), (b1, b2, . . . ) ∈ V, c ∈ F. We leave it as an exercise (Exercise 1.2.5) to show that V is a vector space over F.

Definition 1.2.11 (Product vector space). Suppose V1, V2, . . . , Vn are vector spaces over a field F. Define

V1 × V2 × · · · × Vn = {(v1, v2, . . . , vn) | vi ∈ Vi, 1 ≤ i ≤ n}.

We write (v1, v2, . . . , vn) = (w1, w2, . . . , wn) if vi = wi for all 1 ≤ i ≤ n. We define addition and scalar multiplication componentwise:

(v1, v2, . . . , vn) + (w1, w2, . . . , wn) = (v1 + w1, v2 + w2, . . . , vn + wn),

c(v1, v2, . . . , vn) = (cv1, cv2, . . . , cvn),

for (v1, v2, . . . , vn), (w1, w2, . . . , wn) ∈ V1 × V2 × · · · × Vn and c ∈ F. We call V1 × V2 × · · · × Vn the product vector space of V1, V2, . . . , Vn. See Exercise 1.2.6.
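Definition 1.2.11 (like Example 1.2.10) is componentwise through and through. Here is a minimal sketch for finite tuples over the integers; the function names are our own.

```python
# Componentwise addition and scalar multiplication, as in Definition 1.2.11.
def vec_add(v, w):
    return tuple(vi + wi for vi, wi in zip(v, w))

def vec_scale(c, v):
    return tuple(c * vi for vi in v)

v = (1, 2, 3)
w = (4, 5, 6)
assert vec_add(v, w) == (5, 7, 9)
assert vec_scale(2, v) == (2, 4, 6)
# The axioms of Definition 1.2.1 hold componentwise because they hold
# in each factor; e.g. commutativity of addition:
assert vec_add(v, w) == vec_add(w, v)
```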


Exercises.

1.2.1. Check that the conditions of Definition 1.2.1 are satisfied with V = C and F = R.

1.2.2. Verify the remaining axioms of Definition 1.2.1 in Example 1.2.5.

1.2.3. Show that P(F ) and Pn(F ) are vector spaces over F .

1.2.4. Fix a positive integer n. Why is the set

{a_n t^n + a_{n−1} t^{n−1} + · · · + a_0 | a_i ∈ F for all i, a_n ≠ 0}

of polynomials of degree exactly n not a vector space over F?

1.2.5. Show that V, as defined in Example 1.2.10, is a vector space.

1.2.6. Show that the product vector space (Definition 1.2.11) is actually a vector space over F. That is, show that it satisfies the axioms of Definition 1.2.1. You will need to use the fact that each Vi, 1 ≤ i ≤ n, is a vector space over F.

1.2.7 ([Ber14, Ex. 1.3.4]). Suppose V is a real vector space. Let W = V × V be the (real) product vector space and define multiplication by complex scalars by the formula

(a + bi)(u, v) = (au − bv, bu + av), a, b ∈ R, (u, v) ∈ W.

Show that W satisfies the axioms for a complex vector space.

1.3 Some properties of vector spaces

In this section we deduce some basic properties of vector spaces that will be used throughout the course.

Theorem 1.3.1. Suppose V is a vector space over the field F .

(a) If 0′ ∈ V is a vector such that 0′ + v = v for all v ∈ V, then 0′ = 0. In other words, the zero vector is unique.

(b) If v + w = 0 for some v, w ∈ V, then w = −v. That is, the additive inverse of a vector is unique (or, negatives of vectors are unique).

(c) For all v ∈ V , we have −(−v) = v.

(d) For all v ∈ V , we have 0v = 0.

(e) For all c ∈ F , c0 = 0.


Proof. To prove (d), note that

0v + 0v = (0 + 0)v = 0v.

Adding −0v to both sides then gives

0v + 0v + (−0v) = 0v + (−0v) =⇒ 0v = 0.

We leave the proof of the remaining statements as Exercise 1.3.1.

Corollary 1.3.2. For every vector v and scalar c, we have

c(−v) = −(cv) = (−c)v.

Proof. To prove the first equality, note that

c(−v) + cv = c(−v + v) = c0 = 0.

Thus c(−v) = −cv by the uniqueness of negatives (Theorem 1.3.1(b)). Similarly,

cv + (−c)v = (c − c)v = 0v = 0,

and so (−c)v = −(cv) by the uniqueness of negatives.

Corollary 1.3.3. For every vector v, we have (−1)v = −v.

Proof. We have (−1)v = −(1v) = −v.

Theorem 1.3.4 (The zero vector has no divisors). Let V be a vector space, v ∈ V, and c a scalar. Then cv = 0 if and only if c = 0 or v = 0.

Proof. We already know from Theorem 1.3.1 that if c = 0 or v = 0, then cv = 0. So it remains to prove that if cv = 0, then either c = 0 or v = 0. Suppose cv = 0. Either c = 0 or c ≠ 0. If c = 0, then we’re done. So let’s consider the remaining case of c ≠ 0. Then, since c is a nonzero element of a field, it has a multiplicative inverse. Multiplying both sides of the equation cv = 0 by c^{−1} gives

c^{−1}cv = c^{−1}0 =⇒ 1v = 0 =⇒ v = 0.

Corollary 1.3.5 (Cancellation laws). Suppose u, v are vectors and c, d are scalars.

(a) If cu = cv and c 6= 0, then u = v.

(b) If cv = dv and v 6= 0, then c = d.

Proof. (a) Since c ≠ 0 and all nonzero elements of a field have multiplicative inverses, we can multiply both sides of the equation cu = cv by c^{−1} to get

c^{−1}(cu) = c^{−1}(cv) =⇒ (c^{−1}c)u = (c^{−1}c)v =⇒ 1u = 1v =⇒ u = v.


(b) cv = dv =⇒ cv + (−dv) = dv + (−dv) =⇒ (c − d)v = 0. Then, since v ≠ 0, we have c − d = 0 by Theorem 1.3.4. Adding d to both sides gives c − d + d = 0 + d =⇒ c + 0 = d =⇒ c = d.

Definition 1.3.6 (Subtraction of vectors). For vectors u, v, we define

u − v = u + (−v).

Exercises.

1.3.1. Complete the proof of Theorem 1.3.1.

1.3.2 ([Ber14, Ex. 1.4.1]). Prove that, in a vector space:

(a) u − v = 0 if and only if u = v;

(b) u + v = z if and only if u = z − v.

1.3.3 ([Ber14, Ex. 1.4.2]). Let V be a vector space over a field F .

(a) Show that if v ∈ V is a fixed nonzero vector, then the mapping f : F → V defined by f(c) = cv is injective.

(b) Show that if c is a fixed nonzero scalar, then the mapping g : V → V defined by g(v) = cv is bijective.

You should directly use the definition of injective and bijective mappings, and not use any properties of linear maps (a topic we have yet to discuss).

1.3.4 ([Ber14, Ex. 1.4.3]). Suppose that v is a fixed vector in a vector space V. Prove that the mapping

τ : V → V, τ(u) = u + v,

is bijective. (This map is called translation by the vector v.)

1.3.5 ([Ber14, Ex. 1.4.4]). Prove that if a is a nonzero scalar and v is a fixed vector in a vector space V, then the equation ax + v = 0 has a unique solution x ∈ V.

1.3.6. Suppose that, in some vector space V, a vector u ∈ V has the property that u + v = v for some v ∈ V. Prove that u = 0.


1.4 Linear combinations

Definition 1.4.1 (Linear combination). Suppose V is a vector space over a field F. If v1, v2, . . . , vn ∈ V and c1, c2, . . . , cn ∈ F, then the vector

c1v1 + c2v2 + · · ·+ cnvn

is called a linear combination of v1, v2, . . . , vn. The scalars c1, c2, . . . , cn are called coefficients.

Example 1.4.2. In the vector space F(R,R),

2 sin x − 3e^x + 5x^3 − 8|x|

is a linear combination of the vectors sin x, e^x, x^3, and |x|.

Example 1.4.3 (Wave functions in Physics). The theory of vector spaces plays an important role in many areas of physics. For example, in quantum mechanics, the space of wave functions that describe the state of a system of particles is a vector space. The so-called superposition principle, that a system can be in a linear combination of states, corresponds to the fact that in vector spaces, one can form linear combinations of vectors. This is the theory underlying the thought experiment known as Schrödinger's cat.

Definition 1.4.4 (Span). Suppose V is a vector space over F and v1, v2, . . . , vn ∈ V . Then

Span{v1, v2, . . . , vn} = {c1v1 + c2v2 + · · ·+ cnvn | c1, . . . , cn ∈ F}

is the set of all linear combinations of v1, v2, . . . , vn and is called the span of this set of vectors. When we wish to emphasize the field we’re working with (for instance, when we are working with multiple fields), we write Span_F{v1, . . . , vn}. Note that [Tre] uses the notation L{v1, . . . , vn}.

Example 1.4.5. Recall Pn(F) = {a_n t^n + · · · + a_0 | a_i ∈ F} from Example 1.2.8. We have

Pn(F) = Span{1, t, t^2, . . . , t^n},

where here 1 denotes the constant polynomial function 1(c) = 1 for all c ∈ F.

Example 1.4.6. Consider the real vector space F(R, R). Using the trigonometric sum formulas, we have

cos(x + π/4) = cos x cos(π/4) − sin x sin(π/4) = (1/√2) cos x − (1/√2) sin x.

Thus cos(x + π/4) is a linear combination of cos x and sin x, and so cos(x + π/4) ∈ Span{cos x, sin x}.
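The identity in Example 1.4.6 can be spot-checked numerically. The sketch below compares the two sides at a few sample points; this is a floating-point check at finitely many points, not a proof.

```python
import math

# cos(x + pi/4) versus the linear combination
# (1/sqrt(2)) cos x - (1/sqrt(2)) sin x from Example 1.4.6.
def lhs(x):
    return math.cos(x + math.pi / 4)

def rhs(x):
    return (1 / math.sqrt(2)) * math.cos(x) - (1 / math.sqrt(2)) * math.sin(x)

# Agreement up to floating-point rounding at several sample points:
assert all(abs(lhs(x) - rhs(x)) < 1e-12 for x in [0.0, 0.5, 1.3, -2.7])
```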

Examples 1.4.7. (a) Consider C as a vector space over C. We have C = Span_C{1}, since we can write z = z · 1 for any z ∈ C.

(b) Consider C as a vector space over R. Then C = Span_R{1, i}, since we can write any complex number as a + bi for some real numbers a, b.


(c) Consider R as a vector space over Q. Then Span_Q{1, √2} = Q(√2), the field of Example 1.1.7.

Example 1.4.8. Consider the vector space F^n over F, for some field F. Let

e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 0, 1).

So for 1 ≤ i ≤ n, ei has a 1 in the ith position and a zero everywhere else. Then

F^n = Span_F{e1, e2, . . . , en},

since every vector a = (a1, a2, . . . , an) ∈ F^n can be written as

a = a1e1 + a2e2 + · · · + anen = ∑_{k=1}^{n} ak ek.

Therefore every vector in F^n is a linear combination of the vectors e1, e2, . . . , en.
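Example 1.4.8 in code: the sketch below builds the standard vectors e1, . . . , en (here over the integers) and checks that a = a1e1 + · · · + anen for one concrete a. The function names are our own.

```python
# Standard vectors e_i and the decomposition a = a_1 e_1 + ... + a_n e_n
# from Example 1.4.8, over the integers.
def standard_basis(n):
    return [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]

def vec_add(v, w):
    return tuple(vi + wi for vi, wi in zip(v, w))

def vec_scale(c, v):
    return tuple(c * vi for vi in v)

a = (7, -2, 5)
combo = (0, 0, 0)
for ai, ei in zip(a, standard_basis(3)):
    combo = vec_add(combo, vec_scale(ai, ei))
assert combo == a  # a is a linear combination of e_1, e_2, e_3
```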

Exercises.

1.4.1. In C^4, express the vector (2 + i, 3 − 7i, 0, −6) as a linear combination of e1, e2, e3, e4 (see Example 1.4.8).

1.4.2. Show that, in any vector space, u− v is a linear combination of u and v.

1.4.3. For each of the following statements, decide whether the statement is true or false. Justify your answers.

(a) In R^3, the vector (3, 1, −7) is a linear combination of (1, 0, 1) and (2, 0, 3).

(b) In R^3, the vector (3, 0, −7) is a linear combination of (1, 0, 1) and (2, 0, 3).

(c) For any vector space V and u, v ∈ V, the vector u is a linear combination of u − v and u + v.

1.4.4 ([Ber14, Ex. 1.5.8]). In the vector space P(F) (Example 1.2.8), let

p(t) = 2t^3 − 5t^2 + 6t − 4, q(t) = t^3 + 6t^2 + 3t + 5, r(t) = 4t^2 − 3t + 7.

Is r a linear combination of p and q? Justify your answer.


1.5 Subspaces

Definition 1.5.1 (Subspace). Suppose V is a vector space over a field F. A subset U ⊆ V is called a subspace (or linear subspace) of V if

(a) 0 ∈ U ,

(b) U is closed under vector addition: u + v ∈ U for all u, v ∈ U, and

(c) U is closed under scalar multiplication: cu ∈ U for all u ∈ U and c ∈ F .

Theorem 1.5.2 (Subspaces are vector spaces). If U is a subspace of a vector space V , thenU is a vector space.

Proof. By Definition 1.5.1, vector addition and scalar multiplication are well-defined on U. Since 0 ∈ U and −v = (−1)v ∈ U for all v ∈ U, we see that axioms (V3) and (V4) of a vector space are satisfied. The remaining axioms hold since they hold in V (because V is a vector space).

Examples 1.5.3. (a) If V is a vector space over a field F, then {0} and V are both subspaces of V. These are called the trivial subspaces. They are different if V ≠ {0}.

(b) Suppose A ∈ Mm,n(F). Then

Ker A = {v ∈ F^n | Av = 0}

is a subspace of F^n. We check the axioms of Definition 1.5.1. Since A0 = 0, we have 0 ∈ Ker A. If u, v ∈ Ker A, then A(u + v) = Au + Av = 0 + 0 = 0, and so u + v ∈ Ker A. If v ∈ Ker A and c ∈ F, then A(cv) = c(Av) = c0 = 0, and so cv ∈ Ker A.
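A concrete instance of part (b): for the matrix A below over F5 (a matrix and kernel vectors of our own choosing), two kernel vectors stay in the kernel under addition and scalar multiplication, just as the general argument predicts.

```python
# Checking closure of Ker A for one matrix over F_5 (Example 1.5.3(b)).
p = 5

def mat_vec(A, v):
    # Matrix-vector product with entries reduced modulo p.
    return tuple(sum(aij * vj for aij, vj in zip(row, v)) % p for row in A)

A = [[1, 2],
     [2, 4]]        # the second row is twice the first, so Ker A != {0}
u = (3, 1)          # 1*3 + 2*1 = 5 = 0 in F_5
w = (1, 2)          # 1*1 + 2*2 = 5 = 0 in F_5
assert mat_vec(A, u) == (0, 0) and mat_vec(A, w) == (0, 0)

u_plus_w = tuple((x + y) % p for x, y in zip(u, w))
three_u = tuple((3 * x) % p for x in u)
assert mat_vec(A, u_plus_w) == (0, 0)  # closed under addition
assert mat_vec(A, three_u) == (0, 0)   # closed under scalar multiplication
```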

(c) Consider C as a vector space over R. Then Span_R{1} = R is a subspace of C over R.

(d) Consider C as a vector space over C. Then the only subspaces are {0} and C.

(e) More generally, for any field F, we can consider F as a vector space over itself. We leave it as an exercise (Exercise 1.5.1) to show that the only subspaces are {0} and F.

(f) The set

V = {f ∈ F(R, R) | f(x) ≥ 0 ∀ x ∈ R}

is not a subspace of F(R, R) because it does not contain the additive inverse of every element in V. For instance, f(x) = x^2 is a function in V, but the additive inverse −f is defined by (−f)(x) = −x^2, and so −f is not in V (since, for example, (−f)(1) = −1 < 0). The set V is also not closed under scalar multiplication (see Exercise 1.5.2).

(g) The set C^∞(R) is a subspace of F(R, R). The set

{f ∈ C^∞(R) | f′′ + f = 0}

is a subspace of C^∞(R) and a subspace of F(R, R).

(h) For any n ∈ N, Pn(F) is a subspace of P(F). When F = R, Pn(R) is even a subspace of C^∞(R), since polynomials are infinitely differentiable.


(i) For a field F, let

W = {p ∈ P(F) | p(1) = 0}.

We check that W is a subspace of P(F). Since 0(1) = 0, we have 0 ∈ W. Suppose p, q ∈ W. Then

(p + q)(1) = p(1) + q(1) = 0 + 0 = 0,

and so p + q ∈ W. Finally, if p ∈ W and c ∈ F, then

(cp)(1) = cp(1) = c · 0 = 0,

and so cp ∈ W. Thus, W is indeed a subspace of P(F).

(j) For a field F, let

V = {p ∈ P(F) | p(1) = 1}.

Since 0(1) = 0 ≠ 1, we have 0 ∉ V and so V is not a subspace of P(F). The set V also violates the other axioms of a subspace, but we can stop here: as soon as a set violates one of the axioms, we know it is not a subspace.
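The closure argument of Example 1.5.3(b) can be replayed numerically. A minimal sketch with numpy; the matrix A and the kernel vectors below are illustrative choices, not taken from the notes:

```python
import numpy as np

# A maps R^3 -> R^2; its kernel is spanned by (1, 1, 1).
A = np.array([[1, -1, 0],
              [0, 1, -1]])

u = np.array([1, 1, 1])   # in Ker A, since Au = 0
v = np.array([2, 2, 2])   # also in Ker A
c = 5

# The zero vector, sums, and scalar multiples all stay in Ker A,
# exactly the three conditions checked in the example.
assert np.all(A @ np.zeros(3) == 0)
assert np.all(A @ (u + v) == 0)
assert np.all(A @ (c * u) == 0)
```

Of course, such spot checks only illustrate the subspace axioms on particular vectors; the proof in the example covers all of KerA at once.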

Theorem 1.5.4. If v1, v2, . . . , vn are vectors in a vector space V, then Span{v1, v2, . . . , vn} is a subspace of V.

Proof. The proof of this theorem is exactly like the proof of the special case where the field is R (which you saw in MAT 1341).

Definition 1.5.5 (Sum and intersection of subsets). Suppose M and N are subsets of a vector space V (note that they do not need to be subspaces). We define

M ∩N = {v ∈ V | v ∈M and v ∈ N}, and

M +N = {u+ v | u ∈M, v ∈ N}.

These are called the intersection and sum (respectively) of M and N .

Theorem 1.5.6. If U and W are subspaces of a vector space V, then U ∩ W and U + W are subspaces as well.

Proof. Let’s prove that U +W is a subspace. Since 0 ∈ U and 0 ∈ W , we have that

0 = 0 + 0 ∈ U +W.

Next we show that U + W is closed under vector addition. Suppose that v = u + w and v′ = u′ + w′ are two vectors in U + W (so u, u′ ∈ U and w, w′ ∈ W). Then

v + v′ = (u+ w) + (u′ + w′) = (u+ u′) + (w + w′).

Since U is a subspace, it is closed under vector addition and so u + u′ ∈ U. Similarly, since W is a subspace, w + w′ ∈ W. Thus v + v′ ∈ U + W. Finally, we show that U + W is closed under scalar multiplication. Suppose that v = u + w ∈ U + W (with u ∈ U and w ∈ W). Then

cv = c(u+ w) = cu+ cw.


Since U is a subspace, it is closed under scalar multiplication. Thus cu ∈ U. Similarly, cw ∈ W. Therefore, cv ∈ U + W. So we have shown that U + W is a subspace.

We leave it as an exercise (Exercise 1.5.3) to show that U ∩ W is a subspace of V. In fact, it is also a subspace of both U and W (since U ∩ W ⊆ U and U ∩ W ⊆ W).

Example 1.5.7. Suppose V = F 3, U = {(x, 0, 0) | x ∈ F}, W = {(0, y, 0) | y ∈ F}. We leave it as an exercise (Exercise 1.5.4) to show that U and W are subspaces of V, and that U + W = {(x, y, 0) | x, y ∈ F} (which is also a subspace). Note that

U = Span{(1, 0, 0)}, W = Span{(0, 1, 0)},
U + W = Span{(1, 0, 0), (0, 1, 0)}, U ∩ W = {0}.

Corollary 1.5.8. Suppose U,W are subspaces of a vector space V over F . Then,

(a) U ∩ W is the largest subspace of V contained in both U and W (that is, if X is any subspace of V such that X ⊆ U and X ⊆ W, then X ⊆ U ∩ W), and

(b) U + W is the smallest subspace of V containing both U and W (that is, if Y is any subspace of V such that U ⊆ Y and W ⊆ Y, then U + W ⊆ Y).

Proof. (a) Suppose X is a subspace of V such that X ⊆ U and X ⊆ W. Then X ⊆ U ∩ W by the definition of the intersection U ∩ W.

(b) Suppose Y is a subspace of V such that U ⊆ Y and W ⊆ Y. Let v ∈ U + W. Then v = u + w for some u ∈ U and w ∈ W. Since u ∈ U ⊆ Y, we have u ∈ Y. Similarly, w ∈ W ⊆ Y implies w ∈ Y. Since Y is a subspace, it is closed under vector addition and so v = u + w ∈ Y. So we have shown that every element of U + W is an element of Y. Therefore U + W ⊆ Y.

Definition 1.5.9 (Direct sum). Suppose V is a vector space and U, W are subspaces of V such that

(a) U ∩W = {0}, and

(b) U +W = V .

We say that V is the direct sum of U and W and write V = U ⊕W .

Example 1.5.10. Suppose F is a field, V = F 2,

U = {(x, 0) | x ∈ F}, and W = {(0, x) | x ∈ F}.

Then U ∩ W = {0} and U + W = V since any vector (x, y) ∈ V can be written as (x, y) = (x, 0) + (0, y) ∈ U + W. Thus V = U ⊕ W. Sometimes this is written as F 2 = F ⊕ F, where we identify U and W with F (by considering only the nonzero component).
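The decomposition in Example 1.5.10 is easy to make concrete. A small sketch in plain Python, taking F = R and representing vectors as pairs:

```python
def decompose(v):
    """Split v = (x, y) in R^2 into its unique U- and W-components,
    where U = {(x, 0)} and W = {(0, y)}."""
    x, y = v
    return (x, 0), (0, y)

u, w = decompose((3, -7))
assert u == (3, 0) and w == (0, -7)
# The components recombine to the original vector; since the only
# vector lying in both U and W is (0, 0), the splitting is unique,
# which is exactly the direct-sum condition R^2 = U ⊕ W.
assert (u[0] + w[0], u[1] + w[1]) == (3, -7)
```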

Theorem 1.5.11. Suppose U and W are subspaces of a vector space V. The following statements are equivalent:

(a) V = U ⊕W ,


(b) For each v ∈ V, there are unique elements u ∈ U and w ∈ W such that v = u + w.

Proof. We first prove that (a) implies (b). So we assume (a) is true. Suppose v ∈ V. Since V = U + W, there exist u ∈ U and w ∈ W such that v = u + w. Now suppose v = u′ + w′ for some u′ ∈ U and w′ ∈ W. Then

0 = v − v = (u + w) − (u′ + w′) = (u − u′) + (w − w′) =⇒ u − u′ = w′ − w.

Now u − u′ ∈ U since U is a subspace (hence closed under vector addition and scalar multiplication). Similarly, w′ − w ∈ W. So u − u′ ∈ U and u − u′ = w′ − w ∈ W. Thus u − u′ ∈ U ∩ W. But U ∩ W = {0}. Therefore u − u′ = 0 and w′ − w = u − u′ = 0. So u = u′ and w = w′. Hence the representation of v in the form v = u + w is unique.

We leave it as an exercise (Exercise 1.5.7) to prove that (b) implies (a).

If U, W, and V satisfy the equivalent conditions of Theorem 1.5.11, we say that W is a complement to U in V. Of course, it follows that U is also a complement to W.

Remark 1.5.12. To show that S is a subspace of a vector space V over a field F, it is enough to show that 0 ∈ S and

u, v ∈ S, c, d ∈ F =⇒ cu+ dv ∈ S.

Why? Take c = d = 1 to see that S is closed under vector addition and then take d = 0 to see that S is closed under scalar multiplication.
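The single condition of Remark 1.5.12 also makes a convenient computational test: sampling pairs from a candidate set and checking cu + dv can refute the subspace property (though never prove it). A sketch; the predicates and sample points below are illustrative, not from the notes:

```python
import random

def looks_like_subspace(contains, samples, scalars, zero):
    """Randomly test the single closure condition of Remark 1.5.12:
    0 in S, and u, v in S, c, d in F  =>  cu + dv in S.
    A False result refutes the subspace property; True is only evidence."""
    random.seed(0)  # deterministic sampling for reproducibility
    if not contains(zero):
        return False
    for _ in range(100):
        u, v = random.choice(samples), random.choice(samples)
        c, d = random.choice(scalars), random.choice(scalars)
        w = tuple(c * ui + d * vi for ui, vi in zip(u, v))
        if not contains(w):
            return False
    return True

# The line y = 2x in R^2 passes; the half-plane y >= 0 (an analogue of
# Example 1.5.3(f)) fails, e.g. as soon as a negative scalar is drawn.
line = lambda p: abs(p[1] - 2 * p[0]) < 1e-9
assert looks_like_subspace(line, [(1, 2), (-3, -6)], [1, -1, 2.5], (0, 0))
upper = lambda p: p[1] >= 0
assert not looks_like_subspace(upper, [(0, 1), (1, 3)], [1, -1], (0, 0))
```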

Exercises.

1.5.1. Suppose F is a field and consider F as a vector space over itself. Show that the only subspaces are {0} and F.

1.5.2. Come up with an example to show that V, as defined in Example 1.5.3(f), is not closed under scalar multiplication.

1.5.3. Complete the proof of Theorem 1.5.6 by showing that U ∩W is a subspace of V .

1.5.4. Consider Example 1.5.7.

(a) Show that U and W are subspaces of V .

(b) Show that U +W = {(x, y, 0) | x, y ∈ F}.

1.5.5. Suppose V = F 3, U = {(x, 0, z) | x, z ∈ F}, W = {(y, y + z, z) | y, z ∈ F}.

(a) Show that U and W are subspaces of V and that

U = Span{(1, 0, 0), (0, 0, 1)}, W = Span{(1, 1, 0), (0, 1, 1)} = Span{(1, 0,−1), (0, 1, 1)}.

(b) Show that U +W = F 3.


(c) Show that U ∩W = Span{(1, 0,−1)}.

1.5.6 ([Ber14, Ex. 1.6.5]). Let M , N , and P be subspaces of a vector space V .

(a) Show that, if M ⊆ P, then P ∩ (M + N) = M + (P ∩ N). (This is called the modular law for subspaces.)

(b) Give an example to show that, in general, P ∩ (M + N) ≠ (P ∩ M) + (P ∩ N). Hint: Let V = R2 and let P, M, and N be three distinct lines through the origin.

1.5.7. Complete the proof of Theorem 1.5.11 by showing that (b) implies (a).

1.5.8. Let U = SpanR{(1, 1)} and V = SpanR{(1,−1)}. Show that R2 = U ⊕ V.

1.5.9. Let X be a nonempty set and let F be a field. Recall that F(X,F) is the set of functions from X to F and is a vector space over F (see Example 1.2.5). Let Y be a nonempty subset of X. Show that

V = {f ∈ F(X,F ) | f(x) = 0 ∀ x ∈ Y }

is a subspace of F(X,F ).

1.5.10 ([Ber14, Ex. 1.6.13]). Let V = F(R,R) be the real vector space of all functions x : R → R (see Example 1.2.5). We say that y ∈ V is even if y(−t) = y(t) for all t ∈ R and we say that z ∈ V is odd if z(−t) = −z(t) for all t ∈ R. Let

M = {y ∈ V | y is even} and N = {z ∈ V | z is odd}.

(a) Prove that M and N are subspaces of V and that V = M ⊕ N. Hint: If x ∈ V, consider the functions y and z defined by

y(t) = (1/2)(x(t) + x(−t)) and z(t) = (1/2)(x(t) − x(−t)).

(b) What does (a) say for x(t) = et? Hint: Think about hyperbolic trigonometric functions.

(c) What does (a) say for a polynomial function x?
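The hint in Exercise 1.5.10(a), and its specialization in part (b), can be sanity-checked numerically (this is evidence, not a proof). A sketch assuming x(t) = e^t, where the even and odd parts should come out as cosh and sinh:

```python
import math

def even_part(x):
    """The even function y(t) = (1/2)(x(t) + x(-t)) from the hint."""
    return lambda t: 0.5 * (x(t) + x(-t))

def odd_part(x):
    """The odd function z(t) = (1/2)(x(t) - x(-t)) from the hint."""
    return lambda t: 0.5 * (x(t) - x(-t))

x = math.exp
y, z = even_part(x), odd_part(x)

for t in [0.0, 0.5, -1.3, 2.0]:
    assert math.isclose(y(-t), y(t))            # y is even
    assert math.isclose(z(-t), -z(t))           # z is odd
    assert math.isclose(y(t) + z(t), x(t))      # x = y + z
    assert math.isclose(y(t), math.cosh(t))     # part (b): even part is cosh
    assert math.isclose(z(t), math.sinh(t))     # part (b): odd part is sinh
```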

1.5.11. Suppose

V = V1 × V2, M1 = {(x1,0) | x1 ∈ V1}, M2 = {(0, x2) | x2 ∈ V2}.

Prove that V = M1 ⊕M2.

1.5.12 ([Ber14, Ex. 1.6.18]). Let M and N be subspaces of a vector space V whose union M ∪ N is also a subspace of V. Prove that either M ⊆ N or N ⊆ M. Hint: Try proof by contradiction. Assume that neither of M, N is contained in the other. Then we can choose vectors y ∈ M, z ∈ N such that y ∉ N, z ∉ M. Think about the sum y + z.

1.5.13. Give an example of a vector space V with two subspaces U and W such that U ∪ W is not a subspace of V.


Chapter 2

Linear maps

As a general rule, whenever one introduces a new type of mathematical object (such as vector spaces), it is important to look at the natural maps between them. This helps us understand the relationships between these objects. In the case of vector spaces, the natural maps between them are linear maps. The topics in this chapter roughly correspond to [Tre, §§1.3–1.6].

2.1 Definition and examples

Definition 2.1.1. Suppose V and W are vector spaces over the same field F. A function T : V → W is said to be a linear map if

(a) T (v + w) = T (v) + T (w) for all v, w ∈ V , and

(b) T (cv) = cT (v) for all c ∈ F and v ∈ V .

Such a function is also called a linear transformation.

Remark 2.1.2. If T : V → W is a linear map and v ∈ V, we will sometimes write Tv instead of T(v) (this should remind you of matrix multiplication).

Remark 2.1.3. Suppose V and W are vector spaces over a field F and T : V → W is a linear map.

(a) It follows from the definition of a linear map that

T (cv + dw) = cT (v) + dT (w), ∀ c, d ∈ F, v, w ∈ V.

(b) In fact, if c1, . . . , cn ∈ F and v1, . . . , vn ∈ V, then

T(c1v1 + c2v2 + · · · + cnvn) = c1T(v1) + c2T(v2) + · · · + cnT(vn).

(c) We have

T(0V) = T(0 · 0V) = 0 · T(0V) = 0W.

Here we use the notation 0V and 0W to distinguish between the zero vectors of V and W.


(d) For v ∈ V , we have

T (−v) = T ((−1)v) = (−1)T (v) = −T (v).

Now that we have the definition of a linear map, we turn our attention to constructing some.

Theorem 2.1.4. Suppose V is a vector space over a field F and v1, v2, . . . , vn ∈ V. (Note that we do not require that the vi be distinct. In other words, some of the vi may be equal.) Then the map T : F n → V defined by

T(a1, a2, . . . , an) = a1v1 + a2v2 + · · · + anvn

is linear. Furthermore, it is the unique linear map such that Tei = vi for all 1 ≤ i ≤ n, where the ei are the vectors described in Example 1.4.8.

Proof. If a = (a1, a2, . . . , an) and b = (b1, b2, . . . , bn) are vectors in F n and c ∈ F , then

T (a+ b) = T (a1 + b1, . . . , an + bn)

= (a1 + b1)v1 + · · ·+ (an + bn)vn

= (a1v1 + · · ·+ anvn) + (b1v1 + · · ·+ bnvn)

= Ta+ Tb.

Also,

T (ca) = T (ca1, . . . , can)

= (ca1)v1 + · · ·+ (can)vn

= c(a1v1) + · · ·+ c(anvn)

= c(a1v1 + · · ·+ anvn)

= c(Ta).

Therefore T is linear. Now

Tei = T (0, . . . , 0, 1, 0, . . . , 0) = 0v1 + · · ·+ 0vi−1 + 1vi + 0vi+1 + · · ·+ 0vn = vi.

If S is another linear map such that Sei = vi for all i, then for all a = (a1, . . . , an) ∈ F n, we have

Sa = S(a1, . . . , an) = S(a1e1 + · · · + anen) = a1S(e1) + · · · + anS(en) = a1v1 + · · · + anvn = Ta.

Thus Sa = Ta for all a ∈ F n and so S = T .
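The map T of Theorem 2.1.4 is easy to realize with numpy. A sketch with V = R2 and n = 3; the target vectors v1, v2, v3 are an illustrative choice:

```python
import numpy as np

# Target vectors v_1, v_2, v_3 in V = R^2 (an illustrative choice).
vs = [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 2.0])]

def T(a):
    """T(a_1, ..., a_n) = a_1 v_1 + ... + a_n v_n, as in Theorem 2.1.4."""
    return sum(ai * vi for ai, vi in zip(a, vs))

# T sends the standard basis vector e_i of F^n to v_i ...
e2 = np.array([0.0, 1.0, 0.0])
assert np.allclose(T(e2), vs[1])

# ... and is linear: T(ca + db) = c T(a) + d T(b).
a, b = np.array([1.0, 2.0, 3.0]), np.array([-1.0, 0.0, 4.0])
c, d = 2.0, -3.0
assert np.allclose(T(c * a + d * b), c * T(a) + d * T(b))
```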

Remark 2.1.5. Suppose V and W are vector spaces over a field F. To show that a map T : V → W is linear, it is enough to show that

T (cu+ dv) = cT (u) + dT (v) ∀ c, d ∈ F, u, v ∈ V.

Why? Consider the case c = d = 1 and then the case d = 0.


Examples 2.1.6. (a) Let V = C∞(R), F = R, and define D : C∞(R) → C∞(R) by D(f) = f ′ (i.e., the map D takes the derivative). Then we know from calculus that

D(cf + dg) = (cf + dg)′ = cf ′ + dg′ = cD(f) + dD(g),

for all c, d ∈ R and f, g ∈ C∞(R).

(b) IfCn(R) = {f ∈ F(R,R) | f is n times differentiable},

then D : Cn(R)→ Cn−1(R) is linear (n ≥ 1).

(c) Let V = C∞(R), F = R, and define S : C∞(R)→ C∞(R) by

(Sf)(x) = ∫_0^x f(t) dt, ∀ x ∈ R.

Then we know from calculus that S(f) ∈ C∞(R) and

S(cf + dg) = ∫_0^x (cf + dg)(t) dt = ∫_0^x (cf(t) + dg(t)) dt = c ∫_0^x f(t) dt + d ∫_0^x g(t) dt = cS(f) + dS(g),

for all c, d ∈ R and f, g ∈ C∞(R). Thus S is a linear map.

(d) We leave it as an exercise (Exercise 2.1.1) to show that the map T : P2(R) → R3 defined by T(at2 + bt + c) = (a, b, c) is linear.
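The linearity computations in parts (a) and (c) can be replayed symbolically. A sketch with sympy; the sample functions f, g and scalars c, d are illustrative choices:

```python
import sympy as sp

t, x = sp.symbols('t x')
c, d = 3, -2
f, g = sp.sin(t), sp.exp(t)          # sample elements of C^inf(R)

# Part (a): D(f) = f'. Differentiation is linear.
D = lambda h: sp.diff(h, t)
assert sp.simplify(D(c*f + d*g) - (c*D(f) + d*D(g))) == 0

# Part (c): (Sf)(x) = integral of f from 0 to x. Integration is linear.
S = lambda h: sp.integrate(h, (t, 0, x))
assert sp.simplify(S(c*f + d*g) - (c*S(f) + d*S(g))) == 0
```

Symbolic checks like these confirm the identities for the chosen f and g; the calculus facts quoted in the examples are what make them hold for all of C∞(R).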

Examples 2.1.7. The following are linear maps from R3 to R3 (Exercise 2.1.2):

S(x1, x2, x3) = (x3, x1, x2)

T (x1, x2, x3) = (2x1 − 5x3, 0, 2x2)

Example 2.1.8. The maps

T : R3 → R2, T(x1, x2, x3) = (x2 − x1, 2x3),
S : R2 → R4, S(x1, x2) = (2x2 − x1, 0, x1,−4x2)

are linear (Exercise 2.1.3).

Example 2.1.9. If V is any vector space and a is any scalar, then the map T : V → V defined by Tv = av is linear since for any u, v ∈ V and scalars c, d, we have

T (cu+ dv) = a(cu+ dv) = a(cu) + a(dv) = (ac)u+ (ad)v

= (ca)u+ (da)v = c(au) + d(av) = cTu+ dTv.

Note that we used the commutativity of the field of scalars here.

Definition 2.1.10 (Linear form and dual space). Suppose V is a vector space over a field F. Then a linear form on V is a linear map V → F and

V ∗ := {f | f : V → F is linear}

is called the dual space (or simply dual) of V .

We will soon see that V ∗ is itself a vector space.


Exercises.

2.1.1. Prove that the map T defined in Example 2.1.6(d) is linear.

2.1.2. Verify that the maps of Examples 2.1.7 are linear.

2.1.3. Verify that the maps of Example 2.1.8 are linear.

2.1.4. Suppose V is a vector space. Prove that the map

T : V × V → V, T (u, v) = u− v,

is linear.

2.1.5. Fix a vector v ∈ R3 and define

T : R3 → R3, T (u) = u× v,

where u× v denotes the cross product of u and v. Prove that T is linear.

2.1.6. Suppose F is a field, X is a set, and x ∈ X. Prove that the map

T : F(X,F )→ F, T (f) = f(x),

is a linear form on F(X,F ).

2.1.7. Prove that if T : V → W is a linear map, then T (u−v) = T (u)−T (v) for all u, v ∈ V .

2.2 Kernel and image

Definition 2.2.1. If f : A → B is any map of sets (in particular, f could be a linear map between vector spaces) and A′ ⊆ A, then

f(A′) = {f(a) | a ∈ A′} ⊆ B

is called the image of A′ under f . If B′ ⊆ B, then

f−1(B′) = {a ∈ A | f(a) ∈ B′} ⊆ A

is called the inverse image of B′ under f. Note that we use the notation f−1(B′) here even though we do not assume that f is invertible. If B′ = {b} consists of a single element, we will sometimes write f−1(b) for f−1({b}) (here one must be extra careful to not confuse this with the inverse function, which may or may not exist).

Theorem 2.2.2. Suppose V and W are vector spaces and T : V → W is a linear map.

(a) If M is a subspace of V , then T (M) is a linear subspace of W .


(b) If N is a subspace of W , then T−1(N) is a subspace of V .

Proof. We will use Remark 1.5.12 in this proof.

(a) Since M is a subspace, we have 0 ∈ M and so 0 = T0 ∈ T(M). Now suppose that y, y′ ∈ T(M) and c, c′ are scalars. Then there are x, x′ ∈ M such that y = Tx, y′ = Tx′ and so

cy + c′y′ = cTx+ c′Tx′ = T (cx) + T (c′x′) = T (cx+ c′x′) ∈ T (M)

since cx+ c′x′ ∈M (because M is a subspace).

(b) Since T0 = 0 ∈ N, we have 0 ∈ T−1(N). Now suppose that x, x′ ∈ T−1(N) and c, c′ are scalars. Then

T(cx + c′x′) = cTx + c′Tx′ ∈ N,

since Tx, Tx′ ∈ N. Thus cx + c′x′ ∈ T−1(N).

Definition 2.2.3 (Kernel and image). If T : V → W is a linear map, then

KerT = T−1(0) = {x | Tx = 0}

is called the kernel (or null space) of T and

ImT = T (V ) = {Tx | x ∈ V }

is called the image (or range) of T .

Corollary 2.2.4. If T : V → W is a linear map, then KerT is a subspace of V and ImT is a subspace of W.

Proof. We take M = V and N = {0} in Theorem 2.2.2.

Example 2.2.5. Let D : C∞(R)→ C∞(R) be the linear map given by differentiation. Then

KerD = {f ∈ C∞(R) | f is constant},
ImD = C∞(R).

Why is ImD all of C∞(R)? Suppose f ∈ C∞(R). Define F by

F(x) = ∫_0^x f(t) dt.

Then we know from calculus that F is differentiable and DF = F ′ = f (this shows that F ∈ C∞(R)). Hence every f ∈ C∞(R) is in the image of D.

Example 2.2.6. Let f : R2 → R be the linear form defined by f(x1, x2) = x2 − 3x1. Then Im f = R and the kernel of f is a line through the origin (the line x2 = 3x1).

Example 2.2.7. Let T : R2 → R3 be defined by T(x1, x2) = (x2, 0, x1). Then KerT = {0} and the image of T is the ‘x, z-plane’ in R3.
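The claims of Examples 2.2.6 and 2.2.7 can be verified with sympy's null space routine, since both maps are given by matrices. A sketch:

```python
import sympy as sp

# Example 2.2.7: T(x1, x2) = (x2, 0, x1), as a 3 x 2 matrix.
T = sp.Matrix([[0, 1],
               [0, 0],
               [1, 0]])
assert T.nullspace() == []      # Ker T = {0}, so T is injective
assert T.rank() == 2            # Im T is a plane in R^3

# Example 2.2.6: the linear form f(x1, x2) = x2 - 3*x1, as a row matrix.
f = sp.Matrix([[-3, 1]])
ker = f.nullspace()
# The kernel is one-dimensional (the line x2 = 3*x1), and its basis
# vector is indeed sent to zero by f.
assert len(ker) == 1 and f * ker[0] == sp.zeros(1, 1)
```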


Theorem 2.2.8. Suppose T : V → W is a linear map. Then T is injective if and only if KerT = {0}.

Proof. Suppose KerT = {0}. Then, for v, v′ ∈ V,

Tv = Tv′ =⇒ Tv − Tv′ = 0 =⇒ T (v − v′) = 0 =⇒ v − v′ = 0 =⇒ v = v′.

Thus T is injective. Now suppose T is injective. Then for v ∈ V ,

Tv = 0 = T0 =⇒ v = 0.

Hence KerT = {0}.

Remark 2.2.9. Note that Theorem 2.2.8 only applies to linear maps. For instance, let f : R → R be the map defined by f(x) = x2. Then f−1({0}) = {0} but f is not injective since, for instance, f(1) = f(−1).

Exercises.

2.2.1 ([Ber14, Ex. 2.1.7]). Let P(R) be the vector space of all real polynomial functions (see Example 1.2.8) and define

f : P → R, f(p) = p′(1).

Prove that f is a linear form on P . What is the geometric meaning of the kernel of f?

2.2.2. Suppose S : C∞(R) → C∞(R) is the map of Example 2.1.6(c). What is ImS? Hint: S is not surjective.

2.2.3 ([Ber14, Ex. 2.2.2]). Let V be a vector space over F, and let f : V → F be a linear form on V. Assume that f is not identically zero and choose a vector v such that f(v) ≠ 0. Let N = Ker f. Prove that, for every vector u ∈ V, there exist unique z ∈ N and c ∈ F such that u = z + cv. Hint: If u ∈ V, compute the value of f on the vector u − (f(u)/f(v))v.

2.2.4 ([Ber14, Ex. 2.2.3]). Let S : V → W and T : V → W be linear maps, and let

M = {v ∈ V | S(v) ∈ T (V )}.

Prove that M is a subspace of V .

2.2.5 ([Ber14, Ex. 2.2.6]). Let P be the space of real polynomial functions and let T : P → P be the mapping defined by Tp = p − p′, where p′ is the derivative of p. Prove that T is linear and injective.

2.2.6 ([Ber14, Ex. 2.3.15]). Let T : U → V and S : V → W be linear mappings. Prove:

(a) If S is injective, then Ker(ST ) = Ker(T ).


(b) If T is surjective, then Im(ST ) = Im(S).

2.2.7. Let V be a vector space and S, T ∈ L(V). Show that if ST = TS, then S(KerT) ⊆ KerT.

2.2.8. Suppose V is a vector space and consider the linear mapping (see Exercise 2.1.4)

T : V × V → V, T (u, v) = u− v.

Determine the kernel and image of T .

2.3 Vector spaces of linear maps

Definition 2.3.1. Suppose V and W are vector spaces over a field F . We define

L(V,W ) = {T : V → W | T is linear}.

In the case where V = W , we write L(V ) for L(V, V ).

Example 2.3.2 (Zero linear map). For any vector spaces V and W over a field F, we have the zero linear map, which maps every element of V to 0 ∈ W. We denote the map by 0 (so 0(v) = 0 for all v ∈ V). Thus 0 ∈ L(V,W).

Example 2.3.3 (Identity map). The map I : V → V defined by Iv = v for all v ∈ V is linear and is called the identity linear map. We sometimes write IV when we want to keep track of the vector space on which it acts.

Example 2.3.4 (Scalar linear map). If a is any scalar, then the map T : V → V defined byTv = av is linear (see Example 2.1.9).

Example 2.3.5 (Linear forms). The elements of L(V, F ) are precisely the linear forms on V .

Recall that in MAT 1341, you learned that every linear map f from Rn to Rm (whose elements are written as column vectors) is given by multiplication by the m × n matrix

A = [f(e1) f(e2) · · · f(en)],

where {e1, . . . , en} is the standard basis of Rn (see Example 1.4.8) and f(ej) is the jth column of A. The matrix A is called the standard matrix of the linear map.

Example 2.3.6. The linear map T : R3 → R2 defined by

T(x, y, z) = (y, z)

has standard matrix

[ 0 1 0 ]
[ 0 0 1 ]

since

[ 0 1 0 ] [ x ]   [ y ]
[ 0 0 1 ] [ y ] = [ z ] = T(x, y, z).
          [ z ]
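The recipe above — the standard matrix has columns f(ej) — can be coded directly. A numpy sketch using the map T of Example 2.3.6:

```python
import numpy as np

def T(v):
    """The map of Example 2.3.6: T(x, y, z) = (y, z)."""
    x, y, z = v
    return np.array([y, z])

# Build the standard matrix column by column: A = [T(e1) | T(e2) | T(e3)].
# Iterating over np.eye(3) yields the standard basis vectors e1, e2, e3.
A = np.column_stack([T(e) for e in np.eye(3)])
assert np.array_equal(A, np.array([[0, 1, 0],
                                   [0, 0, 1]]))

# Multiplication by A agrees with T on an arbitrary vector.
v = np.array([7.0, -2.0, 5.0])
assert np.array_equal(A @ v, T(v))
```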


Later in the course, we'll return to the topic of matrices. We'll see matrices with entries in an arbitrary field F and their relation to linear maps between vector spaces over F.

Our goal now is to turn L(V,W) into a vector space itself. So we need to define vector addition and scalar multiplication and then check the axioms of a vector space.

Definition 2.3.7. Suppose V and W are vector spaces over a field F and c ∈ F. For S, T ∈ L(V,W), define S + T and cT by

(S + T)(v) = S(v) + T(v), ∀ v ∈ V,
(cT)(v) = cT(v), ∀ v ∈ V.

Lemma 2.3.8. If S, T ∈ L(V,W ) and a is a scalar, then S + T, aT ∈ L(V,W ).

Proof. We first show that S + T is linear. For u, v ∈ V and c, d ∈ F , we have

(S + T )(cu+ dv) = S(cu+ dv) + T (cu+ dv)

= cS(u) + dS(v) + cT (u) + dT (v)

= cS(u) + cT (u) + dS(v) + dT (v)

= c(S(u) + T (u)) + d(S(v) + T (v))

= c(S + T )(u) + d(S + T )(v).

Therefore, S + T is linear. We leave it as an exercise (Exercise 2.3.1) to show that aT is linear.

Theorem 2.3.9. If V and W are vector spaces over a field F , then L(V,W ) is also a vectorspace over F (with the operations defined in Definition 2.3.7).

Proof. We must show that the axioms in Definition 1.2.1 are satisfied.

Vector addition is commutative and associative since for all S, T, U ∈ L(V,W) and x ∈ V, we have

we have

(S + T )(x) = S(x) + T (x) = T (x) + S(x) = (T + S)(x),

and

((S + T ) + U)(x) = (S + T )(x) + U(x) = S(x) + T (x) + U(x)

= S(x) + (T + U)(x) = (S + (T + U))(x).

The zero linear map is an identity for the operation of vector addition since T + 0 = T for all T ∈ L(V,W).

For all T ∈ L(V,W), the map −T = (−1)T is linear and is the inverse of T under the operation of vector addition since

(T + (−T ))(v) = T (v) + (−1)T (v) = 0 ∀ v ∈ V.

We leave it as an exercise (Exercise 2.3.2) to verify the remaining axioms of Definition 1.2.1.


Example 2.3.10. Fix c ∈ R and define a map ϕc : F(R,R)→ R by

ϕc(f) = f(c) ∀ f ∈ F(R,R).

We show that ϕc ∈ F(R,R)∗ (see Definition 2.1.10). To do this, we need to show that ϕc is linear. For k1, k2 ∈ R and f1, f2 ∈ F(R,R), we have

ϕc(k1f1 + k2f2) = (k1f1 + k2f2)(c) = k1f1(c) + k2f2(c) = k1ϕc(f1) + k2ϕc(f2).

Thus ϕc is linear. This example is important in quantum physics. The so-called ‘delta function’, which describes the state of a particle located at a given point, is this type of linear form (i.e., evaluation at the point).

Definition 2.3.11 (Composition of maps). If A, B and C are sets (so, for instance, they could be vector spaces), and T : A → B and S : B → C are maps, we write ST : A → C for the composite map (or simply the composition) defined by

(ST )a = S(Ta) ∀a ∈ A.

Composition of maps is associative, so if R : C → D is a third map, we have (RS)T = R(ST). We simply write RST for this triple composition.

Remark 2.3.12. Sometimes composite maps are written as S ◦ T. We use the shorter notation ST to remind us of matrix multiplication. As in MAT 1341, we'll see that composition and matrix multiplication are closely related.

Theorem 2.3.13. The composition of linear maps is a linear map. In other words, if T ∈ L(U, V) and S ∈ L(V,W), then ST ∈ L(U,W).

Proof. For u, u′ ∈ U and c a scalar, we have

(ST )(u+ u′) = S(T (u+ u′))

= S(Tu+ Tu′)

= S(Tu) + S(Tu′)

= (ST )(u) + (ST )(u′),

(ST )(cu) = S(T (cu)) = S(c(Tu)) = c(S(Tu)) = c((ST )u).

Note that if U = V = W, then the above theorem tells us that when S, T ∈ L(V), we have ST ∈ L(V). Therefore we have three operations on L(V): addition, multiplication by scalars, and composition.

Definition 2.3.14 (Powers of a map). If T is a map from some set to itself (for instance, if T ∈ L(V)), then the powers of T are defined by

T 1 = T, T 2 = TT, T 3 = TT 2, . . . , T n = TT n−1, n ≥ 2.

We also define T 0 to be the identity map.


Exercises.

2.3.1. Complete the proof of Lemma 2.3.8 by showing that aT is linear.

2.3.2. Complete the proof of Theorem 2.3.9 by verifying the remaining axioms of Definition 1.2.1.

2.3.3 ([Ber14, Ex. 2.3.2]). Let V be a vector space. If R, S, T ∈ L(V) and c is a scalar, prove the following statements:

(a) (RS)T = R(ST );

(b) (R + S)T = RT + ST ;

(c) R(S + T ) = RS +RT ;

(d) (cS)T = c(ST) = S(cT);

(e) TI = T = IT , where I is the identity mapping.

2.3.4. If S, T ∈ L(R3) are defined by

S(x, y, z) = (x− 2y + 3z, y − 2z, 4y),

T(x, y, z) = (y − z, 2x + y, x + 2z),

find explicit expressions for S + T , 2T , S − T , and ST .

2.3.5. Suppose T : V → V is a linear map such that ImT ⊆ Ker(T − I). Prove that T 2 = T .

2.3.6 ([Ber14, Ex. 2.3.9]). Let U, V,W be vector spaces over F . Prove the following:

(a) For fixed T ∈ L(U, V ), the map

L(V,W) → L(U,W), S ↦ ST,

is linear.

(b) For fixed S ∈ L(V,W ), the map

L(U, V) → L(U,W), T ↦ ST,

is linear.

2.3.7 ([Ber14, Ex. 2.3.12]). Suppose V = U ⊕ W. For each v ∈ V, let v = u + w be its unique decomposition with u ∈ U and w ∈ W, and define Pv = u, Qv = w. Prove that P, Q ∈ L(V), P 2 = P, Q2 = Q, P + Q = I, and PQ = QP = 0.

2.3.8 ([Ber14, Ex. 2.3.13]). Let T : V → V be a linear map such that T 2 = T , and let

U = ImT, W = KerT.

Prove that U = {v ∈ V | Tv = v} = Ker(T − I) and that V = U ⊕W .


2.3.9 ([Ber14, Ex. 2.3.14]). Give an example of S, T ∈ L(R2) such that ST ≠ TS.

2.3.10 ([Ber14, Ex. 2.3.16]). Let V be a real or complex vector space and let T ∈ L(V) be such that T 2 = I. Define

M = {v ∈ V | Tv = v}, N = {v ∈ V | Tv = −v}.

Prove that M and N are subspaces of V and that V = M ⊕ N. Hint: For every vector v, we have

v = (1/2)(v + Tv) + (1/2)(v − Tv).

2.3.11 ([Ber14, Ex. 2.3.19]). Let S, T ∈ L(V,W) and let U = {v ∈ V | Sv = Tv}. Prove that U is a subspace of V. Hint: Consider Ker(S − T).

2.3.12 ([Ber14, Ex. 2.3.20]). If T : V → W and S : W → V are linear maps such that ST = I, prove that KerT = {0} and ImS = V.

2.3.13 ([Ber14, Ex. 2.3.24]). Let V = P be the vector space of real polynomial functions, and let D : V → V be the differentiation mapping Dp = p′. Let u ∈ P be the monomial u(t) = t, and define another linear map, which is multiplication by t:

M : P → P , Mp = up.

Prove that DM − MD = I. Hint: You need to show that (up)′ − up′ = p for all p ∈ P. Remember the product rule.
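The identity in Exercise 2.3.13 can be sanity-checked (not proved) on sample polynomials by representing p(t) = c0 + c1 t + · · · + cn t^n as its coefficient list [c0, c1, . . . , cn]. A sketch in plain Python:

```python
def D(p):
    """Differentiate: d/dt of sum c_k t^k is sum k*c_k t^(k-1)."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def M(p):
    """Multiply by t: shift every coefficient up one degree."""
    return [0] + p

def sub(p, q):
    """Coefficient-wise difference, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

# Check (DM - MD)p = p for the sample polynomial p(t) = 4 + 3t + 5t^3.
p = [4, 3, 0, 5]
lhs = sub(D(M(p)), M(D(p)))
assert lhs[:len(p)] == p and not any(lhs[len(p):])
```

The product-rule argument in the hint is what makes this hold for every polynomial, not just the sample above.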

2.4 Isomorphisms

We will now discuss a precise way in which certain vector spaces are the ‘same’ (but not necessarily equal).

Definition 2.4.1 (Isomorphism). An isomorphism is a bijective linear map T : V → W, where V and W are vector spaces. We say a vector space V is isomorphic to another vector space W (over the same field) if there is an isomorphism T : V → W. We write V ∼= W to indicate that V is isomorphic to W.

Remark 2.4.2. One should think of isomorphic vector spaces as being ‘the same’ as far as their vector space properties go. Of course, they may look quite different (and are, in general, not equal). But the isomorphism identifies the two in a way that preserves the operations of a vector space (vector addition and scalar multiplication).

Example 2.4.3. Let

V = R2,

W = {(x, y, 0) | x, y ∈ R}.


Then V ∼= W. In order to prove this, we need to find a specific isomorphism. One isomorphism is the map

T : V → W, T (x, y) = (x, y, 0).

We leave it as an exercise (Exercise 2.4.1) to check that the map T is an isomorphism. Note that there is more than one isomorphism from V to W. For instance,

T1(x, y) = (y, x, 0) and T2(x, y) = (x+ y, x− y, 0)

are two other isomorphisms from V to W .

Example 2.4.4. Consider the map

T : P2(R)→ R3, T (at2 + bt+ c) = (a, b, c).

You were asked to show in Exercise 2.1.1 that the map T is linear. It is easy to see that T is bijective. Thus, T is an isomorphism. Hence P2(R) ∼= R3 (as real vector spaces).

Example 2.4.5. One can actually show that Pn(F) ∼= F n+1 for any infinite field F. However, things can go wrong if the field is finite. For instance, if we work over the field F2, then the polynomial functions t and t2 are equal! To check this, we just need to check that they take the same value on all of the elements of F2. Since 0 = 02 and 1 = 12, this verifies that the polynomial functions t and t2 are indeed equal.

Remark 2.4.6. There is a difference between the symbols → and ↦. We use → when we write T : V → W to indicate that T is a map with domain V and codomain W. We use ↦ to describe a map. So x ↦ Tx is the map that sends x to Tx.

Example 2.4.7. Let U = {(x, y, z) ∈ R3 | x + y + z = 0}. We know from techniques learned in MAT 1341 that U is a subspace of R3 since it is the solution set to a system of homogeneous equations. In fact, you can use the techniques of MAT 1341 to show that

U = Span{(−1, 1, 0), (−1, 0, 1)}.

Define

T : R2 → U, T(x, y) = x(−1, 1, 0) + y(−1, 0, 1) = (−x − y, x, y).

Then T is linear since it corresponds to multiplication by the matrix

[ −1 −1 ]
[  1  0 ]
[  0  1 ].

We leave it as an exercise (Exercise 2.4.4) to show that T is bijective (note that it is not bijective as a map R2 → R3, but it is bijective as a map R2 → U). So T is an isomorphism and R2 ∼= U.

Example 2.4.8. Remember that C can be thought of as a real vector space. Then

(a, b) ↦ a + bi, a, b ∈ R

is an isomorphism from R2 to C. So R2 ∼= C as real vector spaces. Note that it is especially important here to state what type of isomorphism we have (that is, an isomorphism of real vector spaces). We are not using the complex vector space structure on C.


Recall that an inverse of a map f : A → B is a map g : B → A such that gf is the identity map on A and fg is the identity map on B. In other words,

(gf)(a) = a ∀ a ∈ A, and (fg)(b) = b ∀ b ∈ B.

We write f−1 for such a map g.

Remember that bijective maps have inverses. If f : A → B is a bijective map from a set A to a set B, then we can define the inverse map f−1 : B → A by

f−1(b) = a ⇐⇒ f(a) = b.

In other words, for any b ∈ B, f−1(b) is defined to be the unique element a of A such that f(a) = b. Such an a exists since f is surjective and it is unique since f is injective.

Theorem 2.4.9. Suppose V and W are vector spaces over a field F. If T : V → W is a bijective linear map (i.e. an isomorphism), then the inverse map T−1 : W → V is also linear (hence is an isomorphism, since it is bijective by Exercise 2.4.6).

Proof. Let w,w′ ∈ W and c, c′ ∈ F . We want to show that

T−1(cw + c′w′) = cT−1w + c′T−1w′. (2.1)

Recall that since T is injective, for any u, v ∈ V, we have that Tu = Tv implies u = v. So let's apply T to each side of (2.1) and try to show that the results are equal. Applying T to the left-hand side, we get

TT−1(cw + c′w′) = cw + c′w′.

Now, applying T to the right-hand side, we get

T (cT−1w + c′T−1w′) = cTT−1w + c′TT−1w′ = cw + c′w′,

where we have used the fact that T is linear. We thus see that

T (T−1(cw + c′w′)) = T (cT−1w + c′T−1w′)

and so T−1(cw + c′w′) = cT−1w + c′T−1w′ by the injectivity of T .

Remark 2.4.10. By Exercise 2.4.6, if a linear map T : V → W has an inverse, then it is an isomorphism. This can be a useful way to show that a given linear map is an isomorphism.

Example 2.4.11. If A ∈ Mn,n(R) is any invertible matrix, then the map Rn → Rn defined by v ↦ Av for v ∈ Rn is an isomorphism. We know this because it is linear (from MAT 1341) and has an inverse, namely the map Rn → Rn given by v ↦ A−1v for v ∈ Rn. This is indeed an inverse map since

AA−1v = Iv = v and A−1Av = Iv = v ∀ v ∈ Rn.
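A quick numerical illustration of this example, with a small hypothetical invertible matrix (any invertible A would do):

```python
import numpy as np

# A hypothetical invertible 2x2 matrix; det = 2*1 - 1*1 = 1, so A is invertible.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
A_inv = np.linalg.inv(A)

v = np.array([3.0, -5.0])
# Both round trips recover v, mirroring A A^{-1} v = v = A^{-1} A v.
print(np.allclose(A @ (A_inv @ v), v))  # True
print(np.allclose(A_inv @ (A @ v), v))  # True
```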

We now collect some nice properties of isomorphisms. The next theorem states that isomorphism is an equivalence relation (see Definition B.1.1).


Theorem 2.4.12. Suppose U, V,W are vector spaces over the same field. Then

(a) V ∼= V (isomorphism is reflexive),

(b) if V ∼= W , then W ∼= V (isomorphism is symmetric),

(c) if U ∼= V and V ∼= W , then U ∼= W (isomorphism is transitive).

Proof. (a) The identity map I : V → V is an isomorphism.

(b) We now know that the inverse of an isomorphism is itself an isomorphism. So suppose V ∼= W. Then there is an isomorphism T : V → W. Thus, the inverse T−1 : W → V is an isomorphism. Hence W ∼= V.

(c) If T : U → V and S : V → W are isomorphisms, then the composition ST : U → W is also linear and bijective, hence is an isomorphism.

Exercises.

2.4.1. Prove that the maps T, T1, and T2 of Example 2.4.3 are isomorphisms. So you need to show they are linear and bijective.

2.4.2. Show that Pn(R) is isomorphic to Rn+1.

2.4.3. Let A = {1, 2, . . . , n} and V = F(A,F ) for some field F. Show V ∼= F n via the bijection x ↦ (x(1), x(2), . . . , x(n)).

2.4.4. Show that the map T of Example 2.4.7 is bijective.

2.4.5. Show that R2n ∼= Cn as real vector spaces.

2.4.6. Prove the following statements:

(a) The inverse of a bijective map is also bijective.

(b) If f : A → B has an inverse, then f is bijective. Combining this with the above remarks, this means that a map is bijective if and only if it has an inverse.

(c) The composition of two injective maps is itself an injective map.

(d) The composition of two surjective maps is itself a surjective map.

(e) The composition of two bijective maps is itself a bijective map.

Students who took MAT 1362 will have already seen these statements (see [Sav, §8.1]).

2.4.7 ([Ber14, Ex. 2.4.5]). Let V be a vector space, T ∈ L(V ) bijective, and c a nonzero scalar. Prove that cT is bijective and that (cT )−1 = c−1T−1.


2.4.8 ([Ber14, Ex. 2.4.6]). Let T : V → W be an isomorphism. For each S ∈ L(V ), define ϕ(S) = TST−1 (note that the product is defined). Prove that ϕ : L(V ) → L(W ) is an isomorphism and that ϕ(RS) = ϕ(R)ϕ(S) for all R, S ∈ L(V ). Don't forget that you need to show ϕ is itself a linear map.

2.4.9 ([Ber14, Ex. 2.4.7]). Let V be a vector space, and let T ∈ L(V ). Prove the following:

(a) If T 2 = 0, then I − T is bijective.

(b) If T n = 0 for some positive integer n, then I − T is bijective.

Hint : In polynomial algebra, (1− t)(1 + t) = 1− t2.

2.4.10. Let U, V,W be vector spaces over the same field. Prove the following:

(a) (U × V )×W ∼= U × (V ×W ).

(b) V ×W ∼= W × V .

2.4.11. For positive integers n and m, prove that Rn × Rm ∼= Rn+m.


Chapter 3

Structure of vector spaces

In this chapter we will explore the structure of vector spaces in more detail. In particular, we analyze the concepts of generating sets, linear dependence/independence, bases, and dimension. We also discuss the important notion of the dual space. The material in this chapter roughly corresponds to [Tre, §1.2, §3.5, §8.1].

3.1 Spans and generating sets

Recall (Definition 1.4.4) that if A is a nonempty subset of a vector space V over a field F, then

SpanA = {c1v1 + c2v2 + · · ·+ cnvn | n ∈ N, ci ∈ F, vi ∈ A for 1 ≤ i ≤ n}.

We sometimes write SpanF A when we want to emphasize the field.
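Over R, membership in a span can be tested numerically: v ∈ Span A exactly when appending v to the vectors of A does not increase the rank of the matrix whose columns are those vectors. A small sketch (the function name `in_span` is ours, not from the notes), using the spanning set from Example 2.4.7:

```python
import numpy as np

def in_span(vectors, v, tol=1e-10):
    """Return True if v is a linear combination of `vectors` (over R)."""
    M = np.column_stack(vectors)
    Mv = np.column_stack(vectors + [v])
    # v is in the column span of M iff adding it as a column keeps the rank.
    return np.linalg.matrix_rank(Mv, tol=tol) == np.linalg.matrix_rank(M, tol=tol)

A = [np.array([-1.0, 1.0, 0.0]), np.array([-1.0, 0.0, 1.0])]
print(in_span(A, np.array([-2.0, 1.0, 1.0])))  # True: the sum of the two vectors
print(in_span(A, np.array([1.0, 0.0, 0.0])))   # False: 1 + 0 + 0 is nonzero
```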

Theorem 3.1.1. Suppose A is a nonempty subset of a vector space V. Then SpanA is a subspace of V and is the smallest subspace of V containing A. In other words,

(a) A ⊆ SpanA, and

(b) if W is a subspace of V such that A ⊆ W , then SpanA ⊆ W .

Proof. We already noted in Theorem 1.5.4 that SpanA is a subspace of V, so it remains to prove that it is the smallest one containing A. It is clear that A ⊆ SpanA since every element a ∈ A is a linear combination of elements of A (with one term and coefficient one). Now suppose W is a subspace of V containing A. We wish to show that SpanA ⊆ W. Let v ∈ SpanA. Then, by definition,

v = c1v1 + c2v2 + · · · + cnvn

for some c1, . . . , cn ∈ F and v1, . . . , vn ∈ A. Since A ⊆ W, each vi ∈ W, for 1 ≤ i ≤ n. Then, since W is a subspace and therefore closed under scalar multiples and vector addition, c1v1 + · · · + cnvn ∈ W. So v ∈ W. Thus SpanA ⊆ W as desired.

Remark 3.1.2. If we adopt the convention that Span ∅ = {0}, then Theorem 3.1.1 remains true with the word 'nonempty' removed.


Definition 3.1.3 (Generating set). The space SpanA is called the subspace of V generated by A, we call A a generating set for SpanA, and we say that A generates SpanA. So if SpanA = V, then we say A generates V.

Theorem 3.1.4. Suppose T : V → W is a linear map and A is a subset of V. Then SpanT (A) = T (SpanA).

Proof. Since A ⊆ SpanA, we have T (A) ⊆ T (SpanA). Also, since SpanA is a subspace of V, we know T (SpanA) is a subspace of W by Theorem 2.2.2. Hence, by Theorem 3.1.1, SpanT (A) ⊆ T (SpanA).

It remains to prove the reverse inclusion. Since T (A) ⊆ SpanT (A), we have A ⊆ T−1(SpanT (A)). Also, since SpanT (A) is a subspace of W, we know T−1(SpanT (A)) is a subspace of V by Theorem 2.2.2. Thus, by Theorem 3.1.1, we have SpanA ⊆ T−1(SpanT (A)). Applying T, this gives T (SpanA) ⊆ SpanT (A).

Corollary 3.1.5. If T : V → W is a surjective linear map and A generates V, then T (A) generates W.

Proof. We have

W = T (V ) (since T is surjective)

= T (SpanA) (since A generates V )

= SpanT (A) (by Theorem 3.1.4).

Therefore T (A) generates W .

Exercises.

3.1.1 ([Ber14, Ex. 3.1.1]). If T : V → W is linear and A is a subset of V such that T (A) generates W, then T is surjective.

3.1.2 ([Ber14, Ex. 3.1.3]). Suppose x, x1, . . . , xn are vectors such that x is a linear combination of x1, . . . , xn, but not of x1, . . . , xn−1. Prove that

Span{x1, . . . , xn} = Span{x1, . . . , xn−1, x}.

Hint: When x is represented as a linear combination of x1, . . . , xn, the coefficient of xn cannot be zero.

3.1.3 ([Ber14, Ex. 3.1.4]). Let S : V → W and T : V → W be linear mappings, and let A be a subset of V such that SpanA = V. Prove that, if Sx = Tx for all x ∈ A, then S = T.

3.1.4. Let T : V → W be a linear map, and let x1, . . . , xm and y1, . . . , yn be two lists of vectors in V. Suppose that

(a) x1, . . . , xm generate KerT , and

(b) T (y1), . . . , T (yn) generate W .

Show that the list x1, . . . , xm, y1, . . . , yn generates V .


3.2 Linear dependence/independence

Definition 3.2.1 (Linearly dependent/independent). Suppose v1, v2, . . . , vn is a finite list of vectors in a vector space V. We say that the list is (or the vectors v1, v2, . . . , vn themselves are) linearly dependent (or simply dependent) if there exist scalars c1, c2, . . . , cn such that

c1v1 + c2v2 + · · · + cnvn = 0

and at least one of the ci, 1 ≤ i ≤ n, is nonzero. Such an equation (in which at least one of the ci is nonzero) is called a dependence relation among the vi.

If v1, v2, . . . , vn are not linearly dependent, they are said to be linearly independent. In other words, the vectors v1, v2, . . . , vn are linearly independent if

c1v1 + · · · + cnvn = 0 =⇒ c1 = c2 = · · · = cn = 0,

where c1, . . . , cn are scalars.

Remark 3.2.2. By the commutativity of vector addition, the order of the vectors in the list is not important in the definition of linear dependence/independence.

Example 3.2.3. Consider the case n = 1. The list v1 is linearly dependent if and only if v1 = 0. This is because c1v1 = 0 for some nonzero scalar c1 if and only if v1 = 0 (by Theorem 1.3.4).

Example 3.2.4. Consider the case n = 2. Then the list v1, v2 is linearly dependent if and only if one of the vectors is a multiple of the other. To see this, first suppose that v2 = cv1. Then cv1 + (−1)v2 = 0 is a dependence relation and so v1, v2 are linearly dependent. The same argument works for the case that v1 is a multiple of v2. Now suppose that v1, v2 are linearly dependent. Then we have a dependence relation c1v1 + c2v2 = 0 where c1 or c2 (or both) is nonzero. If c1 ≠ 0, then v1 = (−c2/c1)v2 and so v1 is a multiple of v2. Similarly, if c2 ≠ 0, then v2 is a multiple of v1.

Lemma 3.2.5. (a) Any list of vectors containing the zero vector is linearly dependent.

(b) If the vectors v1, v2, . . . , vn are not distinct (i.e. some vector appears more than once in the list), then they are linearly dependent.

Proof. Consider a list of vectors v1, . . . , vn.

(a) If vi = 0 for some i, 1 ≤ i ≤ n, then

0v1 + · · · + 0vi−1 + 1vi + 0vi+1 + · · · + 0vn = 0

is a dependence relation and so the vectors are linearly dependent.

(b) If vi = vj for some 1 ≤ i < j ≤ n, then

0v1 + · · · + 0vi−1 + 1vi + 0vi+1 + · · · + 0vj−1 + (−1)vj + 0vj+1 + · · · + 0vn = 0

is a dependence relation and so the vectors are linearly dependent.


Corollary 3.2.6. Consider a list of vectors v1, . . . , vn.

(a) If v1, . . . , vn are linearly independent, then none of them is the zero vector.

(b) If v1, . . . , vn are linearly independent, then no two of them are equal.

Theorem 3.2.7. Suppose V is a vector space over a field F and v1, v2, . . . , vn ∈ V. Define a linear map T : F n → V by

T (a1, . . . , an) = a1v1 + · · · + anvn.

(We know from Theorem 2.1.4 that this map is linear.) Then the vectors v1, . . . , vn are linearly dependent if and only if T is not injective.

Proof. Since T is a linear map, it is not injective if and only if KerT ≠ {0}. This is true if and only if there is some (a1, . . . , an) ≠ (0, . . . , 0) such that

T (a1, . . . , an) = a1v1 + · · ·+ anvn = 0.

This is true if and only if v1, . . . , vn are linearly dependent.
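For vectors in R^m, this criterion is easy to check numerically: the map T is injective exactly when the matrix whose columns are v1, . . . , vn has full column rank. A sketch (NumPy and the function name are our choices, not the notes'):

```python
import numpy as np

def independent(vectors, tol=1e-10):
    """True iff the given vectors in R^m are linearly independent."""
    M = np.column_stack(vectors)  # matrix of T(a1, ..., an) = a1 v1 + ... + an vn
    # T is injective (Ker T = {0}) iff M has full column rank.
    return np.linalg.matrix_rank(M, tol=tol) == M.shape[1]

print(independent([np.array([1.0, 0.0, 1.0]),
                   np.array([0.0, 1.0, 1.0]),
                   np.array([1.0, 1.0, 0.0])]))  # True
print(independent([np.array([1.0, 2.0]),
                   np.array([2.0, 4.0])]))       # False: one is twice the other
```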

Corollary 3.2.8. Under the hypotheses of Theorem 3.2.7, the map T is injective if and only if the vectors v1, . . . , vn are linearly independent.

Theorem 3.2.9. Let V be a vector space and v1, . . . , vn ∈ V, n ≥ 2. Then the following statements are equivalent:

(a) v1, . . . , vn are linearly dependent,

(b) some vj is a linear combination of the vi with i ≠ j (i.e. some vector is a linear combination of the others),

(c) either v1 = 0 or there exists an index j > 1 such that vj is a linear combination of v1, . . . , vj−1.

Proof. You saw this result for real vector spaces in MAT 1341. The proof for vector spaces over an arbitrary field is essentially the same and so we will omit it. See also [Tre, Prop. 2.6].

Corollary 3.2.10. Let V be a vector space and v1, . . . , vn ∈ V, n ≥ 2. Then the following statements are equivalent:

(a) v1, . . . , vn are linearly independent,

(b) no vj is a linear combination of the vi with i ≠ j,

(c) v1 ≠ 0 and for every j > 1, vj is not a linear combination of v1, . . . , vj−1.

Example 3.2.11. The vectors e1, . . . , en ∈ F n are linearly independent. This is because

c1e1 + · · ·+ cnen = 0 =⇒ (c1, . . . , cn) = 0 =⇒ c1 = c2 = · · · = cn = 0.


Example 3.2.12. Consider the vectors sin x, sin 2x in F(R). Suppose

a sin x + b sin 2x = 0 ∀ x ∈ R.

Note that we insist that the equality hold for all x ∈ R, since equality in the vector space F(R) is equality of functions. In particular, the equality must hold for x = π/2, and so

a sin(π/2) + b sin(π) = 0 =⇒ a + 0 = 0 =⇒ a = 0.

Then, taking x = π/4 gives

a sin(π/4) + b sin(π/2) = 0 =⇒ 0 sin(π/4) + b = 0 =⇒ b = 0.

Thus, the vectors sin x, sin 2x are linearly independent.
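The evaluation argument above amounts to checking that a 2 × 2 linear system has only the trivial solution. Numerically (a sketch, with the sample points chosen as in the example):

```python
import numpy as np

# Evaluating a sin x + b sin 2x = 0 at x = pi/2 and x = pi/4 gives the
# system M @ (a, b) = 0; an invertible M forces a = b = 0.
xs = [np.pi / 2, np.pi / 4]
M = np.array([[np.sin(x), np.sin(2 * x)] for x in xs])
print(abs(np.linalg.det(M)))  # nonzero (about 1), so only the trivial solution
```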

Theorem 3.2.13. If T : V → W is an injective linear map and v1, . . . , vn are linearly independent vectors in V, then Tv1, . . . , Tvn are linearly independent in W.

Proof. For scalars c1, . . . , cn, we have

c1(Tv1) + · · ·+ cn(Tvn) = 0

=⇒ T (c1v1 + · · ·+ cnvn) = 0 (T is linear)

=⇒ c1v1 + · · ·+ cnvn = 0 (T is injective)

=⇒ c1 = c2 = · · · = cn = 0 (the vectors v1, . . . , vn are linearly independent).

Note that it is very important in the above theorem that T be injective. For instance, the theorem would certainly fail to be true if T were the zero map, since then the list Tv1, . . . , Tvn would contain the zero vector and so could not be linearly independent.

Definition 3.2.14 (Linear dependence/independence of sets). If A is any (possibly infinite) set of vectors in a vector space V, we say A is linearly dependent if some finite list v1, v2, . . . , vn of distinct vectors in A is linearly dependent (in other words, there is some dependence relation involving a finite number of the vectors of A). We say A is linearly independent if every finite list of distinct vectors in A is linearly independent.

Exercises.

3.2.1 ([Ber14, Ex. 3.2.1]). Let V = F(R) be the vector space of all real-valued functions of a real variable (see Example 1.2.5). Define

f : R → R, f(t) = sin(t + π/6).

Prove that the functions f, sin, and cos are linearly dependent in V.


3.2.2. Show that in R3, the vectors e1, e2, e3, and (−2, 5, 4) are linearly dependent. (See Example 1.4.8.)

3.2.3. Show that the vectors (1, 1, 1), (1, 2,−1), and (1,−1, 2) in R3 are linearly independent.

3.2.4. Suppose T : V → W is a linear map and x1, . . . , xn are vectors in V .

(a) If x1, . . . , xn are linearly dependent, then the vectors Tx1, . . . , Txn are linearly dependent in W.

(b) If the vectors Tx1, . . . , Txn are linearly independent, then x1, . . . , xn are linearly independent.

3.2.5 ([Ber14, Ex. 3.2.7]). If T : V → W is an injective linear map and x1, . . . , xn are vectors in V such that Tx1, . . . , Txn are linearly dependent in W, then x1, . . . , xn are linearly dependent in V.

3.2.6 ([Ber14, Ex. 3.3.6]). If T : V → W is a linear map

3.2.7. If E is a subset of R, then we define a function χE : R → R (called the characteristic function of E) by

χE(x) = 1 if x ∈ E, and χE(x) = 0 if x ∈ R \ E.

Note that χE is a vector in the vector space F(R). For example, χ∅ is the zero function (the zero vector in F(R)) and χR is the constant function with value 1.

Let A and B be subsets of R and consider the list

ℓ : χA, χB, χA∩B,

of vectors in F(R).

(a) Show that if the condition

A ∩ B ≠ ∅ and A ⊈ B and B ⊈ A (∗)

is satisfied, then the list ℓ is linearly independent.

(b) Show that if the condition (∗) is not satisfied, then the list ℓ is linearly dependent.

3.2.8. Suppose A is a linearly independent set of vectors in a vector space V. Prove that every nonempty subset of A is also linearly independent.

3.2.9 ([Ber14, Ex. 3.3.3]). If x1, . . . , xn are linearly independent, then the following implication is true:

a1x1 + · · · + anxn = b1x1 + · · · + bnxn =⇒ ai = bi for all i = 1, . . . , n.

Conversely, if the above implication is true, then the vectors x1, . . . , xn are linearly independent.


3.2.10 ([Ber14, Ex. 3.3.4]). Prove that if the vectors x1, x2, x3 are linearly independent, then so are the vectors x1, x1 + x2, x1 + x2 + x3.

3.2.11 ([Ber14, Ex. 3.3.5]). Let a, b, c be distinct real numbers. Prove that the vectors

(1, 1, 1), (a, b, c), (a2, b2, c2)

in R3 are linearly independent. Can you come up with a generalization of this result?

3.2.12 ([Ber14, Ex. 3.3.7]). (a) Let V be a real or complex vector space. Prove the following statement: If x, y, z are linearly independent vectors in V, then so are x + y, y + z, z + x.

(b) Does this statement hold if V is instead a vector space over the field Z2 of two elements (see Definition 1.1.2)? Hint: 2x = 0 for every vector x.

3.2.13 ([Ber14, Ex. 3.3.11]). Prove that the functions sin t, sin 2t, and sin 3t are linearly independent.

3.2.14 ([Ber14, Ex. 3.3.12]). Let V = C2, u = (1, i), and v = (i,−1).

(a) Show that u, v are linearly independent in V when V is considered as a real vector space.

(b) Show that u, v are linearly dependent in V when V is considered as a complex vector space.

3.3 Finitely generated vector spaces

Definition 3.3.1 (Finitely generated vector space). A vector space V over a field F is finitely generated if there is a finite subset {v1, . . . , vn} ⊆ V such that Span{v1, . . . , vn} = V. In other words, V is finitely generated if it can be spanned by finitely many vectors. In the case where V can be considered as a vector space over multiple fields (for example, C is a vector space over both R and C), we say V is finitely generated over F if there is a finite subset {v1, v2, . . . , vn} ⊆ V such that SpanF{v1, . . . , vn} = V.

Remark 3.3.2. Recall (Definition 3.1.3) that if Span{v1, . . . , vn} = V, then we say that the set {v1, . . . , vn} generates V. This explains the term 'finitely generated'. Later, once we've defined dimension, we'll often use the term finite dimensional instead of finitely generated.

Examples 3.3.3. (a) The zero space {0} is finitely generated since Span{0} = {0}.

(b) F n is finitely generated over F since SpanF{e1, e2, . . . , en} = F n.

(c) Pn(F ) = SpanF{1, t, t2, . . . , tn} and so Pn(F ) is finitely generated over F.

(d) C is finitely generated over C since C = SpanC{1}.

(e) C is finitely generated over R since C = SpanR{1, i}.

(f) R is finitely generated over R since R = SpanR{1}.


(g) R is not finitely generated over Q (but this is not so easy to see).

(h) F(R) is not finitely generated over R.

Example 3.3.4. The vector space V = {f ∈ C∞(R) | f ′′ + f = 0} is finitely generated over R. Indeed, we can show that V = SpanR{sin, cos}. To do this, we need to prove that any f ∈ V can be written as a linear combination of sin and cos. Suppose f ∈ V and let

g(x) = f(x) sin x + f ′(x) cos x.

Then

g′(x) = f ′(x) sin x + f(x) cos x + f ′′(x) cos x − f ′(x) sin x = (f ′′(x) + f(x)) cos x = 0.

Therefore, g(x) is constant. That is, g(x) = a (for all x ∈ R) for some a ∈ R. Similarly, if

h(x) = f(x) cos x − f ′(x) sin x,

then one can show that h′(x) = 0 (Exercise 3.3.1) and so h(x) = b (for all x ∈ R) for some b ∈ R (that is, h is a constant function). Therefore, writing A for the 2 × 2 matrix below, we have

    [ sin x    cos x ] [ f(x)  ]   [ a ]
    [ cos x   −sin x ] [ f ′(x) ] = [ b ].

Since det A = −sin² x − cos² x = −1 for all x ∈ R, the matrix A is invertible with inverse

    A−1 = (1/det A) [ −sin x   −cos x ]  =  [ sin x    cos x ]
                    [ −cos x    sin x ]     [ cos x   −sin x ].

Multiplying both sides of our matrix equation on the left by A−1 gives

    [ f(x)  ]   [ sin x    cos x ] [ a ]   [ a sin x + b cos x ]
    [ f ′(x) ] = [ cos x   −sin x ] [ b ] = [ a cos x − b sin x ].

Thus f(x) = a sin x + b cos x for all x ∈ R.
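The identities used in this example can be checked numerically for a sample solution (the coefficients a, b below are hypothetical values we pick for the test):

```python
import numpy as np

# For f = a*sin + b*cos (a solution of f'' + f = 0), the auxiliary functions
#   g(x) = f(x) sin x + f'(x) cos x   and   h(x) = f(x) cos x - f'(x) sin x
# should be constant, equal to a and b respectively.
a, b = 3.0, -2.0
f  = lambda x: a * np.sin(x) + b * np.cos(x)
fp = lambda x: a * np.cos(x) - b * np.sin(x)   # f'

xs = np.linspace(0.0, 10.0, 101)
g = f(xs) * np.sin(xs) + fp(xs) * np.cos(xs)
h = f(xs) * np.cos(xs) - fp(xs) * np.sin(xs)
print(np.allclose(g, a), np.allclose(h, b))  # True True
```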

Example 3.3.5. If {v1, . . . , vn} generates V and vn+1 ∈ V, then {v1, . . . , vn, vn+1} is dependent. This is because vn+1 must be a linear combination of v1, . . . , vn, and so {v1, . . . , vn+1} is dependent by Theorem 3.2.9.

In contrapositive form, we have that if v1, . . . , vn+1 are independent, then v1, . . . , vn cannot generate V.

Theorem 3.3.6. Suppose V is a vector space over a field F and v1, . . . , vn ∈ V. Define a linear map T : F n → V by

T (a1, . . . , an) = a1v1 + · · · + anvn.

Then T is surjective if and only if v1, . . . , vn generate V.


Proof. By definition, T is surjective if and only if every vector of V can be written as a1v1 + · · · + anvn. This is true if and only if Span{v1, . . . , vn} = V.

Theorem 3.3.7. If T : V → W is a surjective linear map and V is finitely generated, then W is finitely generated.

Proof. Since V is finitely generated, there is a finite subset {v1, . . . , vn} ⊆ V such that {v1, . . . , vn} generates V. Then, since T is surjective, T ({v1, . . . , vn}) = {Tv1, . . . , Tvn} generates W by Corollary 3.1.5. Hence W is finitely generated.

Example 3.3.8. If V is a finitely generated vector space and M is a subspace of V, then V/M is finitely generated. This is because the quotient map V → V/M is linear and surjective.

Theorem 3.3.9. A vector space V over F is finitely generated if and only if there exists a positive integer n and a surjective linear map T : F n → V.

Proof. If V is finitely generated, then Span{v1, . . . , vn} = V for some finite subset {v1, . . . , vn} ⊆ V. Then, by Theorem 3.3.6, there is a surjective linear map T : F n → V.

Conversely, if there is a positive integer n and a surjective linear map T : F n → V, then, by Theorem 3.3.7, V is finitely generated since F n is.

Exercises.

3.3.1. In the notation of Example 3.3.4, show that h′(x) = 0.

3.3.2. Do the vectors (2, 1, 1), (3,−1, 1), (10, 5, 5), and (6,−2, 2) generate R3? Justify your answer.

3.3.3 ([Ber14, Ex. 3.4.4]). Let V = C∞(R) be the vector space of all functions R → R having derivatives of all orders (see Example 1.2.6) and let

U = {f ∈ V | f ′′ = f},

where f ′′ is the second derivative of f. Prove that U is the subspace of V generated by the two functions t ↦ et and t ↦ e−t. Hint: If f ′′ = f, consider f = (1/2)(f + f ′) + (1/2)(f − f ′).

3.3.4 ([Ber14, Ex. 3.4.6]). With V as in Exercise 3.3.3, let

W = {f ∈ V | f ′′ = −f}.

Prove that W is generated by the functions sin and cos. Hint: If f ′′ = −f, calculate the derivatives of f sin + f ′ cos and f cos − f ′ sin.


3.4 Basis and dimension

Definition 3.4.1 (Basis). A finite list of vectors v1, . . . , vn in a vector space V is called a basis of V if it is both independent and generating. In other words, every vector v ∈ V can be written as a linear combination

v = a1v1 + · · · + anvn

(since Span{v1, . . . , vn} = V ) and the coefficients a1, . . . , an are unique by Corollary 3.2.8. These coefficients are called the coordinates of v with respect to the basis v1, . . . , vn.

We also speak of the set {v1, . . . , vn} being a basis. If we wish to emphasize the field, we say v1, . . . , vn is a basis of V over F. The plural of 'basis' is 'bases' (pronounced bay-sees).
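In R^n, computing the coordinates of a vector with respect to a basis amounts to solving the linear system whose columns are the basis vectors (the solution exists and is unique precisely because the list is generating and independent). A small sketch with a hypothetical basis of R2:

```python
import numpy as np

# A hypothetical basis of R^2 (independent, hence also generating).
basis = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]
B = np.column_stack(basis)

v = np.array([5.0, 1.0])
coords = np.linalg.solve(B, v)  # the unique coordinates a1, a2
print(coords)  # [3. 2.]
print(np.allclose(coords[0] * basis[0] + coords[1] * basis[1], v))  # True
```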

Examples 3.4.2. (a) For any field F, the set {e1, . . . , en} is a basis of F n, called the canonical basis, the standard basis, or the natural basis of F n.

(b) If v1, . . . , vn are independent vectors in a vector space V, then {v1, . . . , vn} is a basis of Span{v1, . . . , vn}.

(c) {1} is a basis for C over C.

(d) {1, i} is a basis for C over R.

(e) {1, t, . . . , tn} is a basis for Pn(R) over R.

Example 3.4.3. Suppose v1, . . . , vn is a basis of V. If vn+1 is any vector in V, then v1, . . . , vn+1 is dependent (since vn+1 is a linear combination of the other vectors and so the list is dependent by Theorem 3.2.9) and so is not a basis. Additionally, the list v1, . . . , vn−1 is not generating (since if it were, vn would be a linear combination of the vectors in this shorter list and so v1, . . . , vn would not be independent).

Theorem 3.4.4. Suppose V is a vector space over a field F and v1, . . . , vn is a finite list of vectors in V. Define a linear map T : F n → V by

T (a1, . . . , an) = a1v1 + · · · + anvn.

Then the following statements are equivalent:

(a) v1, . . . , vn is a basis of V ,

(b) T is bijective.

Proof. The list v1, . . . , vn is a basis if and only if it is both independent and generating. This is true if and only if T is injective and surjective, by Corollary 3.2.8 and Theorem 3.3.6 respectively.

Corollary 3.4.5. A vector space V over a field F has a (finite) basis if and only if V ∼= F n for some positive integer n.

Proof. If V ∼= F n, then there is a bijective linear map T : F n → V. Let e1, . . . , en be the canonical basis of F n. Then Te1, . . . , Ten is independent (by Theorem 3.2.13) and generating (by Corollary 3.1.5) and hence is a basis of V.

Conversely, if V has a basis v1, . . . , vn, then V ∼= F n by Theorem 3.4.4.


Theorem 3.4.6. Every nonzero finitely generated vector space has a basis.

Proof. Let V ≠ {0} be a finitely generated vector space. We know that there is some finite set of vectors that spans V. Among all such sets, choose one of minimal size, say {u1, . . . , uk}. So {u1, . . . , uk} generates V. We will show that it is also linearly independent (by contradiction). Suppose that this set is dependent. Then, by Theorem 3.2.9, there exists a j, 1 ≤ j ≤ k, such that uj ∈ Span{u1, . . . , ûj, . . . , uk} (where the notation ûj means we omit the vector uj). Then V = Span{u1, . . . , ûj, . . . , uk}, which contradicts the minimality of k. Therefore, {u1, . . . , uk} is independent and hence is a basis (since it is also generating).

Remarks 3.4.7. (a) We will see that every vector space (not just finitely generated ones) has a basis.

(b) The above proof shows that we can get a basis by taking a sublist of some generating set. We just keep removing vectors until the vectors are independent.

(c) A vector space can have more than one basis.
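The sublist idea in remark (b) can be sketched, over R, as a greedy procedure: walk through a generating list and keep a vector only when it is not already in the span of those kept so far. (The function name `extract_basis` and the rank-based span test are our choices, not from the notes.)

```python
import numpy as np

def extract_basis(vectors, tol=1e-10):
    """Return an independent sublist with the same span as `vectors`."""
    kept = []
    for v in vectors:
        trial = np.column_stack(kept + [v])
        # Keep v only if it increases the rank, i.e. it is not already
        # in the span of the vectors kept so far.
        if np.linalg.matrix_rank(trial, tol=tol) == len(kept) + 1:
            kept.append(v)
    return kept

gens = [np.array([1.0, 0.0, 0.0]),
        np.array([2.0, 0.0, 0.0]),   # redundant: a multiple of the first
        np.array([0.0, 1.0, 0.0]),
        np.array([1.0, 1.0, 0.0])]   # redundant: sum of the first and third
print(len(extract_basis(gens)))  # 2
```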

Even though a vector space can have more than one basis, any two bases have the same number of vectors, as we will see.

Theorem 3.4.8. Suppose V is a vector space and

• R is a linearly independent subset of V with |R| = m, and
• S is a generating set for V with |S| = n.

Then m ≤ n and there exists a subset S′ ⊆ S with |S′| = n − m such that

V = Span(R ∪ S′).

Proof. We prove the result by induction on m. In the base case m = 0, we have R = ∅. Thus, taking S′ = S gives the desired statement.

For the induction step, we now assume that the theorem holds for some integer m ≥ 0 (and arbitrary n). We wish to show that it holds for m + 1.

Let R = {v1, . . . , vm+1} be a linearly independent subset of V. Then the subset {v1, . . . , vm} is also linearly independent (Exercise 3.2.8). Therefore, by the induction hypothesis, we have m ≤ n and there is a subset S′ = {u1, . . . , un−m} of S such that

V = Span ({v1, . . . , vm} ∪ {u1, . . . , un−m}) .

Since vm+1 ∈ V, this means that there exist scalars a1, . . . , am ∈ F and b1, . . . , bn−m ∈ F such that

vm+1 = a1v1 + · · · + amvm + b1u1 + · · · + bn−mun−m.

We claim that the b1, . . . , bn−m are not all zero. Indeed, if b1 = b2 = · · · = bn−m = 0, then we have

vm+1 = a1v1 + · · · + amvm ∈ Span({v1, . . . , vm}).

Thus, by Theorem 3.2.9, the set {v1, . . . , vm+1} is linearly dependent, contradicting our assumption.


By the above, we can choose 1 ≤ i ≤ n − m such that bi ≠ 0. Then we have

ui = (−bi^{−1}a1)v1 + · · · + (−bi^{−1}am)vm + bi^{−1}vm+1 + (−bi^{−1}b1)u1 + · · · + (−bi^{−1}bi−1)ui−1 + (−bi^{−1}bi+1)ui+1 + · · · + (−bi^{−1}bn−m)un−m. (3.1)

Let S′ = {u1, . . . , ui−1, ui+1, . . . , un−m}.

Then, by (3.1), we have ui ∈ Span(R ∪ S ′) and so

V = Span ({v1, . . . , vm} ∪ {u1, . . . , un−m})
  = Span ({v1, . . . , vm, vm+1} ∪ {u1, . . . , ui−1, ui+1, . . . , un−m})
  = Span(R ∪ S′).

Thus the result holds for |R| = m+ 1, completing the proof of the induction step.

Theorem 3.4.8 tells us that the size of any spanning set is at least the size of any linearly independent set (of the same vector space).

Corollary 3.4.9. Every linearly independent set of vectors in a finitely generated vector space V can be extended to a basis of V.

Proof. Let R be a linearly independent set of vectors in a finitely generated vector space V. Since V is finitely generated, it has a finite generating set S. Then the corollary follows from Theorem 3.4.8.

Theorem 3.4.10. If V has a basis with n elements, then every basis of V has n elements.

Proof. Suppose B = {v1, . . . , vn} and C = {w1, . . . , wm} are both bases of V. Since B spans V and C is linearly independent, we have m ≤ n (by Theorem 3.4.8). Since C spans V and B is linearly independent, we also have n ≤ m. Hence m = n.

Definition 3.4.11 (Dimension). Suppose V is a finitely generated vector space. If V ≠ {0}, then the number of vectors in any basis of V is called the dimension of V and is written dimV. If V = {0}, we say dimV = 0. If we wish to emphasize the field F, we write dimF V for the dimension of V over F. Finitely generated vector spaces are also called finite dimensional. If dimV = n, we say V is n-dimensional. If V is not finite dimensional, we say it is infinite dimensional.

Examples 3.4.12. (a) dimF F n = n. (See Exercise 3.4.2.)

(b) dimC C = 1.

(c) dimR C = 2.

(d) dimRPn(R) = n+ 1.

(e) dimRPn(C) = 2(n+ 1).

Lemma 3.4.13. Suppose V is a vector space. If {v1, . . . , vn} spans V, then dimV ≤ n. If {w1, . . . , wm} is linearly independent, then m ≤ dimV.


Proof. This follows from Theorem 3.4.8.

Lemma 3.4.14. Suppose {v1, . . . , vn} is independent and v is a vector. Then {v, v1, . . . , vn} is independent if and only if v ∉ Span{v1, . . . , vn}.

Proof. We prove the contrapositive: that {v, v1, . . . , vn} is dependent if and only if v ∈ Span{v1, . . . , vn}. We already know by Theorem 3.2.9 that if v ∈ Span{v1, . . . , vn}, then {v, v1, . . . , vn} is dependent. Now suppose {v, v1, . . . , vn} is dependent. Then there are scalars a, a1, . . . , an, not all zero, such that

av + a1v1 + · · ·+ anvn = 0.

If a = 0, then a1v1 + · · · + anvn = 0. Since {v1, . . . , vn} is independent, we have a1 = · · · = an = 0. This contradicts the fact that a, a1, . . . , an are not all zero. So a ≠ 0. Then

v = −a−1a1v1 − · · · − a−1anvn ∈ Span{v1, . . . , vn}.

Theorem 3.4.15. Suppose V is an n-dimensional vector space. Then the following statements are equivalent.

(a) {v1, . . . , vn} spans V .

(b) {v1, . . . , vn} is linearly independent.

(c) {v1, . . . , vn} is a basis of V .

Proof. (a) ⇒ (b): Suppose {v1, . . . , vn} spans V but is dependent. Then there exists an i such that V = Span{v1, . . . , vi−1, vi+1, . . . , vn}. But then V has a spanning set with n − 1 elements, contradicting Lemma 3.4.13. Thus, (b) holds.

(b) ⇒ (a): Suppose {v1, . . . , vn} is independent but does not span V . Then we can find a vector v ∈ V such that v ∉ Span{v1, . . . , vn}. Then, by Lemma 3.4.14, {v, v1, . . . , vn} is an independent set with n + 1 elements. But this contradicts Lemma 3.4.13. Thus, (a) holds.

It is now clear that (a) and (b) are equivalent to (c).

Remark 3.4.16. The point of the above theorem is that if we know the dimension of a vector space ahead of time, to show that some set is a basis we only need to check one of the two defining properties of a basis (independence or spanning).

Example 3.4.17. We know dimR C = 2. Since the set {1, 1 + i} is linearly independent (Exercise 3.4.1), it is a basis.

Example 3.4.18. Let

W = {f ∈ C∞(R) | f ′′ + f = 0} = Span{sin, cos}.

We leave it as an exercise (Exercise 3.4.3) to show that sin, cos are linearly independent. Then dimW = 2.

Define

f(x) = sin x + cos x,
g(x) = sin x − cos x.

Then f and g are linearly independent (Exercise 3.4.3). Therefore {f, g} is a basis for W .
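The evaluation trick suggested in Exercise 3.4.3 can be checked numerically. The following sketch (in Python with numpy, which is of course not part of these notes) evaluates sin and cos at t = 0 and t = π/2; the resulting 2 × 2 matrix being invertible is exactly the statement that the only vanishing linear combination is the trivial one.

```python
import numpy as np

# If a*sin + b*cos is the zero function, then evaluating at t = 0 and
# t = pi/2 gives E @ (a, b) = 0.  E being invertible forces a = b = 0.
E = np.array([[np.sin(0.0),       np.cos(0.0)],
              [np.sin(np.pi / 2), np.cos(np.pi / 2)]])
assert abs(np.linalg.det(E)) > 1e-12

# f = sin + cos and g = sin - cos have coordinate vectors (1, 1) and
# (1, -1) in the basis {sin, cos}; these are independent as well.
C = np.array([[1.0, 1.0],
              [1.0, -1.0]])
assert np.linalg.matrix_rank(C) == 2
```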


Example 3.4.19. Let

U = {(x, y, z) ∈ C3 | x+ y + z = 0} = Span{(−1, 1, 0), (−1, 0, 1)}.

(You learned how to solve homogeneous systems like this in MAT 1341.) Moreover,

{(−1, 1, 0), (−1, 0, 1)}

is linearly independent (it is a two-vector set, and neither vector is a multiple of the other). Thus {(−1, 1, 0), (−1, 0, 1)} is a basis of U and so dimU = 2.

Since (1, 1,−2), (0, 1,−1) both belong to U and are linearly independent,

{(1, 1,−2), (0, 1,−1)}

is another basis for U .
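As a numerical sanity check (a Python/numpy sketch, not part of the course material), one can verify that both pairs of vectors are independent and lie in U, so by Theorem 3.4.15 each pair is a basis of the 2-dimensional space U:

```python
import numpy as np

# Two candidate bases for U = {(x, y, z) : x + y + z = 0}, one pair per row.
B1 = np.array([[-1, 1, 0], [-1, 0, 1]], dtype=float)
B2 = np.array([[1, 1, -2], [0, 1, -1]], dtype=float)

# Each pair is linearly independent: the 2x3 matrix of rows has rank 2.
assert np.linalg.matrix_rank(B1) == 2
assert np.linalg.matrix_rank(B2) == 2

# Every vector in each set satisfies x + y + z = 0, so both pairs lie in U.
assert np.allclose(B1.sum(axis=1), 0)
assert np.allclose(B2.sum(axis=1), 0)
```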

Theorem 3.4.20. Suppose V and W are finite-dimensional vector spaces. Then V ∼= W ifand only if dimV = dimW .

Proof. Suppose dimV = n. Then V has a basis {v1, . . . , vn} with n vectors. If V ∼= W , then there exists an isomorphism T : V → W . Then Tv1, . . . , T vn generates W (Corollary 3.1.5) and is independent (Theorem 3.2.13). Then {Tv1, . . . , T vn} is a basis of W and so dimW = n = dimV .

Conversely, suppose dimW = dimV = n. Then W has a basis {w1, . . . , wn} with n elements. Then, by Theorem 3.4.4, V ∼= F n and W ∼= F n. Hence V ∼= W by Theorem 2.4.12.

Theorem 3.4.21. Suppose W is a subspace of a finite-dimensional vector space V . Then

(a) W is finite dimensional and dimW ≤ dimV , and

(b) dimW = dimV if and only if W = V .

Proof. Let n = dimV . If W = {0}, then the theorem is trivially true. Therefore, assume W ≠ {0}. Hence V ≠ {0} and n ≥ 1. Choose w1 ∈ W , w1 ≠ 0. Then {w1} is independent. If Span{w1} = W , then W is finitely-generated. Otherwise, choose w2 ∉ Span{w1}. Then {w1, w2} is independent. We continue in this manner, extending the list. By Lemma 3.4.13, this process must stop and so we obtain a list {w1, . . . , wk}, with k ≤ n, which is both independent and spans W . Therefore, dimW ≤ dimV . If dimW = dimV (i.e. k = n), then {w1, . . . , wk} must span V since it is independent (Theorem 3.4.15) and so W = Span{w1, . . . , wk} = V . It is clear that if W = V , then dimW = dimV .

Example 3.4.22. Take

V = P2(R) = {a0 + a1t+ a2t2 | ai ∈ R}.

The set {1 − t} is independent in V (since 1 − t is not the zero polynomial; e.g. it is nonzero at t = 0). Since Span{1 − t} ≠ P2(R), we continue. Choose a polynomial not in Span{1 − t}, for instance t ∈ P2(R). Then {1 − t, t} is linearly independent. Is Span{1 − t, t} = P2(R)? No (for instance t2 ∉ Span{1 − t, t}). Then the set {1 − t, t, t2} is independent in P2(R). Since dimP2(R) = 3, we know that {1 − t, t, t2} is a basis.


Exercises.

3.4.1. Prove that the complex numbers 1 and 1 + i are linearly independent over R.

3.4.2. Suppose F is a field.

(a) Prove that dimF Fn = n.

(b) Prove that F n ∼= Fm if and only if m = n.

3.4.3. This exercise concerns Example 3.4.18.

(a) Show that sin and cos are linearly independent (over R) elements of the set W . Hint : Write down an arbitrary linear combination and suppose it is equal to the zero function. Then evaluate the functions at carefully selected points that allow you to conclude the coefficients in your linear combination must be zero.

(b) Show that f and g are linearly independent.

3.4.4 ([Ber14, Ex. 3.5.5]). Let α be a real number. Prove that the vectors

u = (cosα, sinα) and v = (− sinα, cosα)

are a basis of R2.

3.4.5 ([Ber14, Ex. 3.5.6]). True or false (explain): If x1, x2, x3 is a basis of V , then so is x1, x1 + x2, x1 + x2 + x3.

3.4.6 ([Ber14, Ex. 3.5.7]). (a) In R2, find the coordinates of the vector (2, 3) with respect to the basis ((1/2)√3, 1/2), (−1/2, (1/2)√3).

(b) In Rn, find the coordinates of the vector (a1, . . . , an) with respect to the canonical basise1, . . . , en (see Example 1.4.8).

3.4.7 ([Ber14, Ex. 3.5.8]). If x1 = (1, 2, 0), x2 = (2, 1, 0), and x3 = (a, b, 1), where a and b are any real numbers, prove that x1, x2, x3 is a basis of R3.

3.4.8. Suppose T : V → W is a linear map, with V finite dimensional. Prove that dimT (U) ≤ dimU for any subspace U of V .

3.4.9. Find a basis of the subspace

U = {(x, y, z) ∈ R3 | x+ 2y + 3z = 0}

of R3. What is the dimension of U? Note: This is really a MAT 1341 question. It's here to refresh your memory.


3.4.10. Suppose v1, . . . , vn (where n ≥ 2) is a basis of a vector space V . Choose r ∈ {1, . . . , n − 1} and define

M = Span{v1, . . . , vr}, N = Span{vr+1, . . . , vn}.

Show that V = M ⊕N .

3.4.11. Suppose that U is a subspace of a finitely generated vector space V . Show that U has a complement. That is, show that there exists some subspace W of V such that U ⊕ W = V .

3.4.12 ([Ber14, Ex. 3.5.13]). Prove that if x1, . . . , xn is a basis of V and a1, . . . , an are nonzero scalars, then a1x1, . . . , anxn is also a basis of V .

3.4.13 ([Ber14, Ex. 3.5.14]). Prove that if x1, x2, x3 is a basis of V and a1, a2, a3 are nonzero scalars, then the list

a1x1, a1x1 + a2x2, a1x1 + a2x2 + a3x3

is also a basis of V . Hint : Combine Exercises 3.4.5 and 3.4.12.

3.4.14. Suppose that F is a finite field with q elements. Let V be an n-dimensional vector space over F . How many elements does V have? Remember to justify your answer.

3.4.15 ([Ber14, Ex. 3.5.20]). Let V be a finite-dimensional complex vector space. Prove thatdimR V = 2 · dimC V . Hint : As complex vector spaces, V ∼= Cn for some n.

3.5 The Dimension Theorem

Theorem 3.5.1 (Dimension Theorem). If V is a finite-dimensional vector space and T : V → W is a linear map, then

dimV = dimT (V ) + dim(KerT ).

Proof. By Corollary 2.2.4, KerT is a subspace of V , and hence is finite-dimensional. Choose a basis B′ = {v1, . . . , vk} of KerT . By Corollary 3.4.9, we can extend this to a basis B = {v1, . . . , vk, vk+1, . . . , vn} of V .

Suppose w ∈ T (V ). Then w = T (v) for some v ∈ V . Since B is a basis for V , we can write

v = c1v1 + · · · + cnvn, c1, . . . , cn ∈ F.

Then

w = T (v) = T (c1v1 + · · · + cnvn) = c1T (v1) + · · · + cnT (vn) = ck+1T (vk+1) + · · · + cnT (vn),

where the last equality holds since T (vi) = 0 for i = 1, . . . , k (these vi lie in KerT ).

Therefore,

T (V ) = Span{T (vk+1), . . . , T (vn)}.


We claim that the set {T (vk+1), . . . , T (vn)} is linearly independent and hence forms a basis of T (V ). Indeed, suppose

ck+1T (vk+1) + · · · + cnT (vn) = 0

for some ck+1, . . . , cn ∈ F . Then

T (ck+1vk+1 + · · · + cnvn) = ck+1T (vk+1) + · · · + cnT (vn) = 0,

and so ck+1vk+1 + · · ·+ cnvn ∈ KerT . Since B′ is a basis for KerT , we have

ck+1vk+1 + · · ·+ cnvn = c1v1 + · · ·+ ckvk

for some c1, . . . , ck ∈ F . But then

−c1v1 − · · · − ckvk + ck+1vk+1 + · · ·+ cnvn = 0.

Since B is a basis, it is linearly independent. Hence c1 = c2 = · · · = cn = 0.

Finally, counting the number of basis elements, we have

dimV = n, dim(KerT ) = k, dimT (V ) = n− k.

Hence dimV = dimT (V ) + dim(KerT ), as desired.

Definition 3.5.2 (Rank and Nullity). If T : V → W is a linear map, we define the rank ofT to be

rankT = dimT (V ),

and the nullity of T to be

nullT = dim(KerT ).

We can now rephrase Theorem 3.5.1 as

rankT + nullT = dimV, (3.2)

where V is the domain of the linear map T . The equation (3.2) is sometimes called the Rank-Nullity Theorem.
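The Rank-Nullity Theorem is easy to check numerically for a map given by a matrix. A Python/numpy sketch (outside the notes; the matrix below is an arbitrary illustration whose third row is the sum of the first two):

```python
import numpy as np

# T : R^5 -> R^3, T(x) = A x.  Row 3 = row 1 + row 2, so rank T = 2.
A = np.array([[1, 2, 0, 1, 0],
              [0, 1, 1, 0, 1],
              [1, 3, 1, 1, 1]], dtype=float)

rank_T = np.linalg.matrix_rank(A)    # dim T(V)
null_T = A.shape[1] - rank_T         # dim Ker T
assert rank_T == 2 and null_T == 3
assert rank_T + null_T == A.shape[1]  # rank T + null T = dim V, as in (3.2)
```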

Corollary 3.5.3. If T : V → W is linear, and dimV = dimW <∞, then

T is an isomorphism ⇐⇒ T is injective ⇐⇒ T is surjective.

Proof. Obviously, if T is an isomorphism, it is injective (by definition). Suppose T is injective. Then KerT = {0} and so dimT (V ) = dimV = dimW by the Dimension Theorem. So T (V ) = W (since T (V ) is a subspace of W , we can apply Theorem 3.4.21), and hence T is surjective. Finally, suppose T is surjective. Then T (V ) = W and so rankT = dimW = dimV . By the Dimension Theorem, nullT = 0 and so T is injective, hence an isomorphism.

Theorem 3.5.4. If V1, V2, . . . , Vn are finite-dimensional vector spaces, then so is V = V1 × · · · × Vn and

dimV = dimV1 + dimV2 + · · ·+ dimVn.


Proof. For each i = 1, . . . , n, let Bi be a basis of Vi, and let

B′i = {(0, . . . , 0, v, 0, . . . , 0) | v ∈ Bi} ⊆ V,

where the v appears in the i-th position. We leave it as an exercise (Exercise 3.5.1) to show that

B := B′1 ∪ B′2 ∪ · · · ∪ B′n

is a basis of V . Since each Bi has dimVi elements, B has dimV1 + · · · + dimVn elements, and the theorem follows.

Theorem 3.5.5. A system of m homogeneous linear equations in n unknowns, where n > m,always has a nontrivial solution.

Proof. As you saw in MAT 1341, such a system is equivalent to a matrix equation

Ax = 0,

where A is the coefficient matrix, and hence is m× n. The map

T : F n → Fm, T (x) = Ax,

is linear (since it is multiplication by a matrix). The set of solutions is precisely the kernelof T . By Theorem 3.5.1, we have

dimT (F n) + dim KerT = dimF n = n =⇒ dim KerT = n − dimT (F n) ≥ n − m > 0.

Here we used that T (F n) ⊆ Fm and so dimT (F n) ≤ dimFm = m. So dim KerT > 0, which means that KerT is not the zero vector space and hence there are nonzero elements in the kernel of T (which correspond to nontrivial solutions to the homogeneous system).
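For a concrete matrix one can exhibit a nontrivial solution explicitly. A numpy sketch (not part of the notes; the matrix A is an arbitrary example with n = 4 > m = 2, and the SVD is one standard numerical way to extract a kernel basis):

```python
import numpy as np

# m = 2 homogeneous equations in n = 4 unknowns, n > m: solve Ax = 0.
A = np.array([[1, 2, 3, 4],
              [0, 1, 1, 1]], dtype=float)

# Rows of Vt beyond rank(A) form an orthonormal basis of Ker T,
# which has dimension n - rank(A) >= n - m > 0.
_, s, Vt = np.linalg.svd(A)
rank = int((s > 1e-10).sum())
x = Vt[rank]                       # a nontrivial solution of Ax = 0
assert np.allclose(A @ x, 0)
assert np.linalg.norm(x) > 0.5     # nonzero (in fact a unit vector)
```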

Theorem 3.5.6. Suppose T : V → W is a linear map between finite-dimensional vector spaces.

(a) If T is injective, then dimV ≤ dimW .

(b) If T is surjective, then dimV ≥ dimW .

(c) If T is bijective, then dimV = dimW .

Proof. (a) If T is injective, then KerT = {0}. Thus

dimV = dimT (V ) + dim KerT = dimT (V ) + 0 ≤ dimW.

(b) If T is surjective, then T (V ) = W . Thus

dimV = dimT (V ) + dim KerT = dimW + dim KerT ≥ dimW.

(c) This follows immediately from the previous two parts.


Exercises.

3.5.1. Show that B, as defined in the proof of Theorem 3.5.4, is a basis of V .

3.5.2 ([Ber14, Ex. 3.6.1]). Does there exist a linear map T : R7 → R3 whose kernel is 3-dimensional?

3.5.3. For this exercise, the notation U −T→ V means that T is a mapping from U to V (that is, it means the same thing as T : U → V ).

Let

{0} −T0→ V1 −T1→ V2 −T2→ V3 −T3→ {0}

be linear maps satisfying:

ImTi−1 = KerTi for all i = 1, 2, 3.

(a) Show that T1 is injective and that T2 is surjective.

(b) Show that ∑_{i=1}^{3} (−1)^i dimVi = 0, that is, dimV2 = dimV1 + dimV3.

3.5.4 ([Ber14, Ex. 3.6.14]). Let T : V → W and S : W → U be linear maps, with V finite dimensional.

(a) If S is injective, then KerST = KerT and rank(ST ) = rank(T ).

(b) If T is surjective, then ImST = ImS and null(ST )− null(S) = dimV − dimW .

3.5.5 ([Ber14, Ex. 3.7.3]). Let V be a finite-dimensional vector space and let S, T ∈ L(V ). Prove the following statements.

(a) rank(S + T ) ≤ rankS + rankT .

(b) rank(ST ) ≤ rankS, and rank(ST ) ≤ rankT .

(c) null(S + T ) ≥ nullS + nullT − dimV .

3.5.6 ([Ber14, Ex. 3.7.6]). Let V = P(R) be the vector space of all real polynomial functions (see Example 1.2.8). Let S, T ∈ L(V ) be the linear maps such that Tp = p′ and Sp is the antiderivative of p with constant term zero. Then TS = I. Is T bijective?

3.5.7 ([Ber14, Ex. 3.7.7]). Let V be any vector space (not necessarily finite dimensional) and suppose R, S, T ∈ L(V ) are such that ST = I and TR = I. Prove that T is bijective and R = S = T−1. Hint : Look at (ST )R.


3.6 Dimensions of spaces of linear maps

Theorem 3.6.1. Suppose V is an n-dimensional vector space and W is an arbitrary (possibly infinite-dimensional) vector space. If {v1, . . . , vn} is a basis of V and w1, . . . , wn are any vectors in W , then there exists a unique linear map T : V → W such that Tvi = wi for all i = 1, . . . , n.

Proof. Existence: We know by Theorem 2.1.4 that the maps R : F n → V and S : F n → W defined by

R(a1, . . . , an) = a1v1 + · · ·+ anvn,

S(a1, . . . , an) = a1w1 + · · ·+ anwn,

are linear. It is clear that if {e1, . . . , en} is the canonical basis of F n, then

Rei = vi, Sei = wi.

Since {v1, . . . , vn} is a basis of V , R is bijective by Theorem 3.4.4. So R is invertible and R−1 is linear by Theorem 2.4.9. Therefore, the map T := SR−1 is linear. Since

Tvi = SR−1vi = Sei = wi,

the map T has the desired properties.

Uniqueness: Suppose T1, T2 : V → W are two linear maps with the given property. Then

(T1 − T2)(vi) = T1vi − T2vi = wi − wi = 0 ∀ i = 1, . . . , n.

Since {v1, . . . , vn} is a basis for V , this means that T1 − T2 = 0 (the zero map). Thus T1 = T2.
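In coordinates, the construction T = SR−1 in the existence proof is just a matrix computation: if the basis vectors vi are the columns of an invertible matrix B and the target vectors wi are the columns of a matrix W, then M = W B−1 is the unique matrix with Mvi = wi. A Python/numpy sketch (the particular B and W are arbitrarily chosen illustrations, not from the notes):

```python
import numpy as np

# Basis {v1, v2} of R^2 as columns of B (invertible), and arbitrary
# target vectors {w1, w2} in R^3 as columns of W.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
W = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])

M = W @ np.linalg.inv(B)   # matrix form of T = S R^{-1}

# T sends each basis vector vi to the prescribed wi:
assert np.allclose(M @ B[:, 0], W[:, 0])
assert np.allclose(M @ B[:, 1], W[:, 1])
```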

Corollary 3.6.2. If V and W are finite-dimensional vector spaces, then L(V,W ) is also finite dimensional and

dimL(V,W ) = (dimV )(dimW ).

Proof. Let n = dimV and let {v1, . . . , vn} be a basis of V . Define a map

Φ: L(V,W )→ W n = W ×W × · · · ×W, Φ(T ) = (Tv1, . . . , T vn).

Then Φ is linear (Exercise 3.6.1) and bijective by Theorem 3.6.1. Hence Φ is a vector space isomorphism and so

dimL(V,W ) = dimW n = n(dimW ) = (dimV )(dimW ).

(In the first equality, we used Theorem 3.5.4).


Exercises.

3.6.1. Prove that the map Φ defined in the proof of Corollary 3.6.2 is linear.

3.6.2. Suppose V and W are finite-dimensional vector spaces over a field F , and dimL(V,W ) = 11. Show that either V ∼= F or W ∼= F .

3.6.3 ([Ber14, Ex. 3.8.2]). True or false (explain): L(R2,R3) ∼= R6.

3.6.4 ([Ber14, Ex. 3.8.3]). If V and W are finite-dimensional vector spaces, prove thatL(V,W ) ∼= L(W,V ). Hint : Don’t look for a map L(V,W )→ L(W,V ).

3.6.5 ([Ber14, Ex. 3.8.7]). Prove that if V is a finite-dimensional vector space and T ∈ L(V ), then there exists a positive integer r and scalars a0, a1, . . . , ar (not all zero) such that

a0v + a1Tv + a2T^2v + · · · + arT^r v = 0, ∀ v ∈ V.

3.6.6 ([Ber14, Ex. 3.8.8]). Let x1, . . . , xn be a basis of V and, for each pair of indices i, j ∈ {1, . . . , n}, let Ei,j ∈ L(V ) be the linear mapping given by

Ei,jxk = xi if j = k, and Ei,jxk = 0 if j ≠ k.

Prove the following statements:

(a) Ei,jEj,k = Ei,k;

(b) Ei,jEh,k = 0 if j 6= h;

(c) E1,1 + E2,2 + · · ·+ En,n = I.

3.7 Dual spaces

Recall (Definition 2.1.10) that if V is a vector space over a field F , then the dual space of V , the space of linear forms on V , is

V ∗ = L(V, F ).

Remark 3.7.1. Some references use the notation V ′ instead of V ∗.

Theorem 3.7.2. If dimV <∞, then dimV ∗ = dimV .

Proof. This follows from Corollary 3.6.2:

dimV ∗ = dimL(V, F ) = (dimV )(dimF ) = (dimV ) · 1 = dimV.


Proposition 3.7.3 (Existence of dual bases). Suppose {v1, . . . , vn} is a basis of a vectorspace V . Then there exists a basis {f1, . . . , fn} of V ∗ such that

fi(vj) = δij, where δij := 1 if i = j, and δij := 0 if i ≠ j.

(The symbol δij is called the Kronecker delta.)

Proof. For i = 1, . . . , n, define

fi : V → F, fi(c1v1 + · · · + cnvn) = ci. (3.3)

Then each fi is linear (Exercise 3.7.1) and so fi ∈ V ∗. Moreover, we easily see that fi(vj) = δij. Since we know from Theorem 3.7.2 that dimV ∗ = n, to show that {f1, . . . , fn} is a basis, it is enough to show that this set is linearly independent (Theorem 3.4.15). Now, for c1, . . . , cn ∈ F ,

c1f1 + · · · + cnfn = 0 =⇒ c1f1(vi) + · · · + cnfn(vi) = 0 ∀ i = 1, . . . , n
=⇒ c1δ1i + · · · + cnδni = 0 ∀ i = 1, . . . , n
=⇒ ci = 0 ∀ i = 1, . . . , n.

Thus {f1, . . . , fn} is linearly independent and so we’re done.

Definition 3.7.4 (Dual basis). We call {f1, . . . , fn} the basis of V ∗ dual to {v1, . . . , vn}.

Example 3.7.5. If {e1, . . . , en} is the canonical basis of F n, then the dual basis of (F n)∗ is {f1, . . . , fn} where

fi(a1, . . . , an) = ai

is the i-th coordinate function.

If f : R3 → R is defined by f(x, y, z) = x − 2y + 3z (so f ∈ (R3)∗), then

f = f1 − 2f2 + 3f3.
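Numerically, if the basis vectors v1, . . . , vn are the columns of an invertible matrix P, then the dual basis consists of the rows of P−1, since (P−1P)ij = δij. A numpy sketch (not part of the notes; the particular P, f, and v below are arbitrary illustrations):

```python
import numpy as np

# Basis {v1, v2, v3} of R^3 as columns of P; the dual basis f1, f2, f3
# is given by the rows of P^{-1}: then fi(vj) = (P^{-1} P)_{ij} = delta_ij.
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
F = np.linalg.inv(P)
assert np.allclose(F @ P, np.eye(3))   # fi(vj) = delta_ij

# For the canonical basis, P = I and fi is the i-th coordinate function,
# so f(x, y, z) = x - 2y + 3z has coordinates (1, -2, 3) in the dual basis.
f = np.array([1.0, -2.0, 3.0])
v = np.array([2.0, 1.0, 1.0])
assert np.isclose(f @ v, 2 - 2 * 1 + 3 * 1)
```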

Theorem 3.7.6. Suppose T : V → W is a linear map. Then the map T ∗ : W ∗ → V ∗ defined by

T ∗g = g ◦ T, g ∈ W ∗,

is linear. (Pictorially, T ∗g = g ◦ T is the composite V −T→ W −g→ F , first applying T and then g.)


Proof. First, it is clear that for g ∈ W ∗ = L(W,F ), the composition g ◦ T is a linear map from V to F , so T ∗g ∈ V ∗; we only need to show that T ∗ itself is linear. Suppose g, h ∈ W ∗ and c, d are scalars. Then for all v ∈ V , we have

(T ∗(cg + dh))(v) = ((cg + dh)T )(v)

= (cg + dh)(Tv)

= c(g(Tv)) + d(h(Tv))

= c(gT )(v) + d(hT )(v)

= c(T ∗g)(v) + d(T ∗h)(v)

= (cT ∗g + dT ∗h)(v).

Thus T ∗(cg + dh) = cT ∗g + dT ∗h. Therefore T ∗ is linear.

Definition 3.7.7 (Transpose). The map T ∗ is called the transpose of the map T .

Remark 3.7.8. We will see later that the transpose of a linear map is closely related to the transpose of a matrix (which you learned about in MAT 1341).
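As a preview of that relationship (a numerical sketch, not part of the notes): for a map T(x) = Ax and a linear form g on the codomain, identified with a vector via the dot product, the form T∗g = g ∘ T corresponds to the vector Aᵀg. The matrices A, g, v below are arbitrary choices:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])   # T : R^3 -> R^2
g = np.array([4.0, -1.0])         # a linear form on R^2 (as a vector)
v = np.array([1.0, 1.0, 2.0])

lhs = g @ (A @ v)                 # (T*g)(v) = g(Tv)
rhs = (A.T @ g) @ v               # the form with vector A^T g, applied to v
assert np.isclose(lhs, rhs)       # T* is "multiplication by A transpose"
```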

Theorem 3.7.9. If T : V → W is a linear map, then

KerT ∗ = {g ∈ W ∗ | g(w) = 0 ∀ w ∈ T (V )}.

Proof. If g ∈ W ∗, then

g ∈ KerT ∗ ⇐⇒ T ∗g = 0 ⇐⇒ gT = 0 ⇐⇒ (gT )(v) = 0 ∀ v ∈ V
⇐⇒ g(Tv) = 0 ∀ v ∈ V ⇐⇒ g(w) = 0 ∀ w ∈ T (V ).

Definition 3.7.10 (Annihilator). If M is a subspace of V , then the annihilator of M in V ∗ is

M◦ := {f ∈ V ∗ | f(w) = 0 ∀ w ∈ M} = {f ∈ V ∗ | f = 0 on M}.

Note that M◦ is a subspace of V ∗.

Using the above terminology, the result of Theorem 3.7.9 is

KerT ∗ = T (V )◦ = (ImT )◦.

Theorem 3.7.11. If T : V → W is a linear map, then ImT ∗ = (KerT )◦.

Proof. Let f ∈ ImT ∗. Then f = T ∗g for some g ∈ W ∗. Thus

f(v) = (T ∗g)(v) = gTv ∀ v ∈ V.

So

v ∈ KerT =⇒ Tv = 0 =⇒ f(v) = g(0) = 0.

Therefore ImT ∗ ⊆ (KerT )◦.


Now suppose f ∈ (KerT )◦. We want to find a g ∈ W ∗ such that f = T ∗g. Let {v1, . . . , vk} be a basis of KerT and extend this to a basis {v1, . . . , vk, u1, . . . , ul} of V . We know that {Tu1, . . . , Tul} is a basis of ImT . Extend this to a basis {Tu1, . . . , Tul, w1, . . . , wp} of W . Now define g ∈ W ∗ as follows. For scalars c1, . . . , cl, a1, . . . , ap, define

g(c1Tu1 + · · · + clTul + a1w1 + · · · + apwp) = c1f(u1) + · · · + clf(ul) = f(c1u1 + · · · + clul).

Then g is linear by Theorem 3.6.1. We will show that T ∗g = f . Let x ∈ V and write x = v + u with v ∈ KerT and u ∈ Span{u1, . . . , ul}. Then

(T ∗g)(x) = gT (v + u) = gT (u) = f(u).

On the other hand,

f(x) = f(v + u) = f(v) + f(u) = f(u),

since f ∈ (KerT )◦. Hence T ∗g = f . Thus (KerT )◦ ⊆ ImT ∗ and so (KerT )◦ = ImT ∗.

Theorem 3.7.12. If U is a subspace of V and dimV <∞, then

dimU◦ = dimV − dimU = dimV ∗ − dimU∗.

Proof. Let j : U → V be the inclusion map (i.e. j(u) = u for all u ∈ U). Then we have

Ker j∗ = (Im j)◦ and Im j∗ = (Ker j)◦.

Now, Im j = U and Ker j = {0}. Thus

Ker j∗ = U◦ and Im j∗ = {0}◦ = U∗

(so j∗ is surjective). By the Dimension Theorem (Theorem 3.5.1), we have

dim Ker j∗ + dim Im j∗ = dimV ∗.

Thus

dimU◦ + dimU∗ = dimV ∗ = dimV.

Since dimU = dimU∗, the result follows.

Corollary 3.7.13. If T : V → W is linear, and dimV, dimW <∞, then

(a) rankT = rankT ∗,

(b) T is surjective ⇐⇒ T ∗ is injective, and

(c) T is injective ⇐⇒ T ∗ is surjective.

Proof. (a) We have

rankT ∗ = dimW ∗ − dim(KerT ∗)
        = dimW ∗ − dim(ImT )◦
        = dimW ∗ − (dimW − dim(ImT ))
        = dim ImT
        = rankT.


(b) We have

T is surjective ⇐⇒ rankT = dimW ⇐⇒ rankT ∗ = dimW ∗
⇐⇒ dim KerT ∗ = 0 ⇐⇒ T ∗ is injective.

(c) We have

T is injective ⇐⇒ rankT = dimV ⇐⇒ rankT ∗ = dimV ∗ ⇐⇒ T ∗ is surjective.
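Part (a) of Corollary 3.7.13 is the familiar fact that a matrix and its transpose have the same rank. A quick numpy check (the matrix is an arbitrary rank-2 example, not from the notes):

```python
import numpy as np

# A has rank 2: the second row is twice the first, the third is independent.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
assert np.linalg.matrix_rank(A) == 2
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)  # rank T = rank T*
```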

Exercises.

3.7.1. Show that the maps fi defined by (3.3) are linear.

3.7.2 ([Ber14, Ex. 3.6.7]). Let V be an n-dimensional vector space and let f be a nonzero linear form on V . Prove that dim Ker(f) = n − 1. Conversely, show that every (n − 1)-dimensional linear subspace of V is the kernel of a linear form.

3.7.3 ([Ber14, Ex. 3.6.9]). Let V be a vector space and suppose f1, . . . , fn are linear forms on V such that

(Ker f1) ∩ · · · ∩ (Ker fn) = {0}.

Prove that V is finite-dimensional and dimV ≤ n. Hint : If F is the field of scalars, consider a suitable mapping V → F n.

3.7.4 ([Ber14, Ex. 3.6.10]). Let V be the set of all vectors (x, y, z) ∈ R3 such that 2x − 3y + z = 0. Prove that V is a subspace of R3 and find its dimension. Hint : Use Exercise 3.7.2.

3.7.5 ([Ber14, Ex. 3.9.3]). If V and W are finite-dimensional vector spaces and T : V → Wis a linear map, prove that T and T ∗ have the same nullity if and only if dimV = dimW .

3.7.6 ([Ber14, Ex. 3.9.5]). Let V = P(R) be the vector space of real polynomial functions (Example 1.2.8). Every nonnegative integer k defines a linear form ϕk on V given by ϕk(p) = p(k)(0), where p(k) denotes the k-th derivative of p. If D : V → V is the differentiation map Dp = p′, show that D∗ϕk = ϕk+1. Hint : ϕk(p) = (Dkp)(0).

3.7.7 ([Ber14, Ex. 3.9.7]). Suppose V and W are finite-dimensional vector spaces and T : V →W is a linear map. Prove that T is bijective if and only if T ∗ is bijective. Hint : UseCorollary 3.7.13.

3.7.8 ([Ber14, Ex. 3.9.10]). Let V and W be finite-dimensional vector spaces. Show that themapping

Φ: L(V,W )→ L(W ∗, V ∗), Φ(T ) = T ∗

is linear and bijective.


3.7.9 ([Ber14, Ex. 3.9.13]). Let V be a finite-dimensional vector space, M a subspace of V ,and M◦ the annihilator of M in V ∗. Prove the following:

(a) M◦ = {0} if and only if M = V ;

(b) M◦ = V ∗ if and only if M = {0}.


Chapter 4

Matrices

In this chapter we look at matrices. Some of this material will be familiar from MAT 1341. However, we will cover the topic of matrices in more detail here. In particular, we discuss the matrix of a linear map, which generalizes the "standard matrix" of a map Rn → Rm seen in MAT 1341. We will then recall the procedure of Gaussian elimination (row reduction) and the concept of the rank of a matrix. The material in this section roughly corresponds to [Tre, §2.2, §2.7, §2.8].

4.1 The matrix of a linear map

Choosing bases allows one to associate a matrix to any linear map between finite-dimensional vector spaces, as we now explain.

Definition 4.1.1 (The matrix of a linear map). Suppose V and W are finite-dimensional vector spaces over a field F , n = dimV , m = dimW , and T : V → W is a linear map. Let B = {v1, . . . , vn} be an ordered basis of V and let D = {w1, . . . , wm} be an ordered basis of W . For each j = 1, . . . , n, write Tvj as a linear combination of w1, . . . , wm:

Tvj = a1jw1 + · · · + amjwm, j = 1, . . . , n.

Then the m × n matrix [aij] is called the matrix of T relative to the bases v1, . . . , vn and w1, . . . , wm and is denoted [T ]DB .

Remark 4.1.2. (a) It is important that the bases be ordered. Changing the order of vectors in the bases will change the matrix.

(b) We will sometimes use the notation [T ] instead of [T ]DB when we've fixed the bases B and D and there is no chance of confusion.

(c) A given linear map can have different matrices (if you pick different bases).

Recall that if V is an n-dimensional space, and B = {v1, . . . , vn} is an ordered basis of V , then we have an isomorphism

CB : V → F n, CB(c1v1 + · · · + cnvn) = (c1, . . . , cn) ∈ F n.

So CB is the coordinate function that takes the coordinates of a vector relative to the basis B. If D is an ordered basis of W and T ∈ L(V,W ), we have the following commutative diagram: T : V → W along the top, the coordinate isomorphisms CB : V → F n and CD : W → Fm (with inverses C−1B and C−1D ) down the sides, and CDTC−1B : F n → Fm along the bottom.

We know from MAT 1341 that a linear map F n → Fm corresponds to multiplication by a matrix. (In MAT 1341, F was R or C, but the argument is the same in general.) So CDTC−1B corresponds to multiplication by some matrix, and this is the matrix [T ]DB . This gives us an isomorphism

L(V,W ) → L(F n, Fm), T 7→ CDTC−1B ,

and an isomorphism

L(V,W ) → Mm,n(F ), T 7→ [T ]DB .

We often simply identify an m × n matrix with the corresponding map F n → Fm (given by multiplication by that matrix) and so we write [T ]DB = CDTC−1B .

Note that

CD(Tvj) = (a1j, a2j, . . . , amj),

and so

[T ]DB = [CD(Tv1) CD(Tv2) · · · CD(Tvn)].

Example 4.1.3. Let S : F n → Fm be a linear map and choose the standard bases B = {e1, . . . , en} of F n and D = {e1, . . . , em} of Fm. Then CB : F n → F n and CD : Fm → Fm are the identity maps. Thus

[S]DB = [Se1 Se2 · · · Sen]

is the standard matrix of the linear transformation (as you saw in MAT 1341).

Recall (from MAT 1341) that, for a matrix A, colA denotes the column space of A (the span of the columns of A).

Proposition 4.1.4. With the same notation as above, we have

(a) [T ]CB(v) = CDT (v) for all v ∈ V ,

(b) CB(KerT ) = Ker[T ], or equivalently KerT = C−1B (Ker[T ]), and


(c) CD(ImT ) = Im[T ] = col[T ] or ImT = C−1D (col[T ]).

Proof. These statements all follow from the equation [T ] = CDTC−1B and the fact that CB and CD are isomorphisms.

Example 4.1.5. Let T : P3(R) → P2(R) be the linear map given by T (p) = p′ − p′′. Choose ordered bases B = {1, t, t2, t3} and D = {1, t, t2} of P3(R) and P2(R) respectively. Then

[T ] = [CDT (1) CDT (t) CDT (t2) CDT (t3)]
     = [CD(0) CD(1) CD(2t − 2) CD(3t2 − 6t)]

     = [ 0 1 −2  0
         0 0  2 −6
         0 0  0  3 ].

Now let's find the kernel and image of T . Using techniques from MAT 1341, we know that

Ker[T ] = Span{(1, 0, 0, 0)}, col[T ] = R3.

Therefore

KerT = Span{C−1B (1, 0, 0, 0)} = Span{1},

and

ImT = C−1D (R3) = P2(R).
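The computation in Example 4.1.5 can be reproduced with coefficient vectors: in the basis {1, t, t2, t3}, differentiation is a matrix D, and T(p) = p′ − p′′ corresponds to D − D², truncated to the three coordinates of P2(R). A numpy sketch (an illustration, not part of the notes):

```python
import numpy as np

# Derivative matrix on coefficient vectors in the basis {1, t, t^2, t^3}:
# (a0, a1, a2, a3) |-> (a1, 2*a2, 3*a3, 0).
D = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3],
              [0, 0, 0, 0]], dtype=float)

# T(p) = p' - p''; keep the first 3 rows to land in P2(R).
M = (D - D @ D)[:3, :]
expected = np.array([[0, 1, -2, 0],
                     [0, 0, 2, -6],
                     [0, 0, 0, 3]], dtype=float)
assert np.allclose(M, expected)          # matches [T]^D_B above

# Kernel = constant polynomials, image = all of P2(R).
assert np.allclose(M @ np.array([1.0, 0, 0, 0]), 0)
assert np.linalg.matrix_rank(M) == 3
```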

Note that if we have a linear map T : V → V from some vector space V to itself, we often use the same basis for V as the domain and as the codomain. But this is not always the case.

Example 4.1.6. Consider the bases B = {e1, e2} and D = {(1, 1), (1,−1)} of R2 and let I : R2 → R2 be the identity map. Then

[I]DB = [CDI(e1) CDI(e2)] = [CD(e1) CD(e2)].

To finish the calculation, we need to find CD(e1) and CD(e2). In other words, we need to write e1 and e2 in the basis D. We have

e1 = (1, 0) = (1/2)(1, 1) + (1/2)(1,−1), e2 = (0, 1) = (1/2)(1, 1) − (1/2)(1,−1).

Thus

[I]DB = [ 1/2  1/2
          1/2 −1/2 ].


Exercises.

4.1.1. Choose the same bases for P2(R) and P3(R) as in Example 4.1.5 and let S : P2(R) → P3(R) be the linear map defined by letting S(p) be the antiderivative of p with constant term zero. Find the matrix [S] for S and check that [S]CB(p) = CDS(p) for all p ∈ P2(R).

4.1.2 ([Ber14, Ex. 4.2.1]). Let V be a 3-dimensional vector space, let x1, x2, x3 be a basis of V , and let T ∈ L(V ) be the linear map given by

Tx1 = x1, Tx2 = x1 + x2, Tx3 = x1 + x2 + x3.

Find the matrices of T and T−1 relative to the basis x1, x2, x3.

4.1.3. Let T : R2 → R3 be the linear map defined by

T (x, y) = (3x+ 2y, x− y, 4x+ 5y).

Find the matrix of T relative to the canonical bases of R2 and R3.

4.1.4 ([Ber14, Ex. 4.2.4]). Let V = P3(R) be the vector space of real polynomials of degree ≤ 3 (see Example 1.2.8), and let D : V → V be the linear map Dp = p′ (the derivative of p). Find the matrix of D relative to the basis 1, t, t2, t3 of V .

4.1.5 ([Ber14, Ex. 4.2.5]). Fix y ∈ R3 and define

T : R3 → R3, Tx = x× y

(cross product of vectors). Find the matrix of T relative to the canonical basis of R3.

4.1.6 ([Ber14, Ex. 4.2.7]). If A = (ai,j) is an n × n matrix, the trace of A is

tr(A) = a1,1 + a2,2 + · · · + an,n.

Prove that tr : Mn(F ) → F is a linear form on the vector space Mn(F ) of all n × n matrices over the field F .

4.1.7 ([Ber14, Ex. 4.2.8]). Let V be an n-dimensional vector space over F with basis x1, . . . , xn, and let f : V → F be a linear form on V . Describe the matrix of f relative to the basis x1, . . . , xn of V and the basis 1 of F .

4.1.8 ([Ber14, Ex. 4.2.9]). Recall (from MAT 1341) that if A = (a_{i,j}) is an m × n matrix over F, the transpose of A, denoted A^t, is the n × m matrix (b_{i,j}), where b_{i,j} = a_{j,i}. Prove that A ↦ A^t is a vector space isomorphism Mm,n(F) → Mn,m(F).


4.1.9 ([Ber14, Ex. 4.2.10]). Let V = W = R². Choose the basis B = {x1, x2} of V, where x1 = (2, 3), x2 = (4, −5), and choose the basis D = {y1, y2} of W, where y1 = (1, 1), y2 = (−3, 4). Find the matrix of the identity linear mapping I : V → W with respect to these bases.

4.1.10. Suppose U, V and W are vector spaces (over the same field) with ordered bases B, D, and E respectively. Suppose we have linear maps T : U → V and S : V → W. Show that

[ST]_B^E = [S]_D^E · [T]_B^D

(where the product on the right hand side is matrix multiplication). Note that we must use the same basis for the intermediate space V in both matrices.

4.1.11. Suppose V is a vector space with two ordered bases B and D. Show that a linear map T : V → V is invertible if and only if the matrix [T]_B^D is invertible. Furthermore, if T is invertible, show that [T^{−1}]_D^B = ([T]_B^D)^{−1}. Hint: Use Exercise 4.1.10.

4.2 Change of bases and similar matrices

Here we specialize to linear maps T : V → V from a vector space to itself and assume V is finite dimensional. What is the relationship between the matrices of T in different bases? Recall that

[T]_B^B = C_B T C_B^{−1}, and so C_B^{−1} [T]_B^B C_B = T.

Now suppose D is another ordered basis of V . Then

C_B^{−1} [T]_B^B C_B = T = C_D^{−1} [T]_D^D C_D,

and so

[T]_D^D = C_D C_B^{−1} [T]_B^B C_B C_D^{−1}.

Let P = C_B C_D^{−1}. Then P^{−1} = C_D C_B^{−1}, and so

[T]_D^D = P^{−1} [T]_B^B P.

Now, if B = {v1, . . . , vn} and D = {w1, . . . , wn}, then

C_D(w_j) = e_j, and so w_j = C_D^{−1}(e_j).

Therefore

P e_j = C_B C_D^{−1}(e_j) = C_B(w_j).

Since Pej is just the j-th column of P , we see that

The j-th column of P gives the coordinates of wj in the basis B.

Thus, if P = [P_{ij}], then w_j = ∑_{i=1}^{n} P_{ij} v_i. The matrix P is called the change of basis matrix from B to D.
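As a quick sketch (our own illustration, not from the notes): take V = R² with B the standard basis and D = {w1, w2} = {(1, 1), (1, −1)}. The j-th column of P is C_B(w_j), which for the standard basis B is just w_j itself.

```python
from fractions import Fraction as Fr

w = [(Fr(1), Fr(1)), (Fr(1), Fr(-1))]    # w1, w2: the basis D
# Change of basis matrix from B to D: column j holds the
# coordinates of w_j in the basis B (here B is standard).
P = [[w[j][i] for j in range(2)] for i in range(2)]

# Check the defining property w_j = sum_i P_{ij} v_i, where
# v_1 = (1, 0), v_2 = (0, 1) is the basis B.
v = [(Fr(1), Fr(0)), (Fr(0), Fr(1))]
for j in range(2):
    recovered = tuple(sum(P[i][j] * v[i][k] for i in range(2)) for k in range(2))
    assert recovered == w[j]

assert P == [[1, 1], [1, -1]]
```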


Example 4.2.1. Suppose V = R² and T : R² → R² is the linear map given by multiplication by the matrix

[2 1; 1 2].

Let B = {e1, e2} be the standard basis and D = {w1, w2}, where w1 = (1, 1) and w2 = (1, −1). Then

[T]_B^B = [ C_B T(e_1)   C_B T(e_2) ] = [2 1; 1 2]

is the standard matrix of T, since B is the standard basis of R². However,

[T]_D^D = [ C_D(T(1, 1))   C_D(T(1, −1)) ] = [ C_D((3, 3))   C_D((1, −1)) ] = [3 0; 0 1],

a diagonal matrix!

Note that P e_j = C_B C_D^{−1}(e_j), so

P e_1 = C_B C_D^{−1}(e_1) = C_B(w_1) = (1, 1),    P e_2 = C_B C_D^{−1}(e_2) = C_B(w_2) = (1, −1).

Thus

P = [1 1; 1 −1],

and

P^{−1} = (1/2) [1 1; 1 −1].

Therefore

P^{−1} [T]_B^B P = (1/2) [1 1; 1 −1] [2 1; 1 2] [1 1; 1 −1] = (1/2) [1 1; 1 −1] [3 1; 3 −1] = (1/2) [6 0; 0 2] = [3 0; 0 1] = [T]_D^D.
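The diagonalization in Example 4.2.1 can be verified mechanically. A small sketch in Python (our own check, using exact rational arithmetic; the inverse of P is hard-coded from the 2 × 2 inverse formula, since det P = −2):

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 1], [1, 2]]        # the matrix of T in the standard basis
P = [[1, 1], [1, -1]]       # columns are w1 = (1, 1) and w2 = (1, -1)
Pinv = [[Fraction(1, 2), Fraction(1, 2)],
        [Fraction(1, 2), Fraction(-1, 2)]]

D = matmul(matmul(Pinv, A), P)
assert D == [[3, 0], [0, 1]]    # diagonal, as computed in the example
```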

Remark 4.2.2. In this particular case, w1 and w2 are eigenvectors of the matrix [2 1; 1 2], with eigenvalues 3 and 1, respectively. The matrix

P^{−1} [2 1; 1 2] P = [3 0; 0 1]

is diagonal, the columns of P are w1 and w2 (which are C_B(w1) and C_B(w2), since B is the standard basis!), and the diagonal entries of [T]_D^D are the eigenvalues. The whole idea of changing bases is (usually) to simplify [T]_B^B as much as possible.

Definition 4.2.3 (Similar matrices). Two n × n matrices X and Y are similar if there is an invertible matrix P such that P^{−1}XP = Y.

Examples 4.2.4.

• The matrix [2 1; 1 2] is similar to [3 0; 0 1], with P = [1 1; 1 −1].


• The matrix [2 1; 1 2] is similar to [2 1; 1 2], with P = [1 0; 0 1].

• The matrix [2 1; 1 2] is similar to P^{−1} [2 1; 1 2] P and to Q [2 1; 1 2] Q^{−1} for any invertible P (or Q).

Remark 4.2.5. (a) Similarity is an equivalence relation on matrices (Exercise 4.2.5).

(b) Over an infinite field, a given matrix is usually similar to infinitely many matrices. However, there are exceptions. For example, any matrix similar to the identity matrix In is of the form P^{−1} In P = P^{−1} P = In. So In is only similar to itself. The same is true of cIn for any scalar c.

The next theorem explains why we used the word ‘transpose’ in Definition 3.7.7.

Theorem 4.2.6. Suppose T : V → W is a linear map between finite-dimensional vector spaces and T* : W* → V* is the transpose map. If B and D are ordered bases of V and W respectively, and B* and D* are the corresponding dual bases, then

([T]_B^D)^t = [T*]_{D*}^{B*},

where At denotes the transpose of the matrix A (see Exercise 4.1.8).

Proof. Let

B = {v1, . . . , vn},    D = {w1, . . . , wm},    X = [T]_B^D = [X_{ij}],
B* = {f1, . . . , fn},    D* = {g1, . . . , gm},    Y = [T*]_{D*}^{B*} = [Y_{ij}].

Then, for all 1 ≤ i ≤ n, 1 ≤ j ≤ m, we have

(T* g_j)(v_i) = (g_j T)(v_i) = g_j(T v_i) = g_j ( ∑_{k=1}^{m} X_{ki} w_k ) = X_{ji}.

On the other hand,

(T* g_j)(v_i) = ( ∑_{l=1}^{n} Y_{lj} f_l )(v_i) = Y_{ij}.

Thus X_{ji} = Y_{ij} for all 1 ≤ i ≤ n and 1 ≤ j ≤ m. Therefore X^t = Y.

Let's do one final example on matrices associated to linear maps. Suppose one knows the matrix for a linear map in some bases. How do you compute the action of the linear map?

Example 4.2.7. Choose the ordered basis B = {1, t, t²} of P2(R) and D = {e11, e12, e21, e22} of M2(R) (here e_{ij} is the matrix with a one in the (i, j)-position and zeroes everywhere else). Suppose a linear map T : P2(R) → M2(R) has matrix

[T]_B^D = [1 0 0; 0 2 1; 1 1 0; 1 5 −1].


What is T(t² − 1)?

Recall that [T]_B^D = C_D T C_B^{−1}. Thus T = C_D^{−1} [T]_B^D C_B. So

T(t² − 1) = C_D^{−1} [T]_B^D C_B(t² − 1)
= C_D^{−1} [1 0 0; 0 2 1; 1 1 0; 1 5 −1] (−1, 0, 1)
= C_D^{−1} (−1, 1, −1, −2)
= −e11 + e12 − e21 − 2e22
= [−1 1; −1 −2].
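The whole example reduces to one matrix–vector product on coordinate vectors; a minimal sketch of that step (our own check):

```python
M = [[1, 0, 0],
     [0, 2, 1],
     [1, 1, 0],
     [1, 5, -1]]          # [T]_B^D from the example
cb = [-1, 0, 1]           # C_B(t^2 - 1) in the basis B = {1, t, t^2}

# C_D(T(t^2 - 1)) = [T]_B^D C_B(t^2 - 1):
cd = [sum(M[i][j] * cb[j] for j in range(3)) for i in range(4)]
assert cd == [-1, 1, -1, -2]
# Reading the coordinates off against D = {e11, e12, e21, e22}:
# T(t^2 - 1) = -e11 + e12 - e21 - 2*e22 = [[-1, 1], [-1, -2]].
```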

Exercises.

4.2.1 ([Tre, Ex. 2.8.2]). Consider the vectors

(1, 2, 1, 1), (0, 1, 3, 1), (0, 3, 2, 0), (0, 1, 0, 0).

(a) Prove that these vectors form a basis of R⁴.

(b) Find the change of basis matrix that changes from this basis to the standard basis of R⁴.

4.2.2 ([Tre, Ex. 2.8.3]). Find the change of basis matrix that changes from the basis 1, 1 + t of P2(R) to the basis 1 − t, 2t (see Example 1.2.8).

4.2.3 ([Tre, Ex. 2.8.4]). Consider the linear map

T : C2 → C2, T (x, y) = (3x+ y, x− 2y).

Find the matrix of T in the standard basis and also in the basis (1, 1), (1, 2).

4.2.4. Let

A = [1 1; −1 1],    B = {e1, e2},    D = {(3, 1), (−2, 1)},    T(v) = Av.

Find [T ]DD and show that A is not diagonalizable.

4.2.5. Fix a positive integer n. Prove that similarity is an equivalence relation on the set of n × n matrices. (See Definition B.1.1 for the definition of equivalence relation.)


4.3 Gaussian elimination

In this section and the next we review some material from MAT 1341. Recall that you learned how to row reduce a matrix. This procedure is called Gaussian elimination. You used the following operations:

Definition 4.3.1 (Elementary row/column operations). The following are called elementary row operations on a matrix A ∈ Mn(F).

• Type I : Interchange two rows of A.

• Type II : Multiply any row of A by a nonzero element of F .

• Type III : Add a multiple of one row of A to another row of A.

Replacing the word "row" by "column" everywhere above gives the elementary column operations.

Definition 4.3.2 (Elementary matrices). An n × n elementary matrix is a matrix obtained by performing an elementary row/column operation on the identity matrix In. In particular, we define the following elementary matrices:

• For 1 ≤ i, j ≤ n, i ≠ j, we let P_{i,j} be the elementary matrix obtained from In by interchanging the i-th and j-th rows (equivalently, columns).

• For 1 ≤ i ≤ n and a ∈ F^×, we let M_i(a) be the elementary matrix obtained from In by multiplying the i-th row (equivalently, column) by a.

• For 1 ≤ i, j ≤ n, i ≠ j, and a ∈ F, we let E_{i,j}(a) be the elementary matrix obtained from In by adding a times row j to row i (equivalently, adding a times column i to column j).

The type of the elementary matrix is the type of the corresponding row/column operation performed on In.

Lemma 4.3.3. (a) Every elementary matrix is invertible and the inverse is an elementary matrix of the same type.

(b) Performing an elementary row operation on a matrix A is equivalent to multiplying A on the left by the corresponding elementary matrix.

(c) Performing an elementary column operation on a matrix A is equivalent to multiplying A on the right by the corresponding elementary matrix.

Proof. You saw this in MAT 1341, and so we omit the proof here.

Definition 4.3.5 (Row-echelon form). A matrix R ∈Mm,n(F ) is in row-echelon form if:

(a) all nonzero rows are above all zero rows, and

(b) the leading coefficient (the first nonzero entry from the left) of a nonzero row is strictly to the right of the leading coefficient of the row above it.


Lemma 4.3.6. Every matrix A ∈ Mm,n(F) can be transformed to a row-echelon form matrix R by performing row operations of types I and III. Equivalently, there exist finitely many elementary matrices E1, E2, . . . , Ek such that R = E1 E2 · · · Ek A.

Proof. You saw this in MAT 1341, and so we will omit the proof here.

Theorem 4.3.7. By performing both row and column operations, every matrix A ∈ Mm,n(F) can be transformed to a block matrix of the form

D = [ I_r  0_{r×(n−r)} ; 0_{(m−r)×r}  0_{(m−r)×(n−r)} ].    (4.1)

Exercises.

4.3.1. Using row and column operations, reduce the following matrices to the form indicated in Theorem 4.3.7:

(a) [1 2 3; −2 −4 −6]

(b) [2 1; 1 1; 3 2]

(c) [1 1 0; 0 1 2; 1 2 1]

4.4 The rank of a matrix

Definition 4.4.1 (Rank of a matrix). The rank of A ∈ Mm,n(F), denoted rank(A), is the rank of the linear map

T : F^n → F^m

given by matrix multiplication by A (equivalently, such that [T] = A with respect to the standard bases of F^n and F^m).

Lemma 4.4.2. A matrix A ∈Mn(F ) is invertible if and only if rank(A) = n.

Proof. Let T : F^n → F^n be the linear map given by matrix multiplication by A. Thus [T] = A with respect to the standard basis of F^n. Then T is invertible if and only if A is invertible. Thus

rank(A) = n ⇐⇒ rank(T) = n    (by Definition 4.4.1)
⇐⇒ dim T(F^n) = n    (by Definition 3.5.2)
⇐⇒ T is invertible    (by Corollary 3.5.3)
⇐⇒ A is invertible.


Lemma 4.4.3. A matrix A ∈ Mn(F) is invertible if and only if it can be written as a product of elementary matrices.

Proof. Since elementary matrices are invertible, it is clear that if A can be written as a product of elementary matrices, then A is invertible.

Now suppose that A is invertible. Then, by Lemma 4.4.2, rank(A) = n. Thus, the matrix D of Theorem 4.3.7 is the identity matrix In. Hence, there exist elementary matrices E1, . . . , Ep and F1, . . . , Fq such that

E1 · · · Ep A F1 · · · Fq = In.

Therefore

A = E_p^{−1} · · · E_1^{−1} F_q^{−1} · · · F_1^{−1}.

Since the inverse of an elementary matrix is an elementary matrix, it follows that A can be written as a product of elementary matrices.

Lemma 4.4.4. If A ∈Mm,n(F ) and B ∈Mn,k(F ), then

rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B).

Proof. Let T_A : F^n → F^m be the linear map given by matrix multiplication (on the left) by A, and let T_B : F^k → F^n be the linear map given by matrix multiplication (on the left) by B. Then

rank(AB) = rank(T_A ◦ T_B)    (by definition)
= dim(Im(T_A ◦ T_B))    (by definition)
≤ dim(Im T_B)    (by Exercise 3.4.8)
= rank(B).

Since Im(T_A ◦ T_B) ⊆ Im(T_A), we also have rank(AB) ≤ dim(Im T_A) = rank(A).

Corollary 4.4.5. Let A ∈ Mm,n(F). If P and Q are invertible m × m and n × n matrices, respectively, then

rank(PA) = rank(A) = rank(AQ).

Proof. By Lemma 4.4.4, we have

rank(PA) ≤ rank(A)

and

rank(A) = rank(P^{−1}(PA)) ≤ rank(PA).

Hence rank(PA) = rank(A). The proof that rank(AQ) = rank(A) is similar.

Lemma 4.4.6. Let T : V → W be a linear map between finite-dimensional vector spaces. Let B be an ordered basis of V and let D be an ordered basis of W. Then

rank(T) = rank([T]_B^D).


Proof. Let n = dim V and m = dim W. Let S_V : F^n → V be the isomorphism that sends the standard basis of F^n to the basis B, and let S_W : W → F^m be the isomorphism that sends the basis D to the standard basis of F^m. Let

U = S_W ◦ T ◦ S_V : F^n → F^m

be the composition. Then [T]_B^D = [U] (where [U] denotes the matrix of U with respect to the standard bases). Therefore

rank(T) = rank(S_W^{−1} ◦ U ◦ S_V^{−1})    (since U = S_W ◦ T ◦ S_V)
= rank(U)    (by Corollary 4.4.5)
= rank([U])    (by Definition 4.4.1)
= rank([T]_B^D).

Lemma 4.4.6 tells us that to compute the rank of any linear transformation, you can compute the rank of its matrix in any basis. So we reduce the problem to matrix calculations you learned in MAT 1341.

Lemma 4.4.7. The rank of a matrix A is equal to the dimension of the column space of A.

Proof. Let A ∈ Mn(F) and let T : F^n → F^n be the linear map given by matrix multiplication by A (on the left). Then T(e_j) is the j-th column of A. Thus we have

rank(A) = dim(Im(T)) = dim(Span{T(e_1), . . . , T(e_n)}) = dim(col(A)).

Lemma 4.4.8. Suppose A ∈ Mm,n(F) has row-echelon form R and block matrix D from Theorem 4.3.7. Then

rank(A) = rank(R) = rank(D) = r.

Proof. The proof of this lemma is left as Exercise 4.4.1.

Recall that the row space of a matrix is the span of its rows.

Lemma 4.4.9. Suppose A ∈Mm,n(F ).

(a) We have rank(A) = rank(At).

(b) The rank of A is equal to the dimension of the row space of A.

(c) The row space and column space of A have the same dimension.

Proof. (a) Let D be the matrix from Theorem 4.3.7. Thus we have elementary matrices E1, . . . , Ep and F1, . . . , Fq such that

E1 · · · Ep A F1 · · · Fq = D.

Note that D^t is the n × m matrix of the same form (4.1), so rank(D^t) = rank(D) = r. Thus we have

rank(A) = rank(D)    (by Corollary 4.4.5)
= rank(D^t)
= rank(F_q^t · · · F_1^t A^t E_p^t · · · E_1^t)
= rank(A^t).    (by Corollary 4.4.5)


(b) The row space of A is equal to the column space of A^t. Thus, the result follows from part (a) and Lemma 4.4.7.

(c) This follows from part (b) and Lemma 4.4.7.

Exercises.

4.4.1. Prove Lemma 4.4.8.

4.4.2. Prove that similar matrices have the same rank.

4.4.3. Suppose A and B are n × n matrices. Prove that, if AB is invertible, then A and B are both invertible. Do not use determinants, since we have not seen them yet. Hint: Use Lemma 4.4.4.

4.4.4. Suppose A is an n × m matrix with entries in F. Recall (from MAT 1341) that the null space of A is

Ker A = {x ∈ F^m | Ax = 0}.

Prove that rank(A) + dim(Ker A) = m.

4.4.5. Is it possible for a matrix A ∈ M3(R) to have column space spanned by (1, 1, 1) and null space spanned by (3, −1, 2)?


Chapter 5

Determinants and multilinear maps

You saw determinants of matrices in MAT 1341, and learned how to compute them. We now revisit this topic in more depth. In particular, we explain why the determinant is defined the way it is (or, equivalently, can be computed in the way you learned) by proving that the determinant is the only function on matrices satisfying three very natural properties. The material in this section roughly corresponds to [Tre, Ch. 3].

5.1 Multilinear maps

Suppose V1, . . . , Vn and W are vector spaces over a field F. A multilinear map

f : V1 × · · · × Vn → W

is a map that is linear separately in each variable. In other words, for each i = 1, . . . , n, if all the variables other than v_i are held constant, then f(v1, . . . , vn) is a linear function of v_i.

Examples 5.1.1. (a) The map

f : V × · · · × V (n factors) → V,    f(v1, . . . , vn) = v1 + · · · + vn

is multilinear.

(b) The zero map V1 × · · · × Vn → W is multilinear.

A multilinear map

f : V × · · · × V (n factors) → F

is called an n-linear form. If n = 2, this is called a bilinear form.

Example 5.1.2. For every bilinear form f : V × V → F , we have

f(c1v1 + c2v2, u) = c1f(v1, u) + c2f(v2, u) and

f(u, c1v1 + c2v2) = c1f(u, v1) + c2f(u, v2)

for all c1, c2 ∈ F and u, v1, v2 ∈ V .


Each row of a matrix A ∈ Mn(F) can be viewed as an element of the vector space F^n. Therefore, as a vector space, we can identify Mn(F) with

F^n × · · · × F^n (n factors),

where the first factor of F^n corresponds to the first row of the matrix, etc. Then a map f : Mn(F) → F corresponds to a map

f : F^n × · · · × F^n (n factors) → F,

and it makes sense to ask whether or not this map is n-linear.

Examples 5.1.3. (a) The function

f : Mn(F )→ F, f(A) = a1,1a2,2 · · · an,n, A = (ai,j),

is n-linear. (See Exercise 5.1.2.)

(b) The function

tr : Mn(F )→ F, tr(A) = a1,1 + a2,2 + · · ·+ an,n, A = (ai,j),

is not n-linear. The value tr(A) is called the trace of A. (See Exercise 5.1.2.)
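Both claims can be probed numerically for n = 2 (our own spot check, not a proof): linearity in the first row, with the second row held fixed, holds for the product of the diagonal entries but fails for the trace.

```python
def make(r1, r2):
    """Build a 2x2 matrix from its two rows."""
    return [list(r1), list(r2)]

def f(A):    # f(A) = a_{1,1} * a_{2,2}
    return A[0][0] * A[1][1]

def tr(A):   # the trace
    return A[0][0] + A[1][1]

u, v, w, k = (1, 2), (3, 4), (5, 6), 7
combo = [u[i] + k * v[i] for i in range(2)]   # first row u + k*v

# f is linear in the first row:
assert f(make(combo, w)) == f(make(u, w)) + k * f(make(v, w))

# the trace is not (the two sides differ by k * w[1]):
assert tr(make(combo, w)) != tr(make(u, w)) + k * tr(make(v, w))
```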

Exercises.

5.1.1. Prove that the cross product map

f : R³ × R³ → R³,    f(u, v) = u × v

is bilinear.

5.1.2. Prove the statements made in Examples 5.1.3.

5.1.3. Suppose that

f : V1 × · · · × Vn → W

is a multilinear map. Prove that f(v1, . . . , vn) = 0 whenever v_i = 0 for some 1 ≤ i ≤ n.

5.1.4. Let V be a vector space with dual space V ∗. Show that the map

f : V × V ∗ → F, f(v, ϕ) = ϕ(v),

is bilinear.


5.2 The determinant

Given A ∈ Mn(F) for n ≥ 2, we let A_{i,j} denote the matrix obtained from A by deleting the i-th row and j-th column. We will denote the (i, j) entry of the matrix A by a_{i,j}.

Definition 5.2.1 (Determinant). We define a function

det : Mn(F )→ F

recursively (with respect to n) as follows. For n = 1 we define

det(A) := a1,1.

For n ≥ 2, we define

det(A) := ∑_{i=1}^{n} (−1)^{i+1} det(A_{i,1}) · a_{i,1}.    (5.1)

The function det is called the determinant function. The value det(A) is called the determinant of A, and is also denoted det A or |A|. The scalar (−1)^{i+j} det(A_{i,j}) is called the cofactor of the entry of A in position (i, j). The formula (5.1) is called cofactor expansion along the first column of A.

Example 5.2.2. For 2× 2 matrices, we have

det [a_{1,1} a_{1,2}; a_{2,1} a_{2,2}] = a_{1,1} a_{2,2} − a_{1,2} a_{2,1}.
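Definition 5.2.1 translates directly into a short recursive function; a sketch (our own code, with rows 0-indexed so the sign (−1)^{i+1} becomes (−1)^i):

```python
def minor(A, i, j):
    """The matrix A with row i and column j deleted (0-indexed)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Cofactor expansion along the first column, as in (5.1)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** i * det(minor(A, i, 0)) * A[i][0] for i in range(n))

assert det([[5]]) == 5
assert det([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3        # the 2x2 formula
assert det([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) == 1   # det(I_3) = 1
```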

Recall that In is the identity matrix.

Lemma 5.2.3. We have det(In) = 1.

Proof. We prove the result by induction on n. Since

det(I_1) = det[1] = 1,

the result holds for n = 1.

Now assume that det(I_n) = 1 for some n ≥ 1. Noting that (I_{n+1})_{1,1} = I_n, we use (5.1) to compute

det(I_{n+1}) = (−1)² det(I_n) · 1 + (−1)³ det((I_{n+1})_{2,1}) · 0 + · · · + (−1)^{n+2} det((I_{n+1})_{n+1,1}) · 0 = 1.

Recall from Section 5.1 that det : Mn(F) → F can be viewed as a function

det : F^n × · · · × F^n (n factors) → F.

Theorem 5.2.4. The determinant function det : Mn(F )→ F is n-linear.


Proof. We prove the result by induction on n. The base case n = 1 is obvious.

Now assume that n > 1 and that det : M_{n−1}(F) → F is (n − 1)-linear. Let A = (a_{ij}) ∈ Mn(F). Suppose that for the r-th row of A we have

(a_{r,1}, . . . , a_{r,n}) = (b_1, . . . , b_n) + k(c_1, . . . , c_n),    k ∈ F.

In other words, we have

a_{r,i} = b_i + k c_i,    1 ≤ i ≤ n.

Let B denote the matrix obtained from A by replacing the r-th row by (b_1, . . . , b_n) and let C denote the matrix obtained from A by replacing the r-th row by (c_1, . . . , c_n).

We have

det(A) = ∑_{i=1}^{n} (−1)^{i+1} det(A_{i,1}) · a_{i,1}    (by (5.1))

= ∑_{i≠r} (−1)^{i+1} det(A_{i,1}) · a_{i,1} + (−1)^{r+1} det(A_{r,1}) · a_{r,1}

= ∑_{i≠r} (−1)^{i+1} (det(B_{i,1}) + k det(C_{i,1})) · a_{i,1} + (−1)^{r+1} det(A_{r,1}) · (b_{r,1} + k c_{r,1})

= ∑_{i=1}^{n} (−1)^{i+1} det(B_{i,1}) · b_{i,1} + k ∑_{i=1}^{n} (−1)^{i+1} det(C_{i,1}) · c_{i,1}

= det(B) + k det(C),    (by (5.1))

where we used the induction hypothesis in the third equality. This completes the proof of the induction step.

Example 5.2.5. The determinant of a 2 × 2 matrix is a bilinear form

det : F² × F² → F

given by

det((x_1, x_2), (y_1, y_2)) = det [x_1 x_2; y_1 y_2] = x_1 y_2 − x_2 y_1.

The following lemma will be generalized in Lemma 5.3.1. However, our proof of Lemma 5.3.1 will depend indirectly on the following special case. Thus, we need to prove it independently to avoid circular logic.

Lemma 5.2.6. Suppose the matrix B is obtained from A ∈ Mn(F), n ≥ 2, by interchanging two neighbouring rows. Then

det B = − det A.

Proof. The proof of this lemma is left as Exercise 5.2.1.

Lemma 5.2.7. If a matrix A ∈Mn(F ), n ≥ 2, has two identical rows, then det(A) = 0.


Proof. By successively swapping neighbouring rows, we can turn A into a matrix B where the two identical rows are neighbouring. By Lemma 5.2.6, we have det(A) = ± det(B). Now, since swapping the two neighbouring identical rows of B leaves B unchanged, Lemma 5.2.6 tells us that

det(B) = − det(B).

Thus det(A) = ± det(B) = 0.

Exercises.

5.2.1. Prove Lemma 5.2.6. Hint : Use (5.1) and induction on n.

5.2.2. If A is an n × n matrix and c is a scalar, how are the determinants det(A) and det(cA) related?

5.2.3 ([Tre, Ex. 3.3.2]). How are the determinants of A and B related if

(a) A = [a_1 a_2 a_3; b_1 b_2 b_3; c_1 c_2 c_3],    B = [2a_1 3a_2 5a_3; 2b_1 3b_2 5b_3; 2c_1 3c_2 5c_3];

(b) A = [a_1 a_2 a_3; b_1 b_2 b_3; c_1 c_2 c_3],    B = [3a_1 4a_2 + 5a_1 5a_3; 3b_1 4b_2 + 5b_1 5b_3; 3c_1 4c_2 + 5c_1 5c_3]?

5.2.4 ([Tre, Ex. 3.3.4]). A square (n × n) matrix is called skew-symmetric (or antisymmetric) if A^t = −A. Prove that if A is skew-symmetric and n is odd, then det(A) = 0. Is this true for even n?

5.2.5 ([Tre, Ex. 3.3.8]). Show that

det [1 a a²; 1 b b²; 1 c c²] = (c − a)(c − b)(b − a).

This is a particular case of the so-called Vandermonde determinant.


5.3 Characterizing properties of the determinant

From Theorem 5.2.4, Lemma 5.2.7, and Lemma 5.2.3, we see that the determinant has the following properties:

(D1) The function det is n-linear.

(D2) We have det(A) = 0 whenever A has two identical rows.

(D3) We have det(In) = 1.

We will soon show that these properties uniquely characterize the determinant. In other words, the determinant is the only map Mn(F) → F satisfying properties (D1)–(D3). To do this, we will prove some facts about the determinant using only properties (D1)–(D3). Therefore, these facts are true for any other map with these properties.

We begin by analyzing the effects of the elementary row operations on the determinant.

Lemma 5.3.1 (Determinant under row operation of type I). Suppose the matrix B is obtained from A ∈ Mn(F) by interchanging two rows. Then

det(B) = − det(A).

In particular, det(P_{i,j}) = −1.

Proof. Suppose B is obtained from A by interchanging rows i and j. Let u and v be the i-th and j-th rows of A, respectively. Define the following matrices:

• C is the matrix whose i-th and j-th rows are both u+ v,

• M is the matrix whose i-th and j-th rows are both u,

• N is the matrix whose i-th and j-th rows are both v.

Then we have

0 = det(C) (by (D2))

= det(A) + det(B) + det(M) + det(N) (by (D1))

= det(A) + det(B) + 0 + 0. (by (D2))

Thus det(B) = − det(A). Since P_{i,j} is obtained from the identity matrix by interchanging the i-th and j-th rows, we have

det(Pi,j) = − det(In) = −1.

Lemma 5.3.2 (Determinant under row operation of type II). Assume that a matrix B is obtained from a matrix A by multiplying a row of A by a scalar a ∈ F. Then

det(B) = a · det(A).

In particular, det(Mi(a)) = a, for a ∈ F .


Proof. The first assertion follows immediately from (D1). Since M_i(a) is obtained from the identity matrix by multiplying the i-th row by a, we have

det(Mi(a)) = a det(In) = a · 1 = a.

Lemma 5.3.3 (Determinant under row operation of type III). Let a ∈ F. Assume that a matrix B is obtained from a matrix A by adding a times the j-th row to the i-th row. Then

det(B) = det(A).

In particular, det(Ei,j(a)) = 1.

Proof. Let C be the matrix obtained from A by replacing its i-th row with the j-th row of A. In particular, C has two identical rows (namely, its i-th and j-th rows are the same). Then

det(B) = det(A) + a det(C) (by (D1))

= det(A) + 0. (by (D2))

Lemma 5.3.4. If A ∈Mn(F ) has rank(A) < n, then det(A) = 0.

Proof. By Lemma 4.3.6, we can use row operations of types I and III to transform the matrix A into a matrix B in row-echelon form. By Lemmas 5.3.1 and 5.3.3, we have det(A) = ± det(B). If rank(A) < n, then B has a zero row. Therefore, det(B) = 0 by (D1) (see Exercise 5.1.3).

Theorem 5.3.5 (The determinant is multiplicative). For any A,B ∈Mn(F ), we have

det(AB) = det(A) det(B).

Proof. First suppose that rank(A) < n. Then rank(AB) ≤ rank(A) < n by Lemma 4.4.4. Hence

det(A) det(B) = 0 · det(B) = 0 = det(AB),

and we're done.

Now suppose rank(A) = n. Lemmas 5.3.1, 5.3.2, and 5.3.3 imply that the theorem holds whenever A is an elementary matrix. Since A is an invertible matrix (by Lemma 4.4.2), it can be written as a product A = E1 E2 · · · Ek of elementary matrices by Lemma 4.4.3. Then

det(AB) = det(E1E2 · · ·EkB)

= det(E1) det(E2 · · ·EkB)

...

= det(E1) det(E2) · · · det(Ek) det(B)

= det(E1E2) det(E3) · · · det(Ek) det(B)

...

= det(E1E2 · · ·Ek) det(B)

= det(A) det(B).
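For 2 × 2 matrices the theorem is easy to spot-check numerically (a sanity check, not a proof; both helper functions are our own):

```python
def det2(A):
    """Determinant of a 2x2 matrix, as in Example 5.2.2."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def matmul2(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 1], [1, 2]]
B = [[0, 1], [-1, 3]]
assert det2(A) == 3 and det2(B) == 1
assert det2(matmul2(A, B)) == det2(A) * det2(B)
```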


The following theorem states that the properties (D1)–(D3) uniquely characterize the determinant.

Theorem 5.3.6 (Characterization of the determinant). Suppose δ : Mn(F) → F is a function such that

(a) δ is n-linear,

(b) δ(A) = 0 whenever A has two identical rows,

(c) δ(In) = 1.

(In other words, suppose δ has properties (D1)–(D3).) Then δ(A) = det(A) for all A ∈ Mn(F).

Proof. First note that the results of Lemmas 5.3.1, 5.3.2, 5.3.3, 5.3.4, and Theorem 5.3.5 hold for δ, since their proofs only used properties (D1)–(D3).

First, suppose A is not invertible (i.e. rank(A) < n). Then, by Lemma 5.3.4 (and its analogue for δ), we have

det(A) = 0 = δ(A).

Now suppose A is invertible. Then we can write A as a product A = E1 E2 · · · Ek of elementary matrices. Since det and δ agree on elementary matrices (by Lemmas 5.3.1, 5.3.2, and 5.3.3 and their analogues for δ), we have

det(A) = det(E1) · · · det(Ek) = δ(E1) · · · δ(Ek) = δ(E1 · · · Ek) = δ(A).

Exercises.

5.3.1 ([Tre, Ex. 3.3.5]). A square matrix A is called nilpotent if A^k = 0 for some positive integer k. Show that if A is a nilpotent matrix, then det(A) = 0.

5.3.2. Prove that if A and B are similar matrices, then det(A) = det(B).

5.3.3. A real square matrix Q is called orthogonal if Q^t Q = I (see Definition 6.2.6). Prove that if Q is an orthogonal matrix, then det(Q) = ±1.

5.4 Other properties of the determinant

We conclude this chapter by proving a few more useful properties of the determinant.

Lemma 5.4.1. A matrix A is invertible if and only if det(A) ≠ 0. Furthermore, if A is invertible, then

det(A^{−1}) = det(A)^{−1}.


Proof. If A ∈ Mn(F) is not invertible, then rank(A) < n, and so det(A) = 0 by Lemma 5.3.4.

Now suppose A is invertible. Then we can write A as a product A = E1 E2 · · · Ek of elementary matrices, and we have

det(A) = det(E1 E2 · · · Ek) = det(E1) det(E2) · · · det(Ek).

By Lemmas 5.3.1, 5.3.2, and 5.3.3, the determinants of the elementary matrices are nonzero. Hence det(A) ≠ 0.

To complete the proof of the lemma, note that, if A is invertible, we have

det(A) det(A^{−1}) = det(A A^{−1}) = det(In) = 1.

Thus det(A^{−1}) = det(A)^{−1}.

Lemma 5.4.2. If A ∈Mn(F ), then

det(At) = det(A).

Proof. If A is not invertible, then rank(A) = rank(A^t) < n by Lemmas 4.4.9(a) and 4.4.2. Thus, by Lemma 5.3.4, det(A) = 0 = det(A^t).

Now suppose A is invertible. Then we can write A as a product A = E1 E2 · · · Ek of elementary matrices by Lemma 4.4.3. Note that det(E) = det(E^t) for any elementary matrix E (Exercise 5.4.1). Thus,

det(A^t) = det(E_k^t · · · E_2^t E_1^t)
= det(E_k^t) · · · det(E_2^t) det(E_1^t)
= det(E_1) det(E_2) · · · det(E_k)
= det(A).

Lemma 5.4.3. If A is a triangular matrix, then

det(A) = a1,1a2,2 · · · an,n

is the product of the elements on the diagonal of A.

Proof. The proof of the case where A is upper triangular is left as Exercise 5.4.2. The lower triangular case follows from Lemma 5.4.2.

Exercises.

5.4.1. Verify directly (i.e. without using Lemma 5.4.2) that det(E) = det(E^t) for any elementary matrix E.

5.4.2. Prove Lemma 5.4.3 in the case that A is upper triangular. Hint: Use (5.1) and induction on n.


5.4.3. Prove that the determinant can be computed by cofactor expansion along any row or column. More precisely, if A ∈ Mn(F) with n ≥ 2, prove that

det A = ∑_{i=1}^{n} (−1)^{i+j} det(A_{i,j}) · a_{i,j}    for all 1 ≤ j ≤ n,

and that

det A = ∑_{j=1}^{n} (−1)^{i+j} det(A_{i,j}) · a_{i,j}    for all 1 ≤ i ≤ n.

5.4.4 ([Tre, Ex. 3.3.9]). Let A be a square matrix. Show that the block triangular matrices

[I M; 0 A],    [A M; 0 I],    [I 0; M A],    [A 0; M I]

all have determinant equal to det(A). Here M is an arbitrary matrix.

5.4.5 ([Tre, Ex. 3.3.10]). Use Exercise 5.4.4 to show that, if A and C are square matrices, then

det [A B; 0 C] = det(A) det(C).

Here B is an arbitrary matrix. Hint:

[A B; 0 C] = [I B; 0 C] [A 0; 0 I].

5.4.6 ([Tre, Ex. 3.3.11]). Let A be an m × n matrix, and let B be an n × m matrix. Prove that

det [0 A; −B I] = det(AB).

Hint: While it is possible to transform the matrix by row operations to a form where the determinant is easy to compute, the easiest way is to multiply on the right by

[I 0; B I].

5.4.7 ([Tre, Ex. 3.5.5]). Let Dₙ be the determinant of the n × n matrix

[1 −1         ]
[1  1 −1      ]
[   1  ⋱  ⋱   ]
[      ⋱ 1 −1 ]
[        1  1 ]

(1 on the diagonal and subdiagonal, −1 on the superdiagonal, 0 elsewhere). Using cofactor expansion, show that Dₙ = Dₙ₋₁ + Dₙ₋₂. This (together with a computation of D₁ and D₂) implies that Dₙ is the n-th Fibonacci number.
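Although the notes do not use software, the claim is easy to spot-check numerically. The following Python sketch (an illustration, not part of the exercise) builds the matrix above and computes its determinant by cofactor expansion along the first row:

```python
def tridiag(n):
    # n x n matrix: 1 on the diagonal and subdiagonal, -1 on the superdiagonal
    return [[1 if i == j or i == j + 1 else (-1 if j == i + 1 else 0)
             for j in range(n)] for i in range(n)]

def det(M):
    # cofactor expansion along the first row (fine for small n)
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

D = [det(tridiag(n)) for n in range(1, 9)]
print(D)  # [1, 2, 3, 5, 8, 13, 21, 34] -- Fibonacci numbers
for n in range(2, 8):
    assert D[n] == D[n - 1] + D[n - 2]
```

The recurrence Dₙ = Dₙ₋₁ + Dₙ₋₂ holds for every computed value, matching the exercise.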


Chapter 6

Inner product spaces

In this chapter we look at vector spaces with additional structure. That additional structure is the notion of an inner product, which can be thought of as a generalization of the dot product (for Rⁿ) to arbitrary vector spaces. In particular, when one has an inner product, one can speak of the length of a vector. The material in this chapter roughly corresponds to [Tre, Ch. 5].

6.1 Definitions

Definition 6.1.1 (Inner product space). A (real) inner product space is a vector space V over R together with a map

〈·, ·〉 : V × V → R,  (x, y) ↦ 〈x, y〉

satisfying

(a) 〈v, v〉 ≥ 0 and 〈v, v〉 = 0 ⇐⇒ v = 0 for all v ∈ V ,

(b) 〈v, w〉 = 〈w, v〉 for all v, w ∈ V , and

(c) 〈c1v1 + c2v2, w〉 = c1〈v1, w〉+ c2〈v2, w〉 for all v1, v2, w ∈ V and c1, c2 ∈ R.

The real number 〈v, w〉 is called the inner product of v and w.

Remark 6.1.2. Properties (b) and (c) imply that

〈v, c1w1 + c2w2〉 = c1〈v, w1〉+ c2〈v, w2〉

for all v, w1, w2 ∈ V and c1, c2 ∈ R.

Example 6.1.3. On Rⁿ,

〈x, y〉 := x · y  (dot product)
        = xᵗy   (matrix product)

is an inner product, called the canonical inner product on Rⁿ.


Example 6.1.4. On C[a, b] := {f : [a, b] → R | f is continuous},

〈f, g〉 := ∫ₐᵇ f(t)g(t) dt

is an inner product. Property (a) is the only difficult property to check. It relies on the fact that if h is a continuous real-valued function on [a, b] and h(t) ≥ 0 for all t ∈ [a, b], then

∫ₐᵇ h(t) dt = 0 ⇐⇒ h(t) = 0 ∀ t ∈ [a, b].

Definition 6.1.5 (Norm). If (V, 〈·, ·〉) is an inner product space, then the norm of v ∈ V is defined to be

‖v‖ := √〈v, v〉.

Theorem 6.1.6. If V is an inner product space, then

(a) ‖v‖ = 0 ⇐⇒ v = 0 for all v ∈ V,

(b) ‖cv‖ = |c| ‖v‖ for all c ∈ R and v ∈ V,

(c) 〈v, w〉 = (1/4)(‖v + w‖² − ‖v − w‖²) for all v, w ∈ V (polarization), and

(d) ‖v + w‖² + ‖v − w‖² = 2‖v‖² + 2‖w‖² for all v, w ∈ V (parallelogram law).

[Figure: parallelogram with sides v and w, illustrating the parallelogram law]

Proof. The proof of these statements is left as Exercise 6.1.1.

Theorem 6.1.7 (Cauchy–Schwarz Inequality). If (V, 〈·, ·〉) is an inner product space, then

|〈v, w〉| ≤ ‖v‖ ‖w‖ ∀ v, w ∈ V,

and equality holds above if and only if {v, w} is linearly dependent.

Proof. Let

p(t) = 〈v + tw, v + tw〉, t ∈ R.

Then p(t) ≥ 0 for all t ∈ R. Now

p(t) = 〈v, v〉 + t〈v, w〉 + t〈w, v〉 + t²〈w, w〉 = ‖v‖² + 2t〈v, w〉 + t²‖w‖²,

and so p is a quadratic polynomial. Since p(t) ≥ 0 for all t ∈ R, it can have at most one root. Recall that if a polynomial p(t) = at² + bt + c has at most one root, then b² − 4ac ≤ 0 (with equality if and only if p has exactly one root). Thus

b² − 4ac = 4〈v, w〉² − 4‖v‖²‖w‖² ≤ 0 =⇒ |〈v, w〉| ≤ ‖v‖ ‖w‖,

and equality holds if and only if there exists a unique t₀ ∈ R such that

0 = p(t₀) = 〈v + t₀w, v + t₀w〉,

which implies that v + t₀w = 0, and so {v, w} is linearly dependent.

Example 6.1.8. In the setting of Example 6.1.4, we have

|∫ₐᵇ f(t)g(t) dt| ≤ √(∫ₐᵇ f(t)² dt · ∫ₐᵇ g(t)² dt).

This is not an obvious fact at all. (Try proving it directly using methods from calculus—it's hard!)
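While a direct calculus proof is hard, a numerical spot check is easy. Here is a short Python sketch (the functions f and g are arbitrary choices of mine, not from the notes) that approximates the integrals with a midpoint rule:

```python
import math

def inner(f, g, a=0.0, b=1.0, n=20_000):
    # midpoint-rule approximation of the C[a,b] inner product <f, g>
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

f = math.exp   # f(t) = e^t
g = math.sin   # g(t) = sin t

lhs = abs(inner(f, g))                       # |<f, g>|
rhs = math.sqrt(inner(f, f) * inner(g, g))   # ||f|| ||g||
print(lhs, rhs)
assert lhs <= rhs   # Cauchy-Schwarz holds
```

For these two functions the left side is about 0.909 and the right side about 0.933, consistent with the inequality.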

Corollary 6.1.9 (Triangle inequality). If V is an inner product space, then for all v, w ∈ V,

‖v + w‖ ≤ ‖v‖ + ‖w‖.

This is called the triangle inequality.

Proof. We have

‖v + w‖² = 〈v + w, v + w〉 = ‖v‖² + 2〈v, w〉 + ‖w‖² ≤ ‖v‖² + 2‖v‖ ‖w‖ + ‖w‖² = (‖v‖ + ‖w‖)²,

where the inequality follows from the Cauchy–Schwarz Inequality. Taking square roots gives the triangle inequality.

Exercises.

6.1.1. Prove Theorem 6.1.6. Hint: Replace the norms by inner products. For the last two parts, expand both sides of each equation.

6.1.2 ([Ber14, 5.1.1]). Show that if v and w are vectors in an inner product space such that

‖v‖² = ‖w‖² = 〈v, w〉,

then v = w.

6.1.3. For vectors u, v in an inner product space, show that

〈u, v〉 = 0 ⇐⇒ ‖u + v‖² = ‖u‖² + ‖v‖².

6.1.4 ([Ber14, 5.1.3]). For nonzero vectors x, y in an inner product space, prove that ‖x + y‖ = ‖x‖ + ‖y‖ if and only if y = cx for some c > 0. Hint: Inspect the proof of Corollary 6.1.9, then apply the second assertion of Theorem 6.1.7.


6.1.5 ([Ber14, Ex. 5.1.6]). Suppose that r₁, . . . , rₙ are positive real numbers. Prove that

〈x, y〉 := ∑ᵢ₌₁ⁿ rᵢaᵢbᵢ,  for x = (a₁, . . . , aₙ), y = (b₁, . . . , bₙ),

defines an inner product on Rⁿ (different from the canonical inner product, unless r₁ = r₂ = · · · = rₙ = 1).
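As a quick sanity check of this exercise (not part of the original text), the following Python sketch tests symmetry and positivity of the weighted sum on random vectors, for one arbitrary choice of weights rᵢ:

```python
import random

def weighted_inner(x, y, r):
    # <x, y> = sum_i r_i * a_i * b_i, with all r_i > 0
    return sum(ri * ai * bi for ri, ai, bi in zip(r, x, y))

r = [1.0, 2.0, 3.0]   # arbitrary positive weights
random.seed(0)
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(3)]
    y = [random.uniform(-1, 1) for _ in range(3)]
    # symmetry and positivity on random samples
    assert abs(weighted_inner(x, y, r) - weighted_inner(y, x, r)) < 1e-12
    assert weighted_inner(x, x, r) >= 0
```

Of course a proof still requires checking the axioms for all vectors; the sketch only illustrates them.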

6.1.6 ([Ber14, Ex. 5.1.7]). In the inner product space of Example 6.1.4, prove that if f and g are continuously differentiable (that is, the derivatives f′ and g′ exist and are continuous), then

〈f, g′〉 + 〈f′, g〉 = f(b)g(b) − f(a)g(a).

Hint: Integration by parts.

6.1.7 ([Ber14, Ex. 5.1.9]). Suppose V is an inner product space and T : V → V is a linear map. Prove that

〈Tu, Tv〉 = (1/4)(‖T(u + v)‖² − ‖T(u − v)‖²)

for all vectors u and v. Hint: Use Theorem 6.1.6.

6.2 Orthogonality

Definition 6.2.1 (Orthogonal and orthonormal). If V is an inner product space and v, w ∈ V, we say v and w are orthogonal, and write v ⊥ w, if 〈v, w〉 = 0. We say a set {v₁, . . . , vₙ} of vectors in V is orthogonal if

〈vᵢ, vⱼ〉 = 0 for i ≠ j, and 〈vᵢ, vᵢ〉 = ‖vᵢ‖² ≠ 0.

In other words, the vectors are nonzero and pairwise orthogonal. We say the set {u₁, . . . , uₘ} is orthonormal if it is orthogonal and ‖uᵢ‖² = 1 for all i = 1, . . . , m (i.e. each uᵢ is a unit vector).

Theorem 6.2.2. If {v1, . . . , vn} is orthogonal, then it is linearly independent.

Proof. Suppose

c₁v₁ + · · · + cₙvₙ = 0

for some scalars c₁, . . . , cₙ. Then for each j = 1, . . . , n,

0 = 〈0, vⱼ〉 = 〈∑ᵢ₌₁ⁿ cᵢvᵢ, vⱼ〉 = ∑ᵢ₌₁ⁿ cᵢ〈vᵢ, vⱼ〉 = cⱼ‖vⱼ‖².

Since ‖vⱼ‖² ≠ 0, this implies cⱼ = 0. Since this is true for all j = 1, . . . , n, we see that the set is linearly independent.


Definition 6.2.3 (Projection). Suppose V is an inner product space. For v ∈ V, v ≠ 0, define the projection map projᵥ : V → V by

projᵥ u = (〈u, v〉/‖v‖²) v.

Note that

projᵥ u = 〈u, v/‖v‖〉 (v/‖v‖).

[Figure: the projection projᵥ(u) of u onto the line through the unit vector v/‖v‖]
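The projection formula is easy to compute with. Here is a small Python sketch (the vectors are arbitrary choices for illustration) using the dot product on R³:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(u, v):
    # proj_v(u) = (<u, v> / ||v||^2) v
    c = dot(u, v) / dot(v, v)
    return [c * vi for vi in v]

u = [3.0, 4.0, 0.0]
v = [1.0, 0.0, 0.0]
p = proj(u, v)
print(p)  # [3.0, 0.0, 0.0]
# the residual u - proj_v(u) is orthogonal to v
assert abs(dot([ui - pi for ui, pi in zip(u, p)], v)) < 1e-12
```

The orthogonality of the residual is exactly what makes the Gram-Schmidt algorithm below work.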

Theorem 6.2.4. If U is a finite-dimensional subspace of an inner product space V, then U has an orthogonal basis (hence an orthonormal basis as well).

The proof of this theorem involves a process called the Gram-Schmidt algorithm.

Proof. Let {v₁, . . . , vₖ} be any basis of U. We produce an orthogonal basis {w₁, . . . , wₖ} of U as follows:

w₁ = v₁,
w₂ = v₂ − projw₁ v₂,
w₃ = v₃ − projw₁ v₃ − projw₂ v₃,
  ⋮
wₖ = vₖ − ∑ₗ₌₁ᵏ⁻¹ projwₗ vₖ.

We claim that {w₁, . . . , wₖ} is orthogonal. It then follows from Theorem 6.2.2 that it is linearly independent, and hence a basis of U (since U has dimension k).

We prove the claim by induction. For 1 ≤ n ≤ k, let P(n) be the assertion that

(a) Span{w₁, . . . , wₙ} = Span{v₁, . . . , vₙ}, and

(b) {w₁, . . . , wₙ} is orthogonal.

We know that P(1) is true because Span{w₁} = Span{v₁} (since w₁ = v₁) and {w₁} = {v₁} is orthogonal since w₁ = v₁ ≠ 0.

Now we show that, for 1 ≤ n < k, P(n) =⇒ P(n + 1). So assume P(n) is true. In other words, Span{w₁, . . . , wₙ} = Span{v₁, . . . , vₙ}, and {w₁, . . . , wₙ} is orthogonal. Then, for 1 ≤ i ≤ n, we have

〈wₙ₊₁, wᵢ〉 = 〈vₙ₊₁, wᵢ〉 − 〈∑ₗ₌₁ⁿ (〈vₙ₊₁, wₗ〉/‖wₗ‖²) wₗ, wᵢ〉
           = 〈vₙ₊₁, wᵢ〉 − ∑ₗ₌₁ⁿ (〈vₙ₊₁, wₗ〉/‖wₗ‖²) 〈wₗ, wᵢ〉
           = 〈vₙ₊₁, wᵢ〉 − (〈vₙ₊₁, wᵢ〉/‖wᵢ‖²) 〈wᵢ, wᵢ〉
           = 0.

Moreover, if wₙ₊₁ = 0, then

vₙ₊₁ = ∑ₗ₌₁ⁿ (〈vₙ₊₁, wₗ〉/‖wₗ‖²) wₗ ∈ Span{w₁, . . . , wₙ},

and so, by the induction hypothesis P(n)(a), vₙ₊₁ ∈ Span{v₁, . . . , vₙ}. But this is impossible since {v₁, . . . , vₙ₊₁} is linearly independent. Thus wₙ₊₁ ≠ 0 and so {w₁, . . . , wₙ₊₁} is orthogonal. So P(n + 1)(b) holds.

Now, since

wₙ₊₁ ∈ Span{w₁, . . . , wₙ, vₙ₊₁} = Span{v₁, . . . , vₙ₊₁}

(the equality holding by P(n)(a)), we know

Span{w₁, . . . , wₙ₊₁} ⊆ Span{v₁, . . . , vₙ₊₁}.

Moreover,

vₙ₊₁ = wₙ₊₁ + ∑ₗ₌₁ⁿ projwₗ vₙ₊₁ ∈ Span{w₁, . . . , wₙ₊₁},

so that

Span{v₁, . . . , vₙ₊₁} ⊆ Span{w₁, . . . , wₙ₊₁}.

Thus P(n + 1)(a) also holds. This completes the proof by induction.

We can then form an orthonormal basis

{w₁/‖w₁‖, . . . , wₖ/‖wₖ‖}.
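The proof above is constructive, so it translates directly into code. As an illustration (not part of the original notes), here is a short Python sketch of the Gram-Schmidt procedure for vectors in Rⁿ with the dot product:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vs):
    # turn a linearly independent list vs into an orthogonal list,
    # exactly as in the proof: w_k = v_k - sum_l proj_{w_l}(v_k)
    ws = []
    for v in vs:
        w = list(v)
        for u in ws:
            c = dot(v, u) / dot(u, u)   # coefficient of proj_u(v)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        ws.append(w)
    return ws

ws = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
# the output is pairwise orthogonal
assert all(abs(dot(ws[i], ws[j])) < 1e-12
           for i in range(3) for j in range(3) if i != j)
```

Dividing each wᵢ by its norm then gives an orthonormal basis, as in the last step of the proof.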

Example 6.2.5. Take V = P₂(R) with the inner product

〈p, q〉 = ∫₋₁¹ p(t)q(t) dt,  p, q ∈ P₂(R).

Note that, after restricting the domain of the polynomial functions to [−1, 1], P₂(R) is a subspace of the inner product space C[−1, 1] (since polynomials are continuous) and the above is the restriction of the inner product on C[−1, 1] (see Example 6.1.4) to P₂(R).

Let {1, t, t²} be the standard basis of P₂(R). We'll find an orthonormal basis using the Gram-Schmidt algorithm. We have

w₁ = 1,
w₂ = t − proj₁ t = t − (〈t, 1〉/‖1‖²) · 1.

Now,

〈t, 1〉 = ∫₋₁¹ t · 1 dt = (1/2)t²|₋₁¹ = 1/2 − 1/2 = 0.

Therefore,

w₂ = t − 0 = t,
w₃ = t² − (〈t², 1〉/‖1‖²) 1 − (〈t², t〉/‖t‖²) t.

Since

〈t², 1〉 = ∫₋₁¹ t² dt = (1/3)t³|₋₁¹ = 1/3 − (−1/3) = 2/3,
‖1‖² = 〈1, 1〉 = ∫₋₁¹ 1 · 1 dt = 2,
〈t², t〉 = ∫₋₁¹ t³ dt = (1/4)t⁴|₋₁¹ = 0.

Thus,

w₃ = t² − (2/3) · (1/2) · 1 = t² − 1/3.

Hence,

{1, t, t² − 1/3}

is an orthogonal basis of P₂(R).

Now, if we wanted an orthonormal basis of P₂(R), we compute

‖1‖ = √2,  ‖t‖² = ∫₋₁¹ t² dt = 2/3 =⇒ ‖t‖ = √6/3,

and

‖w₃‖² = ∫₋₁¹ (t² − 1/3)² dt
      = ∫₋₁¹ (t⁴ − (2/3)t² + 1/9) dt
      = ((1/5)t⁵ − (2/9)t³ + (1/9)t)|₋₁¹
      = 2/5 − 4/9 + 2/9
      = 8/45.

Let

u₁ = w₁/‖w₁‖ = √2/2,  u₂ = w₂/‖w₂‖ = (√6/2)t,  u₃ = w₃/‖w₃‖ = (3√10/4)(t² − 1/3).

Then 〈uᵢ, uⱼ〉 = δᵢⱼ.
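These are (up to normalization) the first three Legendre polynomials. The claim 〈uᵢ, uⱼ〉 = δᵢⱼ can be spot-checked numerically; the sketch below (an illustration, not part of the notes) approximates the integrals over [−1, 1] with a midpoint rule:

```python
import math

def inner(p, q, n=20_000):
    # midpoint approximation of the integral of p(t) q(t) over [-1, 1]
    h = 2.0 / n
    return sum(p(-1 + (k + 0.5) * h) * q(-1 + (k + 0.5) * h) for k in range(n)) * h

u1 = lambda t: math.sqrt(2) / 2
u2 = lambda t: math.sqrt(6) / 2 * t
u3 = lambda t: 3 * math.sqrt(10) / 4 * (t * t - 1 / 3)

us = [u1, u2, u3]
for i in range(3):
    for j in range(3):
        expected = 1.0 if i == j else 0.0   # <u_i, u_j> should be delta_ij
        assert abs(inner(us[i], us[j]) - expected) < 1e-6
```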

Orthogonal bases are practical, since if {u₁, . . . , uₖ} is an orthogonal basis for U, and v ∈ U, then

v = (〈v, u₁〉/‖u₁‖²) u₁ + (〈v, u₂〉/‖u₂‖²) u₂ + · · · + (〈v, uₖ〉/‖uₖ‖²) uₖ.

In other words,

v = ∑ᵢ₌₁ᵏ cᵢuᵢ,  where cᵢ = 〈v, uᵢ〉/‖uᵢ‖²

are the Fourier coefficients of v with respect to u₁, . . . , uₖ. This is convenient, since it saves us time: we don't need to solve a linear system for the coefficients c₁, . . . , cₖ as we normally would. Computing Fourier coefficients is usually less work (once the orthogonal basis is known).
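To illustrate the saving (with an arbitrary orthogonal basis of R³ chosen for this example, not taken from the notes), the coefficients are just quotients of dot products, with no linear system to solve:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# an orthogonal (not orthonormal) basis of R^3
u1, u2, u3 = [1.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 2.0]
v = [3.0, 5.0, -4.0]

# Fourier coefficients c_i = <v, u_i> / ||u_i||^2
coeffs = [dot(v, u) / dot(u, u) for u in (u1, u2, u3)]
print(coeffs)  # [4.0, -1.0, -2.0]

# the coefficients reconstruct v exactly
recon = [sum(c * ui[k] for c, ui in zip(coeffs, (u1, u2, u3))) for k in range(3)]
assert recon == v
```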

Definition 6.2.6 (Orthogonal matrix). A matrix A ∈ Mₙ(R) is called orthogonal if AᵗA = I.

Lemma 6.2.7. A matrix A ∈ Mₙ(R) is orthogonal if and only if its columns form an orthonormal basis of Rⁿ (with the dot product as the inner product).

Proof. Let A = [v₁ · · · vₙ] (i.e. vᵢ is the i-th column of A). Then

AᵗA = ⎡v₁ᵗ⎤
      ⎢ ⋮ ⎥ [v₁ · · · vₙ].
      ⎣vₙᵗ⎦

So the (i, j) entry of AᵗA is vᵢᵗvⱼ. Thus

AᵗA = Iₙ ⇐⇒ vᵢᵗvⱼ = δᵢⱼ ⇐⇒ {v₁, . . . , vₙ} is orthonormal.

Since Rⁿ is n-dimensional and orthonormal sets are linearly independent, the result follows.
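For a concrete instance (a 2 × 2 rotation matrix, chosen here purely for illustration), one can verify AᵗA = I directly:

```python
import math

t = 0.7
A = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]   # rotation matrix: its columns are orthonormal

# (i, j) entry of A^t A is the dot product of columns i and j
AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
for i in range(2):
    for j in range(2):
        assert abs(AtA[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```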

Definition 6.2.8 (Orthogonal complement). Let U be a subset of an inner product space V. Then we define

U⊥ := {v ∈ V | 〈v, u〉 = 0 ∀ u ∈ U}.

If U is a subspace (as opposed to just a subset), then U⊥ is called the orthogonal complement to U. The notation U⊥ is read "U perp".

Theorem 6.2.9. Suppose V is an inner product space and U is a finite-dimensional subspace of V. Then

(a) U⊥ is also a subspace of V, and

(b) V = U ⊕ U⊥.

The proof will use the important idea of “duality” in inner product spaces.

Lemma 6.2.10. Suppose V is an inner product space. Define ϕ : V → V ∗ by

ϕ(v)(w) = 〈v, w〉.

If V is finite dimensional, then ϕ is an isomorphism.


Proof. Suppose c₁, c₂ ∈ R and v₁, v₂ ∈ V. Then for all w ∈ V,

ϕ(c₁v₁ + c₂v₂)(w) = 〈c₁v₁ + c₂v₂, w〉
                  = c₁〈v₁, w〉 + c₂〈v₂, w〉
                  = c₁ϕ(v₁)(w) + c₂ϕ(v₂)(w).

Thus

ϕ(c₁v₁ + c₂v₂) = c₁ϕ(v₁) + c₂ϕ(v₂),

and so ϕ is linear. Since dim V = dim V∗, it suffices to show that ϕ is injective. Suppose ϕ(v) = 0 for some v ∈ V. Then, in particular,

0 = ϕ(v)(v) = 〈v, v〉 = ‖v‖²,

and so v = 0. Hence ϕ is injective and therefore an isomorphism.

Corollary 6.2.11 (Riesz Representation Theorem). If V is a finite-dimensional inner product space, then for all f ∈ V∗, there exists a unique v ∈ V such that

f(w) = 〈v, w〉 ∀ w ∈ V.

In other words, all linear forms are given by taking the inner product with some fixed vector of V.

Proof. This follows from the fact that ϕ in Lemma 6.2.10 is bijective.

Proof of Theorem 6.2.9. The proof of part (a) is left as Exercise 6.2.1. We will prove part (b). So we need to show that U ∩ U⊥ = {0} and U + U⊥ = V. Suppose v ∈ U ∩ U⊥. Then

〈v, u〉 = 0 ∀ u ∈ U,

since v ∈ U⊥. But since we also have v ∈ U, we have 〈v, v〉 = 0. Thus v = 0. So U ∩ U⊥ = {0}.

Now we show U + U⊥ = V. Let v ∈ V and define

f : U → R,  f(u) = 〈v, u〉 ∀ u ∈ U.

Then f ∈ U∗. Now, U is a finite-dimensional inner product space (we simply restrict the inner product on V to U). Therefore, by the Riesz Representation Theorem for U, there exists a ū ∈ U such that

f(u) = 〈ū, u〉 ∀ u ∈ U.

Therefore,

〈ū, u〉 = 〈v, u〉 ∀ u ∈ U =⇒ 〈v − ū, u〉 = 0 ∀ u ∈ U =⇒ v − ū ∈ U⊥.

So

v = ū + (v − ū) ∈ U + U⊥.

Since v was arbitrary, we have V = U + U⊥.


Corollary 6.2.12. If U is a subspace of a finite-dimensional inner product space V , then

dimU⊥ = dimV − dimU.

Corollary 6.2.13. Recall that if U is a finite-dimensional subspace of an inner product space V, then V = U ⊕ U⊥ (Theorem 6.2.9(b)). Define

projU : V → V,  projU(u + x) = u,  u ∈ U, x ∈ U⊥.

This map has the following properties:

(a) Im projU = U ,

(b) Ker projU = U⊥,

(c) 〈v − projU v, u〉 = 0 for all v ∈ V and u ∈ U,

(d) ‖v − projU v‖ ≤ ‖v − u‖ for all u ∈ U, and equality holds if and only if u = projU v (equivalently, if and only if v − u ∈ U⊥).

Proof. Statements (a) and (b) are easy. If v = u + x = projU v + x (for u ∈ U, x ∈ U⊥), then v − projU v = x ∈ U⊥, so (c) holds.

It remains to show (d). Let ū = projU v, so that v = ū + x, with x ∈ U⊥. Now let u be any vector in U. Then

‖v − u‖² = ‖v − ū + ū − u‖²
         = ‖x + (ū − u)‖²
         = ‖x‖² + 2〈x, ū − u〉 + ‖ū − u‖²
         = ‖x‖² + ‖ū − u‖².

Thus

‖v − u‖² = ‖v − ū‖² + ‖ū − u‖²,

and so

‖v − u‖² ≥ ‖v − ū‖².

Moreover, equality holds if and only if ‖ū − u‖² = 0, i.e. u = ū = projU v.

The map projU is called orthogonal projection to U. The vector projU v is the "best approximation" to v by a vector in U.

[Figure: a vector v, the subspace U, and the orthogonal projection projU v]


Remark 6.2.14. If {u₁, . . . , uₖ} is any orthogonal basis for U, then an explicit formula for projU is

projU v = (〈v, u₁〉/‖u₁‖²) u₁ + (〈v, u₂〉/‖u₂‖²) u₂ + · · · + (〈v, uₖ〉/‖uₖ‖²) uₖ.

Indeed, if v′ denotes the formula on the right side, then v′ ∈ U, and a short computation shows that v − v′ ∈ U⊥. By uniqueness of the decomposition v = u + x, u ∈ U, x ∈ U⊥, we conclude that v′ = projU v.

Corollary 6.2.15. If U is a subspace of a finite-dimensional inner product space V, then (U⊥)⊥ = U.

Proof. If u ∈ U, then 〈u, v〉 = 0 for all v ∈ U⊥. Thus, by definition, u ∈ (U⊥)⊥. So U ⊆ (U⊥)⊥. Moreover,

dim (U⊥)⊥ = dim V − dim U⊥ = dim V − (dim V − dim U) = dim U.

Thus U = (U⊥)⊥.

Corollary 6.2.16. The isomorphism ϕ : V = U ⊕ U⊥ → V∗ satisfies ϕ(U⊥) = U⁰.

Proof. We have

ϕ(v) ∈ U⁰ ⇐⇒ ϕ(v)(u) = 0 ∀ u ∈ U ⇐⇒ 〈v, u〉 = 0 ∀ u ∈ U ⇐⇒ v ∈ U⊥.

Exercises.

6.2.1. Prove part (a) of Theorem 6.2.9.

6.2.2. Prove that vectors u, v in an inner product space are orthogonal if and only if ‖u + v‖ = ‖u − v‖.

6.2.3. For vectors u, v in an inner product space, prove that ‖u‖ = ‖v‖ if and only if u + v and u − v are orthogonal.

6.2.4 ([Ber14, 5.1.2]). (Bessel's inequality) In an inner product space, if x₁, . . . , xₙ are pairwise orthogonal unit vectors (that is, ‖xᵢ‖ = 1 for all i, and xᵢ ⊥ xⱼ when i ≠ j), then

∑ᵢ₌₁ⁿ |〈x, xᵢ〉|² ≤ ‖x‖².

Hint: Define cᵢ = 〈x, xᵢ〉, y = c₁x₁ + · · · + cₙxₙ, and z = x − y. Show that y ⊥ z and apply Exercise 6.1.3 to x = y + z to conclude that ‖x‖² ≥ ‖y‖².
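A numerical instance can make the inequality concrete (the vectors below are arbitrary choices for illustration, using an incomplete orthonormal set in R³):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# two orthonormal vectors in R^3 (an incomplete orthonormal set)
x1 = [1.0, 0.0, 0.0]
x2 = [0.0, 1 / math.sqrt(2), 1 / math.sqrt(2)]

x = [3.0, 1.0, -2.0]
lhs = dot(x, x1) ** 2 + dot(x, x2) ** 2   # sum of squared Fourier coefficients
rhs = dot(x, x)                            # ||x||^2
print(lhs, rhs)  # lhs is about 9.5, rhs is 14.0
assert lhs <= rhs
```

The inequality is strict here precisely because {x₁, x₂} does not span R³.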


6.2.5 ([Ber14, Ex. 5.1.5]). Suppose that x₁, . . . , xₙ are vectors in an inner product space that are pairwise orthogonal, that is, 〈xᵢ, xⱼ〉 = 0 for i ≠ j.

(a) Prove (by induction) that ‖∑ᵢ₌₁ⁿ xᵢ‖² = ∑ᵢ₌₁ⁿ ‖xᵢ‖².

(b) Deduce an alternative proof of Theorem 6.2.2.

6.2.6 ([Ber14, Ex. 5.2.2]). Let X be a set and let V = F(X, R) (see Example 1.2.5). Let W ⊆ V be the set of all functions f ∈ V whose support

{x ∈ X | f(x) ≠ 0}

is a finite subset of X. Prove the following statements:

(a) W is a subspace of V.

(b) The formula

〈f, g〉 = ∑_{x∈X} f(x)g(x)

defines an inner product on W. (Note that, even though the set X may be infinite, the sum above has at most finitely many nonzero terms, and so is well defined.)

(c) The formula

ϕ(f) = ∑_{x∈X} f(x)

defines a linear form on W.

(d) There does not exist g ∈ W such that ϕ(f) = 〈f, g〉 for all f ∈ W.

6.2.7 ([Ber14, Ex. 5.2.4]). If V is any inner product space (not necessarily finite dimensional), then U = (U⊥)⊥ for every finite-dimensional linear subspace U. Hint: The proof of Corollary 6.2.15 is not applicable here, since we do not assume that V is finite dimensional. The problem is to show that (U⊥)⊥ ⊆ U. If x ∈ (U⊥)⊥ and x = y + z, with y ∈ U and z ∈ U⊥, as in Theorem 6.2.9, then z = x − y ∈ (U⊥)⊥, so z ⊥ z.

6.2.8 ([Ber14, Ex. 5.2.6]). Suppose U and V are subspaces of a finite-dimensional inner product space. Prove the following:

(a) (U + V )⊥ = U⊥ ∩ V ⊥;

(b) (U ∩ V )⊥ = U⊥ + V ⊥.

Hint : Use Corollary 6.2.15.

6.2.9 ([Ber14, Ex. 5.2.7]). Consider R³ with the canonical inner product. If

V = Span{(1, 1, 1), (1, −1, 1)},

find V⊥.
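One way to check a candidate answer (a computational aside, not part of the exercise): in R³ the cross product of two spanning vectors is orthogonal to both, so for a two-dimensional V it spans V⊥.

```python
def cross(a, b):
    # standard cross product in R^3
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

v1, v2 = [1, 1, 1], [1, -1, 1]
n = cross(v1, v2)
print(n)  # [2, 0, -2], so V-perp is spanned by (1, 0, -1)
assert dot(n, v1) == 0 and dot(n, v2) == 0
```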


6.3 Adjoints

Recall that if V is a finite-dimensional inner product space, then we have an isomorphism

ϕV : V → V∗,  ϕV(v)(w) = 〈v, w〉, ∀ v, w ∈ V. (6.1)

Definition 6.3.1 (Adjoint of a linear map). If T : V → W is a linear map between finite-dimensional inner product spaces, the adjoint of T is the linear map T⋆ : W → V defined by

T⋆ = ϕV⁻¹ ∘ T∗ ∘ ϕW,

where T∗ : W∗ → V∗ is the transpose map of T from Definition 3.7.7.

         T⋆
    V ←−−−−−− W
    ↑          │
 ϕV⁻¹ │        │ ϕW
    │          ↓
    V∗ ←−−−−−− W∗
         T∗

Proposition 6.3.2. If T : V → V is a linear map on a finite-dimensional inner product space, then

〈Tv, w〉 = 〈v, T⋆w〉 ∀ v, w ∈ V. (6.2)

Proof. Let ϕ = ϕV : V → V∗ be the isomorphism of (6.1). Then T⋆ = ϕ⁻¹ ∘ T∗ ∘ ϕ and so ϕ ∘ T⋆ = T∗ ∘ ϕ. Now, for all v, w ∈ V, we have

(ϕ ∘ T⋆)(w)(v) = (ϕ(T⋆w))(v) = 〈T⋆w, v〉 = 〈v, T⋆w〉,

while

(T∗ ∘ ϕ)(w)(v) = T∗(ϕ(w))(v) = ϕ(w)(Tv) = 〈w, Tv〉 = 〈Tv, w〉.

Thus

〈Tv, w〉 = 〈v, T⋆w〉 ∀ v, w ∈ V.

Proposition 6.3.3. If V is a finite-dimensional inner product space, then property (6.2) characterizes the adjoint. In other words, if S : V → V satisfies

〈Tv, w〉 = 〈v, Sw〉 ∀ v, w ∈ V,

then S = T⋆.

Proof. We have

〈v, Sw〉 = 〈v, T⋆w〉 ∀ v, w ∈ V ⇐⇒ 〈v, (S − T⋆)w〉 = 0 ∀ v, w ∈ V
⇐⇒ (S − T⋆)w = 0 ∀ w ∈ V ⇐⇒ S = T⋆.

Remark 6.3.4. Often in the literature, the same symbol is used for the transpose and the adjoint. This is because the isomorphism ϕ : V → V∗ is so 'natural' for an inner product space that we 'erase' it in the equation ϕ ∘ T⋆ = T∗ ∘ ϕ. Moreover, as we will see next, in the classic example of Rⁿ with the dot product, the adjoint corresponds to the transpose of a matrix.


Theorem 6.3.5. Fix A ∈ Mₙ(R) (i.e. A is an n × n matrix with real entries). Suppose T : Rⁿ → Rⁿ is defined by T(v) = Av for all v ∈ Rⁿ. Then, if Rⁿ is given the dot product, we have

T⋆(v) = Aᵗv ∀ v ∈ Rⁿ.

Proof. Let v, w ∈ Rⁿ. Then

〈Tv, w〉 = (Av) · w   (dot product)
        = (Av)ᵗw    (matrix product)
        = (vᵗAᵗ)w
        = vᵗAᵗw
        = v · (Aᵗw)
        = 〈v, Aᵗw〉.

Therefore, by Proposition 6.3.3, T⋆(v) = Aᵗv for all v ∈ Rⁿ.
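The identity 〈Av, w〉 = 〈v, Aᵗw〉 can be checked directly on random data; the Python sketch below (an illustration, not part of the notes) does so for one random 3 × 3 matrix:

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

random.seed(1)
A = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
for _ in range(50):
    v = [random.uniform(-1, 1) for _ in range(3)]
    w = [random.uniform(-1, 1) for _ in range(3)]
    # <Av, w> = <v, A^t w>
    assert abs(dot(matvec(A, v), w) - dot(v, matvec(transpose(A), w))) < 1e-12
```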

Definition 6.3.6 (Orthogonal transformation). An orthogonal transformation is a linear map T : V → V on an inner product space V that preserves the inner product:

〈u, v〉 = 〈Tu, Tv〉 for all u, v ∈ V.

Corollary 6.3.7. A linear map T : Rn → Rn is an orthogonal transformation if and only ifits matrix, relative to the canonical basis of Rn, is an orthogonal matrix.

Proof. Let A be the matrix of T relative to the canonical basis of Rⁿ. Then Tv = Av for all v ∈ Rⁿ. We have

T is orthogonal ⇐⇒ 〈Tu, Tv〉 = 〈u, v〉 for all u, v ∈ V
⇐⇒ 〈u, T⋆Tv〉 = 〈u, v〉 for all u, v ∈ V
⇐⇒ 〈u, AᵗAv〉 = 〈u, v〉 for all u, v ∈ V
⇐⇒ 〈u, AᵗAv − v〉 = 0 for all u, v ∈ V
⇐⇒ AᵗAv − v = 0 for all v ∈ V   (taking u = eᵢ, i = 1, . . . , n)
⇐⇒ AᵗAv = v for all v ∈ V
⇐⇒ AᵗA = I
⇐⇒ A is orthogonal.

See Exercise 6.3.5 for a generalization of Corollary 6.3.7.

Exercises.

6.3.1 ([Ber14, Ex. 5.3.1]). Consider R³ and R² as inner product spaces with the canonical inner products. Let T : R³ → R² be the linear map defined by

T(a, b, c) = (2a − c, 3b + 4c).


(a) Find the matrix of T⋆ relative to the canonical orthonormal bases.

(b) Find the matrix of T⋆ relative to the basis (1/2)(1, √3), (1/2)(−√3, 1) of R² and the canonical basis e₁, e₂, e₃ of R³.

6.3.2 ([Ber14, Ex. 5.3.3]). If U and V are inner product spaces, and if T : U → V, S : V → U are maps such that

〈Tu, v〉 = 〈u, Sv〉 for all u ∈ U, v ∈ V,

then S and T are linear (and S = T⋆ when U and V are finite dimensional).

6.3.3 ([Ber14, Ex. 5.3.4]). For A, B in the vector space Mₙ(R) of n × n real matrices, define

〈A, B〉 = tr(ABᵗ).

Prove that Mₙ(R) is an inner product space and find the formula for ‖A‖ in terms of its entries aᵢ,ⱼ.

6.3.4 ([Ber14, Ex. 5.3.7]). Suppose U and V are inner product spaces, T : U → V is a linear map, A is a subset of U, and B is a subset of V. Prove the following:

(a) T(A)⊥ = (T⋆)⁻¹(A⊥);

(b) T⋆(B)⊥ = T⁻¹(B⊥);

(c) T(U)⊥ = Ker T⋆, therefore V = T(U) ⊕ Ker T⋆;

(d) T⋆(V)⊥ = Ker T, therefore U = T⋆(V) ⊕ Ker T;

(e) T(A) ⊆ B =⇒ T⋆(B⊥) ⊆ A⊥.

6.3.5. Suppose B is an orthonormal basis of an inner product space V and T : V → V is a linear map. Show that T is an orthogonal transformation if and only if [T]BB is an orthogonal matrix. (This generalizes Corollary 6.3.7.)


Appendix A

A taste of abstract algebra

In this optional appendix, we explore some of the properties that binary operations on sets can have. We also give the precise definition of a field omitted in Section 1.1. The abstract point of view taken here allows one to prove results in a very general setting, and then obtain more specific results as special cases. For example, the fact that additive inverses in fields and additive inverses in vector spaces are both unique follows from a more general statement about inverses in a monoid.

A.1 Operations on sets

Definition A.1.1 (Operation on a set). Suppose E is a set. We say that ⋆ is an operation on E if, for every x, y ∈ E, x ⋆ y is a well-defined element of E. A pair (E, ⋆), where E is a set and ⋆ is an operation on E, is called a set with operation, or magma.

So, loosely speaking, an operation ⋆ on a set E is a "rule" that assigns an element x ⋆ y of E to every pair of elements (x, y) of E. More precisely, ⋆ is a map from E × E = {(x, y) | x, y ∈ E} to E.

Remark A.1.2. The term magma is not particularly well-known, even among mathematicians. We will use it simply because it is shorter than "set with operation".

Examples A.1.3. (a) (Z,+) is a magma.

(b) (Z, −) is a magma. However, subtraction is not an operation on the set N = {0, 1, 2, 3, 4, . . .}, since x − y is not an element of N for all x, y ∈ N. Thus, (N, −) is not a magma.

(c) If "÷" denotes ordinary division of numbers, then ÷ is not an operation on R because x ÷ y is not defined when y = 0. However, if we let R∗ = R \ {0}, then (R∗, ÷) is a magma.

Definition A.1.4. Suppose that (E, ⋆) is a magma.

(a) We say that ⋆ is commutative if x ⋆ y = y ⋆ x for all x, y ∈ E.

(b) We say that ⋆ is associative if (x ⋆ y) ⋆ z = x ⋆ (y ⋆ z) for all x, y, z ∈ E.


Example A.1.5. Consider the set E = {1, 2, 3} and the operation ⋆ defined by the table:

⋆ | 1 2 3
--+------
1 | 3 2 3
2 | 1 1 2
3 | 2 2 2

The table tells us that 1 ⋆ 2 = 2 and 2 ⋆ 1 = 1. Note that ⋆ is indeed an operation on E, because it satisfies the requirement of Definition A.1.1. So (E, ⋆) is a magma. Observe that 1 ⋆ 2 ≠ 2 ⋆ 1, so this operation is not commutative.

Examples A.1.6. (a) In the magma (R, +), the operation is commutative and associative.

(b) In the magma (R, ·), the operation is commutative and associative.

(c) In the magma (R, −), the operation is not commutative and not associative. To see that it is not commutative, we note that, for instance, 3 − 5 ≠ 5 − 3. To prove it is not associative, we note that, for example, (1 − 3) − 5 ≠ 1 − (3 − 5).

Remark A.1.7. To show that a magma is not associative or commutative, you only need to find one counterexample, but to show that a magma is associative or commutative, you have to show that the property is satisfied for all elements—you cannot just give a particular example. For instance, in Example A.1.5, we have 2 ⋆ 3 = 2 = 3 ⋆ 2 but ⋆ is not a commutative operation (as we noted in the example).

Exercises.

A.1.1. Let E = {1, 2, 3} and define ⋆ by the table:

⋆ | 1 2 3
--+------
1 | 3 4 3
2 | 1 1 2
3 | 2 2 2

Is (E, ⋆) a magma? If so, is it commutative and/or associative?

A.2 Use of parentheses

Suppose (E, ⋆) is a magma. In expressions like a ⋆ (b ⋆ (c ⋆ d)), can we omit the parentheses? The answer depends on whether or not ⋆ is associative.

If ⋆ is an associative operation, then we can omit all parentheses. The reason is that it doesn't matter whether by a ⋆ b ⋆ c we mean (a ⋆ b) ⋆ c or a ⋆ (b ⋆ c), since these two quantities are equal. We may also write longer expressions like a ⋆ b ⋆ c ⋆ d without parentheses, because the possible interpretations

((a ⋆ b) ⋆ c) ⋆ d, (a ⋆ (b ⋆ c)) ⋆ d, (a ⋆ b) ⋆ (c ⋆ d), a ⋆ ((b ⋆ c) ⋆ d), a ⋆ (b ⋆ (c ⋆ d))

all give the same result, so there is no ambiguity.

The general rule is that parentheses must be used when the operation is not associative. There is one exception to this rule: with the non-associative operation of subtraction, parentheses can be omitted in certain cases. Namely, people have adopted the convention that in expressions such as a − b − c − d (without parentheses), the leftmost operation is evaluated first. In other words, the convention says that a − b − c − d should always be interpreted as meaning ((a − b) − c) − d. So, if you want to write ((a − b) − c) − d, then you can omit the parentheses, thanks to this convention; but if you want to write one of

(a − (b − c)) − d, (a − b) − (c − d), a − ((b − c) − d), a − (b − (c − d)),

then the convention does not help you and you must use parentheses.

The convention to evaluate the leftmost operation first (in the absence of parentheses) is universally accepted in the case of subtraction, but not for other operations. If you work with a non-associative operation other than subtraction, you should not assume that you can use that convention.

For instance, let ∧ denote the operation of exponentiation on the set A = {1, 2, 3, . . .} of positive integers (i.e., 2 ∧ 3 = 2³ = 8 and 3 ∧ 2 = 3² = 9). Then (A, ∧) is a magma. If you enter the expression 2 ∧ 2 ∧ 3 on your calculator, it will likely give you the answer 64; so the calculator evaluates the leftmost operation first:

2 ∧ 2 ∧ 3 = (2 ∧ 2) ∧ 3 = (2²)³ = 4³ = 64.

However, if you ask a mathematician the same question, she will probably do this:

2 ∧ 2 ∧ 3 = 2^(2³) = 2⁸ = 256,

so the mathematician evaluates the rightmost operation first. For the same reason, a mathematician will always interpret eˣ² as meaning e^(x²), never as (eˣ)².
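Programming languages also have to pick a convention. In Python, for example, the exponentiation operator follows the mathematician's reading, while subtraction follows the calculator's:

```python
# Python's ** is right-associative, matching the mathematician's convention
assert 2 ** 2 ** 3 == 256        # parsed as 2 ** (2 ** 3)
assert (2 ** 2) ** 3 == 64       # the "calculator" reading
# subtraction is evaluated left to right
assert 16 - 8 - 4 - 2 == ((16 - 8) - 4) - 2 == 2
```

Other languages make other choices, which is one more reason to follow the conclusion below.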

Conclusion: Use parentheses whenever there is a possibility of confusion.

Exercises.

A.2.1. Compute the five possible interpretations of 16 ÷ 8 ÷ 4 ÷ 2 (two of these interpretations give the same result, but the others are all different, so you should get four different numbers).


A.3 Identity elements

Definition A.3.1 (Identity element). Suppose that (E, ⋆) is a magma. If e is an element of E satisfying

e ⋆ x = x = x ⋆ e for all x ∈ E,

we call e an identity element of (E, ⋆).

Examples A.3.2. (a) 0 is an identity element of (R,+) because

0 + x = x = x+ 0 for all x ∈ R.

(b) 1 is an identity element of (R, ·) because

1 · x = x = x · 1 for all x ∈ R.

Note that in the above examples, the same set has different identities for different operations. So an identity element depends on the set and the operation.

Example A.3.3. Does (R, −) have an identity element? Suppose e were an identity element. Then we would have

x − e = x for all x ∈ R.

This is satisfied for e = 0, and so one might be tempted to think that 0 is an identity element. But we must also have

e − x = x for all x ∈ R.

For each particular x, the equation e − x = x is only satisfied for e = 2x. But 2x has a different value for each x (and is only equal to zero when x = 0). Therefore, there is no e ∈ R that satisfies e − x = x for all x ∈ R. So (R, −) has no identity element.

We have seen that some magmas do not have an identity element, while others do. Is it possible for a magma to have more than one identity element?

Theorem A.3.4. A given magma can have at most one identity element.

Proof. Suppose that (E, ?) is a magma and that e1, e2 are identity elements of (E, ?). Then

e1 = e1 ? e2 (because e2 is an identity element)

= e2 (because e1 is an identity element).


Exercises.

A.3.1. Verify that the magma (E, ?) in Example A.1.5 does not have any identity element.

A.3.2. In the magma (R∗,÷), is the operation commutative? Associative? Does it have an identity element?

A.3.3. Consider the set E = {1, 2, 3, 4} and the operation ? defined by:

? | 1 2 3 4
--+--------
1 | 1 2 1 4
2 | 2 3 2 4
3 | 1 2 3 4
4 | 4 4 4 4

Does the magma (E, ?) have an identity element? If so, find all the invertible elements of (E, ?).

A.3.4. Consider the pair (R, ?), where x ? y = 2^x 2^y. For example, 3 ? (−1) = 2³ · 2⁻¹ = 4.

(a) Is the pair (R, ?) a magma? If so, does it have an identity element?

(b) If we restrict ? to Z, is the pair (Z, ?) a magma?

A.4 Invertible elements

Definition A.4.1 (Invertible, inverse). Let (E, ?) be a magma and suppose that (E, ?) has an identity element e ∈ E.

An element a ∈ E is invertible if there exists at least one x ∈ E satisfying

a ? x = e = x ? a.

Any such x is then called an inverse of a.

It is important to note that it only makes sense to talk about invertibility and inverses if the magma has an identity element.

Examples A.4.2. (a) In (R,+), is −2 invertible? Does there exist x ∈ R satisfying

(−2) + x = 0 = x+ (−2)?

Yes, x = 2 satisfies this requirement. So −2 is invertible and 2 is an inverse of −2. In fact, in (R,+), all elements are invertible: the inverse of x ∈ R is −x.


(b) In (R, ·), is −2 invertible? Does there exist x ∈ R satisfying

(−2) · x = 1 = x · (−2)?

Yes, x = −1/2 satisfies this requirement. So −2 is invertible and −1/2 is an inverse of −2.

Again in (R, ·), is 0 invertible? Does there exist x ∈ R satisfying

0 · x = 1 = x · 0?

No, such an x does not exist, so 0 is not invertible. In fact, the set of invertible elements of (R, ·) is equal to R× = R \ {0}. The inverse of x ∈ R× is 1/x.

(c) The element 2 is not an invertible element in the magma (Z, ·). In fact, the set of invertible elements of (Z, ·) is equal to {1,−1}.

(d) In (R,−), is 3 invertible? STOP! This question does not make sense, because (R,−) does not have an identity element.

When (E, ∗) does not have an identity element, the concept of invertibility is not defined, so it is absurd to ask whether a given element is invertible. Read Definition A.4.1 again and pay attention to the role played by the identity element in that definition. Do you see that, if e does not exist, it makes no sense to speak of invertibility?

(e) Here is a disturbing example. Consider the set E = {1, 2, 3, 4} and the operation ∗ defined by the table:

∗ | 1 2 3 4
--+--------
1 | 1 2 3 4
2 | 2 3 1 1
3 | 3 1 3 1
4 | 4 1 1 4

So (E, ∗) is a magma. Observe that 1 ∗ x = x = x ∗ 1 for all x ∈ E, so (E, ∗) has an identity element (namely, 1) and consequently the concept of invertibility makes sense.

Is 2 invertible? Does there exist x ∈ E satisfying

2 ∗ x = 1 = x ∗ 2?

Yes, there are even two such x: x = 3 and x = 4 satisfy this condition. So 2 is invertible and has two inverses: 3 and 4 are inverses of 2.

So it is possible for a given element to have more than one inverse! However, we will see below that this unpleasant phenomenon never happens when the operation is associative.
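Questions like these about a finite operation table can be settled mechanically. Here is a short Python sketch (an illustrative addition, with the table of Example A.4.2(e) encoded as a nested dictionary) that searches for identity elements and then for all inverses of 2:

```python
# Operation table of Example A.4.2(e): table[a][b] is a * b.
table = {
    1: {1: 1, 2: 2, 3: 3, 4: 4},
    2: {1: 2, 2: 3, 3: 1, 4: 1},
    3: {1: 3, 2: 1, 3: 3, 4: 1},
    4: {1: 4, 2: 1, 3: 1, 4: 4},
}
E = list(table)

# An identity element e must satisfy e * x = x = x * e for every x.
identities = [e for e in E if all(table[e][x] == x == table[x][e] for x in E)]
print(identities)  # [1]

e = identities[0]
# x is an inverse of 2 when 2 * x = e = x * 2.
inverses_of_2 = [x for x in E if table[2][x] == e == table[x][2]]
print(inverses_of_2)  # [3, 4] -- two distinct inverses, as claimed
```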


Exercises.

A.4.1. Prove that the magma of Example A.4.2(e) is not associative.

A.4.2. Let (E, ?) be a magma, with identity element e. Show that e is invertible, and is its own inverse (more precisely, show that e is the unique inverse of e). Let a, b ∈ E; show that if b is an inverse of a then a is an inverse of b.

A.4.3. Let × denote the vector product (or “cross product”) on R3. Then (R3,×) is a magma. Is (1, 1, 2) an invertible element?

A.5 Monoids

Definition A.5.1 (Monoid). A monoid is a magma (E, ?) such that

• (E, ?) has an identity element, and

• the operation ? is associative.

Examples A.5.2. (a) (R,+), (R, ·), (Z,+), (Z, ·), (N,+), and (N, ·) are examples of monoids. Also: (R2,+), (R3,+), and (Rn,+) (for any n ≥ 1) are monoids. All these examples are commutative monoids.

(b) Let A = {x ∈ Z | x ≥ 1} = {1, 2, 3, 4, . . . } and let + be ordinary addition. Then (A,+) is a magma but is not a monoid: the operation is associative but there is no identity element.

(c) Consider (E, ∗) in Example A.4.2(e). Then (E, ∗) is a magma but is not a monoid: there is an identity element but the operation is not associative (see Exercise A.4.1).

(d) Consider the set E = {1, 2, 3, 4} and the operation ∗ defined by the table:

∗ | 1 2 3 4
--+--------
1 | 1 2 3 4
2 | 2 1 4 3
3 | 3 3 3 3
4 | 4 4 4 4

So (E, ∗) is a magma and (E, ∗) has an identity element (namely, 1). One can also check that ∗ is associative (this takes a bit of work; just take it for granted). So (E, ∗) is a monoid, and in fact it is a non-commutative monoid.
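In fact, the associativity claim can be verified by brute force, since there are only 4³ = 64 triples to check. A short Python sketch (an illustrative addition, not part of the course material):

```python
from itertools import product

# Operation table of Example A.5.2(d): table[a][b] is a * b.
table = {
    1: {1: 1, 2: 2, 3: 3, 4: 4},
    2: {1: 2, 2: 1, 3: 4, 4: 3},
    3: {1: 3, 2: 3, 3: 3, 4: 3},
    4: {1: 4, 2: 4, 3: 4, 4: 4},
}
E = list(table)

# Check (a * b) * c == a * (b * c) for all 64 triples.
associative = all(
    table[table[a][b]][c] == table[a][table[b][c]]
    for a, b, c in product(E, repeat=3)
)
print(associative)  # True

# Non-commutativity is witnessed by a single pair: 3 * 4 != 4 * 3.
print(table[3][4], table[4][3])  # 3 4
```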

(e) In MAT 1341, you learned how to multiply matrices. Let E be the set of all 2 × 2 matrices with entries in R, and let · denote the multiplication of matrices. Then · is an operation on the set E (because the product of any two 2 × 2 matrices with entries in R is again a 2 × 2 matrix with entries in R), and consequently (E, ·) is a magma. Moreover,


• the element

( 1 0 )
( 0 1 )

of the set E is an identity element of this magma;

• in (E, ·), the operation is associative (matrix product is associative).

So (E, ·) is a monoid; in fact it is a non-commutative monoid. What are the invertible elements in this monoid?

Note that, in any monoid, the concept of invertibility makes sense (because it makes sense in any magma which has an identity element).

Theorem A.5.3. In a monoid, a given element can have at most one inverse.

Proof. Let (E, ?) be a monoid and let e be its identity element. Consider a ∈ E and suppose that x1, x2 ∈ E are inverses of a; we have to show that x1 = x2.

As x1, x2 are inverses of a,

a ? x1 = e = x1 ? a and a ? x2 = e = x2 ? a.

It follows that

x1 = x1 ? e (since e is an identity element)

= x1 ? (a ? x2) (since x2 is an inverse of a)

= (x1 ? a) ? x2 (since ? is associative)

= e ? x2 (since x1 is an inverse of a)

= x2 (since e is an identity element),

which proves that x1 = x2 and hence completes the proof.

Suppose that (E, ?) is a monoid. If a is an invertible element then we know, by Theorem A.5.3, that a has exactly one inverse; so it makes sense to speak of the inverse of a (as opposed to an inverse of a).

(a) Suppose that the operation in our monoid is an addition; for instance our monoid could be (R,+) or (Z,+) or (Rn,+), etc. Then we say that our monoid is an additive monoid and we often use special notation:

• we write (E,+) instead of (E, ?)

• the identity element is usually denoted 0

• if a ∈ E is an invertible element then the inverse of a is usually denoted −a. If this notation is used then −a is, by definition, the unique element of E satisfying a + (−a) = 0 = (−a) + a.

(b) If the operation in our monoid (E, ?) is not an addition then the inverse of an element a is often denoted a⁻¹. Then, by definition of inverse, a⁻¹ is the unique element of E which satisfies

a ? a⁻¹ = e = a⁻¹ ? a,

where e denotes the identity element of (E, ?).


If f and g are functions from R to R, we define a new function “f + g” from R to R by:

(f + g)(x) = f(x) + g(x), for all x ∈ R.

For a concrete example of addition of functions, suppose that:

• f is the function from R to R defined by f(x) = (x − 1)/(x⁴ + 1),

• g is the function from R to R defined by g(x) = (x + 1)/(x⁴ + 1).

Then the definition of f + g says that (f + g)(2) = f(2) + g(2) = 1/17 + 3/17 = 4/17. More generally, the definition of f + g says that for each x ∈ R we have

(f + g)(x) = f(x) + g(x) = (x − 1)/(x⁴ + 1) + (x + 1)/(x⁴ + 1) = 2x/(x⁴ + 1),

so f + g is the function from R to R defined by

(f + g)(x) = 2x/(x⁴ + 1), for all x ∈ R.

Let F(R) denote the set of all functions from R to R. Then the addition of functions that we just defined is an operation on the set F(R), i.e., any f, g ∈ F(R) determine a well-defined element f + g ∈ F(R). So (F(R),+) is a magma; let us argue that it is a commutative monoid. Before going into this argument, it is useful to recall what it means for two functions to be equal.

Definition A.5.4 (Equality of functions). Suppose that f, g are functions from R to R. We say that f and g are equal, and write f = g, if for all x ∈ R, the real numbers f(x) and g(x) are equal.

Let us use this definition to show that addition of functions is commutative. Let f, g be functions from R to R. Then f + g and g + f are functions from R to R and, in order to prove that they are equal, we have to show that for each x ∈ R, the real numbers (f + g)(x) and (g + f)(x) are equal. Now, for any given x ∈ R, we have

(f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x)

where the equality in the middle is true because addition of real numbers is commutative, and the other two equalities are true by definition of addition of functions. This proves that f + g = g + f , so addition of functions is a commutative operation on the set F(R). One can imitate the above argument to show that addition of functions is associative (Exercise A.5.2).

Let 0 denote the function from R to R which is identically equal to zero:

0(x) = 0, for all x ∈ R.

Then 0 ∈ F(R). One can show (Exercise A.5.3) that 0 is an identity element of (F(R),+). We conclude that (F(R),+) is a commutative monoid.

Theorem A.5.5. If a, b are invertible elements in a monoid (E, ?) then a ? b is invertible and (a ? b)⁻¹ = b⁻¹ ? a⁻¹.


Proof. Let (E, ?) be a monoid, with identity element e. Suppose that a, b are invertible elements of (E, ?). Let v = b⁻¹ ? a⁻¹ and note that v exists and is an element of E. Let us compute (a ? b) ? v and v ? (a ? b); remember that ? is associative, so parentheses can be moved at will, or even omitted:

(a ? b) ? v = (a ? b) ? (b⁻¹ ? a⁻¹) = a ? (b ? b⁻¹) ? a⁻¹ = a ? e ? a⁻¹ = (a ? e) ? a⁻¹ = a ? a⁻¹ = e,

v ? (a ? b) = (b⁻¹ ? a⁻¹) ? (a ? b) = b⁻¹ ? (a⁻¹ ? a) ? b = b⁻¹ ? e ? b = (b⁻¹ ? e) ? b = b⁻¹ ? b = e.

So (a ? b) ? v = e = v ? (a ? b); this proves that a ? b is invertible and that its inverse is v.

Exercises.

A.5.1. Find all invertible elements of the monoid of Example A.5.2(d). Also find an inverse of each invertible element.

A.5.2. Prove that addition of functions is associative.

A.5.3. Show that 0 + f = f = f + 0 for all f ∈ F(R). Hint: 0 + f and f are functions from R to R; to prove that they are equal, use Definition A.5.4.

A.5.4. Show that, in the monoid (F(R),+), each element is invertible. Following the conventions for additive monoids, the inverse of an element f is denoted −f . (Here we are talking about the additive inverse of f ; this has nothing to do with the concept of inverse function.)

A.5.5. Consider the magma (R2, ?) where the operation ? is defined by

(x, y) ? (x′, y′) = (xx′, yy′) for all (x, y), (x′, y′) ∈ R2.

(a) Is this magma a monoid?

(b) What are the invertible elements of (R2, ?)? Show that (2, 3) is an invertible element of (R2, ?) and find all the inverses of (2, 3). Make sure you justify that you have found them all.

A.5.6. Define a new operation ⊕ on the set R by x ⊕ y = x + y + 3, where the “+” in the right hand side is the ordinary addition of numbers. For instance, 7 ⊕ 5 = 15. Then (R,⊕) is a magma.

(a) Check that the real number 0 is not an identity element of (R,⊕).

(b) Find a real number e which is an identity element of (R,⊕) (then, by Theorem A.3.4, e is the unique identity element of (R,⊕)). Note that, even though ⊕ looks like an addition, it would be a bad idea to denote the identity element of (R,⊕) by the symbol 0, because then 0 would denote two different numbers (the real number zero and the identity of the monoid (R,⊕)). So let e denote the identity element of (R,⊕).


(c) Let x, y, z be arbitrary real numbers. Compute the real number (x ⊕ y) ⊕ z; then compute the real number x ⊕ (y ⊕ z); then check that you obtained the same answer in the two calculations. This argument shows that (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z) for all x, y, z ∈ R, so you have just proved that ⊕ is associative. In view of (b), you can conclude that (R,⊕) is a monoid. Also prove that ⊕ is commutative (compute x ⊕ y and y ⊕ x separately, and see that you get the same number in the two cases) so (R,⊕) is a commutative monoid.

(d) Show that 5 is an invertible element of (R,⊕) and find its inverse. (Let us write ⊖5 for the inverse of 5 in (R,⊕).)

(e) Show that in the monoid (R,⊕), every element is invertible. Find the inverse ⊖a of each element a of the monoid. What is ⊖(−3)? What is ⊖0?

(f) If we think of ⊕ as being an addition, then we would like to define a new operation ⊖ on R which would act like a subtraction. The idea is that subtracting b should be equivalent to adding the inverse of b. So we define a new operation on R by: given a, b ∈ R, a ⊖ b = a ⊕ (⊖b). Compute 0 ⊖ 5 and 5 ⊖ 0 (careful! this is confusing). Note that (R,⊖) is a magma, but ⊖ is not commutative, not associative, and there is no identity element. Show that the solution x to the equation x ⊕ a = b is x = b ⊖ a (be careful; show that x = b ⊖ a is a solution to the given equation, and show that it is the only solution).

A.6 Fields

Definition A.6.1 (Field). A field is a triple (F,+, ·) where F is a set, + and · are two binary operations on F , called addition and multiplication, respectively, and the following conditions are satisfied:

(A1) For all a, b ∈ F , we have a+ b = b+ a. (commutativity of addition)

(A2) For all a, b, c ∈ F , we have (a+ b) + c = a+ (b+ c). (associativity of addition)

(A3) There is an element 0 ∈ F such that, for all a ∈ F , a + 0 = 0 + a = a. The element 0 is unique and is called the additive identity.

(A4) For any a ∈ F , there exists an element −a ∈ F such that a + (−a) = 0. The element −a is uniquely determined by a and is called the additive inverse of a.

(M1) For all a, b ∈ F , we have a · b = b · a. (commutativity of multiplication)

(M2) For all a, b, c ∈ F , we have (a · b) · c = a · (b · c). (associativity of multiplication)

(M3) There is a nonzero element 1 ∈ F such that a · 1 = 1 · a = a for all a ∈ F . The element 1 is unique and is called the multiplicative identity.

(M4) For any a ∈ F , a ≠ 0, there exists an element a⁻¹ ∈ F such that aa⁻¹ = 1. The element a⁻¹ is uniquely determined by a and is called the multiplicative inverse of a.

(AM1) For any a, b, c ∈ F , we have (a+ b) · c = a · c+ b · c. (distributivity)


We often denote multiplication by juxtaposition. For example, if a, b ∈ F , then ab means a · b. We also sometimes say “F is a field” (instead of “(F,+, ·) is a field”) when the two operations (addition and multiplication) are clear from the context.

Remark A.6.2. (a) Suppose (F,+, ·) is a field. Definition A.6.1 implies that (F,+) and (F, ·) are both commutative monoids: sets with an associative operation and an identity element (see Section A.5). In any monoid, identity elements are unique and invertible elements have exactly one inverse (see Theorem A.5.3). This justifies the uniqueness claims made in Definition A.6.1.

(b) Our assumption in (M3) that 1 ≠ 0 is crucial, as we will see in Remark A.6.9.

Remark A.6.3. A set R with two operations + and · satisfying all of the axioms of Definition A.6.1 except possibly (M1) and (M4) is called a ring. If it also satisfies (M1), it is a commutative ring. So a field is a commutative ring in which every nonzero element has a multiplicative inverse.

Note that distributivity and commutativity of multiplication imply

(a+ b)c = ac+ bc for all a, b, c ∈ F

since

(a + b)c = c(a + b) = ca + cb = ac + bc.

Thus, we have distributivity in both directions.

If F is a field, we can view any integer as an element of F . First, we view the integer zero as the 0 ∈ F . Then, for n > 0, we have

n = 1 + 1 + · · · + 1 ∈ F (n summands), (A.1)

where 1 here is the multiplicative identity of F . For n < 0, we then define

n = −(−n) = −(1 + 1 + · · · + 1) ∈ F (−n summands). (A.2)

Then, for instance, if a ∈ F , we can write expressions like

5a = (1 + 1 + 1 + 1 + 1)a = a+ a+ a+ a+ a.

Much of the algebra that we have done in R or C can be done in an arbitrary field. However, when working in an arbitrary field, you should be careful that you are only using the properties guaranteed by Definition A.6.1. In particular, watch out for the following:

• In an arbitrary field, you should not use inequalities. The expression a < b does not make sense for a, b in an arbitrary field F , even though it makes sense when F = R. Even when F = C, there is no good notion of order.

• Even if n and m are distinct integers, they can be equal when considered as elements of a field F as in (A.1) and (A.2). For example, 5 = 0 in F5.
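A quick way to see this collapsing of integers is to compute in F5 directly, reducing modulo 5 after each addition. A short Python sketch (the helper names `add_f5` and `embed` are ours):

```python
p = 5  # arithmetic in F5: add, then reduce mod p

def add_f5(a, b):
    return (a + b) % p

# The integer n >= 0, viewed in F5 as 1 + 1 + ... + 1 (n summands),
# as in equation (A.1).
def embed(n):
    total = 0
    for _ in range(n):
        total = add_f5(total, 1)
    return total

print(embed(5))  # 0 -- the distinct integers 5 and 0 coincide in F5
print(embed(7))  # 2
```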


Examples A.6.4. (a) We see that C, R, and Q are fields, with the usual addition and multiplication.

(b) The integers Z (with the usual addition and multiplication) do not form a field because Z contains nonzero elements that do not have multiplicative inverses (for instance, 2).

(c) The set Mn(R) of n × n matrices (with real number entries), with matrix addition and multiplication, is not a field for n ≥ 2 because there are nonzero matrices (the zero matrix is the additive identity) that do not have multiplicative inverses. For example, the matrix

( 1 0 )
( 1 0 )

in M2(R) is not the zero matrix but is still not invertible since its determinant is zero. However, Mn(R) is a ring (see Remark A.6.3).

For the remainder of this section, F is a field.

Lemma A.6.5 (Cancellation laws). For all a, b, c ∈ F , the following statements hold:

(a) If a+ b = a+ c, then b = c.

(b) If ab = ac and a ≠ 0, then b = c.

Proof. (a) Suppose a, b, c ∈ F . Then

a+ b = a+ c

=⇒ (−a) + a+ b = (−a) + a+ c

=⇒ 0 + b = 0 + c (by (A4))

=⇒ b = c. (by (A3))

(b) The proof of this part is left as an exercise (Exercise A.6.3).

Example A.6.6. In the field R the equality 0 · 2 = 0 · 3 holds, but 2 ≠ 3. This does not violate Lemma A.6.5(b) because of the requirement a ≠ 0 there.

Example A.6.7. Let M2(R) be the set of 2 × 2 matrices with real number entries. We have operations of matrix addition and matrix multiplication on M2(R). Suppose

A = ( 0 1 ),  X = ( 1 0 ),  X′ = ( 2 1 ).
    ( 0 0 )       ( 0 0 )        ( 0 0 )

Then

AX = ( 0 0 ) = AX′,
     ( 0 0 )

but X ≠ X′. The problem here is that M2(R) is not a field (see Example A.6.4(c)). In particular, A does not have a multiplicative inverse.
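The failure of cancellation can be checked directly with a minimal 2 × 2 matrix product in pure Python (no libraries; `matmul` is our own small helper):

```python
# Represent a 2x2 matrix as a tuple of rows.
def matmul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

A = ((0, 1), (0, 0))
X = ((1, 0), (0, 0))
X2 = ((2, 1), (0, 0))  # X' in the text

print(matmul(A, X) == matmul(A, X2))  # True: AX = AX' = the zero matrix
print(X == X2)                        # False: yet X != X'
```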

Proposition A.6.8. Let F be a field.


(a) 0x = 0 for all x ∈ F .

(b) (−1)x = −x for all x ∈ F .

(c) (−a)b = −(ab) = a(−b) for all a, b ∈ F .

Proof. (a) Let x ∈ F . Then

0x+ 0x = (0 + 0)x = 0x = 0x+ 0.

Then we have 0x = 0 by Lemma A.6.5(a).

(b) Let x ∈ F . Then

x+ (−1)x = 1x+ (−1)x = (1 + (−1))x = 0x = 0.

This implies that (−1)x is the additive inverse of x, hence (−1)x = −x.

(c) Let a, b ∈ F . Then

(−a)b = ((−1)a)b = (−1)(ab) = −(ab)

and

a(−b) = a((−1)b) = ((−1)b)a = (−1)(ba) = (−1)(ab) = −(ab).

Remark A.6.9. We now see why the assumption 1 ≠ 0 in (M3) is so important. If 1 = 0 then, for all a ∈ F , we have

a = 1a = 0a = 0.

Thus F has only the zero element, and thus is not very interesting!

Proposition A.6.10. Suppose x, y are elements of a field F . If x ≠ 0 and y ≠ 0, then xy ≠ 0.

Proof. The proof of this proposition is left as an exercise (Exercise A.6.5).

Definition A.6.11 (Subtraction). Suppose F is a field. We define the operation − of subtraction on F by

a− b = a+ (−b), for all a, b ∈ F.

Exercises.

A.6.1. Verify directly that F2, as defined in Example 1.1.3, is a field, using Definition A.6.1.

A.6.2. Define an addition and multiplication on R2 by

(x, y) + (x′, y′) = (x+ x′, y + y′) and (x, y)(x′, y′) = (xx′, yy′), for (x, y), (x′, y′) ∈ R2.

Is (R2,+, ·) a field?


A.6.3. Prove Lemma A.6.5(b).

A.6.4. Show that the element 0 in a field F does not have a multiplicative inverse. (Hint: Use Proposition A.6.8(a) and the fact that 1 ≠ 0, as guaranteed by (M3).) Since, by the definition of a field, all nonzero elements have a multiplicative inverse, this shows that the set of elements of a field F with a multiplicative inverse is exactly F×.

A.6.5. Prove Proposition A.6.10.

A.6.6. Suppose a, b, c are elements of a field F . Show that

(a) a− a = 0,

(b) a(b− c) = ab− ac, and

(c) (a− b)c = ac− bc.

A.6.7. Suppose that F is a field containing an element c such that c ≠ 0 and c + c = 0. Show that 1 + 1 = 0 in this field.

A.6.8. Show that if F is a field and x, y ∈ F satisfy x ≠ 0 and y ≠ 0, then xy ≠ 0.

A.6.9. Suppose n ∈ N+ = {1, 2, 3, . . . } is not prime. Let Fn = {0, 1, 2, . . . , n − 1} and define addition and multiplication as follows: For a, b ∈ Fn,

a+ b = remainder after dividing a+ b by n,

a · b = remainder after dividing ab by n.

Prove that Fn is not a field. Hint : Use Exercise A.6.8.


Appendix B

Quotient spaces and the First Isomorphism Theorem

In this optional appendix we discuss the idea of a quotient vector space. Quotient vector spaces are vector spaces whose elements are equivalence classes under a certain equivalence relation. We recall the idea of an equivalence relation, define quotient vector spaces, and then prove the important First Isomorphism Theorem. This gives us an alternative method for proving the Dimension Theorem (Theorem 3.5.1).

B.1 Equivalence relations and quotient sets

Definition B.1.1 (Equivalence relation). Let X be a nonempty set. Suppose that for each ordered pair (x, y) of elements of X, we are given a statement S(x, y) about x and y. We write x ∼ y if the statement S(x, y) is true. We say that ∼ is an equivalence relation on X if the following three conditions hold:

(a) reflexivity : x ∼ x for all x ∈ X,

(b) symmetry : if x ∼ y then y ∼ x,

(c) transitivity : if x ∼ y and y ∼ z, then x ∼ z.

If x ∼ y, we say that x is equivalent to y (under the relation ∼).

Example B.1.2 (Congruence modulo n). Fix a positive integer n and consider the set Z of integers. For x, y ∈ Z, let S(x, y) be the statement

“(x − y) is an integral multiple of n.”

Then x ∼ y if and only if x − y is an integral multiple of n. In other words, x ∼ y if (x − y) = kn for some k ∈ Z. If we write nZ = {kn | k ∈ Z} for the set of integral multiples of n, then we have

x ∼ y ⇐⇒ (x − y) ∈ nZ.

Let's check that ∼ is an equivalence relation on the set Z.

• Reflexivity: For all x ∈ Z, we have x − x = 0 ∈ nZ, so x ∼ x.


• Symmetry: Suppose x ∼ y for some x, y ∈ Z. Then x − y ∈ nZ. But then y − x = −(x − y) ∈ nZ (since if x − y = kn for some integer k, then y − x = (−k)n, which is also an integer multiple of n) and so y ∼ x.

• Transitivity: Suppose x ∼ y and y ∼ z for some x, y, z ∈ Z. Then x − y = k1n for some k1 ∈ Z and y − z = k2n for some k2 ∈ Z. But then

x− z = (x− y) + (y − z) = k1n+ k2n = (k1 + k2)n ∈ nZ,

since k1 + k2 ∈ Z.

Thus ∼ is an equivalence relation on Z. This equivalence relation has its own special notation. If x ∼ y, we write

x ≡ y mod n,

and say x is congruent to y modulo n.
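Congruence modulo n is easy to test in code, since (x − y) ∈ nZ is exactly the condition `(x - y) % n == 0`. A short Python sketch (an illustrative addition), with a spot-check of two of the axioms on a small range:

```python
def congruent(x, y, n):
    """x ~ y iff (x - y) is an integer multiple of n."""
    return (x - y) % n == 0

n = 4
print(congruent(7, 3, n))   # True:  7 - 3 = 4
print(congruent(7, -1, n))  # True:  7 - (-1) = 8
print(congruent(7, 2, n))   # False: 5 is not a multiple of 4

sample = range(-10, 11)
assert all(congruent(x, x, n) for x in sample)  # reflexivity
assert all(congruent(y, x, n)
           for x in sample for y in sample if congruent(x, y, n))  # symmetry
```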

Example B.1.3. Suppose A and B are sets and f : A → B is any function (map of sets). For x, y ∈ A, write x ∼ y if f(x) = f(y). Let's check that ∼ is an equivalence relation.

(a) Reflexive: For all x ∈ A, we have x ∼ x since f(x) = f(x).

(b) Symmetric: If x, y ∈ A such that x ∼ y, then f(x) = f(y). Hence f(y) = f(x) and so y ∼ x.

(c) Transitive: Suppose x, y, z ∈ A such that x ∼ y and y ∼ z. Then f(x) = f(y) = f(z) and so x ∼ z.

Example B.1.4. Suppose M is a subspace of a vector space V . For u, v ∈ V , write u ∼ v if u − v ∈ M . Then ∼ is an equivalence relation on V . We check the three properties of an equivalence relation.

(a) Reflexive: For all v ∈ V , we have v − v = 0 ∈M , since M is a subspace.

(b) Symmetric: For all u, v ∈ V such that u ∼ v, we have u− v ∈M . Then

v − u = −(u− v) = (−1)(u− v) ∈M

since M is a subspace and hence closed under scalar multiplication.

(c) Transitive: Suppose u, v, w ∈ V such that u ∼ v and v ∼ w. Then u − v ∈ M and v − w ∈ M . Then

u− w = (u− v) + (v − w) ∈M

since M is a subspace and hence closed under vector addition.

Definition B.1.5 (Equivalence class). Suppose ∼ is an equivalence relation on a set X.Then, for x ∈ X, we define

[x] = {y ∈ X | x ∼ y}

to be the set of all elements of X that are equivalent to x. We call [x] the equivalence class of x.


Example B.1.6. Consider the equivalence relation on Z given by congruence modulo n (Example B.1.2). Then, for a ∈ Z, we have

[a] = {b ∈ Z | a − b ∈ nZ} = {a + kn | k ∈ Z} =: a + nZ.

Example B.1.7. Suppose A,B are sets, f : A → B is any function, and ∼ is the equivalence relation of Example B.1.3. Then, for x ∈ A,

[x] = {y ∈ A | f(y) = f(x)} = f⁻¹({f(x)}).

Example B.1.8. Suppose M is a subspace of a vector space V and ∼ is the equivalence relation of Example B.1.4. Then, for v ∈ V ,

[v] = {v + z | z ∈M}.

To prove this, suppose that w ∈ [v]. Then, by definition, v ∼ w and so v − w ∈ M . That is, v − w = z for some z ∈ M . Thus w = v + (−z). Therefore [v] ⊆ {v + z | z ∈ M}.

Now suppose w = v + z for some z ∈ M . Then v − w = −z = (−1)z ∈ M , since M is a subspace. Hence v ∼ w and so w ∈ [v]. Hence, {v + z | z ∈ M} ⊆ [v].

Definition B.1.9 (Coset). For a subspace M of a vector space V and v ∈ V , we write

v +M = {v + z | z ∈M}

and call this the coset of v modulo M .

Example B.1.10. Define a linear form f : R2 → R by

f(x1, x2) = 2x1 − x2 ∀ (x1, x2) ∈ R2.

Let

M = Ker f = {(x1, x2) ∈ R2 | f(x1, x2) = 0} = {(x1, x2) ∈ R2 | 2x1 − x2 = 0}.

So M is a line through the origin. Note that the range of f is R since, for instance, for any x ∈ R, we have f(0,−x) = x.

For c ∈ R, we have

f−1({c}) = {(x1, x2) ∈ R2 | f(x1, x2) = c} = {(x1, x2) ∈ R2 | 2x1 − x2 = c},

which is a line parallel to M (and equal to M if and only if c = 0).

Consider the equivalence relation of Example B.1.3. Under this equivalence relation, for x, y ∈ R2, we have

x ∼ y ⇐⇒ f(x) = f(y) ⇐⇒ f(x)−f(y) = 0 ⇐⇒ f(x−y) = 0 ⇐⇒ x−y ∈ Ker f = M,

where we have used the fact that f is linear. Thus we see that this equivalence relation agrees with the one of Example B.1.4.


For any v ∈ R2,

[v] = v + M = f⁻¹({f(v)}) = {x ∈ R2 | 2x1 − x2 = f(v)}.

This is the line in R2 passing through v and parallel to M . For example, if v = (3, 1), then [v] is the line with equation

2x1 − x2 = 2(3)− (1) = 5.

We see that the equivalence relation ‘decomposes’ the plane into a set of parallel lines. Each point of the plane R2 lies on exactly one of these lines.
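This picture can be spot-checked numerically: two points lie in the same coset v + M exactly when f takes the same value on them. A small Python sketch for the example above (an illustrative addition):

```python
def f(v):
    """The linear form f(x1, x2) = 2*x1 - x2 of Example B.1.10."""
    x1, x2 = v
    return 2 * x1 - x2

v = (3, 1)
print(f(v))  # 5: the coset v + M is the line 2*x1 - x2 = 5

# Adding an element m of M = Ker f (here m = (1, 2), since f(m) = 0)
# moves v along the same line, i.e. stays in the same coset.
m = (1, 2)
w = (v[0] + m[0], v[1] + m[1])
print(f(w))  # 5
```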

Definition B.1.11 (Partition). Suppose X is a nonempty set. A partition of X is a collection A of subsets of X satisfying the following properties:

(a) every A ∈ A is a nonempty subset of X,

(b) if A,B ∈ A and A ≠ B, then A ∩ B = ∅,

(c) every element of X belongs to one of the subsets in A (in other words, for each x ∈ X, there is some A ∈ A such that x ∈ A).

Remark B.1.12. The third property of a partition says that each element of X belongs to one of the subsets of the partition. The second property says that no element lies in more than one of the subsets of the partition. Therefore, combining these two facts, we see that the fundamental property of a partition is that every element of X belongs to exactly one of the subsets in the partition.

It turns out that equivalence relations and partitions are simply two different ways of thinking about the same thing. The precise relationship is given in the following theorem.

Theorem B.1.13. Let X be a nonempty set.

(a) If ∼ is an equivalence relation on X, then the set of equivalence classes is a partition of X.

(b) Suppose A is a partition of X. Write x ∼ y if x and y belong to the same element of A (that is, there exists an A ∈ A such that x, y ∈ A). Then ∼ is an equivalence relation on X.

Proof. (a) Suppose ∼ is an equivalence relation on X and let A be the set of equivalence classes. We want to show that A satisfies the conditions of Definition B.1.11.

• First of all, for any equivalence class [x], we have x ∈ [x] (since x ∼ x by the reflexivity of an equivalence relation). Therefore, all the equivalence classes are nonempty.

• Suppose [x] and [y] are two equivalence classes and [x] ≠ [y]. We want to show that [x] ∩ [y] = ∅. We prove this by contradiction. Suppose [x] ∩ [y] ≠ ∅. Then we can choose some element z ∈ [x] ∩ [y]. We will show that this implies that [x] = [y]. Let w ∈ [x]. Then x ∼ w. Since z ∈ [x] ∩ [y], we have z ∈ [x] and z ∈ [y]. Thus x ∼ z and y ∼ z. By symmetry, z ∼ x. Thus

y ∼ z ∼ x ∼ w.


By transitivity, y ∼ w and so w ∈ [y]. Hence [x] ⊆ [y]. Repeating the above argument but interchanging the roles of x and y shows that [y] ⊆ [x]. Hence [x] = [y]. This contradicts the assumption that [x] ≠ [y]. Therefore, [x] ∩ [y] = ∅.

• The last property is easy: for every x ∈ X, we have x ∈ [x], and so every element of X belongs to some element of A.

(b) Now assume that A is a partition of X and define x ∼ y if there is some A ∈ A such that x, y ∈ A. We wish to verify that ∼ is an equivalence relation on X.

• Reflexivity: For any x ∈ X, the third property of a partition gives some A ∈ A with x ∈ A. Thus x ∼ x.

• Symmetry: Suppose x, y ∈ X and x ∼ y. Then there is some A ∈ A such that x, y ∈ A. Thus y ∼ x as well.

• Transitivity: Suppose x, y, z ∈ X, x ∼ y, and y ∼ z. Then there are A,B ∈ A such that x, y ∈ A and y, z ∈ B. This implies in particular that y ∈ A ∩ B. Thus A ∩ B ≠ ∅. Hence, by the definition of a partition, we have A = B. Therefore, x and z belong to the same element A = B of A. Hence x ∼ z.

From this theorem, we can deduce some properties of the equivalence classes of an equivalence relation.

Corollary B.1.14. Suppose ∼ is an equivalence relation on a nonempty set X.

(a) For any x, y ∈ X, we have either [x] = [y] or [x] ∩ [y] = ∅.

(b) For x, y ∈ X, we have x ∼ y if and only if [x] = [y].

Proof. (a) This follows immediately from Theorem B.1.13, which says that the equivalence classes form a partition.

(b) If x ∼ y, then y ∈ [x]. We also have y ∈ [y]. Thus [x] ∩ [y] ≠ ∅ and so [x] = [y] by the first part of the corollary. Conversely, if [x] = [y], then y ∈ [x] (since y ∈ [y]). Hence x ∼ y.

We can apply this corollary to the equivalence relation of Example B.1.4 to get the following result.

Corollary B.1.15. If M is a subspace of a vector space V , then

(a) for any x, y ∈ V , we have x+M = y +M or (x+M) ∩ (y +M) = ∅, and

(b) for any x, y ∈ V , we have x− y ∈M if and only if x+M = y +M .

Definition B.1.16 (Quotient set). Suppose ∼ is an equivalence relation on a set X. Then the set of equivalence classes

{[x] | x ∈ X}

is called the quotient set of X for the relation ∼, and is denoted X/∼. In the special case where M is a subspace of V and the equivalence relation is the one of Example B.1.4, the quotient set is denoted V/M . Thus

V/M = {x+M | x ∈ V }.


Example B.1.17. Consider the situation of Example B.1.10, where f : R2 → R is the linear form defined by

f(x1, x2) = 2x1 − x2 ∀ (x1, x2) ∈ R2,

and

M = Ker f = {(x1, x2) ∈ R2 | f(x1, x2) = 0} = {(x1, x2) ∈ R2 | 2x1 − x2 = 0}.

Then R2/M is the set of lines parallel to M .
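Since f takes a single value on each coset v + M, the cosets (the lines parallel to M) are exactly the level sets of f. This can be spot-checked numerically; the following is a small sketch where numpy and all variable names are our own choices, not the text's:

```python
import numpy as np

# f(x1, x2) = 2*x1 - x2, with M = Ker f the line 2*x1 - x2 = 0.
def f(v):
    return 2 * v[0] - v[1]

# A spanning vector for M: any (t, 2t) satisfies 2*t - 2*t = 0.
m = np.array([1.0, 2.0])

v = np.array([3.0, -1.0])  # a representative of the coset v + M

# f takes the same value at every point of the coset v + M,
# so each coset (a line parallel to M) corresponds to one value of f.
for t in [-2.0, 0.5, 7.0]:
    assert np.isclose(f(v + t * m), f(v))
```

Changing the representative v moves us to a different parallel line, and hence to a different value of f.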

Definition B.1.18 (Quotient map). Suppose ∼ is an equivalence relation on a set X. Then we have a surjective map

q : X → X/ ∼, q(x) = [x],

called the quotient map of X onto X/ ∼.

Remark B.1.19. Note that if q : X → X/∼ is the quotient map corresponding to some equivalence relation ∼ on a set X, then

q(x) = q(y) ⇐⇒ [x] = [y] ⇐⇒ x ∼ y.

Thus, any equivalence relation is of the type seen in Example B.1.3.

Exercises.

B.1.1 ([Ber14, Ex. 2.5.2]). Let X be the set of all nonzero vectors in Rn. In other words,

X = {v ∈ Rn | v ≠ 0}.

For x, y ∈ X, write x ∼ y if x = cy for some scalar c. Prove that this is an equivalence relation on X.

B.2 Quotient vector spaces

In this course, the most important quotient set will be V/M for a subspace M of a vector space V . One reason for this is that, in this case, the quotient set is more than just a set: it is a vector space.

First, we need to define vector addition and scalar multiplication. Suppose V is a vector space over a field F and M is a subspace of V . For x+M, y +M ∈ V/M , we define their sum to be

(x+M) + (y +M) = (x+ y) +M.

However, this definition should worry you. Why? Note that, in general, an element of V/M can be written in the form x+M for more than one x. Since our definition of vector addition


refers explicitly to this x, our vector addition is not well-defined unless the result of our operation is independent of this choice of x. That is, if

x+M = x′ +M and y +M = y′ +M,

then it must be true that

(x+M) + (y +M) = (x′ +M) + (y′ +M).

By our definition of addition, this is true if and only if

(x+ y) +M = (x′ + y′) +M.

Let’s check this. Since x+M = x′+M , we have x−x′ ∈M . Similarly, since y+M = y′+M , we have y − y′ ∈M . Thus

(x+ y)− (x′ + y′) = (x− x′) + (y − y′) ∈M,

since M is a subspace of V . Hence (x+ y) +M = (x′ + y′) +M as desired.

We now define scalar multiplication on V/M by the formula

c(x+M) = cx+M, c ∈ F, x ∈ V.

Again we need to check that this is well-defined. In particular, we need to check that if x+M = x′+M , then cx+M = cx′+M . To see this, note that x+M = x′+M implies x− x′ ∈M . Thus c(x− x′) ∈M since M is a subspace (and hence closed under scalar multiplication). Thus cx+M = cx′ +M as desired.
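Both well-definedness checks can also be verified numerically in a concrete case. The sketch below works in R² with M = Span{(1, 2)}; the helper functions and names are our own, not the text's:

```python
import numpy as np

# M = Span{(1, 2)}, a one-dimensional subspace of R^2.
m = np.array([1.0, 2.0])

def in_M(v):
    # v lies in Span{m} iff the 2x2 determinant det[v m] vanishes.
    return np.isclose(np.linalg.det(np.column_stack([v, m])), 0.0)

def same_coset(u, v):
    # u + M = v + M iff u - v lies in M.
    return in_M(u - v)

x  = np.array([3.0, 1.0])
xp = x + 2.0 * m          # x' represents the same coset: x - x' is in M
y  = np.array([0.0, 5.0])
yp = y - 1.5 * m          # likewise y' represents the same coset as y

# Addition and scalar multiplication give the same coset
# no matter which representatives we chose.
assert same_coset(x + y, xp + yp)
assert same_coset(4.0 * x, 4.0 * xp)
```

Replacing the representatives changes the vectors but not the resulting cosets, which is exactly what well-definedness demands.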

Now that we’ve defined an addition and scalar multiplication, we can prove the following theorem.

Theorem B.2.1. Suppose V is a vector space over a field F and M is a subspace of V . Then, under the operations defined above, V/M is a vector space over F and the quotient map

Q : V → V/M, Q(x) = x+M,

is a linear map with KerQ = M . The vector space V/M is called the quotient vector space of V by the subspace M .

Proof. We must check that our operations satisfy the axioms of Definition 1.2.1. For x+M, y +M, z +M ∈ V/M , we have

((x+M) + (y +M)) + (z +M) = ((x+ y) +M) + (z +M)

= (x+ y + z) +M = (x+M) + ((y + z) +M) = (x+M) + ((y +M) + (z +M)),

so the addition is associative. Also

(x+M) + (y +M) = (x+ y) +M = (y + x) +M = (y +M) + (x+M), and

(x+M) + (0 +M) = (x+ 0) +M = x+M.


Thus the addition is commutative and 0 +M = M is an additive identity.

For all x+M ∈ V/M , we have

(x+M) + (−x+M) = (x− x) +M = 0 +M,

and so every element of V/M has an additive inverse. We leave it as an exercise (Exercise B.2.1) to check the remaining axioms in the definition of a vector space.

To check that the quotient map Q is linear, suppose u, v ∈ V and c ∈ F . Then

Q(u+ v) = (u+ v) +M = (u+M) + (v +M) = Q(u) +Q(v),

Q(cv) = cv +M = c(v +M) = cQ(v).

Exercises.

B.2.1. Complete the proof of Theorem B.2.1 by showing that V/M satisfies the remaining axioms of a vector space.

B.2.2 ([Ber14, Ex. 2.6.1]). Prove that if V = M ⊕N , then V/M ∼= N . Hint: Restrict the quotient mapping V → V/M to N and find the kernel and image of the restricted mapping.

B.3 The First Isomorphism Theorem

Lemma B.3.1. If T : V → W is a linear map, then

T−1({Tv}) = v + KerT, for all v ∈ V.

Proof. Suppose u ∈ T−1({Tv}). Then

Tu = Tv =⇒ Tu− Tv = 0 =⇒ T (u− v) = 0 =⇒ u− v ∈ KerT =⇒ u ∈ v + KerT.

Now suppose u ∈ v + KerT . Then

u− v ∈ KerT =⇒ T (u− v) = 0 =⇒ Tu− Tv = 0 =⇒ Tu = Tv =⇒ u ∈ T−1({Tv}).
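For a matrix map, the lemma says that the set of solutions of Tu = Tv is exactly the coset v + Ker T. A numeric sketch (the matrix A and all names here are illustrative choices of ours):

```python
import numpy as np

# T is multiplication by the 2x3 matrix A, so T: R^3 -> R^2.
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])

v = np.array([1.0, -1.0, 2.0])
k = np.array([-1.0, -1.0, 1.0])   # A @ k = 0, so k lies in Ker T

assert np.allclose(A @ k, 0.0)

# Every point of the coset v + Ker T maps to the same value Tv,
# i.e. the preimage of {Tv} contains v + Ker T.
for t in [-3.0, 0.25, 5.0]:
    assert np.allclose(A @ (v + t * k), A @ v)
```

Conversely, any u with Au = Av satisfies A(u − v) = 0, so u − v lies in the null space, matching the second half of the proof.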

Theorem B.3.2 (First Isomorphism Theorem). Suppose T : V → W is a linear map and N = KerT . Then

(a) N is a subspace of V ,

(b) T (V ) is a subspace of W , and

(c) the map V/N → T (V ) given by v+N 7→ Tv is an isomorphism. Thus, V/N ∼= T (V ).


In other words,

V/KerT ∼= ImT

for every linear map T with domain V .

Proof. We’ve already proven (a) and (b) (see Corollary 2.2.4). So it remains to prove (c).

We need to find a bijective linear map S : V/N → T (V ). Suppose x ∈ V/N . Then x is a subset of V and is equal to v +N for some v ∈ V . By Lemma B.3.1, T is constant on the subset x. Thus we can define a map

S : V/N → T (V ), S(v +N) = Tv.

Note that the map S is well-defined precisely because of the comments above. We first show that S is linear. Suppose x, y ∈ V/N and c is a scalar. Then x = u+N and y = v +N for some u, v ∈ V . Thus,

S(x+ y) = S(u+ v +N) = T (u+ v) = Tu+ Tv = S(u+N) + S(v +N) = S(x) + S(y),

S(cx) = S(cu+N) = T (cu) = cTu = cS(u+N) = cS(x).

It remains to show that S is bijective. It is clearly surjective since any element of T (V ) is of the form Tv for some v ∈ V and we have S(v +N) = Tv. Since S is linear, we can show it is injective by proving that KerS = {0}. Now, if x = u+N ∈ KerS, then

S(u+N) = 0 =⇒ Tu = 0 =⇒ u ∈ KerT =⇒ u+N = N,

which is the zero vector of V/N .

Remark B.3.3. One of the main uses of the First Isomorphism Theorem is the following. If we want to prove that V/U ∼= W (where V and W are vector spaces and U is a subspace of V ), then we can do this by finding a surjective linear map from V to W with kernel U .

Example B.3.4. Let

U = {(x1, x2, x3, x4) | 2x1 − x2 = 0 and x1 + 3x2 − 4x3 − x4 = 0}.

Prove that R4/U ∼= R2.

We need to find a surjective map T : R4 → R2 such that U = KerT . Let

T (x1, x2, x3, x4) = (2x1 − x2, x1 + 3x2 − 4x3 − x4).

This is a linear map since it corresponds to multiplication by the matrix

[ 2 −1  0  0
  1  3 −4 −1 ].

It is surjective since for any (a, b) ∈ R2, we have

T (a/2, 0, 0, a/2− b) = (a, b).

(Recalling MAT 1341, showing that T is surjective is equivalent to showing that the above matrix has rank 2.) It is clear that KerT = U . Thus R4/U ∼= R2 by the First Isomorphism Theorem.
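This example is easy to check numerically. The sketch below (numpy and the names are our own choices) verifies the rank, the explicit preimage formula, and that a sample element of U maps to zero:

```python
import numpy as np

# The matrix of T from the example.
A = np.array([[2.0, -1.0,  0.0,  0.0],
              [1.0,  3.0, -4.0, -1.0]])

def T(x):
    return A @ x

# T is surjective: the matrix has rank 2, and the explicit preimage works.
assert np.linalg.matrix_rank(A) == 2
for (a, b) in [(1.0, 0.0), (0.0, 1.0), (3.0, -2.0)]:
    assert np.allclose(T(np.array([a / 2, 0.0, 0.0, a / 2 - b])), [a, b])

# Elements of U = Ker T are exactly the solutions of both defining equations.
u = np.array([1.0, 2.0, 1.0, 3.0])   # 2*1 - 2 = 0 and 1 + 6 - 4 - 3 = 0
assert np.allclose(T(u), 0.0)
```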


Example B.3.5. Let X be a set and Y be a subset of X. Let F be a field and define

W = {f ∈ F(X,F ) | f(x) = 0 for all x ∈ Y }.

Let’s prove that F(X,F )/W ∼= F(Y, F ).

First of all, it only makes sense to write F(X,F )/W if W is a subspace of F(X,F ). Weleave it as an exercise (Exercise 1.5.9) to show that this is indeed the case.

Next, we want to find a surjective map T : F(X,F ) → F(Y, F ) such that KerT = W . Define T by

T (f) = f |Y .

So T restricts a function to Y . This map is linear (Exercise B.3.1). It is clear that KerT = W . Also, T is surjective since, given any F -valued function f on Y , we can extend it to a function g on all of X by giving it any values we want on points of X \ Y . Then T (g) = f . Thus, by the First Isomorphism Theorem, we have F(X,F )/W ∼= F(Y, F ).
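With X and Y finite, the restriction map and the extension argument can be modelled concretely. In this sketch (all sets, names, and the dict encoding are our own illustration), a function X → F is a Python dict:

```python
# Model F-valued functions on a finite set as dicts.
X = {"a", "b", "c", "d"}
Y = {"a", "b"}

def restrict(f):
    # The map T: f -> f|_Y (restriction to Y).
    return {x: f[x] for x in Y}

def extend(g):
    # A (non-unique) extension of g on Y to all of X, zero off Y;
    # this witnesses the surjectivity of T.
    return {x: g.get(x, 0) for x in X}

g = {"a": 5, "b": -2}            # a function on Y
assert restrict(extend(g)) == g  # T(extension of g) = g, so T is surjective

# Ker T = W: functions vanishing on Y restrict to the zero function on Y.
w = {"a": 0, "b": 0, "c": 7, "d": 1}
assert restrict(w) == {"a": 0, "b": 0}
```

The choice of zero in `extend` is arbitrary, mirroring the text's remark that the extension may take any values on X \ Y.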

Exercises.

B.3.1. Prove that the map T of Example B.3.5 is linear.

B.3.2. (a) Let S : U → V and T : V → W be linear maps. Show that Ker(TS) = S−1(KerT ).

(b) Let S : V → W be a surjective linear map and M a subspace of W . Show that V/S−1(M) ∼= W/M . Hint: Apply part (a) to S : V → W and Q : W → W/M .

B.3.3 ([Ber14, Ex. 2.7.2]). Let M , N be subspaces of the vector spaces V , W (respectively), and let

T : V ×W → (V/M)× (W/N)

be the linear map defined by T (x, y) = (x+M, y+N). Find the kernel of T and prove that

(V ×W )/(M ×N) ∼= (V/M)× (W/N).

B.3.4 ([Ber14, Ex. 2.7.4]). Let V be a vector space, V × V the product vector space, and

∆ = {(v, v) | v ∈ V } ⊆ V × V.

(The set ∆ is called the diagonal of V × V .) Prove that ∆ is a subspace of V × V and that (V × V )/∆ ∼= V . Hint: See Exercise 2.2.8.


B.4 Another proof of the Dimension Theorem

We now use quotient spaces to give an alternative proof of the Dimension Theorem (Theorem 3.5.1).

Theorem B.4.1. Suppose V is a vector space and v1, . . . , vn ∈ V . Let 1 ≤ r < n, let M = Span{vr+1, . . . , vn}, and let Q : V → V/M be the quotient map. Then the following conditions are equivalent:

(a) v1, . . . , vn are linearly independent in V ,

(b) Qv1, . . . , Qvr are independent in V/M and vr+1, . . . , vn are linearly independent in V .

Proof. (a) =⇒ (b): vr+1, . . . , vn are clearly independent since v1, . . . , vn are (any subset of an independent set is independent). So it remains to show that Qv1, . . . , Qvr are independent. Suppose

c1(Qv1) + · · ·+ cr(Qvr) = Q0.

(Recall that Q0 = M is the zero vector of V/M .) We need to show that c1 = · · · = cr = 0. Since Q is linear, we have

Q(c1v1 + · · ·+ crvr) = Q0,

and so

c1v1 + · · ·+ crvr ∈ KerQ = M.

Since M = Span{vr+1, . . . , vn}, we thus have

c1v1 + · · ·+ crvr = cr+1vr+1 + · · ·+ cnvn,

for some scalars cr+1, . . . , cn. Then

c1v1 + · · ·+ crvr + (−cr+1)vr+1 + · · ·+ (−cn)vn = 0.

Since v1, . . . , vn are independent, all the coefficients are zero; in particular, c1 = · · · = cr = 0.

(b) =⇒ (a): Suppose

c1v1 + · · ·+ cnvn = 0. (B.1)

We want to show c1 = · · · = cn = 0. Let

z = cr+1vr+1 + · · ·+ cnvn.

Then z ∈M = KerQ. Therefore,

Q0 = c1(Qv1) + · · ·+ cr(Qvr) +Qz = c1(Qv1) + · · ·+ cr(Qvr).

Since Qv1, . . . , Qvr are independent, we have c1 = · · · = cr = 0. But then (B.1) becomes

cr+1vr+1 + · · ·+ cnvn = 0.

Since vr+1, . . . , vn are linearly independent, we have cr+1 = · · · = cn = 0. Hence all the ci are zero.


Theorem B.4.2. Suppose V is a vector space and v1, . . . , vn ∈ V . Let 1 ≤ r < n, let M = Span{vr+1, . . . , vn}, and let Q : V → V/M be the quotient map. The following conditions are equivalent:

(a) v1, . . . , vn generate V ,

(b) Qv1, . . . , Qvr generate V/M .

Proof. (a) =⇒ (b): Suppose u ∈ V/M . We want to show that we can write u as a linear combination of Qv1, . . . , Qvr. Since Q is surjective, there is a v ∈ V such that u = Qv. Since v1, . . . , vn generate V , we have

v = c1v1 + · · ·+ cnvn

for some scalars c1, . . . , cn. We apply the linear map Q to both sides and use the fact that Qvr+1 = · · · = Qvn = Q0 (since vr+1, . . . , vn ∈M = KerQ) to obtain

u = Qv = c1Qv1 + · · ·+ crQvr.

(b) =⇒ (a): Suppose v ∈ V . We want to show that we can write v as a linear combination of v1, . . . , vn. Since Qv1, . . . , Qvr generate V/M , we have

Qv = c1(Qv1) + · · ·+ cr(Qvr)

for some scalars c1, . . . , cr. Thus

v − (c1v1 + · · ·+ crvr) ∈ KerQ = M,

and so

v − (c1v1 + · · ·+ crvr) = cr+1vr+1 + · · ·+ cnvn

for some scalars cr+1, . . . , cn. Therefore, v = c1v1 + · · ·+ cnvn.

Theorem B.4.3. If V is a vector space and M is a subspace of V , then the following conditions are equivalent:

(a) V is finite dimensional,

(b) M and V/M are finite dimensional.

When these conditions hold, we have

dimV = dim(V/M) + dimM. (B.2)

Proof. (a) ⇒ (b): Suppose V is finite dimensional. Then M is finite dimensional by Theorem 3.4.21 and V/M is finite dimensional by applying Theorem 3.3.7 to the quotient map Q : V → V/M .

(b) ⇒ (a): If M = V , the implication is trivial. Furthermore, V/M = V/V consists of the single coset 0 + V = V and so V/V is the zero vector space. Thus (B.2) becomes

dimV = 0 + dimV,


which is obviously true.

If M = {0}, then the quotient map V → V/M is a linear surjection with zero kernel.

Thus it is an isomorphism and so V ∼= V/M . Thus V is finite dimensional because V/M is. Furthermore, (B.2) becomes

dimV = dim(V/{0}) + 0 = dimV + 0,

which is clearly true.Now assume M 6= {0} and M 6= V . Choose a basis x1, . . . , xm of M and u1, . . . , ur of

V/M (we can do this since, by assumption, M and V/M are finite dimensional). Chooseyk ∈ V such that uk = yk +M , 1 ≤ k ≤ r. Then, by Theorem B.4.2, the list

x1, . . . , xm, y1, . . . , yr

generates V . Furthermore, this list is independent by Theorem B.4.1. Therefore, it is a basis for V . It follows that V is finite dimensional and

dimV = m+ r = dimM + dimV/M.

We can now give an alternative proof of the Dimension Theorem.

Alternative proof of Theorem 3.5.1. Since V is finite dimensional, so is its image T (V ) (T is a surjection onto its image, and we can apply Theorem 3.3.7) and the subspace KerT (by Theorem 3.4.21). By the First Isomorphism Theorem, T (V ) ∼= V/KerT . Thus

dimT (V ) = dim(V/KerT ).

Now, by Theorem B.4.3, we have dimT (V ) = dimV − dim(KerT ). The result follows.
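The resulting identity dim V = dim T(V) + dim(Ker T) can be spot-checked numerically for matrix maps, computing the rank and the nullity independently via the singular value decomposition. A sketch (numpy and the tolerance are our own choices):

```python
import numpy as np

# For T given by an m x n matrix A: dim V = n, dim T(V) = rank(A),
# and dim Ker T = number of zero singular values plus (n - min(m, n)).
rng = np.random.default_rng(0)
for _ in range(5):
    m, n = int(rng.integers(2, 6)), int(rng.integers(2, 6))
    A = rng.integers(-3, 4, size=(m, n)).astype(float)

    rank = np.linalg.matrix_rank(A)            # dim Im T
    s = np.linalg.svd(A, compute_uv=False)     # singular values of A
    nullity = n - int(np.sum(s > 1e-10))       # dim Ker T, counted from the SVD

    assert rank + nullity == n                 # the Dimension Theorem
```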


Index

additive identity, 110
additive inverse, 7, 110
additive monoid, 107
adjoint, 97
Aij, 77
algorithm
    Gram-Schmidt, 89
annihilator, 58
antisymmetric, 79
associative, 100
basis, 45
    canonical, 45
    dual, 57
    natural, 45
    standard, 45
Bessel’s inequality, 95
bilinear form, 75
C∞(R), 9
cancellation in vector spaces, 12
canonical basis, 45
canonical inner product, 85
Cauchy-Schwartz Inequality, 86
change of basis matrix, 66
coefficient, 14
cofactor, 77
cofactor expansion, 77
column space, 63
commutative, 100
commutative ring, 111
complement, 19
    orthogonal, 92
complex vector space, 8
composite map, 29
composition, 29
congruence modulo n, 115
congruent, 116
coordinate function, 57, 63
coordinates, 45, 63
coset, 117
dependence relation, 38
dependent
    linearly, 38, 40
determinant, 77
dimension, 47
Dimension Theorem, 51
direct sum, 18
distributivity, 7, 110
dual basis, 57
dual space, 23, 56
ei, 15
elementary column operations, 70
elementary matrix, 70
elementary row operations, 70
equality of functions, 8, 108
equivalence class, 116
equivalence relation, 115
equivalent, 115
even function, 20
FX, 8
F×, 6
F2, 6
Fp, 5
F(F), 8
F(X,F), 8
field, 5, 6, 110
    finite, 5
finite dimensional, 47
finite field, 5
finitely generated vector space, 42
Fn, 8
form
    linear, 75
Fourier coefficient, 92
Gaussian elimination, 70
generate, 37
generating set, 37
Gram-Schmidt algorithm, 89
identity element, 103
identity map, 27
Im, 25
image, 24, 25
independent
    linearly, 38, 40
indeterminate, 10
inequality
    Cauchy-Schwartz, 86
    triangle, 87
infinite dimensional, 47
infinite sequence, 10
inner product space, 85
intersection, 17
inverse, 33, 104
inverse image, 24
invertible element, 104
isomorphism, 31
Ker, 25
kernel, 25
Kronecker delta, 57
L(V), 27
L(V,W), 27
linear combination, 14
linear form, 23, 27, 75
linear map, 21
linear subspace, 16
linear transformation, 21
linearly dependent, 38
    sets, 40
linearly independent, 38
    sets, 40
Mn(R), 112
Mm,n(F), 8
magma, 100
map
    composite, 29
    linear, 21
matrix, 62
    change of basis, 66
    of a linear map, 62
    orthogonal, 92
Mi(a), 70
modular law, 20
monoid, 106
    additive, 107
multilinear map, 75
multiplicative identity, 110
multiplicative inverse, 110
N, 4, 100
natural basis, 45
nilpotent, 82
norm, 86
null space, 25, 74
nullity, 52
odd function, 20
operation, 100
orthogonal, 88
    matrix, 92
    transformation, 98
orthogonal complement, 92
orthogonal matrix, 82
orthogonal projection, 89, 94
orthonormal, 88
parallelogram law, 86
parentheses, 101
partition, 118
Pi,j, 70
pointwise multiplication, 8
polarization, 86
polynomial, 10, 14, 34
polynomial function, 10, 34
power of a map, 29
product vector space, 10
projection, 89, 94
Q(√2), 6
quotient map, 120
quotient set, 119
quotient vector space, 121
R≥0, 6
range, 25
rank, 52
    of a matrix, 71
Rank-Nullity Theorem, 52
real vector space, 8
reflexivity, 115
relation
    dependence, 38
Riesz Representation Theorem, 93
ring, 111
row space, 73
row-echelon form, 70
scalar, 8
scalar linear map, 27
scalar multiplication, 7
scalar product, 7
sequence, 10
set with operation, 100
similar matrices, 67
skew-symmetric, 79
span, 14, 36
standard basis, 27, 45
standard matrix, 27
subfield, 5
subset
    sum, 17
subspace, 16
    trivial, 16
subtraction
    in a field, 113
    of vectors, 13
sum of subsets, 17
superposition principle, 14
symmetry, 115
[T]DB, 62
trace, 65, 76
transitivity, 115
translation, 13
transpose
    of a linear map, 58
    of a matrix, 65
triangle inequality, 87
trivial subspace, 16
unit vector, 88
Vandermonde determinant, 79
vector, 7
vector addition, 7
vector space, 7
    complex, 8
    finitely generated, 42
    real, 8
wave function, 14
zero linear map, 27
zero vector, 7


Bibliography

[Ber14] Sterling K. Berberian. Linear algebra. Dover Publications, Inc., Mineola, NY, 2014. Reprint of the 1992 original, with a new errata and comments.

[Sav] Alistair Savage. Mathematical reasoning & proofs. Notes for MAT 1362. Available at http://alistairsavage.ca/mat1362/notes/MAT1362-Mathematical_reasoning_and_proofs.pdf.

[Tre] Sergei Treil. Linear algebra done wrong. Available at http://www.math.brown.edu/~treil/papers/LADW/LADW.html.
