
Linear Algebra

Math 308

S. Paul Smith

Department of Mathematics, Box 354350, University of Washington, Seattle, WA 98195

E-mail address: [email protected]


Contents

Chapter 0. Introduction
1. What it's all about
2. Some practical applications
3. The importance of asking questions

Chapter 1. Linear Geometry
1. Linearity
2. Lines in R2
3. Points, lines, and planes in R3
4. Higher dimensions
5. Parametric descriptions of lines, planes, etc.

Chapter 2. Matrices
1. What is a matrix?
2. A warning about our use of the word vector
3. Row and column vectors = points with coordinates
4. Matrix arithmetic: addition and subtraction
5. The zero matrix and the negative of a matrix
6. Matrix arithmetic: multiplication
7. Pitfalls and warnings
8. Transpose
9. Some special matrices
10. Solving an equation involving an upper triangular matrix
11. Some special products

Chapter 3. Matrices and motions in R2 and R3
1. Linear transformations
2. Rotations in the plane
3. Projections in the plane
4. Contraction and dilation
5. Reflections in the plane
6. Reflections in R3
7. Projections from R3 to a plane

Chapter 4. Systems of Linear Equations
1. Systems of equations
2. A single linear equation
3. Systems of linear equations
4. A system of linear equations is a single matrix equation
5. Specific examples
6. The number of solutions
7. A geometric view on the number of solutions
8. Homogeneous systems

Chapter 5. Row operations and row equivalence
1. Equivalent systems of equations
2. Echelon Form
3. An example
4. You already did this in high school
5. The rank of a matrix
6. Inconsistent systems
7. Consistent systems
8. Parametric equations for lines and planes
9. The importance of rank
10. The word "solution"
11. Elementary matrices

Chapter 6. The Vector space Rn
1. Arithmetic in Rn
2. The standard basis for Rn
3. Linear combinations and linear span
4. Some infinite dimensional vector spaces

Chapter 7. Subspaces
1. The definition and examples
2. The row and column spaces of a matrix
3. Lines, planes, and translations of subspaces
4. Linguistic difficulties: algebra vs. geometry

Chapter 8. Linear dependence and independence
1. The definition
2. Criteria for linear (in)dependence
3. Linear (in)dependence and systems of equations

Chapter 9. Non-singular matrices, invertible matrices, and inverses
1. Singular and non-singular matrices
2. Inverses
3. Elementary matrices are invertible
4. Making use of inverses
5. The inverse of a 2 × 2 matrix
6. If A is non-singular how do we find A⁻¹?

Chapter 10. Bases, coordinates, and dimension
1. Definitions
2. A subspace of dimension d is just like Rd
3. All bases for W have the same number of elements
4. Every subspace has a basis
5. Properties of bases and spanning sets
6. How to find a basis for a subspace
7. How to find a basis for the range of a matrix
8. Rank + Nullity
9. How to compute the null space and range of a matrix

Chapter 11. Linear transformations
1. A reminder on functions
2. First observations
3. Linear transformations and matrices
4. How to find the matrix representing a linear transformation
5. Invertible matrices and invertible linear transformations
6. How to find the formula for a linear transformation
7. Rotations in the plane
8. Reflections in R2
9. Invariant subspaces
10. The one-to-one and onto properties
11. Two-to-two
12. Gazing into the distance: differential operators as linear transformations

Chapter 12. Determinants
1. The definition
2. Elementary row operations and determinants
3. The determinant and invertibility
4. Properties
5. Elementary column operations and determinants

Chapter 13. Eigenvalues
1. Definitions and first steps
2. Reflections in R2, revisited
3. The 2 × 2 case
4. The equation A² + I
5. The characteristic polynomial
6. How to find eigenvalues and eigenvectors
7. The utility of eigenvectors

Chapter 14. Complex vector spaces and complex eigenvalues
1. The complex numbers
2. The complex numbers
3. Linear algebra over C
4. The complex norm
5. An extended exercise

Chapter 15. Orthogonality
1. Geometry
2. The dot product
3. Orthogonal vectors
4. Classical orthogonal polynomials
5. Orthogonal and orthonormal bases
6. The Gram-Schmidt process
7. Orthogonal matrices
8. Orthogonal projections
9. Least-squares "solutions" to inconsistent systems
10. Approximating data by polynomial curves
11. Gazing into the distance: Fourier series
12. Infinite sequences

Chapter 16. Similar matrices
1. Definition
2. Definition and first properties
3. Example
4.
5. Diagonalizable matrices

Chapter 17. Words and terminology
1. The need for clarity and precision
2. Some, all, each, every, and only
3. Singular and plural
4. The word "it"
5. "A" and "The"
6. If and only if

Chapter 18. Applications of linear algebra and vector spaces
1. Simple electrical circuits
2. Magic squares

Chapter 19. Last rites
1. A summary of notation
2. A sermon


CHAPTER 0

Introduction

1. What it’s all about

The practical problem, solving systems of linear equations, that motivates the subject of Linear Algebra, is introduced in chapter 4. Although the problem is concrete and easily understood, the methods and theoretical framework required for a deeper understanding of it are abstract. Linear algebra can be approached in a completely abstract fashion, untethered from the problems that give rise to the subject.

Such an approach is daunting for the average student so we will strike a balance between the practical, specific, abstract, and general.

Linear algebra plays a central role in almost all parts of modern technology. Systems of linear equations involving hundreds, thousands, even billions of unknowns are solved every second of every day in all corners of the globe. One of the more fantastic uses is the way in which Google prioritizes pages on the web. All web pages are assigned a page rank that measures its importance. The page ranks are the unknowns in an enormous system of linear equations. To find the page rank one must solve the system of linear equations. To handle such large systems of linear equations one uses sophisticated techniques that are developed first as abstract results about linear algebra.

Systems of linear equations are rephrased in terms of matrix equations, i.e., equations involving matrices. The translation is straightforward but after mastering the basics of "matrix arithmetic" one must interpret those basics in geometric terms. That leads to linear geometry and the language of vectors and vector spaces.

Chapter 1 provides a brief account of linear geometry. As the name suggests, linear geometry concerns lines. The material about lines in the plane is covered in high school. Unless you know that material backwards and forwards linear algebra will be impossible for you. Linear geometry also involves higher dimensional analogues of lines, for example, lines and planes in 3-space, or R3 as we will denote it. I am assuming you met that material in a multivariable calculus course. Unless you know that material backwards and forwards linear algebra will be impossible for you.¹

¹I expect you to know the material about linear geometry in R2 and R3. By that I don't mean that you have simply passed a course where that material is covered. I expect you to have understood and mastered that material and that you retain that mastery today. If you are not in command of the material in chapter 1 master it as soon as possible. That is essential if you want to pass this course.


After reviewing linear geometry we will review basic facts about matrices in chapter 2. That will be familiar to some students. Very little from chapter 2 is required to understand the initial material on systems of linear equations in chapter 4.

2. Some practical applications

(1) solving systems of linear equations
(2) birth-death processes, Leslie matrices (see Ch. 6 in Difference equations. From rabbits to chaos), Lotka-Volterra
(3) Google page rank
(4) X-ray tomography, MRI, density of tissue: a slice through the body is pixellated and each pixel has a different density to be inferred/computed from the input/output of sending various rays through the body. Generally there are many more measurements than unknowns so one almost always obtains an inconsistent system but wants the least-squares solution.

3. The importance of asking questions

You will quickly find that this is a difficult course. Everyone finds it difficult. The best advice I can give you is to ask questions. When you read something you don't understand don't just skim over it thinking you will return to it later and then fail to return to it. Ask, ask, ask. The main reason people fail this course is because they are afraid to ask questions. They think asking a question will make them look stupid. Smart people aren't afraid to ask questions. No one has all the answers.

The second most common reason people fail this course is that they wait till the week before the exam to start thinking seriously about the material. If you are smart you will try to understand every topic in this course at the time it first appears.



CHAPTER 1

Linear Geometry

1. Linearity

The word linear comes from the word "line". The geometric aspect of linear algebra involves lines, planes, and their higher dimensional analogues: e.g., lines in the plane, lines in 3-space, lines in 4-space, planes in 3-space, planes in 4-space, 3-planes in 4-space, 5-planes in 8-space, and so on, ad infinitum. Such things form the subject matter of linear geometry.

Curvy things play no role in linear algebra or linear geometry. We ignore circles, spheres, ellipses, parabolas, etc. All is linear.

1.1. What is a line? You already "know" what a line is. More accurately, you know something about lines in the plane, R2, or in 3-space, R3. In this course, you need to know something about lines in n-space, Rn.

1.2. What is Rn? Rn is our notation for the set of all n-tuples of real numbers. We run out of words after pair, triple, quadruple, quintuple, sextuple, ... so invent the word n-tuple to refer to an ordered sequence of n numbers where n can be any positive integer.

For example, (8,7,6,5,4,3,2,1) is an 8-tuple. It is not the same as the 8-tuple (1,2,3,4,5,6,7,8). We think of these 8-tuples as labels for two different points in R8. We call the individual entries in (8,7,6,5,4,3,2,1) the coordinates of the point (8,7,6,5,4,3,2,1); thus 8 is the first coordinate, 7 the second coordinate, etc.

Often we need to speak about a point in Rn when we don't know its coordinates. In that case we will say something like this: let (a1, a2, . . . , an) be a point in Rn. Here a1, a2, . . . , an are some arbitrary real numbers.

1.3. The origin. The origin is a special point in Rn: it is the point having all its coordinates equal to 0. For example, (0,0) is the origin in R2; (0,0,0) is the origin in R3; (0,0,0,0) is the origin in R4; and so on. We often write 0 for the origin.

There is a special notation for the set of all points in Rn except the origin, namely Rn − {0}. We use the minus symbol because Rn − {0} is obtained by taking 0 away from Rn. Here "taking away" is synonymous with "removing". This notation permits a useful brevity of expression: "suppose p ∈ Rn − {0}" means the same thing as "suppose p is a point in Rn that is not 0".


1.4. Adding points in Rn. We can add points in Rn. For example, in R4, (1, 2, 3, 4) + (6, 4, 2, 0) = (7, 6, 5, 4). The origin is the unique point with the property that 0 + p = p for all points p in Rn. For that reason we also call 0 zero.

We can also subtract: for example, (6, 4, 2, 0) − (1, 2, 3, 4) = (5, 2, −1, −4). We can not add a point in R4 to a point in R5. More generally, if m ≠ n, we can't add a point in Rm to a point in Rn.

1.5. Multiplying a point in Rn by a number. First, some examples. Consider the point p = (1, 1, 2, 3) in R4. Then 2p = (2, 2, 4, 6), 5p = (5, 5, 10, 15), −3p = (−3, −3, −6, −9); more generally, if t is any real number tp = (t, t, 2t, 3t). In full generality, if λ is a real number and p = (a1, . . . , an) is a point in Rn, λp denotes the point (λa1, . . . , λan); thus, the coordinates of λp are obtained by multiplying each coordinate of p by λ. We call λp a multiple of p.
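If you want to experiment, both operations take only a few lines of code. The sketch below is an illustration, not part of the notes; the names add and scale are made up.

    # Points of Rn modelled as Python tuples (a sketch, not from the notes).
    def add(p, q):
        """Coordinatewise sum of two points of Rn; dimensions must match."""
        assert len(p) == len(q), "cannot add points of different dimensions"
        return tuple(a + b for a, b in zip(p, q))

    def scale(t, p):
        """The multiple tp of a point p in Rn."""
        return tuple(t * a for a in p)

    print(add((1, 2, 3, 4), (6, 4, 2, 0)))  # (7, 6, 5, 4), as in 1.4
    print(scale(-3, (1, 1, 2, 3)))          # (-3, -3, -6, -9), as in 1.5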

1.6. Lines. We will now define what we mean by the word line. If p is a point in Rn that is not the origin, the line through the origin in the direction p is the set of all multiples of p. Formally, if we write Rp to denote that line,¹

Rp = {λp | λ ∈ R}.

If q is another point in Rn, the line through q in the direction p is

{q + λp | λ ∈ R};

we often denote this line by q + Rp.

When we speak of a line in Rn we mean a subset of Rn of the form q + Rp. Thus, if I say L is a line in Rn I mean there is a point p ∈ Rn − {0} and a point q ∈ Rn such that L = q + Rp. The p and q are not uniquely determined by L.

Proposition 1.1. The lines L = q + Rp and L′ = q′ + Rp′ are the same if and only if p and p′ are multiples of each other and q − q′ lies on the line Rp; i.e., if and only if Rp = Rp′ and q − q′ ∈ Rp.

Proof.² (⇒) Suppose the lines are the same, i.e., L = L′. Since q′ is on the line L′ it is equal to q + αp for some α ∈ R. Since q is on L, q = q′ + βp′ for some β ∈ R. Hence q − q′ = −αp = βp′. In particular, q − q′ ∈ Rp. There are now two cases.

Suppose q ≠ q′. Then α and β are non-zero so p and p′ are multiples of each other whence Rp′ = Rp.

Suppose q = q′. Since q + p ∈ q + Rp = q′ + Rp′ = q + Rp′, q + p = q + λp′ for some λ ∈ R. Thus p = λp′. Likewise, q + p′ ∈ q + Rp′ = q + Rp so q + p′ = q + µp for some µ ∈ R. Thus p′ = µp. We have shown that p and p′ are multiples of each other so Rp = Rp′.

(⇐) Suppose q − q′ ∈ Rp and Rp = Rp′. Then q − q′ = αp for some α ∈ R and p = βp′ for some non-zero β ∈ R.

If λ ∈ R, then

q + λp = q′ + αp + λp = q′ + (α + λ)βp′

which is in q′ + Rp′; thus q + Rp ⊆ q′ + Rp′.

Similarly, if µ ∈ R, then

q′ + µp′ = q − αp + µβ⁻¹p = q + (µβ⁻¹ − α)p

which is in q + Rp; thus q′ + Rp′ ⊆ q + Rp.

This completes the proof that q′ + Rp′ = q + Rp if q − q′ ∈ Rp and Rp = Rp′. □

¹I like the notation Rp for the set of all multiples of p because it is similar to the notation 2p: when x and y are numbers we denote their product by xy, the juxtaposition of x and y. So Rp consists of all products λp where λ ranges over the set of all real numbers.

²To show two sets X and X′ are the same one must show that every element of X is in X′, which is written as X ⊆ X′, and that every element of X′ is in X.

To prove an "if and only if" statement one must prove each statement implies the other; i.e., the statement "A if and only if B" is true if the truth of A implies the truth of B and the truth of B implies the truth of A. Most of the time, when I prove a result of the form "A if and only if B" I will break the proof into two parts: I will begin the proof that A implies B by writing the symbol (⇒) and will begin the proof that B implies A by writing the symbol (⇐).

1.7. Fear and loathing. I know most of you are pretty worried about this mysterious thing called Rn. What does it look like? What properties does it have? How can I work with it if I can't picture it? What the heck does he mean when he talks about points and lines in Rn?

Although we can't picture Rn we can ask questions about it. This is an important point. Prior to 1800 or so, mathematicians and scientists only asked questions about things they could "see" or "touch". For example, R4 wasn't really considered in a serious way until general relativity tied together space and time. Going further back, there was a time when negative numbers were considered absurd. Indeed, there still exist primitive cultures that have no concept of negative numbers. If you tried to introduce them to negative numbers they would think you were some kind of nut. Imaginary numbers were called that because they weren't really numbers. Even when they held that status there were a few brave souls who calculated with them by following the rules of arithmetic that applied to real numbers.

Because we have a definition of Rn we can grapple with it, explore it, and ask and answer questions about it. That is what we have been doing in the last few sections. In olden days, mathematicians rarely defined things. Because the things they studied were "real" it was "obvious" what the words meant. The great counterexample to this statement is Euclid's books on geometry. Euclid took care to define everything carefully and built geometry on that rock. As mathematics became increasingly sophisticated the need for precise definitions became more apparent. We now live in a sophisticated mathematical world where everything is defined.


So, take a deep breath, gather your courage, and plunge into the bracing waters. I will be there to help you if you start sinking. But I can't be everywhere at once, and I won't always recognize whether you are waving or drowning. Shout "help" if you are sinking. I'm not telepathic.

1.8. Basic properties of Rn. Here are some things we will prove:

(1) If p and q are different points in Rn there is one, and only one, line through p and q. We denote it by pq.
(2) If L is a line through the origin in Rn, and q and q′ are points in Rn, then either q + L = q′ + L or (q + L) ∩ (q′ + L) = ∅.
(3) If L is a line through the origin in Rn, and q and q′ are points in Rn, then q + L = q′ + L if and only if q − q′ ∈ L.
(4) If p and q are different points in Rn, then pq is the line through p in the direction q − p.
(5) If L and L′ are lines in Rn such that L ⊆ L′, then L = L′. This is a consequence of (2).

If L is a line through the origin in Rn, and q and q′ are points in Rn, we say that the lines q + L and q′ + L are parallel.

Don't just accept these as facts to be memorized. Learning is more than knowing—learning involves understanding. To understand why the above facts are true you will need to look at how they are proved.

Proposition 1.2. If p and q are different points in Rn there is one, and only one, line through p and q, namely the line through p in the direction q − p.

Proof. The line through p in the direction q − p is p + R(q − p). The line p + R(q − p) contains p because p = p + 0 × (q − p). It also contains q because q = p + 1 × (q − p). We have shown that the line through p in the direction q − p passes through p and q.

It remains to show that this is the only line that passes through both p and q. To that end, let L be any line in Rn that passes through p and q. We will use Proposition 1.1 to show that L is equal to p + R(q − p).

By our definition of the word line, L = q′ + Rp′ for some points q′ ∈ Rn and p′ ∈ Rn − {0}. By Proposition 1.1, q′ + Rp′ is equal to p + R(q − p) if Rp′ = R(q − p) and q′ − p ∈ Rp′.

We will now show that Rp′ = R(q − p) and q′ − p ∈ Rp′. Since p ∈ L and q ∈ L, there are numbers λ and µ such that p = q′ + λp′, which implies that q′ − p ∈ Rp′, and q = q′ + µp′. Thus p − q = q′ + λp′ − q′ − µp′ = (λ − µ)p′. Because p ≠ q, p − q ≠ 0; i.e., (λ − µ)p′ ≠ 0. Hence λ − µ ≠ 0. Therefore p′ = (λ − µ)⁻¹(p − q). It follows that every multiple of p′ is a multiple of p − q and every multiple of p − q is a multiple of p′. Thus Rp′ = R(q − p). We already observed that q′ − p ∈ Rp′ so Proposition 1.1 tells us that q′ + Rp′ = p + R(q − p), i.e., L = p + R(q − p). □


1.8.1. Notation. We will write pq for the unique line in Rn that passes through the points p and q (when p ≠ q). Proposition 1.2 tells us that

pq = p + R(q − p).

Proposition 1.1 tells us that pq has infinitely many other similar descriptions. For example, pq = q + R(p − q) and pq = q + R(q − p).

1.8.2. Although we can't picture R8 we can ask questions about it. For example, does the point (1,1,1,1,1,1,1,1) lie on the line through the points (8,7,6,5,4,3,2,1) and (1,2,3,4,5,6,7,8)? Why? If you can answer this you understand what is going on. If you can't you don't, and should ask a question.

1.9. Parametric description of lines. You already know that lines in R3 can be described by a pair of equations or parametrically. For example, the line given by the equations

(1-1)    x + y + z = 4
         x + 3y + 2z = 9

is the set of points of the form (t, 1 + t, 3 − 2t) as t ranges over all real numbers; in set-theoretic notation, the line is

(1-2)    {(t, 1 + t, 3 − 2t) | t ∈ R};

we call t a parameter; we might say that t parametrizes the line or that the line is parametrized by t.

The next result shows that every line in Rn can be described parametrically.

Proposition 1.3. If p and q are distinct points in Rn, then

(1-3)    pq = {tp + (1 − t)q | t ∈ R}.

Proof. We must show that the sets pq and {tp + (1 − t)q | t ∈ R} are the same. We do this by showing each set contains the other one.

Let p′ be a point on pq. Since pq = p + R(q − p), p′ = p + λ(q − p) for some λ ∈ R. Thus, if t = 1 − λ, then

p′ = (1 − λ)p + λq = tp + (1 − t)q.

Therefore pq ⊆ {tp + (1 − t)q | t ∈ R}. If t is any number, then

tp + (1 − t)q = q + t(p − q) ∈ q + R(p − q) = pq

so {tp + (1 − t)q | t ∈ R} ⊆ pq. Thus, pq = {tp + (1 − t)q | t ∈ R}. □

If you think of t as denoting time, then the parametrization in (1-3) can be thought of as giving the position of a moving point at time t; for example, at t = 0 the moving point is at q and at time t = 1, the moving point is at p; at time t = 1/2 the moving point is exactly half-way between p and q. This perspective should help you answer the question in §1.8.2.
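A point r lies on pq exactly when a single value of t satisfies tp + (1 − t)q = r in every coordinate, so the question in §1.8.2 can be checked mechanically. The sketch below is an illustration of that idea (the name on_line is made up, and exact rational arithmetic is used to avoid rounding):

    from fractions import Fraction

    def on_line(p, q, r):
        """Is r on the line pq = {tp + (1-t)q : t in R}?  Assumes p != q."""
        # Solve t*(p_i - q_i) = r_i - q_i in one coordinate where p_i != q_i...
        t = next(Fraction(ri - qi, pi - qi)
                 for pi, qi, ri in zip(p, q, r) if pi != qi)
        # ...then check that the same t works in every coordinate.
        return all(t * (pi - qi) == ri - qi for pi, qi, ri in zip(p, q, r))

    p = (8, 7, 6, 5, 4, 3, 2, 1)
    q = (1, 2, 3, 4, 5, 6, 7, 8)
    print(on_line(p, q, (1, 1, 1, 1, 1, 1, 1, 1)))  # the question in 1.8.2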


1.9.1. Don't forget this remark. One of the key ideas in understanding systems of linear equations is moving back and forth between the parametric description of linear subsets of Rn and the description of those linear subsets as solution sets to systems of linear equations. For example, (1-2) is a parametric description of the set of solutions to the system (1-1) of two linear equations in the unknowns x, y, and z. In other words, every solution to (1-1) is obtained by choosing a number t and then taking x = t, y = 1 + t, and z = 3 − 2t.
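That back-and-forth can be checked by machine: every point of the parametric description (1-2) satisfies both equations of (1-1). A sketch using sympy (a tool choice of this illustration, not of the notes):

    from sympy import symbols, expand

    t = symbols('t')
    x, y, z = t, 1 + t, 3 - 2*t   # the parametric description (1-2)
    print(expand(x + y + z))      # 4, so x + y + z = 4 for every t
    print(expand(x + 3*y + 2*z))  # 9, so x + 3y + 2z = 9 for every t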

1.9.2. Each line L in Rn has infinitely many parametric descriptions.

1.9.3. Parametric descriptions of higher dimensional linear subsets of Rn.

1.10. Linear subsets of Rn. Let L be a subset of Rn. We say that L is linear if

(1) it is a point, or
(2) whenever p and q are different points on L every point on the line pq lies on L.

1.11. What is a plane? Actually, it would be better if you thought about this question. How would you define a plane in Rn? Look at the definition of a line for inspiration. We defined a line parametrically, not by a collection of equations. Did we really need to define a line in two steps, i.e., first defining a line through the origin, then defining a general line?

1.12. Hyperplanes. The dot product of two points u = (u1, . . . , un) and v = (v1, . . . , vn) in Rn is the number

u · v = u1v1 + · · · + unvn.

Notice that u · v = v · u. If u and v are non-zero and u · v = 0 we say u and v are orthogonal. If n is 2 or 3, the condition that u · v = 0 is equivalent to the condition that the line 0u is perpendicular to the line 0v.

Let u be a non-zero point in Rn and c any number. The set

H := {v ∈ Rn | u · v = c}

is called a hyperplane in Rn.
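In code the dot product is a one-liner, and testing whether a point lies on a hyperplane is a single dot product. A sketch (the function name dot and the examples are illustrative):

    def dot(u, v):
        """The dot product u . v of two points of Rn."""
        return sum(ui * vi for ui, vi in zip(u, v))

    u, c = (1, 1, 1), 4              # H = {v : u . v = 4} is the plane x + y + z = 4
    print(dot(u, (1, 2, 1)) == c)    # True: (1, 2, 1) lies on H
    print(dot((1, 0), (0, 1)) == 0)  # True: (1, 0) and (0, 1) are orthogonal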

Proposition 1.4. If p and q are different points on a hyperplane H, then all the points on the line pq lie on H.

Proof. Since H is a hyperplane there is a point u ∈ Rn − {0} and a number c such that H = {v ∈ Rn | u · v = c}. By Proposition 1.3, every point on pq is of the form tp + (1 − t)q for some t ∈ R. Since p and q are on H, u · p = c and u · q = c. Hence

u · (tp + (1 − t)q) = t(u · p) + (1 − t)(u · q) = tc + (1 − t)c = c,

so tp + (1 − t)q is on H. □


1.13. Solutions to systems of equations: an example. A system of equations is just a collection of equations. For example, taken together,

(1-4)    x² + y² = z² and x + y + z = 1

form a system of two equations. We call x, y, and z, unknowns. A solution to the system (1-4) consists of three numbers a, b, c such that a² + b² = c² and a + b + c = 1. Such a solution corresponds to the point (a, b, c) in 3-space, R3. For example, (0, 1/2, 1/2) is a solution to this system of equations because

(0)² + (1/2)² = (1/2)² and 0 + 1/2 + 1/2 = 1.

Similarly, (1/4, 1/3, 5/12) is a solution to the system of equations because

(1/4)² + (1/3)² = (5/12)² and 1/4 + 1/3 + 5/12 = 1.

Thus, we think of solutions to the system (1-4) as points in R3. These points form a geometric object. We use our visual sense to perceive them, and our visual sense organizes those individual points into a single organized geometric object.

If you are really interested in math you might try finding all solutions to the system (1-4). If not all, several. How did I find the second solution? Can you find another solution?
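Exact arithmetic with fractions makes checking candidate solutions painless; here is a sketch (the helper name is_solution is made up):

    from fractions import Fraction as F

    def is_solution(a, b, c):
        """Does (a, b, c) solve the system (1-4)?"""
        return a**2 + b**2 == c**2 and a + b + c == 1

    print(is_solution(F(0), F(1, 2), F(1, 2)))      # True
    print(is_solution(F(1, 4), F(1, 3), F(5, 12)))  # True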

1.14. Solutions to systems of equations. A solution to a system of equations in n unknowns x1, . . . , xn is a set of numbers s1, . . . , sn such that every equation is a true equality when each xi is replaced by the corresponding number si. This is the famous "plugging in" procedure. We will think of s1, . . . , sn as the coordinates of a point in n-space, Rn. Thus, each solution is a point p = (s1, . . . , sn) in Rn.

1.15. Systems of linear equations. The distinguishing feature of a system of linear equations is that

if p and q are different solutions to a system of linear equations, then every point on the line through p and q is a solution to the system.

It is this linearity feature that gives rise to the terminology linear algebra. It is convenient to write pq for the line through p and q.

Suppose p, q, and r, are solutions to a system of linear equations and r is not on the line through p and q. Then every point on the three lines pq, pr, and qr, is a solution to the system. Now take any point p′ on pq and any point q′ on pr; since p′ and q′ are solutions to the linear system so is every point on the line p′q′. Repeating this operation we can draw more and more lines. You should be able to convince yourself that all the lines we obtain in this way eventually fill up a plane, the plane in Rn that contains the points p, q, and r. We denote that plane by pqr.


1.16. Grumbling and preaching. It has become popular over the last two or three decades to speak of "linear thinking" and "non-linear thinking". This is a fine example of the way in which scientific terms are hijacked by people who have no idea what they mean. To those people, "linear thinking" is not as "good" as "non-linear thinking". "Linear thinking" is considered rigid, not flexible, something "creative/imaginative" people don't engage in. This is utter nonsense promulgated by ignorant, pompous, people. (Yes, I'm pompous too but perhaps not so ignorant.)

The recognition of linearity in mathematics and the subsequent development of linear algebra has produced methods of tremendous power and extraordinarily wide applicability. Indeed, the modern world could not function as it does if it did not make use of linear algebra and linear geometry millions of times every second. I mean that literally. No exaggeration. Linear algebra is used millions of times every second, in every corner of the globe, to ensure that (some of) our society functions smoothly.

The methods that have been developed to address problems in linear algebra are so effective that it is often beneficial to approximate a non-linear problem (and here I use the word non-linear in a precise scientific sense) by a linear problem.

2. Lines in R2

2.1. An ordered pair of real numbers consists of two real numbers where their order matters. For example, (0, 0), (0, 1), (1, 1), (−2, 7), and (−3, −5), are ordered pairs of real numbers, and they differ from the ordered pairs (1, 0), (7, −2), and (−5, −3).

R2 denotes the set of ordered pairs of real numbers. For example, (0, 0), (0, 1), (1, 1), (−2, 7), and (−3, −5), are points in R2. We use the word ordered because (1, 2) and (2, 1) are different points of R2. A formal definition is

R2 := {(a, b) | a, b ∈ R}.

At high school you learned about x- and y-axes and labelled points in the plane by ordered pairs of real numbers. The point labelled (3, 7) is the point obtained by starting at the origin, going 3 units of distance in the x-direction, then 7 units of distance in the y-direction. Thus, we have a geometric view of the algebraically defined object R2.

Linear algebra involves a constant interplay of algebra and geometry. To master linear algebra one must keep in mind both the algebraic and geometric features. A simple example of this interplay is illustrated by the fact that the unique solution to the pair of equations

2x + 3y = 4
2x − 3y = −8

is the point where the lines 2x + 3y = 4 and 2x − 3y = −8 intersect. This pair of equations is a system of two linear equations.
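Finding the intersection point is a one-line job for any linear solver. A sketch with numpy (the notes solve such systems by hand):

    import numpy as np

    A = np.array([[2.0, 3.0], [2.0, -3.0]])  # coefficients of x and y
    b = np.array([4.0, -8.0])                # right-hand sides
    print(np.linalg.solve(A, b))             # [-1.  2.]: the lines meet at (-1, 2)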


2.2. Notation. If p and q are two different points in the plane R2 or in 3-space, R3, we write pq for the line through them.

2.3. Set notation. We will use standard set notation throughout this course. You will find it helpful to read something about set notation. Use Google.

2.4. Prerequisites for every linear algebra course. You must be able to do the following things:

• Find the slope of the line through two given points in R2.
• Find the equation of the line through two points in R2.
• Give a parametric form for a line that is given in the form ax + by = c. There are infinitely many different parametric forms for a given line. For example, the line x = y is the line consisting of all points (t, t) as t ranges over R, the set of real numbers. The same line is given by the parametrization (1 + 2t, 1 + 2t), t ∈ R. And so on.
• Give a parametric form for the line through two given points (see the sketch after this list). If p = (a, b) and q = (c, d) are two different points in the plane the set of points t(a, b) + (1 − t)(c, d), or, equivalently, tp + (1 − t)q, gives all points on pq as t ranges over all real numbers. For example, t = 0 gives the point q, t = 1 gives the point p, t = 1/2 gives the point halfway between p and q. One gets all points between p and q by letting t range over the closed interval [0, 1]. More formally, the set of points between p and q is {tp + (1 − t)q | t ∈ [0, 1]}. Similarly, pq = {tp + (1 − t)q | t ∈ R}.
  Notice that t(a, b) + (1 − t)(c, d) = (ta + (1 − t)c, tb + (1 − t)d).
• Decide whether two lines are parallel.
• Decide whether two lines are perpendicular.
• Given a line L and a point not on L, find the line parallel to L that passes through the given point.
• Given a line L and a point not on L, find the line perpendicular to L that passes through the given point.
• Find the point at which two lines intersect if such a point exists.
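Two of the items above coded as functions, purely as a sketch (the names are made up, and slope assumes the line is not vertical):

    def slope(p, q):
        """Slope of the line through p = (a, b) and q = (c, d); assumes a != c."""
        (a, b), (c, d) = p, q
        return (d - b) / (c - a)

    def point_on_pq(p, q, t):
        """The point tp + (1-t)q on the line pq; t = 1/2 gives the midpoint."""
        (a, b), (c, d) = p, q
        return (t * a + (1 - t) * c, t * b + (1 - t) * d)

    print(slope((0, 0), (2, 4)))             # 2.0
    print(point_on_pq((0, 0), (2, 4), 0.5))  # (1.0, 2.0), halfway between them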

Proposition 2.1. Let p = (a, b) and q = (c, d) be two different points in the plane, R2. The set of points

{tp + (1 − t)q | t ∈ R}

is exactly the set of points on the line through p and q.

Proof. The line given by the equation

(2-1)    (c − a)(y − b) = (d − b)(x − a)

passes through p because when we plug in a for x and b for y (2-1) is an equality, namely (c − a)(b − b) = (d − b)(a − a). The line given by equation (2-1) also passes through q because when we plug in c for x and d for y we get a true equality, namely (c − a)(d − b) = (d − b)(c − a). Thus, the line given by equation (2-1) is the line through p and q which we denote by pq.

Let's write L for the set of points {tp + (1 − t)q | t ∈ R}. Less elegantly, but more explicitly,

L = {(ta + (1 − t)c, tb + (1 − t)d) | t ∈ R}.

The proposition claims that L = pq. To prove this we must show that every point in L belongs to pq and every point on pq belongs to L.

We first show that L ⊆ pq, i.e., every point in L belongs to pq. A point in L has coordinates (ta + (1 − t)c, tb + (1 − t)d). When we plug these coordinates into (2-1) we obtain

(c − a)(tb + (1 − t)d − b) = (d − b)(ta + (1 − t)c − a),

i.e., (c − a)(1 − t)(d − b) = (d − b)(1 − t)(c − a). Since this really is an equality we have shown that (ta + (1 − t)c, tb + (1 − t)d) lies on pq. Therefore L ⊆ pq.

To prove the opposite inclusion, take a point (x, y) that lies on pq; we will show that (x, y) is in L. To do that we must show there is a number t such that (x, y) = (ta + (1 − t)c, tb + (1 − t)d). Because (x, y) lies on pq, (c − a)(y − b) = (d − b)(x − a). This implies that

(x − c)/(a − c) = (y − d)/(b − d).

Let's call this number t. If a = c we define t to be the number (y − d)/(b − d); if b = d, we define t to be the number (x − c)/(a − c); we can't have a = c and b = d because then (a, b) = (c, d) which violates the hypothesis that (a, b) and (c, d) are different points.

In any case, we now have x − c = t(a − c) and y − d = t(b − d). We can rewrite these as x = ta + (1 − t)c and y = tb + (1 − t)d. Thus (x, y) = (ta + (1 − t)c, tb + (1 − t)d), i.e., (x, y) belongs to the set L. This completes the proof that pq ⊆ L. □

2.5. Remarks on the previous proof. You might ask "How did you know that (2-1) is the line through (a, b) and (c, d)?"

Draw three points (a, b), (c, d), and (x, y), on a piece of paper. Draw the line segment from (a, b) to (c, d), and the line segment from (c, d) to (x, y). Can you see, or convince yourself, that (x, y) lies on the line through (a, b) and (c, d) if and only if those two line segments have the same slope? I hope so because that is the key idea behind the equation (2-1). Let me explain. The slopes of the two lines are

(d − b)/(c − a) and (y − b)/(x − a).

Thus, (x, y) lies on the line through (a, b) and (c, d) if and only if

(d − b)/(c − a) = (y − b)/(x − a).


It is possible that a might equal c in which case the expression on the left of the previous equation makes no sense. To avoid that I cross multiply and rewrite the required equality as (d − b)(x − a) = (c − a)(y − b).

Of course, there are other reasonable ways to write the equation for the line pq but I like mine because it seems elegant and fits in nicely with the geometry of the situation, i.e., the statements I made about slopes.

2.6. About proofs. A proof is a narrative argument intended to convince other people of the truth of a statement.

By narrative I mean that the argument consists of sentences and paragraphs. The sentences should be grammatically correct and easy to understand. Those sentences will contain words and mathematical symbols. Not only must the words obey the rules of grammar, so too must the symbols. Thus, there are two kinds of grammatical rules: those that govern the English language and those that govern "mathematical phrases".

By saying other people I want to emphasize that you are writing for someone else, the reader. Make it easy for the reader. It is not enough that you understand what your sentences say. Others must understand them too. Indeed, the quality of your proof is measured by the effect it has on other people. If your proof does not convince other people you have done a poor job.

If your proof does not convince others it might be that your "proof" is incorrect. You might have fooled yourself. If your proof does not convince others it might be because your argument is too long and convoluted. In that case, you should re-examine your proof and try to make it simpler. Are parts of your narrative argument unnecessary? Have you said the same thing more than once? Have you used words or phrases that add nothing to your argument? Are your sentences too long? Is it possible to use shorter sentences? Short sentences are easier to understand. Your argument might fail to convince others because it is too short. Is some essential part of the argument missing? Is it clear how each statement follows from the previous ones? Have you used an undefined symbol, word, or phrase?

When I figure out how to prove something I usually rewrite the proof many times trying to make it as simple and clear as possible. Once I have convinced myself that my argument works I start a new job, that of polishing and refining the argument so it will convince others.

Even if you follow the rules of grammar you might write nonsense. A famous example is "Red dreams sleep furiously". For some mathematical nonsense, consider the sentence "The derivative of a triangle is perpendicular to its area".

3. Points, lines, and planes in R3

3.1. R3 denotes the set of ordered triples of real numbers. Formally,

R3 := {(a, b, c) | a, b, c ∈ R}.


The corresponding geometric picture is 3-space. Elements of R3, i.e., ordered triples (a, b, c), label points in 3-space. We call a the x-coordinate of the point p = (a, b, c), b the y-coordinate of p, and c the z-coordinate of p.

3.2. Notation. If p and q are two different points in R3 we write pq for the line through them. If p, q, and r, are three points in R3 that do not lie on a line, there is a unique plane that contains them. That plane will be labelled pqr.

3.3. The first difference from R2 is that a single linear equation, 2x + y − 3z = 4 for example, "gives" a plane in R3. By "gives" I mean the following. Every solution to the equation is an ordered triple of numbers, e.g., (2, 6, 2) is a solution to the equation 2x + y − 3z = 4 and it represents (is the label for) a point in R3. We say that the point (2, 6, 2) is a solution to the equation. The collection (formally, the set) of all solutions to 2x + y − 3z = 4 forms a plane in R3.

Two planes in R3 are said to be parallel if they do not intersect (meet).

3.4. Prerequisites for every linear algebra course. You must be able to do the following things:

• Find the equation of the plane pqr when p, q, and r, are three points in R3 that do not lie on a line (see the sketch after this list). A relatively easy question of this type is to find the equation for the plane that contains (0, 0, 0), (2, 3, 7), and (3, 2, 3). That equation will be of the form ax + by + cz = d for some real numbers a, b, c, d. Since (0, 0, 0) lies on the plane, d = 0. Because (2, 3, 7) and (3, 2, 3) lie on the plane ax + by + cz = 0 the numbers a, b, c must have the property that

  2a + 3b + 7c = 0 and
  3a + 2b + 3c = 0.

  This is a system of 2 linear equations in 3 unknowns. There are many ways you might solve this system but all are essentially the same. The idea is to multiply each equation by some number and add or subtract the two equations to get a simpler equation that a, b, and c, must satisfy.
• Find the equation of the line through two given points.
• Give a parametric form for a line that is given in the form ax + by = c. There are infinitely many different parametric forms for a given line. For example, the line x = y is the line consisting of all points (t, t) as t ranges over R, the set of real numbers. The same line is given by the parametrization (1 + 2t, 1 + 2t), t ∈ R. And so on.
• Give a parametric form for the line through two given points. If p = (a, b) and q = (c, d) are two different points in the plane the set of points t(a, b) + (1 − t)(c, d), or, equivalently, tp + (1 − t)q, gives all points on pq as t ranges over all real numbers. For example, t = 0 gives the point q, t = 1 gives the point p, t = 1/2 gives the point halfway between p and q. One gets all points between p and q by letting t range over the closed interval [0, 1]. More formally, the set of points between p and q is {tp + (1 − t)q | t ∈ [0, 1]}. Similarly, pq = {tp + (1 − t)q | t ∈ R}.
  Notice that t(a, b) + (1 − t)(c, d) = (ta + (1 − t)c, tb + (1 − t)d).
• Decide whether two lines are parallel.
• Decide whether two lines are perpendicular.
• Given a line L and a point not on L, find the line parallel to L that passes through the given point.
• Given a line L and a point not on L, find the line perpendicular to L that passes through the given point.
• Find the point at which two lines intersect if such a point exists.
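For the worked example in the first item, the cross product packages the elimination in one step: it produces a vector (a, b, c) orthogonal to both given points, hence a solution of both equations. This shortcut is an aside, not the notes' method:

    def cross(u, v):
        """The cross product u x v in R3; it is orthogonal to both u and v."""
        return (u[1]*v[2] - u[2]*v[1],
                u[2]*v[0] - u[0]*v[2],
                u[0]*v[1] - u[1]*v[0])

    a, b, c = cross((2, 3, 7), (3, 2, 3))
    print(a, b, c)                           # -5 15 -5: the plane -5x + 15y - 5z = 0
    print(2*a + 3*b + 7*c, 3*a + 2*b + 3*c)  # 0 0: both equations are satisfied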

As t ranges over all real numbers the points (1, 2, 3) + t(1, 1, 1) form a line in R3. The line passes through the point (1, 2, 3) (just take t = 0). We sometimes describe this line by saying "it is the line through (1, 2, 3) in the direction (1, 1, 1)."

4. Higher dimensions

5. Parametric descriptions of lines, planes, etc.

The solutions to the system of equations x + y − 2z = −3 and 2x − y − z = −3 form a line in R3. Let's call it L. Although these two equations give a complete description of L that description does not make it easy to write down any points that lie on L. In a sense it is a useless description of L.

In contrast, it is easy to write down points that lie on the line (1, 2, 3) + t(1, 1, 1), t ∈ R; just choose any real number t and compute. For example, taking t = −1, t = 0, t = 1, and t = 2, gives us the points (0, 1, 2), (1, 2, 3), (2, 3, 4), and (3, 4, 5). In fact, the line in this paragraph is the line in the previous paragraph.

Solving a system of linear equations means, roughly, taking a line (usually a higher dimensional linear space) given as in the first paragraph and figuring out how to describe that line by giving a description of it like that in the second paragraph.

5.1. Exercise. Show that the line (1, 2, 3) + t(1, 1, 1), t ∈ R, is the same as the line given by the equations x + y − 2z = −3 and 2x − y − z = −3. (Hint: to show two lines are the same it suffices to show they have two points in common.)
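Following the hint numerically: if two points of the parametric line satisfy both equations, the two descriptions give the same line. A quick sketch of that check:

    # Plug the points with t = 0 and t = 1 into both equations.
    for t in (0, 1):
        x, y, z = 1 + t, 2 + t, 3 + t    # the point (1, 2, 3) + t(1, 1, 1)
        print(x + y - 2*z, 2*x - y - z)  # -3 -3 each time, as required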


CHAPTER 2

Matrices

You can skip this chapter if you want, start reading at chapter 4, and return to this chapter whenever you need to.

Matrices are an essential part of the language of linear algebra and linear equations. This chapter isolates this part of the language so you can easily refer back to it when you need to.

1. What is a matrix?

1.1. An m × n matrix (we read this as "an m-by-n matrix") is a rectangular array of mn numbers arranged into m rows and n columns. For example,

(1-1)    A := [  1   3   0 ]
              [ −4   5   2 ]

is a 2 × 3 matrix. The prefix m × n is called the size of the matrix.

1.2. We use upper case letters to denote matrices. The numbers in the matrix are called its entries. The entry in row i and column j is called the ijth entry, and if the matrix is denoted by A we often write Aij for its ijth entry. The entries in the matrix A in (1-1) are

A11 = 1,  A12 = 3,  A13 = 0,
A21 = −4, A22 = 5,  A23 = 2.

In this example, we call 5 the 22-entry; read this as two-two entry. Likewise, the 21-entry (two-one entry) is −4.

We sometimes write A = (aij) for the matrix whose ijth entry is aij. For example, we might write

A = [ a11  a12  a13 ]
    [ a21  a22  a23 ].

Notice that aij is the entry in row i and column j.

1.3. Equality of matrices. Two matrices A and B are equal if and only if they have the same size and Aij = Bij for all i and j.


1.4. Square matrices. An n × n matrix is called a square matrix. For example,

(1-2)    A = [  1   2 ]
             [ −1   0 ]

is a 2 × 2 square matrix.

2. A warning about our use of the word vector

In the past you have probably used the word vector to mean something with magnitude and direction. That is not the way the word vector will be used in this course.

Later we will define things called vector spaces. The familiar spaces R2 and R3 are vector spaces. But there are many other vector spaces. For example, the set of all polynomials in one variable is a vector space. The set of all continuous functions f : R → R is a vector space. The set of all continuous functions f : [0, 1] → R is a vector space. The set of all polynomials of degree ≤ 56 is a vector space; there is nothing special about 56. All vector spaces have various common properties so rather than proving such a property for this vector space, then that one, then another one, we develop a general theory of vector spaces proving results about all vector spaces at once.

That is what abstraction is about: by throwing away the inessential data and keeping the essential data we are able to prove results that apply to many different things at once.

When talking about vector spaces we need a name for the elements in them. The word used for an element in a vector space is vector. However, when we look at a particular vector space the word vector might be confusing. For example, the set of all polynomials is a vector space but rather than calling the elements in it vectors we call them polynomials.

One might argue that we should not have hijacked the word vector—I tend to agree with you, but the naming was done long ago and is now part of the language of linear algebra. We are stuck with it. Sometimes I will use the word point to refer to an element in a vector space. For example, it is common to call elements of the plane R2 points. You probably are used to calling elements of R3 points.

Bottom line: in this course, when you hear the word vector think of a point or whatever is appropriate to the situation. For example, the row and column vectors we are about to define are just matrices of a particular kind. It is not helpful to think of row and column vectors as having magnitude and direction.

3. Row and column vectors = points with coordinates

3.1. Matrices having just one row or column are of particular importance and are often called vectors.

A matrix having a single row is called a row matrix or a row vector.


A matrix having a single column is called a column matrix or a column vector. For example, the 1 × 4 matrix (1 2 3 4) is a row vector, and the 4 × 1 matrix

[ 1 ]
[ 2 ]
[ 3 ]
[ 4 ]

is a column vector.

In parts of this course it is better to call row and column vectors points and think of the entries in them as the coordinates of the points. For example, think of (1,2,3,4) and

[ 1 ]
[ 2 ]
[ 3 ]
[ 4 ]

as points having 4 coordinates. Because it has 4 coordinates such a point belongs to 4-space which we will denote by R4.

At high school you met points with 2 coordinates, points in the plane, R2. Either there, or at college, you met points in 3-space, R3, often labelled (x, y, z).

Points with n coordinates, (a1, . . . , an) for example, belong to n-space which we denote by Rn. We call the number n in the notation Rn the dimension of the space. Later we will give an abstract and formal definition of dimension but for now your intuition should suffice.

3.2. Underlining row and column vectors. Matrices are usually denoted by upper case letters but this convention is usually violated for row and column vectors. We usually denote a row or column vector by a lower case letter that is underlined. The underlining is done to avoid confusion between lower case letters denoting numbers and lower case letters denoting row or column vectors. For example, I will write

u = (1 2 3 4)   and   v = [ 1 ]
                          [ 2 ]
                          [ 3 ]
                          [ 4 ].

Many students ask me whether they can write →v (an arrow over the letter) to denote a vector. My answer is "NO". It is best to avoid that notation because that arrow suggests direction and as I said in §2 we are not using the word vector to mean things with magnitude and direction. Also it takes more work to write →v than v.



4. Matrix arithmetic: addition and subtraction

4.1. We add two matrices of the same size by adding the entries of one to those of the other, e.g.,

[ 1  2  3 ]   [ −1   2   0 ]   [ 0  4  3 ]
[ 4  5  6 ] + [  3  −2   1 ] = [ 7  3  7 ].

The result of adding two matrices is called their sum. Subtraction is defined in a similar way, e.g.,

[ 1  2  3 ]   [ −1   2   0 ]   [ 2  0  3 ]
[ 4  5  6 ] − [  3  −2   1 ] = [ 1  7  5 ].

Abstractly, if (aij) and (bij) are matrices of the same size, then

(aij) + (bij) = (aij + bij) and (aij) − (bij) = (aij − bij).

4.2. Matrices of different sizes can not be added. The sum of two matrices of different sizes is not defined. Whenever we write A + B we tacitly assume that A and B have the same size.

4.3. Addition of matrices is commutative:

A + B = B + A.

The commutativity is a simple consequence of the fact that addition of numbers is commutative.

4.4. Addition of matrices, like addition of numbers, is associative:

(A + B) + C = A + (B + C).

This allows us to dispense with the parentheses: the expression A + B + C is unambiguous—you get the same answer whether you first add A and B, then C, or first add B and C, then A. Even more, because + is commutative you also get the same answer if you first add A and C, then B.

5. The zero matrix and the negative of a matrix

The zero matrix is the matrix consisting entirely of zeroes. Of course, we shouldn't really say the zero matrix because there is one zero matrix of size m × n for each choice of m and n. We denote the zero matrix of any size by 0. Although the symbol 0 now has many meanings it should be clear from the context which zero matrix is meant.

The zero matrix behaves like the number 0:

(1) A + 0 = A = 0 + A for all matrices A, and
(2) given a matrix A, there is a unique matrix A′ such that A + A′ = 0.

The matrix A′ in (2) is denoted by −A and its entries are the negatives of the entries in A.

After we define multiplication of matrices we will see that the zero matrix shares another important property with the number zero: the product of a zero matrix with any other matrix is zero.


6. Matrix arithmetic: multiplication

6.1. Let A be an m × n matrix and B a p × q matrix. The product AB is only defined when n = p and in that case the product is an m × q matrix. For example, if A is a 2 × 3 matrix and B a 3 × 4 matrix we can form the product AB but there is no product BA! In other words,

the product AB can only be formed if the number of columns in A is equal to the number of rows in B.

The rule for multiplication will strike you as complicated and arbitrary at first though you will later understand why it is natural. Let A be an m × n matrix and B an n × q matrix. Before being precise, let me say that

the ijth entry of AB is obtained by multiplying the ith row of A by the jth column of B.

What we mean by this can be seen by reading the next example carefully. The product AB of the matrices

A = [ 1   2 ]   and   B = [ 5  4  3 ]
    [ 0  −1 ]             [ 2  1  2 ]

is the 2 × 3 matrix whose

11-entry is (row 1) × (column 1) = 1 × 5 + 2 × 2 = 9
12-entry is (row 1) × (column 2) = 1 × 4 + 2 × 1 = 6
13-entry is (row 1) × (column 3) = 1 × 3 + 2 × 2 = 7
21-entry is (row 2) × (column 1) = 0 × 5 − 1 × 2 = −2
22-entry is (row 2) × (column 2) = 0 × 4 − 1 × 1 = −1
23-entry is (row 2) × (column 3) = 0 × 3 − 1 × 2 = −2

In short,

AB = [  9   6   7 ]
     [ −2  −1  −2 ].

The product BA does not make sense in this case.
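The same computation done by machine as a sanity check (numpy is the tool of this sketch, not of the notes):

    import numpy as np

    A = np.array([[1, 2], [0, -1]])
    B = np.array([[5, 4, 3], [2, 1, 2]])
    print(A @ B)  # [[ 9  6  7]
                  #  [-2 -1 -2]]
    # B @ A raises an error here: the sizes 2 x 3 and 2 x 2 do not line up.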


My own mental picture for remembering how to multiply matrices is a diagram (not reproduced here) in which one runs across the ith row of A and down the jth column of B at the same time. Then one computes the dot product.

6.2. Powers of a matrix. A square matrix, i.e., an n × n matrix, has the same number of rows as columns so we can multiply it by itself. We write A² rather than AA for the product. For example, if

A = [ 1  1 ],   then   A² = [ 2  1 ],   A³ = [ 3  2 ],   A⁴ = [ 5  3 ].
    [ 1  0 ]                [ 1  1 ]         [ 2  1 ]         [ 3  2 ]

Please check those calculations to test your understanding.
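A machine check of those powers, as a sketch:

    import numpy as np

    A = np.array([[1, 1], [1, 0]])
    for k in (2, 3, 4):
        print(np.linalg.matrix_power(A, k))
    # prints [[2 1],[1 1]], [[3 2],[2 1]], [[5 3],[3 2]] in turn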

6.3. The formal definition of multiplication. Let A be an m × n matrix and B an n × q matrix. Then AB is the m × q matrix whose ijth entry is

(6-1)    (AB)ij = Ai1B1j + Ai2B2j + · · · + AinBnj,

the sum of the products AitBtj as t runs from 1 to n.

Do you understand this formula? Is it compatible with what you have understood above? If A is the 3 × 4 matrix with Aij = |4 − i − j| and B is the 4 × 3 matrix with Bij = ij − 4, what is (AB)23? If you can't answer this question there is something about the definition of multiplication or something about the notation I am using that you do not understand. Do yourself and others a favor by asking a question in class.
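One way to check your answer after computing (AB)23 by hand (a sketch; the indices are 1-based to match the notes):

    A = [[abs(4 - i - j) for j in range(1, 5)] for i in range(1, 4)]  # 3 x 4
    B = [[i * j - 4 for j in range(1, 4)] for i in range(1, 5)]       # 4 x 3
    # Formula (6-1) with i = 2, j = 3: sum A_{2t} B_{t3} over t = 1, ..., 4.
    print(sum(A[2 - 1][t - 1] * B[t - 1][3 - 1] for t in range(1, 5)))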

6.4. The zero matrix. For every m and n there is an m × n matrix we call zero. It is the m × n matrix with all its entries equal to 0. The product of any matrix with the zero matrix is equal to zero. Why?

6.5. The identity matrix. For every n there is an n × n matrix we call the n × n identity matrix. We often denote it by In, or just I if the n is clear from the context. It is the matrix with entries

Iij = 1 if i = j, and Iij = 0 if i ≠ j.

In other words every entry on the NW-SE diagonal is 1 and all other entries are 0. For example, the 4 × 4 identity matrix is

[ 1  0  0  0 ]
[ 0  1  0  0 ]
[ 0  0  1  0 ]
[ 0  0  0  1 ].

The key property of the n × n identity matrix I is that if A is any m × n matrix, then AI = A and, if B is any n × p matrix, IB = B. You should check this with a couple of examples to convince yourself it is true. Even better, give a proof of it for all n by using the definition of the product. For example, if A is an m × n matrix, then

(AI)ij = Ai1I1j + · · · + AinInj = Aij

where we used the fact that Ikj is zero except when k = j, i.e., the only non-zero term in the sum is AijIjj = Aij.

6.6. Multiplication is associative. Another way to see whether you understand the definition of the product in (6-1) is to try using it to prove that matrix multiplication is associative, i.e., that

(6-2)    (AB)C = A(BC)

for any three matrices for which this product makes sense. This is an important property, just as it is for numbers: if a, b, and c, are real numbers then (ab)c = a(bc); this allows us to simply write abc, or when there are four numbers abcd, because we know we get the same answer no matter how we group the numbers in forming the product. For example

(2 × 3) × (4 × 5) = 6 × 20 = 120

and

((2 × 3) × 4) × 5 = (6 × 4) × 5 = 24 × 5 = 120.

Formula (6-1) only defines the product of two matrices so to form the product of three matrices A, B, and C, of sizes k × ℓ, ℓ × m, and m × n, respectively, we must use (6-1) twice. But there are two ways to do that: first compute AB, then multiply on the right by C; or first compute BC, then multiply on the left by A. The formula (AB)C = A(BC) says that those two alternatives produce the same matrix. Therefore we can write ABC for that product (no parentheses!) without ambiguity. Before we can do that we must prove (6-2) by using the definition (6-1) of the product of two matrices.


6.7. Try using (6-1) to prove (6-2). To show two matrices are equal you must show their entries are the same. Thus, you must prove the ijth entry of (AB)C is equal to the ijth entry of A(BC). To begin, use (6-1) to compute ((AB)C)ij. I leave you to continue.

This is a test of your desire to pass the course. I know you want to pass the course, but do you want that enough to do this computation?¹

¹Perhaps I will ask you to prove that (AB)C = A(BC) on the midterm—would you prefer to discover whether you can do that now or then?

Can you use the associative law to prove that if J is an n × n matrix with the property that AJ = A and JB = B for all m × n matrices A and all n × p matrices B, then J = In?

6.8. The distributive law. If A, B, and C, are matrices of sizes such that the following expressions make sense, then

(A + B)C = AC + BC.

6.9. The columns of the product AB. Suppose the product AB exists. It is sometimes useful to write Bj for the jth column of B and write

B = [B1, . . . , Bn].

The columns of AB are then given by the formula

AB = [AB1, . . . , ABn].

You should check this assertion.

6.10. The product Ax. Suppose A = [A1, A2, . . . , An] is a matrix with n columns and x is the column vector

x = [ x1 ]
    [ x2 ]
    [ ⋮  ]
    [ xn ].

Then

Ax = x1A1 + x2A2 + · · · + xnAn.

This last equation is one of the most important equations in the course. Tattoo it on your body. If you don't know this equation you will probably fail the course.

Try to prove it yourself. First do two simple examples, A a 2 × 3 matrix, and A a 3 × 2 matrix. Then try the general case, A an m × n matrix.
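A numerical spot-check of that equation with a concrete A and x (a sketch):

    import numpy as np

    A = np.array([[1, 3, 0], [-4, 5, 2]])  # columns A1, A2, A3
    x = np.array([2, -1, 3])
    lhs = A @ x
    rhs = x[0] * A[:, 0] + x[1] * A[:, 1] + x[2] * A[:, 2]
    print(lhs, rhs)  # [-1 -7] [-1 -7]: Ax equals x1*A1 + x2*A2 + x3*A3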



6.11. Multiplication by a scalar. There is another kind of multiplication we can do. Let A be an m × n matrix and let c be any real number. We define cA to be the m × n matrix obtained by multiplying every entry of A by c. Formally, (cA)_{ij} = cA_{ij} for all i and j. For example,

3 \begin{pmatrix} 1 & 3 & 0 \\ -4 & 5 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 9 & 0 \\ -12 & 15 & 6 \end{pmatrix}.

We also define the product Ac by declaring that it is equal to cA, i.e., Ac = cA.

7. Pitfalls and warnings

7.1. Warning: multiplication is not commutative. If a and b are numbers, then ab = ba. However, if A and B are matrices, AB need not equal BA even if both products exist. For example, if A is a 2 × 3 matrix and B is a 3 × 2 matrix, then AB is a 2 × 2 matrix whereas BA is a 3 × 3 matrix.

Even if A and B are square matrices of the same size, which ensures that AB and BA have the same size, AB need not equal BA. For example,

(7-1) \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}

but

(7-2) \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.

7.2. Warning: a product of non-zero matrices can be zero. The calculation (7-2) shows AB can be zero when both A and B are non-zero.

7.3. Warning: you can't always cancel. If A is not zero and AB = AC it need not be true that B = C. For example, in (7-2) we see that AB = 0 = A0, but the A cannot be cancelled to deduce that B = 0.
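All of these warnings are visible at once on the matrices from (7-1) and (7-2); a sketch in numpy:

    import numpy as np

    A = np.array([[0, 0],
                  [1, 0]])
    B = np.array([[1, 0],
                  [0, 0]])

    print(A @ B)    # [[0 0], [1 0]]   -- this is (7-1)
    print(B @ A)    # [[0 0], [0 0]]   -- this is (7-2): BA = 0 although A != 0 and B != 0

    # Cancellation fails: B @ A equals B @ 0, yet A is not the zero matrix.
    assert np.array_equal(B @ A, B @ np.zeros((2, 2), dtype=int))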

8. Transpose

The transpose of an m × n matrix A is the n × m matrix A^T defined by

(A^T)_{ij} := A_{ji}.

Thus the transpose of A is "the reflection of A in the diagonal", and the rows of A become the columns of A^T and the columns of A become the rows of A^T. An example makes it clear:

\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}.

Notice that (A^T)^T = A.


The transpose of a column vector is a row vector and vice versa. We often use the transpose notation when writing a column vector—it saves space² and looks better to write v^T = (1 2 3 4) rather than

v = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}.

Check that

(AB)^T = B^T A^T.

A lot of students get this wrong: they think that (AB)^T is equal to A^T B^T. (Even when the product AB makes sense, the product A^T B^T need not make sense.)

9. Some special matrices

We have already met two special matrices, the identity and the zero matrix. They behave like the numbers 1 and 0 and their great importance derives from that simple fact.

9.1. Symmetric matrices. We call A a symmetric matrix if A^T = A. A symmetric matrix must be a square matrix. A symmetric matrix is symmetric about its main diagonal. For example,

\begin{pmatrix} 0 & 1 & 2 & 3 \\ 1 & 4 & 5 & 6 \\ 2 & 5 & 7 & 8 \\ 3 & 6 & 8 & 9 \end{pmatrix}

is symmetric.

A square matrix A is symmetric if A_{ij} = A_{ji} for all i and j. For example, the matrix

\begin{pmatrix} 1 & -2 & 5 \\ -2 & 0 & 4 \\ 5 & 4 & -9 \end{pmatrix}

is symmetric. The name comes from the fact that the entries below the diagonal are the same as the corresponding entries above the diagonal, where "corresponding" means, roughly, the entry obtained by reflecting in the diagonal. By the diagonal of the matrix above I mean the line from the top left corner to the bottom right corner that passes through the numbers 1, 0, and −9.

If A is any square matrix, show that A + A^T is symmetric. Is AA^T symmetric? Use the definition of multiplication to show that (AB)^T = B^T A^T.
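A quick numerical check of the first two exercises (a sketch; a random 4 × 4 matrix stands in for A):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 4))

    S = A + A.T     # claimed symmetric
    P = A @ A.T     # also symmetric, since (A A^T)^T = (A^T)^T A^T = A A^T
    assert np.allclose(S, S.T)
    assert np.allclose(P, P.T)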

²Printers dislike blank space because it requires more paper. They also dislike black space, like ■, ★, and ♣, because it requires more ink.


9.2. Skew-symmetric matrices. A square matrix A is skew-symmetric if A^T = −A.

If B is a square matrix, show that B − B^T is skew-symmetric. What can you say about the entries on the diagonal of a skew-symmetric matrix?

9.3. Upper triangular matrices. A square matrix is upper triangular if all its entries below the diagonal are zero. For example,

\begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 4 & 5 & 6 \\ 0 & 0 & 7 & 8 \\ 0 & 0 & 0 & 9 \end{pmatrix}

is upper triangular.

9.4. Lower triangular matrices. A square matrix is lower triangular if all its entries above the diagonal are zero. For example,

\begin{pmatrix} 1 & 0 & 0 \\ 4 & 5 & 0 \\ 7 & 8 & 9 \end{pmatrix}

is lower triangular. The transpose of a lower triangular matrix is upper triangular and vice versa.

10. Solving an equation involving an upper triangular matrix

Here is an easy problem: find numbers x_1, x_2, x_3, x_4 such that

Ax = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 2 & -1 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} = b.

Multiplying the bottom row of the 4 × 4 matrix by the column x gives 2x_4, and we want that to equal b_4, which is 4, so x_4 = 2. Multiplying the third row of the 4 × 4 matrix by x gives −x_3 + x_4, and we want that to equal b_3, which is 3. We already know x_4 = 2 so we must have x_3 = −1. Repeating this with the second row of A gives 2x_2 − x_3 = 2, so x_2 = 1/2. Finally the first row of A gives x_1 + 2x_2 + 3x_3 + 4x_4 = 1; plugging in the values we have found for x_4, x_3, and x_2, we get x_1 + 1 − 3 + 8 = 1, so x_1 = −5. We find that a solution is given by

x = \begin{pmatrix} -5 \\ 1/2 \\ -1 \\ 2 \end{pmatrix}.

Is there any other solution to this equation? No. Our hand was forced at each stage.
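The procedure just carried out, working from the bottom row up, is called back substitution and is easy to mechanize. A sketch in numpy (assuming, as above, that all the diagonal entries are non-zero):

    import numpy as np

    def back_substitute(U, b):
        """Solve Ux = b where U is upper triangular with non-zero diagonal."""
        n = len(b)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):                   # bottom row first
            x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    U = np.array([[1., 2., 3., 4.],
                  [0., 2., -1., 0.],
                  [0., 0., -1., 1.],
                  [0., 0., 0., 2.]])
    b = np.array([1., 2., 3., 4.])
    print(back_substitute(U, b))    # [-5.  0.5 -1.  2.]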

Here is a hard problem:


11. Some special products

Here we consider an m × n matrix B and the effect of multiplying B on the left by some special m × m matrices. In what follows, let

X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix}.

11.1. Let D denote the m × m matrix on the left in the product.

(11-1) \begin{pmatrix} \lambda_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \lambda_3 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_{m-1} & 0 \\ 0 & 0 & 0 & \cdots & 0 & \lambda_m \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} = \begin{pmatrix} \lambda_1 x_1 \\ \lambda_2 x_2 \\ \vdots \\ \lambda_m x_m \end{pmatrix}

Since DB = [DB_1, . . . , DB_n] it follows from (11-1) that the i^th row of DB is λ_i times the i^th row of B.

A special case of this appears when we discuss elementary row operations in chapter 5. There we take λ_i = c and all other λ_js equal to 1. In that case DB is the same as B except that the i^th row of DB is c times the i^th row of B.

11.2. Fix two different integers i and j and let E be the m × m matrix obtained by interchanging the i^th and j^th rows of the identity matrix. If X is an m × 1 matrix, then EX is the same as X except that the entries in the i^th and j^th positions of X are switched.

Thus, if B is an m × n matrix, then EB is the same as B except that the i^th and j^th rows are interchanged. (Check that.) This operation, switching two rows, will also appear in chapter 5 when we discuss elementary row operations.

11.3. Fix two different integers i and j and let F be the m × m matrix obtained from the identity matrix by changing the ij^th entry from zero to c. If X is an m × 1 matrix, then FX is the same as X except that the entry x_i has been replaced by x_i + cx_j. Thus, if B is an m × n matrix, then FB is the same as B except that the i^th row is now the sum of the i^th row and c times the j^th row. (Check that.) This operation too appears when we discuss elementary row operations.
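All three effects are easy to see numerically. A sketch with m = 3, where D scales the rows, E interchanges the first two rows, and F adds 5 times row 3 to row 1 (rows counted from 1, as in the text; numpy indexes from 0):

    import numpy as np

    B = np.array([[1., 2.],
                  [3., 4.],
                  [5., 6.]])

    D = np.diag([2., 7., 1.])        # lambda_1 = 2, lambda_2 = 7, lambda_3 = 1
    E = np.eye(3)[[1, 0, 2]]         # identity with rows 1 and 2 interchanged
    F = np.eye(3)
    F[0, 2] = 5.                     # the (1,3) entry changed from 0 to 5

    print(D @ B)    # row i of B multiplied by lambda_i
    print(E @ B)    # rows 1 and 2 of B interchanged
    print(F @ B)    # row 1 of B replaced by row 1 + 5 * row 3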

11.4. The matrices D, E, and F all have inverses, i.e., there are matrices D′, E′, and F′ such that I = DD′ = D′D = EE′ = E′E = FF′ = F′F, where I denotes the m × m identity matrix. (Oh, I need to be careful – here I mean D is the matrix in (11-1) with λ_i = c ≠ 0 and all other λ_js equal to 1.) Explicitly, D′ is D with c replaced by 1/c, E′ = E, and F′ is F with c replaced by −c.


11.5. Easy question. Is there a positive integer n such that

\begin{pmatrix} 0 & -1 \\ 1 & -1 \end{pmatrix}^n = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}?

If so, what is the smallest such n? Can you describe all other n for which this equality holds?
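A computer search suggests the answer (a sketch; of course this only computes, it doesn't explain why):

    import numpy as np

    M = np.array([[0, -1],
                  [1, -1]])
    P = np.eye(2, dtype=int)
    for n in range(1, 10):
        P = P @ M                                       # P is now M^n
        if np.array_equal(P, np.eye(2, dtype=int)):
            print(n)                                    # prints 3, 6, 9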


CHAPTER 3

Matrices and motions in R^2 and R^3

1. Linear transformations

Functions play a central role in all parts of mathematics. The functions relevant to linear algebra are called linear transformations.

We write R^n for the set of n × 1 column vectors. This has a linear structure:

(1) if u and v are in R^n so is u + v;
(2) if c ∈ R and u ∈ R^n, then cu ∈ R^n.

A non-empty subset W of R^n is called a subspace if it has these two properties, i.e.,

(1) if u and v are in W so is u + v;
(2) if c ∈ R and u ∈ W, then cu ∈ W.

A linear transformation from R^n to R^m is a function f : R^n → R^m such that

(1) f(u + v) = f(u) + f(v) for all u, v ∈ R^n, and
(2) f(cu) = cf(u) for all c ∈ R and all u ∈ R^n.

In colloquial terms, these two requirements say that linear transformations preserve the linear structure.

Left multiplication by an m × n matrix A is a linear function from R^n to R^m. If x ∈ R^n, then Ax ∈ R^m.
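A sketch checking the two defining properties for f(x) = Ax on random inputs (numpy; one random check, not a proof):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((2, 3))              # f(x) = Ax maps R^3 to R^2
    u, v = rng.standard_normal(3), rng.standard_normal(3)
    c = 2.5

    assert np.allclose(A @ (u + v), A @ u + A @ v)   # f(u + v) = f(u) + f(v)
    assert np.allclose(A @ (c * u), c * (A @ u))     # f(cu) = c f(u)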

2. Rotations in the plane

3. Projections in the plane

4. Contraction and dilation

5. Reflections in the plane

6. Reflections in R^3

7. Projections from R^3 to a plane


CHAPTER 4

Systems of Linear Equations

The meat of this chapter begins in §2 below. However, some of the terminology and framework for systems of linear equations applies to other systems of equations, so §1 starts with a discussion of the general framework. Nevertheless, it is probably best to go directly to §2, read through to the end of the chapter, then return to §1.

1. Systems of equations

You have been solving equations for years. Sometimes you have considered the solutions to a single equation. For example, you are familiar with the fact that the solutions to the equation y = x^2 − x − 4 form a parabola. Likewise, the solutions to the equation (x − 2)^2 + (y − 1)^2 + z^2 = 1 form a sphere of radius 1 centered at the point (2, 1, 0).

1.1. Let's consider these statements carefully. By a solution to the equation y = x^2 − x − 4 we mean a pair of numbers (a, b) that gives a true statement when plugged into the equation, i.e., (a, b) is called a solution if b does equal a^2 − a − 4. If a^2 − a − 4 ≠ b, then (a, b) is not a solution to y = x^2 − x − 4.

1.2. Pairs of numbers (a, b) also label points in the plane. More precisely, we can draw a pair of axes perpendicular to each other, usually called the x- and y-axes, and (a, b) denotes the point obtained by going a units along the x-axis and b units along the y-axis. The picture below illustrates


the situation:

[Figure: x- and y-axes with the point (a, b) plotted, a units along the x-axis and b units up the y-axis.]

When we label a point by a pair of numbers, the order of the numbers matters. For example, if a ≠ b, the points (a, b) and (b, a) are different. In the above example, we have

[Figure: the same axes showing both (a, b) and (b, a) as distinct points.]

The order also matters when we say (3, 2) is a solution to the equation y = x^2 − x − 4. Although (3, 2) is a solution to y = x^2 − x − 4, (2, 3) is not.

1.3. R^2 and R^3. We write R^2 for the set of all ordered pairs of numbers (a, b). Here R denotes the set of real numbers and the superscript 2 in R^2 indicates pairs of real numbers. The general principle is that a solution to an equation in 2 unknowns is an ordered pair of real numbers, i.e., a point in R^2.

A solution to an equation in 3 unknowns, (x − 2)^2 + (y − 1)^2 + z^2 = 1 for example, is an ordered triple of numbers, i.e., a triple (a, b, c) such that (a − 2)^2 + (b − 1)^2 + c^2 does equal 1. The set of all ordered triples of numbers is denoted by R^3.


1.4. A system of equations. We can consider more than one equation at a time. When we consider several equations at once we speak of a system of equations. For example, we can ask for solutions to the system of equations

(1-1) y = x^2 − x − 4
      y = 3x − 7.

A solution to this system of equations is an ordered pair of numbers (a, b) with the property that when we plug in a for x and b for y both equations are true. That is, (a, b) is a solution to the system (1-1) if b = a^2 − a − 4 and b = 3a − 7. For example, (3, 2) is a solution to the system (1-1). So is (1, −4).

On the other hand, (4, 8) is not a solution to the system (1-1): it is a solution to y = x^2 − x − 4 but not a solution to y = 3x − 7 because 8 ≠ 3 × 4 − 7.

1.5. More unknowns and R^n. The demands of the modern world are such that we often encounter equations with many unknowns, sometimes billions of unknowns. Let's think about a modest situation, a system of 3 equations in 5 unknowns. For example, a solution to the system of equations

(1-2) x_1 + x_2^2 + x_3^3 + x_4^4 + x_5^5 = 100
      x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5 = 0
      x_1 x_2 + x_3 x_4 − x_5^2 = −20

is an ordered 5-tuple¹ of numbers, (s_1, s_2, s_3, s_4, s_5), having the property that when we plug in s_1 for x_1, s_2 for x_2, s_3 for x_3, s_4 for x_4, and s_5 for x_5, all three of the equations in (1-2) become true. We use the notation R^5 to denote the set of all ordered 5-tuples of real numbers. Thus, the solutions to the system (1-2) are elements of R^5.

We often call elements of R^5 points, just as we call points in the plane points. A point in R^5 has 5 coordinates, whereas a point in the plane has 2 coordinates.

More generally, solutions to a system of equations with n unknowns are, or can be thought of as, points in R^n. We sometimes call R^n n-space. For example, we refer to the physical world around us as 3-space. If we fix an origin and x-, y-, and z-axes, each point in our physical world can be

¹We run out of words after pair, triple, quadruple, quintuple, sextuple, ... so we invent the word n-tuple to refer to an ordered sequence of n numbers where n can be any positive integer. For example, (8,7,6,5,4,3,2,1) is an 8-tuple. It is not the same as the 8-tuple (1,2,3,4,5,6,7,8). We think of these 8-tuples as labels for two different points in R^8. Although we can't picture R^8 we can ask questions about it. For example, does (1,1,1,1,1,1,1,1) lie on the line through the points (8,7,6,5,4,3,2,1) and (1,2,3,4,5,6,7,8)? The answer is "yes". Why? If you can answer this you understand what is going on. If you can't you don't, and should ask a question.


labelled in a unique way by an ordered triple of numbers. If we take time as well as position into account we need 4 coordinates and then speak of R^4. Physicists like to call R^4 space-time. It has 3 space-like coordinates and 1 time coordinate.

1.6. Non-linear equations. The equations y = x^2 − x − 4 and (x − 2)^2 + (y − 1)^2 + z^2 = 1 are not linear equations.

2. A single linear equation

The general equation of a line in R^2 is of the form

ax + by = c.

The general equation of a plane in R^3 is of the form

ax + by + cz = d.

These are examples of linear equations. A linear equation is an equation of the form

(2-1) a_1 x_1 + · · · + a_n x_n = b

in which the a_is and b belong to R and x_1, . . . , x_n are unknowns. We call the a_is the coefficients of the unknowns.

A solution to (2-1) is an ordered n-tuple (s_1, . . . , s_n) of real numbers that when substituted for the x_is makes equation (2-1) true, i.e.,

a_1 s_1 + · · · + a_n s_n = b.

When n = 2 a solution is a pair of numbers (s_1, s_2) which we can think of as the coordinates of a point in the plane. When n = 3 a solution is a triple of numbers (s_1, s_2, s_3) which we can think of as the coordinates of a point in 3-space. We will use the symbol R^2 to denote the plane and R^3 to denote 3-space. The idea for the notation is that the R in R^3 denotes the set of real numbers and the 3 means a triple of real numbers. The notation continues: R^4 denotes quadruples (a, b, c, d) of real numbers and R^4 is referred to as 4-space.²

A geometric view. If n ≥ 2 and at least one a_i is non-zero, equation (2-1) has infinitely many solutions: for example, when n = 2 the solutions are the points on a line in the plane R^2; when n = 3 the solutions are the points on a plane in 3-space R^3; when n = 4 the solutions are the points on a 3-plane in R^4; and so on.

It will be important in this course to have a geometric picture of the set of solutions. We begin to do this in earnest in chapter 6, but it is useful to keep this in mind from the outset. Solutions to equation (2-1) are ordered n-tuples of real numbers (s_1, . . . , s_n) and we think of the numbers s_i as the coordinates of a point in n-space, i.e., in R^n.

²Physicists think of R^4 as space-time with coordinates (x, y, z, t): 3 spatial coordinates and one time coordinate.


The collection of all solutions to a_1 x_1 + · · · + a_n x_n = b is a special kind of subset in n-space: it has a linear structure, and that is why we call (2-1) a linear equation. By a "linear structure" we mean this:

Proposition 2.1. If the points p = (s_1, . . . , s_n) and q = (t_1, . . . , t_n) are different solutions to (2-1), then all points on the line through p and q are also solutions to (2-1).

This result is easy to see once we figure out how to describe the points that lie on the line through p and q.

Let's write pq for the line through p and q. The prototypical line is the real number line R, so we want to associate to each real number λ a point on pq. We do that by presenting pq in parametric form: pq consists of all points of the form

λp + (1 − λ)q = (λs_1 + (1 − λ)t_1, . . . , λs_n + (1 − λ)t_n)

as λ ranges over all real numbers. Notice that p is obtained when λ = 1 and q is obtained when λ = 0.

Proof of Proposition 2.1. The result follows from the calculation

a_1(λs_1 + (1 − λ)t_1) + · · · + a_n(λs_n + (1 − λ)t_n) = λ(a_1 s_1 + · · · + a_n s_n) + (1 − λ)(a_1 t_1 + · · · + a_n t_n)

which equals λb + (1 − λ)b, i.e., equals b, if (s_1, . . . , s_n) and (t_1, . . . , t_n) are solutions to (2-1). ♦

3. Systems of linear equations

An m × n system of linear equations is a collection of m linear equations in n unknowns. We usually write out the general m × n system like this:

a_{11}x_1 + a_{12}x_2 + · · · + a_{1n}x_n = b_1
a_{21}x_1 + a_{22}x_2 + · · · + a_{2n}x_n = b_2
  ⋮
a_{m1}x_1 + a_{m2}x_2 + · · · + a_{mn}x_n = b_m.

The unknowns are x_1, . . . , x_n, and the aim of the game is to find them when the a_{ij}s and b_is are specific real numbers. The a_{ij}s are called the coefficients of the system. We often arrange the coefficients into an m × n array

A := \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}

called the coefficient matrix.


4. A system of linear equations is a single matrix equation

We can assemble the b_is and the unknowns x_j into column vectors

b := \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} and x := \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.

The system of linear equations at the beginning of §3 can now be written as a single matrix equation

Ax = b.

You should multiply out this equation and satisfy yourself that it is the same system of equations as at the beginning of §3. This matrix interpretation of the system of linear equations will be center stage throughout this course.

We often arrange all the data into a single m × (n+1) matrix

(A | b) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} & | & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & | & b_2 \\ \vdots & & \ddots & \vdots & | & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & | & b_m \end{pmatrix}

that we call the augmented matrix of the system.

5. Specific examples

5.1. A unique solution. The only solution to the 2 × 2 system

x_1 + x_2 = 2
2x_1 − x_2 = 1

is (x_1, x_2) = (1, 1). You can think of this in geometric terms. Each equation determines a line in R^2, the points (x_1, x_2) on each line corresponding to the solutions of the corresponding equation. The points that lie on both lines therefore correspond to simultaneous solutions to the pair of equations, i.e., solutions to the 2 × 2 system of equations.

5.2. No solutions. The 2 × 2 system

x_1 + x_2 = 2
2x_1 + 2x_2 = 6

has no solution at all: if the numbers x_1 and x_2 are such that x_1 + x_2 = 2, then 2(x_1 + x_2) = 4, not 6. It is enlightening to think about this system geometrically. The points in R^2 that are solutions to the equation x_1 + x_2 = 2 lie on the line of slope −1 passing through (0, 2) and (2, 0). The points in R^2 that are solutions to the equation 2(x_1 + x_2) = 6 lie on the line of slope −1 passing through (0, 3) and (3, 0). Thus, the equations give two parallel lines:


there is no point lying on both lines and therefore no common solution to the pair of equations, i.e., no solution to the given system of linear equations.

5.3. No solutions. The 3 × 2 system

x_1 + x_2 = 2
2x_1 − x_2 = 1
x_1 − x_2 = 3

has no solutions because the only solution to the 2 × 2 system consisting of the first two equations is (1, 1), and that is not a solution to the third equation in the 3 × 2 system. Geometrically, the solutions to each equation lie on a line in R^2 and the three lines do not pass through a common point.

5.4. A unique solution. The 3 × 2 system

x_1 + x_2 = 2
2x_1 − x_2 = 1
x_1 − x_2 = 0

has a unique solution, namely (1, 1). The three lines corresponding to the three equations all pass through the point (1, 1).

5.5. Infinitely many solutions. It is obvious that the system consisting of the single equation

x_1 + x_2 = 2

has infinitely many solutions, namely all the points lying on the line of slope −1 passing through (0, 2) and (2, 0).

5.6. Infinitely many solutions. The 2 × 2 system

x_1 + x_2 = 2
2x_1 + 2x_2 = 4

has infinitely many solutions, namely all the points lying on the line of slope −1 passing through (0, 2) and (2, 0), because the two equations actually give the same line in R^2. A solution to the first equation is also a solution to the second equation in the system.

5.7. Infinitely many solutions. The 2 × 3 system

x_1 + x_2 + x_3 = 3
2x_1 + x_2 − x_3 = 4

also has infinitely many solutions. The solutions to the first equation are the points on the plane x_1 + x_2 + x_3 = 3 in R^3. The solutions to the second equation are the points on the plane 2x_1 + x_2 − x_3 = 4 in R^3. The two planes meet one another in a line. That line can be described parametrically as the points

(1 + 2t, 2 − 3t, t)


as t ranges over all real numbers. You should check that when x_1 = 1 + 2t, x_2 = 2 − 3t, and x_3 = t, both equations in the 2 × 3 system are satisfied.
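The check can even be done symbolically, once and for all values of t; a sketch using sympy (my choice of tool):

    import sympy as sp

    t = sp.symbols('t')
    x1, x2, x3 = 1 + 2*t, 2 - 3*t, t
    assert sp.simplify(x1 + x2 + x3 - 3) == 0        # first equation holds
    assert sp.simplify(2*x1 + x2 - x3 - 4) == 0      # second equation holds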

5.8. A unique solution. The 3 × 3 system

x_1 + x_2 + x_3 = 3
2x_1 + x_2 − x_3 = 4
4x_1 + 3x_2 − 2x_3 = 13

has a unique solution, namely (x_1, x_2, x_3) = (−1, 5, −1). The solutions to each equation are the points in R^3 that lie on the plane given by the equation. There is a unique point lying on all three planes, namely (−1, 5, −1). This is typical behavior: two planes in R^3 meet in a line (unless they are parallel), and that line will (usually!) meet a third plane in a point. The next two examples show this doesn't always happen: in Example 5.9 the third plane is parallel to the line that is the intersection of the first two planes; in Example 5.10 the third plane contains the line that is the intersection of the first two planes.

5.9. No solution. We will now show that the 3 × 3 system

x_1 + x_2 + x_3 = 3
2x_1 + x_2 − x_3 = 4
3x_1 + x_2 − 3x_3 = 0

has no solution. In Example 5.7 we showed that all solutions to the system consisting of the first two equations in the 3 × 3 system we are now considering are of the form (1 + 2t, 2 − 3t, t) for some t in R. However, if (x_1, x_2, x_3) = (1 + 2t, 2 − 3t, t), then

(5-1) 3x_1 + x_2 − 3x_3 = 3(1 + 2t) + (2 − 3t) − 3t = 5,

not 0, so no solution to the first two equations is a solution to the third equation. Thus the 3 × 3 system has no solution.

5.10. Infinitely many solutions. The 3 × 3 system

x_1 + x_2 + x_3 = 3
2x_1 + x_2 − x_3 = 4
3x_1 + x_2 − 3x_3 = 5

has infinitely many solutions because, as equation (5-1) showed, every solution to the system consisting of the first two equations in the 3 × 3 system we are currently considering is a solution to the third equation 3x_1 + x_2 − 3x_3 = 5.


6. The number of solutions

The previous examples show that a system of linear equations can have no solution, or a unique solution, or infinitely many solutions. The next proposition shows these are the only possibilities.

Proposition 6.1. If a system of linear equations has ≥ 2 solutions, it has infinitely many.

Proof. Suppose (s_1, . . . , s_n) and (t_1, . . . , t_n) are different solutions to the m × n system

a_{i1}x_1 + a_{i2}x_2 + · · · + a_{in}x_n = b_i, (1 ≤ i ≤ m).

Then

(6-1) (λs_1 + (1 − λ)t_1, . . . , λs_n + (1 − λ)t_n)

is also a solution for every λ ∈ R because

a_{i1}(λs_1 + (1 − λ)t_1) + · · · + a_{in}(λs_n + (1 − λ)t_n) = λ(a_{i1}s_1 + · · · + a_{in}s_n) + (1 − λ)(a_{i1}t_1 + · · · + a_{in}t_n) = λb_i + (1 − λ)b_i = b_i.

Since there are infinitely many λs in R there are infinitely many solutions to the system. □

If we think of the solutions to an m × n system as points in R^n, the points in (6-1) are the points on the line through the points (s_1, . . . , s_n) and (t_1, . . . , t_n). Thus, the proof of Proposition 6.1 says that if two points in R^n are solutions to Ax = b, so are all points on the line through them. It is this linear nature of the set of solutions that gives rise to the name linear equation. It is also why we call this subject linear algebra. I think Linear algebra and linear geometry would be a better name.

The following result is an immediate consequence of Proposition 6.1.

Corollary 6.2. A system of linear equations has either

(1) a unique solution, or
(2) no solutions, or
(3) infinitely many solutions.

A system of linear equations is consistent if it has a solution, and inconsistent if it does not.

7. A geometric view on the number of solutions

Two lines in the plane intersect at either

(1) a unique point, or
(2) no point (they are parallel), or
(3) infinitely many points (they are the same line).


Lines in the plane are "the same things" as equations a_1 x + a′_1 y = b_1. The points on the line are the solutions to the equation. The solutions to a pair (system) of equations a_1 x + a′_1 y = b_1 and a_2 x + a′_2 y = b_2 are the points that lie on both lines, i.e., their intersection.

In 3-space, which we will denote by R^3, the solutions to a single equation form a plane. Two planes in R^3 intersect at either

(1) no point (they are parallel), or
(2) infinitely many points (they are not parallel).

In the second case the intersection is a line if the planes are different, and a plane if they are the same.

Three planes in R^3 intersect at either

(1) a unique point, or
(2) no point (all three are parallel), or
(3) infinitely many points (all three planes are the same or the intersection of two of them is a line that lies on the third plane).

I leave you to consider the possibilities for three lines in the plane and four planes in R^3. Discuss with friends if necessary.

8. Homogeneous systems

A system of linear equations of the form Ax = 0 is called a homogeneous system. A homogeneous system always has a solution, namely x = 0, i.e., x_1 = · · · = x_n = 0. This is called the trivial solution because little brainpower is needed to find it. Other solutions to Ax = 0 are said to be non-trivial.

For a homogeneous system the issue is to describe the set of non-trivial solutions.

8.1. A trivial but important remark. If (s_1, . . . , s_n) is a solution to a homogeneous system, so is (λs_1, . . . , λs_n) for all λ ∈ R, i.e., all points on the line through the origin 0 = (0, . . . , 0) and (s_1, . . . , s_n) are solutions.

8.2. A geometric view. The next result, Proposition 8.1, shows that solutions to the system of equations Ax = b are closely related to solutions to the homogeneous system of equations Ax = 0.³ We can see this relationship in the setting of a single equation in 2 unknowns.

The solutions to the equation 2x + 3y = 0 are points on a line in the plane R^2. Draw that line: it is the line through the origin with slope −2/3. The origin is the trivial solution. Other points on the line such as (3, −2) are non-trivial solutions. The solutions to the non-homogeneous equation 2x + 3y = 1 are also points on a line in R^2. Draw it. This line has slope −2/3 too. It is parallel to the line 2x + 3y = 0 but doesn't go through the origin.

Pick any point on the second line, i.e., any solution to 2x + 3y = 1. Let's pick (−1, 1). Now take a solution to the first equation, say (3, −2), and add

³Students seem to have a hard time understanding Proposition 8.1. Section 8.3 discusses an analogy to Proposition 8.1 that will be familiar to you from your calculus courses.


it to (−1, 1) to obtain (2, −1). The point (2, −1) is on the line 2x + 3y = 1. In fact, if we take any solution to 2x + 3y = 0, say (a, b), and add it to (−1, 1) we obtain a solution to 2x + 3y = 1 because

(a, b) + (−1, 1) = (a − 1, b + 1)

and

2(a − 1) + 3(b + 1) = 2a + 3b − 2 + 3 = 0 + 1 = 1.

Every solution to 2x + 3y = 1 can be obtained this way. That is what Proposition 8.1 says, though in much greater generality.

There was nothing special about our choice of (−1, 1). Everything would work in the same way if we had chosen (5, −3) for example. Check it out!

The line 2x + 3y = 1 is obtained by translating (moving) the line 2x + 3y = 0 to the parallel line through (−1, 1). Alternatively, the line 2x + 3y = 1 is obtained by translating the line 2x + 3y = 0 to the parallel line through (5, −3). The picture in chapter 3 might increase your understanding of this important point (though a different equation is used there). The discussion in chapter 3 might also help.

If I had taken 2x + 3y = −3 instead of 2x + 3y = 1 the same sort of thing happens. The solutions to 2x + 3y = −3 lie on a line parallel to 2x + 3y = 0. In fact, every line parallel to 2x + 3y = 0 is the set of solutions to 2x + 3y = c for a suitable c.

Proposition 8.1. Suppose u is a solution to the equation Ax = b. Then the set of all solutions to Ax = b consists of all vectors u + v in which v ranges over all solutions to the homogeneous equation Ax = 0.

Proof. Fix a solution u to the equation Ax = b; i.e., Au = b. If u′ is a solution to Ax = b, then u′ − u is a solution to the homogeneous equation Ax = 0, and u′ = u + (u′ − u).

Conversely, if Av = 0, i.e., if v is a solution to the homogeneous equation Ax = 0, then A(u + v) = Au + Av = b + 0 = b, so u + v is a solution to Ax = b. □
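Here is the proposition in action for the equation 2x + 3y = 1 discussed above (a numpy sketch; u is the particular solution (−1, 1) and v the homogeneous solution (3, −2)):

    import numpy as np

    A = np.array([[2., 3.]])        # the 1 x 2 matrix of the equation 2x + 3y = ...
    b = np.array([1.])
    u = np.array([-1., 1.])         # a particular solution: 2(-1) + 3(1) = 1
    v = np.array([3., -2.])         # a homogeneous solution: 2(3) + 3(-2) = 0

    for lam in (0., 1., -2.5, 7.):
        assert np.allclose(A @ (u + lam * v), b)    # u + lam*v solves Ax = b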

As we shall later see, the set of solutions to a system of homogeneous linear equations is a subspace. Subspaces are fundamental objects in linear algebra. You should compare Proposition 8.1 with the statement of Proposition 3.1. The two results say the same thing in slightly different words.

Corollary 8.2. Suppose the equation Ax = b has a solution. Then Ax = b has a unique solution if and only if Ax = 0 has a unique solution.

Proof. Let u be a solution to the equation Ax = b; i.e., Au = b. Let v ∈ R^n. Then A(u + v) = b + Av, so u + v is a solution to the equation Ax = b if and only if v is a solution to the equation Ax = 0. □

You might find it easier to understand what Proposition 8.1 is telling you by looking at it in conjunction with chapter 7.1.


8.3. An analogy for understanding Proposition 8.1. You probably know that every solution to the differential equation dy/dx = 2x is of the form x^2 + c where c is an arbitrary constant. One particular solution to the equation dy/dx = 2x is x^2. Now consider the equation dy/dx = 0; the solutions to this are the constant functions c, c ∈ R. Thus every solution to dy/dx = 2x is obtained by adding the particular solution x^2 to a solution to dy/dx = 0.

Compare the previous paragraph with the statement of Proposition 8.1: 2x is playing the role of b, x^2 is playing the role of u, and c is playing the role of v.

In Proposition 8.1 it doesn't matter which u we choose; all solutions to Ax = b are given by u + v where v runs over all solutions to Ax = 0.

Similarly, we can take any solution to dy/dx = 2x, x^2 + 17π for example, and still all solutions to dy/dx = 2x are obtained by adding x^2 + 17π to a solution to dy/dx = 0. That is, every solution to dy/dx = 2x is of the form x^2 + 17π + c for some c ∈ R.

8.4. A natural bijection between solutions to Ax = 0 and solutions to Ax = b. A bijection between two sets X and Y is a rule that sets up a one-to-one correspondence between the elements of X and the elements of Y. We sometimes write x ↔ y to denote the fact that an element x in X corresponds to the element y in Y. For example, there is a bijection between the set of all integers and the set of even integers given by n ↔ 2n. There is also a bijection between the even numbers and the odd numbers given by 2m ↔ 2m + 1.

Suppose the equation Ax = b has a solution, say u. Then Proposition 8.1 says there is a bijection between the set of solutions to Ax = b and the set of solutions to Ax = 0 given by

v ←→ u + v.

In other words, v is a solution to Ax = 0 if and only if u + v is a solution to Ax = b. In plainer language, if Av = 0, then A(u + v) = b, and conversely.

This is an arithmetic explanation, and generalization, of what was said in the discussion prior to Proposition 8.1. There

(a, b) ←→ (a, b) + (−1, 1).

As I said before, there is nothing special about our choice of (−1, 1). We could take any point on 2x + 3y = 1. For example, if we pick (5, −3) we obtain another bijection between the solutions to 2x + 3y = 0 and the solutions to 2x + 3y = 1, namely

(a, b) ←→ (a, b) + (5, −3).


CHAPTER 5

Row operations and row equivalence

1. Equivalent systems of equations

Two systems of linear equations are equivalent if they have the same solutions. An important strategy for solving a system of linear equations is to repeatedly perform operations on the system to get simpler but equivalent systems, eventually arriving at an equivalent system that is very easy to solve. The simplest systems to solve are those in row-reduced echelon form. We discuss those in §2 below. In this chapter we discuss the three "row operations" you can perform on a system to obtain a simpler but equivalent system.

We will write R_i to denote the i^th row in a matrix. The three elementary row operations on a matrix are the following:

(1) switch the positions of R_i and R_j;
(2) replace R_i by a non-zero multiple of R_i;
(3) replace R_i by R_i + cR_j for any number c (here i ≠ j).

If A′ can be obtained from A by a single row operation, then A can be obtained from A′ by a single row operation. We say that two matrices are row equivalent if one can be obtained from the other by a sequence of elementary row operations.

Proposition 1.1. Two m × n systems of linear equations Ax = b and A′x = b′ are equivalent if and only if the augmented matrices (A|b) and (A′|b′) are row equivalent.

You might take a look back at chapter 11 to better understand the elementary row operations.

1.1. Exercise. Find conditions on a, b, c, d such that the matrices

\begin{pmatrix} a & b \\ c & d \end{pmatrix} and \begin{pmatrix} 1 & -3 \\ 0 & 0 \end{pmatrix}

are row equivalent. (Use the fact that two matrices are row equivalent if and only if the corresponding systems of equations have the same sets of solutions.)

2. Echelon Form

A matrix E is in row echelon form (REF) if

(1) all rows consisting entirely of zeroes are at the bottom of E; and


(2) in each non-zero row the left-most non-zero entry, called the leading entry of that row, is to the right of the leading entry in the row above it.

A matrix E is in row-reduced echelon form (RREF) if

(1) it is in row echelon form, and
(2) every leading entry is a 1, and
(3) every column that contains a leading entry has all its other entries equal to zero.

For example,

\begin{pmatrix} 0 & 1 & 0 & 2 & 1 & 0 & 0 & 3 \\ 0 & 0 & 1 & 3 & 1 & 0 & 0 & 4 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}

is in RREF.

Theorem 2.1. Given a matrix A, there is a unique row-reduced echelon matrix that is row equivalent to A.

Proof. □

The uniqueness result in Theorem 2.1 allows us to speak of the row-reduced echelon matrix that is equivalent to A. We will denote it by

rref(A).

2.1. Reminder. It is important to know there are infinitely many sequences of elementary row operations that can get you from A to rref(A).

In an exam, if you are given a specific matrix A and asked to put A in row-reduced echelon form there is a unique matrix, rref(A), that you must produce, but the row operations that one student uses to get from A to rref(A) will probably not be the same as the row operations used by another student to get from A to rref(A). There is no best way to proceed in doing this and therefore no guiding rule I can give you to do this. Experience and practice will teach you how to minimize the number and complexity of the operations and how to keep the intermediate matrices reasonably "nice". I often don't swap the order of the rows until the very end. I begin by trying to get zeroes in the columns, i.e., all zeroes in some columns, and only one non-zero entry in other columns, and I don't care which order of columns I do this for.

3. An example

Doing elementary row operations by hand to put a matrix in row-reduced echelon form can be a pretty tedious task. It is an ideal task for a computer! But, let's take a deep breath and plunge in. Suppose the system is

MORE TO DO
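Since this really is an ideal task for a computer, here is a sketch using sympy's built-in rref (applied, for concreteness, to the augmented matrix of the system solved by hand in §4 below):

    import sympy as sp

    M = sp.Matrix([[1, 1, 1, 3],
                   [2, 1, -1, 4],
                   [4, 3, -2, 13]])     # the augmented matrix (A | b) from section 4
    R, pivots = M.rref()
    print(R)        # Matrix([[1, 0, 0, -1], [0, 1, 0, 5], [0, 0, 1, -1]])
    print(pivots)   # (0, 1, 2) -- columns (counted from 0) containing leading 1s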


4. You already did this in high school

You probably had to solve systems of equations like

x_1 + x_2 + x_3 = 3
2x_1 + x_2 − x_3 = 4
4x_1 + 3x_2 − 2x_3 = 13

in high school. (This is the system in chapter 5.8.) I want to persuade you that the approach you used then is pretty much identical to the method we are using here, i.e., putting the matrix (A | b) in row-reduced echelon form.

A typical high school approach to the problem is to add the first two equations (to eliminate the x_3 variable) and to add twice the first equation to the third equation to get another equation without an x_3 term. The two new equations are

3x_1 + 2x_2 = 7
6x_1 + 5x_2 = 19

Now subtract twice the first of these two equations from the second to get x_2 = 5. Substitute this into 3x_1 + 2x_2 = 7 and get x_1 = −1. Finally, from the first of the original equations, x_1 + x_2 + x_3 = 3, we obtain x_3 = −1. Thus, the system has a unique solution, (x_1, x_2, x_3) = (−1, 5, −1).

Let's rewrite what we did in terms of row operations. The first two operations we performed are, in matrix notation,

(A | b) = \begin{pmatrix} 1 & 1 & 1 & | & 3 \\ 2 & 1 & -1 & | & 4 \\ 4 & 3 & -2 & | & 13 \end{pmatrix} → \begin{pmatrix} 1 & 1 & 1 & | & 3 \\ 3 & 2 & 0 & | & 7 \\ 6 & 5 & 0 & | & 19 \end{pmatrix}.

Then we subtracted twice the second row from the third to get

\begin{pmatrix} 1 & 1 & 1 & | & 3 \\ 3 & 2 & 0 & | & 7 \\ 0 & 1 & 0 & | & 5 \end{pmatrix}.

The last row tells us that x_2 = 5. Above, we substituted x_2 = 5 into 3x_1 + 2x_2 = 7 and solved for x_1. That is the same as subtracting twice the third row from the second, then dividing by three, i.e.,

\begin{pmatrix} 1 & 1 & 1 & | & 3 \\ 3 & 2 & 0 & | & 7 \\ 0 & 1 & 0 & | & 5 \end{pmatrix} → \begin{pmatrix} 1 & 1 & 1 & | & 3 \\ 3 & 0 & 0 & | & -3 \\ 0 & 1 & 0 & | & 5 \end{pmatrix} → \begin{pmatrix} 1 & 1 & 1 & | & 3 \\ 1 & 0 & 0 & | & -1 \\ 0 & 1 & 0 & | & 5 \end{pmatrix}

which gives x_1 = −1. Now subtract the second and third rows from the top row to get

\begin{pmatrix} 1 & 1 & 1 & | & 3 \\ 1 & 0 & 0 & | & -1 \\ 0 & 1 & 0 & | & 5 \end{pmatrix} → \begin{pmatrix} 0 & 0 & 1 & | & -1 \\ 1 & 0 & 0 & | & -1 \\ 0 & 1 & 0 & | & 5 \end{pmatrix}

which gives x_3 = −1. Admittedly the last matrix is not in row-reduced echelon form because the left-most ones do not proceed downwards in the southeast direction, but that is fixed by moving the top row to the bottom (a couple of row swaps), i.e.,

\begin{pmatrix} 0 & 0 & 1 & | & -1 \\ 1 & 0 & 0 & | & -1 \\ 0 & 1 & 0 & | & 5 \end{pmatrix} → \begin{pmatrix} 1 & 0 & 0 & | & -1 \\ 0 & 1 & 0 & | & 5 \\ 0 & 0 & 1 & | & -1 \end{pmatrix}.

5. The rank of a matrix

The rank of an m × n matrix A is the number

rank(A) := the number of non-zero rows in rref(A).

5.1. A very important remark about rank. Later on we will see that the rank of A is equal to the dimension of the range of A. After proving that equality it will be important to keep in mind these two different interpretations of rank(A), i.e., as the number of non-zero rows in rref(A) and as the dimension of the range of A.

5.2. Remark. If rref(A) = rref(B), then rank(A) = rank(B).

Proposition 5.1. If A is an m × n matrix, then

rank(A) ≤ m and rank(A) ≤ n.

Proof. The rank of A is at most the number of rows in rref(A). But rref(A) and A have the same size, so rank(A) ≤ m.

Each non-zero row of rref(A) has a left-most 1 in it, so the rank of A is the number of left-most 1s. But a column of rref(A) contains at most one left-most 1, so the rank of A is at most the number of columns in A, i.e., rank(A) ≤ n. □

Proposition 5.2. Let A be an n × n matrix. Then

(1) rank(A) = n if and only if rref(A) = I, the identity matrix;
(2) if rank(A) < n, then rref(A) has a row of zeroes.

Proof. (1) If rank(A) = n, then rref(A) has n non-zero rows and each of those rows has a left-most 1, i.e., every row of rref(A) contains a left-most 1. Suppose the left-most 1 in row i appears in column c_i. The 1s move at least one column to the right each time we go from each row to the one below it, so

1 ≤ c_1 < c_2 < · · · < c_n ≤ n.

It follows that c_1 = 1, c_2 = 2, . . . , c_n = n. In other words, the left-most 1 in row j appears in column j; i.e., all the left-most 1s lie on the diagonal. Therefore rref(A) = I.

(2) This is clear. □


6. Inconsistent systems

A system of linear equations is inconsistent if it has no solution (section 6). If a row of the form

(6-1) (0 0 · · · 0 | b)

with b ≠ 0 appears after performing some row operations on the augmented matrix (A | b), then the system is inconsistent, because row (6-1) says the equation 0x_1 + · · · + 0x_n = b appears in a system that is equivalent to the original system of equations, and it is utterly clear that there is no solution to the equation 0x_1 + · · · + 0x_n = b when b ≠ 0, and therefore no solution to the original system of equations.

7. Consistent systems

Suppose Ax = b is a consistent m × n system of linear equations.

• Form the augmented matrix (A | b).
• Perform elementary row operations on (A | b) until you obtain (rref(A) | d).
• If column j contains the left-most 1 that appears in some row of rref(A) we call x_j a dependent variable.
• The other x_js are called independent or free variables.
• The number of dependent variables equals rank(A) and, because A has n columns, the number of independent/free variables is n − rank(A).
• All solutions are now obtained by allowing each independent variable to take any value; the dependent variables are then completely determined by the equations corresponding to (rref(A) | d) and the values of the independent/free variables.

An example will clarify my meaning. If

(7-1) (rref(A) | d) = \begin{pmatrix} 1 & 0 & 2 & 0 & 0 & 3 & 4 & | & 1 \\ 0 & 1 & 3 & 0 & 0 & 4 & 0 & | & 3 \\ 0 & 0 & 0 & 1 & 0 & -2 & -1 & | & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & -2 & | & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & | & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & | & 0 \end{pmatrix}


then the dependent variables are x_1, x_2, x_4, and x_5. The independent/free variables x_3, x_6, and x_7 may take any values, and then the dependent variables are determined by the equations

x_1 = 1 − 2x_3 − 3x_6 − 4x_7
x_2 = 3 − 3x_3 − 4x_6
x_3 = x_3
x_4 = 2x_6 + x_7
x_5 = 5 − x_6 + 2x_7
x_6 = x_6
x_7 = x_7

In other words, the set of all solutions to this system is given by

x = \begin{pmatrix} 1 - 2x_3 - 3x_6 - 4x_7 \\ 3 - 3x_3 - 4x_6 \\ x_3 \\ 2x_6 + x_7 \\ 5 - x_6 + 2x_7 \\ x_6 \\ x_7 \end{pmatrix}

or, equivalently,

(7-2) x = \begin{pmatrix} 1 \\ 3 \\ 0 \\ 0 \\ 5 \\ 0 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} -2 \\ -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + x_6 \begin{pmatrix} -3 \\ -4 \\ 0 \\ 2 \\ -1 \\ 1 \\ 0 \end{pmatrix} + x_7 \begin{pmatrix} -4 \\ 0 \\ 0 \\ 1 \\ 2 \\ 0 \\ 1 \end{pmatrix}

as x_3, x_6, x_7 range over all real numbers. We call (7-2) the general solution to the system.¹

7.1. Particular vs. general solutions. If we make a particular choice of x_3, x_6, and x_7, we call the resulting x a particular solution to the system.

¹You will later see that the vectors obtained from the three right-most terms of (7-2), as x_3, x_6, and x_7 vary over all of R, form a subspace, namely the null space of A. Thus (7-2) is telling us that the solutions to Ax = b belong to a translate of the null space of A by a particular solution of the equation, the particular solution being the term immediately to the right of the equal sign in (7-2). See chapter 3 and Proposition 3.1 below for more about this.


For example, when (x_3, x_6, x_7) = (0, 0, 0) we obtain the particular solution

\begin{pmatrix} 1 \\ 3 \\ 0 \\ 0 \\ 5 \\ 0 \\ 0 \end{pmatrix}.

Notice that

\begin{pmatrix} -2 \\ -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} -3 \\ -4 \\ 0 \\ 2 \\ -1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -4 \\ 0 \\ 0 \\ 1 \\ 2 \\ 0 \\ 1 \end{pmatrix}

are solutions to the homogeneous system Ax = 0. The general solution is obtained by taking one particular solution, say u, perhaps that given by setting all the independent variables to zero, and adding to it solutions to the homogeneous equation Ax = 0. That is exactly what Proposition 8.1 of chapter 4 says.
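All of these claims can be checked by machine. A sketch using the non-zero rows of (7-1) (sympy again, my choice of tool):

    import sympy as sp

    R = sp.Matrix([[1, 0, 2, 0, 0, 3, 4],
                   [0, 1, 3, 0, 0, 4, 0],
                   [0, 0, 0, 1, 0, -2, -1],
                   [0, 0, 0, 0, 1, 1, -2]])       # non-zero rows of rref(A)
    d = sp.Matrix([1, 3, 0, 5])

    u = sp.Matrix([1, 3, 0, 0, 5, 0, 0])          # the particular solution
    v3 = sp.Matrix([-2, -3, 1, 0, 0, 0, 0])
    v6 = sp.Matrix([-3, -4, 0, 2, -1, 1, 0])
    v7 = sp.Matrix([-4, 0, 0, 1, 2, 0, 1])

    assert R * u == d
    for v in (v3, v6, v7):
        assert R * v == sp.zeros(4, 1)            # homogeneous solutions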

The number of independent variables, in this case 3, which is n − rank(A), gives a measure of "how many" solutions there are to the equation Ax = b. The fact that x_3, x_6, and x_7 can be any real numbers says there are 3 degrees of freedom for the solution space. Later we will introduce more formal language which will allow us to say that the space of solutions is a translate of a 3-dimensional subspace of R^7. That 3-dimensional subspace is called the null space of A. You will understand this paragraph later. I hope you read it again after you have been introduced to the null space.

The discussion in chapter 3 is also relevant to what we have discussed in this chapter. In the language of chapter 3, the solutions to Ax = b are obtained by translating the solutions to Ax = 0, i.e., the null space of A, by a particular solution to Ax = b.

Sections 8.2 and 8.4 are also related to the issue of particular versus general solutions. Section 8.2 discusses the particular solution (−1, 1) to the equation 2x + 3y = 1 and the general solution (−1, 1) + λ(3, −2) where λ runs over all real numbers. As λ runs over all real numbers the points λ(3, −2) give all the points on the line 2x + 3y = 0.

8. Parametric equations for lines and planes

You should review the material about parametric equations for curves and surfaces that you encountered in a course prior to this. The only curves and surfaces relevant to this course are lines and planes. These have particularly simple parametric descriptions.


For example, the line 2x − 3y = 1 in the plane consists of all points of the form (3t + 2, 2t + 1), t ∈ R. Another way of saying this is that the line is given by the equations

x = 3t + 2
y = 2t + 1.

In other words, t is allowed to be any real number, each choice of t gives exactly one point in the plane, and the totality of all points obtained is exactly all points that lie on the line 2x − 3y = 1. Another way of looking at this is that the points in the plane that are solutions to the equation 2x − 3y = 1 are the points on the line. For example, when t = 0 we get the solution x = 2 and y = 1 or, equivalently, the point (2, 1); when t = −5 we get the solution (x, y) = (−13, −9); when t = 4 we get the solution (x, y) = (14, 9); and so on.

In a similar fashion, the points on the plane 2x − y − z = 3 are given by the parametric equations

x = (1/2)(3 + s + t)
y = s
z = t

or, if you prefer,

x = 3/2 + s + t
y = 2s
z = 2t.

As (s, t) ranges over all points of R^2, i.e., as s and t range over R, the points (3/2 + s + t, 2s, 2t) are all points on the plane 2x − y − z = 3.

In a similar fashion, the points on the line that is the intersection of the planes 2x − y − z = 3 and x + y + z = 4 are given by the parametric equations

x = 7/3
y = 5/3 − t
z = t.

(Adding the two equations gives 3x = 7, so x = 7/3; then y + z = 4 − 7/3 = 5/3, and we may take z = t as the parameter.)

9. The importance of rank

We continue to discuss a consistent m × n system Ax = b.

• We have seen that rank(A) ≤ m and rank(A) ≤ n (Proposition 5.1) and that the number of independent variables is n − rank(A).
• If rank(A) = n there are no independent variables, so the system has a unique solution.
• If rank(A) < n there is at least one independent variable, so the system has infinitely many solutions.


Theorem 9.1. Let A be an n × n matrix and consider the equation Ax = b. If rank(A) = n, and the elementary row operations on the augmented matrix (A | b) produce (rref(A) | d) = (I | d) where I is the identity matrix, then d is the unique matrix such that Ad = b.

Proof. By Proposition 5.2, rref(A) = I. Hence the elementary row operations change (A | b) to (I | d) where I is the identity matrix. Therefore the equations Ax = b and Ix = d have the same set of solutions. But Ix = x, so the only solution to Ix = d is x = d. □

Proposition 9.2. Let Ax = 0 be an m × n system of homogeneous linear equations. If m < n, the system has a non-trivial solution.

Proof. By Proposition 5.1, rank(A) ≤ m. The number of independent variables for the equation Ax = 0 is n − rank(A) ≥ n − m ≥ 1. Hence there are infinitely many solutions by the remarks at the start of this section. □

A shorter, simpler version of Proposition 9.2 is the following.

Corollary 9.3. A homogeneous system of equations with more unknowns than equations always has a non-trivial solution.

9.1. A simple proof of Corollary 9.3. Let Ax = 0 be a homogeneous system of m equations in n unknowns and suppose m < n. Since rref(A) has more columns than rows there will be a column that does not have a leading 1. Hence there will be at least one independent variable. That variable can be anything; in particular, it can be given a non-zero value, which leads to a non-trivial solution.

10. The word “solution”

A midterm contained the following question: Suppose u is a solution to the equation Ax = b and that v is a solution to the equation Ax = 0. Write out a single equation, involving several steps, that shows u + v is a solution to the equation Ax = b.

The correct answer is

A(u + v) = Au + Av = b + 0 = b.

Done! I talked to some students who could not answer the question. They did not understand the meaning of the phrase "u is a solution to the equation Ax = b". The meaning is that Au = b. In principle, it is no more complicated than the statement "2 is a solution to the equation x^2 + 3x = 10", which means that when you plug in 2 for x you obtain the true statement 4 + 6 = 10.


11. Elementary matrices

The n × n identity matrix, which is denoted by I_n, or just I when n is clear from the context, is the n × n matrix having each entry on its diagonal equal to 1 and 0 elsewhere. For example, the 4 × 4 identity matrix is

\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

An elementary matrix is a matrix that is obtained by performing a single elementary row operation on an identity matrix. For example, if we switch rows 2 and 4 of I_4 we obtain the elementary matrix

\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.

If we multiply the third row of I_4 by a non-zero number c we get the elementary matrix

\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.

If we replace the first row of I_4 by itself + c times the second row we get the elementary matrix

\begin{pmatrix} 1 & c & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.

Theorem 11.1. Let E be the elementary matrix obtained by performing an elementary row operation on the m × m identity matrix. If we perform the same elementary row operation on an m × n matrix A, the resulting matrix is equal to EA.

We won't prove this. The proof is easy but looks a little technical because of the notation. Let's convince ourselves that the theorem is plausible by looking at the three elementary matrices above and the matrix

A = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \\ 1 & 1 & 1 \\ -1 & 0 & 2 \end{pmatrix}.


Now carry out the following three multiplications and check your answers:

\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \\ 1 & 1 & 1 \\ -1 & 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ -1 & 0 & 2 \\ 1 & 1 & 1 \\ 3 & 2 & 1 \end{pmatrix};

and

\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \\ 1 & 1 & 1 \\ -1 & 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \\ c & c & c \\ -1 & 0 & 2 \end{pmatrix};

and

\begin{pmatrix} 1 & c & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \\ 1 & 1 & 1 \\ -1 & 0 & 2 \end{pmatrix} = \begin{pmatrix} 1+3c & 2+2c & 3+c \\ 3 & 2 & 1 \\ 1 & 1 & 1 \\ -1 & 0 & 2 \end{pmatrix}.

Theorem 11.2. The following conditions on an n × n matrix A are equivalent:

(1) rref(A) = I_n;
(2) A is a product of elementary matrices;
(3) A is invertible.

Proof. (1) ⟹ (2) Suppose rref(A) = I_n. Then I_n can be obtained from A by a sequence of elementary row operations. Therefore A can be obtained from I_n by a sequence of elementary row operations. By Theorem 11.1, the result of each of those elementary row operations is the same as the result obtained by multiplying on the left by an elementary matrix. Therefore a sequence of elementary row operations on I_n produces a matrix of the form E_k E_{k−1} · · · E_1, where each E_j is an elementary matrix, i.e., A = E_k E_{k−1} · · · E_1 for a suitable collection of elementary matrices E_1, . . . , E_k.

(2) ⟹ (3)
(3) ⟹ (1) □


CHAPTER 6

The Vector space R^n

We write R^n for the set of all n × 1 column vectors. Elements of R^n are called vectors or points in R^n. We call them points to encourage you to think of R^n as a geometric object, a space, in the same sort of way as you think of the number line R (the case n = 1), the plane R^2, in which the points are labelled by row vectors (a, b) after choosing a coordinate system, and 3-space R^3, the space in which we live or, when you include time, R^4.

Solutions to an m × n system Ax = b of linear equations are therefore points in R^n. The totality of all solutions is a certain subset of R^n. As we said just after Proposition 6.1, the set of solutions has a linear nature: if p and q are solutions to Ax = b so are all points on the line pq.

1. Arithmetic in R^n

Because the elements of R^n are matrices, we can add them to one another, and we can multiply them by scalars (numbers). These operations have the following properties, which we have been using for some time now:

(1) if u, v ∈ R^n so is u + v;
(2) if u ∈ R^n and λ ∈ R, then λu ∈ R^n;
(3) u + v = v + u for all u, v ∈ R^n;
(4) (u + v) + w = u + (v + w) for all u, v, w ∈ R^n;
(5) there is an element 0 ∈ R^n having the property that 0 + u = u for all u ∈ R^n;¹
(6) if u ∈ R^n there is an element u′ ∈ R^n such that u + u′ = 0;²
(7) (αβ)u = α(βu) for all α, β ∈ R and all u ∈ R^n;
(8) λ(u + v) = λu + λv for all λ ∈ R and all u, v ∈ R^n;
(9) (α + β)u = αu + βu for all α, β ∈ R and all u ∈ R^n;
(10) 1·u = u for all u ∈ R^n.

¹It is important to observe that there is only one element with this property: if 0′ ∈ R^n has the property that 0′ + u = u for all u ∈ R^n, then 0 = 0′ + 0 = 0 + 0′ = 0′, where the first equality holds because 0′ + u = u for all u, the second equality holds because of the commutativity property in (3), and the third equality holds because 0 + u = u for all u.

²We denote u′ by −u after showing that it is unique. It is unique because if u + u′ = u + u′′ = 0, then

u′ = u′ + 0 = u′ + (u + u′′) = (u′ + u) + u′′ = (u + u′) + u′′ = 0 + u′′ = u′′.


2. The standard basis for R^n

The vectors

e_1 := (1, 0, . . . , 0, 0),
e_2 := (0, 1, 0, . . . , 0),
  ⋮
e_n := (0, 0, . . . , 0, 1)

in R^n will appear often from now on. We call {e_1, . . . , e_n} the standard basis for R^n. The notion of a basis is defined and discussed in chapter 10 below. From now on the symbols

e_1, . . . , e_n

will be reserved for the special vectors just defined. There is some potential for ambiguity since we will use e_1 to denote the vector (1, 0) in R^2, the vector (1, 0, 0) in R^3, the vector (1, 0, 0, 0) in R^4, etc. Likewise for the other e_is. It will usually be clear from the context which one we mean.

One reason for the importance of the e_js is that

(a_1, . . . , a_n) = a_1 e_1 + a_2 e_2 + · · · + a_n e_n.

In other words, every vector in R^n can be written as a sum of multiples of the e_js in a unique way.

3. Linear combinations and linear span

A linear combination of vectors v_1, . . . , v_n in R^m is a vector that can be written as

a_1 v_1 + · · · + a_n v_n

for some numbers a_1, . . . , a_n ∈ R. We say the linear combination is non-trivial if at least one of the a_is is non-zero.

The set of all linear combinations of v_1, . . . , v_n is called the linear span of v_1, . . . , v_n and denoted

Sp(v_1, . . . , v_n) or 〈v_1, . . . , v_n〉.

For example, the linear span of e_1, . . . , e_n is R^n itself: if (a_1, a_2, . . . , a_n) is any vector in R^n, then

(a_1, a_2, . . . , a_n) = a_1 e_1 + a_2 e_2 + · · · + a_n e_n.

3.1. Note that the linear span of v_1, . . . , v_n has the following properties:

(1) 0 ∈ Sp(v_1, . . . , v_n);
(2) if u and v are in Sp(v_1, . . . , v_n) so is u + v;
(3) if a ∈ R and v ∈ Sp(v_1, . . . , v_n) so is av.

You should check these three properties hold. We will return to them later on – they are the defining properties of a subspace of R^n (see §1 of chapter 7).


Proposition 3.1. Let A be any m × n matrix and x ∈ R^n. Then Ax is a linear combination of the columns of A. Explicitly,

(3-1) Ax = x_1 A_1 + · · · + x_n A_n

where A_j denotes the j^th column of A.

Proof. Let's write A = (a_{ij}) and x = (x_1, . . . , x_n)^T. Then

x_1 A_1 + · · · + x_n A_n = x_1 \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix} + x_2 \begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{pmatrix} + · · · + x_n \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix}

= \begin{pmatrix} a_{11}x_1 \\ a_{21}x_1 \\ \vdots \\ a_{m1}x_1 \end{pmatrix} + \begin{pmatrix} a_{12}x_2 \\ a_{22}x_2 \\ \vdots \\ a_{m2}x_2 \end{pmatrix} + · · · + \begin{pmatrix} a_{1n}x_n \\ a_{2n}x_n \\ \vdots \\ a_{mn}x_n \end{pmatrix}

= \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + · · · + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + · · · + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + · · · + a_{mn}x_n \end{pmatrix}

= \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}

= Ax.

The above calculation made use of scalar multiplication (6.11): if r ∈ R and B is a matrix, then rB is the matrix whose ij^th entry is rB_{ij}; since r and B_{ij} are numbers, rB_{ij} = B_{ij}r. □

A useful way of stating Proposition 3.1 is that, for a fixed A, the set of values Ax takes as x ranges over all of R^n is Sp(A_1, . . . , A_n), the linear span of the columns of A. Therefore, if b ∈ R^m, the equation Ax = b has a solution if and only if b ∈ Sp(A_1, . . . , A_n). This is important enough to state as a separate result.

Corollary 3.2. Let A be an m × n matrix and b ∈ R^m. The equation Ax = b has a solution if and only if b is a linear combination of the columns of A.
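One consequence (an observation, not something the notes prove here): Ax = b is solvable exactly when adjoining b as an extra column does not increase the rank of the matrix. A numpy sketch, with the matrix and right-hand sides from Examples 5.9 and 5.10 of chapter 4:

    import numpy as np

    A = np.array([[1., 1., 1.],
                  [2., 1., -1.],
                  [3., 1., -3.]])

    def solvable(A, b):
        """Ax = b has a solution iff b is in the span of the columns of A."""
        return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

    print(solvable(A, np.array([3., 4., 5.])))   # True  (the system with infinitely many solutions)
    print(solvable(A, np.array([3., 4., 0.])))   # False (the inconsistent system)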


4. Some infinite dimensional vector spaces

You already know some infinite dimensional vector spaces. Perhaps the most familiar one is the set of all polynomials with real coefficients. We denote it by R[x].

4.1. Polynomials. At high school one considers polynomial functions of arbitrary degree. If f(x) and g(x) are polynomials so is their sum f(x) + g(x); if λ ∈ R the function λf(x) is a polynomial whenever f is; the zero polynomial has the wonderful property that f(x) + 0 = 0 + f(x) = f(x). Of course, other properties like associativity and commutativity of addition hold, and the distributive rules λ(f(x) + g(x)) = λf(x) + λg(x) and (λ + µ)f(x) = λf(x) + µf(x) hold. Every polynomial is a finite linear combination of the powers 1, x, x^2, . . . of x. The powers of x are linearly independent because the only way a_0 + a_1 x + · · · + a_n x^n can be the zero polynomial is if all the a_is are zero. The powers of x form a basis for the space of polynomials. Thus R[x] is an infinite dimensional vector space.

4.2. Other spaces of functions. One need not restrict attention topolynomials. Let C1(R) denote the set of all differentiable functions f :R → R. This is also a vector space: you can add ’em, multiply ’em byscalars, etc, and the usual associative, commutative and distributive rulesapply. Likewise, C(R), the set of all continuous functions f : R → R is avector space.

MORE to say...

4.3. Infinite sequences. The set of all infinite sequences s = (s0, s1, . . .)of real numbers is a vector space with respect to the following addition andscalar multiplication:

(s0, s1, . . .)+(t0, t1, . . .) = (s0+t0, s1+t1, . . .), λ(t0, t1, . . .) = (λt0, λt1, . . .).


CHAPTER 7

Subspaces

Let W denote the set of solutions to a system of homogeneous equations Ax = 0. It is easy to see that W has the following properties: (a) 0 ∈ W; (b) if u and v are in W so is u + v; (c) if u ∈ W and λ ∈ R, then λu ∈ W. Subsets of Rn having these properties are called subspaces and are of particular importance in all aspects of linear algebra. It is also important to know that every subspace is the set of solutions to some system of homogeneous equations.

Another important way in which subspaces turn up is that the linear span of a set of vectors is a subspace.

Now we turn to the official definition.

1. The definition and examples

A subset W of Rn is called a subspace if

(1) 0 ∈ W, and
(2) u + v ∈ W for all u, v ∈ W, and
(3) λu ∈ W for all λ ∈ R and all u ∈ W.

The book gives a different definition but then proves that a subset W of Rn is a subspace if and only if it satisfies these three conditions. We will use our definition, not the book's.

We now give numerous examples.

1.1. The zero subspace. The set {0} consisting of the zero vector in Rn is a subspace of Rn. We call it the zero subspace. It is also common to call it the trivial subspace.

1.2. Rn itself is a subspace. This is clear.

1.3. The null space and range of a matrix. Let A be an m × n matrix. The null space of A is

N (A) := {x | Ax = 0}.

The range of A is

R(A) := {Ax | x ∈ Rn}.

Proposition 1.1. Let A be an m × n matrix. Then N (A) is a subspace of Rn and R(A) is a subspace of Rm.


Proof. First, we consider N (A). Certainly 0 ∈ N (A) because A0 = 0. Now suppose that u and v belong to N (A) and that a ∈ R. Then

A(u+ v) = Au+Av = 0 + 0 = 0

and

A(au) = aAu = a0 = 0

so both u + v and au belong to N (A). Hence N (A) is a subspace of Rn.

Now consider R(A). Certainly 0 ∈ R(A) because A0 = 0. Now suppose that u and v belong to R(A) and that a ∈ R. Then there exist x and y in Rn such that u = Ax and v = Ay. Therefore

u+ v = Ax+Ay = A(x+ y)

and

au = aAx = A(ax)

so both u+ v and au belong to R(A). Hence R(A) is a subspace of Rm. �

It is sometimes helpful to think of R(A) as all "multiples" of A, where a "multiple of A" means a vector of the form Ax. It is clear that a multiple of AB is a multiple of A; explicitly, ABx = A(Bx). It follows that

R(AB) ⊆ R(A).

The following analogy might be helpful: every multiple of 6 is a multiple of 2 because 6 = 2 × 3. In set-theoretic notation

{6x | x ∈ Z} ⊆ {2x | x ∈ Z}.

Similarly,

{ABx | x ∈ Rn} ⊆ {Ax | x ∈ Rn}

which is the statement that R(AB) ⊆ R(A).

There is a similar inclusion for null spaces. If Bx = 0, then ABx = 0, so

N (B) ⊆ N (AB).
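These inclusions are easy to test numerically. Here is a small Python sketch (assuming numpy; the matrices are made-up examples):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(3, 4)).astype(float)   # an arbitrary 3x4 matrix
    B = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [0.0, 0.0],
                  [1.0, 2.0]])                           # 4x2, with dependent columns

    # R(AB) ⊆ R(A): adjoining the columns of AB to A never increases the rank.
    print(np.linalg.matrix_rank(np.hstack([A, A @ B])) == np.linalg.matrix_rank(A))

    # N(B) ⊆ N(AB): a vector x with Bx = 0 also satisfies (AB)x = 0.
    x = np.array([2.0, -1.0])            # Bx = 0 by construction
    print(np.allclose(B @ x, 0), np.allclose((A @ B) @ x, 0))   # True True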

1.4. Lines through the origin are subspaces. Let u ∈ Rn. We introduce the notation

Ru := {λu | λ ∈ R}

for the set of all real multiples of u. We call Ru a line if u ≠ 0. If u = 0, then Ru is the zero subspace.

Lemma 1.2. Let u ∈ Rn. Then Ru is a subspace of Rn.

Proof. The zero vector is a multiple of u so is in Ru. A sum of two multiples of u is again a multiple of u, and a multiple of a multiple of u is again a multiple of u. �


1.5. Planes through the origin are subspaces. Suppose that {u, v} is a linearly independent subset of Rn. We introduce the notation

Ru+ Rv := {au+ bv | a, b ∈ R}.

We call Ru+ Rv the plane through u and v.

Lemma 1.3. Ru+ Rv is a subspace of Rn.

Proof. Exercise. It is also a special case of Proposition 1.4 below. �

You already know something about the geometry of R3. For example, if you are given three points in R3 that do not lie on a single line, then there is a unique plane containing them. In the previous paragraph the hypothesis that {u, v} is linearly independent implies that 0, u, and v do not lie on a single line. (In fact, the linear independence is equivalent to that statement. Why?) The unique plane containing 0, u, and v is Ru + Rv.

You also know, in R3, that given two lines meeting at a point, there is a unique plane that contains those lines. Here, the lines Ru and Rv meet at 0, and Ru + Rv is the unique plane containing them.

1.6. The linear span of any set of vectors is a subspace.

Proposition 1.4. If v1, . . . , vd are vectors in Rn, then their linear span Sp(v1, . . . , vd) is a subspace of Rn. It is the smallest subspace of Rn containing v1, . . . , vd.

Proof. We leave the reader to show that Sp(v1, . . . , vd) is a subspace of Rn. That is a straightforward exercise that you should do in order to check that you not only understand the meanings of linear combination, linear span, and subspace, but can put those ideas together in a simple proof.

Suppose we have shown that Sp(v1, . . . , vd) is a subspace of Rn. It certainly contains v1, . . . , vd. However, if V is a subspace of Rn, it contains all linear combinations of any set of elements in it. Thus if V is a subspace containing v1, . . . , vd it contains all linear combinations of those vectors, i.e., V contains Sp(v1, . . . , vd). Hence Sp(v1, . . . , vd) is the smallest subspace of Rn containing v1, . . . , vd. �

Proposition 1.5. The range of a matrix is equal to the linear span of its columns.1

1This result can be viewed as a restatement of Corollary 3.2, which says that Ax = b has a solution if and only if b is a linear combination of the columns of A, because the range of an m × n matrix can be described as

R(A) = {b ∈ Rm | b = Ax for some x ∈ Rn}
     = {b ∈ Rm | the equation b = Ax has a solution}.


Proof. Let A be an m×n matrix with columns A1, . . . , An. If x ∈ Rn, then

Ax = x1A1 + · · ·+ xnAn

so

R(A) = {x1A1 + · · · + xnAn | x1, . . . , xn ∈ R}.

But the right hand side is the set of all linear combinations of the columns of A, i.e., the linear span of those columns. �

2. The row and column spaces of a matrix

The row space of a matrix A is the linear span of its rows. The column space of a matrix A is the linear span of its columns.

Lemma 2.1. If A and B are equivalent matrices they have the same row space.

Proof. Since equivalent matrices are obtained from one another by a sequence of elementary row operations, it suffices to show that the row space does not change when we perform a single elementary row operation on A.

Let R1, . . . , Rm denote the rows of A. If Ri and Rj are switched the row space certainly stays the same. If Ri is replaced by λRi for any non-zero number λ the row space stays the same. Since aRi + bRj = a(Ri + λRj) + (b − λa)Rj, the row space does not change when Ri is replaced by Ri + λRj. �

2.1. Sums of subspaces are subspaces. If S and T are subsets of Rn, their sum is the set

S + T := {u + v | u ∈ S and v ∈ T}.

Lemma 2.2. If U and V are subspaces of Rn so is U + V .

Proof. Since U and V are subspaces, each contains 0. Hence 0 + 0 ∈ U + V; i.e., U + V contains 0.

If x, x′ ∈ U + V, then x = u + v and x′ = u′ + v′ for some u, u′ ∈ U and v, v′ ∈ V. Using the fact that U and V are subspaces, it follows that

x+ x′ = (u+ v) + (u′ + v′) = (u+ u′) + (v + v′) ∈ U + V

and, if a ∈ R, then

ax = a(u+ v) = au+ av ∈ U + V.

Hence U + V is a subspace. �

The result can be extended to sums of more than two subspaces using an induction argument. You'll see the idea by looking at the case of three subspaces U, V, and W, say. First, we define

U + V + W = {u + v + w | u ∈ U, v ∈ V, and w ∈ W}.

It is clear that U + V + W = (U + V ) + W. By the lemma, U + V is a subspace. Now we may apply the lemma to the subspaces U + V and W;


the lemma then tells us that (U + V ) + W is a subspace. Hence U + V + W is a subspace.

As an example of the sum of many subspaces, notice that Sp(v1, . . . , vd) = Rv1 + · · · + Rvd so repeated application of the lemma tells us that the linear span of any set of vectors is a subspace. This gives an alternative proof of Proposition 1.4.

2.2. Intersections of subspaces are subspaces.

Lemma 2.3. If U and V are subspaces of Rn so is U ∩ V .

Proof. Because U and V are subspaces each of them contains the zero vector. Hence 0 ∈ U ∩ V. Now suppose u and v belong to U ∩ V, and that a ∈ R. Because u and v belong to the subspace U, so does u + v; similarly, u and v belong to V so u + v ∈ V. Hence u + v ∈ U ∩ V. Finally, if w ∈ U ∩ V and a ∈ R, then aw ∈ U because U is a subspace and aw ∈ V because V is a subspace, so aw ∈ U ∩ V. �

2.3. Planes through the origin are subspaces. If the set {u, v} is linearly independent we call the linear span of u and v a plane.

3. Lines, planes, and translations of subspaces

Let m and c be non-zero real numbers. Let ` be the line y = mx + c and L the line y = mx. Both lines lie in R2. Now L is a subspace but ` is not because it does not contain (0, 0). However, ` is parallel to L, and can be obtained by sliding L sideways, not changing the slope, so that it passes through (0, c). We call the line y = mx + c a translation of the line y = mx; i.e., ` is a translation of the subspace L. The picture below is an example in which

` is the line y = −(1/2)(x − 3) and L is the line y = −(1/2)x.

[Figure: the line ` and the parallel line L through the origin in the plane.]


Returning to the general case, every point on ` is the sum of (0, c) and a point on the line y = mx so, if we write L for the subspace y = mx, then

` = (0, c) + L
  = {(0, c) + v | v ∈ L}
  = {(0, c) + (x, mx) | x ∈ R}
  = (0, c) + R(1, m).

Let V be a subspace of Rn and z ∈ Rn. We call the subset

z + V := {z + v | v ∈ V }

the translation of V by z. We also call z + V a translate of V .

Proposition 3.1. Let S be the set of solutions to the equation Ax = b and let V be the set of solutions to the homogeneous equation Ax = 0. Then S is a translate of V: if z is any solution to Ax = b, then S = z + V.

Proof. Let x ∈ Rn. Then A(z + x) = b + Ax, so z + x ∈ S if and only if x ∈ V. The result follows. �

Now would be a good time to re-read chapter 5: that chapter talks about obtaining solutions to an equation Ax = b by translating N (A) by a particular solution to the equation Ax = b. In fact, Proposition 3.1 and Proposition 8.1 really say the same thing in slightly different languages, and the example in chapter 5 illustrates what these results mean in a concrete situation.

3.1. The linear geometry of R3. This topic should already be familiar.

Skew lines, parallel planes, intersection of line and plane, parallel lines.

For example, find an equation for the subspace of R3 spanned by (1, 1, 1) and (1, 2, 3). Since the vectors are linearly independent, that subspace will be a plane with basis {(1, 1, 1), (1, 2, 3)}. A plane in R3 is given by an equation of the form

ax + by + cz = d

and is a subspace if and only if d = 0. Thus we want to find a, b, c ∈ R such that ax + by + cz = 0 is satisfied by (1, 1, 1) and (1, 2, 3). More concretely, we want to find a, b, c ∈ R such that

a + b + c = 0

a + 2b + 3c = 0.

This is done in the usual way:

\left(\begin{array}{ccc|c} 1 & 1 & 1 & 0\\ 1 & 2 & 3 & 0 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 1 & 1 & 0\\ 0 & 1 & 2 & 0 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & -1 & 0\\ 0 & 1 & 2 & 0 \end{array}\right).


Thus c is an independent variable and a = c and b = −2c. We just want a particular solution so take c = 1, a = 1, and b = −2. The subspace we seek is given by the equation

x − 2y + z = 0.

The fact that c can be anything is just saying that this plane can also be given by non-zero multiples of the equation x − 2y + z = 0. For example, 2x − 4y + 2z = 0 is another equation for the same plane. And so on.
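As an aside for readers with a computer handy: a coefficient vector (a, b, c) for a plane through the origin in R3 must be orthogonal to both spanning vectors, so the cross product produces one. A short numpy sketch of the example above:

    import numpy as np

    u = np.array([1, 1, 1])
    v = np.array([1, 2, 3])

    n = np.cross(u, v)        # orthogonal to both u and v
    print(n)                  # [ 1 -2  1]  i.e. the equation x - 2y + z = 0
    print(n @ u, n @ v)       # 0 0: both spanning vectors satisfy it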

4. Linguistic difficulties: algebra vs. geometry

A midterm question asked students to find all points on the plane

x1 − 2x2 + 3x3 − 4x4 = 0

x1 − x2 − x3 + x4 = 0

in R4. I was surprised how many students found this a hard problem. But I suspect they could have answered the question "find all solutions to the system of equations

x1 − 2x2 + 3x3 − 4x4 = 0

x1 − x2 − x3 + x4 = 0.

The two problems are identical! The students' problem was linguistic, not mathematical. The first problem sounds geometric, the second algebraic. A first course on linear algebra contains many new words and ideas. To understand linear algebra you must understand the connections and relationships between the new words and ideas. Part of this involves the interaction between algebra and geometry. This interaction is a fundamental aspect of mathematics. You were introduced to it in high school. There you learned that the solutions to an equation like y = 3x − 2 "are" the points on a line in R2; that the solutions to the equation 2x^2 + 3y^2 = 4 "are" the points on an ellipse; and so on. Translating between the two questions at the start of this chapter is the same kind of thing.

The reason students find the question of finding all solutions to the system of equations easier is that they have a recipe for finding the solutions: find rref(A), find the independent variables, etc. Recipes are a very important part of linear algebra—after all we do want to solve problems and we want to find methods to solve particular kinds of problems. But linear algebra involves ideas, and connections and interactions between those ideas. As you understand these you will learn how to come up with new recipes, and learn to recognize when a particular recipe can be used even if the problem doesn't have the "code words" that suggest a particular recipe can be used.


CHAPTER 8

Linear dependence and independence

The notion of linear independence is central to the course from here on out.

1. The definition

A set of vectors {v1, . . . , vn} is linearly independent if the only solution to the equation

a1v1 + · · · + anvn = 0

is a1 = · · · = an = 0. We say {v1, . . . , vn} is linearly dependent if it is not linearly independent.

In other words, {v1, . . . , vn} is linearly dependent if there are numbers a1, . . . , an ∈ R, not all zero, such that

a1v1 + · · · + anvn = 0.

1.1. The standard basis for Rn is linearly independent. Because

a1e1 + a2e2 + · · ·+ anen = (a1, a2, . . . , an)

this linear combination is zero if and only if a1 = a2 = · · · = an = 0. The eis form the prototypical example of a linearly independent set.

1.2. Example. Any set containing the zero vector is linearly dependent because

1 · 0 + 0 · v1 + · · · + 0 · vn = 0.

2. Criteria for linear (in)dependence

I won't ask you to prove the next theorem on the exam, but you will learn a lot if you read and re-read its proof until you understand it.

Theorem 2.1. The non-zero rows of a matrix in row echelon form are linearly independent.

Proof. The key point is that the left-most non-zero entries in each non-zero row of a matrix in echelon form move to the right as we move downwards


from row 1 to row 2 to row 3 to . . . etc. We can get an idea of the proof by looking at a single example. The matrix

E = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6\\ 0 & 1 & 2 & 3 & 4 & 5\\ 0 & 0 & 0 & 1 & 2 & 3\\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}

is in echelon form. Its non-zero rows are

v1 = (1, 2, 3, 4, 5, 6),

v2 = (0, 1, 2, 3, 4, 5),

v3 = (0, 0, 0, 1, 2, 3).

To show these are linearly independent we must show that the only solution to the equation

a1v1 + a2v2 + a3v3 = 0

is a1 = a2 = a3 = 0. Suppose a1v1 + a2v2 + a3v3 = 0. The sum a1v1 + a2v2 + a3v3 is equal to

(a1, 2a1, 3a1, 4a1, 5a1, 6a1) + (0, a2, 2a2, 3a2, 4a2, 5a2) + (0, 0, 0, a3, 2a3, 3a3).

The left-most entry of this sum is a1, so the fact that the sum is equal to 0 = (0, 0, 0, 0, 0, 0) implies that a1 = 0. It follows that a2v2 + a3v3 = 0. But a2v2 + a3v3 = (0, a2, . . .) (we don't care about what is in the positions . . .) so we must have a2 = 0. Therefore a3v3 = 0; but a3v3 = (0, 0, 0, a3, 2a3, 3a3) so we conclude that a3 = 0 too. Hence {v1, v2, v3} is linearly independent. �
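A quick machine check of this example (assuming numpy is available): the rank of E counts its linearly independent rows, so getting 3 confirms the conclusion.

    import numpy as np

    E = np.array([[1, 2, 3, 4, 5, 6],
                  [0, 1, 2, 3, 4, 5],
                  [0, 0, 0, 1, 2, 3],
                  [0, 0, 0, 0, 0, 0]])

    # The rank equals the number of linearly independent rows.
    print(np.linalg.matrix_rank(E))   # 3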

Theorem 2.2. Vectors v1, . . . , vn are linearly dependent if and only if some vi is a linear combination of the other vjs.

Proof. (⇒) Suppose v1, . . . , vn are linearly dependent. Then there are numbers a1, . . . , an, not all zero, such that a1v1 + · · · + anvn = 0. Suppose ai ≠ 0. Then

v_i = \sum_{j \neq i} (-a_i^{-1}a_j)v_j.

(⇐) Suppose vi is a linear combination of the other vjs. Then there are numbers bj such that

v_i = \sum_{j \neq i} b_jv_j.

Therefore

1 \cdot v_i - \sum_{j \neq i} b_jv_j = 0,

thus showing that v1, . . . , vn are linearly dependent. �


2.1. Using Theorem 2.2. Theorem 2.2 is an excellent way to show that a set of vectors is linearly dependent. For example, it tells us that the vectors

\begin{pmatrix} 0\\ 1\\ 2 \end{pmatrix}, \qquad \begin{pmatrix} 1\\ 3\\ 4 \end{pmatrix}, \qquad \begin{pmatrix} 1\\ 4\\ 6 \end{pmatrix}

are linearly dependent because

\begin{pmatrix} 1\\ 4\\ 6 \end{pmatrix} = \begin{pmatrix} 0\\ 1\\ 2 \end{pmatrix} + \begin{pmatrix} 1\\ 3\\ 4 \end{pmatrix}.

Please understand why this shows these three vectors are linearly dependent by re-writing it as

1\begin{pmatrix} 0\\ 1\\ 2 \end{pmatrix} + 1\begin{pmatrix} 1\\ 3\\ 4 \end{pmatrix} + (-1)\begin{pmatrix} 1\\ 4\\ 6 \end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix}

and re-reading the definition of linear dependence.

2.2. Remark. The important word in the next theorem is the word unique. It is this uniqueness that leads to the idea of coordinates, which will be taken up in chapter 10.

Theorem 2.3. Vectors v1, . . . , vn are linearly independent if and only if each vector in their linear span can be written as a linear combination of v1, . . . , vn in a unique way.

Proof. Since Sp(v1, . . . , vn) is, by definition, the set of linear combinations of v1, . . . , vn we must prove that the linear independence condition is equivalent to the uniqueness condition.

(⇒) Suppose v1, . . . , vn are linearly independent. Let w ∈ Sp(v1, . . . , vn). If w = a1v1 + · · · + anvn and w = b1v1 + · · · + bnvn, then

0 = w − w = (a1v1 + · · · + anvn) − (b1v1 + · · · + bnvn) = (a1 − b1)v1 + · · · + (an − bn)vn.

Because the set {v1, . . . , vn} is linearly independent,

a1 − b1 = · · · = an − bn = 0.

Hence the two representations of w as a linear combination of v1, . . . , vn are the same, i.e., there is a unique way of writing w as a linear combination of those vectors.

(⇐) Suppose each element in Sp(v1, . . . , vn) can be written as a linear combination of v1, . . . , vn in a unique way. In particular, there is only one way to write the zero vector as a linear combination. But 0 = 0v1 + · · · + 0vn so if a1v1 + · · · + anvn = 0, then a1 = 0, a2 = 0, . . . , an = 0. In other words, the set {v1, . . . , vn} is linearly independent. �


Theorem 2.4. Every subset of Sp(v1, . . . , vd) having ≥ d + 1 vectors is linearly dependent.

Proof. Let r ≥ d + 1 and let w1, . . . , wr be vectors in Sp(v1, . . . , vd). For each i = 1, . . . , r there are numbers ai1, . . . , aid ∈ R such that

(2-1) wi = ai1v1 + ai2v2 + · · · + aidvd.

The homogeneous system

a11c1 + a21c2 + · · · + ar1cr = 0
a12c1 + a22c2 + · · · + ar2cr = 0
⋮
a1dc1 + a2dc2 + · · · + ardcr = 0

of d equations in the unknowns c1, . . . , cr has more unknowns than equations so has a non-trivial solution; i.e., there are numbers c1, . . . , cr, not all equal to zero, that satisfy this system of equations. It follows from this that

c1w1 + · · · + crwr = (c1a11 + c2a21 + · · · + crar1)v1 +
                     (c1a12 + c2a22 + · · · + crar2)v2 +
                     ⋮
                     (c1a1d + c2a2d + · · · + crard)vd
                   = 0.

This shows that {w1, . . . , wr} is linearly dependent. �

Corollary 2.5. All bases for a subspace have the same number of elements.

Proof. Suppose that {v1, . . . , vd} and {w1, . . . , wr} are bases for W . Then

W = Sp(v1, . . . , vd) = Sp(w1, . . . , wr)

and the sets {v1, . . . , vd} and {w1, . . . , wr} are linearly independent.

By Theorem 2.4, a linearly independent subset of Sp(v1, . . . , vd) has ≤ d elements so r ≤ d. Likewise, by Theorem 2.4, a linearly independent subset of Sp(w1, . . . , wr) has ≤ r elements so d ≤ r. Hence d = r. �

The next result is a special case of Theorem 2.4 because Rn = Sp(e1, . . . , en). Nevertheless, we give a different proof which again emphasizes the fact that a homogeneous system of equations in which there are more unknowns than equations has a non-trivial solution.

Theorem 2.6. If n > m, then every set of n vectors in Rm is linearly dependent.

Proof. Let's label the vectors A1, . . . , An and make them into the columns of an m × n matrix that we call A. Then Ax = x1A1 + · · · + xnAn so the equation Ax = 0 has a non-trivial solution (i.e., a solution with some


xi ≠ 0) if and only if A1, . . . , An are linearly dependent. But Ax = 0 has a non-trivial solution because it consists of m equations in n unknowns and n > m (see Proposition 9.2 and Corollary 9.3). �
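Here is a small sympy sketch of the theorem for n = 3 vectors in R2 (the vectors are a made-up example):

    from sympy import Matrix

    # Three made-up vectors in R^2, placed as the columns of a 2x3 matrix A.
    A = Matrix([[1, 2, 3],
                [4, 5, 6]])

    # A 2x3 homogeneous system has more unknowns than equations, so Ax = 0
    # has a non-trivial solution; nullspace() returns a basis of solutions.
    for x in A.nullspace():
        print(x.T, (A * x).T)   # a non-trivial x with Ax = 0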

3. Linear (in)dependence and systems of equations

Theorem 3.1. The homogeneous system Ax = 0 has a non-trivial solution if and only if the columns A1, . . . , An of A are linearly dependent.

Proof. The proof of Theorem 2.6 used the fact that Ax = x1A1 + · · · + xnAn to conclude that Ax = 0 has a non-trivial solution (i.e., a solution with some xi ≠ 0) if and only if A1, . . . , An are linearly dependent. �

The next result is obtained directly from the previous one; it is the contrapositive. By that we mean the following general principle of logic: if a statement P is true if and only if a statement Q is true, then P is false if and only if Q is false.

We apply this principle to Theorem 3.1 by noting that "not having a non-trivial solution" is the same as "having only the trivial solution" (which is the same as having a unique solution for a homogeneous system), and "not linearly dependent" is the same as "linearly independent".

Theorem 3.2. The homogeneous system Ax = 0 has a unique solution if and only if the columns A1, . . . , An of A are linearly independent.

The reason for restating Theorem 3.1 as Theorem 3.2 is to emphasize the parallel between Theorem 3.2 and Theorem 3.3.

Theorem 3.3. The system Ax = b has a unique solution if and only if b is a linear combination of the columns A1, . . . , An of A and A1, . . . , An are linearly independent.

Proof. We have already proved that Ax = b has a solution if and only if b is a linear combination of the columns of A; see Corollary 3.2 and the discussion just before it.

(⇒) Suppose Ax = b has a unique solution. Because Ax = b has a solution, Corollary 3.2 and the discussion just before it tell us that b is a linear combination of the columns of A.

We will now use the fact that Ax = b has a unique solution to show that the columns of A are linearly independent. Suppose c1A1 + · · · + cnAn = 0; in other words, Ac = 0 where c = (c1, . . . , cn)T.

If u is a solution to Ax = b, so is u+ c because

A(u+ c) = Au+Ac = b+ 0 = b.


But Ax = b has a unique solution so u = u + c. It follows that c = 0. In other words, the only solution to the equation c1A1 + · · · + cnAn = 0 is c1 = · · · = cn = 0, which shows that the columns of A are linearly independent.

(⇐) Suppose that b is a linear combination of the columns A1, . . . , An of A and the columns are linearly independent. The first of these assumptions tells us there are numbers c1, . . . , cn ∈ R such that b = c1A1 + · · · + cnAn; i.e., Ac = b where c = (c1, . . . , cn)T as above. If d = (d1, . . . , dn)T were another solution to Ax = b, then b = Ad = d1A1 + · · · + dnAn so

(c1 − d1)A1 + · · · + (cn − dn)An = Ac − Ad = b − b = 0.

However, A1, . . . , An are linearly independent so c1 = d1, . . . , cn = dn, i.e., c = d, thereby showing that Ax = b has a unique solution. �

Lemma 2.1 tells us that equivalent matrices have the same row space.

Proposition 3.4. Let A be any matrix. Then A and rref(A) have the same row space, and the non-zero rows of rref(A) are linearly independent.

Proof. Let E = rref(A). The fact that A and E have the same row space is a special case of Lemma 2.1. Suppose r1, . . . , rt are the non-zero rows of E. Suppose the left-most 1 in ri appears in the ji-th column. Then j1 < j2 < · · · < jt.

Now suppose that a1r1 + · · · + atrt = 0.

The entry in column j1 of r1 is 1 and the entry in column j1 of all other ris is 0, so the entry in column j1 of a1r1 + · · · + atrt is a1. Hence a1 = 0.

The entry in column j2 of r2 is 1 and the entry in column j2 of all other ris is 0, so the entry in column j2 of a1r1 + · · · + atrt is a2. Hence a2 = 0.

Carry on in this way to deduce that each ai is zero. We will then have shown that the only solution to the equation a1r1 + · · · + atrt = 0 is a1 = a2 = · · · = at = 0. Hence {r1, . . . , rt} is linearly independent. �

After we have defined the word basis you will see that Proposition 3.4 says that the non-zero rows of rref(A) are a basis for the row space of A.

If one is interested in the column space of a matrix A rather than its row space, one can replace A by AT and perform elementary row operations on AT. The transposes of the rows of rref(AT) span the column space of A. We will see later that they form a basis for the column space of A.


CHAPTER 9

Non-singular matrices, invertible matrices, and inverses

The adjectives singular, non-singular, and invertible apply to square matrices and square matrices only.

1. Singular and non-singular matrices

An n × n matrix A is called non-singular if the only solution to the equation Ax = 0 is x = 0. We say A is singular if Ax = 0 has a non-trivial solution.

When we say the following are equivalent in the next theorem, what we mean is that if one of the three statements is true so are the others. It also means that if one of the three statements is false so are the others. Therefore to prove the theorem we must do the following: assume one of the statements is true and then show that assumption forces the other two to be true. In the proof below we show that (1) is true if and only if (2) is true. Then we show that (3) is true if (2) is. Finally we show (1) is true if (3) is. Putting all those implications together proves that the truth of any one of the statements implies the truth of the other two.

Theorem 1.1. Let A be an n× n matrix. The following are equivalent:

(1) A is non-singular;
(2) {A1, . . . , An} is linearly independent;
(3) Ax = b has a unique solution for all b ∈ Rn.

Proof. (1) ⇔ (2) This is exactly what Theorem 3.2 says.

(2) ⇒ (3) Suppose that (2) is true. Let b ∈ Rn. By Theorem 2.6, the n + 1 vectors b, A1, . . . , An in Rn are linearly dependent so there are numbers a0, a1, . . . , an, not all zero, such that

a0b + a1A1 + · · · + anAn = 0.

I claim that a0 can not be zero.

Suppose the claim is false. Then a1A1 + · · · + anAn = 0. But A1, . . . , An are linearly independent so a1 = · · · = an = 0. Thus if a0 = 0, then all the ais are zero. That contradicts the fact (see the previous paragraph) that not all the ais are zero. Since the assumption that a0 = 0 leads to a contradiction we conclude that a0 must be non-zero.


Hence

b = (-a_0^{-1}a_1)A_1 + \cdots + (-a_0^{-1}a_n)A_n.

Thus b is a linear combination of A1, . . . , An. By hypothesis, A1, . . . , An are linearly independent. Hence Theorem 3.3 tells us there is a unique solution to Ax = b.

(3) ⇒ (1) Assume (3) is true. Then, in particular, the equation Ax = 0 has a unique solution; i.e., A is non-singular, so (1) is true. �

2. Inverses

Mankind used whole numbers, integers, before using fractions. Fractions were created because they are useful in the solution of equations. For example, the equation 3x = 7 has no solution if we require x to be an integer, but it does have a solution if we allow x to be a fraction, that solution being 7/3.

One could take the sophisticated (too sophisticated?) view that this solution was found by multiplying both sides of the equation by the inverse of 3. Explicitly, 3x is equal to 7 if and only if (1/3) × 3x equals (1/3) × 7. Now use the associative law for multiplication and perform the computation

(1/3) × 3x = ((1/3) × 3) × x = 1 × x = x

so the previous sentence now reads: 3x is equal to 7 if and only if x equals (1/3) × 7. Presto! We have solved the equation.

Sometimes one encounters a matrix equation in which one can perform the same trick. Suppose the equation is Ax = b and suppose you are in the fortunate position of knowing there is a matrix E such that EA = I, the identity matrix. Then Ax = b implies that E × Ax = E × b. Thus, using the associative law for multiplication, Ax = b implies

Eb = E × b = E × Ax = (E × A) × x = I × x = x.

Presto! We have found a solution, namely x = Eb. We found this by the same process as that in the previous paragraph.

Let's look at an explicit example. Suppose we want to solve

(2-1) \begin{pmatrix} 1 & 2\\ 3 & 4 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = \begin{pmatrix} -1\\ 5 \end{pmatrix}.

I know that the matrix

E = \frac{1}{2}\begin{pmatrix} -4 & 2\\ 3 & -1 \end{pmatrix}

has the property that EA = I. You should check that. The analysis in the previous paragraph tells me that a solution to the equation (2-1) is

\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = E\begin{pmatrix} -1\\ 5 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} -4 & 2\\ 3 & -1 \end{pmatrix}\begin{pmatrix} -1\\ 5 \end{pmatrix} = \begin{pmatrix} 7\\ -4 \end{pmatrix}.
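If you want to check this with a computer rather than by hand, here is a numpy sketch:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    E = 0.5 * np.array([[-4.0, 2.0],
                        [3.0, -1.0]])
    b = np.array([-1.0, 5.0])

    print(np.allclose(E @ A, np.eye(2)))   # True: EA = I
    x = E @ b
    print(x, np.allclose(A @ x, b))        # [ 7. -4.] True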


Lemma 2.1. Let A be an n × n matrix and I the n × n identity matrix. If there are matrices E and E′ such that

EA = AE′ = I

then E = E′.

Proof. This is a single calculation:

E′ = IE′ = (EA)E′ = E(AE′) = EI = E.

All we used was the associative law. �

The inverse of an n× n matrix A is the matrix A−1 with the property

AA−1 = I = A−1A,

provided it exists. If A has an inverse we say that A is invertible.

2.1. Uniqueness of A−1. Our definition says that the inverse is the matrix . . .. The significance of Lemma 2.1 is that it tells us there is at most one matrix, E say, with the property that AE = EA = I. Check that you understand that!

2.2. Not every square matrix has an inverse. The phrase provided it exists suggests that not every n × n matrix has an inverse—that is correct. For example, the zero matrix does not have an inverse. We will also see examples of non-zero square matrices that do not have an inverse.

2.3. The derivative of a function f(x) at a ∈ R is defined to be

\lim_{h \to 0} \frac{f(a+h) - f(a)}{h}

provided this limit exists. The use of the phrase "provided it exists" in the definition of the inverse of a matrix plays the same role as the phrase "provided this limit exists" plays in the definition of the derivative.

2.4. The inverse or an inverse. You should compare our definition of inverse with that in the book. The book says an n × n matrix is invertible if there is a matrix A−1 such that AA−1 = A−1A = I and calls such A−1 an inverse of A (if it exists). Lemma 2.1 tells us there is at most one inverse so we can immediately speak of the inverse (provided it exists). You might want to look at the definition of the inverse in several books to gain a finer appreciation of this fine point.

Here is the definition from the second edition of Linear Algebra by M. O'Nan:

    If A is an m × m matrix and there is an m × m matrix B such that AB = BA = I, A is said to be invertible and B is said to be the inverse of A.

    Having made this definition, the first natural question to ask is whether a matrix A can have two different inverses. Theorem 2 shows this is impossible.

    Theorem 2 Let A be an m × m matrix with inverse B. If C is another matrix such that AC = CA = I, then C = B.

3. Elementary matrices are invertible

As we will now explain, the fact that elementary matrices are invertible is essentially the same as the fact that if a matrix B can be obtained from a matrix A by performing a single elementary row operation, then A can be obtained from B by performing a single elementary row operation.

First recall that elementary matrices are, by definition, those matrices obtained by performing a single elementary row operation on an identity matrix.

Let A be an m × n matrix. By Theorem 11.1, the result of performing an elementary row operation on A produces a matrix that is equal to EA for a suitable m × m elementary matrix E. Since E is obtained from Im by an elementary row operation, Im can be obtained from E by performing an elementary row operation. Therefore Im = E′E for some elementary matrix E′. If you check each of the three elementary row operations you will see that the transpose of an elementary matrix is an elementary matrix. Hence ET is an elementary matrix and, by the argument just given, there is an elementary matrix F such that FET = Im. Transposing this equality gives EFT = Im. Hence EFT = Im = E′E. By Lemma 2.1, the fact that EFT = I = E′E implies FT = E′. Hence E′E = EE′ = I and E′ is the inverse of E.

4. Making use of inverses

4.1. If A has an inverse and AB = C, then B = A−1C. To see this just multiply both sides of the equality AB = C on the left by A−1 to obtain A−1C = A−1(AB) = (A−1A)B = IB = B.

4.2. Solving Ax = b when A is invertible. The only solution to this equation is x = A−1b. This is a special case of the previous remark. In chapter 2 we put this remark into practice for an explicit example.

4.3. Wicked notation. Never, never, never write B/A to denote BA−1.

4.4. If AB = 0 and B ≠ 0, then A is not invertible. If A is invertible and B is a non-zero matrix such that AB = 0, then B = IB = (A−1A)B = A−1(AB) = 0, contradicting the fact that B is non-zero. We are therefore forced to conclude that if AB = 0 for some non-zero matrix B, then A is not invertible. Likewise, if BA = 0 for some non-zero matrix B, then A is not invertible.

This is a useful observation because it immediately gives a large source of matrices that are not invertible. For example, the matrix

\begin{pmatrix} a & b\\ 0 & 0 \end{pmatrix}


does not have an inverse because

\begin{pmatrix} a & b\\ 0 & 0 \end{pmatrix}\begin{pmatrix} w & x\\ y & z \end{pmatrix} = \begin{pmatrix} * & *\\ 0 & 0 \end{pmatrix};

in particular, the product can never be the identity matrix because the 22-entry is zero regardless of what w, x, y, z are.

4.5. We can cancel invertible matrices. Let A be an invertible n × n matrix. If AB = AC, then B = C because AB = AC implies A−1(AB) = A−1(AC) and the associative law then allows us to replace A−1A by I to obtain IB = IC, i.e., B = C. I prefer the argument

B = IB = (A−1A)B = A−1(AB) = A−1(AC) = (A−1A)C = IC = C.

A similar argument shows that if A is invertible and EA = FA, then E = F.

Be warned though that you cannot cancel in an expression like AB = CA because matrices do not necessarily commute with each other.

4.6. The inverse of a product. If A and B are n × n matrices that have inverses, then AB is invertible and its inverse is B−1A−1 because

(AB)(B−1A−1) = A(BB−1)A−1 = AIA−1 = AA−1 = I

and similarly (B−1A−1)(AB) = I.

A popular error is to claim that the inverse of AB is A−1B−1. An analogy might be helpful: the inverse to the operation put on socks then shoes is take off shoes then socks. The inverse to a composition of operations is the composition of the inverses of the operations in the reverse order.

5. The inverse of a 2× 2 matrix

There is a simple arithmetic criterion for deciding if a given 2 × 2 matrix has an inverse.

Theorem 5.1. The matrix

A = \begin{pmatrix} a & b\\ c & d \end{pmatrix}

has an inverse if and only if ad − bc ≠ 0. If ad − bc ≠ 0, then

A^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d & -b\\ -c & a \end{pmatrix}.

Proof. (⇒) Suppose A has an inverse. Just prior to this theorem, in chapter 4.4, we observed that

\begin{pmatrix} a & b\\ 0 & 0 \end{pmatrix}

does not have an inverse, so either c or d must be non-zero. Therefore

\begin{pmatrix} 0\\ 0 \end{pmatrix} \neq \begin{pmatrix} d\\ -c \end{pmatrix} = A^{-1}A\begin{pmatrix} d\\ -c \end{pmatrix} = A^{-1}\begin{pmatrix} a & b\\ c & d \end{pmatrix}\begin{pmatrix} d\\ -c \end{pmatrix} = A^{-1}\begin{pmatrix} ad-bc\\ 0 \end{pmatrix}


so we conclude that ad − bc ≠ 0.

(⇐) If ad − bc ≠ 0, a simple calculation shows that the claimed inverse is indeed the inverse. �

The number ad − bc is called the determinant of

\begin{pmatrix} a & b\\ c & d \end{pmatrix}.

We usually write det(A) to denote the determinant of a matrix. Theorem 5.1 therefore says that a 2 × 2 matrix A has an inverse if and only if det(A) ≠ 0.
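The formula in Theorem 5.1 is easy to turn into code. Here is a Python sketch (the helper name inverse2x2 is ours, not a standard function):

    import numpy as np

    def inverse2x2(A, tol=1e-12):
        """Invert a 2x2 matrix via the formula of Theorem 5.1."""
        (a, b), (c, d) = A
        det = a * d - b * c
        if abs(det) < tol:
            raise ValueError("ad - bc = 0: the matrix is singular")
        return np.array([[d, -b], [-c, a]]) / det

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    print(inverse2x2(A))
    print(np.allclose(inverse2x2(A), np.linalg.inv(A)))   # True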

In chapter 12 we define the determinant for n × n matrices and prove that an n × n matrix A has an inverse if and only if det(A) ≠ 0. That is a satisfying result. Although the formula for the determinant appears complicated at first, it is straightforward to compute, even for large n, if one has a computer handy. It is striking that the invertibility of a matrix can be determined by a single computation even if doing the computation by hand can be long and prone to error.

Theorem 5.2. Let A be an n × n matrix. Then A has an inverse if and only if it is non-singular.

Proof. (⇒) Suppose A has an inverse. If Ax = 0, then

x = Ix = (A−1A)x = A−1(Ax) = A−1 · 0 = 0

so A is non-singular.

(⇐) Suppose A is non-singular. Then Ax = b has a unique solution for all b ∈ Rn by Theorem 1.1. Let

e1 = (1, 0, . . . , 0)T, e2 = (0, 1, 0, . . . , 0)T, . . . , en = (0, . . . , 0, 1)T

and define uj ∈ Rn to be the unique column vector such that Auj = ej. Let U = [u1, . . . , un] be the n × n matrix whose columns are u1, . . . , un. Then

AU = [Au1, . . . , Aun] = [e1, . . . , en] = I.

We will show that U is the inverse of A but we do not compute UA to do this!

First we note that U is non-singular because if Ux = 0, then

x = Ix = (AU)x = A(Ux) = A · 0 = 0.

Because U is non-singular we may apply the argument in the previous paragraph to deduce the existence of a matrix V such that UV = I. (The previous paragraph showed that if a matrix is non-singular one can multiply it on the right by some matrix to obtain the identity matrix.) Now

V = IV = (AU)V = A(UV ) = AI = A

so AU = UA = I, proving that U = A−1. �

We isolate an important part of the proof. After finding U and showing AU = I we were not allowed to say U is the inverse of A and stop the proof because the definition of the inverse (read it carefully, now) says that we


need to show that UA = I too before we are allowed to say U is the inverse of A. It is not obvious that AU = I implies UA = I because there is no obvious reason for A and U to commute. That is why we need the last paragraph of the proof.

Corollary 5.3. Let A be an n × n matrix. If AU = I, then UA = I; i.e., if AU = I, then U is the inverse of A.

Corollary 5.4. Let A be an n × n matrix. If V A = I, then AV = I; i.e., if V A = I, then V is the inverse of A.

Proof. Apply Corollary 5.3 with V playing the role that A played in Corollary 5.3 and A playing the role that U played in Corollary 5.3. �

Theorem 5.5. The following conditions on an n × n matrix A are equivalent:

(1) A is non-singular;
(2) A is invertible;
(3) rank(A) = n;
(4) rref(A) = I, the identity matrix;
(5) Ax = b has a unique solution for all b ∈ Rn;
(6) the columns of A are linearly independent.

6. If A is non-singular how do we find A−1?

We use the idea in the proof of Theorem 5.2:

the jth column of A−1 is the solution to the equation Ax = ej, so we may form the augmented matrix (A | ej) and perform row operations to get A in row-reduced echelon form.

We can carry out this procedure in one great big military operation. If we want to find the inverse of the matrix

A = \begin{pmatrix} 1 & 2 & 1\\ 3 & 0 & 1\\ 1 & 1 & 1 \end{pmatrix}

we perform elementary row operations on the augmented matrix

\left(\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0\\ 3 & 0 & 1 & 0 & 1 & 0\\ 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right)

so we get the identity matrix on the left hand side. The matrix on the right-hand side is then A−1. Check it out!
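The same "military operation" can be carried out with sympy's rref, as in this sketch (assuming sympy is installed):

    from sympy import Matrix, eye

    A = Matrix([[1, 2, 1],
                [3, 0, 1],
                [1, 1, 1]])

    # Row reduce the augmented matrix (A | I); the right-hand block of the
    # result is the inverse whenever the left-hand block reduces to I.
    R, _ = Matrix.hstack(A, eye(3)).rref()
    Ainv = R[:, 3:]
    print(Ainv)
    print(A * Ainv == eye(3))   # True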


CHAPTER 10

Bases, coordinates, and dimension

1. Definitions

A basis for a subspace W is a linearly independent subset of W that spans it.

The dimension of a subspace is the number of elements in a basis for it. To make this idea effective we must show every subspace has a basis and that all bases for a subspace have the same number of elements.

Before doing that, let us take note of the fact that {e1, . . . , en} is a basis for Rn, the standard basis. We made the trivial observation that these vectors span Rn in section 3, and the equally trivial observation that these vectors are linearly independent in chapter 8. Combining these two facts we see that {e1, . . . , en} is a basis for Rn. Therefore

dimRn = n.

Strictly speaking, we can't say this until we have proved that every basis for Rn has exactly n elements, but that was proved in Corollary 2.5 of chapter 8.

Theorem 1.1. Suppose v1, . . . , vd is a basis for W. If w ∈ W, then there are unique scalars a1, . . . , ad such that w = a1v1 + · · · + advd.

Proof. This theorem is a reformulation of Theorem 2.3. �

If v1, . . . , vd is a basis for W and w = a1v1 + · · · + advd we call (a1, . . . , ad) the coordinates of w with respect to the basis v1, . . . , vd. In this situation we can think of (a1, . . . , ad) as a point in Rd.
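Concretely, finding the coordinates of w amounts to solving a linear system whose coefficient columns are the basis vectors. A numpy sketch (the basis below is a made-up example):

    import numpy as np

    # A made-up basis for a 2-dimensional subspace W of R^3 ...
    v1 = np.array([1.0, 0.0, 1.0])
    v2 = np.array([0.0, 1.0, 1.0])
    w = 3 * v1 - 2 * v2          # ... and a vector of W with known coordinates

    # The coordinates (a1, a2) solve [v1 v2] a = w; lstsq recovers them
    # exactly because w lies in the span of v1 and v2.
    V = np.column_stack([v1, v2])
    a, *_ = np.linalg.lstsq(V, w, rcond=None)
    print(a)   # [ 3. -2.]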

2. A subspace of dimension d is just like Rd

The function

T : Rd →W, T (a1, . . . , ad) := a1v1 + · · ·+ advd

is bijective (meaning injective and surjective, i.e., one-to-one and onto). It also satisfies the following properties:

(1) T (0) = 0;
(2) if x, y ∈ Rd, then T (x + y) = T (x) + T (y);
(3) if x ∈ Rd and λ ∈ R, then T (λx) = λT (x).

These properties are all special cases of the following property:

(2-1) T (λ1x1 + · · ·+ λsxs) = λ1T (x1) + · · ·+ λsT (xs)


for all x1, . . . , xs ∈ Rd and all λ1, . . . , λs ∈ R. The function T is an example of a linear transformation.

It is the existence of the bijective function T having these properties that justifies my repeated remark that a d-dimensional subspace looks exactly like Rd. That is the way you should think of a d-dimensional subspace of Rn. It is a copy of Rd sitting inside Rn. For example, all planes in R3 are copies of R2 sitting inside R3. Likewise, all lines in R3 are copies of R1 = R sitting inside R3.

3. All bases for W have the same number of elements

4. Every subspace has a basis

Our next goal, Theorem 4.2, is to show that every subspace has a basis. First we need a lemma.

By Theorem 2.6, every subset of Rn having ≥ n + 1 elements is linearly dependent. The lemma isolates the idea of building up a basis for a subspace one element at a time, i.e., by getting larger and larger linearly independent subsets of the subspace, a process that must eventually stop.

Lemma 4.1. Let {w1, . . . , wr} be a linearly independent subset of Rn. Let v be an element in Rn that does not belong to Sp(w1, . . . , wr). Then {v} ∪ {w1, . . . , wr} is linearly independent.

Proof. Suppose

λv + a1w1 + · · · + arwr = 0

for some λ, a1, . . . , ar ∈ R. If λ is non-zero, then

v = (−λ−1a1)w1 + · · · + (−λ−1ar)wr

which contradicts the hypothesis that v is not in Sp(w1, . . . , wr). Hence λ = 0. Now a1w1 + · · · + arwr = 0. Because {w1, . . . , wr} is linearly independent, a1 = · · · = ar = 0. Hence {v, w1, . . . , wr} is linearly independent. �

Theorem 4.2. Every subspace of Rn has a basis.

Proof. Let W be a subspace of Rn. We adopt the standard convention that the empty set is a basis for the subspace {0}, so we may assume that W ≠ 0. This ensures there is an element w ∈ W such that {w} is linearly independent.

Pick d as large as possible such that W contains a linearly independent subset having d elements. There is a largest such d because every subset of Rn having ≥ n + 1 elements is linearly dependent (Theorem 2.6). Suppose {w1, . . . , wd} is a linearly independent subset of W. It is a basis for W if it spans W. However, if Sp(w1, . . . , wd) ≠ W there is a v in W that is not in Sp(w1, . . . , wd). By Lemma 4.1, {w1, . . . , wd, v} is linearly independent, thus contradicting the maximality of d. We conclude that no such v can exist; i.e., Sp(w1, . . . , wd) = W. Hence {w1, . . . , wd} is a basis for W. �


5. Properties of bases and spanning sets

The following is a useful way to think of d = dimW: W has a linearly independent set having d elements and every subset of W having > d elements is linearly dependent.

Corollary 5.1. Let W be a subspace of Rn. Then dimW is the largest number d such that W contains d linearly independent vectors.

Proof. This is a consequence of the proof of Theorem 4.2. �

Corollary 5.2. Let V and W be subspaces of Rn and suppose V ⊆ W. Then

(1) dimV < dimW if and only if V ≠ W;
(2) dimV = dimW if and only if V = W.

Proof. Let {v1, . . . , vc} be a basis for V and {w1, . . . , wd} a basis for W. Since {v1, . . . , vc} is a linearly independent subset of V, and hence of W, c ≤ d.

(1) (⇒) Suppose c < d. Since all bases for W have d elements, {v1, . . . , vc} can't be a basis for W. Hence V ≠ W.

(⇐) Suppose V ≠ W. Then there is an element w ∈ W that is not in V, i.e., not in Sp(v1, . . . , vc). By Lemma 4.1, {v1, . . . , vc, w} is linearly independent. It is also a subset of W and contains c + 1 elements so c + 1 ≤ d. In particular, c < d.

(2) This is the contrapositive of (1). �

Corollary 5.3. Let W = Sp(v1, . . . , vt). Some subset of {v1, . . . , vt} is a basis for W.

Proof. If {v1, . . . , vt} is linearly independent it is a basis for W and there is nothing more to prove.

If {v1, . . . , vt} is linearly dependent, some vi is a linear combination of the other vjs so we can remove that vi without changing the linear span and so get a set of t − 1 vectors that spans W. By repeating this process we eventually get a linearly independent subset of {v1, . . . , vt} that spans W and is therefore a basis for W. �

Theorem 5.4. Let W be a d-dimensional subspace of Rn and let S = {w1, . . . , wp} be a subset of W.

(1) If p ≥ d + 1, then S is linearly dependent.
(2) If p < d, then S does not span W.
(3) If p = d, the following are equivalent:
    (a) S is linearly independent;
    (b) S spans W;
    (c) S is a basis for W.


Proof. Let V be the subspace spanned by S. Because S ⊆ W, V ⊆ W. By Corollary 5.3, some subset of S is a basis for V. Hence dimV ≤ p.

(1) is a restatement of Corollary 5.1.

(2) If p < d, then dimV ≤ p < d = dimW so V ≠ W; i.e., S does not span W.

(3) Certainly, (c) implies both (a) and (b).

(a) ⇒ (b) If S is linearly independent it is a basis for V so dimV = p = d = dimW. Hence V = W, i.e., S spans W.

(b) ⇒ (c) If S spans W, then V = W so dimV = dimW = p. A spanning set with p = dimW elements must be linearly independent, since otherwise Corollary 5.3 would produce a basis for W with fewer than p elements. Hence S is a basis for W. �

6. How to find a basis for a subspace

When a subspace is described as the linear span of an explicit set of vectors, say W = Sp(w1, . . . , wr), the given vectors are typically linearly dependent so are not a basis for W. One can find a basis for W in the following way:

• if the wjs are row vectors, form the matrix A with rows w1, . . . , wr;
  – Sp(w1, . . . , wr) is equal to the row space of A;
• if the wjs are column vectors, form the matrix A with the rows w1^T, . . . , wr^T;
  – Sp(w1^T, . . . , wr^T) is equal to the row space of A;
• perform elementary row operations on A to obtain a matrix E in echelon form (in row-reduced echelon form if you want, though it isn't necessary for this);
• by Theorem 2.1, the non-zero rows of E are linearly independent;
• the non-zero rows of E are therefore a basis for the row space of E;
• by Lemma 2.1, A and E have the same row space;
• the non-zero rows of E are therefore a basis for the row space of A;
  – if the wjs are row vectors, the non-zero rows of E are a basis for Sp(w1, . . . , wr);
  – if the wjs are column vectors, the transposes of the non-zero rows of E are a basis for Sp(w1, . . . , wr).

7. How to find a basis for the range of a matrix

Proposition 1.5 proved that the range of a matrix is the linear span of its columns. A basis for R(A) is therefore a basis for the column space of A. Take the transpose of A and compute rref(AT). The non-zero rows of rref(AT) are a basis for the row space of AT. The transposes of the non-zero rows in rref(AT) are therefore a basis for the column space, and hence the range, of A.

Try an example!
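Here is one possible example, worked with sympy (the matrix A is made up):

    from sympy import Matrix

    A = Matrix([[1, 2, 3],
                [2, 4, 6],
                [1, 1, 1]])   # a made-up matrix whose columns are dependent

    # Row reduce A^T: the non-zero rows of rref(A^T) span the column space of A.
    R, pivots = A.T.rref()
    basis = [R.row(i).T for i in range(len(pivots))]
    print(basis)   # transposes of the non-zero rows: a basis for R(A)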

8. Rank + Nullity

The nullity of a matrix A is the dimension of its null space.


As a consequence of the following important result we shall see that the dimension of the range of A is equal to the rank of A (Corollary 8.2). In many books the rank of a matrix is defined to be the dimension of its range. Accordingly, the formula in the next result is often stated as

nullity A + rank(A) = n.

Theorem 8.1. Let A be an m× n matrix. Then

dimN (A) + dimR(A) = n.

Proof. Pick bases {u1, . . . , up} for N (A) and {w1, . . . , wq} for R(A). Remember that N (A) ⊆ Rn and R(A) ⊆ Rm.

There are vectors v1, . . . , vq in Rn such that wi = Avi for all i. We will show that the set

B := {u1, . . . , up, v1, . . . , vq}

is a basis for Rn.

Let x ∈ Rn. Then Ax = a1w1 + · · ·+aqwq for some a1, . . . , aq ∈ R. Now

A(x − a1v1 − · · · − aqvq) = Ax − a1Av1 − · · · − aqAvq = 0

so x− a1v1 − · · · − aqvq is in the null space of A and therefore

x− a1v1 − · · · − aqvq = b1u1 + . . .+ bpup

for some b1, . . . , bp ∈ R. In particular,

x = a1v1 + · · ·+ aqvq + b1u1 + . . .+ bpup

so B spans Rn.

We will now show that B is linearly independent. If

c1v1 + · · ·+ cqvq + d1u1 + . . .+ dpup = 0,

then

0 = A(c1v1 + · · · + cqvq + d1u1 + · · · + dpup)
  = c1Av1 + · · · + cqAvq + d1Au1 + · · · + dpAup
  = c1w1 + · · · + cqwq + 0 + · · · + 0.

But {w1, . . . , wq} is linearly independent so

c1 = · · · = cq = 0.

Therefore

d1u1 + . . .+ dpup = 0.

But {u1, . . . , up} is linearly independent so

d1 = · · · = dp = 0.

Hence B is linearly independent, and therefore a basis for Rn. But B has p + q elements so p + q = n, as required. �

We now give the long-promised new interpretation of rank(A).


Corollary 8.2. Let A be an m× n matrix. Then

rank(A) = dimR(A).

Proof. In chapter 5, we defined rank(A) to be the number of non-zero rows in rref(A), the row-reduced echelon form of A. From the discussion in that chapter we obtain

n − rank(A) = the number of independent variables for the equation Ax = 0
            = dim{x ∈ Rn | Ax = 0}
            = dimN (A).

But dimN (A) = n−dimR(A) by Theorem 8.1, so rank(A) = dimR(A). �

Corollary 8.3. Let A be an m× n matrix. Then

nullity A + rank(A) = n.
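A quick sanity check of the formula with sympy (the matrix is made up):

    from sympy import Matrix

    A = Matrix([[1, 0, 2, 0],
                [0, 1, 3, 0],
                [1, 1, 5, 0]])   # a made-up 3x4 matrix (row3 = row1 + row2)

    rank = A.rank()
    nullity = len(A.nullspace())   # dimension of N(A)
    print(rank, nullity, rank + nullity == A.cols)   # 2 2 True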

Corollary 8.4. The row space and column space of a matrix have the same dimension.

Proof. Let A be an m × n matrix. By Proposition 1.5, the column space of A is equal to R(A). The row space of A is equal to the row space of rref(A) so the dimension of the row space of A is rank(A). But rank(A) = dimR(A). The result follows. �

Corollary 8.5. Let A be an m× n matrix. Then

rank(A) = rank(AT ).

Proof. The rank of AT is equal to the dimension of its row space, which is equal to the dimension of the column space of A, which is equal to the dimension of the row space of A, which is equal to the rank of A. �

We will now give some examples that show how to use Theorem 8.1.

8.1. 3 × 3 examples. There are four possibilities for the rank and nullity of a 3 × 3 matrix. For example, the matrices

\begin{pmatrix} 0 & 1 & 1\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 0 & 1\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 & 1\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}

have ranks 2, 1, and 1, and therefore nullities 1, 2, and 2, respectively.


8.2. Planes in R4. Let V and V′ be two dimensional subspaces of R4. The intersection of V and V′ is a subspace of R4 of dimension ≤ 2 so there are three possibilities: the intersection is 2-dimensional if and only if V = V′; if the intersection is 1-dimensional, then V and V′ have a line in common; if V ∩ V′ = {0}, then V + V′ = R4.

Let P and P′ be planes in R4, i.e., translations of 2-dimensional subspaces. We say that P and P′ are parallel if they are translates of the same 2-dimensional subspace. In that case, either P = P′ or P ∩ P′ = ∅; but parallelism is not the only reason two planes in R4 can fail to meet.

8.3. 3-planes in R5. We call a 3-dimensional subspace of Rn a 3-plane. Let V and V′ be 3-planes

8.4. Three formulas. There are many examples in mathematics where the same idea pops up in very different settings. Here is an example:

• If U and V are subspaces of Rn, then

dim(U + V ) = dimU + dimV − dim(U ∩ V ).

• If A and B are finite sets, then

|A ∪ B| = |A| + |B| − |A ∩ B|.

• Let A and B denote events; let A ∪ B denote the event either A or B occurs, and let A ∩ B denote the event both A and B occur. If P[event] denotes the probability of an event, then

P[A ∪ B] = P[A] + P[B] − P[A ∩ B].

Lemma 8.6. Let W and W′ be subspaces of Rn and suppose that W ⊆ W′. If w1, . . . , wm is a basis for W, then there is a basis for W′ that contains w1, . . . , wm.

Proof. We argue by induction on dimW′ − dimW. Suppose dimW′ = 1 + dimW. Let w′ ∈ W′ − W. Then {w′, w1, . . . , wm} is linearly independent, and hence a basis for W′. etc. �

Proposition 8.7. If U and V are subspaces of Rn, then

dim(U + V ) = dimU + dimV − dim(U ∩ V )

Proof. Let B be a basis for U ∩ V. Then there is a basis B ⊔ BU for U and a basis B ⊔ BV for V. We will now show that B ⊔ BU ⊔ BV is a basis for U + V.

Hence

dim(U + V ) = |B| + |BU | + |BV |.

But |B| = dim(U ∩ V ), |B| + |BU | = dimU, and |B| + |BV | = dimV, so

|B| + |BU | + |BV | = dimU + dimV − dim(U ∩ V ),

thus proving the result. �


9. How to compute the null space and range of a matrix

Suppose you are given a matrix A and asked to compute its null space and range. What would constitute an answer to that question? One answer would be to provide a basis for its null space and a basis for its range.

By definition, the null space of A is the set of solutions to the equation Ax = 0, so we may use the methods in chapters 4 and 5 to find the solutions. Do elementary row operations to find rref(A). Write E for rref(A). Since the equations Ax = 0 and Ex = 0 have the same solutions,

N (A) = N (E).

It is easy to write down the null space for E. For example, suppose

(9-1) E = \begin{pmatrix} 1 & 0 & 2 & 0 & 0 & 3 & 4\\ 0 & 1 & 3 & 0 & 0 & 4 & 0\\ 0 & 0 & 0 & 1 & 0 & -2 & -1\\ 0 & 0 & 0 & 0 & 1 & 1 & -2\\ 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

Then the dependent variables are x1, x2, x4, and x5. The independent variables x3, x6, and x7 may take any values and then the dependent variables are determined by the equations

x1 = −2x3 − 3x6 − 4x7

x2 = −3x3 − 4x6

x4 = 2x6 + x7

x5 = − x6 + 2x7.

Now set any one of the independent variables to 1 and set the others to zero. This gives solutions

\begin{pmatrix} -2\\ -3\\ 1\\ 0\\ 0\\ 0\\ 0 \end{pmatrix}, \qquad \begin{pmatrix} -3\\ -4\\ 0\\ 2\\ -1\\ 1\\ 0 \end{pmatrix}, \qquad \begin{pmatrix} -4\\ 0\\ 0\\ 1\\ 2\\ 0\\ 1 \end{pmatrix}.

These are elements of the null space and are obviously linearly independent. They span the null space.

MORE TO SAY/DO...basis for null space
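The example above can be checked with sympy, whose nullspace method carries out exactly the "one independent variable at a time" recipe:

    from sympy import Matrix

    E = Matrix([[1, 0, 2, 0, 0,  3,  4],
                [0, 1, 3, 0, 0,  4,  0],
                [0, 0, 0, 1, 0, -2, -1],
                [0, 0, 0, 0, 1,  1, -2],
                [0, 0, 0, 0, 0,  0,  0],
                [0, 0, 0, 0, 0,  0,  0]])

    # One basis vector per independent variable, matching the three
    # solutions displayed above.
    for v in E.nullspace():
        print(v.T)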


CHAPTER 11

Linear transformations

Linear transformations are special kinds of functions between vector spaces. Subspaces of a vector space are of crucial importance and linear transformations are precisely those functions between vector spaces that send subspaces to subspaces. More precisely, if T : V → W is a linear transformation and U is a subspace of V, then

{T (u) | u ∈ U}

is a subspace of W . That subspace is often called the image of U under T .

1. A reminder on functions

Let X and Y be any sets. A function f from X to Y, often denoted f : X → Y, is a rule that associates to each element x ∈ X a single element f(x) that belongs to Y. It might be helpful to think of f as a machine: we put an element of X into f and f spits out an element of Y. We call f(x) the image of x under f. We call X the domain of f and Y the codomain.

The range of f is the subset R(f) := {f(x) | x ∈ X} ⊆ Y. The range of f might or might not equal Y. For example, the functions f : R → R and g : R → R≥0 given by the formulas f(x) = x^2 and g(x) = x^2 are different functions even though they are defined by the same formula. The range of g is equal to the codomain of g. The range of f is not equal to the codomain of f.

Part of you might rebel at this level of precision and view it as pedantry for the sake of pedantry. However, as mathematics has developed it has become clear that such precision is necessary and useful. A great deal of mathematics involves formulas, but those formulas are used to define functions and we add a new ingredient, the domain and codomain.

You have already met some of this. For example, the function f(x) = 1/x is not defined at 0 so is not a function from R to R. It is a function from R − {0} to R. The range of f is not R but R − {0}. Thus the function f : R − {0} → R − {0} defined by the formula f(x) = 1/x has an inverse, namely itself.

At first we learn that the number 4 has two square roots, 2 and −2, but when we advance a little we define the square root function R≥0 → R≥0. This is because we want the function f(x) = x², restricted to R≥0, to have an inverse. The desire to have inverses often makes it necessary to restrict the domain or codomain. A good example is the inverse sine function. Look at the definition of sin⁻¹(−) in your calculus book, or on the web, and notice how much care is taken with the definition of the codomain.

2. First observations

Let V be a subspace of Rn and W a subspace of Rm. A linear transformation from V to W is a function F : V → W such that

(1) F(u + v) = F(u) + F(v) for all u, v ∈ V, and
(2) F(av) = aF(v) for all a ∈ R and all v ∈ V.

It is common to combine (1) and (2) into a single condition: F is a linear transformation if and only if

F (au+ bv) = aF (u) + bF (v)

for all a, b ∈ R and all u, v ∈ V .
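No finite test can verify this condition for every pair of vectors, but a numerical spot-check can catch a function that is not linear. A small sketch in Python (the sample function and test points are my own, not from these notes):

    import numpy as np

    def T(v):
        # A sample function R^2 -> R^2; replace with the function to be tested.
        x, y = v
        return np.array([2.0 * x - y, 3.0 * y])

    rng = np.random.default_rng(0)
    for _ in range(100):
        a, b = rng.normal(size=2)
        u, v = rng.normal(size=2), rng.normal(size=2)
        # For a linear transformation, F(au + bv) = aF(u) + bF(v).
        assert np.allclose(T(a * u + b * v), a * T(u) + b * T(v))
    print("passed the spot-check (a necessary condition, not a proof)")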

2.1. Equality of linear transformations. Two linear transformations from V to W are equal if they take the same value at all points of V. This is the same way we define equality of functions: f and g are equal if they take the same values everywhere.

2.2. Linear transformations send 0 to 0. The only distinguished point in a vector space is the zero vector, 0. It is nice that linear transformations send 0 to 0.

Lemma 2.1. If T : V → W is a linear transformation, then T(0) = 0.¹

Proof. The following calculation proves the lemma:

0 = T (0)− T (0)

= T (0 + 0)− T (0)

= T (0) + T (0)− T (0) by property (1)

= T (0) + 0

= T (0).

This proof is among the simplest in this course. Those averse to reading proofs should look at each step in the proof, i.e., each = sign, and ask what is the justification for that step. Ask me if you don't understand it.

2.3. The linear transformations R → R. Every linear transformation T : R → R is given by T(x) = cx for some c ∈ R, namely c = T(1); this is the 1 × 1 case of Theorem 3.1 below.

¹When we write T(0) = 0 we are using the same symbol 0 for two different things: the 0 in T(0) is the zero in V and the other 0 is the zero in W. We could write T(0V) = 0W but that looks a little cluttered. It is also unnecessary because T is a function from V to W so the only elements we can put into T(−) are elements of V, so for T(0) to make sense the 0 in T(0) must be the zero in V; likewise, T spits out elements of W so T(0) must be an element of W.


2.4. T respects linear combinations. Let T : V → W be a linear transformation. If v1, v2, v3 ∈ V and a1, a2, a3 ∈ R, then

    T(a1v1 + a2v2 + a3v3) = T((a1v1 + a2v2) + a3v3)
                          = T(a1v1 + a2v2) + a3T(v3)
                          = a1T(v1) + a2T(v2) + a3T(v3).

An induction argument shows that if v1, . . . , vp ∈ V and a1, . . . , ap ∈ R, then

(2-1)    T(a1v1 + · · · + apvp) = a1T(v1) + · · · + apT(vp).

2.5. T is determined by its action on a basis. There are two ways to formalize the statement that a linear transformation is completely determined by its values on a basis.

Let T : V → W be a linear transformation and suppose {v1, . . . , vp} is a basis for V.

(1) If you know the values of T(v1), . . . , T(vp) you can compute its value at every point of V because if v ∈ V there are unique numbers a1, . . . , ap ∈ R such that v = a1v1 + · · · + apvp and it now follows from (2-1) that T(v) = a1T(v1) + · · · + apT(vp).

(2) If S : V → W is another linear transformation such that S(vi) = T(vi) for i = 1, . . . , p, then S = T. To see this use (2-1) twice: once for S and once for T.

2.6. How do we tell if a function T : V → W is a linear transformation? As with all such questions, the answer is to check whether T satisfies the definition or not.

The function f : R2 → R2 given by

    f(x, y) = (x + y, x − y)   if x ≥ 0 and y ≥ 0,
    f(x, y) = (x − y, x + y)   otherwise,

is not a linear transformation because, for example, f(−1, −1) = (0, −2) whereas −f(1, 1) = (−2, 0), so condition (2) fails with a = −1 and v = (1, 1).

2.7. The zero transformation. The function V → W that sends every x ∈ V to the zero vector in W is a linear transformation (check!). We call this the zero linear transformation and denote it by 0 if necessary.

2.8. The identity transformation. The function V → V that sends every x ∈ V to itself is a linear transformation (check!). We call it the identity transformation of V and denote it by idV.


2.9. A composition of linear transformations is a linear transformation. If S : V → W and T : U → V are linear transformations, their composition, denoted by ST or S ◦ T, is defined in the same way as the composition of any two functions, namely

    (ST)(u) = S(T(u))   or   (S ◦ T)(u) = S(T(u)).

This makes sense because T(u) ∈ V and we can apply S to elements of V. Thus

    ST : U → W.

To see that ST is really a linear transformation suppose that u and u′ belong to U and that a, a′ ∈ R. Then

(ST )(au+ a′u′) = S(T (au+ a′u′)) by definition of ST

= S(aT (u) + a′T (u′)) because T is a linear transformation

= aS(T (u)) + a′S(T (u′)) because S is a linear transformation

= a(ST )(u) + a′(ST )(u′) by definition of ST

so ST is a linear transformation.

2.10. A linear combination of linear transformations is a linear transformation. If S and T are linear transformations from U to V and a, b ∈ R, we define the function aS + bT by

(aS + bT )(x) := aS(x) + bT (x).

It is easy to check that aS + bT is a linear transformation from U to V too. We call it a linear combination of S and T. Of course the idea extends to linear combinations of larger collections of linear transformations.

3. Linear transformations and matrices

Let A be an m × n matrix and define T : Rn → Rm by T(x) := Ax. Then T is a linear transformation from Rn to Rm. The next theorem says that every linear transformation from Rn to Rm is of this form.

Theorem 3.1. Let T : Rn → Rm be a linear transformation. Then there is a unique m × n matrix A such that T(x) = Ax for all x ∈ Rn.

Proof. Let A be the m × n matrix whose jth column is Aj := T(ej). If x = (x1, . . . , xn)ᵀ is an arbitrary element of Rn, then

    T(x) = T(x1e1 + · · · + xnen)
         = x1T(e1) + · · · + xnT(en)
         = x1A1 + · · · + xnAn
         = Ax

as required. □

In the context of the previous theorem, we sometimes say that the matrix A represents the linear transformation T.
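The proof is constructive, so it is easy to mimic by machine: feed T the standard basis vectors and use the outputs as the columns of A. A sketch (the particular T here is my own example, not one from the notes):

    import numpy as np

    def T(x):
        # Sample linear transformation T : R^3 -> R^2, T(x) = (x1 + x3, 2*x2).
        return np.array([x[0] + x[2], 2.0 * x[1]])

    n = 3
    # Column j of A is T(e_j), exactly as in the proof of Theorem 3.1.
    A = np.column_stack([T(np.eye(n)[:, j]) for j in range(n)])

    x = np.array([1.0, -2.0, 5.0])
    assert np.allclose(A @ x, T(x))   # T(x) = Ax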


Suppose T : Rk → Rm and S : Rm → Rn are linear transformations, and A and B are matrices such that T(x) = Ax for all x ∈ Rk and S(y) = By for all y ∈ Rm. Then (S ◦ T)(x) = BAx for all x ∈ Rk. Thus

multiplication of matrices corresponds to composition of linear transformations

or

multiplication of matrices is defined so that the matrix representing the composition of two linear transformations is the product of the matrices representing each linear transformation.

This is of great importance. It explains why the multiplication of matrices is defined as it is. That last sentence bears repeating. The fact that matrix multiplication has real meaning, as opposed to something that was defined by a committee that just wanted to be able to "multiply" matrices, is due to the fact that matrix multiplication corresponds to that most natural of operations, composition of functions: first do one, then the other.

Mathematics is not about inventing definitions just for the heck of it and seeing what one can deduce from them. Every definition is motivated by some problem, question, or naturally occurring object or feature. Linear transformations and compositions of them are more fundamental than matrices and their product. Matrices are tools for doing calculations with linear transformations. They sometimes convey information that is concealed in other presentations of linear transformations.

We introduce you to matrices before linear transformations because they are more concrete and down to earth, not because they are more important than linear transformations.

3.1. Isomorphisms. Suppose that S : V → U and T : U → V are linear transformations. Then ST : U → U and TS : V → V. If ST = idU and TS = idV we say that S and T are inverses to one another. We also say that S and T are isomorphisms and that U and V are isomorphic vector spaces.

Theorem 3.2. Let V be a subspace of Rn. If dim V = d, then V is isomorphic to Rd.

Proof. Let v1, . . . , vd be a basis for V. Define T : Rd → V by

    T(a1, . . . , ad) := a1v1 + · · · + advd

and define S : V → Rd by S(a1v1 + · · · + advd) := (a1, . . . , ad). The map S is well defined because every element of V can be written as a linear combination of v1, . . . , vd in exactly one way. Both S and T are linear, and ST = idRd and TS = idV, so S and T are isomorphisms and V is isomorphic to Rd. □

4. How to find the matrix representing a linear transformation

The solution to this problem can be seen from the proof of Theorem 3.1. There, the matrix A such that T(x) = Ax for all x ∈ Rn is constructed one column at a time: the ith column of A is T(ei).


For example, if T : R2 → R3 is the linear transformation T(x1, x2)ᵀ = (x1 + x2, x1 − x2, 2x2)ᵀ, then T(e1) = (1, 1, 0)ᵀ and T(e2) = (1, −1, 2)ᵀ, so

    A = ( 1  1 )
        ( 1 −1 )
        ( 0  2 ).

5. Invertible matrices and invertible linear transformations

6. How to find the formula for a linear transformation

First, what do we mean by a formula for a linear transformation? We mean the same thing as when we speak of a formula for one of the functions you met in calculus—the only difference is the appearance of the formula. For example, in calculus f(x) = x sin x is a formula for the function f; similarly, g(x) = |x| is a formula for the function g; we can also give the formula for g by

    g(x) = x if x ≥ 0, and g(x) = −x if x ≤ 0.

The expression

    T(a, b, c)ᵀ = (2a − b, a, a + b + c, 3c − 2b)ᵀ

is a formula for a linear transformation T : R3 → R4.

Although a linear transformation may not be given by an explicit formula, we might want one in order to do calculations. For example, rotation about the origin in the clockwise direction by an angle of 45° is a linear transformation, but that description is not a formula for it. A formula for it is given by

    T(a, b)ᵀ = (1/√2) (a + b, b − a)ᵀ,   i.e.,   T(x) = (1/√2) (  1  1 ) x.
                                                              ( −1  1 )

Let T : Rn → Rm be a linear transformation. Suppose {v1, . . . , vn} is a basis for Rn and you are told the values of T(v1), . . . , T(vn). There is a simple procedure to find the m × n matrix A such that T(x) = Ax for all x ∈ Rn. The idea is buried in the proof of Theorem 3.1.

Write B = [v1 | · · · | vn] for the n × n matrix whose columns are the basis vectors and C = [T(v1) | · · · | T(vn)] for the m × n matrix whose columns are the given values. Since Avj = T(vj) for every j, we have AB = C; and B is invertible because its columns are a basis for Rn, so A = CB⁻¹.
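In machine terms the procedure is one call to a linear solver. A sketch under the same notation (the particular basis and prescribed values are my own choices):

    import numpy as np

    # Basis v1, v2 of R^2 (columns of B) and prescribed values T(v1), T(v2)
    # in R^3 (columns of C).
    B = np.array([[1.0,  1.0],
                  [1.0, -1.0]])
    C = np.array([[2.0, 0.0],
                  [0.0, 2.0],
                  [1.0, 3.0]])

    # A = C B^{-1}; solving B^T A^T = C^T avoids forming the inverse explicitly.
    A = np.linalg.solve(B.T, C.T).T

    assert np.allclose(A @ B[:, 0], C[:, 0])   # A v1 = T(v1)
    assert np.allclose(A @ B[:, 1], C[:, 1])   # A v2 = T(v2)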

7. Rotations in the plane

Among the more important linear transformations R2 → R2 are the rotations. Given an angle θ we write Tθ for the linear transformation that rotates a point by an angle of θ radians in the counterclockwise direction.

Before we obtain a formula for Tθ some things should be clear. First, TθTψ = Tθ+ψ. Second, T0 = idR2. Third, T2nπ = idR2 and T(2n+1)π = −idR2 for all n ∈ Z. Fourth, rotation in the counterclockwise direction by an angle of −θ is the same as rotation in the clockwise direction by an angle of θ, so Tθ has an inverse, namely T−θ.

Let's write Aθ for the unique 2 × 2 matrix such that Tθx = Aθx for all x ∈ R2. The first column of Aθ is given by Tθ(e1) and the second column of Aθ is given by Tθ(e2). Elementary trigonometry (draw the diagrams and check!) then gives

(7-1)    Aθ = ( cos θ   −sin θ )
              ( sin θ    cos θ ).

There are many interesting aspects to this:

• if you have forgotten your formulas for sin(θ + ψ) and cos(θ + ψ) you can recover them by using the fact that AθAψ = Aθ+ψ and computing the product on the left;
• the determinant of Aθ is 1 because cos²θ + sin²θ = 1;
• Aθ⁻¹ = A−θ, and using the formula for the inverse of a 2 × 2 matrix in Theorem 5.1 you can recover the formulae for sin(−θ) and cos(−θ) if you have forgotten them.

8. Reflections in R2

Let L be the line in R2 through the origin and a non-zero point (a, b)ᵀ. The reflection in L is the linear transformation that sends (a, b)ᵀ to itself, i.e., it fixes every point on L, and sends the point (−b, a)ᵀ, which is orthogonal to L, to (b, −a)ᵀ.² Let A be the matrix implementing this linear transformation.

Since

    e1 = (1/(a² + b²)) ( a (a, b)ᵀ − b (−b, a)ᵀ )

and

    e2 = (1/(a² + b²)) ( b (a, b)ᵀ + a (−b, a)ᵀ )

we get

    Ae1 = (1/(a² + b²)) ( a (a, b)ᵀ − b (b, −a)ᵀ ) = (1/(a² + b²)) ( a² − b², 2ab )ᵀ

and

    Ae2 = (1/(a² + b²)) ( b (a, b)ᵀ + a (b, −a)ᵀ ) = (1/(a² + b²)) ( 2ab, b² − a² )ᵀ

so

    A = (1/(a² + b²)) ( a² − b²   2ab     )
                      ( 2ab       b² − a² ).

In summary, if x ∈ R2, then Ax is the reflection of x in the line ay = bx.

²Draw a picture to convince yourself that this does what you expect the reflection in L to do.
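A reflection performed twice is the identity, so A² = I, and the claimed behavior on L and its orthogonal line can be verified symbolically. A sketch in sympy:

    from sympy import symbols, Matrix, simplify, zeros

    a, b = symbols('a b', real=True)
    A = Matrix([[a**2 - b**2, 2*a*b],
                [2*a*b, b**2 - a**2]]) / (a**2 + b**2)

    assert simplify(A * A - Matrix.eye(2)) == zeros(2, 2)                  # A^2 = I
    assert simplify(A * Matrix([a, b]) - Matrix([a, b])) == zeros(2, 1)    # fixes (a, b)
    assert simplify(A * Matrix([-b, a]) + Matrix([-b, a])) == zeros(2, 1)  # negates (-b, a)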


You might exercise your muscles by showing that all reflections are similar to one another. Why does it suffice to show each is similar to the reflection

    ( 1  0 )
    ( 0 −1 )?

8.1. Reflections in higher dimensions. Let T : Rn → Rn. We call T a reflection if T² = I and the 1-eigenspace of T has dimension n − 1.

9. Invariant subspaces

10. The one-to-one and onto properties

For a few moments forget linear algebra. The adjectives one-to-one and onto can be used with any kind of function, not just those that arise in linear algebra.

Let f : X → Y be a function between any two sets X and Y .

• We say that f is one-to-one if f(x) = f(x′) only when x = x′. Equivalently, f is one-to-one if it sends different elements of X to different elements of Y, i.e., f(x) ≠ f(x′) if x ≠ x′.

• The range of f, which we denote by R(f), is the set of all values it takes, i.e.,

    R(f) := {f(x) | x ∈ X}.

I like this definition because it is short but some prefer the equivalent definition

    R(f) := {y ∈ Y | y = f(x) for some x ∈ X}.

• We say that f is onto if R(f) = Y. Equivalently, f is onto if every element of Y is f(x) for some x ∈ X.

10.1. The sine function sin : R → R is not one-to-one because sin π = sin 2π. It is not onto because its range is [−1, 1] = {y | −1 ≤ y ≤ 1}. However, if we consider sin as a function from R to [−1, 1] it becomes onto, though it is still not one-to-one.

The function f : R → R given by f(x) = x² is not one-to-one because f(2) = f(−2), for example, and is not onto because its range is R≥0 := {y ∈ R | y ≥ 0}. However, the function

    f : R≥0 → R≥0,   f(x) = x²,

is both one-to-one and onto.

11. Two-to-two

One-to-one is a lousy choice of name. Two-to-two would be much better because a one-to-one function f : X → Y is a function having the property that it always sends two different elements x and x′ in X to two different elements f(x) and f(x′) in Y. Of course, once one sees this, it follows that a one-to-one function is not only two-to-two but three-to-three, four-to-four,


and so on; i.e., if f : X → Y is one-to-one and x1, x2, x3, x4 are four different elements of X, then f(x1), f(x2), f(x3), f(x4) are four different elements of X.

11.1. We can and do apply these ideas in the context of linear algebra. If T is a linear transformation we can ask whether T is one-to-one or onto.

For example, the linear transformation T : R2 → R2 given by

    T(x1, x2)ᵀ = (x2, x1)ᵀ

is both one-to-one and onto.

The linear transformation T : R2 → R3 given by

    T(x1, x2)ᵀ = (x2, x1, x1 + x2)ᵀ

is one-to-one but not onto because (0, 0, 1)ᵀ is not in the range. Of course, since the range of T is a plane in R3 there are lots of other elements of R3 not in the range of T.

Proposition 11.1. A linear transformation is one-to-one if and only if its null space is zero.

Proof. Let T be a linear transformation. By definition, N(T) = {x | T(x) = 0}. If T is one-to-one the only x with the property that T(x) = T(0) is x = 0, so N(T) = {0}.

Conversely, suppose that N(T) = {0}. If T(x) = T(x′), then T(x − x′) = T(x) − T(x′) = 0 so x − x′ ∈ N(T), whence x = x′, thus showing that T is one-to-one. □

Another way of saying this is that a linear transformation is one-to-one if and only if its nullity is zero.

12. Gazing into the distance: differential operators as linear transformations

You do not need to read this chapter. It is only for those who are curious enough to raise their eyes from the road we have been traveling to pause, refresh themselves, and gaze about in order to see more of the land we are entering. Linear algebra is a big subject that permeates and provides a framework for almost all areas of mathematics. So far we have seen only a small piece of this vast land.

We mentioned earlier that the set R[x] of all polynomials in x is a vector space with basis 1, x, x², . . .. Differentiation is a linear transformation

    d/dx : R[x] → R[x]

because

    d/dx (af(x) + bg(x)) = a df/dx + b dg/dx.


If we write d/dx as a matrix with respect to the basis 1, x, x², . . ., then

    d/dx = ( 0 1 0 0 0 0 · · · )
           ( 0 0 2 0 0 0 · · · )
           ( 0 0 0 3 0 0 · · · )
           ( ...               )

because d/dx(xⁿ) = nxⁿ⁻¹.

The null space of d/dx is the set of constant functions, the subspace R·1. The range of d/dx is all of R[x]: every polynomial is the derivative of another polynomial.

Another linear transformation from R[x] to itself is "multiplication by x", i.e., T(f) = xf. The composition of two linear transformations is a linear transformation. In particular, we can compose x and d/dx to obtain another linear transformation. There are two compositions, one in each order. It is interesting to compute the difference of these two compositions. Let's do that.

Write T : R[x] → R[x] for the linear transformation T(f) = xf and D : R[x] → R[x] for the linear transformation D(f) = f′, the derivative of f with respect to x. Then, applying the product rule to compute the derivative of the product xf, we have

    (DT − TD)(f) = D(xf) − xf′ = f + xf′ − xf′ = f.

Thus DT − TD = I, the identity transformation. The same relation, in another guise, is the canonical commutation relation underlying Heisenberg's Uncertainty Principle in quantum mechanics.
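You can watch the identity DT − TD = I hold on sample polynomials. A sketch in sympy:

    from sympy import symbols, diff, expand

    x = symbols('x')

    def D(f):   # differentiation
        return diff(f, x)

    def T(f):   # multiplication by x
        return x * f

    f = 3*x**4 - 2*x + 7
    # (DT - TD)(f) = f for every polynomial f.
    print(expand(D(T(f)) - T(D(f))))   # prints 3*x**4 - 2*x + 7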

12.1. Higher-order matrix equations. This course is all about solving equations in matrices. At high school you learned something about solving equations in which the solution is a number, i.e., a 1 × 1 matrix. In this course you are learning about solving equations in which the solution is a column vector, an n × 1 matrix. An obvious next step is to consider equations in which the solutions are matrices.

For example, at high school solving the two equations x² + y² = 1 and x − y = 1/2 amounts to finding the two points where the circle meets the line. The solutions are 1 × 2 matrices. However, you could also consider the problem of finding all 2 × 2 matrices x and y such that x² + y² = 1 and x − y = 1/2.

Find all n × n matrices E, F, H, such that

HE − EH = 2E

EF − FE = H

HF − FH = −2F.

Because of the equivalence between matrices and linear transformations, which we discussed in chapter 3, this problem is equivalent to finding all linear transformations E, F, and H, from Rn to itself that simultaneously satisfy the three equations.

Let R[x, y]n denote the (n + 1)-dimensional vector space consisting of all homogeneous polynomials in two variables having degree n. A basis for this vector space is the set

    xⁿ, xⁿ⁻¹y, xⁿ⁻²y², . . . , xyⁿ⁻¹, yⁿ.

Thus the elements of R[x, y]n consist of all linear combinations

    anxⁿ + an−1xⁿ⁻¹y + · · · + a1xyⁿ⁻¹ + a0yⁿ.

Let ∂x and ∂y denote the partial derivatives with respect to x and y, respectively. These are linear transformations R[x, y]n → R[x, y]n−1. We will write fx for ∂x(f) and fy for ∂y(f). Multiplication by x and multiplication by y are linear transformations R[x, y]n → R[x, y]n+1. Therefore

    E := x∂y,   H := x∂x − y∂y,   F := y∂x

are linear transformations R[x, y]n → R[x, y]n for every n. The action of EF − FE on a polynomial f is

    (EF − FE)(f) = (x∂y y∂x − y∂x x∂y)(f)
                 = x∂y(yfx) − y∂x(xfy)
                 = x(fx + yfxy) − y(fy + xfyx)
                 = xfx + xyfxy − yfy − yxfyx
                 = xfx − yfy
                 = (x∂x − y∂y)(f)
                 = H(f).

Since the transformations EF − FE and H act in the same way on every function f, they are equal; i.e., EF − FE = H. We also have

    (HE − EH)(f) = ((x∂x − y∂y)(x∂y) − (x∂y)(x∂x − y∂y))(f)
                 = (x∂x − y∂y)(xfy) − (x∂y)(xfx − yfy)
                 = x(fy + xfyx) − yxfyy − x²fxy + x(fy + yfyy)
                 = 2xfy
                 = 2(x∂y)(f)
                 = 2E(f).

Therefore HE − EH = 2E. I leave you to check that HF − FH = −2F.

This example illuminates several topics we have discussed. The null space of E is Rxⁿ. The null space of F is Ryⁿ. Each xⁿ⁻ⁱyⁱ is an eigenvector for H having eigenvalue n − 2i. Thus H has n + 1 distinct eigenvalues n, n − 2, n − 4, . . . , 2 − n, −n and, as we prove in greater generality in Theorem 1.2 of the chapter on eigenvalues, these eigenvectors are linearly independent.



CHAPTER 12

Determinants

The determinant of an n × n matrix A, which is denoted det(A), is a number computed from the entries of A having the wonderful property that:

    A is invertible if and only if det(A) ≠ 0.

We will prove this in Theorem 3.1 below. Sometimes, for brevity, we denote the determinant of A by |A|.

1. The definition

We met the determinant of a 2 × 2 matrix in Theorem 5.1:

    det ( a  b )  =  ad − bc.
        ( c  d )

The formula for the determinant of an n × n matrix is more complicated. It is defined inductively, by which I mean that the formula for the determinant of an n × n matrix involves the formula for the determinant of an (n − 1) × (n − 1) matrix.

1.1. The 3 × 3 case. The determinant det(A) = det(aij) of a 3 × 3 matrix is the number

    | a11 a12 a13 |
    | a21 a22 a23 |  =  a11 | a22 a23 | − a12 | a21 a23 | + a13 | a21 a22 |
    | a31 a32 a33 |         | a32 a33 |       | a31 a33 |       | a31 a32 |

                     =  a11(a22a33 − a23a32) − a12(a21a33 − a23a31) + a13(a21a32 − a22a31).

The coefficients a11, a12, and a13, on the right-hand side of the formula are the entries in the top row of A and they appear with alternating signs +1 and −1; each a1j is then multiplied by the determinant of the 2 × 2 matrix obtained by deleting the row and column that contain a1j, i.e., deleting the top row and the jth column of A.

The second line of the expression for det(A) has 3 × 2 = 6 = 3! terms. Each term is a product of three entries and those three entries are distributed so that each row of A contains exactly one of them and each column of A contains exactly one of them. In other words, each term is a product a1pa2qa3r where {p, q, r} = {1, 2, 3}.


1.2. The 4 × 4 case. The determinant det(A) = det(aij) of a 4 × 4 matrix is the number

    | a11 a12 a13 a14 |
    | a21 a22 a23 a24 |
    | a31 a32 a33 a34 |
    | a41 a42 a43 a44 |

          | a22 a23 a24 |       | a21 a23 a24 |       | a21 a22 a24 |       | a21 a22 a23 |
    = a11 | a32 a33 a34 | − a12 | a31 a33 a34 | + a13 | a31 a32 a34 | − a14 | a31 a32 a33 |.
          | a42 a43 a44 |       | a41 a43 a44 |       | a41 a42 a44 |       | a41 a42 a43 |

Look at the similarities with the 3 × 3 case. Each entry in the top row of A appears as a coefficient with alternating signs +1 and −1 in the formula on the right. Each a1j is then multiplied by the determinant of the 3 × 3 matrix obtained by deleting the row and column that contain a1j, i.e., deleting the top row and the jth column of A. Multiplying out the formula on the right yields a total of 4 × 6 = 24 = 4! terms. Each term is a product of four entries and those four entries are distributed so that each row of A contains exactly one of them and each column of A contains exactly one of them. In other words, each term is a product a1pa2qa3ra4s where {p, q, r, s} = {1, 2, 3, 4}.

1.3. The n × n case. Let A be an n × n matrix. For each i and j between 1 and n we define the (n − 1) × (n − 1) matrix Aij to be that obtained by deleting row i and column j from A. We then define

(1-1)    det(A) := a11|A11| − a12|A12| + a13|A13| − · · · + (−1)ⁿ⁻¹a1n|A1n|.

Each |Aij| is called the ijth minor of A. The expression (1-1) is said to be obtained by expanding A along its first row. There are similar expressions for the determinant of A that are obtained by expanding A along any row or column.

There are a number of observations to make about the formula (1-1). When the right-hand side is multiplied out one gets a total of n! terms, each of which looks like

(1-2)    ±a1j1 a2j2 · · · anjn

where {j1, . . . , jn} = {1, . . . , n}. In particular, each row and each column contains exactly one of the factors a1j1, a2j2, . . . , anjn.

Proposition 1.1. If A has a column of zeros, then det(A) = 0.

Proof. Suppose column j consists entirely of zeroes, i.e., a1j = a2j = · · · = anj = 0. Consider one of the n! terms in det(A), say a1j1 a2j2 · · · anjn. Because {j1, . . . , jn} = {1, . . . , n}, one of the factors apjp in this product belongs to column j, so is zero; hence a1j1 a2j2 · · · anjn = 0. Thus, every one of the n! terms in det(A) is zero. Hence det(A) = 0. □

A similar argument shows that det(A) = 0 if one of the rows of A consists entirely of zeroes.


By Theorem ??, A is not invertible if its columns are linearly dependent. Certainly, if one of the columns of A consists entirely of zeroes its columns are linearly dependent. As just explained, this implies that det(A) = 0. This is consistent with the statement (not yet proved) at the beginning of this chapter that A is invertible if and only if det(A) ≠ 0.

1.4. The signs. The signs ±1 are important. First let's just note that the coefficient of a11a22 · · · ann is +1. This can be proved by induction: we have

    | a11 a12 . . . a1n |
    | a21 a22 . . . a2n |         | a22 . . . a2n |
    |  ...              |  =  a11 |  ...          |  +  other terms,
    | an1 an2 . . . ann |         | an2 . . . ann |

but a11 does not appear in any of the other terms. The coefficient of a11a22 · · · ann in det(A) is therefore the same as the coefficient of a22 · · · ann in

    | a22 . . . a2n |
    |  ...          |
    | an2 . . . ann |

However, by an induction argument we can assume that the coefficient of a22 · · · ann in this smaller determinant is +1. Hence the coefficient of a11a22 · · · ann in det(A) is +1.

In general, the sign in front of a1j1 a2j2 · · · anjn is determined according to the following recipe: it is +1 if one can change the sequence j1, j2, . . . , jn to 1, 2, . . . , n by an even number of switches of adjacent numbers, and is −1 otherwise. For example, the coefficient of a12a23a34a45a51 in the expression for the determinant of a 5 × 5 matrix is +1 because we need 4 switches:

    23451 → 23415 → 23145 → 21345 → 12345.

On the other hand, the coefficient of a13a24a35a42a51 is −1 because we need 7 switches:

    34521 → 34251 → 34215 → 32415 → 32145 → 23145 → 21345 → 12345.
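Counting switches of adjacent numbers is exactly what bubble sort does, so the recipe is easy to automate. A small sketch (the helper function is mine, not from the notes):

    def sign(seq):
        """Sign of the permutation j1,...,jn: +1 if an even number of
        adjacent switches sorts it, -1 if an odd number does."""
        s, switches = list(seq), 0
        for _ in range(len(s)):
            for j in range(len(s) - 1):
                if s[j] > s[j + 1]:
                    s[j], s[j + 1] = s[j + 1], s[j]
                    switches += 1
        return 1 if switches % 2 == 0 else -1

    print(sign([2, 3, 4, 5, 1]))   # +1, via 4 switches
    print(sign([3, 4, 5, 2, 1]))   # -1, via 7 switches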

1.5. The upper triangular case. An n × n matrix A is upper triangular if every entry below the main diagonal is zero; more precisely, if aij = 0 for all i > j. The next result shows that it is easy to compute the determinant of an upper triangular matrix: it is just the product of the diagonal entries.

Theorem 1.2.

    det | a11 a12 a13 · · · a1n |
        |  0  a22 a23 · · · a2n |
        |  0   0  a33 · · · a3n |  =  a11a22 · · · ann.
        | ...            ...    |
        |  0   0   0  · · · ann |


Proof. The determinant is a sum of terms of the form ±a1j1 a2j2 · · · anjn where {j1, . . . , jn} = {1, . . . , n}. For an upper triangular matrix this term can only be non-zero if j1 ≥ 1, and j2 ≥ 2, and . . ., and jn ≥ n. But this can only happen if j1 = 1, j2 = 2, . . ., and jn = n. Hence the only non-zero term is a11a22 · · · ann which, as we noted earlier, has coefficient +1. □

Thus, for example,

det(I) = 1.

There is an analogue of Theorem 1.2 for lower triangular matrices.

2. Elementary row operations and determinants

Computing the determinant of a matrix directly from the definition is tedious and prone to arithmetic error. However, given A we can perform elementary row operations to get a far simpler matrix, say A′, perhaps upper triangular, compute det(A′) and, through repeated use of the next result, then determine det(A).

The row reduced echelon form of a square matrix is always upper triangular.

Proposition 2.1. If B is obtained from A by one of the following elementary row operations, the determinants are related in the following way:

(2-1)    • switch two rows:  det(B) = −det(A)
         • multiply a row by c ∈ R − {0}:  det(B) = c det(A)
         • replace row i by (row i + row k) with i ≠ k:  det(B) = det(A)

In particular, det(A) = 0 if and only if det(B) = 0.

Proof. We prove the second statement first. Suppose we multiply row p by the number c. Each term a1j1 a2j2 · · · anjn in det(A) contains exactly one factor from row p so each term in det(B) is c times the corresponding term in det(A). Hence det(B) is c times det(A).

We now prove the other two statements in the 2 × 2 case.

Switching two rows changes the sign because

    det | c d |  =  cb − ad  =  −(ad − bc)  =  −det | a b |.
        | a b |                                     | c d |

Adding one row to another doesn't change the determinant because

    det | a+c b+d |  =  (a + c)d − (b + d)c  =  ad − bc  =  det | a b |
        |  c   d  |                                            | c d |

and

(2-2)    det |  a   b  |  =  a(d + b) − b(c + a)  =  ad − bc  =  det | a b |.
             | c+a d+b |                                            | c d |


Hence all three statements are true for a 2 × 2 matrix.

We now turn to the n × n case and argue by induction on n, i.e., we assume the result is true for (n − 1) × (n − 1) matrices and prove the result for an n × n matrix.

Actually, the induction argument is best understood by looking at how the truth of the result for 2 × 2 matrices implies its truth for 3 × 3 matrices. Consider switching rows 2 and 3 in a 3 × 3 matrix. Since

(2-3)    | a11 a12 a13 |
         | a21 a22 a23 |  =  a11 | a22 a23 | − a12 | a21 a23 | + a13 | a21 a22 |
         | a31 a32 a33 |         | a32 a33 |       | a31 a33 |       | a31 a32 |

it follows that

(2-4)    | a11 a12 a13 |
         | a31 a32 a33 |  =  a11 | a32 a33 | − a12 | a31 a33 | + a13 | a31 a32 |.
         | a21 a22 a23 |         | a22 a23 |       | a21 a23 |       | a21 a22 |

But the determinants of the three 2 × 2 matrices in this sum are the negatives of the three 2 × 2 determinants in (2-3). Hence the determinant of the 3 × 3 matrix in (2-4) is the negative of the determinant of the 3 × 3 matrix in (2-3). This "proves" that det(B) = −det(A) if B is obtained from A by switching two rows.

Now consider replacing the third row of the 3 × 3 matrix A in (2-3) by the sum of rows 2 and 3 to produce B. Then det(B) is

    | a11     a12     a13     |
    | a21     a22     a23     |
    | a21+a31 a22+a32 a23+a33 |

          | a22     a23     |       | a21     a23     |       | a21     a22     |
    = a11 | a22+a32 a23+a33 | − a12 | a21+a31 a23+a33 | + a13 | a21+a31 a22+a32 |.

However, by the calculation in (2-2), this is equal to

    a11 | a22 a23 | − a12 | a21 a23 | + a13 | a21 a22 |,
        | a32 a33 |       | a31 a33 |       | a31 a32 |
i.e., det(B) = det(A). The case of an elementary row operation involving row 1 is similar; there one uses an expansion of the determinant along a row not involved in the operation. □

Proposition 2.2. If A is an n × n matrix and E is an elementary matrix, then det(EA) = det(E) det(A).

Proof. The matrix EA is obtained from A by performing a single elementary row operation, and E is obtained from the identity matrix I by performing that same operation. Since det(I) = 1, applying Proposition 2.1 to I shows that det(E) is −1, c, or 1 according to the type of operation, and applying Proposition 2.1 to A shows that det(EA) = det(E) det(A) in each of the three cases. □

Proposition 2.3. Let A and A′ be row equivalent n × n matrices. Then det(A) = 0 if and only if det(A′) = 0.


Proof. By hypothesis, A′ is obtained from A by a sequence of elementary row operations. By Proposition 2.1, a single elementary row operation does not change whether the determinant of a matrix is zero or not. Therefore a sequence of elementary row operations does not change whether the determinant of a matrix is zero or not. □

3. The determinant and invertibility

Theorem 3.1. A square matrix is invertible if and only if its determinant is non-zero.

Proof. Suppose A is an n × n matrix and let E = rref(A). Then A and E have the same rank.

By Proposition 5.2, E = I if rank E = n, and E has a row of zeroes if rank E < n. Since A is invertible if and only if rank(A) = n, it follows that det(E) = 1 if A is invertible and det(E) = 0 if A is not invertible. The result now follows from Proposition 2.3. □

4. Properties

Theorem 4.1. det(AB) = (detA)(detB).

Proof. We break the proof into two separate cases.

Suppose A and B are invertible. Then A = E1 · · · Em and B = Em+1 · · · En where each Ej is an elementary matrix. By repeated application of Proposition 2.2,

    det(A) = det(E1 · · · Em)
           = det(E1) det(E2 · · · Em)
           = · · ·
           = det(E1) det(E2) · · · det(Em).

Applying the same argument to det(B) and det(AB) we see that det(AB) = (det A)(det B).

Suppose that either A or B is not invertible. Then AB is not invertible, so det(AB) = 0 by Theorem 3.1. By the same theorem, either det(A) or det(B) is zero. Hence det(AB) = (det A)(det B) in this case too. □

Corollary 4.2. If A is an invertible matrix, then det(A⁻¹) = (det A)⁻¹.

Proof. It is clear that the determinant of the identity matrix is 1, so the result follows from Theorem 4.1 and the fact that AA⁻¹ = I. □

Two further properties, which we state without proof: det(A) = 0 if two rows (or columns) of A are the same, and det(Aᵀ) = det(A).
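These properties are easy to test numerically on random matrices. A quick sketch:

    import numpy as np

    rng = np.random.default_rng(1)
    A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

    # det(AB) = det(A) det(B) and det(A^T) = det(A), up to rounding error.
    assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
    assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))

    # A repeated row forces the determinant to be zero.
    A[2] = A[0]
    assert np.isclose(np.linalg.det(A), 0.0)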


5. Elementary column operations and determinants

As you might imagine, one can define elementary column operations in a way that is analogous to our definition of elementary row operations, and the effect of those operations on the determinant of a matrix parallels the effect of the elementary row operations.

5.1. Amusement. Here is an amusing use of the observation about theeffect of the elementary column operations on the determinant.

The numbers 132, 144, and 156, are all divisible by 12. Hence the determinant of

    | 1 3 2 |
    | 1 4 4 |
    | 1 5 6 |

is divisible by 12 because the determinant is not changed by adding 100 times the 1st column and 10 times the second column to the third column, whence

    det | 1 3 2 |  =  det | 1 3 132 |  =  12 × det | 1 3 11 |.
        | 1 4 4 |         | 1 4 144 |              | 1 4 12 |
        | 1 5 6 |         | 1 5 156 |              | 1 5 13 |

Similarly, the numbers 289, 391, and 867, are all divisible by 17, so

    det | 2 8 9 |  =  det | 2 8 289 |  =  17 × det | 2 8 17 |
        | 3 9 1 |         | 3 9 391 |              | 3 9 23 |
        | 8 6 7 |         | 8 6 867 |              | 8 6 51 |

is divisible by 17.

Amuse your friends at parties.

5.2. A nice example. If the matrix

    A = | b  c  d |
        | a −d  c |
        | d  a −b |
        | c −b −a |

has rank ≤ 2, then
(1) rref(A) has ≤ 2 non-zero rows, so
(2) the row space of A has dimension ≤ 2, so
(3) any three rows of A are linearly dependent,
(4) so det(any 3 rows of A) = 0.

For example,

    0 = det | b  c  d |
            | a −d  c |
            | d  a −b |
      = b(db − ac) − c(−ab − cd) + d(a² + d²)
      = d(a² + b² + c² + d²).


Similarly, taking the determinant of the other three 3 × 3 submatrices of A gives

    a(a² + b² + c² + d²) = b(a² + b² + c² + d²) = c(a² + b² + c² + d²) = 0.

If a, b, c, d are real numbers it follows that a = b = c = d = 0, i.e., A must be the zero matrix. However, if a, b, c, d are complex numbers A need not be zero because there are non-zero complex numbers a, b, c, d such that a² + b² + c² + d² = 0.

5.3. Permutations. You have probably met the word permutation meaning arrangement or rearrangement, as in the question "How many permutations are there of the numbers 1, 2, . . . , n?" The answer, as you know, is n!.

We now define a permutation of the set {1, 2, . . . , n} as a function

    π : {1, 2, . . . , n} → {1, 2, . . . , n}

that is one-to-one (or, equivalently, onto). Thus each term in the expression for the determinant of an n × n matrix (aij) is of the form

(5-1)    ±a1π(1) a2π(2) · · · anπ(n).

The condition that π is one-to-one ensures that each row and each column contains exactly one of the factors in (5-1).

There are a total of n! permutations and the determinant of (aij) is the sum of n! terms (5-1), one for each permutation. However, there remains the issue of the ± signs.

Let π be a permutation of {1, 2, . . . , n}. Write π(1), π(2), . . . , π(n) one after the other. A transposition of this string of numbers is the string of numbers obtained by switching the positions of any two adjacent numbers. For example, if we start with 4321, then each of 3421, 4231, and 4312, is a transposition of 4321. It is clear that after some number of transpositions we can get the numbers back in their correct order. For example,

    4321 → 4312 → 4132 → 1432 → 1423 → 1243 → 1234

or

    4321 → 4231 → 2431 → 2341 → 2314 → 2134 → 1234.

If it takes an even number of transpositions to get the numbers back into their correct order we call the permutation even. If it takes an odd number we call the permutation odd. We define the sign of a permutation to be the number

    sgn(π) := +1 if π is even, and −1 if π is odd.

It is not immediately clear that the sign of a permutation is well-defined. It is, but we won't give a proof.


The significance for us is that the coefficient of the term a1π(1) a2π(2) · · · anπ(n) in the expression for the determinant is sgn(π). Hence

    det(aij) = Σπ sgn(π) a1π(1) a2π(2) · · · anπ(n)

where the sum is taken over all permutations π of {1, 2, . . . , n}.

Thus, the term a14a23a32a41 appears in the expression for the determinant of a 4 × 4 matrix with coefficient +1 because, as we saw above, it took an even number of transpositions (six) to get 4321 back to the correct order.


CHAPTER 13

Eigenvalues

Consider a fixed n × n matrix A. Viewed as a linear transformation from Rn to Rn, A will send a line through the origin to either another line through the origin or zero. Some lines might be sent back to themselves. That is a nice situation: the effect of A on the points on that line is simply to scale them by some factor, λ say. The line Rx is sent by A to either zero or the line R(Ax), so A sends this line to itself if and only if Ax is a multiple of x. This motivates the next definition.

1. Definitions and first steps

Let A be an n × n matrix. A number λ ∈ R is an eigenvalue for A if there is a non-zero vector x such that Ax = λx. If λ is an eigenvalue for A we define

    Eλ := {x | Ax = λx}.

We call Eλ the λ-eigenspace for A and the elements in Eλ are called eigenvectors or λ-eigenvectors for A.

We can define Eλ for any λ ∈ R but it is non-zero if and only if λ is an eigenvalue.¹

Since

Ax = λx ⇐⇒ 0 = Ax− λx = Ax− λIx = (A− λI)x,

it follows that

Eλ = N (A− λI).

We have therefore proved that Eλ is a subspace of Rn. The following theorem also follows from the observation that Eλ = N(A − λI).

Theorem 1.1. Let A be an n× n matrix. The following are equivalent:

(1) λ is an eigenvalue for A;
(2) A − λI is singular;
(3) det(A − λI) = 0.

Proof. Let λ ∈ R. Then λ is an eigenvalue for A if and only if there is a non-zero vector v such that (A − λI)v = 0; but such a v exists if and only if A − λI is singular. This proves the equivalence of (1) and (2). Theorem 3.1 tells us that (2) and (3) are equivalent statements. □

¹Eigen is a German word that has a range of meanings but the connotation here is probably closest to "peculiar to", or "particular to", or "characteristic of". The eigenvalues of a matrix are important characteristics of it.

Theorem 1.2. Let A be an n × n matrix and λ1, . . . , λr distinct eigenvalues for A. If v1, . . . , vr are non-zero vectors such that Avi = λivi for each i, then {v1, . . . , vr} is linearly independent.

Proof. Suppose the theorem is false. Let m ≥ 1 be the smallest number such that {v1, . . . , vm} is linearly independent but {v1, . . . , vm, vm+1} is linearly dependent. Such an m exists because {v1} is a linearly independent set.

There are numbers a1, . . . , am such that vm+1 = a1v1 + · · · + amvm. It follows that

    λm+1a1v1 + · · · + λm+1amvm = λm+1vm+1
                                = Avm+1
                                = a1Av1 + · · · + amAvm
                                = a1λ1v1 + · · · + amλmvm.

Therefore

    a1(λm+1 − λ1)v1 + · · · + am(λm+1 − λm)vm = 0.

But {v1, . . . , vm} is linearly independent so

    a1(λm+1 − λ1) = · · · = am(λm+1 − λm) = 0.

By hypothesis, λm+1 ≠ λi for any i = 1, . . . , m so

    a1 = a2 = · · · = am = 0.

But this implies vm+1 = 0 contradicting the hypothesis that vm+1 is non-zero. We therefore conclude that the theorem is true. □

2. Reflections in R2, revisited

A simple example to consider is reflection in a line lying in R2, the situation we considered in chapter 8. There we examined a pair of orthogonal lines, L spanned by (a, b)ᵀ, and L′ spanned by (b, −a)ᵀ. We determined the matrix A that fixed the points on L and sent other points to their reflections in L. In particular, A sent (b, −a)ᵀ to its negative. We found that

    A = (1/(a² + b²)) ( a² − b²   2ab     )
                      ( 2ab       b² − a² ).

The matrix A has eigenvectors (a, b)ᵀ and (b, −a)ᵀ with eigenvalues +1 and −1, and the corresponding eigenspaces are L and L′.


The point of this example is that the matrix A does not initially appear to have any special geometric meaning but by finding its eigenvalues and eigenspaces the geometric meaning of A becomes clear.

3. The 2× 2 case

Theorem 3.1. The eigenvalues of

    A = ( a  b )
        ( c  d )

are the solutions of the quadratic polynomial equation

(3-1)    λ² − (a + d)λ + ad − bc = 0.

Proof. We have already noted that λ is an eigenvalue for A if and only if A − λI is singular, i.e., if and only if det(A − λI) = 0. But

    det(A − λI) = det ( a−λ   b  )  =  (a − λ)(d − λ) − bc  =  λ² − (a + d)λ + ad − bc,
                      (  c   d−λ )

so the result follows. □

A quadratic polynomial with real coefficients can have either zero, one, or two roots. It follows that a 2 × 2 matrix with real entries can have either zero, one, or two eigenvalues. (Later we will see that an n × n matrix with real entries can have anywhere from zero to n eigenvalues.)

3.1. The polynomial x² + 1 has no real zeroes, so the matrix

    ( 0 −1 )
    ( 1  0 )

has no eigenvalues. You might be happy to observe that this is the matrix that rotates the plane through an angle of 90° in the counterclockwise direction. It is clear that such a linear transformation will not send any line to itself: rotating by 90° moves every line.

Of course, it is apparent that rotating by any angle that is not an integer multiple of π will have the same property: it will move every line, so will have no eigenvalues. You should check this by using the formula for Aθ in (7-1) and then applying the quadratic formula to the equation (3-1).

3.2. The polynomial x² − 4x + 4 has one zero, namely 2, because it is a square, so the matrix

    A = ( 3 −1 )
        ( 1  1 )

has exactly one eigenvalue, namely 2. The 2-eigenspace is

    E2 = N(A − 2I) = N ( 1 −1 )  =  R (1, 1)ᵀ.
                       ( 1 −1 )


You should check that A times (1, 1)ᵀ is 2 (1, 1)ᵀ. We know that dim E2 = 1 because the rank of A − 2I is 1 and hence its nullity is 1.

3.3. The polynomial x² − x − 2 has two zeroes, namely −1 and 2, so the matrix

    B = ( −2 −2 )
        (  2  3 )

has two eigenvalues, −1 and 2. The 2-eigenspace is

    E2 = N(B − 2I) = N ( −4 −2 )  =  R (1, −2)ᵀ.
                       (  2  1 )

The (−1)-eigenspace is

    E−1 = N(B + I) = N ( −1 −2 )  =  R (2, −1)ᵀ.
                       (  2  4 )

3.4. The examples we have just done should not mislead you into thinking that eigenvalues must be integers! For example,

    ( 1 2 )
    ( 1 1 )

has two eigenvalues, neither of which is an integer. It would be sensible to try computing those and the associated eigenspaces.

4. The equation A² + I

You already know there is no real number x such that x² + 1 = 0. Since a number is a 1 × 1 matrix it is not unreasonable to ask whether there is a 2 × 2 matrix A such that A² + I = 0. I will leave you to tackle that question. You can attack the question in a naive way by taking a 2 × 2 matrix

    ( a b )
    ( c d ),

squaring it, setting the square equal to

    ( −1  0 )
    (  0 −1 ),

and asking whether the system of four quadratic equations in a, b, c, d so obtained has a solution. Not pretty, but why not try it and find out what happens.

As a cute illustration of the use of eigenvalues I will now show there is no 3 × 3 matrix A such that A² + I = 0. I will prove this by contradiction; i.e., I will assume A is a 3 × 3 matrix such that A² + I = 0 and deduce something false from that, thereby showing there can be no such A.

So, let's get started by assuming A is a 3 × 3 matrix such that A² + I = 0.

As a preliminary step I will show that A has no eigenvalues. If λ ∈ R were an eigenvalue and v a non-zero vector such that Av = λv, then

    −v = −Iv = A²v = A(Av) = A(λv) = λAv = λ²v.

But v ≠ 0 so the equality −v = λ²v implies −1 = λ². But this can not be true: there is no real number whose square is −1, so I conclude that A can


not have an eigenvector. (Notice this argument did not use anything about the size of the matrix A. If A is any square matrix with real entries such that A² + I = 0, then A does not have an eigenvalue. We should really say "A does not have a real eigenvalue".)

With that preliminary step out of the way let's proceed to prove the result we want. Let v be any non-zero vector in R3 and let V = Sp(v, Av). Then V is an A-invariant subspace by which I mean if w ∈ V, then Aw ∈ V. More colloquially, A sends elements of V to elements of V. Moreover, dim V = 2 because the fact that A has no eigenvectors implies that Av is not a multiple of v; i.e., {v, Av} is linearly independent.

Since V is a 2-dimensional subspace of R3 there is a non-zero vector w ∈ R3 that is not in V. By the same argument, the subspace W := Sp(w, Aw) is A-invariant and has dimension 2. Since w ∉ V, V + W = R3. From the formula

    dim(V + W) = dim V + dim W − dim(V ∩ W)

we deduce that dim(V ∩ W) = 1. Since V and W are A-invariant, so is V ∩ W. However, V ∩ W = Ru for some u and the fact that Ru is A-invariant implies that Au is a scalar multiple of u; i.e., u is an eigenvector for A. But that is false: we have already proved A has no eigenvectors. Thus we are forced to conclude that there is no 3 × 3 matrix A such that A² + I = 0.

4.1. Why did I just prove that? A recent midterm contained the following question: if A is a 3 × 3 matrix such that A³ + A = 0, is A invertible? The answer is "no", but the question is a very bad question because the "proof" I had in mind is flawed. Let me explain my false reasoning: if A³ + A = 0, then A(A² + I) = 0 so A is not invertible because, as we observed in chapter 4.4, if AB = 0 for some non-zero matrix B, then A is not invertible. However, I can not apply the observation in chapter 4.4 unless I know that A² + I is non-zero, and it is not easy to show that A² + I is not zero. The argument above is the simplest argument I know, and it isn't so simple.

There is a shorter argument using determinants but at the time of the midterm I had not introduced either determinants or eigenvalues.

5. The characteristic polynomial

The characteristic polynomial of an n × n matrix A is the degree n polynomial

    det(A − tI).

You should convince yourself that det(A − tI) really is a polynomial in t of degree n. Let's denote that polynomial by p(t). If we substitute a number λ for t, then

    p(λ) = det(A − λI).

The following result is an immediate consequence of this.


Proposition 5.1. Let A be an n × n matrix, and let λ ∈ R. Then λ is an eigenvalue of A if and only if it is a zero of the characteristic polynomial of A.

Proof. Theorem 1.1 says that λ is an eigenvalue for A if and only if det(A − λI) = 0. □

A polynomial of degree n has at most n roots. The next result follows from this fact.

Corollary 5.2. An n× n matrix has at most n eigenvalues.

The characteristic polynomial of

    ( a b )
    ( c d )

is t² − (a + d)t + ad − bc.

6. How to find eigenvalues and eigenvectors

Finding eigenvalues and eigenvectors is often important. Here we will discuss the head-on, blunt, full frontal method. Like breaking into a bank, that is not always the most effective way, but it is certainly one thing to try. But keep in mind that a given matrix might have some special features that can be exploited to allow an easier method. Similarly, when breaking into a bank, there might be some special features that might suggest an alternative to the guns-ablazing method.

Let A be an n × n matrix. By Proposition 5.1, λ is an eigenvalue if and only if det(A − λI) = 0, i.e., if and only if λ is a root/zero of the characteristic polynomial. So, the first step is to compute the characteristic polynomial. Don't forget to use elementary row operations before computing det(A − tI) if it looks like that will help—remember, the more zeroes in the matrix the easier it is to compute its determinant. The second step is to find the roots of the characteristic polynomial.

Once you have found an eigenvalue, λ say, remember that the λ-eigenspace is the null space of A − λI. The third step is to compute that null space, i.e., find the solutions to the equation (A − λI)x = 0, by using the methods developed earlier in these notes. For example, put A − λI in row-reduced echelon form, find the independent variables and proceed from there.
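These three steps are exactly what a computer algebra system performs. A sketch in sympy, using the matrix from section 3.3 (my choice of example):

    from sympy import Matrix

    B = Matrix([[-2, -2],
                [ 2,  3]])

    print(B.charpoly().as_expr())   # lambda**2 - lambda - 2
    # eigenvects() returns (eigenvalue, multiplicity, basis of eigenspace).
    for val, mult, basis in B.eigenvects():
        print(val, mult, [v.T for v in basis])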

6.1. An example. In order to compute the eigenvalues of the matrix

    B := (  1 −1 −1 −1 )
         ( −1  1 −1 −1 )
         ( −1 −1  1 −1 )
         ( −1 −1 −1  1 )

we perform some elementary row operations on B − tI before computing its determinant: subtracting the bottom row of B − tI from each of the other


rows does not change the determinant, so det(B − tI) is equal to

    det ( 1−t  −1   −1   −1  )       ( 2−t   0    0   t−2 )
        ( −1   1−t  −1   −1  )  = det(  0   2−t   0   t−2 )
        ( −1   −1   1−t  −1  )       (  0    0   2−t  t−2 )
        ( −1   −1   −1   1−t )       ( −1   −1   −1   1−t ).

If we replace a row by c times that row we change the determinant by a factor of c, so

    det(B − tI) = (2 − t)³ det (  1  0  0  −1  )
                               (  0  1  0  −1  )
                               (  0  0  1  −1  )
                               ( −1 −1 −1  1−t ).

The determinant does not change if we add each of the first three rows to the last row so

    det(B − tI) = (2 − t)³ det ( 1 0 0  −1   )
                               ( 0 1 0  −1   )
                               ( 0 0 1  −1   )
                               ( 0 0 0 −2−t )  =  (t − 2)³(t + 2).

The eigenvalues are ±2. Now

    E2 = N(B − 2I) = N ( −1 −1 −1 −1 )
                       ( −1 −1 −1 −1 )
                       ( −1 −1 −1 −1 )
                       ( −1 −1 −1 −1 ).

The rank of B − 2I, i.e., the dimension of the linear span of its columns, is 1, and its nullity is therefore 3. Therefore dim E2 = 3. It is easy to find three linearly independent vectors x such that (B − 2I)x = 0 and so compute

    E2 = R (1, −1, 0, 0)ᵀ + R (1, 0, −1, 0)ᵀ + R (1, 0, 0, −1)ᵀ.

Finally,

    E−2 = N(B + 2I) = N (  3 −1 −1 −1 )
                        ( −1  3 −1 −1 )  =  R (1, 1, 1, 1)ᵀ.
                        ( −1 −1  3 −1 )
                        ( −1 −1 −1  3 )

6.2. Another example. Let

    A := ( 1 1 1 1 1 )
         ( 1 1 1 1 1 )
         ( 1 1 1 1 1 )
         ( 1 1 0 1 1 )
         ( 1 1 1 0 1 ).


The characteristic polynomial of A is

    x²(x³ − 5x² + 2x − 1).

The x² factor implies that the 0-eigenspace, i.e., the null space of A, has dimension two. It is easy to see that

    N(A) = R (1, −1, 0, 0, 0)ᵀ + R (1, 0, 0, 0, −1)ᵀ.

We don't need to know the zeroes of the cubic factor f(x) = x³ − 5x² + 2x − 1 just yet. However, a computation gives

    f(0) = −1,  f(1) = −3,  f(4) = −9,  f(10) > 0,

so the graph y = f(x) crosses the x-axis at a point in the open interval (4, 10). In fact this is the only real zero of f: the local maximum value of f, attained near x = 0.21, is negative, so the other two zeroes of f are complex. Hence A has one non-zero real eigenvalue. Counting the two complex eigenvalues as well, A has three distinct non-zero eigenvalues; because A also has a 2-dimensional 0-eigenspace, it follows that C5 (though not R5) has a basis consisting of eigenvectors for A.

Claim: If λ is a non-zero eigenvalue for A, then

    ( 1 − λ⁻¹ − λ, 1 − λ⁻¹ − λ, 4λ − 1 − λ², −3, 2λ⁻¹ + 1 − λ )ᵀ

is a λ-eigenvector for A.

Proof of Claim: We must show that Ax = λx. Since λ is an eigenvalue it is a zero of the characteristic polynomial; since λ ≠ 0, f(λ) = 0, i.e., λ³ − 5λ² + 2λ − 1 = 0. The calculation

    Ax = ( 1 1 1 1 1 ) ( 1 − λ⁻¹ − λ  )     ( λ − 1 − λ² )
         ( 1 1 1 1 1 ) ( 1 − λ⁻¹ − λ  )     ( λ − 1 − λ² )
         ( 1 1 1 1 1 ) ( 4λ − 1 − λ²  )  =  ( λ − 1 − λ² )  =  λx
         ( 1 1 0 1 1 ) ( −3           )     ( −3λ        )
         ( 1 1 1 0 1 ) ( 2λ⁻¹ + 1 − λ )     ( 2 + λ − λ² )

shows that the Claim is true; however please, please, please notice we used the fact that λ is a solution to the equation x³ − 5x² + 2x − 1 = 0 in the calculation in the following way: because λ³ − 5λ² + 2λ − 1 = 0 it follows that λ − 1 − λ² = λ(4λ − 1 − λ²), so the third entry in Ax is λ times the third entry in x, thus justifying the final equality in the calculation. ♦

Notice we did not need to compute the eigenvalues in order to find the eigenspaces. If one wants more explicit information one needs to determine the roots of the cubic equation x³ − 5x² + 2x − 1 = 0.


6.3. How did I find the eigenvector in the previous example? That is the real problem! If someone hands you a matrix A and a vector v and claims that v is an eigenvector one can test the truth of the claim by computing Av and checking whether Av is a multiple of v.

To find the eigenvector I followed the procedure described in the first paragraph of section 6. By definition, the λ-eigenspace is the space of solutions to the homogeneous equation (A − λI)x = 0. To find those solutions we must put

    ( 1−λ  1    1    1    1   )
    ( 1    1−λ  1    1    1   )
    ( 1    1    1−λ  1    1   )
    ( 1    1    0    1−λ  1   )
    ( 1    1    1    0    1−λ )

in row-reduced echelon form. To do this first subtract the top row from each of the other rows to obtain

    ( 1−λ  1   1   1   1 )
    ( λ   −λ   0   0   0 )
    ( λ    0  −λ   0   0 )
    ( λ    0  −1  −λ   0 )
    ( λ    0   0  −1  −λ ).

Now add the bottom row to the top row and subtract the bottom row from each of the middle three rows, then move the bottom row to the top to get

    ( λ   0   0  −1   −λ  )
    ( 1   1   1   0   1−λ )
    ( 0  −λ   0   1   λ   )
    ( 0   0  −λ   1   λ   )
    ( 0   0  −1   1−λ λ   ).

Write ν = λ⁻¹ (this is allowed since λ ≠ 0) and multiply the first, third, and fourth rows by ν to get

    ( 1   0   0  −ν   −1  )
    ( 1   1   1   0   1−λ )
    ( 0  −1   0   ν    1  )
    ( 0   0  −1   ν    1  )
    ( 0   0  −1   1−λ  λ  ).

Replace the second row by the second row minus the first row plus the third row to get

    ( 1   0   0  −ν   −1  )
    ( 0   0   1   2ν  3−λ )
    ( 0  −1   0   ν    1  )
    ( 0   0  −1   ν    1  )
    ( 0   0  −1   1−λ  λ  ).


Now multiply the third row by −1, add the bottom row to the second row, and subtract the bottom row from the fourth row, then multiply the bottom row by −1, to get

    B := ( 1   0   0  −ν       −1  )
         ( 0   0   0   2ν+1−λ   3  )
         ( 0   1   0  −ν       −1  )
         ( 0   0   0   ν−1+λ   1−λ )
         ( 0   0   1  −1+λ     −λ  ).

There is no need to do more calculation to get A − λI in row-reduced echelon form because it is quite easy to write down the solutions to the system Bx = 0, which is equivalent to the system (A − λI)x = 0. Write x = (x1, x2, x3, x4, x5)ᵀ. From the second row of B we see that we should take

    x4 = −3 and x5 = 2ν + 1 − λ.

From the other rows of B we see that we must have

    x1 = νx4 + x5 = −3ν + (2ν + 1 − λ)
    x2 = νx4 + x5 = −3ν + (2ν + 1 − λ)
    x3 = (1 − λ)x4 + λx5 = 3(λ − 1) + λ(2ν + 1 − λ).

Thus

    x = ( 1 − ν − λ, 1 − ν − λ, 4λ − 1 − λ², −3, 2ν + 1 − λ )ᵀ.

This is the vector in the statement of the Claim above.

6.4. It ain't easy. Before finding the eigenvector in the claim I made many miscalculations, a sign here, a sign there, and pretty soon I was doomed and had to start over. So, take heart: all you need is perseverance, care, and courage, when dealing with a 5 × 5 matrix by hand. Fortunately, there are computer packages to do this for you. The main thing is to understand the principle for finding the eigenvectors. The computer programs simply implement the principle.

7. The utility of eigenvectors

Let A be an n × n matrix, and suppose we are in the following good situation:

• Rn has a basis consisting of eigenvectors for A;
• given v ∈ Rn we have an effective means of writing v as a linear combination of the eigenvectors for A.

Then, as the next result says, it is easy to compute Av.


Proposition 7.1. Suppose that A is an n × n matrix such that Rn has a basis consisting of eigenvectors for A. Explicitly, suppose that {v1, . . . , vn} is a basis for Rn and that each vi is an eigenvector for A with eigenvalue λi. If v = a1v1 + · · · + anvn, then

    Av = a1λ1v1 + · · · + anλnvn.


CHAPTER 14

Complex vector spaces and complex eigenvalues

1. The complex numbers

The set of complex numbers is denoted by C. I assume you have already met complex numbers but will give a short refresher course in section 2 below.

The complex numbers were introduced to provide roots for polynomials that might have no real roots. The simplest example is the polynomial x² + 1. It has no real roots because the square of a real number is never equal to −1.

Complex numbers play a similar role in linear algebra: a matrix with real entries might have no real eigenvalues but it will always have at least one complex eigenvalue (unless it is a 1 × 1 matrix, i.e., a number).

For example, the matrix

    ( 0 −1 )
    ( 1  0 )

representing counterclockwise rotation by 90° moves every 1-dimensional subspace of R2 so has no eigenvectors and therefore no eigenvalues, at least no real eigenvalues. I like that geometric argument but one can also see it has no real eigenvalues because its characteristic polynomial x² + 1 has no real roots. However, x² + 1 has two complex roots, ±i, and the matrix has these as complex eigenvalues and has two corresponding eigenvectors in C2. Explicitly,

    ( 0 −1 ) ( 1 )  =  ( −i )  =  −i ( 1 )
    ( 1  0 ) ( i )     (  1 )        ( i )

and

    ( 0 −1 ) (  1 )  =  ( i )  =  i (  1 )
    ( 1  0 ) ( −i )     ( 1 )       ( −i ).

As this calculation suggests, all the matrix algebra and linear algebra we have done so far extends with essentially no change from R to C. We simply allow matrices to have entries that are complex numbers and instead of working with Rn we work with Cn, n × 1 column matrices, or vectors, whose entries are complex numbers.

2. The complex numbers

This interlude is to remind you of some basic facts about complex numbers.


The set of complex numbers is an enlargement of the set of real numbers created by adjoining a square root of −1, denoted by i, and then making the minimal enlargement required to permit addition, subtraction and multiplication of all the numbers created by this process. Thus C contains R and because we want to add and multiply complex numbers we define

    C := {a + ib | a, b ∈ R}.

Notice that C contains R. The real numbers are just those complex numbers a + ib for which b = 0. Of course we don't bother to write a + i0; we just write a.

We often think of the elements of R as lying on a line. It is common to think of C as a plane, a 2-dimensional vector space over R, the linear span of 1 and i.

Every complex number can be written in a unique way as a + ib for suitable real numbers a and b. We add by

    (a + ib) + (c + id) = (a + c) + i(b + d)

and multiply by

    (a + ib)(c + id) = ac + iad + ibc + i²bd = (ac − bd) + i(ad + bc).

If b = d = 0 we get the usual addition and multiplication of real numbers. An important property enjoyed by both real and complex numbers is that non-zero numbers have inverses. If a + ib ≠ 0, then

    (a − ib)/(a² + b²) = 1/(a + ib).

You should check this by multiplying the left-hand term by a + ib and noting that the answer is 1.

2.1. Reassurance. If you haven't met the complex numbers before don't be afraid. You have done something very similar before and survived that, so you will survive this. Once upon a time your knowledge of numbers was limited to fractions, i.e., all you knew was Q. Imagine yourself in that situation again. I now come to you saying I have a new number that I will denote by the symbol s, and I wish to enlarge the number system Q to a new number system that I will denote by Q(s). I propose that in my new number system the product of s with itself is 2, i.e., s² = 2, and I further propose that Q(s) consists of all expressions

    a + bs

with the addition and multiplication in Q(s) defined by

    (a + bs) + (c + ds) := (a + c) + (b + d)s

and

    (a + bs)(c + ds) := (ac + 2bd) + (ad + bc)s.

It might be a little tedious to check that the multiplication is associative, that the distributive law holds, and that every non-zero number in Q(s) has


an inverse that also belongs to Q(s). The number system Q(s) might also seem a little mysterious until I tell you that a copy of Q(s) can already be found inside R. In fact, s is just a new symbol for √2: the number a + bs is really just a + b√2. Now re-read the previous paragraph with this additional information.

2.2. The complex plane. As mentioned above, it is useful to picture the set of real numbers as the points on a line, the real line, and equally useful to picture the set of complex numbers as the points on a plane. The horizontal axis is just the real line, sometimes called the real axis, and the vertical axis, sometimes called the imaginary axis, is all real multiples of i. We denote it by iR.

[Figure: the complex plane C, with horizontal axis R, vertical axis iR, and the points 1, i, i − 1, 1 + i, 2 + i, 1 + 2i, and a + ib marked.]

2.3. Complex conjugation. Let z = a + ib ∈ C. The conjugate of z, denoted z̄, is defined to be

    z̄ := a − ib.

You can picture complex conjugation as reflection in the real axis. The important points are these:

• zz̄ = a² + b² is a real number;
• zz̄ ≥ 0;
• zz̄ = 0 if and only if z = 0;
• √(zz̄) is the distance from zero to z;
• the conjugate of wz is w̄z̄;
• the conjugate of w + z is w̄ + z̄;
• z̄ = z if and only if z ∈ R.

It is common to call √(zz̄) the absolute value of z and denote it by |z|. It is also common to call √(zz̄) the norm of z and denote it by ||z||.

Performing the conjugation twice gets us back where we started: the conjugate of z̄ is z. The geometric interpretation of this is that the conjugate of a complex number is its reflection in the real axis, and conjugating twice returns z because reflecting twice in the real axis is the same as doing nothing.

2.4. Roots of polynomials. Complex numbers were invented to provide solutions to quadratic polynomials that had no real solution. The ur-example is x² + 1 = 0, which has no solutions in R but solutions ±i in C. This is the context in which you meet complex numbers at high school. You make the observation there that the solutions of ax² + bx + c = 0 are given by

    (−b ± √(b² − 4ac))/2a

provided a ≠ 0, and the equation has no real solutions when b² − 4ac < 0 but has two complex zeroes,

    λ = −b/2a + i√(4ac − b²)/2a    and    λ̄ = −b/2a − i√(4ac − b²)/2a.

Notice that the zeroes λ and λ̄ are complex conjugates of each other.

It is a remarkable fact that although the complex numbers were invented with no larger purpose than providing solutions to quadratic polynomials, they provide all solutions to polynomials of all degrees.

Theorem 2.1. Let f(x) = xⁿ + a1xⁿ⁻¹ + a2xⁿ⁻² + · · · + an−1x + an be a polynomial with coefficients a1, . . . , an belonging to R. Then there are complex numbers r1, . . . , rn such that

    f(x) = (x − r1)(x − r2) · · · (x − rn).

Just as for quadratic polynomials, the zeroes of f come in conjugate pairs: if f(λ) = 0, then, using the properties of conjugation mentioned at the end of section 2.3,

    f(λ̄) = λ̄ⁿ + a1λ̄ⁿ⁻¹ + a2λ̄ⁿ⁻² + · · · + an−1λ̄ + an.

Because each ai is real, ai equals its own conjugate, and because conjugation respects sums and products, the right-hand side is the conjugate of λⁿ + a1λⁿ⁻¹ + a2λⁿ⁻² + · · · + an−1λ + an, i.e., the conjugate of f(λ). But f(λ) = 0 and the conjugate of 0 is 0, so f(λ̄) = 0.
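You can watch the conjugate pairs appear with numpy's roots function, which returns all n complex roots of a degree-n polynomial (the cubic below is just an example):

    import numpy as np

    coeffs = [1, -1, 1, -1]          # x³ − x² + x − 1 = (x − 1)(x² + 1)
    print(np.roots(coeffs))          # the roots 1, i, −i: the non-real ones pair up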

2.5. Application to the characteristic polynomial of a real matrix. Let A be an n × n matrix whose entries are real numbers. Then det(A − tI) is a polynomial with real coefficients, so if λ ∈ C is a zero of det(A − tI) so is λ̄.

Recall that the eigenvalues of A are exactly (the real numbers that are) the zeroes of det(A − tI). We wish to view the complex zeroes of det(A − tI) as eigenvalues of A, but to do that we have to extend our framework for linear algebra from Rn to Cn.


3. Linear algebra over C

We now allow matrices to have entries that are complex numbers. Because R is contained in C, linear algebra over C includes linear algebra over R, i.e., every matrix with real entries is a matrix with complex entries by virtue of the fact that every real number is a complex number.

Hence, left multiplication by an m × n matrix with real entries can be viewed as a linear transformation Cn → Cm in addition to being viewed as a linear transformation Rn → Rm.

We extend the operation of complex conjugation to matrices. If A = (aij) is a matrix with entries aij ∈ C, we define the conjugate of A to be

    Ā := (āij),

i.e., the matrix obtained by replacing every entry of A by its complex conjugate.

It is straightforward to verify that the conjugate of AB is ĀB̄ and the conjugate of A + B is Ā + B̄.

Proposition 3.1. Let A be an n × n matrix with real entries. If λ ∈ C is an eigenvalue for A so is λ̄. Furthermore, if x is a λ-eigenvector for A, then x̄ is a λ̄-eigenvector for A.

Proof. This is a simple calculation. Suppose Ax = λx. Because A has real entries, Ā = A, so Ax̄ = Āx̄, which is the conjugate of Ax; but Ax = λx, and the conjugate of λx is λ̄x̄. Hence Ax̄ = λ̄x̄, as claimed. □

The first half of Proposition 3.1 is a consequence of the fact that if λ is a zero of a polynomial with real coefficients so is λ̄.

3.1. The conjugate transpose. For complex matrices, i.e., matrices having complex entries, one often combines the operations of taking the transpose and taking the conjugate. Notice that these operations commute, i.e., it doesn't matter in which order one performs them: in symbols, (Ā)ᵀ is the conjugate of Aᵀ. We often write

    A* := (Ā)ᵀ

and call this the conjugate transpose of A.

4. The complex norm

Let z ∈ C. The norm of z is the number

    ||z|| = √(zz̄).

If z ≠ 0, then

    z⁻¹ = z̄/||z||².

Check this. We can extend the notion of norm, or "size", from complex numbers to vectors in Cn.


One important difference between complex and real linear algebra involves the definition of the norm of a vector. Actually, this change already begins when dealing with complex numbers. The distance of a real number x from zero is given by its absolute value |x|. The distance of a point x = (x1, x2) ∈ R2 from zero is √(x1² + x2²). Picturing the complex plane as R2, the distance of z = a + ib from 0 is √(a² + b²) = √(zz̄).

In Rn, the distance of a point x = (x1, . . . , xn)ᵀ from the origin is given by taking the square root of its dot product with itself:

    ||x|| = √(x1² + · · · + xn²) = √(x · x) = √(xᵀx).

Part of the reason this works is that a sum of squares of real numbers is always ≥ 0 and is equal to zero only when each of those numbers is zero. This is no longer the case for complex numbers.

The norm of x ∈ Cn is

    ||x|| = √(x*x).

Since

    x*x = x̄1x1 + · · · + x̄nxn = |x1|² + · · · + |xn|²

is a real number ≥ 0, the definition of ||x|| makes sense—it is the non-negative square root of the real number x*x.

4.1. Real symmetric matrices have real eigenvalues. The next result is proved by a nice application of the complex norm.

Theorem 4.1. All the eigenvalues of a symmetric matrix with real entries are real.

Proof. Let A be a real symmetric n × n matrix and λ an eigenvalue for A. We will show λ is a real number by proving that λ̄ = λ. Because λ is an eigenvalue, there is a non-zero vector x ∈ Cn such that Ax = λx. Then

    λ||x||² = λx*x = x*(λx) = x*Ax = (Aᵀx̄)ᵀx = (Ax̄)ᵀx

where the last equality follows from the fact that A is symmetric. Continuing this string of equalities, and using the fact that A has real entries (so Ax̄ is the conjugate of Ax = λx), we obtain

    (Ax̄)ᵀx = (λ̄x̄)ᵀx = λ̄x*x = λ̄||x||².

We have shown that

    λ||x||² = λ̄||x||²

from which we deduce that λ = λ̄ because ||x||² ≠ 0. Hence λ ∈ R. □
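Here is a quick numerical illustration with numpy (the random symmetric matrix is just for demonstration):

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.standard_normal((5, 5))
    A = B + B.T                          # a random real symmetric matrix
    lams = np.linalg.eigvals(A)
    print(np.max(np.abs(lams.imag)))     # 0.0: every eigenvalue is real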

This is a clever proof. Let's see how to prove the 2 × 2 case in a less clever way. Consider the symmetric matrix

    A = [a b]
        [b d]


where a, b, d are real numbers. The characteristic polynomial of A is

    t² − (a + d)t + ad − b²

and your high school quadratic formula tells you that the roots of this polynomial are

    ½(a + d ± √((a + d)² − 4(ad − b²))).

However,

    (a + d)² − 4(ad − b²) = (a − d)² + 4b²

which is ≥ 0 because it is a sum of squares. Hence √((a + d)² − 4(ad − b²)) is a real number and we conclude that the roots of the characteristic polynomial, i.e., the eigenvalues of A, are real.

It might be fun for you to try to prove the 3 × 3 case by a similar elementary argument.

5. An extended exercise

Let

    A = [2 −1 −1 −1]
        [1 −1 −1  0]
        [1  0 −1 −1]
        [1 −1  0 −1].

Compute the characteristic polynomial of A and factor it as much as you can. Remember to use elementary row operations before computing det(A − xI). Can you write the characteristic polynomial of A as a product of 4 linear terms, i.e., as (x − α)(x − β)(x − γ)(x − δ) with α, β, γ, δ ∈ C?

What are the eigenvalues and corresponding eigenspaces for A in C4? Show that A⁶ = I and that no lower power of A is the identity.

Suppose that λ is a complex eigenvalue for A, i.e., there is a non-zero vector x ∈ C4 such that Ax = λx. Show that λ⁶ = 1.

Let ω = e^(2πi/3). Thus ω³ = 1. Show that t³ − 1 = (t − 1)(t − ω)(t − ω²).

Show that

    {±1, ±ω, ±ω²} = {λ ∈ C | λ⁶ = 1}.

Compute Av1, Av2, Av3, and Av4, where

    v1 = (1, 1, 1, 1)ᵀ,  v2 = (3, 1, 1, 1)ᵀ,  v3 = (0, 1, ω, ω²)ᵀ,  v4 = (0, 1, ω², ω)ᵀ.
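If you want to check your by-hand answers, a few lines of numpy will do it (a sketch; eigvals returns the eigenvalues in no particular order):

    import numpy as np

    A = np.array([[2, -1, -1, -1],
                  [1, -1, -1,  0],
                  [1,  0, -1, -1],
                  [1, -1,  0, -1]])
    print(np.allclose(np.linalg.matrix_power(A, 6), np.eye(4)))   # True: A⁶ = I
    lams = np.linalg.eigvals(A)
    print(np.round(lams**6, 10))      # each eigenvalue satisfies λ⁶ = 1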


CHAPTER 15

Orthogonality

The word orthogonal is synonymous with perpendicular.

1. Geometry

In earlier chapters I often used the word geometry and spoke of the importance of having a geometric view of Rn and the sets of solutions to systems of linear equations. However, geometry really involves lengths and angles, which have been absent from our discussion so far. The word geometry is derived from the Greek word geometrein, which means to measure the land: cf. γη or γαια meaning earth and metron or μετρον meaning an instrument to measure—metronome, metre, geography, etc.

In order to introduce angles and length in the context of vector spaces we use the dot product.

2. The dot product

The dot product of vectors u = (u1, . . . , un) and v = (v1, . . . , vn) in Rn is the number

    u · v := u1v1 + · · · + unvn.

The norm or length of u is

    ||u|| := √(u · u).

The justification for calling this the length is Pythagoras's Theorem. Notice that

• u · u = 0 if and only if u = 0—this makes sense: the only vector having length zero is the zero vector.
• If u and v are column vectors of the same size, then u · v = uᵀv = vᵀu.

2.1. Angles. We define the angle between two non-zero vectors u and v in Rn to be the unique angle θ ∈ [0, π] such that

    cos(θ) := (u · v)/(||u|| ||v||).

Since cos(π/2) = 0, the angle between u and v is π/2 = 90° if and only if u · v = 0.

Before accepting this definition as reasonable we should check that it gives the correct answer for the plane.

Suppose u, v ∈ R2. If one of u and v is a positive multiple of the other, then the angle between them is zero and this agrees with the formula because cos(0) = 1.


Now assume u ≠ v. The cosine rule applied to the triangle with sides u, v, and u − v gives

    ||u − v||² = ||u||² + ||v||² − 2||u|| ||v|| cos(θ).

The left-hand side of this expression is ||u||² − 2(u · v) + ||v||². The result now follows.

The general case is done by reducing to the R2 case, using the fact that every pair of vectors in Rn lies in a plane.
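In Python, the angle formula is a one-liner (the vectors here are invented for illustration):

    import numpy as np

    u = np.array([1.0, 0.0, 1.0])
    v = np.array([0.0, 1.0, 1.0])
    cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    print(np.degrees(np.arccos(cos_theta)))   # 60 degrees, up to rounding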

3. Orthogonal vectors

Two non-zero vectors u and v are said to be orthogonal if u · v = 0. The following result is a cute observation.

Proposition 3.1. Let A be a symmetric n × n matrix and suppose λ and µ are different eigenvalues of A. If u ∈ Eλ and v ∈ Eµ, then u is orthogonal to v.

Proof. The calculation

    λuᵀv = (λu)ᵀv = (Au)ᵀv = uᵀAᵀv = uᵀAv = µuᵀv

shows that (λ − µ)uᵀv = 0. But λ − µ ≠ 0 so uᵀv = 0. □

Let S be a non-empty subset of Rn. We define

S⊥ := {w ∈ Rn | w · v = 0 for all v ∈ S},

i.e., S⊥ consists of those vectors that are perpendicular to every vector in S. We call S⊥ the orthogonal to S and usually say "S perp" or "S orthogonal" when we read S⊥.

Lemma 3.2. If S is a non-empty subset of Rn, then S⊥ is a subspace of Rn.

Proof. Certainly 0 ∈ S⊥. If w1 and w2 are in S⊥ so is w1 + w2 because if u ∈ S, then

    (w1 + w2) · u = w1 · u + w2 · u = 0 + 0 = 0.

If w ∈ S⊥ and λ ∈ R, then λw ∈ S⊥ because if u ∈ S, then (λw) · u = λ(w · u) = λ × 0 = 0. □

Lemma 3.3. If V is a subspace of Rn, then V ∩ V⊥ = {0}.

Proof. If v is in V ∩ V⊥, then v · v = 0, whence v = 0. □


4. Classical orthogonal polynomials

In an earlier chapter we introduced the infinite dimensional vector space R[x] consisting of all polynomials with real coefficients. You are used to writing polynomials with respect to the basis 1, x, x², . . .. However, there are many other bases, some of which are extremely useful.

Suppose we define a "dot product" on R[x] by declaring that

    f · g := ∫₋₁¹ f(x)g(x)/√(1 − x²) dx.

It is easy to see that f · g = g · f, that f · (g + h) = f · g + f · h, and that (λf) · g = λ(f · g) for all λ ∈ R.

It is common practice to call this dot product the inner product and write it as (f, g) rather than f · g. We will do this.

The Chebyshev polynomials are defined to be

    T0(x) := 1,
    T1(x) := x, and
    Tn(x) := 2xTn−1(x) − Tn−2(x) for all n ≥ 2.

Theorem 4.1. The Chebyshev polynomials are orthogonal to one another with respect to this inner product:

    ∫₋₁¹ Tm(x)Tn(x)/√(1 − x²) dx = 0    if m ≠ n,
                                 = π    if m = n = 0,
                                 = π/2  if m = n ≥ 1.

Proof. Use the change of variables x = cos(θ) and an induction argument to prove that Tn(cos(θ)) = cos(nθ). □
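The identity Tn(cos θ) = cos(nθ) in the proof is easy to test numerically (a sketch using the recurrence above):

    import numpy as np

    theta = np.linspace(0.0, np.pi, 9)
    x = np.cos(theta)
    T = [np.ones_like(x), x]              # T0 and T1 evaluated at x
    for n in range(2, 6):                 # build T2, ..., T5 by the recurrence
        T.append(2 * x * T[-1] - T[-2])
    print(all(np.allclose(T[n], np.cos(n * theta)) for n in range(6)))   # True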

5. Orthogonal and orthonormal bases

A set of non-zero vectors {v1, . . . , vd} is said to be orthogonal if vi is orthogonal to vj whenever i ≠ j. A set consisting of a single non-zero vector is declared to be an orthogonal set.

Lemma 5.1. An orthogonal set of vectors is linearly independent.

Proof. Let {v1, . . . , vd} be an orthogonal set of vectors and suppose that a1v1 + · · · + advd = 0. Then

    0 = vi · (a1v1 + · · · + advd).

However, when we multiply this out all but one term will vanish because vi · vj = 0 when j ≠ i. The non-vanishing term is ai(vi · vi), so ai(vi · vi) = 0. But vi ≠ 0, so vi · vi ≠ 0 and therefore ai = 0. Thus all the ais are zero and we conclude that {v1, . . . , vd} is linearly independent. □


An orthogonal basis is just that, a basis consisting of vectors that are orthogonal to one another. If in addition each vector has length one we call it an orthonormal basis.

The standard basis {e1, . . . , en} for Rn is an orthonormal basis.

It is usually a tedious chore to express an element w in a subspace W as an explicit linear combination of a given basis {v1, . . . , vd} for W. To find the numbers a1, . . . , ad such that w = a1v1 + · · · + advd usually involves solving a system of linear equations. However, if {v1, . . . , vd} is an orthonormal basis for W it is trivial to find the ais. That's one reason we like orthonormal bases.

Proposition 5.2. Suppose {v1, . . . , vd} is an orthonormal basis for W. If w ∈ W, then

w = (w · v1)v1 + · · ·+ (w · vd)vd.

Proof. Notice first that each w · vi is a number! We know there are numbers a1, . . . , ad such that w = a1v1 + · · · + advd. It follows that

    w · vi = (a1v1 + · · · + advd) · vi = a1(v1 · vi) + · · · + ad(vd · vi).

Because {v1, . . . , vd} is an orthonormal set of vectors, vj · vi = 0 when j ≠ i and vi · vi = 1. Hence

    w · vi = a1(v1 · vi) + · · · + ad(vd · vi) = ai.

The result now follows. □
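In code, the proposition says the coordinates are just dot products; here is a short check with numpy (the basis and vector are invented for illustration):

    import numpy as np

    v1 = np.array([1.0, 1.0]) / np.sqrt(2)    # an orthonormal basis for R2
    v2 = np.array([1.0, -1.0]) / np.sqrt(2)
    w = np.array([3.0, 5.0])
    print(np.allclose((w @ v1) * v1 + (w @ v2) * v2, w))   # True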

That is the great virtue of orthonormal bases. Unfortunately, one is not always fortunate enough to be presented with an orthonormal basis. However, there is a standard mechanism, the Gram-Schmidt process, that will take any basis for W and produce from it an orthonormal basis for W. That is the subject of the next section.

6. The Gram-Schmidt process

Suppose B = {v1, . . . , vs} is a basis for a subspace W. The Gram-Schmidt process is a method for constructing from B an orthogonal basis {u1, . . . , us} for W.

Theorem 6.1. Every subspace of Rn has an orthogonal basis.

Proof. Let B = {v1, . . . , vs} be a basis for W. Define u1, u2, u3, . . . by

    u1 := v1
    u2 := v2 − ((u1 · v2)/||u1||²) u1
    u3 := v3 − ((u1 · v3)/||u1||²) u1 − ((u2 · v3)/||u2||²) u2

etc.


We will prove the theorem by showing that ui · uj = 0 when i ≠ j.

The first case is i = 2 and j = 1. We get

    u2 · u1 = v2 · u1 − ((u1 · v2)/||u1||²) (u1 · u1).

But u1 · u1 = ||u1||² so those factors cancel in the second term to leave

    u2 · u1 = v2 · u1 − u1 · v2

which is zero because u · v = v · u. Hence u2 is orthogonal to u1.

The next case is i = 3 and j = 1. We get

(6-1)    u3 · u1 = v3 · u1 − ((u1 · v3)/||u1||²) (u1 · u1) − ((u2 · v3)/||u2||²) (u2 · u1).

The right-most term in (6-1) is zero because u2 · u1 = 0. Arguing as in the previous paragraph shows that the two terms immediately to the right of the = sign in (6-1) cancel, so we get u3 · u1 = 0.

Carry on like this to get ui · u1 = 0 for all i ≥ 2.

The next case is to take i = 3 and j = 2. We get

(6-2)    u3 · u2 = v3 · u2 − ((u1 · v3)/||u1||²) (u1 · u2) − ((u2 · v3)/||u2||²) (u2 · u2).

The second term on the right-hand side is zero because u1 · u2 = 0; in the third term, u2 · u2 = ||u2||² so those factors cancel to leave

    u3 · u2 = v3 · u2 − u2 · v3

which is zero. Hence u3 is orthogonal to u2.

Now do the case i = 4 and j = 2, and carry on like this to get ui · u2 = 0 for all i ≥ 3. Then start again with i = 4 and j = 3, et cetera, et cetera! Eventually, we prove that ui · uj = 0 for all i ≠ j.

We still need to prove that {u1, . . . , us} is a basis for W. Because the uis are an orthogonal set they are linearly independent. Their linear span is therefore a vector space of dimension s. By definition, each ui is in Sp(v1, . . . , vs), which equals W. Thus Sp(u1, . . . , us) is an s-dimensional subspace of the s-dimensional vector space W, and therefore equal to it by Corollary 5.2.¹ □

Corollary 6.2. Every subspace of Rn has an orthonormal basis.

¹Alternative proof that the uis span W: rewrite the definition of ui by putting vi on one side and so obtain

    vi = ui + a linear combination of u1, . . . , ui−1.

Hence vi ∈ Sp(u1, . . . , us). Hence Sp(v1, . . . , vs) ⊆ Sp(u1, . . . , us); but Sp(u1, . . . , us) ⊆ Sp(v1, . . . , vs) too, so the two linear spans are equal. Since Sp(v1, . . . , vs) = W we conclude that Sp(u1, . . . , us) = W.


Proof. Let W be a subspace of Rn. By Theorem 6.1, W has an orthogonal basis. Let {u1, . . . , us} be an orthogonal basis for W. If we divide each ui by its length we obtain a set of orthogonal vectors

    u1/||u1||, u2/||u2||, . . . , us/||us||

each of which has length one; i.e., we obtain an orthonormal basis for W. □
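The process is short enough to code directly; here is a minimal Python sketch (the helper name gram_schmidt and the sample basis are mine, not part of the notes):

    import numpy as np

    def gram_schmidt(vs):
        # turn a basis (a list of vectors) into an orthogonal basis, as in Theorem 6.1
        us = []
        for v in vs:
            u = v.astype(float)
            for w in us:
                u = u - ((w @ v) / (w @ w)) * w
            us.append(u)
        return us

    vs = [np.array([1.0, 1.0, 0.0]),
          np.array([1.0, 0.0, 1.0]),
          np.array([0.0, 1.0, 1.0])]
    us = gram_schmidt(vs)
    print([round(us[i] @ us[j], 10) for i in range(3) for j in range(3) if i < j])
    # [0.0, 0.0, 0.0]: the new vectors are orthogonal
    qs = [u / np.linalg.norm(u) for u in us]   # normalize: an orthonormal basis (Corollary 6.2)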

Theorem 6.3. Let A be an n × n symmetric matrix. Then Rn has an orthonormal basis consisting of eigenvectors for A.

Proof. By Theorem 4.1, all A's eigenvalues are real. □

7. Orthogonal matrices

A matrix A is orthogonal if Aᵀ = A⁻¹. By definition, an orthogonal matrix has an inverse so must be a square matrix.

An orthogonal linear transformation is a linear transformation implemented by an orthogonal matrix, i.e., if T(x) = Ax and A is an orthogonal matrix we call T an orthogonal linear transformation.

Theorem 7.1. The following conditions on an n × n matrix Q are equivalent:

(1) Q is an orthogonal matrix;
(2) Qx · Qy = x · y for all x, y ∈ Rn;
(3) ||Qx|| = ||x|| for all x ∈ Rn;
(4) the columns of Q, {Q1, . . . , Qn}, form an orthonormal basis for Rn.

Proof. We start by pulling a rabbit out of a hat. Let x, y ∈ Rn. Then

    ¼(||x + y||² − ||x − y||²) = ¼(x · x + 2x · y + y · y) − ¼(x · x − 2x · y + y · y) = x · y.

We will use the fact that x · y = ¼(||x + y||² − ||x − y||²) later in the proof.

(1) ⇒ (2) If Q is orthogonal, then QᵀQ = I, so

    (Qx) · (Qy) = (Qx)ᵀ(Qy) = xᵀQᵀQy = xᵀy = x · y.

(2) ⇒ (3) because (3) is a special case of (2), the case x = y.

(3) ⇒ (4) By using the hypothesis in (3) and the rabbit we obtain

    x · y = ¼(||x + y||² − ||x − y||²)
          = ¼(||Q(x + y)||² − ||Q(x − y)||²)
          = ¼(||Qx + Qy||² − ||Qx − Qy||²)
          = (Qx) · (Qy).

(We have just shown that (3) implies (2).) In particular, if {e1, . . . , en} is the standard basis for Rn, then

    ei · ej = (Qei) · (Qej) = Qi · Qj.


But ei · ej = 0 if i ≠ j and = 1 if i = j, so the columns of Q form an orthonormal set and therefore an orthonormal basis for Rn.

(4) ⇒ (1) The ijth entry in QᵀQ is equal to the ith row of Qᵀ times the jth column of Q, i.e., to Qiᵀ Qj = Qi · Qj, which is 1 when i = j and 0 when i ≠ j, so QᵀQ = I. □

7.1. Orthogonal linear transformations are of great practical importance because they do not change the lengths of vectors or, as a consequence of that, the angles between vectors. For this reason we often call the linear transformations given by orthogonal matrices "rigid motions". Moving a rigid object, by which we mean some kind of physical structure, a submarine, a rocket, a spacecraft, or piece of machinery, in space does not change the lengths of its pieces or the angles between them.

7.2. The orthogonal groups. You should check the following facts: the identity matrix is orthogonal; a product of orthogonal matrices is an orthogonal matrix; the inverse of an orthogonal matrix is orthogonal. The set of all n × n orthogonal matrices is called the n × n orthogonal group and is denoted by O(n). The orthogonal groups are fundamental objects in mathematics.

7.3. Rigid motions in R2. We have met two kinds of rigid motions in R2, rotations and reflections. Let's check that both are orthogonal transformations. If Aθ denotes the matrix representing the rotation by an angle θ in the counterclockwise direction, then

    Aθ(Aθ)ᵀ = [cos θ −sin θ][ cos θ  sin θ] = [1 0]
              [sin θ  cos θ][−sin θ  cos θ]   [0 1].

Hence Aθ is an orthogonal matrix.

On the other hand, reflection in the line L through the origin and a non-zero point (a, b)ᵀ is given by the matrix

    A = (1/(a² + b²)) [a² − b²   2ab   ]
                      [ 2ab     b² − a²].

This is an orthogonal linear transformation because

    AᵀA = A² = (1/(a² + b²)²) [a⁴ + 2a²b² + b⁴        0        ] = I.
                              [       0        a⁴ + 2a²b² + b⁴]

Notice an important difference between rotations and reflections: the determinant of a rotation is +1 but the determinant of a reflection is −1.
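Both facts are easy to confirm numerically (a sketch; the angle and the point (a, b) are arbitrary choices):

    import numpy as np

    t = 0.7
    Q_rot = np.array([[np.cos(t), -np.sin(t)],
                      [np.sin(t),  np.cos(t)]])
    a, b = 3.0, 4.0
    Q_ref = np.array([[a*a - b*b, 2*a*b],
                      [2*a*b, b*b - a*a]]) / (a*a + b*b)
    for Q in (Q_rot, Q_ref):
        print(np.allclose(Q.T @ Q, np.eye(2)), np.round(np.linalg.det(Q)))
    # prints: True 1.0 (the rotation) and True -1.0 (the reflection)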

Proposition 7.2. Every rigid motion in R2 is a composition of a rotation and a reflection.

Proof. Let A be an orthogonal 2 × 2 matrix. □


7.4. The standard basis vectors e1, . . . , en are orthogonal to one another. We said that an orthogonal matrix preserves the angles between vectors so, if that is correct, an orthogonal matrix A will have the property that the vectors Aej are orthogonal to one another. But Aej is the jth column of A. Thus, the next result confirms what we have been saying.

Theorem 7.3. Let A be a symmetric matrix with real entries. Then there is an orthogonal matrix Q such that Q⁻¹AQ is diagonal.

Proof. □

Theorem 7.4. Let A be a diagonal matrix with real entries. For every orthogonal matrix Q, Q⁻¹AQ is symmetric.

Proof. A matrix is symmetric if it is equal to its transpose. Because Q is orthogonal, Qᵀ = Q⁻¹, and because A is diagonal, Aᵀ = A. Hence

    (Q⁻¹AQ)ᵀ = QᵀAᵀ(Q⁻¹)ᵀ = Q⁻¹Aᵀ(Qᵀ)ᵀ = Q⁻¹AᵀQ = Q⁻¹AQ

so Q⁻¹AQ is symmetric. □

8. Orthogonal projections

Let V be a subspace of Rn and let {u1, . . . , uk} be an orthonormal basis for V.

The function PV : Rn → Rn defined by the formula

(8-1)    PV(v) := (v · u1)u1 + · · · + (v · uk)uk

is called the orthogonal projection of Rn onto V.

We are not yet justified in calling PV the orthogonal projection of Rn onto V because the definition of PV depends on the basis, and every subspace of Rn of dimension ≥ 2 has infinitely many orthonormal bases. For example, if a² + b² = 1, then {(a, b), (b, −a)} is an orthonormal basis for R2. In §8.1 below we will show that PV does not depend on the choice of orthonormal basis; that will justify our use of the word "the".²

Proposition 8.1. Let {u1, . . . , uk} be an orthonormal basis for V. The orthogonal projection PV : Rn → Rn has the following properties:

(1) it is a linear transformation;
(2) PV(v) = v for all v ∈ V;
(3) R(PV) = V;
(4) ker(PV) = V⊥;
(5) PV ◦ PV = PV.³

²Thus, although the formula on the right-hand side of (8-1) depends on the choice of orthonormal basis, the value PV(v) is independent of how it is computed.

³We usually abbreviate this by writing PV² = PV.


Proof. We will write P for PV in this proof.

(1) To show that P is a linear transformation we must show that P(av + bw) = aP(v) + bP(w) for all a, b ∈ R and all v, w ∈ Rn. To do this we simply calculate:

    P(av + bw) = ((av + bw) · u1)u1 + ((av + bw) · u2)u2 + · · ·
               = (a(v · u1) + b(w · u1))u1 + (a(v · u2) + b(w · u2))u2 + · · ·
               = a(v · u1)u1 + b(w · u1)u1 + a(v · u2)u2 + b(w · u2)u2 + · · ·
               = a(v · u1)u1 + a(v · u2)u2 + · · · + b(w · u1)u1 + b(w · u2)u2 + · · ·
               = aP(v) + bP(w).

(2) Because the basis is orthonormal, ui · ui = 1 and ui · uj = 0 when i ≠ j. Therefore P(ui) = ui for each i = 1, . . . , k.

Let v ∈ V. Then v = a1u1 + · · · + akuk because V = span{u1, . . . , uk}. Therefore

    P(v) = P(a1u1 + · · · + akuk)
         = a1P(u1) + · · · + akP(uk)    (because P is a linear transformation)
         = a1u1 + · · · + akuk
         = v.

(3) Because (2) is true, V ⊆ R(P). On the other hand, if v is any element in Rn, then P(v) is a linear combination of the uis so belongs to V; i.e., R(P) ⊆ V. Hence R(P) = V.

(4) If v ∈ V⊥, then v · ui = 0 for all ui so P(v) = 0. Therefore V⊥ ⊆ ker(P). Conversely, if w ∈ Rn and P(w) = 0, then

    (w · u1)u1 + · · · + (w · uk)uk = 0;

but {u1, . . . , uk} is linearly independent so w · ui = 0 for all i. Therefore w ∈ Sp{u1, . . . , uk}⊥ = V⊥. We have shown that ker(P) ⊆ V⊥. Hence ker(P) = V⊥.

(5) Let v ∈ Rn. By (3), P(v) ∈ V. However, P(w) = w for all w ∈ V by (2). Applying this fact to w = P(v) we see that (P ◦ P)(v) = P(P(v)) = P(v). Since this equality holds for all v in Rn we conclude that P ◦ P = P. □

The next result shows that V + V⊥ = Rn, i.e., every element in Rn is the sum of an element in V and an element in V⊥. Actually, the result shows more: there is only one way to write a vector v ∈ Rn as the sum of an element in V and an element in V⊥.

Proposition 8.2. Let V be a subspace of Rn. If v ∈ Rn there are unique elements w ∈ V and w′ ∈ V⊥ such that v = w + w′, namely w := PV(v) and w′ := v − PV(v).

Proof. We write P for PV in this proof. We start with the trivial observation that

(8-2)    v = P(v) + (v − P(v)).


By Proposition 8.1, P(v) belongs to V. On the other hand,

    P(v − P(v)) = P(v) − P(P(v)) = P(v) − (P ◦ P)(v) = P(v) − P(v) = 0

so v − P(v) ∈ ker(P). But ker(P) = V⊥ by Proposition 8.1. Therefore the expression (8-2) shows that v is in V + V⊥.

To show the uniqueness, suppose that w1, w2 ∈ V, that x1, x2 ∈ V⊥, and that w1 + x1 = w2 + x2. Then w1 − w2 = x2 − x1; since V is a subspace, w1 − w2 ∈ V; since V⊥ is a subspace, x2 − x1 ∈ V⊥; but V ∩ V⊥ = {0} so w1 − w2 = x2 − x1 = 0. Thus w1 = w2 and x1 = x2. This proves that an element of Rn is the sum of an element in V and an element in V⊥ in a unique way. □

The next result is just another way of stating Proposition 8.2.

Corollary 8.3. Let V be a subspace of Rn and v ∈ Rn. The only point w ∈ V such that v − w ∈ V⊥ is w = PV(v).

Proof. Suppose that w ∈ V and v − w ∈ V⊥. The equation v = w + (v − w) expresses v as the sum of an element in V and an element in V⊥. But the only way to express v as such a sum is v = PV(v) + (v − PV(v)). Therefore w = PV(v). □

Proposition 8.4. If V is a subspace of Rn, then

    dim V + dim V⊥ = n.

Proof. Since PV is a linear transformation from Rn to itself there is a unique n × n matrix B such that PV(v) = Bv for all v ∈ Rn. It is clear that R(PV) = R(B) and ker(PV) = N(B). But R(PV) = V and ker(PV) = V⊥ so

    n = rank(B) + nullity(B) = dim(V) + dim(V⊥)

as claimed. □

8.1. PV is well-defined. The projection PV was defined in terms of an orthonormal basis {u1, . . . , uk} for V. However, if dim(V) ≥ 2, then V has infinitely many orthonormal bases, so why are we justified in calling PV the orthogonal projection onto V? It is conceivable that if {w1, . . . , wk} is a different orthonormal basis for V then the linear transformation T : Rn → Rn defined by the formula

    T(v) := (v · w1)w1 + · · · + (v · wk)wk

is not the same as PV, i.e., perhaps there is a v such that PV(v) ≠ T(v).

This does not happen because we can use Proposition 8.2 to define

    PV(v) := the unique w ∈ V such that v = w + w′ for some w′ ∈ V⊥.

Alternatively, using the point of view expressed by Corollary 8.3, we could define PV(v) to be the unique element of V such that v − PV(v) belongs to V⊥.


9. Least-squares “solutions” to inconsistent systems

A system of linear equations is inconsistent if it doesn't have a solution. Until now we have ignored inconsistent systems for what might seem sensible reasons—if Ax = b has no solutions what more is there to say? Actually, a lot!

Although there may be no solution to Ax = b we might still want to find an x such that Ax is as close as possible to b. For example, if b was obtained as a result of making error-prone measurements it might be that the true value is not b but some other point b*. If so, we should be seeking solutions to the equation Ax = b*, not Ax = b.

The equation Ax = b has a solution if and only if b is in the range of A or, equivalently, if and only if b is a linear combination of the columns of A. Thus, if Ax = b has no solution it is sensible to replace b by the point b* in R(A) closest to b and find a solution x* to the equation Ax* = b*; there is such an x* because b* is in the range of A.

We will now implement this idea. The first step is the next result. It says that the point in R(A) closest to b is P(b), where P : Rm → Rm is the orthogonal projection onto R(A).⁴

Theorem 9.1 (The best approximation theorem,⁵ BAT). Let V be a subspace of Rn and b a point in Rn. Then

    ||PV(b) − b|| < ||w − b||

for all other w ∈ V. More precisely, the unique point in V closest to b is PV(b).⁶

Proof. To keep the notation simple let's write P for PV.

Suppose b ∈ V. Then b is the point in V closest to b, duh! But P(b) = b, so the theorem is true.

From now on we assume b ∉ V. Let w ∈ V and assume w ≠ P(b).

The sides of the triangle in Rn with vertices 0, b − P(b), and w − P(b) have lengths ||b − w||, ||b − P(b)||, and ||w − P(b)||. Notice that b − P(b) ∈ V⊥, and w − P(b) ∈ V because it is a difference of elements in the subspace V. It follows that the line through 0 and b − P(b) is perpendicular to the line

⁴When you read the statement of the theorem you will see that it does not exactly say that. The theorem states a more general result. Try to understand why the theorem implies P(b) is the point in R(A) closest to b. If you can't figure this out, ask.

⁵The adjective "best" applies to the word "approximation", not to the word "theorem".

⁶We use the word unique to emphasize the fact that there is only one point in V that is closest to b. This fact is not obvious even though your geometric intuition might suggest otherwise. If we drop the requirement that V is a subspace there can be more than one point in V that is closest to some given point b. For example, if V is the circle x² + y² = 1 in R2 and b = (0, 0), then all points in V have the same distance from b. Likewise, if b = (0, 2) and V is the parabola y = x² in R2, there are two points on V that are closest to b, namely (√(3/2), 3/2) and (−√(3/2), 3/2).


through 0 and w − P(b). Thus, the triangle has a right angle at the origin. The longest side of a right-angled triangle is its hypotenuse which, in this case, is the side joining b − P(b) and w − P(b). That side has length

    ||(w − P(b)) − (b − P(b))|| = ||w − b||.

Therefore ||w − b|| > ||b − P(b)||. □

9.1. The point in V closest to v. Let V be a subspace of Rn and v a point in Rn. How do we find/compute the point in V that is closest to v? We will show that PV(v) is the point in V that is closest to v.

We begin with a high school question. Let L be a line through the origin in R2 and v a point in R2; which point on L is closest to v? Here's a picture to help you think about the question:

[Figure: a line L through the origin, a point v off the line, and the line L′ through v perpendicular to L.]

It is pretty obvious that one should draw a line L′ through v that is perpendicular to L; the intersection point L ∩ L′ will be the point on L closest to v.

That is the idea behind the next result because the point L ∩ L′ is PL(v).

9.2. A problem: find the line that best fits the data. Given points

    (α1, β1), . . . , (αm, βm)

in R2, find the line y = dx + c that best approximates those points. In general there will not be a line passing through all the points but we still want "the best line". This sort of problem is typical when gathering data to understand the relationship between two variables x and y. The data obtained might suggest a linear relationship between x and y but, especially in the social sciences, it is rare that the data gathered spells out an exact linear relationship.

We restate the question of finding the line y = dx + c as a linear algebra problem. The problem of finding the line that fits the data (αi, βi), 1 ≤ i ≤ m, is equivalent to finding d and c such that

    β1 = dα1 + c
    β2 = dα2 + c
    β3 = dα3 + c
       ⋮
    βm = dαm + c

which is, in turn, equivalent to finding d and c such that

    [α1 1]         [β1]
    [α2 1] [d]     [β2]
    [α3 1] [c]  =  [β3]
    [ ⋮ ⋮]         [ ⋮]
    [αm 1]         [βm]

or A(d, c)ᵀ = b

where A is an m × 2 matrix and b ∈ Rm.

As we said, there is usually no solution (d, c)ᵀ to this system of linear equations, but we still want to find a line that is as close as possible to the given points. The next definition makes the idea "as close as possible" precise.

Definition 9.2. Let A be an m × n matrix and b a point in Rm. We call x* a least-squares "solution" to the equation Ax = b if Ax* is as close as possible to b; i.e.,

    ||Ax* − b|| ≤ ||Ax − b|| for all x ∈ Rn.

We will now explain how to find x*.

Because Ax* is in the range of A we want Ax* to be the (unique!) point in R(A) that is closest to b. Let's write b* for the point in R(A) that is closest to b. A picture helps:

[Figure: the plane R(A) containing 0 and b*, the point b off the plane, and the segment from b to b* perpendicular to R(A).]

Lemma 9.3. Let A be an m × n matrix and v ∈ Rm. Consider v as a column vector. Then v ∈ R(A)⊥ if and only if Aᵀv = 0.


Proof. Since the range of A is the linear span of the columns of A, v ∈ R(A)⊥ if and only if A1 · v = · · · = An · v = 0. However, Aj · v = Ajᵀv, so v ∈ R(A)⊥ if and only if A1ᵀv = · · · = Anᵀv = 0. But Ajᵀ is the jth row of Aᵀ, so v ∈ R(A)⊥ if and only if Aᵀv = 0. □

With reference to the picture above, b − b* ∈ R(A)⊥ if and only if Aᵀ(b − b*) = 0.

Lemma 9.4. Let A be an m × n matrix and let P : Rm → Rm be the orthogonal projection onto R(A). Define

    b* := P(b).

Then b* is the unique point in R(A)

(1) that is closest to b;
(2) such that b − b* ∈ R(A)⊥;
(3) such that Aᵀ(b − b*) = 0.

Proof. (1) This is what Theorem 9.1 says.
(2) This is what Corollary 8.3 says.
(3) This follows from (2) and Lemma 9.3. □

The point in R(A) closest to b is the unique point b* in R(A) such that Aᵀb = Aᵀb*.

Since b* is in R(A), there is a vector x* such that Ax* = b*. It follows that

    AᵀAx* = Aᵀb* = Aᵀb.

The preceding argument has proved the following result.

Theorem 9.5. Let A be an m × n matrix and b a point in Rm. There is a point x* in Rn having the following equivalent properties:

(1) AᵀAx* = Aᵀb;
(2) Ax* is the point in R(A) that is closest to b;
(3) x* is a least-squares solution to Ax = b.

Moreover, x* is unique if and only if N(A) = 0.

Theorem 9.6. Let Ax = b be an m × n system of linear equations.

(1) The system AᵀAx = Aᵀb is consistent.
(2) The solutions to the system AᵀAx = Aᵀb are least-squares solutions to Ax = b.
(3) There is a unique least-squares solution if and only if rank(A) = n.
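Here is the whole recipe for the line-fitting problem in a few lines of numpy (the data points are made up for illustration; lstsq is numpy's built-in least-squares routine, shown as a cross-check):

    import numpy as np

    alpha = np.array([0.0, 1.0, 2.0, 3.0])
    beta  = np.array([1.1, 2.9, 5.2, 6.8])
    A = np.column_stack([alpha, np.ones_like(alpha)])   # the m × 2 matrix above
    d, c = np.linalg.solve(A.T @ A, A.T @ beta)         # solve AᵀAx = Aᵀb
    print(d, c)                                         # slope and intercept of the best line
    print(np.linalg.lstsq(A, beta, rcond=None)[0])      # same answer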

10. Approximating data by polynomial curves

When engineers lay out the path for a new freeway they often do so under the constraint that the curve/path is a piecewise cubic curve. Let me explain. Suppose one wants a freeway to begin at a point (a, b) in R2 and end at the point (a′, b′). Of course, it is unreasonable to have the freeway be a


straight line: there might be geographical obstacles, or important buildings and other things to be avoided, or one might want the freeway to pass close to some particular points, towns, other roads, etc. At the other extreme one doesn't want the freeway to be "too curvy" because a very curvy road is dangerous and more expensive to build. What one tends to do is break the interval [a, a′] into subintervals, say

    [a, a1], [a1, a2], . . . , [an, a′]

where a < a1 < · · · < an < a′, and then decide on some points (a1, b1), . . . , (an, bn) that one would like the freeway to pass through.

Calculus tells us that a cubic curve y = αx³ + βx² + γx + δ has at most two maxima and minima or, less formally, two major bends.

Let’s look atGiven points

(α1, β1), . . . , (αm, βm)

in R2 we might want to find a degree 2 polynomial f(x) such that the curvey = f(x) best approximate those points. In general there will not be adegree 2 curve that passes through all the points but we still want “thebest curve”. We could also ask for the best cubic curve, etc. All theseproblems are solved by using the general method in section 9. For example,if we are looking for a degree two curve we are looking for a, b, c such thaty = ax2 + bx + c best fits the data. That problem reduces to solving thesystem

    [α1² α1 1]         [β1]
    [α2² α2 1] [a]     [β2]
    [α3² α3 1] [b]  =  [β3]
    [ ⋮   ⋮ ⋮] [c]     [ ⋮]
    [αm² αm 1]         [βm]

or A(a, b, c)ᵀ = b

where A is an m × 3 matrix and b ∈ Rm.

So, using the ideas in section 9, we must solve the equation

    AᵀAx* = Aᵀb

to get a*, b*, and c* that give a parabola y = a*x² + b*x + c* that best fits the data.
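The code is the same as for the line, with one extra column in A (again the data is invented):

    import numpy as np

    alpha = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    beta  = np.array([ 4.2,  0.9, 0.1, 1.1, 3.8])
    A = np.column_stack([alpha**2, alpha, np.ones_like(alpha)])
    a, b, c = np.linalg.solve(A.T @ A, A.T @ beta)
    print(a, b, c)    # coefficients of the best-fitting parabola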

10.1. Why call it least-squares? We are finding an x that minimizes ||Ax − b|| and hence ||Ax − b||². But

    ||Ax − b||² = (Ax − b) · (Ax − b) = a sum of squares.

11. Gazing into the distance: Fourier series

12. Infinite sequences


CHAPTER 16

Similar matrices

1. Definition

Similar matrices are just that, similar, very similar, in a sense so alike as to be indistinguishable in their essential properties. Although a precise definition is required, and will be given shortly, it should be apparent that the matrices

    A = [1 0]    and    B = [0 0]
        [0 0]               [0 1]

are, in the everyday sense of the word, "similar". Multiplication by A and multiplication by B are linear transformations from R2 to R2; A sends e1 to itself and kills e2; B sends e2 to itself and kills e1. Pretty similar behavior, huh!?

2. Definition and first properties

Let A and B be n × n matrices. We say that A is similar to B, and write A ∼ B, if there is an invertible matrix S such that

    A = SBS⁻¹.

Similarity of matrices is an equivalence relation, meaning that

(1) A ∼ A because A = IAI⁻¹;
(2) if A ∼ B, then B ∼ A because A = SBS⁻¹ implies S⁻¹AS = S⁻¹(SBS⁻¹)S = (S⁻¹S)B(S⁻¹S) = IBI = B;
(3) if A ∼ B and B ∼ C, then A ∼ C because A = SBS⁻¹ and B = TCT⁻¹ implies A = SBS⁻¹ = S(TCT⁻¹)S⁻¹ = (ST)C(T⁻¹S⁻¹) = (ST)C(ST)⁻¹.

Thus, the word "similar" behaves as it does in its everyday use:

(1) a thing is similar to itself;
(2) if one thing is similar to another thing, then the other thing is similar to the first thing;
(3) if a thing is similar to a second thing and the second thing is similar to a third thing, then the first thing is similar to the third thing.


3. Example

Let’s return to the example in chapter 1. The only difference betweenA and B is the choice of labeling: which order do we choose to write(

10

)and

(01

)or, if you prefer, which is labelled e1 and which is labelled e2.

There is a linear transformation from R2 to R2 that interchanges e1 ande2, namely

S =

(0 11 0

); Se1 = e2 and Se2 = e1.

Since S² = I, S⁻¹ = S and

    SBS⁻¹ = [0 1][0 0][0 1] = [0 1][0 1] = [1 0] = A.
            [1 0][0 1][1 0]   [0 0][1 0]   [0 0]

The calculation shows that A = SBS⁻¹; i.e., A is similar to B in the technical sense of the definition in section 2.

4. Properties shared by similar matrices

We now show that similar matrices have some of the same properties.

Theorem 4.1. Similar matrices have the same

(1) determinant;
(2) characteristic polynomial;
(3) eigenvalues.

Proof. Suppose A and B are similar. Then A = SBS⁻¹ for some invertible matrix S.

(1) Since det(S⁻¹) = (det S)⁻¹, we have

    det(A) = (det S)(det B)(det S⁻¹) = (det S)(det B)(det S)⁻¹ = det(B).

We used the fact that the determinant of a matrix is a number, so the factors commute.

(2) Since S(tI)S⁻¹ = tI, A − tI = S(B − tI)S⁻¹; i.e., A − tI and B − tI are similar. (2) now follows from (1) because the characteristic polynomial of A is det(A − tI).

(3) The eigenvalues of a matrix are the roots of its characteristic polynomial. Since A and B have the same characteristic polynomial they have the same eigenvalues. □

4.1. Warning: Although they have the same eigenvalues, similar matrices do not usually have the same eigenvectors or eigenspaces. Nevertheless, there is a precise relationship between the eigenspaces of similar matrices. We prove that in Proposition 4.2.


4.2. Notation: If A is an n × n matrix and X a subset of Rn, we use the notation

    AX := {Ax | x ∈ X}.

This shorthand is similar to the notation 2Z = {2n | n ∈ Z} for the even numbers.

To check whether you understand this notation, try the following problems.

(1) Show that AX is a subspace if X is.
(2) Show that A(BX) = (AB)X if A and B are n × n matrices.
(3) Show that IX = X.
(4) If S is an invertible matrix and Y = SX, show that X = S⁻¹Y.
(5) If S is an invertible matrix and X is a subspace of Rn, show that dim SX = dim X by proving the following claim: if {v1, . . . , vd} is a basis for X, then {Sv1, . . . , Svd} is a basis for SX.

Proposition 4.2. Suppose A = SBS⁻¹. Let Eλ(A) be the λ-eigenspace for A and Eλ(B) the λ-eigenspace for B. Then Eλ(B) = S⁻¹Eλ(A), i.e.,

    Eλ(B) = {S⁻¹x | x ∈ Eλ(A)}.

In particular, the dimensions of the λ-eigenspaces for A and B are the same.

Proof. If x ∈ Eλ(A), then λx = Ax = SBS⁻¹x, so

    B(S⁻¹x) = S⁻¹λx = λ(S⁻¹x);

i.e., S⁻¹x is a λ-eigenvector for B or, equivalently,

    S⁻¹Eλ(A) ⊆ Eλ(B).

Starting from the fact that B = S⁻¹AS, the same sort of argument shows that SEλ(B) ⊆ Eλ(A). Therefore

    Eλ(B) = I·Eλ(B) = S⁻¹S·Eλ(B) ⊆ S⁻¹Eλ(A) ⊆ Eλ(B).

In particular, Eλ(B) ⊆ S⁻¹Eλ(A) ⊆ Eλ(B), so these three sets are equal, i.e., Eλ(B) = S⁻¹Eλ(A) = {S⁻¹x | x ∈ Eλ(A)}.

That dim Eλ(A) = dim Eλ(B) is proved by the method suggested in exercise (5) just prior to the statement of this proposition. □

If A and B are similar and Aʳ = 0, then Bʳ = 0. To see this one uses the marvelous cancellation trick, S⁻¹S = I.

Corollary 4.3. Similar matrices have the same rank and the same nullity.

Proof. Suppose A and B are similar matrices. By definition, the nullity of A is the dimension of its null space. But N(A) = E0(A), the 0-eigenspace of A. By the last sentence in the statement of Proposition 4.2, E0(A) and E0(B) have the same dimension. Hence A and B have the same nullity. Since rank + nullity = n, A and B also have the same rank. □


4.3. Intrinsic and extrinsic properties of matrices. It is not unreasonable to ask whether a matrix that is similar to a symmetric matrix is symmetric. Suppose A is symmetric and S is invertible. Then (SAS⁻¹)ᵀ = (S⁻¹)ᵀAᵀSᵀ = (Sᵀ)⁻¹ASᵀ and there is no obvious reason why this should be the same as SAS⁻¹. The explicit example

    [1 2][2 3][1 2]⁻¹ = [8 11][1 −2] = [8 −5]
    [0 1][3 4][0 1]     [3  4][0  1]   [3 −2]

shows that a matrix similar to a symmetric matrix need not be symmetric.


5. Diagonalizable matrices

5.1. Diagonal matrices. A square matrix D = (dij) is diagonal if dij = 0 for all i ≠ j. In other words,

    D = [λ1  0  · · ·  0]
        [ 0 λ2  · · ·  0]
        [ ⋮       ⋱    ⋮]
        [ 0  0  · · · λn]

for some λ1, . . . , λn ∈ R. We sometimes use the abbreviation diag(λ1, . . . , λn) for this matrix.

The identity matrix and the zero matrix are diagonal matrices.

Let D = diag(λ1, . . . , λn). The following facts are obvious:

(1) the λis are the eigenvalues of D;
(2) det(D) = λ1λ2 · · · λn;
(3) the characteristic polynomial of D is (λ1 − t)(λ2 − t) · · · (λn − t);
(4) Dej = λjej, i.e., the standard basis vectors ej are eigenvectors for D.

Warning: Although the axis Rej is contained in the λj-eigenspace for D, its λj-eigenspace might be bigger. For example, if λi = λj with i ≠ j, then Eλi contains the plane Rei + Rej.


5.2. Definition. An n × n matrix A is diagonalizable if it is similar to a diagonal matrix, i.e., if S⁻¹AS is diagonal for some S.

A diagonal matrix is diagonalizable: if D is diagonal, then I⁻¹DI is diagonal!

Since similar matrices have the same characteristic polynomials, the characteristic polynomial of a diagonalizable matrix is a product of n linear terms. See "obvious fact" (3) above.

5.3. Example. If

    A = [5 −6]    and    S = [2 1]
        [3 −4]               [1 1],

then

    S⁻¹AS = [ 1 −1][5 −6][2 1] = [2  0]
            [−1  2][3 −4][1 1]   [0 −1]

so A is diagonalizable.

5.4. The obvious questions are how do we determine whether A is diagonalizable or not and, if A is diagonalizable, how do we find an S such that S⁻¹AS is diagonal. The next theorem and its corollary answer these questions.

Theorem 5.1. An n × n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors.

Proof. We will use the following fact: if B is a p × q matrix and C a q × r matrix, then the columns of BC are obtained by multiplying each column of C by B; explicitly,

    BC = [BC1, . . . , BCr]

where Cj is the jth column of C.

(⇐) Let {u1, . . . , un} be linearly independent eigenvectors for A with Auj = λjuj for each j. Define

    S := [u1, . . . , un],

i.e., the jth column of S is uj. The columns of S are linearly independent so S is invertible. Since

    I = S⁻¹S = S⁻¹[u1, . . . , un] = [S⁻¹u1, . . . , S⁻¹un],

S⁻¹uj = ej, the jth standard basis vector.


Now

    S⁻¹AS = S⁻¹A[u1, . . . , un]
          = S⁻¹[Au1, . . . , Aun]
          = S⁻¹[λ1u1, . . . , λnun]
          = [λ1S⁻¹u1, . . . , λnS⁻¹un]
          = [λ1e1, . . . , λnen]
          = diag(λ1, . . . , λn).

Thus A is diagonalizable.

(⇒) Now suppose A is diagonalizable, i.e.,

    S⁻¹AS = D = diag(λ1, . . . , λn)

for some invertible S. Now

    [AS1, . . . , ASn] = AS = SD = S[λ1e1, . . . , λnen] = [λ1Se1, . . . , λnSen] = [λ1S1, . . . , λnSn]

so ASj = λjSj for all j. But the columns of S are linearly independent, so A has n linearly independent eigenvectors. □

The proof of Theorem 5.1 established the truth of the following corollary.

Corollary 5.2. If A is a diagonalizable n × n matrix and u1, . . . , un are linearly independent eigenvectors for A and S = [u1, . . . , un], then S⁻¹AS is diagonal.

Corollary 5.3. If an n × n matrix has n distinct eigenvalues it is diagonalizable.

Proof. Suppose λ1, . . . , λn are the distinct eigenvalues for A and for each i let vi be a non-zero vector such that Avi = λivi. By Theorem 1.2, {v1, . . . , vn} is a linearly independent set. It now follows from Theorem 5.1 that A is diagonalizable. □


5.5. Previous example revisited. We showed above that

    A = [5 −6]
        [3 −4]

is diagonalizable. Let's suppose we did not know that and had to find out whether it was diagonalizable and, if so, to find a matrix S such that S⁻¹AS is diagonal. Theorem 5.1 says we must determine whether A has two linearly independent eigenvectors.

The characteristic polynomial of A is t² − t(a + d) + ad − bc, i.e.,

    t² − t − 2 = (t − 2)(t + 1).

The eigenvalues for A are 2 and −1. By Corollary 5.3, A is diagonalizable.

To find an S we need to find eigenvectors for A. The 2-eigenspace is

    E2 = N(A − 2I) = N [3 −6] = R [2]
                       [3 −6]     [1].

The (−1)-eigenspace is

    E−1 = N(A + I) = N [6 −6] = R [1]
                       [3 −3]     [1].

Thus S can be any matrix with one column a non-zero multiple of (2, 1)ᵀ and the other a non-zero multiple of (1, 1)ᵀ. For example, the matrix we used before, namely

    [2 1]
    [1 1],

works.

It is important to realize that S is not the only matrix that "works". For example, take

    R = [3 −2]
        [3 −1].

Then

    R⁻¹AR = (1/3) [−1 2][5 −6][3 −2] = [−1 0]
                  [−3 3][3 −4][3 −1]   [ 0 2].

Notice that R⁻¹AR ≠ S⁻¹AS. In particular, A is similar to two different diagonal matrices.
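numpy can carry out the whole computation (a sketch; eig may scale the eigenvectors differently, but the resulting diagonal matrix is the same up to the order of its entries):

    import numpy as np

    A = np.array([[5.0, -6.0],
                  [3.0, -4.0]])
    lams, S = np.linalg.eig(A)                      # columns of S are eigenvectors
    print(lams)                                     # 2 and -1, in some order
    print(np.round(np.linalg.inv(S) @ A @ S, 10))   # the diagonal matrix diag(2, -1)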


CHAPTER 17

Words and terminology

1. The need for clarity and precision

Definitions constitute the rock on which mathematics rests. Definitions must be precise and unambiguous. They should be expressed in simple language that is easy to understand. The mathematical edifice built upon this rock consists of results, usually called Lemmas, Propositions, Theorems, and Corollaries. I won't discuss the question of which types of results should receive which of those labels. Results, like definitions, must be stated clearly, accurately, and precisely. They should be unambiguous and easy to understand.

Students taking linear algebra find it a challenge to state definitions and results with the required clarity and precision. But clarity of thought and clarity of statement go hand-in-hand. You can't have one without the other, and as the clarity and precision of one increases so does that of the other. Do not confuse the requirement for clarity and precision with a pedantic adherence to the rules of grammar. The demand for clarity and precision is a demand that you truly understand what you think, write, and say.

Be clear in what you write. Put yourself in the shoes of the reader. Particularly when stating a definition or result, put yourself in the shoes of a reader who has never seen that definition or result before. When I see a student's inaccurate statement of a definition or result I can guess what they want to say. But I am not grading you on what you want to say. I am grading you on the basis of what you do say. When I read your statement I ask myself "is this statement good enough to appear in a textbook?" and "does this statement tell someone who has never before seen the statement, or a variation on it, exactly what the author wants to, or should, convey?"

2. Some, all, each, every, and only

In your everyday life you know that a statement beginning "Some dogs ..." is very different from one that begins "All dogs ...". The three words "some", "all", and "every" play a crucial role in mathematics. Try to use them whenever it is appropriate to do so—there is rarely a situation in which a variation on one of those three words is better than using the word itself.

Let’s consider the definition of a linear transformation.

2.1. The following definition is perfect:


A function T : V → W between two vector spaces is a linear transformation if T(au + bv) = aT(u) + bT(v) for all u, v ∈ V and all a, b ∈ R.

Here is a less than perfect version of that definition.

A function T : V → W between two vector spaces is a linear transformation if T(au + bv) = aT(u) + bT(v) where u, v ∈ V and a, b ∈ R.

The words “for all” have been replaced by “where”. The word “where”has many meanings (check the dictionary if you need convincing) and inthis situation it does not have the clarity that “for all” does. When I read“where” I have to pause and ask myself which of its meanings is being usedbut when I read “for all” I do not have to pause and think.

Here is an incorrect version of that definition.

A function T : V → W between two vector spaces is a linear transformation if T(au + bv) = aT(u) + bT(v) for some u, v ∈ V and a, b ∈ R.

This is wrong because "some" has a different meaning than "all". Consider the sentences "Some dogs have rabies" and "All dogs have rabies".

There are mathematical statements in which the word "where" is appropriate. For example:

If V is a subspace of Rn, then every point w ∈ Rn can be written in a unique way as w = v + v′ where v ∈ V and v′ ∈ V⊥.

The sentence is intended to state the fact that every point in Rn is the sum of a point in V and a point in V⊥, and there is only one way in which that happens, i.e., if u and v are in V and u′ and v′ are in V⊥ and v + v′ = u + u′, then u = v and u′ = v′. I think the displayed sentence does convey that but perhaps there is a better way to convey that fact. What do you think? One alternative is this:

If V is a subspace of Rn and w ∈ Rn, then there is a unique v ∈ V and a unique v′ ∈ V⊥ such that w = v + v′.

What do you think of the alternatives?

2.2. The following is the definition of what it means for a set of vectors to be orthogonal.

    We say that {v1, . . . , vk} is orthogonal if vi · vj = 0 for all i ≠ j.

I think it is implicit in the statement that this must be true for all pairs of different numbers between 1 and k, but it is not unreasonable to argue that one should make it clearer that vi · vj = 0 for all i, j ∈ {1, . . . , k} such that i ≠ j. The words "for all" are suggesting that one is making a statement about all pairs of distinct numbers i and j in some collection of pairs. In any


case, it seems to me that the word "all" should be used in this definition. The following version of the definition seems less clear:

We say that {v1, . . . , vk} is orthogonal if vi · vj = 0 where i ≠ j.

“Where” seems to need some interpretation and may be mis-interpreted.Better to use the word “all” because there is little possibility it can bemisunderstood.

2.3. I used the word “where” in my definition of linear span.

A linear combination of vectors v1, . . . , vn in Rm is a vector that can be written as

a1v1 + · · ·+ anvn

for some numbers a1, . . . , an ∈ R.

Let A be any m × n matrix and x ∈ Rn. Then Ax is a linear combination of the columns of A. Explicitly,

(2-1) Ax = x1A1 + · · ·+ xnAn

where Aj denotes the jth column of A.

A function T : V → W between two vector spaces is a linear transformation if T(au + bv) = aT(u) + bT(v) where u, v ∈ V and a, b ∈ R.

2.4. Try to avoid "where". I think it is a good general rule to avoid using the word "where" whenever possible. I don't think it is good to never use the word "where" in mathematics, but certainly it should be avoided if there is a clear alternative that conveys your meaning without making the sentence clunky.

3. Singular and plural

A system of linear equations is consistent if it has a solution and inconsistent if it doesn't.

The subject of this sentence, "A system", is singular. The system may consist of many equations but the system is a single entity.

The subject of a sentence beginning "The people of America ..." is plural, so it would be correct to proceed with the word "are". The subject of a sentence beginning "The population of America ..." is singular, so it would be correct to proceed with the word "is".

4. The word “it”

The word “it” is versatile. But danger lurks in its versatility. A writer must ensure that “it” refers to what it is supposed to refer to.

Examples... In the following definition “it” refers to the system of linear equations, the subject of the sentence.


A system of linear equations is consistent if it has a solution and inconsistent if it doesn’t.

5. “A” and “The”

If V is a non-zero vector space, it has infinitely many bases, so one cannot speak of “the basis” for a vector space unless it is clear from the context which basis is being referred to. That is why we say “a basis” in the definition of dimension.

The dimension of a vector space is the number of elements in a basis for it.

The definition

The dimension of a vector space is the number of elements in the basis for it.

is wrong because it suggests, incorrectly, that there is only one basis for a vector space.

6. If and only if


CHAPTER 18

Applications of linear algebra and vector spaces

There are so many applications of linear algebra and vector spaces that many volumes are needed to do justice to the title of this chapter. Linear algebra and vector spaces are embedded deep in the heart of modern technology and the modern world.

1. Simple electrical circuits

The flow of current in a simple electrical circuit consisting of batteries and resistors is governed by three laws:

• Kirchhoff’s First Law: The sum of the currents flowing into a node is equal to the sum of the currents flowing out of it.
• Kirchhoff’s Second Law: The sum of the voltage drops around a closed loop is equal to the total voltage in the loop.
• Ohm’s Law: The voltage drop across a resistor is the product of the current and the resistance.

To apply these laws we must first draw a picture of the circuit. To simplify, I will begin by leaving out the resistors and batteries. A typical circuit might look like this:

[Circuit diagram: points A, B, C across the top, D, E, F across the middle, and G, H along the bottom, joined by wires; B, D, and E are marked as nodes, and the wires form three loops.]

There are three loops and three nodes in this circuit. We must put arrows on each loop indicating the direction the current is flowing. It doesn’t matter how we label these currents, or the direction we choose for the current flowing around each loop.

[Circuit diagram: the same circuit redrawn with arrows on the wires indicating the directions of the five currents, labelled I1, I2, I3, I4, and I5.]

Applying Kirchhoff’s First Law at the nodes B, D, and E, we deduce that

I1 + I3 = I2

I1 + I5 = I4

I2 + I5 = I3 + I4.

I am going to refer you to the book for more on this because I can’t draw the diagrams!
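Even without the picture, the node equations can be checked by machine. The following is a minimal sketch, assuming SymPy is available; it rewrites the three node equations above in the form AI = 0 and computes the rank and nullity of A:

    from sympy import Matrix

    # Node equations with the unknowns ordered (I1, I2, I3, I4, I5),
    # every term moved to the left-hand side:
    #   node B:  I1 - I2 + I3           = 0
    #   node D:  I1           - I4 + I5 = 0
    #   node E:       I2 - I3 - I4 + I5 = 0
    A = Matrix([
        [1, -1,  1,  0, 0],
        [1,  0,  0, -1, 1],
        [0,  1, -1, -1, 1],
    ])

    print(A.rank())            # 2: the three node equations are not independent
    print(len(A.nullspace()))  # nullity 3, matching the three loops in the circuit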

2. Magic squares

You have probably seen the following example of a magic square:

4 3 8
9 5 1
2 7 6

The numbers in each row, column, and diagonal add up to 15. Let’s look at this from the point of view of linear algebra and learn a little about this magic square and then try to construct some larger ones.

Let’s call any n × n matrix with the property that the numbers in each row, column, and diagonal add up to the same number, σ say, a σ-valued magic square. We will call σ the magic value. We will write

MSn(σ)

for the set of σ-valued magic squares. If the numbers in the matrix are 1, 2, . . . , n², we will call it a classical magic square.

The sum of all integers from 1 to n² is n²(n² + 1)/2, so in a classical n × n magic square the sum of the numbers in a row is n(n² + 1)/2, since the n rows between them contain each of the numbers 1, 2, . . . , n² exactly once. For example, when n = 3, the magic value is 15. When n = 4 it is 34.

The n × n matrix with every entry equal to 1 is an n-valued magic square. We will write En for this matrix, or just E if the n is clear. Although this will not strike you as a particularly interesting n-valued magic square, it will play a useful role in showing that an appropriate system of linear equations is consistent (it will be a solution to that system) and in reducing the problem to a system of homogeneous linear equations.

The set of all n × n matrices forms a vector space: we can add them and multiply by scalars, and the properties in Proposition ?? hold. We usually write Mn(R) for the set of all n × n matrices. The subset of 0-valued magic squares MSn(0) is a subspace of Mn(R). The next lemma says that

MSn(σ) = (σ/n)E + MSn(0);

i.e., the set of σ-valued magic squares is a translation of the subspace MSn(0).

Lemma 2.1. Let A be an n × n matrix. Then A is a σ-valued magic square if and only if A − (σ/n)E is a 0-valued magic square.

For example, when n = 3,

    4 3 8            −1 −2  3
    9 5 1   − 5E  =   4  0 −4
    2 7 6            −3  2  1

is a 0-valued magic square.

We now consider the problem of finding all 0-valued 3 × 3 magic squares. We will write xij for the unknown value of the ijth entry. For example, the fact that the sum of the top row of the matrix must be zero gives us the equation x11 + x12 + x13 = 0. And so on. Doing the obvious thing we get a total of 8 homogeneous equations in 9 unknowns: 3 equations from the rows, 3 from the columns, and 2 from the diagonals. By Proposition ?? a system of homogeneous equations always has a non-trivial solution when the number of rows is strictly smaller than the number of unknowns. Hence there is a non-trivial 3 × 3 0-valued magic square.

Before we can write down the coefficient matrix for the system of equations we must agree on an ordering for the unknowns. We will just use the naive ordering and write them in the order

x11, x12, x13, x21, x22, x23, x31, x32, x33.

The coefficient matrix is

    1 0 0 0 1 0 0 0 1    main diagonal
    0 1 0 0 1 0 0 1 0    column 2
    0 0 1 0 0 1 0 0 1    column 3
    1 0 0 1 0 0 1 0 0    column 1
    0 0 0 1 1 1 0 0 0    row 2
    1 1 1 0 0 0 0 0 0    row 1
    0 0 1 0 1 0 1 0 0    anti-diagonal
    0 0 0 0 0 0 1 1 1    row 3


I don’t want to present all the row operations in gory detail, but we can progressively obtain the equivalent matrices below; the label on each row records how it was obtained from the rows of the original matrix (MD = main diagonal, AD = anti-diagonal, R1–R3 = rows, C1–C3 = columns).

    1 0 0 0  1  0 0  0  1    MD
    0 1 0 0  1  0 0  1  0    C2
    0 0 1 0  0  1 0  0  1    C3
    0 0 0 1 −1  0 1  0 −1    C1 − MD
    0 0 0 0  1 −1 1  0 −1    AD − C3
    0 0 0 0  0  0 0  0  0    R2 + R1 + R3 − C1 − C2 − C3
    0 0 0 0 −2 −1 0 −1 −2    R1 − MD − C2 − C3
    0 0 0 0  0  0 1  1  1    row 3

then

    1 0 0 0  1  0 0  0  1    MD
    0 1 0 0  1  0 0  1  0    C2
    0 0 1 0  0  1 0  0  1    C3
    0 0 0 1 −1  0 1  0 −1    C1 − MD
    0 0 0 0  1 −1 1  0 −1    AD − C3
    0 0 0 0  0  0 0  0  0    R2 + R1 + R3 − C1 − C2 − C3
    0 0 0 0  0 −3 2 −1 −4    R1 − MD − C2 − 3C3 + 2AD
    0 0 0 0  0  0 1  1  1    row 3

We can already see from this that x32 and x33 will be the independent variables, so let’s take

x33 = 1, x32 = 2.

Evaluating the dependent variables gives

x31 = −3, x23 = −4, x22 = 0, x21 = 4, x13 = 3, x12 = −2, x11 = −1,

which is the 0-valued magic square we obtained before. The rank of the matrix is 8 − 1 = 7, so its nullity is 2. That is equivalent to the existence of two independent variables, so let’s find another solution that together with the first will give us a basis. It is obvious, though, that a 3 × 3 matrix has various symmetries that we can exploit.
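These claims are easy to verify by machine. The following is a minimal sketch, assuming SymPy is available; it builds the 8 × 9 coefficient matrix (with the equations listed in a more systematic order than in the display above, which changes neither the rank nor the null space) and computes a basis of the null space:

    from sympy import Matrix

    n = 3
    N = n * n   # unknowns x11, ..., x33 in the naive ordering, indexed r = n*i + j

    rows = []
    for i in range(n):   # the n row-sum equations
        rows.append([1 if r // n == i else 0 for r in range(N)])
    for j in range(n):   # the n column-sum equations
        rows.append([1 if r % n == j else 0 for r in range(N)])
    rows.append([1 if r // n == r % n else 0 for r in range(N)])          # main diagonal
    rows.append([1 if r // n + r % n == n - 1 else 0 for r in range(N)])  # anti-diagonal

    A = Matrix(rows)
    print(A.rank())           # 7, so the nullity is 9 - 7 = 2
    for v in A.nullspace():   # a basis for the 0-valued 3 x 3 magic squares
        print(v.reshape(n, n))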

For the n × n case we get a homogeneous system of 2n + 2 equations in n² variables. If n > 2 this system has a non-trivial solution.


CHAPTER 19

Last rites

1. A summary of notation

Sets are usually denoted by upper case letters. f : X → Y denotes a function f from a set X to a set Y. If f : X → Y and g : Y → Z are functions, then g ◦ f or gf denotes the composition: first do f, then do g. Thus gf : X → Z is defined by

(gf)(x) = g(f(x)).

This makes sense because f(x) is an element of Y so g can be applied to it. The identity function idX : X → X is defined by

idX(x) = x for all x ∈ X.

If f : X → Y and g : Y → X are such that gf = idX and fg = idY, we call g the inverse of f and denote it by f−1. Of course, if g = f−1, then f = g−1.

We denote the range of a function f : X → Y by R(f). Thus, R(f) = {f(x) | x ∈ X}. It is a subset of Y.

If A is an m × n matrix we often write A1, . . . , An for the columns of A and

A = [A1, . . . , An]

to indicate this. This notation usually appears when we want to make use of the fact that

BA = B[A1, . . . , An] = [BA1, . . . , BAn],

i.e., the jth column of BA is B times the jth column of A.

We write ej for the element in Rn having zeroes in all positions except the jth, which is 1. Thus, the identity matrix is I = [e1, . . . , en]. More generally, if A is any m × n matrix, then Aej = Aj, the jth column of A. You can see this either by computation or using the fact that AI = A.
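A quick numerical illustration of these two facts (a sketch assuming NumPy; the matrices A and B are arbitrary examples, not anything from the text):

    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 5, 6]])   # columns A1, A2, A3
    B = np.array([[1, 0],
                  [2, 1]])

    BA = B @ A
    for j in range(3):
        # the jth column of BA is B times the jth column of A
        assert np.array_equal(BA[:, j], B @ A[:, j])

    I = np.eye(3)
    for j in range(3):
        # A e_j is the jth column of A
        assert np.array_equal(A @ I[:, j], A[:, j])
    print("both column facts check out")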

If x1, . . . , xn are vectors, then Rx1 + · · · + Rxn is synonymous with Sp(x1, . . . , xn). It denotes the set of all linear combinations of the vectors x1, . . . , xn.

The λ-eigenspace is denoted by Eλ. If it is necessary to name the matrix for which it is the eigenspace we write Eλ(A).


2. A sermon

Learning a new part of mathematics is like learning other things: photography, cooking, riding a bike, playing chess, listening to music, playing music, tennis, tasting wine, archery, seeing different kinds of art, etc. At some point one does, or does not, make the transition to a deeper involvement where one internalizes and makes automatic the steps that are first learned by rote. At some point one does, or does not, integrate the separate discrete aspects of the activity into a whole.

You will know whether you are undergoing this transition or not. It may be happening slowly, more slowly than you want, but it is either happening or not. Be sensitive to whether it is happening. Most importantly, ask yourself whether you want it to happen and, if you do, how much work you want to do to make it happen. It will not happen without your active involvement.

Can you make fluent sentences about linear algebra? Can you formulate the questions you need to ask in order to increase your understanding? Without this internalization and integration you will feel more and more like a juggler with too many balls in the air. Each new definition, concept, idea is one more ball to keep in the air. It soon becomes impossible unless one sees the landscape of the subject as a single entity into which the separate pieces fit together neatly and sensibly.

Linear algebra will remain a great struggle if this transition is not happening. A sense of mastery will elude you. Its opposite, a sense that the subject is beyond you, will take root: paralysis and fear will set in. That is the dark side of all areas of life that involve some degree of competence, public performance, evaluation by others, and consequences that you care about.

Very few people, perhaps none, get an A in 300- and higher-level math courses unless they can integrate the separate parts into a whole in which the individual pieces not only make sense but seem inevitable and natural.

It is you, and only you, who will determine whether you understand linear algebra. It is not easy.

Good luck!