
Linear Algebra


Fourth Edition

Stephen H. Friedberg
Arnold J. Insel
Lawrence E. Spence

Illinois State University

PEARSON EDUCATION, Upper Saddle River, New Jersey 07458


Library of Congress Cataloging-in-Publication Data

Friedberg, Stephen H.
Linear algebra / Stephen H. Friedberg, Arnold J. Insel, Lawrence E. Spence.--4th ed.
p. cm.
Includes indexes.
ISBN 0-13-008451-4
1. Algebra, Linear. I. Insel, Arnold J. II. Spence, Lawrence E. III. Title.
QA184.2.F75 2003
512'.5--dc21    2002032677

Acquisitions Editor: George Lobell
Editor in Chief: Sally Yagan
Production Editor: Lynn Savino Wendel
Vice President/Director of Production and Manufacturing: David W. Riccardi
Senior Managing Editor: Linda Mihatov Behrens
Assistant Managing Editor: Bayani DeLeon
Executive Managing Editor: Kathleen Schiaparelli
Manufacturing Buyer: Michael Bell
Manufacturing Manager: Trudy Pisciotti
Editorial Assistant: Jennifer Brady
Marketing Manager: Halee Dinsey
Marketing Assistant: Rachel Beckman
Art Director: Jayne Conte
Cover Designer: Bruce Kenselaar
Cover Photo Credits: Anni Albers, Wandbehang We 791 (Orange), 1926/64. Triple weave: cotton and rayon, black, white, orange, 175 × 118 cm. Photo: Gunter Lepkowski, Berlin. Bauhaus-Archiv, Berlin, Inv. Nr. 1575. Lit.: Das Bauhaus webt, Berlin 1998, Nr. 38.

© 2003, 1997, 1989, 1979 by Pearson Education, Inc.
Pearson Education, Inc.
Upper Saddle River, New Jersey 07458

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

ISBN 0-13-008451-4

Pearson Education, Ltd., London
Pearson Education Australia Pty. Limited, Sydney
Pearson Education Singapore, Pte., Ltd
Pearson Education North Asia Ltd, Hong Kong
Pearson Education Canada, Ltd., Toronto
Pearson Educacion de Mexico, S.A. de C.V.
Pearson Education -- Japan, Tokyo
Pearson Education Malaysia, Pte. Ltd


To our families:
Ruth Ann, Rachel, Jessica, and Jeremy
Barbara, Thomas, and Sara
Linda, Stephen, and Alison


Contents

Preface ix

1 Vector Spaces 1
  1.1 Introduction 1
  1.2 Vector Spaces 6
  1.3 Subspaces 16
  1.4 Linear Combinations and Systems of Linear Equations 24
  1.5 Linear Dependence and Linear Independence 35
  1.6 Bases and Dimension 42
  1.7* Maximal Linearly Independent Subsets 58
  Index of Definitions 62

2 Linear Transformations and Matrices 64
  2.1 Linear Transformations, Null Spaces, and Ranges 64
  2.2 The Matrix Representation of a Linear Transformation 79
  2.3 Composition of Linear Transformations and Matrix Multiplication 86
  2.4 Invertibility and Isomorphisms 99
  2.5 The Change of Coordinate Matrix 110
  2.6* Dual Spaces 119
  2.7* Homogeneous Linear Differential Equations with Constant Coefficients 127
  Index of Definitions 145

3 Elementary Matrix Operations and Systems of Linear Equations 147
  3.1 Elementary Matrix Operations and Elementary Matrices 147

*Sections denoted by an asterisk are optional.


  3.2 The Rank of a Matrix and Matrix Inverses 152
  3.3 Systems of Linear Equations—Theoretical Aspects 168
  3.4 Systems of Linear Equations—Computational Aspects 182
  Index of Definitions 198

4 Determinants 199
  4.1 Determinants of Order 2 199
  4.2 Determinants of Order n 209
  4.3 Properties of Determinants 222
  4.4 Summary—Important Facts about Determinants 232
  4.5* A Characterization of the Determinant 238
  Index of Definitions 244

5 Diagonalization 245
  5.1 Eigenvalues and Eigenvectors 245
  5.2 Diagonalizability 261
  5.3* Matrix Limits and Markov Chains 283
  5.4 Invariant Subspaces and the Cayley–Hamilton Theorem 313
  Index of Definitions 328

6 Inner Product Spaces 329
  6.1 Inner Products and Norms 329
  6.2 The Gram–Schmidt Orthogonalization Process and Orthogonal Complements 341
  6.3 The Adjoint of a Linear Operator 357
  6.4 Normal and Self-Adjoint Operators 369
  6.5 Unitary and Orthogonal Operators and Their Matrices 379
  6.6 Orthogonal Projections and the Spectral Theorem 398
  6.7* The Singular Value Decomposition and the Pseudoinverse 405
  6.8* Bilinear and Quadratic Forms 422
  6.9* Einstein's Special Theory of Relativity 451
  6.10* Conditioning and the Rayleigh Quotient 464
  6.11* The Geometry of Orthogonal Operators 472
  Index of Definitions 480


7 Canonical Forms 482
  7.1 The Jordan Canonical Form I 482
  7.2 The Jordan Canonical Form II 497
  7.3 The Minimal Polynomial 516
  7.4* The Rational Canonical Form 524
  Index of Definitions 548

Appendices 549
  A Sets 549
  B Functions 551
  C Fields 552
  D Complex Numbers 555
  E Polynomials 561

Answers to Selected Exercises 571


Preface

The language and concepts of matrix theory and, more generally, of linear algebra have come into widespread usage in the social and natural sciences, computer science, and statistics. In addition, linear algebra continues to be of great importance in modern treatments of geometry and analysis.

The primary purpose of this fourth edition of Linear Algebra is to present a careful treatment of the principal topics of linear algebra and to illustrate the power of the subject through a variety of applications. Our major thrust emphasizes the symbiotic relationship between linear transformations and matrices. However, where appropriate, theorems are stated in the more general infinite-dimensional case. For example, this theory is applied to finding solutions to a homogeneous linear differential equation and the best approximation by a trigonometric polynomial to a continuous function.

Although the only formal prerequisite for this book is a one-year course in calculus, it requires the mathematical sophistication of typical junior and senior mathematics majors. This book is especially suited for a second course in linear algebra that emphasizes abstract vector spaces, although it can be used in a first course with a strong theoretical emphasis.

The book is organized to permit a number of different courses (ranging from three to eight semester hours in length) to be taught from it. The core material (vector spaces, linear transformations and matrices, systems of linear equations, determinants, diagonalization, and inner product spaces) is found in Chapters 1 through 5 and Sections 6.1 through 6.5. Chapters 6 and 7, on inner product spaces and canonical forms, are completely independent and may be studied in either order. In addition, throughout the book are applications to such areas as differential equations, economics, geometry, and physics. These applications are not central to the mathematical development, however, and may be excluded at the discretion of the instructor.

We have attempted to make it possible for many of the important topics of linear algebra to be covered in a one-semester course. This goal has led us to develop the major topics with fewer preliminaries than in a traditional approach. (Our treatment of the Jordan canonical form, for instance, does not require any theory of polynomials.) The resulting economy permits us to cover the core material of the book (omitting many of the optional sections and a detailed discussion of determinants) in a one-semester four-hour course for students who have had some prior exposure to linear algebra.

Chapter 1 of the book presents the basic theory of vector spaces: subspaces, linear combinations, linear dependence and independence, bases, and dimension. The chapter concludes with an optional section in which we prove that every infinite-dimensional vector space has a basis.

Linear transformations and their relationship to matrices are the subject of Chapter 2. We discuss the null space and range of a linear transformation, matrix representations of a linear transformation, isomorphisms, and change of coordinates. Optional sections on dual spaces and homogeneous linear differential equations end the chapter.

The application of vector space theory and linear transformations to systems of linear equations is found in Chapter 3. We have chosen to defer this important subject so that it can be presented as a consequence of the preceding material. This approach allows the familiar topic of linear systems to illuminate the abstract theory and permits us to avoid messy matrix computations in the presentation of Chapters 1 and 2. There are occasional examples in these chapters, however, where we solve systems of linear equations. (Of course, these examples are not a part of the theoretical development.) The necessary background is contained in Section 1.4.

Determinants, the subject of Chapter 4, are of much less importance than they once were. In a short course (less than one year), we prefer to treat determinants lightly so that more time may be devoted to the material in Chapters 5 through 7. Consequently we have presented two alternatives in Chapter 4—a complete development of the theory (Sections 4.1 through 4.3) and a summary of important facts that are needed for the remaining chapters (Section 4.4). Optional Section 4.5 presents an axiomatic development of the determinant.

Chapter 5 discusses eigenvalues, eigenvectors, and diagonalization. One of the most important applications of this material occurs in computing matrix limits. We have therefore included an optional section on matrix limits and Markov chains in this chapter even though the most general statement of some of the results requires a knowledge of the Jordan canonical form. Section 5.4 contains material on invariant subspaces and the Cayley–Hamilton theorem.

Inner product spaces are the subject of Chapter 6. The basic mathematical theory (inner products; the Gram–Schmidt process; orthogonal complements; the adjoint of an operator; normal, self-adjoint, orthogonal and unitary operators; orthogonal projections; and the spectral theorem) is contained in Sections 6.1 through 6.6. Sections 6.7 through 6.11 contain diverse applications of the rich inner product space structure.

Canonical forms are treated in Chapter 7. Sections 7.1 and 7.2 develop the Jordan canonical form, Section 7.3 presents the minimal polynomial, and Section 7.4 discusses the rational canonical form.

There are five appendices. The first four, which discuss sets, functions, fields, and complex numbers, respectively, are intended to review basic ideas used throughout the book. Appendix E on polynomials is used primarily in Chapters 5 and 7, especially in Section 7.4. We prefer to cite particular results from the appendices as needed rather than to discuss the appendices independently.

The following diagram illustrates the dependencies among the various chapters.

Chapter 1
    ↓
Chapter 2
    ↓
Chapter 3
    ↓
Sections 4.1–4.3 or Section 4.4
    ↓
Sections 5.1 and 5.2 ──→ Chapter 6
    ↓
Section 5.4
    ↓
Chapter 7

One final word is required about our notation. Sections and subsections labeled with an asterisk (*) are optional and may be omitted as the instructor sees fit. An exercise accompanied by the dagger symbol (†) is not optional, however—we use this symbol to identify an exercise that is cited in some later section that is not optional.

DIFFERENCES BETWEEN THE THIRD AND FOURTH EDITIONS

The principal content change of this fourth edition is the inclusion of a new section (Section 6.7) discussing the singular value decomposition and the pseudoinverse of a matrix or a linear transformation between finite-dimensional inner product spaces. Our approach is to treat this material as a generalization of our characterization of normal and self-adjoint operators.

The organization of the text is essentially the same as in the third edition. Nevertheless, this edition contains many significant local changes that improve the book. Section 5.1 (Eigenvalues and Eigenvectors) has been streamlined, and some material previously in Section 5.1 has been moved to Section 2.5 (The Change of Coordinate Matrix). Further improvements include revised proofs of some theorems, additional examples, new exercises, and literally hundreds of minor editorial changes.

We are especially indebted to Jane M. Day (San Jose State University) for her extensive and detailed comments on the fourth edition manuscript. Additional comments were provided by the following reviewers of the fourth edition manuscript: Thomas Banchoff (Brown University), Christopher Heil (Georgia Institute of Technology), and Thomas Shemanske (Dartmouth College).

To find the latest information about this book, consult our web site on the World Wide Web. We encourage comments, which can be sent to us by e-mail or ordinary post. Our web site and e-mail addresses are listed below.

web site: http://www.math.ilstu.edu/linalg

e-mail: [email protected]

Stephen H. Friedberg
Arnold J. Insel
Lawrence E. Spence


1 Vector Spaces

1.1 Introduction
1.2 Vector Spaces
1.3 Subspaces
1.4 Linear Combinations and Systems of Linear Equations
1.5 Linear Dependence and Linear Independence
1.6 Bases and Dimension
1.7* Maximal Linearly Independent Subsets

1.1 INTRODUCTION

Many familiar physical notions, such as forces, velocities,1 and accelerations, involve both a magnitude (the amount of the force, velocity, or acceleration) and a direction. Any such entity involving both magnitude and direction is called a “vector.” A vector is represented by an arrow whose length denotes the magnitude of the vector and whose direction represents the direction of the vector. In most physical situations involving vectors, only the magnitude and direction of the vector are significant; consequently, we regard vectors with the same magnitude and direction as being equal irrespective of their positions. In this section the geometry of vectors is discussed. This geometry is derived from physical experiments that test the manner in which two vectors interact.

Familiar situations suggest that when two like physical quantities act simultaneously at a point, the magnitude of their effect need not equal the sum of the magnitudes of the original quantities. For example, a swimmer swimming upstream at the rate of 2 miles per hour against a current of 1 mile per hour does not progress at the rate of 3 miles per hour. For in this instance the motions of the swimmer and current oppose each other, and the rate of progress of the swimmer is only 1 mile per hour upstream. If, however, the swimmer is moving downstream (with the current), then his or her rate of progress is 3 miles per hour downstream.

1 The word velocity is being used here in its scientific sense—as an entity having both magnitude and direction. The magnitude of a velocity (without regard for the direction of motion) is called its speed.

Experiments show that if two like quantities act together, their effect is predictable. In this case, the vectors used to represent these quantities can be combined to form a resultant vector that represents the combined effects of the original quantities. This resultant vector is called the sum of the original vectors, and the rule for their combination is called the parallelogram law. (See Figure 1.1.)

Figure 1.1: [diagram: vectors x and y acting at the point P; the diagonal of the parallelogram they determine represents x + y and ends at Q]

Parallelogram Law for Vector Addition. The sum of two vectors x and y that act at the same point P is the vector beginning at P that is represented by the diagonal of the parallelogram having x and y as adjacent sides.

Since opposite sides of a parallelogram are parallel and of equal length, the endpoint Q of the arrow representing x + y can also be obtained by allowing x to act at P and then allowing y to act at the endpoint of x. Similarly, the endpoint of the vector x + y can be obtained by first permitting y to act at P and then allowing x to act at the endpoint of y. Thus two vectors x and y that both act at the point P may be added “tail-to-head”; that is, either x or y may be applied at P and a vector having the same magnitude and direction as the other may be applied to the endpoint of the first. If this is done, the endpoint of the second vector is the endpoint of x + y.

The addition of vectors can be described algebraically with the use of analytic geometry. In the plane containing x and y, introduce a coordinate system with P at the origin. Let (a1, a2) denote the endpoint of x and (b1, b2) denote the endpoint of y. Then as Figure 1.2(a) shows, the endpoint Q of x + y is (a1 + b1, a2 + b2). Henceforth, when a reference is made to the coordinates of the endpoint of a vector, the vector should be assumed to emanate from the origin. Moreover, since a vector beginning at the origin is completely determined by its endpoint, we sometimes refer to the point x rather than the endpoint of the vector x if x is a vector emanating from the origin.

Besides the operation of vector addition, there is another natural operation that can be performed on vectors—the length of a vector may be magnified or contracted. This operation, called scalar multiplication, consists of multiplying the vector by a real number. If the vector x is represented by an arrow, then for any real number t, the vector tx is represented by an arrow in the same direction if t ≥ 0 and in the opposite direction if t < 0. The length of the arrow tx is |t| times the length of the arrow x. Two nonzero vectors x and y are called parallel if y = tx for some nonzero real number t. (Thus nonzero vectors having the same or opposite directions are parallel.)

To describe scalar multiplication algebraically, again introduce a coordinate system into a plane containing the vector x so that x emanates from the origin. If the endpoint of x has coordinates (a1, a2), then the coordinates of the endpoint of tx are easily seen to be (ta1, ta2). (See Figure 1.2(b).)

Figure 1.2: [diagram: (a) vector addition in coordinates, with the endpoint Q of x + y at (a1 + b1, a2 + b2); (b) scalar multiplication, with the endpoint of tx at (ta1, ta2)]
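The coordinate formulas just given are easy to compute with. The following short Python sketch is purely illustrative (it is not part of the text); the helper names add and scale are ad hoc.

    # Vectors in the plane represented as pairs of coordinates.
    def add(x, y):
        # Endpoint of x + y is (a1 + b1, a2 + b2).
        return (x[0] + y[0], x[1] + y[1])

    def scale(t, x):
        # Endpoint of tx is (t*a1, t*a2).
        return (t * x[0], t * x[1])

    x, y = (3.0, 1.0), (1.0, 4.0)
    print(add(x, y))       # (4.0, 5.0)
    print(scale(-2.0, x))  # (-6.0, -2.0): opposite direction, twice the length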

The algebraic descriptions of vector addition and scalar multiplication for vectors in a plane yield the following properties:

1. For all vectors x and y, x + y = y + x.
2. For all vectors x, y, and z, (x + y) + z = x + (y + z).
3. There exists a vector denoted 0 such that x + 0 = x for each vector x.
4. For each vector x, there is a vector y such that x + y = 0.
5. For each vector x, 1x = x.
6. For each pair of real numbers a and b and each vector x, (ab)x = a(bx).
7. For each real number a and each pair of vectors x and y, a(x + y) = ax + ay.
8. For each pair of real numbers a and b and each vector x, (a + b)x = ax + bx.

Arguments similar to the preceding ones show that these eight properties, as well as the geometric interpretations of vector addition and scalar multiplication, are true also for vectors acting in space rather than in a plane. These results can be used to write equations of lines and planes in space.


Consider first the equation of a line in space that passes through two distinct points A and B. Let O denote the origin of a coordinate system in space, and let u and v denote the vectors that begin at O and end at A and B, respectively. If w denotes the vector beginning at A and ending at B, then “tail-to-head” addition shows that u + w = v, and hence w = v − u, where −u denotes the vector (−1)u. (See Figure 1.3, in which the quadrilateral OABC is a parallelogram.) Since a scalar multiple of w is parallel to w but possibly of a different length than w, any point on the line joining A and B may be obtained as the endpoint of a vector that begins at A and has the form tw for some real number t. Conversely, the endpoint of every vector of the form tw that begins at A lies on the line joining A and B. Thus an equation of the line through A and B is x = u + tw = u + t(v − u), where t is a real number and x denotes an arbitrary point on the line. Notice also that the endpoint C of the vector v − u in Figure 1.3 has coordinates equal to the difference of the coordinates of B and A.

Figure 1.3: [diagram: the parallelogram OABC, with u from O to A, v from O to B, w from A to B, and v − u from O to C]

Example 1

Let A and B be points having coordinates (−2, 0, 1) and (4, 5, 3), respectively. The endpoint C of the vector emanating from the origin and having the same direction as the vector beginning at A and terminating at B has coordinates (4, 5, 3) − (−2, 0, 1) = (6, 5, 2). Hence the equation of the line through A and B is

x = (−2, 0, 1) + t(6, 5, 2). ♦
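As an illustration only (not from the book), the parametric equation x = u + t(v − u) can be evaluated numerically; line_point below is a hypothetical helper.

    def line_point(A, B, t):
        # A + t*(B - A), computed coordinatewise.
        return tuple(a + t * (b - a) for a, b in zip(A, B))

    A, B = (-2.0, 0.0, 1.0), (4.0, 5.0, 3.0)
    print(line_point(A, B, 0.0))  # (-2.0, 0.0, 1.0), the point A
    print(line_point(A, B, 1.0))  # (4.0, 5.0, 3.0), the point B
    print(line_point(A, B, 0.5))  # (1.0, 2.5, 2.0), the midpoint of segment AB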

Now let A, B, and C denote any three noncollinear points in space. These points determine a unique plane, and its equation can be found by use of our previous observations about vectors. Let u and v denote vectors beginning at A and ending at B and C, respectively. Observe that any point in the plane containing A, B, and C is the endpoint S of a vector x beginning at A and having the form su + tv for some real numbers s and t. The endpoint of su is the point of intersection of the line through A and B with the line through S parallel to the line through A and C. (See Figure 1.4.)

Figure 1.4: [diagram: the point S in the plane through A, B, and C; x = su + tv is drawn from A as the diagonal of the parallelogram with adjacent sides su and tv]

A similar procedure locates the endpoint of tv. Moreover, for any real numbers s and t, the vector su + tv lies in the plane containing A, B, and C. It follows that an equation of the plane containing A, B, and C is

x = A + su + tv,

where s and t are arbitrary real numbers and x denotes an arbitrary point in the plane.

Example 2

Let A, B, and C be the points having coordinates (1, 0, 2), (−3,−2, 4), and (1, 8,−5), respectively. The endpoint of the vector emanating from the origin and having the same length and direction as the vector beginning at A and terminating at B is

(−3,−2, 4) − (1, 0, 2) = (−4,−2, 2).

Similarly, the endpoint of a vector emanating from the origin and having the same length and direction as the vector beginning at A and terminating at C is (1, 8,−5) − (1, 0, 2) = (0, 8,−7). Hence the equation of the plane containing the three given points is

x = (1, 0, 2) + s(−4,−2, 2) + t(0, 8,−7). ♦
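A similar sketch (again illustrative only; plane_point is an ad hoc name) evaluates x = A + su + tv for the points of Example 2.

    def plane_point(A, B, C, s, t):
        # A + s*(B - A) + t*(C - A), a point in the plane through A, B, and C.
        return tuple(a + s * (b - a) + t * (c - a) for a, b, c in zip(A, B, C))

    A, B, C = (1.0, 0.0, 2.0), (-3.0, -2.0, 4.0), (1.0, 8.0, -5.0)
    print(plane_point(A, B, C, 0.0, 0.0))  # (1.0, 0.0, 2.0), the point A
    print(plane_point(A, B, C, 1.0, 0.0))  # (-3.0, -2.0, 4.0), the point B
    print(plane_point(A, B, C, 0.0, 1.0))  # (1.0, 8.0, -5.0), the point C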

Any mathematical structure possessing the eight properties on page 3 is called a vector space. In the next section we formally define a vector space and consider many examples of vector spaces other than the ones mentioned above.

EXERCISES

1. Determine whether the vectors emanating from the origin and terminating at the following pairs of points are parallel.


(a) (3, 1, 2) and (6, 4, 2)
(b) (−3, 1, 7) and (9,−3,−21)
(c) (5,−6, 7) and (−5, 6,−7)
(d) (2, 0,−5) and (5, 0,−2)

2. Find the equations of the lines through the following pairs of points in space.

(a) (3,−2, 4) and (−5, 7, 1)
(b) (2, 4, 0) and (−3,−6, 0)
(c) (3, 7, 2) and (3, 7,−8)
(d) (−2,−1, 5) and (3, 9, 7)

3. Find the equations of the planes containing the following points in space.

(a) (2,−5,−1), (0, 4, 6), and (−3, 7, 1)
(b) (3,−6, 7), (−2, 0,−4), and (5,−9,−2)
(c) (−8, 2, 0), (1, 3, 0), and (6,−5, 0)
(d) (1, 1, 1), (5, 5, 5), and (−6, 4, 2)

4. What are the coordinates of the vector 0 in the Euclidean plane that satisfies property 3 on page 3? Justify your answer.

5. Prove that if the vector x emanates from the origin of the Euclidean plane and terminates at the point with coordinates (a1, a2), then the vector tx that emanates from the origin terminates at the point with coordinates (ta1, ta2).

6. Show that the midpoint of the line segment joining the points (a, b) and (c, d) is ((a + c)/2, (b + d)/2).

7. Prove that the diagonals of a parallelogram bisect each other.

1.2 VECTOR SPACES

In Section 1.1, we saw that with the natural definitions of vector addition and scalar multiplication, the vectors in a plane satisfy the eight properties listed on page 3. Many other familiar algebraic systems also permit definitions of addition and scalar multiplication that satisfy the same eight properties. In this section, we introduce some of these systems, but first we formally define this type of algebraic structure.

Definitions. A vector space (or linear space) V over a field2 F consists of a set on which two operations (called addition and scalar multiplication, respectively) are defined so that for each pair of elements x, y, in V there is a unique element x + y in V, and for each element a in F and each element x in V there is a unique element ax in V, such that the following conditions hold.

2 Fields are discussed in Appendix C.

(VS 1) For all x, y in V, x + y = y + x (commutativity of addition).

(VS 2) For all x, y, z in V, (x + y) + z = x + (y + z) (associativity of addition).

(VS 3) There exists an element in V denoted by 0 such that x + 0 = x for each x in V.

(VS 4) For each element x in V there exists an element y in V such that

x + y = 0 .

(VS 5) For each element x in V, 1x = x.

(VS 6) For each pair of elements a, b in F and each element x in V,

(ab)x = a(bx).

(VS 7) For each element a in F and each pair of elements x, y in V,

a(x + y) = ax + ay.

(VS 8) For each pair of elements a, b in F and each element x in V,

(a + b)x = ax + bx.

The elements x + y and ax are called the sum of x and y and the product of a and x, respectively.

The elements of the field F are called scalars and the elements of the vector space V are called vectors. The reader should not confuse this use of the word “vector” with the physical entity discussed in Section 1.1: the word “vector” is now being used to describe any element of a vector space.

A vector space is frequently discussed in the text without explicitly mentioning its field of scalars. The reader is cautioned to remember, however, that every vector space is regarded as a vector space over a given field, which is denoted by F. Occasionally we restrict our attention to the fields of real and complex numbers, which are denoted R and C, respectively.

Observe that (VS 2) permits us to unambiguously define the addition of any finite number of vectors (without the use of parentheses).

In the remainder of this section we introduce several important examples of vector spaces that are studied throughout this text. Observe that in describing a vector space, it is necessary to specify not only the vectors but also the operations of addition and scalar multiplication.

An object of the form (a1, a2, . . . , an), where the entries a1, a2, . . . , an are elements of a field F, is called an n-tuple with entries from F. The elements a1, a2, . . . , an are called the entries or components of the n-tuple. Two n-tuples (a1, a2, . . . , an) and (b1, b2, . . . , bn) with entries from a field F are called equal if ai = bi for i = 1, 2, . . . , n.

Example 1

The set of all n-tuples with entries from a field F is denoted by Fn. This set is a vector space over F with the operations of coordinatewise addition and scalar multiplication; that is, if u = (a1, a2, . . . , an) ∈ Fn, v = (b1, b2, . . . , bn) ∈ Fn, and c ∈ F, then

u + v = (a1 + b1, a2 + b2, . . . , an + bn) and cu = (ca1, ca2, . . . , can).

Thus R3 is a vector space over R. In this vector space,

(3,−2, 0) + (−1, 1, 4) = (2,−1, 4) and − 5(1,−2, 0) = (−5, 10, 0).

Similarly, C2 is a vector space over C. In this vector space,

(1 + i, 2) + (2 − 3i, 4i) = (3 − 2i, 2 + 4i) and i(1 + i, 2) = (−1 + i, 2i).

Vectors in Fn may be written as column vectors

⎛ a1 ⎞
⎜ a2 ⎟
⎜ ⋮  ⎟
⎝ an ⎠

rather than as row vectors (a1, a2, . . . , an). Since a 1-tuple whose only entry is from F can be regarded as an element of F, we usually write F rather than F1 for the vector space of 1-tuples with entry from F. ♦
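The computations in this example are easy to reproduce. The sketch below is illustrative only (vadd and smul are ad hoc names); complex scalars work unchanged because the operations are coordinatewise.

    def vadd(u, v):
        # Coordinatewise sum of two n-tuples.
        return tuple(a + b for a, b in zip(u, v))

    def smul(c, u):
        # Scalar multiple of an n-tuple.
        return tuple(c * a for a in u)

    # In R^3:
    print(vadd((3, -2, 0), (-1, 1, 4)))  # (2, -1, 4)
    print(smul(-5, (1, -2, 0)))          # (-5, 10, 0)

    # In C^2, with 1j playing the role of i:
    print(vadd((1 + 1j, 2), (2 - 3j, 4j)))  # ((3-2j), (2+4j))
    print(smul(1j, (1 + 1j, 2)))            # ((-1+1j), 2j)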

An m × n matrix with entries from a field F is a rectangular array of the form

⎛ a11  a12  · · ·  a1n ⎞
⎜ a21  a22  · · ·  a2n ⎟
⎜  ⋮    ⋮           ⋮  ⎟
⎝ am1  am2  · · ·  amn ⎠ ,

where each entry aij (1 ≤ i ≤ m, 1 ≤ j ≤ n) is an element of F. We call the entries aij with i = j the diagonal entries of the matrix. The entries ai1, ai2, . . . , ain compose the ith row of the matrix, and the entries a1j, a2j, . . . , amj compose the jth column of the matrix. The rows of the preceding matrix are regarded as vectors in Fn, and the columns are regarded as vectors in Fm. The m × n matrix in which each entry equals zero is called the zero matrix and is denoted by O.


In this book, we denote matrices by capital italic letters (e.g., A, B, and C), and we denote the entry of a matrix A that lies in row i and column j by Aij. In addition, if the number of rows and columns of a matrix are equal, the matrix is called square.

Two m × n matrices A and B are called equal if all their corresponding entries are equal, that is, if Aij = Bij for 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Example 2

The set of all m × n matrices with entries from a field F is a vector space, which we denote by Mm×n(F), with the following operations of matrix addition and scalar multiplication: For A, B ∈ Mm×n(F) and c ∈ F,

(A + B)ij = Aij + Bij and (cA)ij = cAij

for 1 ≤ i ≤ m and 1 ≤ j ≤ n. For instance,

⎛ 2  0 −1 ⎞   ⎛ −5 −2  6 ⎞   ⎛ −3 −2  5 ⎞
⎝ 1 −3  4 ⎠ + ⎝  3  4 −1 ⎠ = ⎝  4  1  3 ⎠

and

   ⎛  1  0 −2 ⎞   ⎛ −3  0  6 ⎞
−3 ⎝ −3  2  3 ⎠ = ⎝  9 −6 −9 ⎠

in M2×3(R). ♦

Example 3

Let S be any nonempty set and F be any field, and let F(S, F) denote the set of all functions from S to F. Two functions f and g in F(S, F) are called equal if f(s) = g(s) for each s ∈ S. The set F(S, F) is a vector space with the operations of addition and scalar multiplication defined for f, g ∈ F(S, F) and c ∈ F by

(f + g)(s) = f(s) + g(s) and (cf)(s) = c[f(s)]

for each s ∈ S. Note that these are the familiar operations of addition and scalar multiplication for functions used in algebra and calculus. ♦
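For a finite set S these operations can be carried out directly. The following sketch is illustrative only (not from the text): it stores an element of F(S, F) as a Python dictionary mapping each s in S to f(s).

    S = (0, 1, 2)
    f = {s: s + 1.0 for s in S}   # the function f(s) = s + 1
    g = {s: 2.0 * s for s in S}   # the function g(s) = 2s

    def fadd(f, g):
        # (f + g)(s) = f(s) + g(s) for each s in S.
        return {s: f[s] + g[s] for s in S}

    def fscale(c, f):
        # (cf)(s) = c * f(s) for each s in S.
        return {s: c * f[s] for s in S}

    print(fadd(f, g))      # {0: 1.0, 1: 4.0, 2: 7.0}
    print(fscale(3.0, f))  # {0: 3.0, 1: 6.0, 2: 9.0}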

A polynomial with coefficients from a field F is an expression of the form

f(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0,

where n is a nonnegative integer and each a_k, called the coefficient of x^k, is in F. If f(x) = 0, that is, if a_n = a_{n−1} = · · · = a_0 = 0, then f(x) is called the zero polynomial and, for convenience, its degree is defined to be −1; otherwise, the degree of a polynomial is defined to be the largest exponent of x that appears in the representation

f(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0

with a nonzero coefficient. Note that the polynomials of degree zero may be written in the form f(x) = c for some nonzero scalar c. Two polynomials,

f(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0

and

g(x) = b_m x^m + b_{m−1} x^{m−1} + · · · + b_1 x + b_0,

are called equal if m = n and a_i = b_i for i = 0, 1, . . . , n.

When F is a field containing infinitely many scalars, we usually regard a polynomial with coefficients from F as a function from F into F. (See page 569.) In this case, the value of the function

f(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0

at c ∈ F is the scalar

f(c) = a_n c^n + a_{n−1} c^{n−1} + · · · + a_1 c + a_0.

Here either of the notations f or f(x) is used for the polynomial function

f(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0.

Example 4

Let

f(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0

and

g(x) = b_m x^m + b_{m−1} x^{m−1} + · · · + b_1 x + b_0

be polynomials with coefficients from a field F. Suppose that m ≤ n, and define b_{m+1} = b_{m+2} = · · · = b_n = 0. Then g(x) can be written as

g(x) = b_n x^n + b_{n−1} x^{n−1} + · · · + b_1 x + b_0.

Define

f(x) + g(x) = (a_n + b_n) x^n + (a_{n−1} + b_{n−1}) x^{n−1} + · · · + (a_1 + b_1) x + (a_0 + b_0)

and for any c ∈ F, define

c f(x) = c a_n x^n + c a_{n−1} x^{n−1} + · · · + c a_1 x + c a_0.

With these operations of addition and scalar multiplication, the set of all polynomials with coefficients from F is a vector space, which we denote by P(F). ♦
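The operations of Example 4 can be phrased in terms of coefficient lists. The sketch below is illustrative only (not part of the text); it stores a polynomial a_0 + a_1 x + · · · + a_n x^n as the list [a_0, a_1, . . . , a_n].

    from itertools import zip_longest

    def poly_add(f, g):
        # Add coefficient lists, padding the shorter one with zeros.
        return [a + b for a, b in zip_longest(f, g, fillvalue=0)]

    def poly_scale(c, f):
        # Multiply every coefficient by the scalar c.
        return [c * a for a in f]

    f = [3, 4, 0, -7, 2]   # 3 + 4x - 7x^3 + 2x^4
    g = [7, -6, 2, 8]      # 7 - 6x + 2x^2 + 8x^3
    print(poly_add(f, g))     # [10, -2, 2, 1, 2]
    print(poly_scale(-3, f))  # [-9, -12, 0, 21, -6]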


We will see in Exercise 23 of Section 2.4 that the vector space defined in the next example is essentially the same as P(F).

Example 5

Let F be any field. A sequence in F is a function σ from the positive integers into F. In this book, the sequence σ such that σ(n) = an for n = 1, 2, . . . is denoted {an}. Let V consist of all sequences {an} in F that have only a finite number of nonzero terms an. If {an} and {bn} are in V and t ∈ F, define

{an} + {bn} = {an + bn} and t{an} = {tan}.

With these operations V is a vector space. ♦

Our next two examples contain sets on which addition and scalar multiplication are defined, but which are not vector spaces.

Example 6

Let S = {(a1, a2) : a1, a2 ∈ R}. For (a1, a2), (b1, b2) ∈ S and c ∈ R, define

(a1, a2) + (b1, b2) = (a1 + b1, a2 − b2) and c(a1, a2) = (ca1, ca2).

Since (VS 1), (VS 2), and (VS 8) fail to hold, S is not a vector space with these operations. ♦

Example 7

Let S be as in Example 6. For (a1, a2), (b1, b2) ∈ S and c ∈ R, define

(a1, a2) + (b1, b2) = (a1 + b1, 0) and c(a1, a2) = (ca1, 0).

Then S is not a vector space with these operations because (VS 3) (hence (VS 4)) and (VS 5) fail. ♦
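Failures like the one in Example 6 are easy to exhibit numerically. The following sketch (not from the text) checks commutativity of the addition of Example 6 at one pair of points.

    def odd_add(u, v):
        # The addition of Example 6: (a1, a2) + (b1, b2) = (a1 + b1, a2 - b2).
        return (u[0] + v[0], u[1] - v[1])

    x, y = (1, 2), (3, 4)
    print(odd_add(x, y))  # (4, -2)
    print(odd_add(y, x))  # (4, 2): the two sums differ, so (VS 1) fails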

We conclude this section with a few of the elementary consequences of the definition of a vector space.

Theorem 1.1 (Cancellation Law for Vector Addition). If x, y, and z are vectors in a vector space V such that x + z = y + z, then x = y.

Proof. There exists a vector v in V such that z + v = 0 (VS 4). Thus

x = x + 0 = x + (z + v) = (x + z) + v

= (y + z) + v = y + (z + v) = y + 0 = y

by (VS 2) and (VS 3).

Corollary 1. The vector 0 described in (VS 3) is unique.


Proof. Exercise.

Corollary 2. The vector y described in (VS 4) is unique.

Proof. Exercise.

The vector 0 in (VS 3) is called the zero vector of V, and the vector y in (VS 4) (that is, the unique vector such that x + y = 0) is called the additive inverse of x and is denoted by −x.

The next result contains some of the elementary properties of scalar multiplication.

Theorem 1.2. In any vector space V, the following statements are true:
(a) 0x = 0 for each x ∈ V.
(b) (−a)x = −(ax) = a(−x) for each a ∈ F and each x ∈ V.
(c) a0 = 0 for each a ∈ F.

Proof. (a) By (VS 8), (VS 3), and (VS 1), it follows that

0x + 0x = (0 + 0)x = 0x = 0x + 0 = 0 + 0x.

Hence 0x = 0 by Theorem 1.1.

(b) The vector −(ax) is the unique element of V such that ax + [−(ax)] = 0. Thus if ax + (−a)x = 0, Corollary 2 to Theorem 1.1 implies that (−a)x = −(ax). But by (VS 8),

ax + (−a)x = [a + (−a)]x = 0x = 0

by (a). Consequently (−a)x = −(ax). In particular, (−1)x = −x. So, by (VS 6),

a(−x) = a[(−1)x] = [a(−1)]x = (−a)x.

The proof of (c) is similar to the proof of (a).

EXERCISES

1. Label the following statements as true or false.
(a) Every vector space contains a zero vector.
(b) A vector space may have more than one zero vector.
(c) In any vector space, ax = bx implies that a = b.
(d) In any vector space, ax = ay implies that x = y.
(e) A vector in Fn may be regarded as a matrix in Mn×1(F).
(f) An m × n matrix has m columns and n rows.
(g) In P(F), only polynomials of the same degree may be added.
(h) If f and g are polynomials of degree n, then f + g is a polynomial of degree n.
(i) If f is a polynomial of degree n and c is a nonzero scalar, then cf is a polynomial of degree n.


(j) A nonzero scalar of F may be considered to be a polynomial in P(F) having degree zero.
(k) Two functions in F(S, F) are equal if and only if they have the same value at each element of S.

2. Write the zero vector of M3×4(F ).

3. If

    M = ⎛ 1  2  3 ⎞
        ⎝ 4  5  6 ⎠ ,

what are M13, M21, and M22?

4. Perform the indicated operations.

(a) ⎛ 2  5 −3 ⎞   ⎛  4 −2  5 ⎞
    ⎝ 1  0  7 ⎠ + ⎝ −5  3  2 ⎠

(b) ⎛ −6  4 ⎞   ⎛ 7 −5 ⎞
    ⎜  3 −2 ⎟ + ⎜ 0 −3 ⎟
    ⎝  1  8 ⎠   ⎝ 2  0 ⎠

(c) 4 ⎛ 2  5 −3 ⎞
      ⎝ 1  0  7 ⎠

(d) −5 ⎛ −6  4 ⎞
       ⎜  3 −2 ⎟
       ⎝  1  8 ⎠

(e) (2x^4 − 7x^3 + 4x + 3) + (8x^3 + 2x^2 − 6x + 7)
(f) (−3x^3 + 7x^2 + 8x − 6) + (2x^3 − 8x + 10)
(g) 5(2x^7 − 6x^4 + 8x^2 − 3x)
(h) 3(x^5 − 2x^3 + 4x + 2)

Exercises 5 and 6 show why the definitions of matrix addition and scalar multiplication (as defined in Example 2) are the appropriate ones.

5. Richard Gard (“Effects of Beaver on Trout in Sagehen Creek, California,” J. Wildlife Management, 25, 221-242) reports the following number of trout having crossed beaver dams in Sagehen Creek.

Upstream Crossings

Fall Spring Summer

Brook trout 8 3 1

Rainbow trout 3 0 0

Brown trout 3 0 0


Downstream Crossings

Fall Spring Summer

Brook trout 9 1 4

Rainbow trout 3 0 0

Brown trout 1 1 0

Record the upstream and downstream crossings in two 3 × 3 matrices, and verify that the sum of these matrices gives the total number of crossings (both upstream and downstream) categorized by trout species and season.

6. At the end of May, a furniture store had the following inventory.

                     Early American   Spanish   Mediterranean   Danish
Living room suites         4             2            1            3
Bedroom suites             5             1            1            4
Dining room suites         3             1            2            6

Record these data as a 3 × 4 matrix M. To prepare for its June sale, the store decided to double its inventory on each of the items listed in the preceding table. Assuming that none of the present stock is sold until the additional furniture arrives, verify that the inventory on hand after the order is filled is described by the matrix 2M. If the inventory at the end of June is described by the matrix

    A = ⎛ 5  3  1  2 ⎞
        ⎜ 6  2  1  5 ⎟
        ⎝ 1  0  3  3 ⎠ ,

interpret 2M − A. How many suites were sold during the June sale?

7. Let S = {0, 1} and F = R. In F(S, R), show that f = g and f + g = h, where f(t) = 2t + 1, g(t) = 1 + 4t − 2t^2, and h(t) = 5t + 1.

8. In any vector space V, show that (a + b)(x + y) = ax + ay + bx + by for any x, y ∈ V and any a, b ∈ F.

9. Prove Corollaries 1 and 2 of Theorem 1.1 and Theorem 1.2(c).

10. Let V denote the set of all differentiable real-valued functions defined on the real line. Prove that V is a vector space with the operations of addition and scalar multiplication defined in Example 3.


11. Let V = {0} consist of a single vector 0 and define 0 + 0 = 0 and c0 = 0 for each scalar c in F. Prove that V is a vector space over F. (V is called the zero vector space.)

12. A real-valued function f defined on the real line is called an even function if f(−t) = f(t) for each real number t. Prove that the set of even functions defined on the real line with the operations of addition and scalar multiplication defined in Example 3 is a vector space.

13. Let V denote the set of ordered pairs of real numbers. If (a1, a2) and (b1, b2) are elements of V and c ∈ R, define

(a1, a2) + (b1, b2) = (a1 + b1, a2b2) and c(a1, a2) = (ca1, a2).

Is V a vector space over R with these operations? Justify your answer.

14. Let V = {(a1, a2, . . . , an) : ai ∈ C for i = 1, 2, . . . , n}; so V is a vector space over C by Example 1. Is V a vector space over the field of real numbers with the operations of coordinatewise addition and multiplication?

15. Let V = {(a1, a2, . . . , an) : ai ∈ R for i = 1, 2, . . . , n}; so V is a vector space over R by Example 1. Is V a vector space over the field of complex numbers with the operations of coordinatewise addition and multiplication?

16. Let V denote the set of all m × n matrices with real entries; so V is a vector space over R by Example 2. Let F be the field of rational numbers. Is V a vector space over F with the usual definitions of matrix addition and scalar multiplication?

17. Let V = {(a1, a2) : a1, a2 ∈ F}, where F is a field. Define addition of elements of V coordinatewise, and for c ∈ F and (a1, a2) ∈ V, define

c(a1, a2) = (a1, 0).

Is V a vector space over F with these operations? Justify your answer.

18. Let V = {(a1, a2) : a1, a2 ∈ R}. For (a1, a2), (b1, b2) ∈ V and c ∈ R,define

(a1, a2) + (b1, b2) = (a1 + 2b1, a2 + 3b2) and c(a1, a2) = (ca1, ca2).

Is V a vector space over R with these operations? Justify your answer.


19. Let V = {(a1, a2) : a1, a2 ∈ R}. Define addition of elements of V coordinatewise, and for (a1, a2) in V and c ∈ R, define

c(a1, a2) = ⎧ (0, 0)         if c = 0
            ⎨
            ⎩ (ca1, a2/c)    if c ≠ 0.

Is V a vector space over R with these operations? Justify your answer.

20. Let V be the set of sequences {an} of real numbers. (See Example 5 for the definition of a sequence.) For {an}, {bn} ∈ V and any real number t, define

{an} + {bn} = {an + bn} and t{an} = {tan}.

Prove that, with these operations, V is a vector space over R.

21. Let V and W be vector spaces over a field F . Let

Z = {(v, w) : v ∈ V and w ∈ W}.

Prove that Z is a vector space over F with the operations

(v1, w1) + (v2, w2) = (v1 + v2, w1 + w2) and c(v1, w1) = (cv1, cw1).

22. How many matrices are there in the vector space Mm×n(Z2)? (See Appendix C.)

1.3 SUBSPACES

In the study of any algebraic structure, it is of interest to examine subsets that possess the same structure as the set under consideration. The appropriate notion of substructure for vector spaces is introduced in this section.

Definition. A subset W of a vector space V over a field F is called a subspace of V if W is a vector space over F with the operations of addition and scalar multiplication defined on V.

In any vector space V, note that V and {0} are subspaces. The latter is called the zero subspace of V.

Fortunately it is not necessary to verify all of the vector space properties to prove that a subset is a subspace. Because properties (VS 1), (VS 2), (VS 5), (VS 6), (VS 7), and (VS 8) hold for all vectors in the vector space, these properties automatically hold for the vectors in any subset. Thus a subset W of a vector space V is a subspace of V if and only if the following four properties hold.


1. x + y ∈ W whenever x ∈ W and y ∈ W. (W is closed under addition.)
2. cx ∈ W whenever c ∈ F and x ∈ W. (W is closed under scalar multiplication.)
3. W has a zero vector.
4. Each vector in W has an additive inverse in W.

The next theorem shows that the zero vector of W must be the same as the zero vector of V and that property 4 is redundant.

Theorem 1.3. Let V be a vector space and W a subset of V. Then W is a subspace of V if and only if the following three conditions hold for the operations defined in V.

(a) 0 ∈ W.
(b) x + y ∈ W whenever x ∈ W and y ∈ W.
(c) cx ∈ W whenever c ∈ F and x ∈ W.

Proof. If W is a subspace of V, then W is a vector space with the operations of addition and scalar multiplication defined on V. Hence conditions (b) and (c) hold, and there exists a vector 0′ ∈ W such that x + 0′ = x for each x ∈ W. But also x + 0 = x, and thus 0′ = 0 by Theorem 1.1 (p. 11). So condition (a) holds.

Conversely, if conditions (a), (b), and (c) hold, the discussion preceding this theorem shows that W is a subspace of V if the additive inverse of each vector in W lies in W. But if x ∈ W, then (−1)x ∈ W by condition (c), and −x = (−1)x by Theorem 1.2 (p. 12). Hence W is a subspace of V.

The preceding theorem provides a simple method for determining whether or not a given subset of a vector space is a subspace. Normally, it is this result that is used to prove that a subset is, in fact, a subspace.

The transpose A^t of an m × n matrix A is the n × m matrix obtained from A by interchanging the rows with the columns; that is, (A^t)ij = Aji. For example,

⎛ 1 −2  3 ⎞t     ⎛  1  0 ⎞
⎝ 0  5 −1 ⎠   =  ⎜ −2  5 ⎟
                 ⎝  3 −1 ⎠

and

⎛ 1  2 ⎞t    ⎛ 1  2 ⎞
⎝ 2  3 ⎠  =  ⎝ 2  3 ⎠ .

A symmetric matrix is a matrix A such that A^t = A. For example, the 2 × 2 matrix displayed above is a symmetric matrix. Clearly, a symmetric matrix must be square. The set W of all symmetric matrices in Mn×n(F) is a subspace of Mn×n(F) since the conditions of Theorem 1.3 hold:

1. The zero matrix is equal to its transpose and hence belongs to W.

It is easily proved that for any matrices A and B and any scalars a and b, (aA + bB)^t = aA^t + bB^t. (See Exercise 3.) Using this fact, we show that the set of symmetric matrices is closed under addition and scalar multiplication.


2. If A ∈ W and B ∈ W, then A^t = A and B^t = B. Thus (A + B)^t = A^t + B^t = A + B, so that A + B ∈ W.

3. If A ∈ W, then A^t = A. So for any a ∈ F, we have (aA)^t = aA^t = aA. Thus aA ∈ W.
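The closure argument above can be checked numerically for particular matrices. The sketch below is illustrative only; it uses NumPy, where the attribute .T of an array is its transpose.

    import numpy as np

    A = np.array([[1, 2], [2, 3]])    # symmetric: A^t = A
    B = np.array([[0, -4], [-4, 5]])  # symmetric: B^t = B

    S = A + B     # sum of two symmetric matrices
    C = -3 * A    # scalar multiple of a symmetric matrix

    print(np.array_equal(S.T, S))  # True: closed under addition
    print(np.array_equal(C.T, C))  # True: closed under scalar multiplication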

The examples that follow provide further illustrations of the concept of a subspace. The first three are particularly important.

Example 1

Let n be a nonnegative integer, and let Pn(F) consist of all polynomials in P(F) having degree less than or equal to n. Since the zero polynomial has degree −1, it is in Pn(F). Moreover, the sum of two polynomials with degrees less than or equal to n is another polynomial of degree less than or equal to n, and the product of a scalar and a polynomial of degree less than or equal to n is a polynomial of degree less than or equal to n. So Pn(F) is closed under addition and scalar multiplication. It therefore follows from Theorem 1.3 that Pn(F) is a subspace of P(F). ♦

Example 2

Let C(R) denote the set of all continuous real-valued functions defined on R. Clearly C(R) is a subset of the vector space F(R, R) defined in Example 3 of Section 1.2. We claim that C(R) is a subspace of F(R, R). First note that the zero of F(R, R) is the constant function defined by f(t) = 0 for all t ∈ R. Since constant functions are continuous, we have f ∈ C(R). Moreover, the sum of two continuous functions is continuous, and the product of a real number and a continuous function is continuous. So C(R) is closed under addition and scalar multiplication and hence is a subspace of F(R, R) by Theorem 1.3. ♦

Example 3

An n × n matrix M is called a diagonal matrix if Mij = 0 whenever i ≠ j, that is, if all its nondiagonal entries are zero. Clearly the zero matrix is a diagonal matrix because all of its entries are 0. Moreover, if A and B are diagonal n × n matrices, then whenever i ≠ j,

(A + B)ij = Aij + Bij = 0 + 0 = 0 and (cA)ij = cAij = c 0 = 0

for any scalar c. Hence A + B and cA are diagonal matrices for any scalar c. Therefore the set of diagonal matrices is a subspace of Mn×n(F) by Theorem 1.3. ♦

Example 4

The trace of an n × n matrix M, denoted tr(M), is the sum of the diagonal entries of M; that is,

tr(M) = M11 + M22 + · · · + Mnn.


It follows from Exercise 6 that the set of n × n matrices having trace equal to zero is a subspace of Mn×n(F). ♦

Example 5

The set of matrices in Mm×n(R) having nonnegative entries is not a subspace of Mm×n(R) because it is not closed under scalar multiplication (by negative scalars). ♦

The next theorem shows how to form a new subspace from other subspaces.

Theorem 1.4. Any intersection of subspaces of a vector space V is a subspace of V.

Proof. Let C be a collection of subspaces of V, and let W denote the intersection of the subspaces in C. Since every subspace contains the zero vector, 0 ∈ W. Let a ∈ F and x, y ∈ W. Then x and y are contained in each subspace in C. Because each subspace in C is closed under addition and scalar multiplication, it follows that x + y and ax are contained in each subspace in C. Hence x + y and ax are also contained in W, so that W is a subspace of V by Theorem 1.3.

Having shown that the intersection of subspaces of a vector space V is a subspace of V, it is natural to consider whether or not the union of subspaces of V is a subspace of V. It is easily seen that the union of subspaces must contain the zero vector and be closed under scalar multiplication, but in general the union of subspaces of V need not be closed under addition. In fact, it can be readily shown that the union of two subspaces of V is a subspace of V if and only if one of the subspaces contains the other. (See Exercise 19.) There is, however, a natural way to combine two subspaces W1 and W2 to obtain a subspace that contains both W1 and W2. As we already have suggested, the key to finding such a subspace is to assure that it must be closed under addition. This idea is explored in Exercise 23.

EXERCISES

1. Label the following statements as true or false.

(a) If V is a vector space and W is a subset of V that is a vector space, then W is a subspace of V.

(b) The empty set is a subspace of every vector space.
(c) If V is a vector space other than the zero vector space, then V contains a subspace W such that W ≠ V.
(d) The intersection of any two subsets of V is a subspace of V.


(e) An n × n diagonal matrix can never have more than n nonzero entries.

(f) The trace of a square matrix is the product of its diagonal entries.
(g) Let W be the xy-plane in R3; that is, W = {(a1, a2, 0) : a1, a2 ∈ R}. Then W = R2.

2. Determine the transpose of each of the matrices that follow. In addition, if the matrix is square, compute its trace.

(a) ⎛ −4  2 ⎞
    ⎝  5 −1 ⎠

(b) ⎛ 0  8 −6 ⎞
    ⎝ 3  4  7 ⎠

(c) ⎛ −3  9 ⎞
    ⎜  0 −2 ⎟
    ⎝  6  1 ⎠

(d) ⎛ 10  0 −8 ⎞
    ⎜  2 −4  3 ⎟
    ⎝ −5  7  6 ⎠

(e) ( 1 −1  3  5 )

(f) ⎛ −2  5  1  4 ⎞
    ⎝  7  0  1 −6 ⎠

(g) ⎛ 5 ⎞
    ⎜ 6 ⎟
    ⎝ 7 ⎠

(h) ⎛ −4  0  6 ⎞
    ⎜  0  1 −3 ⎟
    ⎝  6 −3  5 ⎠

3. Prove that (aA + bB)^t = aA^t + bB^t for any A, B ∈ Mm×n(F) and any a, b ∈ F.

4. Prove that (A^t)^t = A for each A ∈ Mm×n(F).

5. Prove that A + A^t is symmetric for any square matrix A.

6. Prove that tr(aA + bB) = a tr(A) + b tr(B) for any A, B ∈ Mn×n(F ).

7. Prove that diagonal matrices are symmetric matrices.

8. Determine whether the following sets are subspaces of R3 under the operations of addition and scalar multiplication defined on R3. Justify your answers.

(a) W1 = {(a1, a2, a3) ∈ R3 : a1 = 3a2 and a3 = −a2}
(b) W2 = {(a1, a2, a3) ∈ R3 : a1 = a3 + 2}
(c) W3 = {(a1, a2, a3) ∈ R3 : 2a1 − 7a2 + a3 = 0}
(d) W4 = {(a1, a2, a3) ∈ R3 : a1 − 4a2 − a3 = 0}
(e) W5 = {(a1, a2, a3) ∈ R3 : a1 + 2a2 − 3a3 = 1}
(f) W6 = {(a1, a2, a3) ∈ R3 : 5a1^2 − 3a2^2 + 6a3^2 = 0}

and W3 ∩ W4, and observe that each is a subspace of R3.


10. Prove that W1 = {(a1, a2, . . . , an) ∈ Fn : a1 + a2 + · · · + an = 0} is a subspace of Fn, but W2 = {(a1, a2, . . . , an) ∈ Fn : a1 + a2 + · · · + an = 1} is not.

11. Is the set W = {f(x) ∈ P(F) : f(x) = 0 or f(x) has degree n} a subspace of P(F) if n ≥ 1? Justify your answer.

12. An m × n matrix A is called upper triangular if all entries lying below the diagonal entries are zero, that is, if Aij = 0 whenever i > j. Prove that the upper triangular matrices form a subspace of Mm×n(F).

13. Let S be a nonempty set and F a field. Prove that for any s0 ∈ S, {f ∈ F(S, F) : f(s0) = 0} is a subspace of F(S, F).

14. Let S be a nonempty set and F a field. Let C(S, F) denote the set of all functions f ∈ F(S, F) such that f(s) = 0 for all but a finite number of elements of S. Prove that C(S, F) is a subspace of F(S, F).

15. Is the set of all differentiable real-valued functions defined on R a subspace of C(R)? Justify your answer.

16. Let Cn(R) denote the set of all real-valued functions defined on the real line that have a continuous nth derivative. Prove that Cn(R) is a subspace of F(R, R).

17. Prove that a subset W of a vector space V is a subspace of V if and only if W ≠ ∅, and, whenever a ∈ F and x, y ∈ W, then ax ∈ W and x + y ∈ W.

18. Prove that a subset W of a vector space V is a subspace of V if and only if 0 ∈ W and ax + y ∈ W whenever a ∈ F and x, y ∈ W.

19. Let W1 and W2 be subspaces of a vector space V. Prove that W1 ∪ W2 is a subspace of V if and only if W1 ⊆ W2 or W2 ⊆ W1.

20.† Prove that if W is a subspace of a vector space V and w1, w2, . . . , wn are in W, then a1w1 + a2w2 + · · · + anwn ∈ W for any scalars a1, a2, . . . , an.

21. Show that the set of convergent sequences {an} (i.e., those for which lim_{n→∞} an exists) is a subspace of the vector space V in Exercise 20 of Section 1.2.

22. Let F1 and F2 be fields. A function g ∈ F(F1, F2) is called an even function if g(−t) = g(t) for each t ∈ F1 and is called an odd function if g(−t) = −g(t) for each t ∈ F1. Prove that the set of all even functions in F(F1, F2) and the set of all odd functions in F(F1, F2) are subspaces of F(F1, F2).

†A dagger means that this exercise is essential for a later section.


The following definitions are used in Exercises 23–30.

Definition. If S1 and S2 are nonempty subsets of a vector space V, then the sum of S1 and S2, denoted S1 + S2, is the set {x + y : x ∈ S1 and y ∈ S2}.

Definition. A vector space V is called the direct sum of W1 and W2 if W1 and W2 are subspaces of V such that W1 ∩ W2 = {0} and W1 + W2 = V. We denote that V is the direct sum of W1 and W2 by writing V = W1 ⊕ W2.

23. Let W1 and W2 be subspaces of a vector space V.

(a) Prove that W1 + W2 is a subspace of V that contains both W1 and W2.

(b) Prove that any subspace of V that contains both W1 and W2 must also contain W1 + W2.

24. Show that Fn is the direct sum of the subspaces

W1 = {(a1, a2, . . . , an) ∈ Fn : an = 0}

and

W2 = {(a1, a2, . . . , an) ∈ Fn : a1 = a2 = · · · = an−1 = 0}.

25. Let W1 denote the set of all polynomials f(x) in P(F) such that in the representation

f(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0,

we have ai = 0 whenever i is even. Likewise let W2 denote the set of all polynomials g(x) in P(F) such that in the representation

g(x) = b_m x^m + b_{m−1} x^{m−1} + · · · + b_1 x + b_0,

we have bi = 0 whenever i is odd. Prove that P(F ) = W1 ⊕ W2.

26. In Mm×n(F) define W1 = {A ∈ Mm×n(F) : Aij = 0 whenever i > j} and W2 = {A ∈ Mm×n(F) : Aij = 0 whenever i ≤ j}. (W1 is the set of all upper triangular matrices defined in Exercise 12.) Show that Mm×n(F) = W1 ⊕ W2.

27. Let V denote the vector space consisting of all upper triangular n × n matrices (as defined in Exercise 12), and let W1 denote the subspace of V consisting of all diagonal matrices. Show that V = W1 ⊕ W2, where W2 = {A ∈ V : Aij = 0 whenever i ≥ j}.


28. A matrix M is called skew-symmetric if M^t = −M. Clearly, a skew-symmetric matrix is square. Let F be a field. Prove that the set W1 of all skew-symmetric n × n matrices with entries from F is a subspace of Mn×n(F). Now assume that F is not of characteristic 2 (see Appendix C), and let W2 be the subspace of Mn×n(F) consisting of all symmetric n × n matrices. Prove that Mn×n(F) = W1 ⊕ W2.

29. Let F be a field that is not of characteristic 2. Define

W1 = {A ∈ Mn×n(F ) : Aij = 0 whenever i ≤ j}and W2 to be the set of all symmetric n × n matrices with entriesfrom F . Both W1 and W2 are subspaces of Mn×n(F ). Prove thatMn×n(F ) = W1 ⊕ W2. Compare this exercise with Exercise 28.

30. Let W1 and W2 be subspaces of a vector space V. Prove that V is thedirect sum of W1 and W2 if and only if each vector in V can be uniquelywritten as x1 + x2, where x1 ∈ W1 and x2 ∈ W2.

31. Let W be a subspace of a vector space V over a field F . For any v ∈ Vthe set {v}+W = {v+w : w ∈ W} is called the coset of W containingv. It is customary to denote this coset by v + W rather than {v} + W.

(a) Prove that v + W is a subspace of V if and only if v ∈ W.(b) Prove that v1 + W = v2 + W if and only if v1 − v2 ∈ W.

Addition and scalar multiplication by scalars of F can be defined in thecollection S = {v + W : v ∈ V} of all cosets of W as follows:

(v1 + W) + (v2 + W) = (v1 + v2) + W

for all v1, v2 ∈ V and

a(v + W) = av + W

for all v ∈ V and a ∈ F .

(c) Prove that the preceding operations are well defined; that is, show that if v1 + W = v′1 + W and v2 + W = v′2 + W, then

(v1 + W) + (v2 + W) = (v′1 + W) + (v′2 + W)

and

a(v1 + W) = a(v′1 + W)

for all a ∈ F.

(d) Prove that the set S is a vector space with the operations defined in (c). This vector space is called the quotient space of V modulo W and is denoted by V/W.


1.4 LINEAR COMBINATIONS AND SYSTEMS OF LINEAR EQUATIONS

In Section 1.1, it was shown that the equation of the plane through three noncollinear points A, B, and C in space is x = A + su + tv, where u and v denote the vectors beginning at A and ending at B and C, respectively, and s and t denote arbitrary real numbers. An important special case occurs when A is the origin. In this case, the equation of the plane simplifies to x = su + tv, and the set of all points in this plane is a subspace of R3. (This is proved as Theorem 1.5.) Expressions of the form su + tv, where s and t are scalars and u and v are vectors, play a central role in the theory of vector spaces. The appropriate generalization of such expressions is presented in the following definitions.

Definitions. Let V be a vector space and S a nonempty subset of V. A vector v ∈ V is called a linear combination of vectors of S if there exist a finite number of vectors u1, u2, . . . , un in S and scalars a1, a2, . . . , an in F such that v = a1u1 + a2u2 + · · · + anun. In this case we also say that v is a linear combination of u1, u2, . . . , un and call a1, a2, . . . , an the coefficients of the linear combination.

Observe that in any vector space V, 0v = 0 for each v ∈ V. Thus the zero vector is a linear combination of any nonempty subset of V.

Example 1

TABLE 1.1 Vitamin Content of 100 Grams of Certain Foods

                                              A (units)  B1 (mg)  B2 (mg)  Niacin (mg)  C (mg)

Apple butter                                  0          0.01     0.02     0.2          2
Raw, unpared apples (freshly harvested)       90         0.03     0.02     0.1          4
Chocolate-coated candy with coconut center    0          0.02     0.07     0.2          0
Clams (meat only)                             100        0.10     0.18     1.3          10
Cupcake from mix (dry form)                   0          0.05     0.06     0.3          0
Cooked farina (unenriched)                    (0)a       0.01     0.01     0.1          (0)
Jams and preserves                            10         0.01     0.03     0.2          2
Coconut custard pie (baked from mix)          0          0.02     0.02     0.4          0
Raw brown rice                                (0)        0.34     0.05     4.7          (0)
Soy sauce                                     0          0.02     0.25     0.4          0
Cooked spaghetti (unenriched)                 0          0.01     0.01     0.3          0
Raw wild rice                                 (0)        0.45     0.63     6.2          (0)

Source: Bernice K. Watt and Annabel L. Merrill, Composition of Foods (Agriculture Handbook Number 8), Consumer and Food Economics Research Division, U.S. Department of Agriculture, Washington, D.C., 1963.

a Zeros in parentheses indicate that the amount of a vitamin present is either none or too small to measure.


Table 1.1 shows the vitamin content of 100 grams of 12 foods with respect to vitamins A, B1 (thiamine), B2 (riboflavin), niacin, and C (ascorbic acid).

The vitamin content of 100 grams of each food can be recorded as a column vector in R5; for example, the vitamin vector for apple butter is

\begin{pmatrix} 0.00 \\ 0.01 \\ 0.02 \\ 0.20 \\ 2.00 \end{pmatrix}.

Considering the vitamin vectors for cupcake, coconut custard pie, raw brown rice, soy sauce, and wild rice, we see that

\begin{pmatrix} 0.00 \\ 0.05 \\ 0.06 \\ 0.30 \\ 0.00 \end{pmatrix} + \begin{pmatrix} 0.00 \\ 0.02 \\ 0.02 \\ 0.40 \\ 0.00 \end{pmatrix} + \begin{pmatrix} 0.00 \\ 0.34 \\ 0.05 \\ 4.70 \\ 0.00 \end{pmatrix} + 2 \begin{pmatrix} 0.00 \\ 0.02 \\ 0.25 \\ 0.40 \\ 0.00 \end{pmatrix} = \begin{pmatrix} 0.00 \\ 0.45 \\ 0.63 \\ 6.20 \\ 0.00 \end{pmatrix}.

Thus the vitamin vector for wild rice is a linear combination of the vitamin vectors for cupcake, coconut custard pie, raw brown rice, and soy sauce. So 100 grams of cupcake, 100 grams of coconut custard pie, 100 grams of raw brown rice, and 200 grams of soy sauce provide exactly the same amounts of the five vitamins as 100 grams of raw wild rice. Similarly, since

2 \begin{pmatrix} 0.00 \\ 0.01 \\ 0.02 \\ 0.20 \\ 2.00 \end{pmatrix} + \begin{pmatrix} 90.00 \\ 0.03 \\ 0.02 \\ 0.10 \\ 4.00 \end{pmatrix} + \begin{pmatrix} 0.00 \\ 0.02 \\ 0.07 \\ 0.20 \\ 0.00 \end{pmatrix} + \begin{pmatrix} 0.00 \\ 0.01 \\ 0.01 \\ 0.10 \\ 0.00 \end{pmatrix} + \begin{pmatrix} 10.00 \\ 0.01 \\ 0.03 \\ 0.20 \\ 2.00 \end{pmatrix} + \begin{pmatrix} 0.00 \\ 0.01 \\ 0.01 \\ 0.30 \\ 0.00 \end{pmatrix} = \begin{pmatrix} 100.00 \\ 0.10 \\ 0.18 \\ 1.30 \\ 10.00 \end{pmatrix},

200 grams of apple butter, 100 grams of apples, 100 grams of chocolate candy, 100 grams of farina, 100 grams of jam, and 100 grams of spaghetti provide exactly the same amounts of the five vitamins as 100 grams of clams. ♦

Throughout Chapters 1 and 2 we encounter many different situations in which it is necessary to determine whether or not a vector can be expressed as a linear combination of other vectors, and if so, how. This question often reduces to the problem of solving a system of linear equations. In Chapter 3, we discuss a general method for using matrices to solve any system of linear equations. For now, we illustrate how to solve a system of linear equations by showing how to determine if the vector (2, 6, 8) can be expressed as a linear combination of

u1 = (1, 2, 1), u2 = (−2,−4,−2), u3 = (0, 2, 3),


u4 = (2, 0,−3), and u5 = (−3, 8, 16).

Thus we must determine if there are scalars a1, a2, a3, a4, and a5 such that

(2, 6, 8) = a1u1 + a2u2 + a3u3 + a4u4 + a5u5

= a1(1, 2, 1) + a2(−2,−4,−2) + a3(0, 2, 3)+ a4(2, 0,−3) + a5(−3, 8, 16)

= (a1 − 2a2 + 2a4 − 3a5, 2a1 − 4a2 + 2a3 + 8a5,

a1 − 2a2 + 3a3 − 3a4 + 16a5).

Hence (2, 6, 8) can be expressed as a linear combination of u1, u2, u3, u4, and u5 if and only if there is a 5-tuple of scalars (a1, a2, a3, a4, a5) satisfying the system of linear equations

a1 − 2a2 + 2a4 − 3a5 = 2
2a1 − 4a2 + 2a3 + 8a5 = 6
a1 − 2a2 + 3a3 − 3a4 + 16a5 = 8,

(1)

which is obtained by equating the corresponding coordinates in the preceding equation.

To solve system (1), we replace it by another system with the same solutions, but which is easier to solve. The procedure to be used expresses some of the unknowns in terms of others by eliminating certain unknowns from all the equations except one. To begin, we eliminate a1 from every equation except the first by adding −2 times the first equation to the second and −1 times the first equation to the third. The result is the following new system:

a1 − 2a2 + 2a4 − 3a5 = 2
2a3 − 4a4 + 14a5 = 2
3a3 − 5a4 + 19a5 = 6.

(2)

In this case, it happened that while eliminating a1 from every equation except the first, we also eliminated a2 from every equation except the first. This need not happen in general. We now want to make the coefficient of a3 in the second equation equal to 1, and then eliminate a3 from the third equation. To do this, we first multiply the second equation by 1/2, which produces

a1 − 2a2 + 2a4 − 3a5 = 2
a3 − 2a4 + 7a5 = 1
3a3 − 5a4 + 19a5 = 6.

Next we add −3 times the second equation to the third, obtaining

a1 − 2a2 + 2a4 − 3a5 = 2
a3 − 2a4 + 7a5 = 1
a4 − 2a5 = 3.     (3)


We continue by eliminating a4 from every equation of (3) except the third. This yields

a1 − 2a2 + a5 = −4
a3 + 3a5 = 7
a4 − 2a5 = 3.     (4)

System (4) is a system of the desired form: It is easy to solve for the first unknown present in each of the equations (a1, a3, and a4) in terms of the other unknowns (a2 and a5). Rewriting system (4) in this form, we find that

a1 = 2a2 − a5 − 4
a3 = −3a5 + 7
a4 = 2a5 + 3.

Thus for any choice of scalars a2 and a5, a vector of the form

(a1, a2, a3, a4, a5) = (2a2 − a5 − 4, a2,−3a5 + 7, 2a5 + 3, a5)

is a solution to system (1). In particular, the vector (−4, 0, 7, 3, 0) obtained by setting a2 = 0 and a5 = 0 is a solution to (1). Therefore

(2, 6, 8) = −4u1 + 0u2 + 7u3 + 3u4 + 0u5,

so that (2, 6, 8) is a linear combination of u1, u2, u3, u4, and u5.

The procedure just illustrated uses three types of operations to simplify the original system:

1. interchanging the order of any two equations in the system;
2. multiplying any equation in the system by a nonzero constant;
3. adding a constant multiple of any equation to another equation in the system.

In Section 3.4, we prove that these operations do not change the set of solutions to the original system. Note that we employed these operations to obtain a system of equations that had the following properties:

1. The first nonzero coefficient in each equation is one.
2. If an unknown is the first unknown with a nonzero coefficient in some equation, then that unknown occurs with a zero coefficient in each of the other equations.
3. The first unknown with a nonzero coefficient in any equation has a larger subscript than the first unknown with a nonzero coefficient in any preceding equation.


To help clarify the meaning of these properties, note that none of the following systems meets these requirements.

x1 + 3x2 + x4 = 7
2x3 − 5x4 = −1     (5)

x1 − 2x2 + 3x3 + x5 = −5
x3 − 2x5 = 9
x4 + 3x5 = 6     (6)

x1 − 2x3 + x5 = 1
x4 − 6x5 = 0
x2 + 5x3 − 3x5 = 2.     (7)

Specifically, system (5) does not satisfy property 1 because the first nonzero coefficient in the second equation is 2; system (6) does not satisfy property 2 because x3, the first unknown with a nonzero coefficient in the second equation, occurs with a nonzero coefficient in the first equation; and system (7) does not satisfy property 3 because x2, the first unknown with a nonzero coefficient in the third equation, does not have a larger subscript than x4, the first unknown with a nonzero coefficient in the second equation.

Once a system with properties 1, 2, and 3 has been obtained, it is easy to solve for some of the unknowns in terms of the others (as in the preceding example). If, however, in the course of using operations 1, 2, and 3 a system containing an equation of the form 0 = c, where c is nonzero, is obtained, then the original system has no solutions. (See Example 2.)

We return to the study of systems of linear equations in Chapter 3. We discuss there the theoretical basis for this method of solving systems of linear equations and further simplify the procedure by use of matrices.
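The arithmetic above is easy to check by machine. The following sketch is not part of the text's development; it simply assumes Python with NumPy is available and verifies that the particular solution (−4, 0, 7, 3, 0) of system (1) really does produce the vector (2, 6, 8).

```python
import numpy as np

# The vectors u1, ..., u5 of the example, stored as the rows of a matrix.
u = np.array([
    [ 1,  2,  1],   # u1
    [-2, -4, -2],   # u2
    [ 0,  2,  3],   # u3
    [ 2,  0, -3],   # u4
    [-3,  8, 16],   # u5
])

# The particular solution of system (1) found by setting a2 = a5 = 0.
a = np.array([-4, 0, 7, 3, 0])

# a1*u1 + a2*u2 + ... + a5*u5 should equal (2, 6, 8).
print(a @ u)   # [2 6 8]
```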

Example 2

We claim that

2x3 − 2x2 + 12x − 6

is a linear combination of

x3 − 2x2 − 5x − 3 and 3x3 − 5x2 − 4x − 9

in P3(R), but that

3x3 − 2x2 + 7x + 8

is not. In the first case we wish to find scalars a and b such that

2x3 − 2x2 + 12x − 6 = a(x3 − 2x2 − 5x − 3) + b(3x3 − 5x2 − 4x − 9)


= (a + 3b)x3 + (−2a − 5b)x2 + (−5a − 4b)x + (−3a − 9b).

Thus we are led to the following system of linear equations:

a + 3b = 2
−2a − 5b = −2
−5a − 4b = 12
−3a − 9b = −6.

Adding appropriate multiples of the first equation to the others in order to eliminate a, we find that

a + 3b = 2
b = 2
11b = 22
0b = 0.

Now adding the appropriate multiples of the second equation to the others yields

a = −4
b = 2
0 = 0
0 = 0.

Hence

2x3 − 2x2 + 12x − 6 = −4(x3 − 2x2 − 5x − 3) + 2(3x3 − 5x2 − 4x − 9).

In the second case, we wish to show that there are no scalars a and b for which

3x3 − 2x2 + 7x + 8 = a(x3 − 2x2 − 5x − 3) + b(3x3 − 5x2 − 4x − 9).

Using the preceding technique, we obtain a system of linear equations

a + 3b = 3
−2a − 5b = −2
−5a − 4b = 7
−3a − 9b = 8.

(8)

Eliminating a as before yields

a + 3b = 3
b = 4
11b = 22
0 = 17.

But the presence of the inconsistent equation 0 = 17 indicates that (8) has no solutions. Hence 3x3 − 2x2 + 7x + 8 is not a linear combination of x3 − 2x2 − 5x − 3 and 3x3 − 5x2 − 4x − 9. ♦
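The two systems in Example 2 can also be handed to a computer algebra system. The sketch below is only an illustration, assuming SymPy is available; solve returns the unique solution of the first, consistent system and an empty list for the second, inconsistent one.

```python
import sympy as sp

a, b = sp.symbols('a b')

# Coefficients (x^3, x^2, x, constant) of the two given polynomials.
p1 = [1, -2, -5, -3]   # x^3 - 2x^2 - 5x - 3
p2 = [3, -5, -4, -9]   # 3x^3 - 5x^2 - 4x - 9

def equations(target):
    # One equation per coefficient of a*p1(x) + b*p2(x) = target(x).
    return [sp.Eq(a*c1 + b*c2, t) for c1, c2, t in zip(p1, p2, target)]

print(sp.solve(equations([2, -2, 12, -6]), [a, b]))   # {a: -4, b: 2}
print(sp.solve(equations([3, -2, 7, 8]), [a, b]))     # [] (no solution)
```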


Throughout this book, we form the set of all linear combinations of some set of vectors. We now name such a set of linear combinations.

Definition. Let S be a nonempty subset of a vector space V. The span of S, denoted span(S), is the set consisting of all linear combinations of the vectors in S. For convenience, we define span(∅) = {0}.

In R3, for instance, the span of the set {(1, 0, 0), (0, 1, 0)} consists of all vectors in R3 that have the form a(1, 0, 0) + b(0, 1, 0) = (a, b, 0) for some scalars a and b. Thus the span of {(1, 0, 0), (0, 1, 0)} contains all the points in the xy-plane. In this case, the span of the set is a subspace of R3. This fact is true in general.

Theorem 1.5. The span of any subset S of a vector space V is a subspace of V. Moreover, any subspace of V that contains S must also contain the span of S.

Proof. This result is immediate if S = ∅ because span(∅) = {0}, which is a subspace that is contained in any subspace of V.

If S ≠ ∅, then S contains a vector z. So 0z = 0 is in span(S). Let x, y ∈ span(S). Then there exist vectors u1, u2, . . . , um, v1, v2, . . . , vn in S and scalars a1, a2, . . . , am, b1, b2, . . . , bn such that

x = a1u1 + a2u2 + · · · + amum and y = b1v1 + b2v2 + · · · + bnvn.

Then

x + y = a1u1 + a2u2 + · · · + amum + b1v1 + b2v2 + · · · + bnvn

and, for any scalar c,

cx = (ca1)u1 + (ca2)u2 + · · · + (cam)um

are clearly linear combinations of the vectors in S; so x + y and cx are in span(S). Thus span(S) is a subspace of V.

Now let W denote any subspace of V that contains S. If w ∈ span(S), then w has the form w = c1w1 + c2w2 + · · · + ckwk for some vectors w1, w2, . . . , wk in S and some scalars c1, c2, . . . , ck. Since S ⊆ W, we have w1, w2, . . . , wk ∈ W. Therefore w = c1w1 + c2w2 + · · · + ckwk is in W by Exercise 20 of Section 1.3. Because w, an arbitrary vector in span(S), belongs to W, it follows that span(S) ⊆ W.

Definition. A subset S of a vector space V generates (or spans) V if span(S) = V. In this case, we also say that the vectors of S generate (or span) V.


Example 3

The vectors (1, 1, 0), (1, 0, 1), and (0, 1, 1) generate R3 since an arbitrary vector (a1, a2, a3) in R3 is a linear combination of the three given vectors; in fact, the scalars r, s, and t for which

r(1, 1, 0) + s(1, 0, 1) + t(0, 1, 1) = (a1, a2, a3)

are

r = (1/2)(a1 + a2 − a3), s = (1/2)(a1 − a2 + a3), and t = (1/2)(−a1 + a2 + a3). ♦
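The formulas for r, s, and t can be recovered by solving the corresponding system symbolically. This is only a check of the computation, under the assumption that SymPy is available; the variable names are ours.

```python
import sympy as sp

r, s, t, a1, a2, a3 = sp.symbols('r s t a1 a2 a3')

# r(1,1,0) + s(1,0,1) + t(0,1,1) = (a1, a2, a3), coordinate by coordinate.
print(sp.solve([sp.Eq(r + s, a1), sp.Eq(r + t, a2), sp.Eq(s + t, a3)], [r, s, t]))
# {r: a1/2 + a2/2 - a3/2, s: a1/2 - a2/2 + a3/2, t: -a1/2 + a2/2 + a3/2}
```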

Example 4

The polynomials x2 + 3x − 2, 2x2 + 5x − 3, and −x2 − 4x + 4 generate P2(R) since each of the three given polynomials belongs to P2(R) and each polynomial ax2 + bx + c in P2(R) is a linear combination of these three, namely,

(−8a + 5b + 3c)(x2 + 3x − 2) + (4a − 2b − c)(2x2 + 5x − 3)

+(−a + b + c)(−x2 − 4x + 4) = ax2 + bx + c. ♦

Example 5

The matrices

\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, and \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}

generate M2×2(R) since an arbitrary matrix A in M2×2(R) can be expressed as a linear combination of the four given matrices as follows:

\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
= (\tfrac{1}{3}a_{11} + \tfrac{1}{3}a_{12} + \tfrac{1}{3}a_{21} - \tfrac{2}{3}a_{22}) \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}
+ (\tfrac{1}{3}a_{11} + \tfrac{1}{3}a_{12} - \tfrac{2}{3}a_{21} + \tfrac{1}{3}a_{22}) \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
+ (\tfrac{1}{3}a_{11} - \tfrac{2}{3}a_{12} + \tfrac{1}{3}a_{21} + \tfrac{1}{3}a_{22}) \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}
+ (-\tfrac{2}{3}a_{11} + \tfrac{1}{3}a_{12} + \tfrac{1}{3}a_{21} + \tfrac{1}{3}a_{22}) \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}.

On the other hand, the matrices

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, and \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}

do not generate M2×2(R) because each of these matrices has equal diagonal entries. So any linear combination of these matrices has equal diagonal entries. Hence not every 2 × 2 matrix is a linear combination of these three matrices. ♦
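Both claims in Example 5 can be checked mechanically by identifying each 2 × 2 matrix with a vector in R4. The sketch below assumes SymPy is available and borrows the notion of rank, which the text does not develop until Chapter 3, purely as a computational shortcut: four matrices generate M2×2(R) exactly when the corresponding 4 × 4 coordinate matrix has rank 4.

```python
import sympy as sp

# Identify the matrix (a b; c d) with the vector (a, b, c, d).
four = sp.Matrix([
    [1, 1, 1, 0],   # (1 1; 1 0)
    [1, 1, 0, 1],   # (1 1; 0 1)
    [1, 0, 1, 1],   # (1 0; 1 1)
    [0, 1, 1, 1],   # (0 1; 1 1)
])
print(four.rank())    # 4: the four matrices generate M2x2(R)

three = sp.Matrix([
    [1, 0, 0, 1],   # (1 0; 0 1)
    [1, 1, 0, 1],   # (1 1; 0 1)
    [1, 0, 1, 1],   # (1 0; 1 1)
])
print(three.rank())   # 3: these three span only a proper subspace
```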

At the beginning of this section we noted that the equation of a plane through three noncollinear points in space, one of which is the origin, is of the form x = su + tv, where u, v ∈ R3 and s and t are scalars. Thus x ∈ R3 is a linear combination of u, v ∈ R3 if and only if x lies in the plane containing u and v. (See Figure 1.5.)

[Figure 1.5: the vector x = su + tv shown in the plane containing u and v.]

Usually there are many different subsets that generate a subspace W. (See Exercise 13.) It is natural to seek a subset of W that generates W and is as small as possible. In the next section we explore the circumstances under which a vector can be removed from a generating set to obtain a smaller generating set.

EXERCISES

1. Label the following statements as true or false.

(a) The zero vector is a linear combination of any nonempty set of vectors.
(b) The span of ∅ is ∅.
(c) If S is a subset of a vector space V, then span(S) equals the intersection of all subspaces of V that contain S.
(d) In solving a system of linear equations, it is permissible to multiply an equation by any constant.
(e) In solving a system of linear equations, it is permissible to add any multiple of one equation to another.
(f) Every system of linear equations has a solution.


2. Solve the following systems of linear equations by the method introduced in this section.

(a) 2x1 − 2x2 − 3x3 = −2
    3x1 − 3x2 − 2x3 + 5x4 = 7
    x1 − x2 − 2x3 − x4 = −3

(b) 3x1 − 7x2 + 4x3 = 10
    x1 − 2x2 + x3 = 3
    2x1 − x2 − 2x3 = 6

(c) x1 + 2x2 − x3 + x4 = 5
    x1 + 4x2 − 3x3 − 3x4 = 6
    2x1 + 3x2 − x3 + 4x4 = 8

(d) x1 + 2x2 + 2x3 = 2
    x1 + 8x3 + 5x4 = −6
    x1 + x2 + 5x3 + 5x4 = 3

(e) x1 + 2x2 − 4x3 − x4 + x5 = 7
    −x1 + 10x3 − 3x4 − 4x5 = −16
    2x1 + 5x2 − 5x3 − 4x4 − x5 = 2
    4x1 + 11x2 − 7x3 − 10x4 − 2x5 = 7

(f) x1 + 2x2 + 6x3 = −1
    2x1 + x2 + x3 = 8
    3x1 + x2 − x3 = 15
    x1 + 3x2 + 10x3 = −5

3. For each of the following lists of vectors in R3, determine whether the first vector can be expressed as a linear combination of the other two.

(a) (−2, 0, 3), (1, 3, 0), (2, 4,−1)
(b) (1, 2,−3), (−3, 2, 1), (2,−1,−1)
(c) (3, 4, 1), (1,−2, 1), (−2,−1, 1)
(d) (2,−1, 0), (1, 2,−3), (1,−3, 2)
(e) (5, 1,−5), (1,−2,−3), (−2, 3,−4)
(f) (−2, 2, 2), (1, 2,−1), (−3,−3, 3)

4. For each list of polynomials in P3(R), determine whether the first polynomial can be expressed as a linear combination of the other two.

(a) x3 − 3x + 5, x3 + 2x2 − x + 1, x3 + 3x2 − 1
(b) 4x3 + 2x2 − 6, x3 − 2x2 + 4x + 1, 3x3 − 6x2 + x + 4
(c) −2x3 − 11x2 + 3x + 2, x3 − 2x2 + 3x − 1, 2x3 + x2 + 3x − 2
(d) x3 + x2 + 2x + 13, 2x3 − 3x2 + 4x + 1, x3 − x2 + 2x + 3
(e) x3 − 8x2 + 4x, x3 − 2x2 + 3x − 1, x3 − 2x + 3
(f) 6x3 − 3x2 + x + 2, x3 − x2 + 2x + 3, 2x3 − 3x + 1


5. In each part, determine whether the given vector is in the span of S.

(a) (2,−1, 1), S = {(1, 0, 2), (−1, 1, 1)}
(b) (−1, 2, 1), S = {(1, 0, 2), (−1, 1, 1)}
(c) (−1, 1, 1, 2), S = {(1, 0, 1,−1), (0, 1, 1, 1)}
(d) (2,−1, 1,−3), S = {(1, 0, 1,−1), (0, 1, 1, 1)}
(e) −x3 + 2x2 + 3x + 3, S = {x3 + x2 + x + 1, x2 + x + 1, x + 1}
(f) 2x3 − x2 + x + 3, S = {x3 + x2 + x + 1, x2 + x + 1, x + 1}
(g) \begin{pmatrix} 1 & 2 \\ -3 & 4 \end{pmatrix}, S = {\begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}}
(h) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, S = {\begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}}

6. Show that the vectors (1, 1, 0), (1, 0, 1), and (0, 1, 1) generate F3.

7. In Fn, let ej denote the vector whose jth coordinate is 1 and whose other coordinates are 0. Prove that {e1, e2, . . . , en} generates Fn.

8. Show that Pn(F) is generated by {1, x, . . . , xn}.

9. Show that the matrices

\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, and \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}

generate M2×2(F).

10. Show that if

M1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, M2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, and M3 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},

then the span of {M1, M2, M3} is the set of all symmetric 2×2 matrices.

11.† Prove that span({x}) = {ax : a ∈ F} for any vector x in a vector space. Interpret this result geometrically in R3.

12. Show that a subset W of a vector space V is a subspace of V if and only if span(W) = W.

13.† Show that if S1 and S2 are subsets of a vector space V such that S1 ⊆ S2, then span(S1) ⊆ span(S2). In particular, if S1 ⊆ S2 and span(S1) = V, deduce that span(S2) = V.

14. Show that if S1 and S2 are arbitrary subsets of a vector space V, then span(S1 ∪ S2) = span(S1) + span(S2). (The sum of two subsets is defined in the exercises of Section 1.3.)


15. Let S1 and S2 be subsets of a vector space V. Prove that span(S1 ∩ S2) ⊆ span(S1) ∩ span(S2). Give an example in which span(S1 ∩ S2) and span(S1) ∩ span(S2) are equal and one in which they are unequal.

16. Let V be a vector space and S a subset of V with the property that whenever v1, v2, . . . , vn ∈ S and a1v1 + a2v2 + · · · + anvn = 0, then a1 = a2 = · · · = an = 0. Prove that every vector in the span of S can be uniquely written as a linear combination of vectors of S.

17. Let W be a subspace of a vector space V. Under what conditions are there only a finite number of distinct subsets S of W such that S generates W?

1.5 LINEAR DEPENDENCE AND LINEAR INDEPENDENCE

Suppose that V is a vector space over an infinite field and that W is a subspace of V. Unless W is the zero subspace, W is an infinite set. It is desirable to find a “small” finite subset S that generates W because we can then describe each vector in W as a linear combination of the finite number of vectors in S. Indeed, the smaller that S is, the fewer computations that are required to represent vectors in W. Consider, for example, the subspace W of R3 generated by S = {u1, u2, u3, u4}, where u1 = (2,−1, 4), u2 = (1,−1, 3), u3 = (1, 1,−1), and u4 = (1,−2,−1). Let us attempt to find a proper subset of S that also generates W. The search for this subset is related to the question of whether or not some vector in S is a linear combination of the other vectors in S. Now u4 is a linear combination of the other vectors in S if and only if there are scalars a1, a2, and a3 such that

u4 = a1u1 + a2u2 + a3u3,

that is, if and only if there are scalars a1, a2, and a3 satisfying

(1,−2,−1) = (2a1 + a2 + a3,−a1 − a2 + a3, 4a1 + 3a2 − a3).

Thus u4 is a linear combination of u1, u2, and u3 if and only if the system of linear equations

2a1 + a2 + a3 = 1
−a1 − a2 + a3 = −2
4a1 + 3a2 − a3 = −1

has a solution. The reader should verify that no such solution exists. This does not, however, answer our question of whether some vector in S is a linear combination of the other vectors in S. It can be shown, in fact, that u3 is a linear combination of u1, u2, and u4, namely, u3 = 2u1 − 3u2 + 0u4.


In the preceding example, checking that some vector in S is a linear combination of the other vectors in S could require that we solve several different systems of linear equations before we determine which, if any, of u1, u2, u3, and u4 is a linear combination of the others. By formulating our question differently, we can save ourselves some work. Note that since u3 = 2u1 − 3u2 + 0u4, we have

−2u1 + 3u2 + u3 − 0u4 = 0 .

That is, because some vector in S is a linear combination of the others, the zero vector can be expressed as a linear combination of the vectors in S using coefficients that are not all zero. The converse of this statement is also true: If the zero vector can be written as a linear combination of the vectors in S in which not all the coefficients are zero, then some vector in S is a linear combination of the others. For instance, in the example above, the equation −2u1 + 3u2 + u3 − 0u4 = 0 can be solved for any vector having a nonzero coefficient; so u1, u2, or u3 (but not u4) can be written as a linear combination of the other three vectors. Thus, rather than asking whether some vector in S is a linear combination of the other vectors in S, it is more efficient to ask whether the zero vector can be expressed as a linear combination of the vectors in S with coefficients that are not all zero. This observation leads us to the following definition.

Definition. A subset S of a vector space V is called linearly dependent if there exist a finite number of distinct vectors u1, u2, . . . , un in S and scalars a1, a2, . . . , an, not all zero, such that

a1u1 + a2u2 + · · · + anun = 0 .

In this case we also say that the vectors of S are linearly dependent.

For any vectors u1, u2, . . . , un, we have a1u1 + a2u2 + · · · + anun = 0 if a1 = a2 = · · · = an = 0. We call this the trivial representation of 0 as a linear combination of u1, u2, . . . , un. Thus, for a set to be linearly dependent, there must exist a nontrivial representation of 0 as a linear combination of vectors in the set. Consequently, any subset of a vector space that contains the zero vector is linearly dependent, because 0 = 1·0 is a nontrivial representation of 0 as a linear combination of vectors in the set.

Example 1

Consider the set

S = {(1, 3,−4, 2), (2, 2,−4, 0), (1,−3, 2,−4), (−1, 0, 1, 0)}

in R4. We show that S is linearly dependent and then express one of the vectors in S as a linear combination of the other vectors in S. To show that


S is linearly dependent, we must find scalars a1, a2, a3, and a4, not all zero, such that

a1(1, 3,−4, 2) + a2(2, 2,−4, 0) + a3(1,−3, 2,−4) + a4(−1, 0, 1, 0) = 0 .

Finding such scalars amounts to finding a nonzero solution to the system of linear equations

a1 + 2a2 + a3 − a4 = 0
3a1 + 2a2 − 3a3 = 0
−4a1 − 4a2 + 2a3 + a4 = 0
2a1 − 4a3 = 0.

One such solution is a1 = 4, a2 = −3, a3 = 2, and a4 = 0. Thus S is a linearly dependent subset of R4, and

4(1, 3,−4, 2) − 3(2, 2,−4, 0) + 2(1,−3, 2,−4) + 0(−1, 0, 1, 0) = 0 . ♦
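The nontrivial dependence in Example 1 is precisely a nonzero solution of the homogeneous system displayed above, so it can be found with a null-space computation. The following sketch assumes SymPy is available; it is a check, not part of the text's method.

```python
import sympy as sp

# Columns are the coefficients of a1, a2, a3, a4 in the homogeneous system.
A = sp.Matrix([
    [ 1,  2,  1, -1],
    [ 3,  2, -3,  0],
    [-4, -4,  2,  1],
    [ 2,  0, -4,  0],
])

# Any nonzero vector in the null space gives a nontrivial dependence.
print(A.nullspace())
# [Matrix([[2], [-3/2], [1], [0]])] -- scaling by 2 gives (4, -3, 2, 0), as in the text.
```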

Example 2

In M2×3(R), the set

{\begin{pmatrix} 1 & -3 & 2 \\ -4 & 0 & 5 \end{pmatrix}, \begin{pmatrix} -3 & 7 & 4 \\ 6 & -2 & -7 \end{pmatrix}, \begin{pmatrix} -2 & 3 & 11 \\ -1 & -3 & 2 \end{pmatrix}}

is linearly dependent because

5 \begin{pmatrix} 1 & -3 & 2 \\ -4 & 0 & 5 \end{pmatrix} + 3 \begin{pmatrix} -3 & 7 & 4 \\ 6 & -2 & -7 \end{pmatrix} - 2 \begin{pmatrix} -2 & 3 & 11 \\ -1 & -3 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. ♦

Definition. A subset S of a vector space that is not linearly dependent is called linearly independent. As before, we also say that the vectors of S are linearly independent.

The following facts about linearly independent sets are true in any vector space.

1. The empty set is linearly independent, for linearly dependent sets must be nonempty.

2. A set consisting of a single nonzero vector is linearly independent. For if {u} is linearly dependent, then au = 0 for some nonzero scalar a. Thus

u = a^−1(au) = a^−1 0 = 0 .

3. A set is linearly independent if and only if the only representations of 0 as linear combinations of its vectors are trivial representations.


The condition in item 3 provides a useful method for determining whether a finite set is linearly independent. This technique is illustrated in the examples that follow.

Example 3

To prove that the set

S = {(1, 0, 0,−1), (0, 1, 0,−1), (0, 0, 1,−1), (0, 0, 0, 1)}

is linearly independent, we must show that the only linear combination of vectors in S that equals the zero vector is the one in which all the coefficients are zero. Suppose that a1, a2, a3, and a4 are scalars such that

a1(1, 0, 0,−1) + a2(0, 1, 0,−1) + a3(0, 0, 1,−1) + a4(0, 0, 0, 1) = (0, 0, 0, 0).

Equating the corresponding coordinates of the vectors on the left and the right sides of this equation, we obtain the following system of linear equations.

a1 = 0
a2 = 0
a3 = 0
−a1 − a2 − a3 + a4 = 0

Clearly the only solution to this system is a1 = a2 = a3 = a4 = 0, and so S is linearly independent. ♦

Example 4

For k = 0, 1, . . . , n let pk(x) = xk + xk+1 + · · · + xn. The set

{p0(x), p1(x), . . . , pn(x)}

is linearly independent in Pn(F). For if

a0p0(x) + a1p1(x) + · · · + anpn(x) = 0

for some scalars a0, a1, . . . , an, then

a0 + (a0 + a1)x + (a0 + a1 + a2)x2 + · · · + (a0 + a1 + · · · + an)xn = 0 .

By equating the coefficients of xk on both sides of this equation for k = 0, 1, 2, . . . , n, we obtain

a0 = 0
a0 + a1 = 0
a0 + a1 + a2 = 0
⋮
a0 + a1 + a2 + · · · + an = 0.

Clearly the only solution to this system of linear equations is a0 = a1 = · · · = an = 0. ♦
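For finitely many vectors in Fn, the test in item 3 amounts to asking whether a homogeneous system has only the trivial solution, which a computer algebra system answers directly. As a hedged illustration (assuming SymPy is available), here is the check for the set of Example 3.

```python
import sympy as sp

# Columns of A are (1,0,0,-1), (0,1,0,-1), (0,0,1,-1), (0,0,0,1),
# so A*a = 0 is exactly the system displayed in Example 3.
A = sp.Matrix([
    [ 1,  0,  0, 0],
    [ 0,  1,  0, 0],
    [ 0,  0,  1, 0],
    [-1, -1, -1, 1],
])

print(A.nullspace())   # []: only the trivial solution, so the set is linearly independent
```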


The following important results are immediate consequences of the definitions of linear dependence and linear independence.

Theorem 1.6. Let V be a vector space, and let S1 ⊆ S2 ⊆ V. If S1 is linearly dependent, then S2 is linearly dependent.

Proof. Exercise.

Corollary. Let V be a vector space, and let S1 ⊆ S2 ⊆ V. If S2 is linearly independent, then S1 is linearly independent.

Proof. Exercise.

Earlier in this section, we remarked that the issue of whether S is the smallest generating set for its span is related to the question of whether some vector in S is a linear combination of the other vectors in S. Thus the issue of whether S is the smallest generating set for its span is related to the question of whether S is linearly dependent. To see why, consider the subset S = {u1, u2, u3, u4} of R3, where u1 = (2,−1, 4), u2 = (1,−1, 3), u3 = (1, 1,−1), and u4 = (1,−2,−1). We have previously noted that S is linearly dependent; in fact,

−2u1 + 3u2 + u3 − 0u4 = 0 .

This equation implies that u3 (or alternatively, u1 or u2) is a linear combination of the other vectors in S. For example, u3 = 2u1 − 3u2 + 0u4. Therefore every linear combination a1u1 + a2u2 + a3u3 + a4u4 of vectors in S can be written as a linear combination of u1, u2, and u4:

a1u1 + a2u2 + a3u3 + a4u4 = a1u1 + a2u2 + a3(2u1 − 3u2 + 0u4) + a4u4

= (a1 + 2a3)u1 + (a2 − 3a3)u2 + a4u4.

Thus the subset S′ = {u1, u2, u4} of S has the same span as S!

More generally, suppose that S is any linearly dependent set containing two or more vectors. Then some vector v ∈ S can be written as a linear combination of the other vectors in S, and the subset obtained by removing v from S has the same span as S. It follows that if no proper subset of S generates the span of S, then S must be linearly independent. Another way to view the preceding statement is given in Theorem 1.7.

Theorem 1.7. Let S be a linearly independent subset of a vector space V, and let v be a vector in V that is not in S. Then S ∪ {v} is linearly dependent if and only if v ∈ span(S).


Proof. If S ∪ {v} is linearly dependent, then there are vectors u1, u2, . . . , un in S ∪ {v} such that a1u1 + a2u2 + · · · + anun = 0 for some nonzero scalars a1, a2, . . . , an. Because S is linearly independent, one of the ui's, say u1, equals v. Thus a1v + a2u2 + · · · + anun = 0 , and so

v = a1^−1(−a2u2 − · · · − anun) = −(a1^−1 a2)u2 − · · · − (a1^−1 an)un.

Since v is a linear combination of u2, . . . , un, which are in S, we have v ∈ span(S).

Conversely, let v ∈ span(S). Then there exist vectors v1, v2, . . . , vm in S and scalars b1, b2, . . . , bm such that v = b1v1 + b2v2 + · · · + bmvm. Hence

0 = b1v1 + b2v2 + · · · + bmvm + (−1)v.

Since v ≠ vi for i = 1, 2, . . . , m, the coefficient of v in this linear combination is nonzero, and so the set {v1, v2, . . . , vm, v} is linearly dependent. Therefore S ∪ {v} is linearly dependent by Theorem 1.6.

Linearly independent generating sets are investigated in detail in Section 1.6.

EXERCISES

1. Label the following statements as true or false.

(a) If S is a linearly dependent set, then each vector in S is a linear combination of other vectors in S.
(b) Any set containing the zero vector is linearly dependent.
(c) The empty set is linearly dependent.
(d) Subsets of linearly dependent sets are linearly dependent.
(e) Subsets of linearly independent sets are linearly independent.
(f) If a1x1 + a2x2 + · · · + anxn = 0 and x1, x2, . . . , xn are linearly independent, then all the scalars ai are zero.

2.3 Determine whether the following sets are linearly dependent or linearly independent.

(a) {\begin{pmatrix} 1 & -3 \\ -2 & 4 \end{pmatrix}, \begin{pmatrix} -2 & 6 \\ 4 & -8 \end{pmatrix}} in M2×2(R)

(b) {\begin{pmatrix} 1 & -2 \\ -1 & 4 \end{pmatrix}, \begin{pmatrix} -1 & 1 \\ 2 & -4 \end{pmatrix}} in M2×2(R)

(c) {x3 + 2x2, −x2 + 3x + 1, x3 − x2 + 2x − 1} in P3(R)


(d) {x3 − x, 2x2 + 4, −2x3 + 3x2 + 2x + 6} in P3(R)

(e) {(1,−1, 2), (1,−2, 1), (1, 1, 4)} in R3

(f) {(1,−1, 2), (2, 0, 1), (−1, 2,−1)} in R3

(g) {\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}, \begin{pmatrix} 0 & -1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} -1 & 2 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ -4 & 4 \end{pmatrix}} in M2×2(R)

(h) {\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}, \begin{pmatrix} 0 & -1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} -1 & 2 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 2 & -2 \end{pmatrix}} in M2×2(R)

(i) {x4 − x3 + 5x2 − 8x + 6, −x4 + x3 − 5x2 + 5x − 3, x4 + 3x2 − 3x + 5, 2x4 + 3x3 + 4x2 − x + 1, x3 − x + 2} in P4(R)

(j) {x4 − x3 + 5x2 − 8x + 6, −x4 + x3 − 5x2 + 5x − 3, x4 + 3x2 − 3x + 5, 2x4 + x3 + 4x2 + 8x} in P4(R)

3The computations in Exercise 2(g), (h), (i), and (j) are tedious unless technology is used.

3. In M3×2(F), prove that the set

{\begin{pmatrix} 1 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}}

is linearly dependent.

4. In Fn, let ej denote the vector whose jth coordinate is 1 and whose other coordinates are 0. Prove that {e1, e2, . . . , en} is linearly independent.

5. Show that the set {1, x, x2, . . . , xn} is linearly independent in Pn(F).

6. In Mm×n(F), let Eij denote the matrix whose only nonzero entry is 1 in the ith row and jth column. Prove that {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is linearly independent.

7. Recall from Example 3 in Section 1.3 that the set of diagonal matrices in M2×2(F) is a subspace. Find a linearly independent set that generates this subspace.

8. Let S = {(1, 1, 0), (1, 0, 1), (0, 1, 1)} be a subset of the vector space F3.

(a) Prove that if F = R, then S is linearly independent.
(b) Prove that if F has characteristic 2, then S is linearly dependent.

9.† Let u and v be distinct vectors in a vector space V. Show that {u, v} is linearly dependent if and only if u or v is a multiple of the other.

10. Give an example of three linearly dependent vectors in R3 such that none of the three is a multiple of another.


11. Let S = {u1, u2, . . . , un} be a linearly independent subset of a vector space V over the field Z2. How many vectors are there in span(S)? Justify your answer.

12. Prove Theorem 1.6 and its corollary.

13. Let V be a vector space over a field of characteristic not equal to two.

(a) Let u and v be distinct vectors in V. Prove that {u, v} is linearly independent if and only if {u + v, u − v} is linearly independent.

(b) Let u, v, and w be distinct vectors in V. Prove that {u, v, w} is linearly independent if and only if {u + v, u + w, v + w} is linearly independent.

14. Prove that a set S is linearly dependent if and only if S = {0} or there exist distinct vectors v, u1, u2, . . . , un in S such that v is a linear combination of u1, u2, . . . , un.

15. Let S = {u1, u2, . . . , un} be a finite set of vectors. Prove that S is linearly dependent if and only if u1 = 0 or uk+1 ∈ span({u1, u2, . . . , uk}) for some k (1 ≤ k < n).

16. Prove that a set S of vectors is linearly independent if and only if each finite subset of S is linearly independent.

17. Let M be a square upper triangular matrix (as defined in Exercise 12 of Section 1.3) with nonzero diagonal entries. Prove that the columns of M are linearly independent.

18. Let S be a set of nonzero polynomials in P(F) such that no two have the same degree. Prove that S is linearly independent.

19. Prove that if {A1, A2, . . . , Ak} is a linearly independent subset of Mn×n(F), then {A1^t, A2^t, . . . , Ak^t} is also linearly independent.

20. Let f, g ∈ F(R, R) be the functions defined by f(t) = e^(rt) and g(t) = e^(st), where r ≠ s. Prove that f and g are linearly independent in F(R, R).

1.6 BASES AND DIMENSION

We saw in Section 1.5 that if S is a generating set for a subspace W and no proper subset of S is a generating set for W, then S must be linearly independent. A linearly independent generating set for W possesses a very useful property: every vector in W can be expressed in one and only one way as a linear combination of the vectors in the set. (This property is proved below in Theorem 1.8.) It is this property that makes linearly independent generating sets the building blocks of vector spaces.


Definition. A basis β for a vector space V is a linearly independent subset of V that generates V. If β is a basis for V, we also say that the vectors of β form a basis for V.

Example 1

Recalling that span(∅) = {0} and ∅ is linearly independent, we see that ∅ is a basis for the zero vector space. ♦

Example 2

In Fn, let e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1); {e1, e2, . . . , en} is readily seen to be a basis for Fn and is called the standard basis for Fn. ♦

Example 3

In Mm×n(F), let Eij denote the matrix whose only nonzero entry is a 1 in the ith row and jth column. Then {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for Mm×n(F). ♦

Example 4

In Pn(F) the set {1, x, x2, . . . , xn} is a basis. We call this basis the standard basis for Pn(F). ♦

Example 5

In P(F) the set {1, x, x2, . . .} is a basis. ♦

Observe that Example 5 shows that a basis need not be finite. In fact, later in this section it is shown that no basis for P(F) can be finite. Hence not every vector space has a finite basis.

The next theorem, which is used frequently in Chapter 2, establishes the most significant property of a basis.

Theorem 1.8. Let V be a vector space and β = {u1, u2, . . . , un} be a subset of V. Then β is a basis for V if and only if each v ∈ V can be uniquely expressed as a linear combination of vectors of β, that is, can be expressed in the form

v = a1u1 + a2u2 + · · · + anun

for unique scalars a1, a2, . . . , an.

Proof. Let β be a basis for V. If v ∈ V, then v ∈ span(β) because span(β) = V. Thus v is a linear combination of the vectors of β. Suppose that

v = a1u1 + a2u2 + · · · + anun and v = b1u1 + b2u2 + · · · + bnun


are two such representations of v. Subtracting the second equation from the first gives

0 = (a1 − b1)u1 + (a2 − b2)u2 + · · · + (an − bn)un.

Since β is linearly independent, it follows that a1 − b1 = a2 − b2 = · · · = an − bn = 0. Hence a1 = b1, a2 = b2, . . . , an = bn, and so v is uniquely expressible as a linear combination of the vectors of β.

The proof of the converse is an exercise.

Theorem 1.8 shows that if the vectors u1, u2, . . . , un form a basis for a vector space V, then every vector in V can be uniquely expressed in the form

v = a1u1 + a2u2 + · · · + anun

for appropriately chosen scalars a1, a2, . . . , an. Thus v determines a unique n-tuple of scalars (a1, a2, . . . , an) and, conversely, each n-tuple of scalars determines a unique vector v ∈ V by using the entries of the n-tuple as the coefficients of a linear combination of u1, u2, . . . , un. This fact suggests that V is like the vector space Fn, where n is the number of vectors in the basis for V. We see in Section 2.4 that this is indeed the case.
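When V = Fn and the basis vectors are known explicitly, the unique scalars of Theorem 1.8 are found by solving a single linear system. The sketch below is only an illustration (it assumes SymPy is available) and uses the generating set of Example 3 in Section 1.4, which is in fact a basis for R3.

```python
import sympy as sp

# Columns of B are the basis vectors u1 = (1,1,0), u2 = (1,0,1), u3 = (0,1,1).
B = sp.Matrix([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
])
v = sp.Matrix([3, 5, 7])

# The unique scalars a1, a2, a3 with v = a1*u1 + a2*u2 + a3*u3.
print(B.LUsolve(v))   # Matrix([[1/2], [5/2], [9/2]])
```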

In this book, we are primarily interested in vector spaces having finite bases. Theorem 1.9 identifies a large class of vector spaces of this type.

Theorem 1.9. If a vector space V is generated by a finite set S, then some subset of S is a basis for V. Hence V has a finite basis.

Proof. If S = ∅ or S = {0}, then V = {0} and ∅ is a subset of S that is a basis for V. Otherwise S contains a nonzero vector u1. By item 2 on page 37, {u1} is a linearly independent set. Continue, if possible, choosing vectors u2, . . . , uk in S such that {u1, u2, . . . , uk} is linearly independent. Since S is a finite set, we must eventually reach a stage at which β = {u1, u2, . . . , uk} is a linearly independent subset of S, but adjoining to β any vector in S not in β produces a linearly dependent set. We claim that β is a basis for V. Because β is linearly independent by construction, it suffices to show that β spans V. By Theorem 1.5 (p. 30) we need to show that S ⊆ span(β). Let v ∈ S. If v ∈ β, then clearly v ∈ span(β). Otherwise, if v /∈ β, then the preceding construction shows that β ∪ {v} is linearly dependent. So v ∈ span(β) by Theorem 1.7 (p. 39). Thus S ⊆ span(β).

Because of the method by which the basis β was obtained in the proof of Theorem 1.9, this theorem is often remembered as saying that a finite spanning set for V can be reduced to a basis for V. This method is illustrated in the next example.


Example 6

Let

S = {(2,−3, 5), (8,−12, 20), (1, 0,−2), (0, 2,−1), (7, 2, 0)}.

It can be shown that S generates R3. We can select a basis for R3 that is a subset of S by the technique used in proving Theorem 1.9. To start, select any nonzero vector in S, say (2,−3, 5), to be a vector in the basis. Since 4(2,−3, 5) = (8,−12, 20), the set {(2,−3, 5), (8,−12, 20)} is linearly dependent by Exercise 9 of Section 1.5. Hence we do not include (8,−12, 20) in our basis. On the other hand, (1, 0,−2) is not a multiple of (2,−3, 5) and vice versa, so that the set {(2,−3, 5), (1, 0,−2)} is linearly independent. Thus we include (1, 0,−2) as part of our basis.

Now we consider the set {(2,−3, 5), (1, 0,−2), (0, 2,−1)} obtained by adjoining another vector in S to the two vectors that we have already included in our basis. As before, we include (0, 2,−1) in our basis or exclude it from the basis according to whether {(2,−3, 5), (1, 0,−2), (0, 2,−1)} is linearly independent or linearly dependent. An easy calculation shows that this set is linearly independent, and so we include (0, 2,−1) in our basis. In a similar fashion the final vector in S is included or excluded from our basis according to whether the set

{(2,−3, 5), (1, 0,−2), (0, 2,−1), (7, 2, 0)}

is linearly independent or linearly dependent. Because

2(2,−3, 5) + 3(1, 0,−2) + 4(0, 2,−1) − (7, 2, 0) = (0, 0, 0),

we exclude (7, 2, 0) from our basis. We conclude that

{(2,−3, 5), (1, 0,−2), (0, 2,−1)}

is a subset of S that is a basis for R3. ♦
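The selection process of Example 6 is easy to mechanize for vectors in Fn: keep a vector exactly when it is not already in the span of the vectors kept so far, a test that Theorem 1.7 reduces to a linear-independence check. The sketch below assumes SymPy is available and uses a rank computation (a tool from Chapter 3) only as a convenient way to perform that check; it reproduces the basis found above.

```python
import sympy as sp

S = [(2, -3, 5), (8, -12, 20), (1, 0, -2), (0, 2, -1), (7, 2, 0)]

basis = []
for v in S:
    candidate = basis + [v]
    # Keep v only if the chosen vectors together with v remain linearly
    # independent, i.e., the matrix of rows has full row rank.
    if sp.Matrix(candidate).rank() == len(candidate):
        basis.append(v)

print(basis)   # [(2, -3, 5), (1, 0, -2), (0, 2, -1)]
```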

The corollaries of the following theorem are perhaps the most significant results in Chapter 1.

Theorem 1.10 (Replacement Theorem). Let V be a vector space that is generated by a set G containing exactly n vectors, and let L be a linearly independent subset of V containing exactly m vectors. Then m ≤ n and there exists a subset H of G containing exactly n − m vectors such that L ∪ H generates V.

Proof. The proof is by mathematical induction on m. The induction begins with m = 0; for in this case L = ∅, and so taking H = G gives the desired result.


Now suppose that the theorem is true for some integer m ≥ 0. We prove that the theorem is true for m + 1. Let L = {v1, v2, . . . , vm+1} be a linearly independent subset of V consisting of m + 1 vectors. By the corollary to Theorem 1.6 (p. 39), {v1, v2, . . . , vm} is linearly independent, and so we may apply the induction hypothesis to conclude that m ≤ n and that there is a subset {u1, u2, . . . , un−m} of G such that {v1, v2, . . . , vm} ∪ {u1, u2, . . . , un−m} generates V. Thus there exist scalars a1, a2, . . . , am, b1, b2, . . . , bn−m such that

a1v1 + a2v2 + · · · + amvm + b1u1 + b2u2 + · · · + bn−mun−m = vm+1. (9)

Note that n − m > 0, lest vm+1 be a linear combination of v1, v2, . . . , vm, which by Theorem 1.7 (p. 39) contradicts the assumption that L is linearly independent. Hence n > m; that is, n ≥ m + 1. Moreover, some bi, say b1, is nonzero, for otherwise we obtain the same contradiction. Solving (9) for u1 gives

u_1 = (-b_1^{-1}a_1)v_1 + (-b_1^{-1}a_2)v_2 + \cdots + (-b_1^{-1}a_m)v_m + (b_1^{-1})v_{m+1} + (-b_1^{-1}b_2)u_2 + \cdots + (-b_1^{-1}b_{n-m})u_{n-m}.

Let H = {u2, . . . , un−m}. Then u1 ∈ span(L∪H), and because v1, v2, . . . , vm,u2, . . . , un−m are clearly in span(L ∪ H), it follows that

{v1, v2, . . . , vm, u1, u2, . . . , un−m} ⊆ span(L ∪ H).

Because {v1, v2, . . . , vm, u1, u2, . . . , un−m} generates V, Theorem 1.5 (p. 30) implies that span(L ∪ H) = V. Since H is a subset of G that contains (n − m) − 1 = n − (m + 1) vectors, the theorem is true for m + 1. This completes the induction.

Corollary 1. Let V be a vector space having a finite basis. Then every basis for V contains the same number of vectors.

Proof. Suppose that β is a finite basis for V that contains exactly n vectors, and let γ be any other basis for V. If γ contains more than n vectors, then we can select a subset S of γ containing exactly n + 1 vectors. Since S is linearly independent and β generates V, the replacement theorem implies that n + 1 ≤ n, a contradiction. Therefore γ is finite, and the number m of vectors in γ satisfies m ≤ n. Reversing the roles of β and γ and arguing as above, we obtain n ≤ m. Hence m = n.

If a vector space has a finite basis, Corollary 1 asserts that the number of vectors in any basis for V is an intrinsic property of V. This fact makes possible the following important definitions.

Definitions. A vector space is called finite-dimensional if it has a basis consisting of a finite number of vectors. The unique number of vectors


in each basis for V is called the dimension of V and is denoted by dim(V). A vector space that is not finite-dimensional is called infinite-dimensional.

The following results are consequences of Examples 1 through 4.

Example 7

The vector space {0} has dimension zero. ♦

Example 8

The vector space Fn has dimension n. ♦

Example 9

The vector space Mm×n(F) has dimension mn. ♦

Example 10

The vector space Pn(F) has dimension n + 1. ♦

The following examples show that the dimension of a vector space depends

on its field of scalars.

Example 11

Over the field of complex numbers, the vector space of complex numbers has dimension 1. (A basis is {1}.) ♦

Example 12

Over the field of real numbers, the vector space of complex numbers has dimension 2. (A basis is {1, i}.) ♦

In the terminology of dimension, the first conclusion in the replacement theorem states that if V is a finite-dimensional vector space, then no linearly independent subset of V can contain more than dim(V) vectors. From this fact it follows that the vector space P(F) is infinite-dimensional because it has an infinite linearly independent set, namely {1, x, x2, . . .}. This set is, in fact, a basis for P(F). Yet nothing that we have proved in this section guarantees an infinite-dimensional vector space must have a basis. In Section 1.7 it is shown, however, that every vector space has a basis.

Just as no linearly independent subset of a finite-dimensional vector space V can contain more than dim(V) vectors, a corresponding statement can be made about the size of a generating set.

Corollary 2. Let V be a vector space with dimension n.

(a) Any finite generating set for V contains at least n vectors, and a generating set for V that contains exactly n vectors is a basis for V.


(b) Any linearly independent subset of V that contains exactly n vectors is a basis for V.

(c) Every linearly independent subset of V can be extended to a basis for V.

Proof. Let β be a basis for V.

(a) Let G be a finite generating set for V. By Theorem 1.9 some subset H of G is a basis for V. Corollary 1 implies that H contains exactly n vectors. Since a subset of G contains n vectors, G must contain at least n vectors. Moreover, if G contains exactly n vectors, then we must have H = G, so that G is a basis for V.

(b) Let L be a linearly independent subset of V containing exactly n vectors. It follows from the replacement theorem that there is a subset H of β containing n − n = 0 vectors such that L ∪ H generates V. Thus H = ∅, and L generates V. Since L is also linearly independent, L is a basis for V.

(c) If L is a linearly independent subset of V containing m vectors, then the replacement theorem asserts that there is a subset H of β containing exactly n − m vectors such that L ∪ H generates V. Now L ∪ H contains at most n vectors; therefore (a) implies that L ∪ H contains exactly n vectors and that L ∪ H is a basis for V.

Example 13

It follows from Example 4 of Section 1.4 and (a) of Corollary 2 that

{x2 + 3x − 2, 2x2 + 5x − 3,−x2 − 4x + 4}

is a basis for P2(R). ♦

Example 14

It follows from Example 5 of Section 1.4 and (a) of Corollary 2 that

{\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}}

is a basis for M2×2(R). ♦

Example 15

It follows from Example 3 of Section 1.5 and (b) of Corollary 2 that

{(1, 0, 0,−1), (0, 1, 0,−1), (0, 0, 1,−1), (0, 0, 0, 1)}

is a basis for R4. ♦


Example 16

For k = 0, 1, . . . , n, let pk(x) = xk + xk+1 + · · · + xn. It follows from Example 4 of Section 1.5 and (b) of Corollary 2 that

{p0(x), p1(x), . . . , pn(x)}

is a basis for Pn(F ). ♦

A procedure for reducing a generating set to a basis was illustrated in Example 6. In Section 3.4, when we have learned more about solving systems of linear equations, we will discover a simpler method for reducing a generating set to a basis. This procedure also can be used to extend a linearly independent set to a basis, as (c) of Corollary 2 asserts is possible.

An Overview of Dimension and Its Consequences

Theorem 1.9 as well as the replacement theorem and its corollaries contain a wealth of information about the relationships among linearly independent sets, bases, and generating sets. For this reason, we summarize here the main results of this section in order to put them into better perspective.

A basis for a vector space V is a linearly independent subset of V that generates V. If V has a finite basis, then every basis for V contains the same number of vectors. This number is called the dimension of V, and V is said to be finite-dimensional. Thus if the dimension of V is n, every basis for V contains exactly n vectors. Moreover, every linearly independent subset of V contains no more than n vectors and can be extended to a basis for V by including appropriately chosen vectors. Also, each generating set for V contains at least n vectors and can be reduced to a basis for V by excluding appropriately chosen vectors. The Venn diagram in Figure 1.6 depicts these relationships.

[Figure 1.6: a Venn diagram of the linearly independent sets and the generating sets; the bases are exactly the sets lying in both regions.]


The Dimension of Subspaces

Our next result relates the dimension of a subspace to the dimension of the vector space that contains it.

Theorem 1.11. Let W be a subspace of a finite-dimensional vector space V. Then W is finite-dimensional and dim(W) ≤ dim(V). Moreover, if dim(W) = dim(V), then V = W.

Proof. Let dim(V) = n. If W = {0}, then W is finite-dimensional and dim(W) = 0 ≤ n. Otherwise, W contains a nonzero vector x1; so {x1} is a linearly independent set. Continue choosing vectors x1, x2, . . . , xk in W such that {x1, x2, . . . , xk} is linearly independent. Since no linearly independent subset of V can contain more than n vectors, this process must stop at a stage where k ≤ n and {x1, x2, . . . , xk} is linearly independent but adjoining any other vector from W produces a linearly dependent set. Theorem 1.7 (p. 39) implies that {x1, x2, . . . , xk} generates W, and hence it is a basis for W. Therefore dim(W) = k ≤ n.

If dim(W) = n, then a basis for W is a linearly independent subset of V containing n vectors. But Corollary 2 of the replacement theorem implies that this basis for W is also a basis for V; so W = V.

Example 17

Let

W = {(a1, a2, a3, a4, a5) ∈ F5 : a1 + a3 + a5 = 0, a2 = a4}.

It is easily shown that W is a subspace of F5 having

{(−1, 0, 1, 0, 0), (−1, 0, 0, 0, 1), (0, 1, 0, 1, 0)}

as a basis. Thus dim(W) = 3. ♦

Example 18

The set of diagonal n × n matrices is a subspace W of Mn×n(F) (see Example 3 of Section 1.3). A basis for W is

{E11, E22, . . . , Enn},

where Eij is the matrix in which the only nonzero entry is a 1 in the ith row and jth column. Thus dim(W) = n. ♦

Example 19

We saw in Section 1.3 that the set of symmetric n × n matrices is a subspace W of Mn×n(F). A basis for W is

{Aij : 1 ≤ i ≤ j ≤ n},


where Aij is the n × n matrix having 1 in the ith row and jth column, 1 in the jth row and ith column, and 0 elsewhere. It follows that

dim(W) = n + (n − 1) + · · · + 1 = (1/2)n(n + 1). ♦

Corollary. If W is a subspace of a finite-dimensional vector space V, then any basis for W can be extended to a basis for V.

Proof. Let S be a basis for W. Because S is a linearly independent subset of V, Corollary 2 of the replacement theorem guarantees that S can be extended to a basis for V.

Example 20

The set of all polynomials of the form

a18x^18 + a16x^16 + · · · + a2x^2 + a0,

where a18, a16, . . . , a2, a0 ∈ F, is a subspace W of P18(F). A basis for W is {1, x^2, . . . , x^16, x^18}, which is a subset of the standard basis for P18(F). ♦

We can apply Theorem 1.11 to determine the subspaces of R2 and R3. Since R2 has dimension 2, subspaces of R2 can be of dimensions 0, 1, or 2 only. The only subspaces of dimension 0 or 2 are {0} and R2, respectively. Any subspace of R2 having dimension 1 consists of all scalar multiples of some nonzero vector in R2 (Exercise 11 of Section 1.4).

If a point of R2 is identified in the natural way with a point in the Euclidean plane, then it is possible to describe the subspaces of R2 geometrically: A subspace of R2 having dimension 0 consists of the origin of the Euclidean plane, a subspace of R2 with dimension 1 consists of a line through the origin, and a subspace of R2 having dimension 2 is the entire Euclidean plane.

Similarly, the subspaces of R3 must have dimensions 0, 1, 2, or 3. Interpreting these possibilities geometrically, we see that a subspace of dimension zero must be the origin of Euclidean 3-space, a subspace of dimension 1 is a line through the origin, a subspace of dimension 2 is a plane through the origin, and a subspace of dimension 3 is Euclidean 3-space itself.

The Lagrange Interpolation Formula

Corollary 2 of the replacement theorem can be applied to obtain a useful formula. Let c0, c1, . . . , cn be distinct scalars in an infinite field F. The polynomials f0(x), f1(x), . . . , fn(x) defined by

fi(x) = [(x − c0) · · · (x − ci−1)(x − ci+1) · · · (x − cn)] / [(ci − c0) · · · (ci − ci−1)(ci − ci+1) · · · (ci − cn)] = ∏_{k≠i} (x − ck)/(ci − ck)


are called the Lagrange polynomials (associated with c0, c1, . . . , cn). Note that each fi(x) is a polynomial of degree n and hence is in Pn(F ). By regarding fi(x) as a polynomial function fi : F → F , we see that

fi(cj) = 0 if i ≠ j, and fi(cj) = 1 if i = j.    (10)

This property of Lagrange polynomials can be used to show that β = {f0, f1, . . . , fn} is a linearly independent subset of Pn(F ). Suppose that

a0f0 + a1f1 + · · · + anfn = 0 for some scalars a0, a1, . . . , an,

where 0 denotes the zero function. Then

a0f0(cj) + a1f1(cj) + · · · + anfn(cj) = 0 for j = 0, 1, . . . , n.

But also

a0f0(cj) + a1f1(cj) + · · · + anfn(cj) = aj

by (10). Hence aj = 0 for j = 0, 1, . . . , n; so β is linearly independent. Since the dimension of Pn(F ) is n + 1, it follows from Corollary 2 of the replacement theorem that β is a basis for Pn(F ).

Because β is a basis for Pn(F ), every polynomial function g in Pn(F ) is a linear combination of polynomial functions of β, say,

g = b0f0 + b1f1 + · · · + bnfn.

It follows that

g(cj) = b0f0(cj) + b1f1(cj) + · · · + bnfn(cj) = bj ;

so

g = g(c0)f0 + g(c1)f1 + · · · + g(cn)fn

is the unique representation of g as a linear combination of elements of β. This representation is called the Lagrange interpolation formula. Notice


that the preceding argument shows that if b0, b1, . . . , bn are any n + 1 scalars in F (not necessarily distinct), then the polynomial function

g = b0f0 + b1f1 + · · · + bnfn

is the unique polynomial in Pn(F ) such that g(cj) = bj . Thus we have found the unique polynomial of degree not exceeding n that has specified values bj at given points cj in its domain (j = 0, 1, . . . , n). For example, let us construct the real polynomial g of degree at most 2 whose graph contains the points (1, 8), (2, 5), and (3,−4). (Thus, in the notation above, c0 = 1, c1 = 2, c2 = 3, b0 = 8, b1 = 5, and b2 = −4.) The Lagrange polynomials associated with c0, c1, and c2 are

f0(x) = [(x − 2)(x − 3)] / [(1 − 2)(1 − 3)] = (1/2)(x2 − 5x + 6),

f1(x) = [(x − 1)(x − 3)] / [(2 − 1)(2 − 3)] = −(x2 − 4x + 3),

and

f2(x) = [(x − 1)(x − 2)] / [(3 − 1)(3 − 2)] = (1/2)(x2 − 3x + 2).

Hence the desired polynomial is

g(x) = b0f0(x) + b1f1(x) + b2f2(x) = 8f0(x) + 5f1(x) − 4f2(x)
     = 4(x2 − 5x + 6) − 5(x2 − 4x + 3) − 2(x2 − 3x + 2)
     = −3x2 + 6x + 5.

An important consequence of the Lagrange interpolation formula is the following result: If f ∈ Pn(F ) and f(ci) = 0 for n + 1 distinct scalars c0, c1, . . . , cn in F , then f is the zero function.
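The interpolation procedure above is completely mechanical, so it is easy to carry out by machine. The following Python sketch (an illustration only, not part of the text) builds g directly from the formula g = b0f0 + b1f1 + · · · + bnfn and reproduces the worked example, in which g(x) = −3x2 + 6x + 5.

# A minimal sketch of the Lagrange interpolation formula over the real numbers.
# Given distinct scalars c[0..n] and values b[0..n], the interpolating polynomial is
#   g(x) = sum_i b[i] * f_i(x),   f_i(x) = prod_{k != i} (x - c[k]) / (c[i] - c[k]).

def lagrange_interpolate(cs, bs):
    """Return the polynomial function g with g(cs[j]) == bs[j] for every j."""
    def g(x):
        total = 0.0
        for i, (ci, bi) in enumerate(zip(cs, bs)):
            term = bi
            for k, ck in enumerate(cs):
                if k != i:
                    term *= (x - ck) / (ci - ck)
            total += term
        return total
    return g

# Worked example from the text: c0, c1, c2 = 1, 2, 3 and b0, b1, b2 = 8, 5, -4.
g = lagrange_interpolate([1, 2, 3], [8, 5, -4])
print([g(x) for x in (1, 2, 3)])   # [8.0, 5.0, -4.0]
print(g(0))                        # 5.0, as predicted by g(x) = -3x^2 + 6x + 5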

EXERCISES

1. Label the following statements as true or false.

(a) The zero vector space has no basis.
(b) Every vector space that is generated by a finite set has a basis.
(c) Every vector space has a finite basis.
(d) A vector space cannot have more than one basis.


(e) If a vector space has a finite basis, then the number of vectors in every basis is the same.
(f) The dimension of Pn(F ) is n.
(g) The dimension of Mm×n(F ) is m + n.
(h) Suppose that V is a finite-dimensional vector space, that S1 is a linearly independent subset of V, and that S2 is a subset of V that generates V. Then S1 cannot contain more vectors than S2.
(i) If S generates the vector space V, then every vector in V can be written as a linear combination of vectors in S in only one way.
(j) Every subspace of a finite-dimensional space is finite-dimensional.
(k) If V is a vector space having dimension n, then V has exactly one subspace with dimension 0 and exactly one subspace with dimension n.
(l) If V is a vector space having dimension n, and if S is a subset of V with n vectors, then S is linearly independent if and only if S spans V.

2. Determine which of the following sets are bases for R3.

(a) {(1, 0,−1), (2, 5, 1), (0,−4, 3)}
(b) {(2,−4, 1), (0, 3,−1), (6, 0,−1)}
(c) {(1, 2,−1), (1, 0, 2), (2, 1, 1)}
(d) {(−1, 3, 1), (2,−4,−3), (−3, 8, 2)}
(e) {(1,−3,−2), (−3, 1, 3), (−2,−10,−2)}

3. Determine which of the following sets are bases for P2(R).

(a) {−1 − x + 2x2, 2 + x − 2x2, 1 − 2x + 4x2}
(b) {1 + 2x + x2, 3 + x2, x + x2}
(c) {1 − 2x − 2x2, −2 + 3x − x2, 1 − x + 6x2}
(d) {−1 + 2x + 4x2, 3 − 4x − 10x2, −2 − 5x − 6x2}
(e) {1 + 2x − x2, 4 − 2x + x2, −1 + 18x − 9x2}

4. Do the polynomials x3−2x2+1, 4x2−x+3, and 3x−2 generate P3(R)?Justify your answer.

5. Is {(1, 4,−6), (1, 5, 8), (2, 1, 1), (0, 1, 0)} a linearly independent subset ofR3? Justify your answer.

6. Give three different bases for F2 and for M2×2(F ).

7. The vectors u1 = (2,−3, 1), u2 = (1, 4,−2), u3 = (−8, 12,−4), u4 =(1, 37,−17), and u5 = (−3,−5, 8) generate R3. Find a subset of the set{u1, u2, u3, u4, u5} that is a basis for R3.


8. Let W denote the subspace of R5 consisting of all the vectors havingcoordinates that sum to zero. The vectors

u1 = (2,−3, 4,−5, 2), u2 = (−6, 9,−12, 15,−6),u3 = (3,−2, 7,−9, 1), u4 = (2,−8, 2,−2, 6),u5 = (−1, 1, 2, 1,−3), u6 = (0,−3,−18, 9, 12),u7 = (1, 0,−2, 3,−2), u8 = (2,−1, 1,−9, 7)

generate W. Find a subset of the set {u1, u2, . . . , u8} that is a basis forW.

9. The vectors u1 = (1, 1, 1, 1), u2 = (0, 1, 1, 1), u3 = (0, 0, 1, 1), andu4 = (0, 0, 0, 1) form a basis for F4. Find the unique representationof an arbitrary vector (a1, a2, a3, a4) in F4 as a linear combination ofu1, u2, u3, and u4.

10. In each part, use the Lagrange interpolation formula to construct thepolynomial of smallest degree whose graph contains the following points.

(a) (−2,−6), (−1, 5), (1, 3)
(b) (−4, 24), (1, 9), (3, 3)
(c) (−2, 3), (−1,−6), (1, 0), (3,−2)
(d) (−3,−30), (−2, 7), (0, 15), (1, 10)

11. Let u and v be distinct vectors of a vector space V. Show that if {u, v}is a basis for V and a and b are nonzero scalars, then both {u + v, au}and {au, bv} are also bases for V.

12. Let u, v, and w be distinct vectors of a vector space V. Show that if{u, v, w} is a basis for V, then {u+v +w, v +w, w} is also a basis for V.

13. The set of solutions to the system of linear equations

 x1 − 2x2 + x3 = 0
2x1 − 3x2 + x3 = 0

is a subspace of R3. Find a basis for this subspace.

14. Find bases for the following subspaces of F5:

W1 = {(a1, a2, a3, a4, a5) ∈ F5 : a1 − a3 − a4 = 0}

and

W2 = {(a1, a2, a3, a4, a5) ∈ F5 : a2 = a3 = a4 and a1 + a5 = 0}.

What are the dimensions of W1 and W2?


15. The set of all n×n matrices having trace equal to zero is a subspace Wof Mn×n(F ) (see Example 4 of Section 1.3). Find a basis for W. Whatis the dimension of W?

16. The set of all upper triangular n × n matrices is a subspace W ofMn×n(F ) (see Exercise 12 of Section 1.3). Find a basis for W. What isthe dimension of W?

17. The set of all skew-symmetric n × n matrices is a subspace W ofMn×n(F ) (see Exercise 28 of Section 1.3). Find a basis for W. What isthe dimension of W?

18. Find a basis for the vector space in Example 5 of Section 1.2. Justifyyour answer.

19. Complete the proof of Theorem 1.8.

20.† Let V be a vector space having dimension n, and let S be a subset of Vthat generates V.

(a) Prove that there is a subset of S that is a basis for V. (Be carefulnot to assume that S is finite.)

(b) Prove that S contains at least n vectors.

21. Prove that a vector space is infinite-dimensional if and only if it containsan infinite linearly independent subset.

22. Let W1 and W2 be subspaces of a finite-dimensional vector space V.Determine necessary and sufficient conditions on W1 and W2 so thatdim(W1 ∩ W2) = dim(W1).

23. Let v1, v2, . . . , vk, v be vectors in a vector space V, and define W1 = span({v1, v2, . . . , vk}), and W2 = span({v1, v2, . . . , vk, v}).

(a) Find necessary and sufficient conditions on v such that dim(W1) = dim(W2).
(b) State and prove a relationship involving dim(W1) and dim(W2) in the case that dim(W1) ≠ dim(W2).

24. Let f(x) be a polynomial of degree n in Pn(R). Prove that for any g(x) ∈ Pn(R) there exist scalars c0, c1, . . . , cn such that

g(x) = c0f(x) + c1f′(x) + c2f′′(x) + · · · + cnf(n)(x),

where f(n)(x) denotes the nth derivative of f(x).

25. Let V, W, and Z be as in Exercise 21 of Section 1.2. If V and W arevector spaces over F of dimensions m and n, determine the dimensionof Z.


26. For a fixed a ∈ R, determine the dimension of the subspace of Pn(R)defined by {f ∈ Pn(R) : f(a) = 0}.

27. Let W1 and W2 be the subspaces of P(F ) defined in Exercise 25 inSection 1.3. Determine the dimensions of the subspaces W1 ∩ Pn(F )and W2 ∩ Pn(F ).

28. Let V be a finite-dimensional vector space over C with dimension n.Prove that if V is now regarded as a vector space over R, then dimV =2n. (See Examples 11 and 12.)

Exercises 29–34 require knowledge of the sum and direct sum of subspaces,as defined in the exercises of Section 1.3.

29. (a) Prove that if W1 and W2 are finite-dimensional subspaces of a vector space V, then the subspace W1 + W2 is finite-dimensional, and dim(W1 + W2) = dim(W1) + dim(W2) − dim(W1 ∩ W2). Hint: Start with a basis {u1, u2, . . . , uk} for W1 ∩ W2 and extend this set to a basis {u1, u2, . . . , uk, v1, v2, . . . , vm} for W1 and to a basis {u1, u2, . . . , uk, w1, w2, . . . , wp} for W2.

(b) Let W1 and W2 be finite-dimensional subspaces of a vector space V, and let V = W1 + W2. Deduce that V is the direct sum of W1 and W2 if and only if dim(V) = dim(W1) + dim(W2).

30. Let

V = M2×2(F ),   W1 = { ⎛ a  b ⎞ ∈ V : a, b, c ∈ F },   and   W2 = { ⎛  0  a ⎞ ∈ V : a, b ∈ F }.
                       ⎝ c  a ⎠                                     ⎝ −a  b ⎠

Prove that W1 and W2 are subspaces of V, and find the dimensions of W1, W2, W1 + W2, and W1 ∩ W2.

31. Let W1 and W2 be subspaces of a vector space V having dimensions m and n, respectively, where m ≥ n.

(a) Prove that dim(W1 ∩ W2) ≤ n.
(b) Prove that dim(W1 + W2) ≤ m + n.

32. (a) Find an example of subspaces W1 and W2 of R3 with dimensions m and n, where m > n > 0, such that dim(W1 ∩ W2) = n.

(b) Find an example of subspaces W1 and W2 of R3 with dimensions m and n, where m > n > 0, such that dim(W1 + W2) = m + n.


(c) Find an example of subspaces W1 and W2 of R3 with dimensions m and n, where m ≥ n, such that both dim(W1 ∩ W2) < n and dim(W1 + W2) < m + n.

33. (a) Let W1 and W2 be subspaces of a vector space V such that V = W1 ⊕ W2. If β1 and β2 are bases for W1 and W2, respectively, show that β1 ∩ β2 = ∅ and β1 ∪ β2 is a basis for V.

(b) Conversely, let β1 and β2 be disjoint bases for subspaces W1 and W2, respectively, of a vector space V. Prove that if β1 ∪ β2 is a basis for V, then V = W1 ⊕ W2.

34. (a) Prove that if W1 is any subspace of a finite-dimensional vector space V, then there exists a subspace W2 of V such that V = W1 ⊕ W2.

(b) Let V = R2 and W1 = {(a1, 0) : a1 ∈ R}. Give examples of two different subspaces W2 and W′2 such that V = W1 ⊕ W2 and V = W1 ⊕ W′2.

The following exercise requires familiarity with Exercise 31 of Section 1.3.

35. Let W be a subspace of a finite-dimensional vector space V, and consider the basis {u1, u2, . . . , uk} for W. Let {u1, u2, . . . , uk, uk+1, . . . , un} be an extension of this basis to a basis for V.

(a) Prove that {uk+1 + W, uk+2 + W, . . . , un + W} is a basis for V/W.
(b) Derive a formula relating dim(V), dim(W), and dim(V/W).

1.7∗ MAXIMAL LINEARLY INDEPENDENT SUBSETS

In this section, several significant results from Section 1.6 are extended to infinite-dimensional vector spaces. Our principal goal here is to prove that every vector space has a basis. This result is important in the study of infinite-dimensional vector spaces because it is often difficult to construct an explicit basis for such a space. Consider, for example, the vector space of real numbers over the field of rational numbers. There is no obvious way to construct a basis for this space, and yet it follows from the results of this section that such a basis does exist.

The difficulty that arises in extending the theorems of the preceding section to infinite-dimensional vector spaces is that the principle of mathematical induction, which played a crucial role in many of the proofs of Section 1.6, is no longer adequate. Instead, a more general result called the maximal principle is needed. Before stating this principle, we need to introduce some terminology.

Definition. Let F be a family of sets. A member M of F is called maximal (with respect to set inclusion) if M is contained in no member of F other than M itself.


Example 1

Let F be the family of all subsets of a nonempty set S. (This family F is called the power set of S.) The set S is easily seen to be a maximal element of F . ♦

Example 2

Let S and T be disjoint nonempty sets, and let F be the union of their power sets. Then S and T are both maximal elements of F . ♦

Example 3

Let F be the family of all finite subsets of an infinite set S. Then F has no maximal element. For if M is any member of F and s is any element of S that is not in M , then M ∪ {s} is a member of F that contains M as a proper subset. ♦

Definition. A collection of sets C is called a chain (or nest or tower) if for each pair of sets A and B in C, either A ⊆ B or B ⊆ A.

Example 4

For each positive integer n let An = {1, 2, . . . , n}. Then the collection of sets C = {An : n = 1, 2, 3, . . .} is a chain. In fact, Am ⊆ An if and only if m ≤ n. ♦

With this terminology we can now state the maximal principle.

Maximal Principle.4 Let F be a family of sets. If, for each chain C ⊆ F , there exists a member of F that contains each member of C, then F contains a maximal member.

Because the maximal principle guarantees the existence of maximal elements in a family of sets satisfying the hypothesis above, it is useful to reformulate the definition of a basis in terms of a maximal property. In Theorem 1.12, we show that this is possible; in fact, the concept defined next is equivalent to a basis.

Definition. Let S be a subset of a vector space V. A maximal linearly independent subset of S is a subset B of S satisfying both of the following conditions.

(a) B is linearly independent.
(b) The only linearly independent subset of S that contains B is B itself.

4The Maximal Principle is logically equivalent to the Axiom of Choice, which is an assumption in most axiomatic developments of set theory. For a treatment of set theory using the Maximal Principle, see John L. Kelley, General Topology, Graduate Texts in Mathematics Series, Vol. 27, Springer-Verlag, 1991.


Example 5

Example 2 of Section 1.4 shows that

{x3 − 2x2 − 5x − 3, 3x3 − 5x2 − 4x − 9}

is a maximal linearly independent subset of

S = {2x3 − 2x2 + 12x − 6, x3 − 2x2 − 5x − 3, 3x3 − 5x2 − 4x − 9}

in P3(R). In this case, however, any subset of S consisting of two polynomials is easily shown to be a maximal linearly independent subset of S. Thus maximal linearly independent subsets of a set need not be unique. ♦

A basis β for a vector space V is a maximal linearly independent subset of V, because

1. β is linearly independent by definition.
2. If v ∈ V and v /∈ β, then β ∪ {v} is linearly dependent by Theorem 1.7 (p. 39) because span(β) = V.

Our next result shows that the converse of this statement is also true.

Theorem 1.12. Let V be a vector space and S a subset that generates V. If β is a maximal linearly independent subset of S, then β is a basis for V.

Proof. Let β be a maximal linearly independent subset of S. Because β is linearly independent, it suffices to prove that β generates V. We claim that S ⊆ span(β), for otherwise there exists a v ∈ S such that v /∈ span(β). Since Theorem 1.7 (p. 39) implies that β ∪ {v} is linearly independent, we have contradicted the maximality of β. Therefore S ⊆ span(β). Because span(S) = V, it follows from Theorem 1.5 (p. 30) that span(β) = V.

Thus a subset of a vector space is a basis if and only if it is a maximal linearly independent subset of the vector space. Therefore we can accomplish our goal of proving that every vector space has a basis by showing that every vector space contains a maximal linearly independent subset. This result follows immediately from the next theorem.

Theorem 1.13. Let S be a linearly independent subset of a vector space V. There exists a maximal linearly independent subset of V that contains S.

Proof. Let F denote the family of all linearly independent subsets of V that contain S. In order to show that F contains a maximal element, we must show that if C is a chain in F , then there exists a member U of F that contains each member of C. We claim that U , the union of the members of C, is the desired set. Clearly U contains each member of C, and so it suffices to prove that U ∈ F (i.e., that U is a linearly independent subset of V that contains S). Because each member of C is a subset of V containing S, we have S ⊆ U ⊆ V. Thus we need only prove that U is linearly independent. Let u1, u2, . . . , un be in U and a1, a2, . . . , an be scalars such that a1u1 + a2u2 + · · · + anun = 0. Because ui ∈ U for i = 1, 2, . . . , n, there exists a set Ai in C such that ui ∈ Ai. But since C is a chain, one of these sets, say Ak, contains all the others. Thus ui ∈ Ak for i = 1, 2, . . . , n. However, Ak is a linearly independent set; so a1u1 + a2u2 + · · · + anun = 0 implies that a1 = a2 = · · · = an = 0. It follows that U is linearly independent.

The maximal principle implies that F has a maximal element. This element is easily seen to be a maximal linearly independent subset of V that contains S.

Corollary. Every vector space has a basis.

It can be shown, analogously to Corollary 1 of the replacement theorem (p. 46), that every basis for an infinite-dimensional vector space has the same cardinality. (Sets have the same cardinality if there is a one-to-one and onto mapping between them.) (See, for example, N. Jacobson, Lectures in Abstract Algebra, vol. 2, Linear Algebra, D. Van Nostrand Company, New York, 1953, p. 240.)

Exercises 4–7 extend other results from Section 1.6 to infinite-dimensional vector spaces.

EXERCISES

1. Label the following statements as true or false.

(a) Every family of sets contains a maximal element.
(b) Every chain contains a maximal element.
(c) If a family of sets has a maximal element, then that maximal element is unique.
(d) If a chain of sets has a maximal element, then that maximal element is unique.
(e) A basis for a vector space is a maximal linearly independent subset of that vector space.
(f) A maximal linearly independent subset of a vector space is a basis for that vector space.

2. Show that the set of convergent sequences is an infinite-dimensional subspace of the vector space of all sequences of real numbers. (See Exercise 21 in Section 1.3.)

3. Let V be the set of real numbers regarded as a vector space over the field of rational numbers. Prove that V is infinite-dimensional. Hint: Use the fact that π is transcendental, that is, π is not a zero of any polynomial with rational coefficients.

4. Let W be a subspace of a (not necessarily finite-dimensional) vector space V. Prove that any basis for W is a subset of a basis for V.

5. Prove the following infinite-dimensional version of Theorem 1.8 (p. 43): Let β be a subset of an infinite-dimensional vector space V. Then β is a basis for V if and only if for each nonzero vector v in V, there exist unique vectors u1, u2, . . . , un in β and unique nonzero scalars c1, c2, . . . , cn such that v = c1u1 + c2u2 + · · · + cnun.

6. Prove the following generalization of Theorem 1.9 (p. 44): Let S1 and S2 be subsets of a vector space V such that S1 ⊆ S2. If S1 is linearly independent and S2 generates V, then there exists a basis β for V such that S1 ⊆ β ⊆ S2. Hint: Apply the maximal principle to the family of all linearly independent subsets of S2 that contain S1, and proceed as in the proof of Theorem 1.13.

7. Prove the following generalization of the replacement theorem. Let β be a basis for a vector space V, and let S be a linearly independent subset of V. There exists a subset S1 of β such that S ∪ S1 is a basis for V.

INDEX OF DEFINITIONS FOR CHAPTER 1

Additive inverse 12
Basis 43
Cancellation law 11
Column vector 8
Chain 59
Degree of a polynomial 9
Diagonal entries of a matrix 8
Diagonal matrix 18
Dimension 47
Finite-dimensional space 46
Generates 30
Infinite-dimensional space 47
Lagrange interpolation formula 52
Lagrange polynomials 52
Linear combination 24
Linearly dependent 36
Linearly independent 37
Matrix 8
Maximal element of a family of sets 58
Maximal linearly independent subset 59
n-tuple 7
Polynomial 9
Row vector 8
Scalar 7
Scalar multiplication 6
Sequence 11
Span of a subset 30
Spans 30
Square matrix 9
Standard basis for Fn 43
Standard basis for Pn(F ) 43
Subspace 16
Subspace generated by the elements of a set 30
Symmetric matrix 17
Trace 18
Transpose 17
Trivial representation of 0 36
Vector 7
Vector addition 6
Vector space 6
Zero matrix 8
Zero polynomial 9
Zero subspace 16
Zero vector 12
Zero vector space 15


2 Linear Transformations and Matrices

2.1 Linear Transformations, Null Spaces, and Ranges
2.2 The Matrix Representation of a Linear Transformation
2.3 Composition of Linear Transformations and Matrix Multiplication
2.4 Invertibility and Isomorphisms
2.5 The Change of Coordinate Matrix
2.6* Dual Spaces
2.7* Homogeneous Linear Differential Equations with Constant Coefficients

In Chapter 1, we developed the theory of abstract vector spaces in considerable detail. It is now natural to consider those functions defined on vector spaces that in some sense “preserve” the structure. These special functions are called linear transformations, and they abound in both pure and applied mathematics. In calculus, the operations of differentiation and integration provide us with two of the most important examples of linear transformations (see Examples 6 and 7 of Section 2.1). These two examples allow us to reformulate many of the problems in differential and integral equations in terms of linear transformations on particular vector spaces (see Sections 2.7 and 5.2).

In geometry, rotations, reflections, and projections (see Examples 2, 3, and 4 of Section 2.1) provide us with another class of linear transformations. Later we use these transformations to study rigid motions in Rn (Section 6.10).

In the remaining chapters, we see further examples of linear transformations occurring in both the physical and the social sciences. Throughout this chapter, we assume that all vector spaces are over a common field F .

2.1 LINEAR TRANSFORMATIONS, NULL SPACES, AND RANGES

In this section, we consider a number of examples of linear transformations. Many of these transformations are studied in more detail in later sections. Recall that a function T with domain V and codomain W is denoted by


T : V → W. (See Appendix B.)

Definition. Let V and W be vector spaces (over F ). We call a function T : V → W a linear transformation from V to W if, for all x, y ∈ V and c ∈ F , we have

(a) T(x + y) = T(x) + T(y) and
(b) T(cx) = cT(x).

If the underlying field F is the field of rational numbers, then (a) implies (b) (see Exercise 37), but, in general, (a) and (b) are logically independent. See Exercises 38 and 39.

We often simply call T linear. The reader should verify the following properties of a function T : V → W. (See Exercise 7.)

1. If T is linear, then T(0) = 0.
2. T is linear if and only if T(cx + y) = cT(x) + T(y) for all x, y ∈ V and c ∈ F .
3. If T is linear, then T(x − y) = T(x) − T(y) for all x, y ∈ V.
4. T is linear if and only if, for x1, x2, . . . , xn ∈ V and a1, a2, . . . , an ∈ F , we have

   T(a1x1 + a2x2 + · · · + anxn) = a1T(x1) + a2T(x2) + · · · + anT(xn).

We generally use property 2 to prove that a given transformation is linear.

Example 1

Define

T : R2 → R2 by T(a1, a2) = (2a1 + a2, a1).

To show that T is linear, let c ∈ R and x, y ∈ R2, where x = (b1, b2) and y = (d1, d2). Since

cx + y = (cb1 + d1, cb2 + d2),

we have

T(cx + y) = (2(cb1 + d1) + cb2 + d2, cb1 + d1).

Also

cT(x) + T(y) = c(2b1 + b2, b1) + (2d1 + d2, d1)
             = (2cb1 + cb2 + 2d1 + d2, cb1 + d1)
             = (2(cb1 + d1) + cb2 + d2, cb1 + d1).

So T is linear. ♦
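Property 2 also lends itself to a quick numerical spot-check before one writes out a proof. The short Python sketch below (an informal illustration, not a substitute for the argument above) tests T(cx + y) = cT(x) + T(y) for the map of Example 1 at a few randomly chosen scalars and vectors.

import random

# T from Example 1: T(a1, a2) = (2*a1 + a2, a1).
def T(v):
    a1, a2 = v
    return (2 * a1 + a2, a1)

# Spot-check property 2: T(c*x + y) == c*T(x) + T(y).
for _ in range(5):
    c = random.uniform(-10, 10)
    x = (random.uniform(-10, 10), random.uniform(-10, 10))
    y = (random.uniform(-10, 10), random.uniform(-10, 10))
    lhs = T((c * x[0] + y[0], c * x[1] + y[1]))
    rhs = (c * T(x)[0] + T(y)[0], c * T(x)[1] + T(y)[1])
    assert all(abs(l - r) < 1e-9 for l, r in zip(lhs, rhs))
print("property 2 holds at the sampled points")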


Figure 2.1: (a) Rotation, (b) Reflection, (c) Projection

As we will see in Chapter 6, the applications of linear algebra to geometry are wide and varied. The main reason for this is that most of the important geometrical transformations are linear. Three particular transformations that we now consider are rotation, reflection, and projection. We leave the proofs of linearity to the reader.

Example 2

For any angle θ, define Tθ : R2 → R2 by the rule: Tθ(a1, a2) is the vector obtained by rotating (a1, a2) counterclockwise by θ if (a1, a2) ≠ (0, 0), and Tθ(0, 0) = (0, 0). Then Tθ : R2 → R2 is a linear transformation that is called the rotation by θ.

We determine an explicit formula for Tθ. Fix a nonzero vector (a1, a2) ∈ R2. Let α be the angle that (a1, a2) makes with the positive x-axis (see Figure 2.1(a)), and let r = √(a1² + a2²). Then a1 = r cos α and a2 = r sin α. Also, Tθ(a1, a2) has length r and makes an angle α + θ with the positive x-axis. It follows that

Tθ(a1, a2) = (r cos(α + θ), r sin(α + θ))
           = (r cos α cos θ − r sin α sin θ, r cos α sin θ + r sin α cos θ)
           = (a1 cos θ − a2 sin θ, a1 sin θ + a2 cos θ).

Finally, observe that this same formula is valid for (a1, a2) = (0, 0). It is now easy to show, as in Example 1, that Tθ is linear. ♦
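The explicit formula for Tθ is easy to evaluate numerically. As a quick illustration (a sketch only, not part of the text), the following Python code applies the formula for θ = 90°, where the rotation should carry (1, 0) to (0, 1) and (0, 1) to (−1, 0).

import math

# Rotation by theta, using the formula derived in Example 2:
# T_theta(a1, a2) = (a1 cos(theta) - a2 sin(theta), a1 sin(theta) + a2 cos(theta)).
def rotate(theta, v):
    a1, a2 = v
    return (a1 * math.cos(theta) - a2 * math.sin(theta),
            a1 * math.sin(theta) + a2 * math.cos(theta))

theta = math.pi / 2            # 90 degrees
print(rotate(theta, (1, 0)))   # approximately (0, 1)
print(rotate(theta, (0, 1)))   # approximately (-1, 0)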

Example 3

Define T : R2 → R2 by T(a1, a2) = (a1,−a2). T is called the reflection about the x-axis. (See Figure 2.1(b).) ♦

Example 4

Define T : R2 → R2 by T(a1, a2) = (a1, 0). T is called the projection on the x-axis. (See Figure 2.1(c).) ♦


We now look at some additional examples of linear transformations.

Example 5

Define T : Mm×n(F ) → Mn×m(F ) by T(A) = At, where At is the transpose of A, defined in Section 1.3. Then T is a linear transformation by Exercise 3 of Section 1.3. ♦

Example 6

Define T : Pn(R) → Pn−1(R) by T(f(x)) = f′(x), where f′(x) denotes the derivative of f(x). To show that T is linear, let g(x), h(x) ∈ Pn(R) and a ∈ R. Now

T(ag(x) + h(x)) = (ag(x) + h(x))′ = ag′(x) + h′(x) = aT(g(x)) + T(h(x)).

So by property 2 above, T is linear. ♦

Example 7

Let V = C(R), the vector space of continuous real-valued functions on R. Let a, b ∈ R, a < b. Define T : V → R by

T(f) = ∫_a^b f(t) dt

for all f ∈ V. Then T is a linear transformation because the definite integral of a linear combination of functions is the same as the linear combination of the definite integrals of the functions. ♦

Two very important examples of linear transformations that appear frequently in the remainder of the book, and therefore deserve their own notation, are the identity and zero transformations.

For vector spaces V and W (over F ), we define the identity transformation IV : V → V by IV(x) = x for all x ∈ V and the zero transformation T0 : V → W by T0(x) = 0 for all x ∈ V. It is clear that both of these transformations are linear. We often write I instead of IV.

We now turn our attention to two very important sets associated with linear transformations: the range and null space. The determination of these sets allows us to examine more closely the intrinsic properties of a linear transformation.

Definitions. Let V and W be vector spaces, and let T : V → W be linear. We define the null space (or kernel) N(T) of T to be the set of all vectors x in V such that T(x) = 0 ; that is, N(T) = {x ∈ V : T(x) = 0}.

We define the range (or image) R(T) of T to be the subset of W consisting of all images (under T) of vectors in V; that is, R(T) = {T(x) : x ∈ V}.


Example 8

Let V and W be vector spaces, and let I : V → V and T0 : V → W be the identity and zero transformations, respectively. Then N(I) = {0}, R(I) = V, N(T0) = V, and R(T0) = {0}. ♦

Example 9

Let T : R3 → R2 be the linear transformation defined by

T(a1, a2, a3) = (a1 − a2, 2a3).

It is left as an exercise to verify that

N(T) = {(a, a, 0) : a ∈ R} and R(T) = R2. ♦
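The claims of Example 9 can also be checked with a small computation. In the sketch below (illustrative only, assuming NumPy is available), A is the matrix whose action on column vectors agrees with T; its rank gives dim(R(T)), and the vector (1, 1, 0) is verified to lie in N(T).

import numpy as np

# Matrix of T(a1, a2, a3) = (a1 - a2, 2*a3) acting on column vectors.
A = np.array([[1, -1, 0],
              [0,  0, 2]])

print(np.linalg.matrix_rank(A))   # 2, so R(T) is all of R^2
print(A @ np.array([1, 1, 0]))    # [0 0], so (1, 1, 0) lies in N(T)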

In Examples 8 and 9, we see that the range and null space of each of the linear transformations is a subspace. The next result shows that this is true in general.

Theorem 2.1. Let V and W be vector spaces and T : V → W be linear. Then N(T) and R(T) are subspaces of V and W, respectively.

Proof. To clarify the notation, we use the symbols 0V and 0W to denote the zero vectors of V and W, respectively.

Since T(0V) = 0W, we have that 0V ∈ N(T). Let x, y ∈ N(T) and c ∈ F . Then T(x + y) = T(x) + T(y) = 0W + 0W = 0W, and T(cx) = cT(x) = c0W = 0W. Hence x + y ∈ N(T) and cx ∈ N(T), so that N(T) is a subspace of V.

Because T(0V) = 0W, we have that 0W ∈ R(T). Now let x, y ∈ R(T) and c ∈ F . Then there exist v and w in V such that T(v) = x and T(w) = y. So T(v + w) = T(v) + T(w) = x + y, and T(cv) = cT(v) = cx. Thus x + y ∈ R(T) and cx ∈ R(T), so R(T) is a subspace of W.

The next theorem provides a method for finding a spanning set for the range of a linear transformation. With this accomplished, a basis for the range is easy to discover using the technique of Example 6 of Section 1.6.

Theorem 2.2. Let V and W be vector spaces, and let T : V → W be linear. If β = {v1, v2, . . . , vn} is a basis for V, then

R(T) = span(T(β)) = span({T(v1), T(v2), . . . ,T(vn)}).

Proof. Clearly T(vi) ∈ R(T) for each i. Because R(T) is a subspace, R(T) contains span({T(v1), T(v2), . . . ,T(vn)}) = span(T(β)) by Theorem 1.5 (p. 30).


Now suppose that w ∈ R(T). Then w = T(v) for some v ∈ V. Because β is a basis for V, we have

v = a1v1 + a2v2 + · · · + anvn for some a1, a2, . . . , an ∈ F.

Since T is linear, it follows that

w = T(v) = a1T(v1) + a2T(v2) + · · · + anT(vn) ∈ span(T(β)).

So R(T) is contained in span(T(β)).

It should be noted that Theorem 2.2 is true if β is infinite, that is, R(T) = span({T(v) : v ∈ β}). (See Exercise 33.)

The next example illustrates the usefulness of Theorem 2.2.

Example 10

Define the linear transformation T : P2(R) → M2×2(R) by

T(f(x)) = ⎛ f(1) − f(2)    0   ⎞
          ⎝      0       f(0)  ⎠ .

Since β = {1, x, x2} is a basis for P2(R), we have

R(T) = span(T(β)) = span({T(1), T(x), T(x2)})

     = span({ ⎛ 0  0 ⎞ , ⎛ −1  0 ⎞ , ⎛ −3  0 ⎞ })
              ⎝ 0  1 ⎠   ⎝  0  0 ⎠   ⎝  0  0 ⎠

     = span({ ⎛ 0  0 ⎞ , ⎛ −1  0 ⎞ }).
              ⎝ 0  1 ⎠   ⎝  0  0 ⎠

Thus we have found a basis for R(T), and so dim(R(T)) = 2. ♦

As in Chapter 1, we measure the “size” of a subspace by its dimension. The null space and range are so important that we attach special names to their respective dimensions.
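Theorem 2.2 also suggests a mechanical way to compute the dimension of the range: collect the coordinate vectors of the images of a basis and take the rank of the resulting matrix. The sketch below (illustrative only, assuming NumPy) does this for the transformation of Example 10, flattening each 2 × 2 matrix into a vector in R4.

import numpy as np

# Images of the basis {1, x, x^2} under T of Example 10, written as 2x2 matrices
# and flattened to vectors in R^4 (entries read row by row).
T1  = np.array([[0, 0], [0, 1]]).flatten()    # T(1)
Tx  = np.array([[-1, 0], [0, 0]]).flatten()   # T(x)
Tx2 = np.array([[-3, 0], [0, 0]]).flatten()   # T(x^2)

M = np.column_stack([T1, Tx, Tx2])
print(np.linalg.matrix_rank(M))               # 2 = dim(R(T)), since T(x^2) = 3 T(x)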

Definitions. Let V and W be vector spaces, and let T : V → W be linear. If N(T) and R(T) are finite-dimensional, then we define the nullity of T, denoted nullity(T), and the rank of T, denoted rank(T), to be the dimensions of N(T) and R(T), respectively.

Reflecting on the action of a linear transformation, we see intuitively that the larger the nullity, the smaller the rank. In other words, the more vectors that are carried into 0, the smaller the range. The same heuristic reasoning tells us that the larger the rank, the smaller the nullity. This balance between rank and nullity is made precise in the next theorem, appropriately called the dimension theorem.


Theorem 2.3 (Dimension Theorem). Let V and W be vector spaces, and let T : V → W be linear. If V is finite-dimensional, then

nullity(T) + rank(T) = dim(V).

Proof. Suppose that dim(V) = n, dim(N(T)) = k, and {v1, v2, . . . , vk} is a basis for N(T). By the corollary to Theorem 1.11 (p. 51), we may extend {v1, v2, . . . , vk} to a basis β = {v1, v2, . . . , vn} for V. We claim that S = {T(vk+1), T(vk+2), . . . ,T(vn)} is a basis for R(T).

First we prove that S generates R(T). Using Theorem 2.2 and the fact that T(vi) = 0 for 1 ≤ i ≤ k, we have

R(T) = span({T(v1), T(v2), . . . ,T(vn)}) = span({T(vk+1), T(vk+2), . . . ,T(vn)}) = span(S).

Now we prove that S is linearly independent. Suppose that

bk+1T(vk+1) + bk+2T(vk+2) + · · · + bnT(vn) = 0 for bk+1, bk+2, . . . , bn ∈ F.

Using the fact that T is linear, we have

T(bk+1vk+1 + bk+2vk+2 + · · · + bnvn) = 0.

So

bk+1vk+1 + bk+2vk+2 + · · · + bnvn ∈ N(T).

Hence there exist c1, c2, . . . , ck ∈ F such that

bk+1vk+1 + · · · + bnvn = c1v1 + · · · + ckvk, or (−c1)v1 + · · · + (−ck)vk + bk+1vk+1 + · · · + bnvn = 0.

Since β is a basis for V, we have bi = 0 for all i. Hence S is linearly independent. Notice that this argument also shows that T(vk+1), T(vk+2), . . . ,T(vn) are distinct; therefore rank(T) = n − k.

If we apply the dimension theorem to the linear transformation T in Example 9, we have that nullity(T) + 2 = 3, so nullity(T) = 1.

The reader should review the concepts of “one-to-one” and “onto” presented in Appendix B. Interestingly, for a linear transformation, both of these concepts are intimately connected to the rank and nullity of the transformation. This is demonstrated in the next two theorems.


Theorem 2.4. Let V and W be vector spaces, and let T : V → W be linear. Then T is one-to-one if and only if N(T) = {0}.

Proof. Suppose that T is one-to-one and x ∈ N(T). Then T(x) = 0 = T(0). Since T is one-to-one, we have x = 0. Hence N(T) = {0}.

Now assume that N(T) = {0}, and suppose that T(x) = T(y). Then 0 = T(x) − T(y) = T(x − y) by property 3 on page 65. Therefore x − y ∈ N(T) = {0}. So x − y = 0, or x = y. This means that T is one-to-one.

The reader should observe that Theorem 2.4 allows us to conclude that the transformation defined in Example 9 is not one-to-one.

Surprisingly, the conditions of one-to-one and onto are equivalent in an important special case.

Theorem 2.5. Let V and W be vector spaces of equal (finite) dimension, and let T : V → W be linear. Then the following are equivalent.

(a) T is one-to-one.
(b) T is onto.
(c) rank(T) = dim(V).

Proof. From the dimension theorem, we have

nullity(T) + rank(T) = dim(V).

Now, with the use of Theorem 2.4, we have that T is one-to-one if and only if N(T) = {0}, if and only if nullity(T) = 0, if and only if rank(T) = dim(V), if and only if rank(T) = dim(W), and if and only if dim(R(T)) = dim(W). By Theorem 1.11 (p. 50), this equality is equivalent to R(T) = W, the definition of T being onto.

We note that if V is not finite-dimensional and T : V → V is linear, then it does not follow that one-to-one and onto are equivalent. (See Exercises 15, 16, and 21.)

The linearity of T in Theorems 2.4 and 2.5 is essential, for it is easy to construct examples of functions from R into R that are not one-to-one, but are onto, and vice versa.

The next two examples make use of the preceding theorems in determining whether a given linear transformation is one-to-one or onto.

Example 11

Let T : P2(R) → P3(R) be the linear transformation defined by

T(f(x)) = 2f′(x) + ∫_0^x 3f(t) dt.


Now

R(T) = span({T(1), T(x), T(x2)}) = span({3x, 2 + (3/2)x2, 4x + x3}).

Since {3x, 2 + (3/2)x2, 4x + x3} is linearly independent, rank(T) = 3. Since dim(P3(R)) = 4, T is not onto. From the dimension theorem, nullity(T) + 3 = 3. So nullity(T) = 0, and therefore, N(T) = {0}. We conclude from Theorem 2.4 that T is one-to-one. ♦
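The same bookkeeping can be done with the matrix whose columns are the coordinate vectors of T(1), T(x), T(x2) relative to the standard ordered basis {1, x, x2, x3} of P3(R). In the sketch below (illustrative only, assuming NumPy), a rank of 3 confirms that T is one-to-one, while a rank smaller than dim(P3(R)) = 4 confirms that T is not onto.

import numpy as np

# Coordinate vectors of T(1) = 3x, T(x) = 2 + (3/2)x^2, T(x^2) = 4x + x^3
# relative to {1, x, x^2, x^3}, placed as columns.
cols = np.array([[0, 2,   0],
                 [3, 0,   4],
                 [0, 1.5, 0],
                 [0, 0,   1]])

print(np.linalg.matrix_rank(cols))   # 3: nullity(T) = 0, yet R(T) is a proper subspace of P3(R)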

Example 12

Let T : F2 → F2 be the linear transformation defined by

T(a1, a2) = (a1 + a2, a1).

It is easy to see that N(T) = {0}; so T is one-to-one. Hence Theorem 2.5 tells us that T must be onto. ♦

In Exercise 14, it is stated that if T is linear and one-to-one, then a subset S is linearly independent if and only if T(S) is linearly independent. Example 13 illustrates the use of this result.

Example 13

Let T : P2(R) → R3 be the linear transformation defined by

T(a0 + a1x + a2x2) = (a0, a1, a2).

Clearly T is linear and one-to-one. Let S = {2 − x + 3x2, x + x2, 1 − 2x2}. Then S is linearly independent in P2(R) because

T(S) = {(2,−1, 3), (0, 1, 1), (1, 0,−2)}

is linearly independent in R3. ♦

In Example 13, we transferred a property from the vector space of polynomials to a property in the vector space of 3-tuples. This technique is exploited more fully later.

One of the most important properties of a linear transformation is that it is completely determined by its action on a basis. This result, which follows from the next theorem and corollary, is used frequently throughout the book.

Theorem 2.6. Let V and W be vector spaces over F , and suppose that {v1, v2, . . . , vn} is a basis for V. For w1, w2, . . . , wn in W, there exists exactly one linear transformation T : V → W such that T(vi) = wi for i = 1, 2, . . . , n.


Proof. Let x ∈ V. Then

x = a1v1 + a2v2 + · · · + anvn,

where a1, a2, . . . , an are unique scalars. Define

T : V → W by T(x) = a1w1 + a2w2 + · · · + anwn.

(a) T is linear: Suppose that u, v ∈ V and d ∈ F . Then we may write

u = b1v1 + b2v2 + · · · + bnvn and v = c1v1 + c2v2 + · · · + cnvn

for some scalars b1, b2, . . . , bn, c1, c2, . . . , cn. Thus

du + v = (db1 + c1)v1 + (db2 + c2)v2 + · · · + (dbn + cn)vn.

So

T(du + v) = (db1 + c1)w1 + · · · + (dbn + cn)wn = d(b1w1 + · · · + bnwn) + (c1w1 + · · · + cnwn) = dT(u) + T(v).

(b) Clearly

T(vi) = wi for i = 1, 2, . . . , n.

(c) T is unique: Suppose that U : V → W is linear and U(vi) = wi for i = 1, 2, . . . , n. Then for x ∈ V with

x = a1v1 + a2v2 + · · · + anvn,

we have

U(x) = a1U(v1) + a2U(v2) + · · · + anU(vn) = a1w1 + a2w2 + · · · + anwn = T(x).

Hence U = T.

Corollary. Let V and W be vector spaces, and suppose that V has a finite basis {v1, v2, . . . , vn}. If U, T : V → W are linear and U(vi) = T(vi) for i = 1, 2, . . . , n, then U = T.


Example 14

Let T : R2 → R2 be the linear transformation defined by

T(a1, a2) = (2a2 − a1, 3a1),

and suppose that U : R2 → R2 is linear. If we know that U(1, 2) = (3, 3) and U(1, 1) = (1, 3), then U = T. This follows from the corollary and from the fact that {(1, 2), (1, 1)} is a basis for R2. ♦
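The uniqueness asserted by Theorem 2.6 can be seen concretely here: the two values U(1, 2) and U(1, 1) determine U everywhere. The sketch below (illustrative only, assuming NumPy) solves for the standard matrix M of U from those two values and recovers the matrix of T(a1, a2) = (2a2 − a1, 3a1).

import numpy as np

# U is linear with U(1, 2) = (3, 3) and U(1, 1) = (1, 3).  If M is the standard
# matrix of U, then M @ B = C with the known vectors placed as columns.
B = np.array([[1, 1],
              [2, 1]])     # columns (1, 2) and (1, 1): a basis for R^2
C = np.array([[3, 1],
              [3, 3]])     # columns U(1, 2) and U(1, 1)

M = C @ np.linalg.inv(B)
print(M)   # [[-1.  2.]
           #  [ 3.  0.]]  -- the matrix of T(a1, a2) = (2a2 - a1, 3a1)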

EXERCISES

1. Label the following statements as true or false. In each part, V and W are finite-dimensional vector spaces (over F ), and T is a function from V to W.

(a) If T is linear, then T preserves sums and scalar products.
(b) If T(x + y) = T(x) + T(y), then T is linear.
(c) T is one-to-one if and only if the only vector x such that T(x) = 0 is x = 0.
(d) If T is linear, then T(0V) = 0W.
(e) If T is linear, then nullity(T) + rank(T) = dim(W).
(f) If T is linear, then T carries linearly independent subsets of V onto linearly independent subsets of W.
(g) If T, U : V → W are both linear and agree on a basis for V, then T = U.
(h) Given x1, x2 ∈ V and y1, y2 ∈ W, there exists a linear transformation T : V → W such that T(x1) = y1 and T(x2) = y2.

For Exercises 2 through 6, prove that T is a linear transformation, and find bases for both N(T) and R(T). Then compute the nullity and rank of T, and verify the dimension theorem. Finally, use the appropriate theorems in this section to determine whether T is one-to-one or onto.

2. T : R3 → R2 defined by T(a1, a2, a3) = (a1 − a2, 2a3).

3. T : R2 → R3 defined by T(a1, a2) = (a1 + a2, 0, 2a1 − a2).

4. T : M2×3(F ) → M2×2(F ) defined by

   T ⎛ a11  a12  a13 ⎞ = ⎛ 2a11 − a12   a13 + 2a12 ⎞
     ⎝ a21  a22  a23 ⎠   ⎝      0            0     ⎠ .

5. T : P2(R) → P3(R) defined by T(f(x)) = xf(x) + f ′(x).


6. T : Mn×n(F ) → F defined by T(A) = tr(A). Recall (Example 4, Section 1.3) that

tr(A) = A11 + A22 + · · · + Ann.

7. Prove properties 1, 2, 3, and 4 on page 65.

8. Prove that the transformations in Examples 2 and 3 are linear.

9. In this exercise, T : R2 → R2 is a function. For each of the following parts, state why T is not linear.

(a) T(a1, a2) = (1, a2)
(b) T(a1, a2) = (a1, a1²)
(c) T(a1, a2) = (sin a1, 0)
(d) T(a1, a2) = (|a1|, a2)
(e) T(a1, a2) = (a1 + 1, a2)

10. Suppose that T : R2 → R2 is linear, T(1, 0) = (1, 4), and T(1, 1) = (2, 5).What is T(2, 3)? Is T one-to-one?

11. Prove that there exists a linear transformation T : R2 → R3 such thatT(1, 1) = (1, 0, 2) and T(2, 3) = (1,−1, 4). What is T(8, 11)?

12. Is there a linear transformation T : R3 → R2 such that T(1, 0, 3) = (1, 1)and T(−2, 0,−6) = (2, 1)?

13. Let V and W be vector spaces, let T : V → W be linear, and let{w1, w2, . . . , wk} be a linearly independent subset of R(T). Prove thatif S = {v1, v2, . . . , vk} is chosen so that T(vi) = wi for i = 1, 2, . . . , k,then S is linearly independent.

14. Let V and W be vector spaces and T : V → W be linear.

(a) Prove that T is one-to-one if and only if T carries linearly independent subsets of V onto linearly independent subsets of W.

(b) Suppose that T is one-to-one and that S is a subset of V. Prove that S is linearly independent if and only if T(S) is linearly independent.

(c) Suppose β = {v1, v2, . . . , vn} is a basis for V and T is one-to-one and onto. Prove that T(β) = {T(v1), T(v2), . . . ,T(vn)} is a basis for W.

15. Recall the definition of P(R) on page 10. Define

T : P(R) → P(R) by T(f(x)) = ∫_0^x f(t) dt.

Prove that T is linear and one-to-one, but not onto.


16. Let T : P(R) → P(R) be defined by T(f(x)) = f ′(x). Recall that T islinear. Prove that T is onto, but not one-to-one.

17. Let V and W be finite-dimensional vector spaces and T : V → W belinear.

(a) Prove that if dim(V) < dim(W), then T cannot be onto.
(b) Prove that if dim(V) > dim(W), then T cannot be one-to-one.

18. Give an example of a linear transformation T : R2 → R2 such thatN(T) = R(T).

19. Give an example of distinct linear transformations T and U such thatN(T) = N(U) and R(T) = R(U).

20. Let V and W be vector spaces with subspaces V1 and W1, respectively.If T : V → W is linear, prove that T(V1) is a subspace of W and that{x ∈ V : T(x) ∈ W1} is a subspace of V.

21. Let V be the vector space of sequences described in Example 5 of Sec-tion 1.2. Define the functions T, U : V → V by

T(a1, a2, . . .) = (a2, a3, . . .) and U(a1, a2, . . .) = (0, a1, a2, . . .).

T and U are called the left shift and right shift operators on V, respectively.

(a) Prove that T and U are linear.
(b) Prove that T is onto, but not one-to-one.
(c) Prove that U is one-to-one, but not onto.

22. Let T : R3 → R be linear. Show that there exist scalars a, b, and c suchthat T(x, y, z) = ax + by + cz for all (x, y, z) ∈ R3. Can you generalizethis result for T : Fn → F? State and prove an analogous result forT : Fn → Fm.

23. Let T : R3 → R be linear. Describe geometrically the possibilities forthe null space of T. Hint: Use Exercise 22.

The following definition is used in Exercises 24–27 and in Exercise 30.

Definition. Let V be a vector space and W1 and W2 be subspaces ofV such that V = W1 ⊕ W2. (Recall the definition of direct sum given in theexercises of Section 1.3.) A function T : V → V is called the projection onW1 along W2 if, for x = x1 + x2 with x1 ∈ W1 and x2 ∈ W2, we haveT(x) = x1.

24. Let T : R2 → R2. Include figures for each of the following parts.


(a) Find a formula for T(a, b), where T represents the projection onthe y-axis along the x-axis.

(b) Find a formula for T(a, b), where T represents the projection onthe y-axis along the line L = {(s, s) : s ∈ R}.

25. Let T : R3 → R3.

(a) If T(a, b, c) = (a, b, 0), show that T is the projection on the xy-plane along the z-axis.

(b) Find a formula for T(a, b, c), where T represents the projection onthe z-axis along the xy-plane.

(c) If T(a, b, c) = (a − c, b, 0), show that T is the projection on thexy-plane along the line L = {(a, 0, a) : a ∈ R}.

26. Using the notation in the definition above, assume that T : V → V is the projection on W1 along W2.

(a) Prove that T is linear and W1 = {x ∈ V : T(x) = x}.
(b) Prove that W1 = R(T) and W2 = N(T).
(c) Describe T if W1 = V.
(d) Describe T if W1 is the zero subspace.

27. Suppose that W is a subspace of a finite-dimensional vector space V.

(a) Prove that there exists a subspace W′ and a function T : V → Vsuch that T is a projection on W along W′.

(b) Give an example of a subspace W of a vector space V such thatthere are two projections on W along two (distinct) subspaces.

The following definitions are used in Exercises 28–32.

Definitions. Let V be a vector space, and let T : V → V be linear. Asubspace W of V is said to be T-invariant if T(x) ∈ W for every x ∈ W, thatis, T(W) ⊆ W. If W is T-invariant, we define the restriction of T on W tobe the function TW : W → W defined by TW(x) = T(x) for all x ∈ W.

Exercises 28–32 assume that W is a subspace of a vector space V and thatT : V → V is linear. Warning: Do not assume that W is T-invariant or thatT is a projection unless explicitly stated.

28. Prove that the subspaces {0}, V, R(T), and N(T) are all T-invariant.

29. If W is T-invariant, prove that TW is linear.

30. Suppose that T is the projection on W along some subspace W′. Provethat W is T-invariant and that TW = IW.

31. Suppose that V = R(T)⊕W and W is T-invariant. (Recall the definitionof direct sum given in the exercises of Section 1.3.)


(a) Prove that W ⊆ N(T).
(b) Show that if V is finite-dimensional, then W = N(T).
(c) Show by example that the conclusion of (b) is not necessarily true if V is not finite-dimensional.

32. Suppose that W is T-invariant. Prove that N(TW) = N(T) ∩ W andR(TW) = T(W).

33. Prove Theorem 2.2 for the case that β is infinite, that is, R(T) =span({T(v) : v ∈ β}).

34. Prove the following generalization of Theorem 2.6: Let V and W bevector spaces over a common field, and let β be a basis for V. Then forany function f : β → W there exists exactly one linear transformationT : V → W such that T(x) = f(x) for all x ∈ β.

Exercises 35 and 36 assume the definition of direct sum given in the exercisesof Section 1.3.

35. Let V be a finite-dimensional vector space and T : V → V be linear.

(a) Suppose that V = R(T) + N(T). Prove that V = R(T) ⊕ N(T).
(b) Suppose that R(T) ∩ N(T) = {0}. Prove that V = R(T) ⊕ N(T).

Be careful to say in each part where finite-dimensionality is used.

36. Let V and T be as defined in Exercise 21.

(a) Prove that V = R(T)+N(T), but V is not a direct sum of these twospaces. Thus the result of Exercise 35(a) above cannot be provedwithout assuming that V is finite-dimensional.

(b) Find a linear operator T1 on V such that R(T1)∩N(T1) = {0} butV is not a direct sum of R(T1) and N(T1). Conclude that V beingfinite-dimensional is also essential in Exercise 35(b).

37. A function T : V → W between vector spaces V and W is called additive if T(x + y) = T(x) + T(y) for all x, y ∈ V. Prove that if V and W are vector spaces over the field of rational numbers, then any additive function from V into W is a linear transformation.

38. Let T : C → C be the function defined by T(z) = z̄, the complex conjugate of z. Prove that T is additive (as defined in Exercise 37) but not linear.

39. Prove that there is an additive function T : R → R (as defined in Exercise 37) that is not linear. Hint: Let V be the set of real numbers regarded as a vector space over the field of rational numbers. By the corollary to Theorem 1.13 (p. 60), V has a basis β. Let x and y be two distinct vectors in β, and define f : β → V by f(x) = y, f(y) = x, and f(z) = z otherwise. By Exercise 34, there exists a linear transformation T : V → V such that T(u) = f(u) for all u ∈ β. Then T is additive, but for c = y/x, T(cx) ≠ cT(x).

The following exercise requires familiarity with the definition of quotient spacegiven in Exercise 31 of Section 1.3.

40. Let V be a vector space and W be a subspace of V. Define the mappingη : V → V/W by η(v) = v + W for v ∈ V.

(a) Prove that η is a linear transformation from V onto V/W and that N(η) = W.
(b) Suppose that V is finite-dimensional. Use (a) and the dimension theorem to derive a formula relating dim(V), dim(W), and dim(V/W).
(c) Read the proof of the dimension theorem. Compare the method of solving (b) with the method of deriving the same result as outlined in Exercise 35 of Section 1.6.

2.2 THE MATRIX REPRESENTATION OF A LINEAR TRANSFORMATION

Until now, we have studied linear transformations by examining their ranges and null spaces. In this section, we embark on one of the most useful approaches to the analysis of a linear transformation on a finite-dimensional vector space: the representation of a linear transformation by a matrix. In fact, we develop a one-to-one correspondence between matrices and linear transformations that allows us to utilize properties of one to study properties of the other.

We first need the concept of an ordered basis for a vector space.

Definition. Let V be a finite-dimensional vector space. An ordered basis for V is a basis for V endowed with a specific order; that is, an ordered basis for V is a finite sequence of linearly independent vectors in V that generates V.

Example 1

In F3, β = {e1, e2, e3} can be considered an ordered basis. Also γ = {e2, e1, e3} is an ordered basis, but β ≠ γ as ordered bases. ♦

For the vector space Fn, we call {e1, e2, . . . , en} the standard ordered basis for Fn. Similarly, for the vector space Pn(F ), we call {1, x, . . . , xn} the standard ordered basis for Pn(F ).

Now that we have the concept of ordered basis, we can identify abstract vectors in an n-dimensional vector space with n-tuples. This identification is provided through the use of coordinate vectors, as introduced next.


Definition. Let β = {u1, u2, . . . , un} be an ordered basis for a finite-dimensional vector space V. For x ∈ V, let a1, a2, . . . , an be the unique scalars such that

x = a1u1 + a2u2 + · · · + anun.

We define the coordinate vector of x relative to β, denoted [x]β , by

        ⎛ a1 ⎞
[x]β =  ⎜ a2 ⎟
        ⎜ ⋮  ⎟
        ⎝ an ⎠ .

Notice that [ui]β = ei in the preceding definition. It is left as an exercise to show that the correspondence x → [x]β provides us with a linear transformation from V to Fn. We study this transformation in Section 2.4 in more detail.

Example 2

Let V = P2(R), and let β = {1, x, x2} be the standard ordered basis for V. If f(x) = 4 + 6x − 7x2, then

        ⎛  4 ⎞
[f]β =  ⎜  6 ⎟
        ⎝ −7 ⎠ . ♦
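Finding a coordinate vector relative to a non-standard ordered basis of Fn amounts to solving one linear system. The following sketch (illustrative only, assuming NumPy; the basis used is a hypothetical example, not one from the text) recovers [x]β by solving Ba = x, where the columns of B are the vectors of β.

import numpy as np

# Coordinate vector of x relative to an ordered basis beta of F^n:
# solve B a = x, where the columns of B are the vectors of beta.
def coordinate_vector(beta, x):
    B = np.column_stack(beta)
    return np.linalg.solve(B, x)

# Hypothetical example in R^2: beta = {(1, 1), (1, -1)} and x = (4, 6).
beta = [np.array([1, 1]), np.array([1, -1])]
print(coordinate_vector(beta, np.array([4, 6])))   # [ 5. -1.], since 5(1,1) - 1(1,-1) = (4, 6)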

Let us now proceed with the promised matrix representation of a linear transformation. Suppose that V and W are finite-dimensional vector spaces with ordered bases β = {v1, v2, . . . , vn} and γ = {w1, w2, . . . , wm}, respectively. Let T : V → W be linear. Then for each j, 1 ≤ j ≤ n, there exist unique scalars aij ∈ F , 1 ≤ i ≤ m, such that

T(vj) = a1jw1 + a2jw2 + · · · + amjwm for 1 ≤ j ≤ n.

Definition. Using the notation above, we call the m × n matrix A defined by Aij = aij the matrix representation of T in the ordered bases β and γ and write A = [T]γβ . If V = W and β = γ, then we write A = [T]β .

Notice that the jth column of A is simply [T(vj)]γ . Also observe that if U : V → W is a linear transformation such that [U]γβ = [T]γβ , then U = T by the corollary to Theorem 2.6 (p. 73).

We illustrate the computation of [T]γβ in the next several examples.


Example 3

Let T : R2 → R3 be the linear transformation defined by

T(a1, a2) = (a1 + 3a2, 0, 2a1 − 4a2).

Let β and γ be the standard ordered bases for R2 and R3, respectively. Now

T(1, 0) = (1, 0, 2) = 1e1 + 0e2 + 2e3

and

T(0, 1) = (3, 0,−4) = 3e1 + 0e2 − 4e3.

Hence

        ⎛ 1  3 ⎞
[T]γβ = ⎜ 0  0 ⎟
        ⎝ 2 −4 ⎠ .

If we let γ′ = {e3, e2, e1}, then

         ⎛ 2 −4 ⎞
[T]γ′β = ⎜ 0  0 ⎟
         ⎝ 1  3 ⎠ . ♦

Example 4

Let T : P3(R) → P2(R) be the linear transformation defined by T(f(x)) = f′(x). Let β and γ be the standard ordered bases for P3(R) and P2(R), respectively. Then

T(1) = 0 ·1 + 0 ·x + 0 ·x2

T(x) = 1 ·1 + 0 ·x + 0 ·x2

T(x2) = 0 ·1 + 2 ·x + 0 ·x2

T(x3) = 0 ·1 + 0 ·x + 3 ·x2.

So

        ⎛ 0 1 0 0 ⎞
[T]γβ = ⎜ 0 0 2 0 ⎟
        ⎝ 0 0 0 3 ⎠ .

Note that when T(xj) is written as a linear combination of the vectors of γ, its coefficients give the entries of the jth column of [T]γβ . ♦
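The column-by-column recipe of Example 4 is easy to automate: apply T to each vector of the ordered basis β and record the coordinate vector of the result relative to γ. The sketch below (illustrative only, assuming NumPy) rebuilds the same 3 × 4 matrix by differentiating the coefficient vectors of 1, x, x2, x3.

import numpy as np

# A polynomial a0 + a1 x + a2 x^2 + a3 x^3 in P3(R) is stored as (a0, a1, a2, a3).
def derivative_coeffs(a):
    """Coefficients of f'(x), viewed as an element of P2(R)."""
    return np.array([a[1], 2 * a[2], 3 * a[3]])

# The columns of [T] are the images of the standard ordered basis 1, x, x^2, x^3.
columns = [derivative_coeffs(e) for e in np.eye(4)]
print(np.column_stack(columns))
# [[0. 1. 0. 0.]
#  [0. 0. 2. 0.]
#  [0. 0. 0. 3.]]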


Now that we have defined a procedure for associating matrices with linear transformations, we show in Theorem 2.8 that this association “preserves” addition and scalar multiplication. To make this more explicit, we need some preliminary discussion about the addition and scalar multiplication of linear transformations.

Definition. Let T, U : V → W be arbitrary functions, where V and W are vector spaces over F , and let a ∈ F . We define T + U : V → W by (T + U)(x) = T(x) + U(x) for all x ∈ V, and aT : V → W by (aT)(x) = aT(x) for all x ∈ V.

Of course, these are just the usual definitions of addition and scalar multiplication of functions. We are fortunate, however, to have the result that both sums and scalar multiples of linear transformations are also linear.

Theorem 2.7. Let V and W be vector spaces over a field F , and let T, U : V → W be linear.

(a) For all a ∈ F , aT + U is linear.
(b) Using the operations of addition and scalar multiplication in the preceding definition, the collection of all linear transformations from V to W is a vector space over F .

Proof. (a) Let x, y ∈ V and c ∈ F . Then

(aT + U)(cx + y) = aT(cx + y) + U(cx + y)
                 = a[T(cx + y)] + cU(x) + U(y)
                 = a[cT(x) + T(y)] + cU(x) + U(y)
                 = acT(x) + cU(x) + aT(y) + U(y)
                 = c(aT + U)(x) + (aT + U)(y).

So aT + U is linear.

(b) Noting that T0, the zero transformation, plays the role of the zero vector, it is easy to verify that the axioms of a vector space are satisfied, and hence that the collection of all linear transformations from V into W is a vector space over F .

Definitions. Let V and W be vector spaces over F . We denote the vector space of all linear transformations from V into W by L(V, W). In the case that V = W, we write L(V) instead of L(V, W).

In Section 2.4, we see a complete identification of L(V, W) with the vector space Mm×n(F ), where n and m are the dimensions of V and W, respectively. This identification is easily established by the use of the next theorem.

Theorem 2.8. Let V and W be finite-dimensional vector spaces with ordered bases β and γ, respectively, and let T, U : V → W be linear transformations. Then


(a) [T + U]γβ = [T]γβ + [U]γβ and

(b) [aT]γβ = a[T]γβ for all scalars a.

Proof. Let β = {v1, v2, . . . , vn} and γ = {w1, w2, . . . , wm}. There exist unique scalars aij and bij (1 ≤ i ≤ m, 1 ≤ j ≤ n) such that

T(vj) = a1jw1 + · · · + amjwm and U(vj) = b1jw1 + · · · + bmjwm for 1 ≤ j ≤ n.

Hence

(T + U)(vj) = (a1j + b1j)w1 + · · · + (amj + bmj)wm.

Thus

([T + U]γβ)ij = aij + bij = ([T]γβ + [U]γβ)ij .

So (a) is proved, and the proof of (b) is similar.

Example 5

Let T : R2 → R3 and U : R2 → R3 be the linear transformations respectively defined by

T(a1, a2) = (a1 + 3a2, 0, 2a1 − 4a2) and U(a1, a2) = (a1 − a2, 2a1, 3a1 + 2a2).

Let β and γ be the standard ordered bases of R2 and R3, respectively. Then

        ⎛ 1  3 ⎞
[T]γβ = ⎜ 0  0 ⎟
        ⎝ 2 −4 ⎠ ,

(as computed in Example 3), and

        ⎛ 1 −1 ⎞
[U]γβ = ⎜ 2  0 ⎟
        ⎝ 3  2 ⎠ .

If we compute T + U using the preceding definitions, we obtain

(T + U)(a1, a2) = (2a1 + 2a2, 2a1, 5a1 − 2a2).

So

            ⎛ 2  2 ⎞
[T + U]γβ = ⎜ 2  0 ⎟
            ⎝ 5 −2 ⎠ ,

which is simply [T]γβ + [U]γβ , illustrating Theorem 2.8. ♦
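Example 5 can be reproduced directly with matrices: form [T]γβ and [U]γβ column by column and check that their sum is the matrix of T + U. (A minimal sketch assuming NumPy.)

import numpy as np

# Matrices of T(a1, a2) = (a1 + 3a2, 0, 2a1 - 4a2) and U(a1, a2) = (a1 - a2, 2a1, 3a1 + 2a2)
# relative to the standard ordered bases of R^2 and R^3.
T = np.array([[1,  3],
              [0,  0],
              [2, -4]])
U = np.array([[1, -1],
              [2,  0],
              [3,  2]])

# (T + U)(a1, a2) = (2a1 + 2a2, 2a1, 5a1 - 2a2), whose matrix is the entrywise sum.
print(T + U)
# [[ 2  2]
#  [ 2  0]
#  [ 5 -2]]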


EXERCISES

1. Label the following statements as true or false. Assume that V and W are finite-dimensional vector spaces with ordered bases β and γ, respectively, and T, U : V → W are linear transformations.

(a) For any scalar a, aT + U is a linear transformation from V to W.
(b) [T]γβ = [U]γβ implies that T = U.
(c) If m = dim(V) and n = dim(W), then [T]γβ is an m × n matrix.
(d) [T + U]γβ = [T]γβ + [U]γβ .
(e) L(V, W) is a vector space.
(f) L(V, W) = L(W, V).

2. Let β and γ be the standard ordered bases for Rn and Rm, respectively. For each linear transformation T : Rn → Rm, compute [T]γβ .

(a) T : R2 → R3 defined by T(a1, a2) = (2a1 − a2, 3a1 + 4a2, a1).
(b) T : R3 → R2 defined by T(a1, a2, a3) = (2a1 + 3a2 − a3, a1 + a3).
(c) T : R3 → R defined by T(a1, a2, a3) = 2a1 + a2 − 3a3.
(d) T : R3 → R3 defined by T(a1, a2, a3) = (2a2 + a3,−a1 + 4a2 + 5a3, a1 + a3).
(e) T : Rn → Rn defined by T(a1, a2, . . . , an) = (a1, a1, . . . , a1).
(f) T : Rn → Rn defined by T(a1, a2, . . . , an) = (an, an−1, . . . , a1).
(g) T : Rn → R defined by T(a1, a2, . . . , an) = a1 + an.

3. Let T : R2 → R3 be defined by T(a1, a2) = (a1 − a2, a1, 2a1 + a2). Let βbe the standard ordered basis for R2 and γ = {(1, 1, 0), (0, 1, 1), (2, 2, 3)}.Compute [T]γβ . If α = {(1, 2), (2, 3)}, compute [T]γα.

4. Define T : M2×2(R) → P2(R) by

       T ⎛a  b⎞ = (a + b) + (2d)x + bx².
         ⎝c  d⎠

   Let

       β = { ⎛1  0⎞ , ⎛0  1⎞ , ⎛0  0⎞ , ⎛0  0⎞ }   and   γ = {1, x, x²}.
             ⎝0  0⎠   ⎝0  0⎠   ⎝1  0⎠   ⎝0  1⎠

   Compute [T]^γ_β.

5. Let

       α = { ⎛1  0⎞ , ⎛0  1⎞ , ⎛0  0⎞ , ⎛0  0⎞ } ,   β = {1, x, x²},   and   γ = {1}.
             ⎝0  0⎠   ⎝0  0⎠   ⎝1  0⎠   ⎝0  1⎠

   (a) Define T : M2×2(F) → M2×2(F) by T(A) = Aᵗ. Compute [T]_α.
   (b) Define T : P2(R) → M2×2(R) by

           T(f(x)) = ⎛f′(0)  2f(1)⎞ ,
                     ⎝  0    f″(3)⎠

       where ′ denotes differentiation. Compute [T]^α_β.
   (c) Define T : M2×2(F) → F by T(A) = tr(A). Compute [T]^γ_α.
   (d) Define T : P2(R) → R by T(f(x)) = f(2). Compute [T]^γ_β.
   (e) If

           A = ⎛1  −2⎞ ,
               ⎝0   4⎠

       compute [A]_α.
   (f) If f(x) = 3 − 6x + x², compute [f(x)]_β.
   (g) For a ∈ F, compute [a]_γ.

6. Complete the proof of part (b) of Theorem 2.7.

7. Prove part (b) of Theorem 2.8.

8.† Let V be an n-dimensional vector space with an ordered basis β. Define T : V → Fn by T(x) = [x]_β. Prove that T is linear.

9. Let V be the vector space of complex numbers over the field R. Define T : V → V by T(z) = z̄, where z̄ is the complex conjugate of z. Prove that T is linear, and compute [T]_β, where β = {1, i}. (Recall by Exercise 38 of Section 2.1 that T is not linear if V is regarded as a vector space over the field C.)

10. Let V be a vector space with the ordered basis β = {v1, v2, . . . , vn}. Define v0 = 0. By Theorem 2.6 (p. 72), there exists a linear transformation T : V → V such that T(vj) = vj + vj−1 for j = 1, 2, . . . , n. Compute [T]_β.

11. Let V be an n-dimensional vector space, and let T : V → V be a linear transformation. Suppose that W is a T-invariant subspace of V (see the exercises of Section 2.1) having dimension k. Show that there is a basis β for V such that [T]_β has the form

        ⎛A  B⎞ ,
        ⎝O  C⎠

    where A is a k × k matrix and O is the (n − k) × k zero matrix.

Page 101: Linear Algebra - cloudflare-ipfs.com

86 Chap. 2 Linear Transformations and Matrices

12. Let V be a finite-dimensional vector space and T be the projection on W along W′, where W and W′ are subspaces of V. (See the definition in the exercises of Section 2.1 on page 76.) Find an ordered basis β for V such that [T]_β is a diagonal matrix.

13. Let V and W be vector spaces, and let T and U be nonzero linear transformations from V into W. If R(T) ∩ R(U) = {0}, prove that {T, U} is a linearly independent subset of L(V, W).

14. Let V = P(R), and for j ≥ 1 define Tj(f(x)) = f^(j)(x), where f^(j)(x) is the jth derivative of f(x). Prove that the set {T1, T2, . . . , Tn} is a linearly independent subset of L(V) for any positive integer n.

15. Let V and W be vector spaces, and let S be a subset of V. Define S⁰ = {T ∈ L(V, W) : T(x) = 0 for all x ∈ S}. Prove the following statements.
    (a) S⁰ is a subspace of L(V, W).
    (b) If S1 and S2 are subsets of V and S1 ⊆ S2, then S2⁰ ⊆ S1⁰.
    (c) If V1 and V2 are subspaces of V, then (V1 + V2)⁰ = V1⁰ ∩ V2⁰.

16. Let V and W be vector spaces such that dim(V) = dim(W), and let T : V → W be linear. Show that there exist ordered bases β and γ for V and W, respectively, such that [T]^γ_β is a diagonal matrix.

2.3 COMPOSITION OF LINEAR TRANSFORMATIONS AND MATRIX MULTIPLICATION

In Section 2.2, we learned how to associate a matrix with a linear transformation in such a way that both sums and scalar multiples of matrices are associated with the corresponding sums and scalar multiples of the transformations. The question now arises as to how the matrix representation of a composite of linear transformations is related to the matrix representation of each of the associated linear transformations. The attempt to answer this question leads to a definition of matrix multiplication. We use the more convenient notation of UT rather than U ◦ T for the composite of linear transformations U and T. (See Appendix B.)

Our first result shows that the composite of linear transformations is linear.

Theorem 2.9. Let V, W, and Z be vector spaces over the same field F, and let T : V → W and U : W → Z be linear. Then UT : V → Z is linear.

Proof. Let x, y ∈ V and a ∈ F . Then

UT(ax + y) = U(T(ax + y)) = U(aT(x) + T(y))

= aU(T(x)) + U(T(y)) = a(UT)(x) + UT(y).


The following theorem lists some of the properties of the composition of linear transformations.

Theorem 2.10. Let V be a vector space. Let T, U1, U2 ∈ L(V). Then
(a) T(U1 + U2) = TU1 + TU2 and (U1 + U2)T = U1T + U2T
(b) T(U1U2) = (TU1)U2
(c) TI = IT = T
(d) a(U1U2) = (aU1)U2 = U1(aU2) for all scalars a.

Proof. Exercise.

A more general result holds for linear transformations that have domains unequal to their codomains. (See Exercise 8.)

Let T : V → W and U : W → Z be linear transformations, and let A = [U]^γ_β and B = [T]^β_α, where α = {v1, v2, . . . , vn}, β = {w1, w2, . . . , wm}, and γ = {z1, z2, . . . , zp} are ordered bases for V, W, and Z, respectively. We would like to define the product AB of two matrices so that AB = [UT]^γ_α. Consider the matrix [UT]^γ_α. For 1 ≤ j ≤ n, we have

    (UT)(vj) = U(T(vj)) = U( Σ_{k=1}^{m} Bkj wk ) = Σ_{k=1}^{m} Bkj U(wk)

             = Σ_{k=1}^{m} Bkj ( Σ_{i=1}^{p} Aik zi ) = Σ_{i=1}^{p} ( Σ_{k=1}^{m} Aik Bkj ) zi

             = Σ_{i=1}^{p} Cij zi,

where

    Cij = Σ_{k=1}^{m} Aik Bkj.

This computation motivates the following definition of matrix multiplication.

Definition. Let A be an m × n matrix and B be an n × p matrix. We define the product of A and B, denoted AB, to be the m × p matrix such that

    (AB)ij = Σ_{k=1}^{n} Aik Bkj   for 1 ≤ i ≤ m, 1 ≤ j ≤ p.

Note that (AB)ij is the sum of products of corresponding entries from the ith row of A and the jth column of B. Some interesting applications of this definition are presented at the end of this section.


The reader should observe that in order for the product AB to be defined, there are restrictions regarding the relative sizes of A and B. The following mnemonic device is helpful: “(m × n) · (n × p) = (m × p)”; that is, in order for the product AB to be defined, the two “inner” dimensions must be equal, and the two “outer” dimensions yield the size of the product.

Example 1

We have

    ⎛1  2   1⎞ ⎛4⎞   ⎛1·4 + 2·2 + 1·5     ⎞   ⎛13⎞
    ⎝0  4  −1⎠ ⎜2⎟ = ⎝0·4 + 4·2 + (−1)·5 ⎠ = ⎝ 3⎠ .
                ⎝5⎠

Notice again the symbolic relationship (2 × 3) · (3 × 1) = 2 × 1. ♦

As in the case with composition of functions, we have that matrix multiplication is not commutative. Consider the following two products:

    ⎛1  1⎞ ⎛0  1⎞ = ⎛1  1⎞    and    ⎛0  1⎞ ⎛1  1⎞ = ⎛0  0⎞ .
    ⎝0  0⎠ ⎝1  0⎠   ⎝0  0⎠           ⎝1  0⎠ ⎝0  0⎠   ⎝1  1⎠

Hence we see that even if both of the matrix products AB and BA are defined, it need not be true that AB = BA.
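Both the product in Example 1 and the failure of commutativity are easy to reproduce numerically. The following is an illustrative sketch only, assuming NumPy is available; none of it is part of the text.

    # Matrix products from Example 1 and the non-commuting pair above.
    import numpy as np

    A = np.array([[1, 2, 1], [0, 4, -1]])
    B = np.array([[4], [2], [5]])
    print(A @ B)        # [[13], [3]] -- a (2 x 3)(3 x 1) = 2 x 1 product

    C = np.array([[1, 1], [0, 0]])
    D = np.array([[0, 1], [1, 0]])
    print(C @ D)        # [[1, 1], [0, 0]]
    print(D @ C)        # [[0, 0], [1, 1]] -- so CD != DC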

Recalling the definition of the transpose of a matrix from Section 1.3, we show that if A is an m × n matrix and B is an n × p matrix, then (AB)ᵗ = BᵗAᵗ. Since

    ((AB)ᵗ)ij = (AB)ji = Σ_{k=1}^{n} Ajk Bki

and

    (BᵗAᵗ)ij = Σ_{k=1}^{n} (Bᵗ)ik (Aᵗ)kj = Σ_{k=1}^{n} Bki Ajk,

we are finished. Therefore the transpose of a product is the product of the transposes in the opposite order.
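The rule (AB)ᵗ = BᵗAᵗ is easy to spot-check on random matrices; the sketch below assumes NumPy and is purely illustrative.

    # Spot-check (AB)^t = B^t A^t on randomly chosen integer matrices.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(2, 3))
    B = rng.integers(-5, 5, size=(3, 4))

    assert np.array_equal((A @ B).T, B.T @ A.T)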

The next theorem is an immediate consequence of our definition of matrix multiplication.

Theorem 2.11. Let V, W, and Z be finite-dimensional vector spaces with ordered bases α, β, and γ, respectively. Let T : V → W and U : W → Z be linear transformations. Then

    [UT]^γ_α = [U]^γ_β [T]^β_α.


Corollary. Let V be a finite-dimensional vector space with an ordered basis β. Let T, U ∈ L(V). Then [UT]_β = [U]_β [T]_β.

We illustrate Theorem 2.11 in the next example.

Example 2

Let U : P3(R) → P2(R) and T : P2(R) → P3(R) be the linear transformations respectively defined by

    U(f(x)) = f′(x)   and   T(f(x)) = ∫₀ˣ f(t) dt.

Let α and β be the standard ordered bases of P3(R) and P2(R), respectively. From calculus, it follows that UT = I, the identity transformation on P2(R). To illustrate Theorem 2.11, observe that

    [UT]_β = [U]^β_α [T]^α_β = ⎛0  1  0  0⎞ ⎛0    0    0 ⎞   ⎛1  0  0⎞
                               ⎜0  0  2  0⎟ ⎜1    0    0 ⎟ = ⎜0  1  0⎟ = [I]_β. ♦
                               ⎝0  0  0  3⎠ ⎜0   1/2   0 ⎟   ⎝0  0  1⎠
                                             ⎝0    0   1/3⎠
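The matrix product in Example 2 can be verified directly; the following sketch assumes NumPy and simply multiplies the two representations with respect to the standard bases.

    # [U][T] = I for differentiation U: P3(R) -> P2(R) and integration T: P2(R) -> P3(R).
    import numpy as np

    U = np.array([[0, 1, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 3]], dtype=float)
    T = np.array([[0,   0,   0  ],
                  [1,   0,   0  ],
                  [0, 1/2,   0  ],
                  [0,   0,  1/3]])

    print(U @ T)    # the 3 x 3 identity matrix, as Theorem 2.11 predicts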

The preceding 3 × 3 diagonal matrix is called an identity matrix and is defined next, along with a very useful notation, the Kronecker delta.

Definitions. We define the Kronecker delta δij by δij = 1 if i = j and δij = 0 if i ≠ j. The n × n identity matrix In is defined by (In)ij = δij.

Thus, for example,

    I1 = (1),   I2 = ⎛1  0⎞ ,   and   I3 = ⎛1  0  0⎞
                     ⎝0  1⎠                ⎜0  1  0⎟
                                           ⎝0  0  1⎠ .

The next theorem provides analogs of (a), (c), and (d) of Theorem 2.10. Theorem 2.10(b) has its analog in Theorem 2.16. Observe also that part (c) of the next theorem illustrates that the identity matrix acts as a multiplicative identity in Mn×n(F). When the context is clear, we sometimes omit the subscript n from In.

Theorem 2.12. Let A be an m × n matrix, B and C be n × p matrices, and D and E be q × m matrices. Then
(a) A(B + C) = AB + AC and (D + E)A = DA + EA.
(b) a(AB) = (aA)B = A(aB) for any scalar a.
(c) Im A = A = A In.
(d) If V is an n-dimensional vector space with an ordered basis β, then [IV]_β = In.


Proof. We prove the first half of (a) and (c) and leave the remaining proofs as an exercise. (See Exercise 5.)

(a) We have

    [A(B + C)]ij = Σ_{k=1}^{n} Aik (B + C)kj = Σ_{k=1}^{n} Aik (Bkj + Ckj)

                 = Σ_{k=1}^{n} (Aik Bkj + Aik Ckj) = Σ_{k=1}^{n} Aik Bkj + Σ_{k=1}^{n} Aik Ckj

                 = (AB)ij + (AC)ij = [AB + AC]ij.

So A(B + C) = AB + AC.

(c) We have

    (Im A)ij = Σ_{k=1}^{m} (Im)ik Akj = Σ_{k=1}^{m} δik Akj = Aij.

Corollary. Let A be an m × n matrix, B1, B2, . . . , Bk be n × p matrices, C1, C2, . . . , Ck be q × m matrices, and a1, a2, . . . , ak be scalars. Then

    A ( Σ_{i=1}^{k} ai Bi ) = Σ_{i=1}^{k} ai A Bi

and

    ( Σ_{i=1}^{k} ai Ci ) A = Σ_{i=1}^{k} ai Ci A.

Proof. Exercise.

For an n × n matrix A, we define A¹ = A, A² = AA, A³ = A²A, and, in general, Aᵏ = Aᵏ⁻¹A for k = 2, 3, . . . . We define A⁰ = In.

With this notation, we see that if

    A = ⎛0  0⎞ ,
        ⎝1  0⎠

then A² = O (the zero matrix) even though A ≠ O. Thus the cancellation property for multiplication in fields is not valid for matrices. To see why, assume that the cancellation law is valid. Then, from A·A = A² = O = A·O, we would conclude that A = O, which is false.

Theorem 2.13. Let A be an m × n matrix and B be an n × p matrix. For each j (1 ≤ j ≤ p) let uj and vj denote the jth columns of AB and B, respectively. Then
(a) uj = Avj
(b) vj = Bej, where ej is the jth standard vector of Fᵖ.

Proof. (a) We have

         ⎛(AB)1j⎞   ⎛Σ_{k=1}^{n} A1k Bkj⎞       ⎛B1j⎞
    uj = ⎜(AB)2j⎟ = ⎜Σ_{k=1}^{n} A2k Bkj⎟  =  A ⎜B2j⎟ = Avj.
         ⎜  ⋮   ⎟   ⎜         ⋮         ⎟       ⎜ ⋮ ⎟
         ⎝(AB)mj⎠   ⎝Σ_{k=1}^{n} Amk Bkj⎠       ⎝Bnj⎠

Hence (a) is proved. The proof of (b) is left as an exercise. (See Exercise 6.)

It follows (see Exercise 14) from Theorem 2.13 that column j of AB is a linear combination of the columns of A with the coefficients in the linear combination being the entries of column j of B. An analogous result holds for rows; that is, row i of AB is a linear combination of the rows of B with the coefficients in the linear combination being the entries of row i of A.

The next result justifies much of our past work. It utilizes both the matrix representation of a linear transformation and matrix multiplication in order to evaluate the transformation at any given vector.

Theorem 2.14. Let V and W be finite-dimensional vector spaces having ordered bases β and γ, respectively, and let T : V → W be linear. Then, for each u ∈ V, we have

    [T(u)]_γ = [T]^γ_β [u]_β.

Proof. Fix u ∈ V, and define the linear transformations f : F → V by f(a) = au and g : F → W by g(a) = aT(u) for all a ∈ F. Let α = {1} be the standard ordered basis for F. Notice that g = Tf. Identifying column vectors as matrices and using Theorem 2.11, we obtain

    [T(u)]_γ = [g(1)]_γ = [g]^γ_α = [Tf]^γ_α = [T]^γ_β [f]^β_α = [T]^γ_β [f(1)]_β = [T]^γ_β [u]_β.

Example 3

Let T : P3(R) → P2(R) be the linear transformation defined by T(f(x)) = f′(x), and let β and γ be the standard ordered bases for P3(R) and P2(R), respectively. If A = [T]^γ_β, then, from Example 4 of Section 2.2, we have

    A = ⎛0  1  0  0⎞
        ⎜0  0  2  0⎟
        ⎝0  0  0  3⎠ .


We illustrate Theorem 2.14 by verifying that [T(p(x))]_γ = [T]^γ_β [p(x)]_β, where p(x) ∈ P3(R) is the polynomial p(x) = 2 − 4x + x² + 3x³. Let q(x) = T(p(x)); then q(x) = p′(x) = −4 + 2x + 9x². Hence

    [T(p(x))]_γ = [q(x)]_γ = ⎛−4⎞
                             ⎜ 2⎟
                             ⎝ 9⎠ ,

but also

    [T]^γ_β [p(x)]_β = A[p(x)]_β = ⎛0  1  0  0⎞ ⎛ 2⎞   ⎛−4⎞
                                   ⎜0  0  2  0⎟ ⎜−4⎟ = ⎜ 2⎟ . ♦
                                   ⎝0  0  0  3⎠ ⎜ 1⎟   ⎝ 9⎠
                                                 ⎝ 3⎠
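The verification in Example 3 amounts to one matrix–vector product; the following sketch, assuming NumPy, repeats it with the coordinate vectors taken in the standard bases.

    # [T(p(x))]_gamma = [T]^gamma_beta [p(x)]_beta for p(x) = 2 - 4x + x^2 + 3x^3.
    import numpy as np

    A = np.array([[0, 1, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 3]])      # [T] for differentiation P3(R) -> P2(R)
    p = np.array([2, -4, 1, 3])       # [p(x)]_beta

    print(A @ p)                      # [-4, 2, 9] = coordinates of p'(x)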

We complete this section with the introduction of the left-multiplication transformation LA, where A is an m × n matrix. This transformation is probably the most important tool for transferring properties about transformations to analogous properties about matrices and vice versa. For example, we use it to prove that matrix multiplication is associative.

Definition. Let A be an m × n matrix with entries from a field F. We denote by LA the mapping LA : Fⁿ → Fᵐ defined by LA(x) = Ax (the matrix product of A and x) for each column vector x ∈ Fⁿ. We call LA a left-multiplication transformation.

Example 4

Let

    A = ⎛1  2  1⎞ .
        ⎝0  1  2⎠

Then A ∈ M2×3(R) and LA : R3 → R2. If

    x = ⎛ 1⎞ ,
        ⎜ 3⎟
        ⎝−1⎠

then

    LA(x) = Ax = ⎛1  2  1⎞ ⎛ 1⎞   ⎛6⎞
                 ⎝0  1  2⎠ ⎜ 3⎟ = ⎝1⎠ . ♦
                            ⎝−1⎠

We see in the next theorem that not only is LA linear, but, in fact, it has a great many other useful properties. These properties are all quite natural and so are easy to remember.


Theorem 2.15. Let A be an m × n matrix with entries from F. Then the left-multiplication transformation LA : Fⁿ → Fᵐ is linear. Furthermore, if B is any other m × n matrix (with entries from F) and β and γ are the standard ordered bases for Fⁿ and Fᵐ, respectively, then we have the following properties.
(a) [LA]^γ_β = A.
(b) LA = LB if and only if A = B.
(c) LA+B = LA + LB and LaA = aLA for all a ∈ F.
(d) If T : Fⁿ → Fᵐ is linear, then there exists a unique m × n matrix C such that T = LC. In fact, C = [T]^γ_β.
(e) If E is an n × p matrix, then LAE = LALE.
(f) If m = n, then LIn = IFn.

Proof. The fact that LA is linear follows immediately from Theorem 2.12.
(a) The jth column of [LA]^γ_β is equal to LA(ej). However LA(ej) = Aej, which is also the jth column of A by Theorem 2.13(b). So [LA]^γ_β = A.
(b) If LA = LB, then we may use (a) to write A = [LA]^γ_β = [LB]^γ_β = B. Hence A = B. The proof of the converse is trivial.
(c) The proof is left as an exercise. (See Exercise 7.)
(d) Let C = [T]^γ_β. By Theorem 2.14, we have [T(x)]_γ = [T]^γ_β [x]_β, or T(x) = Cx = LC(x) for all x ∈ Fⁿ. So T = LC. The uniqueness of C follows from (b).
(e) For any j (1 ≤ j ≤ p), we may apply Theorem 2.13 several times to note that (AE)ej is the jth column of AE and that the jth column of AE is also equal to A(Eej). So (AE)ej = A(Eej). Thus

    LAE(ej) = (AE)ej = A(Eej) = LA(Eej) = LA(LE(ej)).

Hence LAE = LALE by the corollary to Theorem 2.6 (p. 73).
(f) The proof is left as an exercise. (See Exercise 7.)

We now use left-multiplication transformations to establish the associativity of matrix multiplication.

Theorem 2.16. Let A, B, and C be matrices such that A(BC) is defined. Then (AB)C is also defined and A(BC) = (AB)C; that is, matrix multiplication is associative.

Proof. It is left to the reader to show that (AB)C is defined. Using (e) of Theorem 2.15 and the associativity of functional composition (see Appendix B), we have

    LA(BC) = LALBC = LA(LBLC) = (LALB)LC = LABLC = L(AB)C.

So from (b) of Theorem 2.15, it follows that A(BC) = (AB)C.


Needless to say, this theorem could be proved directly from the definition of matrix multiplication (see Exercise 18). The proof above, however, provides a prototype of many of the arguments that utilize the relationships between linear transformations and matrices.

Applications

A large and varied collection of interesting applications arises in connection with special matrices called incidence matrices. An incidence matrix is a square matrix in which all the entries are either zero or one and, for convenience, all the diagonal entries are zero. If we have a relationship on a set of n objects that we denote by 1, 2, . . . , n, then we define the associated incidence matrix A by Aij = 1 if i is related to j, and Aij = 0 otherwise.

To make things concrete, suppose that we have four people, each of whom owns a communication device. If the relationship on this group is “can transmit to,” then Aij = 1 if i can send a message to j, and Aij = 0 otherwise. Suppose that

    A = ⎛0  1  0  0⎞
        ⎜1  0  0  1⎟
        ⎜0  1  0  1⎟
        ⎝1  1  0  0⎠ .

Then since A34 = 1 and A14 = 0, we see that person 3 can send to 4 but 1 cannot send to 4.

We obtain an interesting interpretation of the entries of A². Consider, for instance,

    (A²)31 = A31A11 + A32A21 + A33A31 + A34A41.

Note that any term A3kAk1 equals 1 if and only if both A3k and Ak1 equal 1, that is, if and only if 3 can send to k and k can send to 1. Thus (A²)31 gives the number of ways in which 3 can send to 1 in two stages (or in one relay). Since

    A² = ⎛1  0  0  1⎞
         ⎜1  2  0  0⎟
         ⎜2  1  0  1⎟
         ⎝1  1  0  1⎠ ,

we see that there are two ways 3 can send to 1 in two stages. In general, (A + A² + · · · + Aᵐ)ij is the number of ways in which i can send to j in at most m stages.

A maximal collection of three or more people with the property that any two can send to each other is called a clique. The problem of determining cliques is difficult, but there is a simple method for determining if someone belongs to a clique. If we define a new matrix B by Bij = 1 if i and j can send to each other, and Bij = 0 otherwise, then it can be shown (see Exercise 19) that person i belongs to a clique if and only if (B³)ii > 0. For example, suppose that the incidence matrix associated with some relationship is

    A = ⎛0  1  0  1⎞
        ⎜1  0  1  0⎟
        ⎜1  1  0  1⎟
        ⎝1  1  1  0⎠ .

To determine which people belong to cliques, we form the matrix B, described earlier, and compute B³. In this case,

    B = ⎛0  1  0  1⎞          B³ = ⎛0  4  0  4⎞
        ⎜1  0  1  0⎟    and        ⎜4  0  4  0⎟
        ⎜0  1  0  1⎟               ⎜0  4  0  4⎟
        ⎝1  0  1  0⎠               ⎝4  0  4  0⎠ .

Since all the diagonal entries of B³ are zero, we conclude that there are no cliques in this relationship.
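The clique test is quick to carry out by machine; the sketch below assumes NumPy and simply examines the diagonal of B³ for the matrix B above.

    # Clique test: person i belongs to a clique iff (B^3)_ii > 0.
    import numpy as np

    B = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])

    B3 = np.linalg.matrix_power(B, 3)
    print(np.diag(B3))   # [0, 0, 0, 0]: no positive diagonal entry, so no cliques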

Our final example of the use of incidence matrices is concerned with the concept of dominance. A relation among a group of people is called a dominance relation if the associated incidence matrix A has the property that for all distinct pairs i and j, Aij = 1 if and only if Aji = 0; that is, given any two people, exactly one of them dominates (or, using the terminology of our first example, can send a message to) the other. Since A is an incidence matrix, Aii = 0 for all i. For such a relation, it can be shown (see Exercise 21) that the matrix A + A² has a row [column] in which each entry is positive except for the diagonal entry. In other words, there is at least one person who dominates [is dominated by] all others in one or two stages. In fact, it can be shown that any person who dominates [is dominated by] the greatest number of people in the first stage has this property. Consider, for example, the matrix

    A = ⎛0  1  0  1  0⎞
        ⎜0  0  1  0  0⎟
        ⎜1  0  0  1  0⎟
        ⎜0  1  0  0  1⎟
        ⎝1  1  1  0  0⎠ .

The reader should verify that this matrix corresponds to a dominance relation. Now

    A + A² = ⎛0  2  1  1  1⎞
             ⎜1  0  1  1  0⎟
             ⎜1  2  0  2  1⎟
             ⎜1  2  2  0  1⎟
             ⎝2  2  2  2  0⎠ .


Thus persons 1, 3, 4, and 5 dominate (can send messages to) all the others in at most two stages, while persons 1, 2, 3, and 4 are dominated by (can receive messages from) all the others in at most two stages.
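Computing A + A² for the dominance example is again a one-line matter; the following sketch assumes NumPy and reproduces the matrix displayed above.

    # A + A^2 for the dominance relation above.
    import numpy as np

    A = np.array([[0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 0],
                  [1, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1],
                  [1, 1, 1, 0, 0]])

    S = A + A @ A
    print(S)
    # Rows 1, 3, 4, 5 (indices 0, 2, 3, 4) have every off-diagonal entry positive,
    # matching the conclusion drawn in the text.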

EXERCISES

1. Label the following statements as true or false. In each part, V, W, and Z denote vector spaces with ordered (finite) bases α, β, and γ, respectively; T : V → W and U : W → Z denote linear transformations; and A and B denote matrices.
   (a) [UT]^γ_α = [T]^β_α [U]^γ_β.
   (b) [T(v)]_β = [T]^β_α [v]_α for all v ∈ V.
   (c) [U(w)]_β = [U]^β_α [w]_β for all w ∈ W.
   (d) [IV]_α = I.
   (e) [T²]^β_α = ([T]^β_α)².
   (f) A² = I implies that A = I or A = −I.
   (g) T = LA for some matrix A.
   (h) A² = O implies that A = O, where O denotes the zero matrix.
   (i) LA+B = LA + LB.
   (j) If A is square and Aij = δij for all i and j, then A = I.

2. (a) Let

           A = ⎛1   3⎞ ,   B = ⎛1  0  −3⎞ ,   C = ⎛ 1   1  4⎞ ,   and   D = ⎛ 2⎞
               ⎝2  −1⎠         ⎝4  1   2⎠         ⎝−1  −2  0⎠               ⎜−2⎟
                                                                            ⎝ 3⎠ .

       Compute A(2B + 3C), (AB)D, and A(BD).
   (b) Let

           A = ⎛ 2  5⎞ ,   B = ⎛3  −2  0⎞ ,   and   C = (4  0  3).
               ⎜−3  1⎟         ⎜1  −1  4⎟
               ⎝ 4  2⎠         ⎝5   5  3⎠

       Compute Aᵗ, AᵗB, BCᵗ, CB, and CA.

3. Let g(x) = 3 + x. Let T : P2(R) → P2(R) and U : P2(R) → R3 be the linear transformations respectively defined by

       T(f(x)) = f′(x)g(x) + 2f(x)   and   U(a + bx + cx²) = (a + b, c, a − b).

   Let β and γ be the standard ordered bases of P2(R) and R3, respectively.
   (a) Compute [U]^γ_β, [T]_β, and [UT]^γ_β directly. Then use Theorem 2.11 to verify your result.
   (b) Let h(x) = 3 − 2x + x². Compute [h(x)]_β and [U(h(x))]_γ. Then use [U]^γ_β from (a) and Theorem 2.14 to verify your result.

4. For each of the following parts, let T be the linear transformation defined in the corresponding part of Exercise 5 of Section 2.2. Use Theorem 2.14 to compute the following vectors:

   (a) [T(A)]_α, where A = ⎛ 1  4⎞ .
                           ⎝−1  6⎠

   (b) [T(f(x))]_α, where f(x) = 4 − 6x + 3x².

   (c) [T(A)]_γ, where A = ⎛1  3⎞ .
                           ⎝2  4⎠

   (d) [T(f(x))]_γ, where f(x) = 6 − x + 2x².

5. Complete the proof of Theorem 2.12 and its corollary.

6. Prove (b) of Theorem 2.13.

7. Prove (c) and (f) of Theorem 2.15.

8. Prove Theorem 2.10. Now state and prove a more general result involving linear transformations with domains unequal to their codomains.

9. Find linear transformations U, T : F² → F² such that UT = T0 (the zero transformation) but TU ≠ T0. Use your answer to find matrices A and B such that AB = O but BA ≠ O.

10. Let A be an n × n matrix. Prove that A is a diagonal matrix if and only if Aij = δijAij for all i and j.

11. Let V be a vector space, and let T : V → V be linear. Prove that T² = T0 if and only if R(T) ⊆ N(T).

12. Let V, W, and Z be vector spaces, and let T : V → W and U : W → Z be linear.
    (a) Prove that if UT is one-to-one, then T is one-to-one. Must U also be one-to-one?
    (b) Prove that if UT is onto, then U is onto. Must T also be onto?
    (c) Prove that if U and T are one-to-one and onto, then UT is also.

13. Let A and B be n × n matrices. Recall that the trace of A is defined by

        tr(A) = Σ_{i=1}^{n} Aii.

    Prove that tr(AB) = tr(BA) and tr(A) = tr(Aᵗ).


14. Assume the notation in Theorem 2.13.
    (a) Suppose that z is a (column) vector in Fᵖ. Use Theorem 2.13(b) to prove that Bz is a linear combination of the columns of B. In particular, if z = (a1, a2, . . . , ap)ᵗ, then show that

            Bz = Σ_{j=1}^{p} aj vj.

    (b) Extend (a) to prove that column j of AB is a linear combination of the columns of A with the coefficients in the linear combination being the entries of column j of B.
    (c) For any row vector w ∈ Fᵐ, prove that wA is a linear combination of the rows of A with the coefficients in the linear combination being the coordinates of w. Hint: Use properties of the transpose operation applied to (a).
    (d) Prove the analogous result to (b) about rows: Row i of AB is a linear combination of the rows of B with the coefficients in the linear combination being the entries of row i of A.

15.† Let M and A be matrices for which the product matrix MA is defined. If the jth column of A is a linear combination of a set of columns of A, prove that the jth column of MA is a linear combination of the corresponding columns of MA with the same corresponding coefficients.

16. Let V be a finite-dimensional vector space, and let T : V → V be linear.
    (a) If rank(T) = rank(T²), prove that R(T) ∩ N(T) = {0}. Deduce that V = R(T) ⊕ N(T) (see the exercises of Section 1.3).
    (b) Prove that V = R(Tᵏ) ⊕ N(Tᵏ) for some positive integer k.

17. Let V be a vector space. Determine all linear transformations T : V → V such that T = T². Hint: Note that x = T(x) + (x − T(x)) for every x in V, and show that V = {y : T(y) = y} ⊕ N(T) (see the exercises of Section 1.3).

18. Using only the definition of matrix multiplication, prove that multiplication of matrices is associative.

19. For an incidence matrix A with related matrix B defined by Bij = 1 if i is related to j and j is related to i, and Bij = 0 otherwise, prove that i belongs to a clique if and only if (B³)ii > 0.

20. Use Exercise 19 to determine the cliques in the relations corresponding to the following incidence matrices.


    (a) ⎛0  1  0  1⎞        (b) ⎛0  0  1  1⎞
        ⎜1  0  0  0⎟            ⎜1  0  0  1⎟
        ⎜0  1  0  1⎟            ⎜1  0  0  1⎟
        ⎝1  0  1  0⎠            ⎝1  0  1  0⎠

21. Let A be an incidence matrix that is associated with a dominance relation. Prove that the matrix A + A² has a row [column] in which each entry is positive except for the diagonal entry.

22. Prove that the matrix

        A = ⎛0  1  0⎞
            ⎜0  0  1⎟
            ⎝1  0  0⎠

    corresponds to a dominance relation. Use Exercise 21 to determine which persons dominate [are dominated by] each of the others within two stages.

23. Let A be an n × n incidence matrix that corresponds to a dominance relation. Determine the number of nonzero entries of A.

2.4 INVERTIBILITY AND ISOMORPHISMS

The concept of invertibility is introduced quite early in the study of functions. Fortunately, many of the intrinsic properties of functions are shared by their inverses. For example, in calculus we learn that the properties of being continuous or differentiable are generally retained by the inverse functions. We see in this section (Theorem 2.17) that the inverse of a linear transformation is also linear. This result greatly aids us in the study of inverses of matrices. As one might expect from Section 2.3, the inverse of the left-multiplication transformation LA (when it exists) can be used to determine properties of the inverse of the matrix A.

In the remainder of this section, we apply many of the results about invertibility to the concept of isomorphism. We will see that finite-dimensional vector spaces (over F) of equal dimension may be identified. These ideas will be made precise shortly.

The facts about inverse functions presented in Appendix B are, of course, true for linear transformations. Nevertheless, we repeat some of the definitions for use in this section.

Definition. Let V and W be vector spaces, and let T : V → W be linear. A function U : W → V is said to be an inverse of T if TU = IW and UT = IV. If T has an inverse, then T is said to be invertible. As noted in Appendix B, if T is invertible, then the inverse of T is unique and is denoted by T⁻¹.


The following facts hold for invertible functions T and U.

1. (TU)⁻¹ = U⁻¹T⁻¹.
2. (T⁻¹)⁻¹ = T; in particular, T⁻¹ is invertible.

We often use the fact that a function is invertible if and only if it is both one-to-one and onto. We can therefore restate Theorem 2.5 as follows.

3. Let T : V → W be a linear transformation, where V and W are finite-dimensional spaces of equal dimension. Then T is invertible if and only if rank(T) = dim(V).

Example 1

Let T : P1(R) → R2 be the linear transformation defined by T(a + bx) = (a, a + b). The reader can verify directly that T⁻¹ : R2 → P1(R) is defined by T⁻¹(c, d) = c + (d − c)x. Observe that T⁻¹ is also linear. As Theorem 2.17 demonstrates, this is true in general. ♦

Theorem 2.17. Let V and W be vector spaces, and let T : V → W be linear and invertible. Then T⁻¹ : W → V is linear.

Proof. Let y1, y2 ∈ W and c ∈ F. Since T is onto and one-to-one, there exist unique vectors x1 and x2 such that T(x1) = y1 and T(x2) = y2. Thus x1 = T⁻¹(y1) and x2 = T⁻¹(y2); so

    T⁻¹(cy1 + y2) = T⁻¹[cT(x1) + T(x2)] = T⁻¹[T(cx1 + x2)]
                  = cx1 + x2 = cT⁻¹(y1) + T⁻¹(y2).

It now follows immediately from Theorem 2.5 (p. 71) that if T is a linear transformation between vector spaces of equal (finite) dimension, then the conditions of being invertible, one-to-one, and onto are all equivalent.

We are now ready to define the inverse of a matrix. The reader should note the analogy with the inverse of a linear transformation.

Definition. Let A be an n × n matrix. Then A is invertible if there exists an n × n matrix B such that AB = BA = I.

If A is invertible, then the matrix B such that AB = BA = I is unique. (If C were another such matrix, then C = CI = C(AB) = (CA)B = IB = B.) The matrix B is called the inverse of A and is denoted by A⁻¹.

Example 2

The reader should verify that the inverse of

    ⎛5  7⎞    is    ⎛ 3  −7⎞ . ♦
    ⎝2  3⎠          ⎝−2   5⎠
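The claim in Example 2 can be confirmed by multiplying the two matrices in both orders; the sketch below assumes NumPy.

    # Verify the inverse claimed in Example 2.
    import numpy as np

    A = np.array([[5, 7], [2, 3]])
    B = np.array([[3, -7], [-2, 5]])

    print(A @ B)   # the 2 x 2 identity
    print(B @ A)   # the 2 x 2 identity, so B = A^{-1}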


In Section 3.2, we learn a technique for computing the inverse of a matrix. At this point, we develop a number of results that relate the inverses of matrices to the inverses of linear transformations.

Lemma. Let T be an invertible linear transformation from V to W. Then V is finite-dimensional if and only if W is finite-dimensional. In this case, dim(V) = dim(W).

Proof. Suppose that V is finite-dimensional. Let β = {x1, x2, . . . , xn} be a basis for V. By Theorem 2.2 (p. 68), T(β) spans R(T) = W; hence W is finite-dimensional by Theorem 1.9 (p. 44). Conversely, if W is finite-dimensional, then so is V by a similar argument, using T⁻¹.

Now suppose that V and W are finite-dimensional. Because T is one-to-one and onto, we have

    nullity(T) = 0   and   rank(T) = dim(R(T)) = dim(W).

So by the dimension theorem (p. 70), it follows that dim(V) = dim(W).

Theorem 2.18. Let V and W be finite-dimensional vector spaces with ordered bases β and γ, respectively. Let T : V → W be linear. Then T is invertible if and only if [T]^γ_β is invertible. Furthermore, [T⁻¹]^β_γ = ([T]^γ_β)⁻¹.

Proof. Suppose that T is invertible. By the lemma, we have dim(V) = dim(W). Let n = dim(V). So [T]^γ_β is an n × n matrix. Now T⁻¹ : W → V satisfies TT⁻¹ = IW and T⁻¹T = IV. Thus

    In = [IV]_β = [T⁻¹T]_β = [T⁻¹]^β_γ [T]^γ_β.

Similarly, [T]^γ_β [T⁻¹]^β_γ = In. So [T]^γ_β is invertible and ([T]^γ_β)⁻¹ = [T⁻¹]^β_γ.

Now suppose that A = [T]^γ_β is invertible. Then there exists an n × n matrix B such that AB = BA = In. By Theorem 2.6 (p. 72), there exists U ∈ L(W, V) such that

    U(wj) = Σ_{i=1}^{n} Bij vi   for j = 1, 2, . . . , n,

where γ = {w1, w2, . . . , wn} and β = {v1, v2, . . . , vn}. It follows that [U]^β_γ = B. To show that U = T⁻¹, observe that

    [UT]_β = [U]^β_γ [T]^γ_β = BA = In = [IV]_β

by Theorem 2.11 (p. 88). So UT = IV, and similarly, TU = IW.


Example 3

Let β and γ be the standard ordered bases of P1(R) and R2, respectively. For T as in Example 1, we have

    [T]^γ_β = ⎛1  0⎞    and    [T⁻¹]^β_γ = ⎛ 1  0⎞ .
              ⎝1  1⎠                       ⎝−1  1⎠

It can be verified by matrix multiplication that each matrix is the inverse of the other. ♦

Corollary 1. Let V be a finite-dimensional vector space with an ordered basis β, and let T : V → V be linear. Then T is invertible if and only if [T]_β is invertible. Furthermore, [T⁻¹]_β = ([T]_β)⁻¹.

Proof. Exercise.

Corollary 2. Let A be an n × n matrix. Then A is invertible if and only if L_A is invertible. Furthermore, (L_A)⁻¹ = L_{A⁻¹}.

Proof. Exercise.

The notion of invertibility may be used to formalize what may already have been observed by the reader, that is, that certain vector spaces strongly resemble one another except for the form of their vectors. For example, in the case of M2×2(F) and F⁴, if we associate to each matrix

    ⎛a  b⎞
    ⎝c  d⎠

the 4-tuple (a, b, c, d), we see that sums and scalar products associate in a similar manner; that is, in terms of the vector space structure, these two vector spaces may be considered identical or isomorphic.

Definitions. Let V and W be vector spaces. We say that V is isomorphic to W if there exists a linear transformation T : V → W that is invertible. Such a linear transformation is called an isomorphism from V onto W.

We leave as an exercise (see Exercise 13) the proof that “is isomorphic to” is an equivalence relation. (See Appendix A.) So we need only say that V and W are isomorphic.

Example 4

Define T : F² → P1(F) by T(a1, a2) = a1 + a2x. It is easily checked that T is an isomorphism; so F² is isomorphic to P1(F). ♦


Example 5

Define

    T : P3(R) → M2×2(R)   by   T(f) = ⎛f(1)  f(2)⎞ .
                                      ⎝f(3)  f(4)⎠

It is easily verified that T is linear. By use of the Lagrange interpolation formula in Section 1.6, it can be shown (compare with Exercise 22) that T(f) = O only when f is the zero polynomial. Thus T is one-to-one (see Exercise 11). Moreover, because dim(P3(R)) = dim(M2×2(R)), it follows that T is invertible by Theorem 2.5 (p. 71). We conclude that P3(R) is isomorphic to M2×2(R). ♦

In each of Examples 4 and 5, the reader may have observed that isomorphic vector spaces have equal dimensions. As the next theorem shows, this is no coincidence.

Theorem 2.19. Let V and W be finite-dimensional vector spaces (over the same field). Then V is isomorphic to W if and only if dim(V) = dim(W).

Proof. Suppose that V is isomorphic to W and that T : V → W is an isomorphism from V to W. By the lemma preceding Theorem 2.18, we have that dim(V) = dim(W).

Now suppose that dim(V) = dim(W), and let β = {v1, v2, . . . , vn} and γ = {w1, w2, . . . , wn} be bases for V and W, respectively. By Theorem 2.6 (p. 72), there exists T : V → W such that T is linear and T(vi) = wi for i = 1, 2, . . . , n. Using Theorem 2.2 (p. 68), we have

    R(T) = span(T(β)) = span(γ) = W.

So T is onto. From Theorem 2.5 (p. 71), we have that T is also one-to-one. Hence T is an isomorphism.

By the lemma to Theorem 2.18, if V and W are isomorphic, then either both of V and W are finite-dimensional or both are infinite-dimensional.

Corollary. Let V be a vector space over F. Then V is isomorphic to Fⁿ if and only if dim(V) = n.

Up to this point, we have associated linear transformations with their matrix representations. We are now in a position to prove that, as a vector space, the collection of all linear transformations between two given vector spaces may be identified with the appropriate vector space of m × n matrices.

Theorem 2.20. Let V and W be finite-dimensional vector spaces over F of dimensions n and m, respectively, and let β and γ be ordered bases for V and W, respectively. Then the function Φ : L(V, W) → Mm×n(F), defined by Φ(T) = [T]^γ_β for T ∈ L(V, W), is an isomorphism.


Proof. By Theorem 2.8 (p. 82), Φ is linear. Hence we must show that Φ is one-to-one and onto. This is accomplished if we show that for every m × n matrix A, there exists a unique linear transformation T : V → W such that Φ(T) = A. Let β = {v1, v2, . . . , vn}, γ = {w1, w2, . . . , wm}, and let A be a given m × n matrix. By Theorem 2.6 (p. 72), there exists a unique linear transformation T : V → W such that

    T(vj) = Σ_{i=1}^{m} Aij wi   for 1 ≤ j ≤ n.

But this means that [T]^γ_β = A, or Φ(T) = A. Thus Φ is an isomorphism.

Corollary. Let V and W be finite-dimensional vector spaces of dimensions n and m, respectively. Then L(V, W) is finite-dimensional of dimension mn.

Proof. The proof follows from Theorems 2.20 and 2.19 and the fact that dim(Mm×n(F)) = mn.

We conclude this section with a result that allows us to see more clearly the relationship between linear transformations defined on abstract finite-dimensional vector spaces and linear transformations from Fⁿ to Fᵐ.

We begin by naming the transformation x → [x]_β introduced in Section 2.2.

Definition. Let β be an ordered basis for an n-dimensional vector space V over the field F. The standard representation of V with respect to β is the function φβ : V → Fⁿ defined by φβ(x) = [x]_β for each x ∈ V.

Example 6

Let β = {(1, 0), (0, 1)} and γ = {(1, 2), (3, 4)}. It is easily observed that β and γ are ordered bases for R2. For x = (1, −2), we have

    φβ(x) = [x]_β = ⎛ 1⎞    and    φγ(x) = [x]_γ = ⎛−5⎞ . ♦
                    ⎝−2⎠                           ⎝ 2⎠

We observed earlier that φβ is a linear transformation. The next theorem tells us much more.

Theorem 2.21. For any finite-dimensional vector space V with ordered basis β, φβ is an isomorphism.

Proof. Exercise.

This theorem provides us with an alternate proof that an n-dimensional vector space is isomorphic to Fⁿ (see the corollary to Theorem 2.19).


[Figure 2.2: a commutative diagram with T : V → W across the top, LA : Fⁿ → Fᵐ across the bottom, and the standard representations φβ : V → Fⁿ and φγ : W → Fᵐ as the vertical maps.]

Let V and W be vector spaces of dimension n and m, respectively, and let T : V → W be a linear transformation. Define A = [T]^γ_β, where β and γ are arbitrary ordered bases of V and W, respectively. We are now able to use φβ and φγ to study the relationship between the linear transformations T and LA : Fⁿ → Fᵐ.

Let us first consider Figure 2.2. Notice that there are two composites of linear transformations that map V into Fᵐ:

1. Map V into Fⁿ with φβ and follow this transformation with LA; this yields the composite LAφβ.
2. Map V into W with T and follow it by φγ to obtain the composite φγT.

These two composites are depicted by the dashed arrows in the diagram. By a simple reformulation of Theorem 2.14 (p. 91), we may conclude that

    LAφβ = φγT;

that is, the diagram “commutes.” Heuristically, this relationship indicates that after V and W are identified with Fⁿ and Fᵐ via φβ and φγ, respectively, we may “identify” T with LA. This diagram allows us to transfer operations on abstract vector spaces to ones on Fⁿ and Fᵐ.

Example 7

Recall the linear transformation T : P3(R) → P2(R) defined in Example 4 of Section 2.2 (T(f(x)) = f′(x)). Let β and γ be the standard ordered bases for P3(R) and P2(R), respectively, and let φβ : P3(R) → R4 and φγ : P2(R) → R3 be the corresponding standard representations of P3(R) and P2(R). If A = [T]^γ_β, then

    A = ⎛0  1  0  0⎞
        ⎜0  0  2  0⎟
        ⎝0  0  0  3⎠ .


Consider the polynomial p(x) = 2 + x − 3x² + 5x³. We show that LAφβ(p(x)) = φγT(p(x)). Now

    LAφβ(p(x)) = ⎛0  1  0  0⎞ ⎛ 2⎞   ⎛ 1⎞
                 ⎜0  0  2  0⎟ ⎜ 1⎟ = ⎜−6⎟ .
                 ⎝0  0  0  3⎠ ⎜−3⎟   ⎝15⎠
                               ⎝ 5⎠

But since T(p(x)) = p′(x) = 1 − 6x + 15x², we have

    φγT(p(x)) = ⎛ 1⎞
                ⎜−6⎟
                ⎝15⎠ .

So LAφβ(p(x)) = φγT(p(x)). ♦

Try repeating Example 7 with different polynomials p(x).
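The commutativity of the diagram can be checked in coordinates with a single matrix–vector product; the sketch below assumes NumPy, and φβ, φγ are simply the coordinate maps with respect to the standard bases.

    # Check L_A(phi_beta(p(x))) = phi_gamma(T(p(x))) for p(x) = 2 + x - 3x^2 + 5x^3.
    import numpy as np

    A = np.array([[0, 1, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 3]])        # A = [T] for differentiation
    p_beta = np.array([2, 1, -3, 5])    # phi_beta(p(x))
    q_gamma = np.array([1, -6, 15])     # phi_gamma(p'(x)), since p'(x) = 1 - 6x + 15x^2

    assert np.array_equal(A @ p_beta, q_gamma)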

EXERCISES

1. Label the following statements as true or false. In each part, V and W are vector spaces with ordered (finite) bases α and β, respectively, T : V → W is linear, and A and B are matrices.
   (a) ([T]^β_α)⁻¹ = [T⁻¹]^β_α.
   (b) T is invertible if and only if T is one-to-one and onto.
   (c) T = LA, where A = [T]^β_α.
   (d) M2×3(F) is isomorphic to F⁵.
   (e) Pn(F) is isomorphic to Pm(F) if and only if n = m.
   (f) AB = I implies that A and B are invertible.
   (g) If A is invertible, then (A⁻¹)⁻¹ = A.
   (h) A is invertible if and only if LA is invertible.
   (i) A must be square in order to possess an inverse.

2. For each of the following linear transformations T, determine whether T is invertible and justify your answer.
   (a) T : R2 → R3 defined by T(a1, a2) = (a1 − 2a2, a2, 3a1 + 4a2).
   (b) T : R2 → R3 defined by T(a1, a2) = (3a1 − a2, a2, 4a1).
   (c) T : R3 → R3 defined by T(a1, a2, a3) = (3a1 − 2a3, a2, 3a1 + 4a2).
   (d) T : P3(R) → P2(R) defined by T(p(x)) = p′(x).
   (e) T : M2×2(R) → P2(R) defined by

           T ⎛a  b⎞ = a + 2bx + (c + d)x².
             ⎝c  d⎠

   (f) T : M2×2(R) → M2×2(R) defined by

           T ⎛a  b⎞ = ⎛a + b    a  ⎞ .
             ⎝c  d⎠   ⎝  c    c + d⎠


3. Which of the following pairs of vector spaces are isomorphic? Justify your answers.
   (a) F³ and P3(F).
   (b) F⁴ and P3(F).
   (c) M2×2(R) and P3(R).
   (d) V = {A ∈ M2×2(R) : tr(A) = 0} and R⁴.

4.† Let A and B be n × n invertible matrices. Prove that AB is invertible and (AB)⁻¹ = B⁻¹A⁻¹.

5.† Let A be invertible. Prove that Aᵗ is invertible and (Aᵗ)⁻¹ = (A⁻¹)ᵗ.

6. Prove that if A is invertible and AB = O, then B = O.

7. Let A be an n × n matrix.
   (a) Suppose that A² = O. Prove that A is not invertible.
   (b) Suppose that AB = O for some nonzero n × n matrix B. Could A be invertible? Explain.

8. Prove Corollaries 1 and 2 of Theorem 2.18.

9. Let A and B be n × n matrices such that AB is invertible. Prove that A and B are invertible. Give an example to show that arbitrary matrices A and B need not be invertible if AB is invertible.

10.† Let A and B be n × n matrices such that AB = In.
    (a) Use Exercise 9 to conclude that A and B are invertible.
    (b) Prove A = B⁻¹ (and hence B = A⁻¹). (We are, in effect, saying that for square matrices, a “one-sided” inverse is a “two-sided” inverse.)
    (c) State and prove analogous results for linear transformations defined on finite-dimensional vector spaces.

11. Verify that the transformation in Example 5 is one-to-one.

12. Prove Theorem 2.21.

13. Let ∼ mean “is isomorphic to.” Prove that ∼ is an equivalence relation on the class of vector spaces over F.

14. Let

        V = { ⎛a  a + b⎞ : a, b, c ∈ F } .
              ⎝0    c  ⎠

    Construct an isomorphism from V to F³.


15. Let V and W be finite-dimensional vector spaces, and let T : V → W be a linear transformation. Suppose that β is a basis for V. Prove that T is an isomorphism if and only if T(β) is a basis for W.

16. Let B be an n × n invertible matrix. Define Φ : Mn×n(F) → Mn×n(F) by Φ(A) = B⁻¹AB. Prove that Φ is an isomorphism.

17.† Let V and W be finite-dimensional vector spaces and T : V → W be an isomorphism. Let V0 be a subspace of V.
    (a) Prove that T(V0) is a subspace of W.
    (b) Prove that dim(V0) = dim(T(V0)).

18. Repeat Example 7 with the polynomial p(x) = 1 + x + 2x² + x³.

19. In Example 5 of Section 2.1, the mapping T : M2×2(R) → M2×2(R) defined by T(M) = Mᵗ for each M ∈ M2×2(R) is a linear transformation. Let β = {E11, E12, E21, E22}, which is a basis for M2×2(R), as noted in Example 3 of Section 1.6.
    (a) Compute [T]_β.
    (b) Verify that LAφβ(M) = φβT(M) for A = [T]_β and

            M = ⎛1  2⎞ .
                ⎝3  4⎠

20.† Let T : V → W be a linear transformation from an n-dimensional vector space V to an m-dimensional vector space W. Let β and γ be ordered bases for V and W, respectively. Prove that rank(T) = rank(LA) and that nullity(T) = nullity(LA), where A = [T]^γ_β. Hint: Apply Exercise 17 to Figure 2.2.

21. Let V and W be finite-dimensional vector spaces with ordered bases β = {v1, v2, . . . , vn} and γ = {w1, w2, . . . , wm}, respectively. By Theorem 2.6 (p. 72), there exist linear transformations Tij : V → W such that

        Tij(vk) = { wi   if k = j
                  { 0    if k ≠ j.

    First prove that {Tij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for L(V, W). Then let M^{ij} be the m × n matrix with 1 in the ith row and jth column and 0 elsewhere, and prove that [Tij]^γ_β = M^{ij}. Again by Theorem 2.6, there exists a linear transformation Φ : L(V, W) → Mm×n(F) such that Φ(Tij) = M^{ij}. Prove that Φ is an isomorphism.


22. Let c0, c1, . . . , cn be distinct scalars from an infinite field F. Define T : Pn(F) → Fⁿ⁺¹ by T(f) = (f(c0), f(c1), . . . , f(cn)). Prove that T is an isomorphism. Hint: Use the Lagrange polynomials associated with c0, c1, . . . , cn.

23. Let V denote the vector space defined in Example 5 of Section 1.2, and let W = P(F). Define

        T : V → W   by   T(σ) = Σ_{i=0}^{n} σ(i)xⁱ,

    where n is the largest integer such that σ(n) ≠ 0. Prove that T is an isomorphism.

The following exercise requires familiarity with the concept of quotient space defined in Exercise 31 of Section 1.3 and with Exercise 40 of Section 2.1.

24. Let T : V → Z be a linear transformation of a vector space V onto a vector space Z. Define the mapping

        T̄ : V/N(T) → Z   by   T̄(v + N(T)) = T(v)

    for any coset v + N(T) in V/N(T).
    (a) Prove that T̄ is well-defined; that is, prove that if v + N(T) = v′ + N(T), then T(v) = T(v′).
    (b) Prove that T̄ is linear.
    (c) Prove that T̄ is an isomorphism.
    (d) Prove that the diagram shown in Figure 2.3 commutes; that is, prove that T = T̄η.

[Figure 2.3: a commutative diagram with T : V → Z across the top, the quotient map η : V → V/N(T), and T̄ : V/N(T) → Z, so that T = T̄η.]

25. Let V be a nonzero vector space over a field F, and suppose that S is a basis for V. (By the corollary to Theorem 1.13 (p. 60) in Section 1.7, every vector space has a basis.) Let C(S, F) denote the vector space of all functions f ∈ F(S, F) such that f(s) = 0 for all but a finite number


of vectors in S. (See Exercise 14 of Section 1.3.) Let Ψ : C(S, F) → V be the function defined by

    Ψ(f) = Σ_{s∈S, f(s)≠0} f(s)s.

Prove that Ψ is an isomorphism. Thus every nonzero vector space can be viewed as a space of functions.

2.5 THE CHANGE OF COORDINATE MATRIX

In many areas of mathematics, a change of variable is used to simplify the appearance of an expression. For example, in calculus an antiderivative of 2xe^{x²} can be found by making the change of variable u = x². The resulting expression is of such a simple form that an antiderivative is easily recognized:

    ∫ 2x e^{x²} dx = ∫ eᵘ du = eᵘ + c = e^{x²} + c.

Similarly, in geometry the change of variable

    x = (2/√5)x′ − (1/√5)y′
    y = (1/√5)x′ + (2/√5)y′

can be used to transform the equation 2x² − 4xy + 5y² = 1 into the simpler equation (x′)² + 6(y′)² = 1, in which form it is easily seen to be the equation of an ellipse. (See Figure 2.4.) We see how this change of variable is determined in Section 6.5. Geometrically, the change of variable

    ⎛x⎞ → ⎛x′⎞
    ⎝y⎠   ⎝y′⎠

is a change in the way that the position of a point P in the plane is described. This is done by introducing a new frame of reference, an x′y′-coordinate system with coordinate axes rotated from the original xy-coordinate axes. In this case, the new coordinate axes are chosen to lie in the direction of the axes of the ellipse. The unit vectors along the x′-axis and the y′-axis form an ordered basis

    β′ = { (1/√5) ⎛2⎞ , (1/√5) ⎛−1⎞ }
                  ⎝1⎠          ⎝ 2⎠

for R2, and the change of variable is actually a change from

    [P]_β = ⎛x⎞ ,
            ⎝y⎠

the coordinate vector of P relative to the standard ordered basis β = {e1, e2}, to


    [P]_{β′} = ⎛x′⎞ ,
               ⎝y′⎠

the coordinate vector of P relative to the new rotated basis β′.

[Figure 2.4: the ellipse 2x² − 4xy + 5y² = 1 with the original xy-axes and the rotated x′y′-axes.]

A natural question arises: How can a coordinate vector relative to one basis be changed into a coordinate vector relative to the other? Notice that the system of equations relating the new and old coordinates can be represented by the matrix equation

    ⎛x⎞ = (1/√5) ⎛2  −1⎞ ⎛x′⎞ .
    ⎝y⎠          ⎝1   2⎠ ⎝y′⎠

Notice also that the matrix

    Q = (1/√5) ⎛2  −1⎞
               ⎝1   2⎠

equals [I]^β_{β′}, where I denotes the identity transformation on R2. Thus [v]_β = Q[v]_{β′} for all v ∈ R2. A similar result is true in general.

Theorem 2.22. Let β and β′ be two ordered bases for a finite-dimensional vector space V, and let Q = [IV]^β_{β′}. Then
(a) Q is invertible.
(b) For any v ∈ V, [v]_β = Q[v]_{β′}.

Proof. (a) Since IV is invertible, Q is invertible by Theorem 2.18 (p. 101).
(b) For any v ∈ V,

    [v]_β = [IV(v)]_β = [IV]^β_{β′} [v]_{β′} = Q[v]_{β′}

by Theorem 2.14 (p. 91).


The matrix Q = [IV]^β_{β′} defined in Theorem 2.22 is called a change of coordinate matrix. Because of part (b) of the theorem, we say that Q changes β′-coordinates into β-coordinates. Observe that if β = {x1, x2, . . . , xn} and β′ = {x′1, x′2, . . . , x′n}, then

    x′j = Σ_{i=1}^{n} Qij xi

for j = 1, 2, . . . , n; that is, the jth column of Q is [x′j]_β.

Notice that if Q changes β′-coordinates into β-coordinates, then Q⁻¹ changes β-coordinates into β′-coordinates. (See Exercise 11.)

Example 1

In R2, let β = {(1, 1), (1, −1)} and β′ = {(2, 4), (3, 1)}. Since

    (2, 4) = 3(1, 1) − 1(1, −1)   and   (3, 1) = 2(1, 1) + 1(1, −1),

the matrix that changes β′-coordinates into β-coordinates is

    Q = ⎛ 3  2⎞ .
        ⎝−1  1⎠

Thus, for instance,

    [(2, 4)]_β = Q[(2, 4)]_{β′} = Q ⎛1⎞ = ⎛ 3⎞ . ♦
                                    ⎝0⎠   ⎝−1⎠
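The change of coordinate matrix in Example 1 can also be found by solving linear systems, since its jth column is [x′j]_β. The sketch below assumes NumPy and is illustrative only.

    # Build Q column by column: the jth column of Q is [x'_j]_beta.
    import numpy as np

    beta = np.array([[1, 1], [1, -1]]).T          # columns are the vectors of beta
    beta_prime = np.array([[2, 4], [3, 1]]).T     # columns are the vectors of beta'

    Q = np.linalg.solve(beta, beta_prime)         # solves beta * Q = beta_prime
    print(Q)                                      # [[3, 2], [-1, 1]]

    v_beta_prime = np.array([1, 0])               # [(2, 4)]_{beta'}
    print(Q @ v_beta_prime)                       # [3, -1] = [(2, 4)]_beta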

For the remainder of this section, we consider only linear transformations that map a vector space V into itself. Such a linear transformation is called a linear operator on V. Suppose now that T is a linear operator on a finite-dimensional vector space V and that β and β′ are ordered bases for V. Then T can be represented by the matrices [T]_β and [T]_{β′}. What is the relationship between these matrices? The next theorem provides a simple answer using a change of coordinate matrix.

Theorem 2.23. Let T be a linear operator on a finite-dimensional vector space V, and let β and β′ be ordered bases for V. Suppose that Q is the change of coordinate matrix that changes β′-coordinates into β-coordinates. Then

    [T]_{β′} = Q⁻¹[T]_β Q.

Proof. Let I be the identity transformation on V. Then T = IT = TI; hence, by Theorem 2.11 (p. 88),

    Q[T]_{β′} = [I]^β_{β′} [T]^{β′}_{β′} = [IT]^β_{β′} = [TI]^β_{β′} = [T]^β_β [I]^β_{β′} = [T]_β Q.

Therefore [T]_{β′} = Q⁻¹[T]_β Q.


Example 2

Let T be the linear operator on R2 defined by

    T ⎛a⎞ = ⎛3a − b ⎞ ,
      ⎝b⎠   ⎝a + 3b⎠

and let β and β′ be the ordered bases in Example 1. The reader should verify that

    [T]_β = ⎛ 3  1⎞ .
            ⎝−1  3⎠

In Example 1, we saw that the change of coordinate matrix that changes β′-coordinates into β-coordinates is

    Q = ⎛ 3  2⎞ ,
        ⎝−1  1⎠

and it is easily verified that

    Q⁻¹ = (1/5) ⎛1  −2⎞ .
                ⎝1   3⎠

Hence, by Theorem 2.23,

    [T]_{β′} = Q⁻¹[T]_β Q = ⎛ 4  1⎞ .
                            ⎝−2  2⎠

To show that this is the correct matrix, we can verify that the image under T of each vector of β′ is the linear combination of the vectors of β′ with the entries of the corresponding column as its coefficients. For example, the image of the second vector in β′ is

    T ⎛3⎞ = ⎛8⎞ = 1 ⎛2⎞ + 2 ⎛3⎞ .
      ⎝1⎠   ⎝6⎠     ⎝4⎠     ⎝1⎠

Notice that the coefficients of the linear combination are the entries of the second column of [T]_{β′}. ♦
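The computation Q⁻¹[T]_β Q in Example 2 is easy to reproduce numerically; the following sketch assumes NumPy.

    # Verify [T]_{beta'} = Q^{-1} [T]_beta Q from Example 2.
    import numpy as np

    T_beta = np.array([[3, 1], [-1, 3]])
    Q = np.array([[3, 2], [-1, 1]])

    T_beta_prime = np.linalg.inv(Q) @ T_beta @ Q
    print(np.round(T_beta_prime))     # [[4, 1], [-2, 2]]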

It is often useful to apply Theorem 2.23 to compute [T]_β, as the next example shows.

Example 3

Recall the reflection about the x-axis in Example 3 of Section 2.1. The rule (x, y) → (x, −y) is easy to obtain. We now derive the less obvious rule for the reflection T about the line y = 2x. (See Figure 2.5.)

[Figure 2.5: the line y = 2x, the basis vectors (1, 2) and (−2, 1), and a point (a, b) together with its reflection T(a, b).]

We wish to find an expression for T(a, b) for any (a, b) in R2. Since T is linear, it is completely determined by its values on a basis for R2. Clearly, T(1, 2) = (1, 2) and T(−2, 1) = −(−2, 1) = (2, −1). Therefore if we let

    β′ = { ⎛1⎞ , ⎛−2⎞ } ,
           ⎝2⎠   ⎝ 1⎠

then β′ is an ordered basis for R2 and

    [T]_{β′} = ⎛1   0⎞ .
               ⎝0  −1⎠

Let β be the standard ordered basis for R2, and let Q be the matrix that changes β′-coordinates into β-coordinates. Then

    Q = ⎛1  −2⎞
        ⎝2   1⎠

and Q⁻¹[T]_β Q = [T]_{β′}. We can solve this equation for [T]_β to obtain that [T]_β = Q[T]_{β′}Q⁻¹. Because

    Q⁻¹ = (1/5) ⎛ 1  2⎞ ,
                ⎝−2  1⎠

the reader can verify that

    [T]_β = (1/5) ⎛−3  4⎞ .
                  ⎝ 4  3⎠

Since β is the standard ordered basis, it follows that T is left-multiplication by [T]_β. Thus for any (a, b) in R2, we have

    T ⎛a⎞ = (1/5) ⎛−3  4⎞ ⎛a⎞ = (1/5) ⎛−3a + 4b⎞ . ♦
      ⎝b⎠         ⎝ 4  3⎠ ⎝b⎠         ⎝ 4a + 3b⎠
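The verification left to the reader in Example 3 is a short computation; the sketch below assumes NumPy and also checks that the resulting matrix fixes (1, 2) and negates (−2, 1), as a reflection about y = 2x must.

    # [T]_beta = Q [T]_{beta'} Q^{-1} for the reflection about y = 2x.
    import numpy as np

    T_beta_prime = np.array([[1, 0], [0, -1]])     # reflection in the basis beta'
    Q = np.array([[1, -2], [2, 1]])                # columns (1,2) and (-2,1)

    T_beta = Q @ T_beta_prime @ np.linalg.inv(Q)
    print(T_beta)                                  # (1/5) [[-3, 4], [4, 3]]

    print(T_beta @ np.array([1, 2]))               # [1, 2]  (fixed by the reflection)
    print(T_beta @ np.array([-2, 1]))              # [2, -1] (negated by the reflection)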


A useful special case of Theorem 2.23 is contained in the next corollary, whose proof is left as an exercise.

Corollary. Let A ∈ Mn×n(F), and let γ be an ordered basis for Fⁿ. Then [LA]_γ = Q⁻¹AQ, where Q is the n × n matrix whose jth column is the jth vector of γ.

Example 4

Let

    A = ⎛2   1  0⎞ ,
        ⎜1   1  3⎟
        ⎝0  −1  0⎠

and let

    γ = { ⎛−1⎞ , ⎛2⎞ , ⎛1⎞ } ,
          ⎜ 0⎟   ⎜1⎟   ⎜1⎟
          ⎝ 0⎠   ⎝0⎠   ⎝1⎠

which is an ordered basis for R3. Let Q be the 3 × 3 matrix whose jth column is the jth vector of γ. Then

    Q = ⎛−1  2  1⎞          Q⁻¹ = ⎛−1  2  −1⎞
        ⎜ 0  1  1⎟    and         ⎜ 0  1  −1⎟
        ⎝ 0  0  1⎠                ⎝ 0  0   1⎠ .

So by the preceding corollary,

    [LA]_γ = Q⁻¹AQ = ⎛ 0   2   8⎞
                     ⎜−1   4   6⎟
                     ⎝ 0  −1  −1⎠ . ♦

The relationship between the matrices [T]_{β′} and [T]_β in Theorem 2.23 will be the subject of further study in Chapters 5, 6, and 7. At this time, however, we introduce the name for this relationship.

Definition. Let A and B be matrices in Mn×n(F). We say that B is similar to A if there exists an invertible matrix Q such that B = Q⁻¹AQ.

Observe that the relation of similarity is an equivalence relation (see Exercise 9). So we need only say that A and B are similar.

Notice also that in this terminology Theorem 2.23 can be stated as follows: If T is a linear operator on a finite-dimensional vector space V, and if β and β′ are any ordered bases for V, then [T]_{β′} is similar to [T]_β.

Theorem 2.23 can be generalized to allow T : V → W, where V is distinct from W. In this case, we can change bases in V as well as in W (see Exercise 8).


EXERCISES

1. Label the following statements as true or false.

   (a) Suppose that β = {x1, x2, . . . , xn} and β′ = {x′1, x′2, . . . , x′n} are ordered bases for a vector space and Q is the change of coordinate matrix that changes β′-coordinates into β-coordinates. Then the jth column of Q is [xj]_{β′}.
   (b) Every change of coordinate matrix is invertible.
   (c) Let T be a linear operator on a finite-dimensional vector space V, let β and β′ be ordered bases for V, and let Q be the change of coordinate matrix that changes β′-coordinates into β-coordinates. Then [T]_β = Q[T]_{β′}Q⁻¹.
   (d) The matrices A, B ∈ Mn×n(F) are called similar if B = QᵗAQ for some Q ∈ Mn×n(F).
   (e) Let T be a linear operator on a finite-dimensional vector space V. Then for any ordered bases β and γ for V, [T]_β is similar to [T]_γ.

2. For each of the following pairs of ordered bases β and β′ for R2, find the change of coordinate matrix that changes β′-coordinates into β-coordinates.
   (a) β = {e1, e2} and β′ = {(a1, a2), (b1, b2)}
   (b) β = {(−1, 3), (2, −1)} and β′ = {(0, 10), (5, 0)}
   (c) β = {(2, 5), (−1, −3)} and β′ = {e1, e2}
   (d) β = {(−4, 3), (2, −1)} and β′ = {(2, 1), (−4, 1)}

3. For each of the following pairs of ordered bases β and β′ for P2(R), find the change of coordinate matrix that changes β′-coordinates into β-coordinates.
   (a) β = {x², x, 1} and β′ = {a2x² + a1x + a0, b2x² + b1x + b0, c2x² + c1x + c0}
   (b) β = {1, x, x²} and β′ = {a2x² + a1x + a0, b2x² + b1x + b0, c2x² + c1x + c0}
   (c) β = {2x² − x, 3x² + 1, x²} and β′ = {1, x, x²}
   (d) β = {x² − x + 1, x + 1, x² + 1} and β′ = {x² + x + 4, 4x² − 3x + 2, 2x² + 3}
   (e) β = {x² − x, x² + 1, x − 1} and β′ = {5x² − 2x − 3, −2x² + 5x + 5, 2x² − x − 3}
   (f) β = {2x² − x + 1, x² + 3x − 2, −x² + 2x + 1} and β′ = {9x − 9, x² + 21x − 2, 3x² + 5x + 2}

4. Let T be the linear operator on R2 defined by

       T ⎛a⎞ = ⎛2a + b ⎞ ,
         ⎝b⎠   ⎝a − 3b⎠


   let β be the standard ordered basis for R2, and let

       β′ = { ⎛1⎞ , ⎛1⎞ } .
              ⎝1⎠   ⎝2⎠

   Use Theorem 2.23 and the fact that

       ⎛1  1⎞ ⁻¹ = ⎛ 2  −1⎞
       ⎝1  2⎠      ⎝−1   1⎠

   to find [T]_{β′}.

5. Let T be the linear operator on P1(R) defined by T(p(x)) = p′(x), the derivative of p(x). Let β = {1, x} and β′ = {1 + x, 1 − x}. Use Theorem 2.23 and the fact that

       ⎛1   1⎞ ⁻¹ = ⎛1/2   1/2⎞
       ⎝1  −1⎠      ⎝1/2  −1/2⎠

   to find [T]_{β′}.

6. For each matrix A and ordered basis β, find [L_A]_β. Also, find an invertible matrix Q such that [L_A]_β = Q^{-1}AQ.

(a) A = \begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix} and β = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\}

(b) A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} and β = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}

(c) A = \begin{pmatrix} 1 & 1 & -1 \\ 2 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} and β = \left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} \right\}

(d) A = \begin{pmatrix} 13 & 1 & 4 \\ 1 & 13 & 4 \\ 4 & 4 & 10 \end{pmatrix} and β = \left\{ \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}

7. In R^2, let L be the line y = mx, where m ≠ 0. Find an expression for T(x, y), where

(a) T is the reflection of R^2 about L.
(b) T is the projection on L along the line perpendicular to L. (See the definition of projection in the exercises of Section 2.1.)

8. Prove the following generalization of Theorem 2.23. Let T : V → W be a linear transformation from a finite-dimensional vector space V to a finite-dimensional vector space W. Let β and β′ be ordered bases for V, and let γ and γ′ be ordered bases for W. Then [T]^{γ′}_{β′} = P^{-1}[T]^{γ}_{β} Q, where Q is the matrix that changes β′-coordinates into β-coordinates and P is the matrix that changes γ′-coordinates into γ-coordinates.

9. Prove that “is similar to” is an equivalence relation on Mn×n(F ).

10. Prove that if A and B are similar n × n matrices, then tr(A) = tr(B). Hint: Use Exercise 13 of Section 2.3.

11. Let V be a finite-dimensional vector space with ordered bases α, β, and γ.

(a) Prove that if Q and R are the change of coordinate matrices that change α-coordinates into β-coordinates and β-coordinates into γ-coordinates, respectively, then RQ is the change of coordinate matrix that changes α-coordinates into γ-coordinates.
(b) Prove that if Q changes α-coordinates into β-coordinates, then Q^{-1} changes β-coordinates into α-coordinates.

12. Prove the corollary to Theorem 2.23.

13.† Let V be a finite-dimensional vector space over a field F, and let β = {x1, x2, . . . , xn} be an ordered basis for V. Let Q be an n × n invertible matrix with entries from F. Define

x′_j = \sum_{i=1}^{n} Q_{ij} x_i  for 1 ≤ j ≤ n,

and set β′ = {x′_1, x′_2, . . . , x′_n}. Prove that β′ is a basis for V and hence that Q is the change of coordinate matrix changing β′-coordinates into β-coordinates.

14. Prove the converse of Exercise 8: If A and B are each m × n matrices with entries from a field F, and if there exist invertible m × m and n × n matrices P and Q, respectively, such that B = P^{-1}AQ, then there exist an n-dimensional vector space V and an m-dimensional vector space W (both over F), ordered bases β and β′ for V and γ and γ′ for W, and a linear transformation T : V → W such that

A = [T]^{γ}_{β}  and  B = [T]^{γ′}_{β′}.

Hints: Let V = F^n, W = F^m, T = L_A, and β and γ be the standard ordered bases for F^n and F^m, respectively. Now apply the results of Exercise 13 to obtain ordered bases β′ and γ′ from β and γ via Q and P, respectively.


2.6∗ DUAL SPACES

In this section, we are concerned exclusively with linear transformations from a vector space V into its field of scalars F, which is itself a vector space of dimension 1 over F. Such a linear transformation is called a linear functional on V. We generally use the letters f, g, h, . . . to denote linear functionals. As we see in Example 1, the definite integral provides us with one of the most important examples of a linear functional in mathematics.

Example 1

Let V be the vector space of continuous real-valued functions on the interval [0, 2π]. Fix a function g ∈ V. The function h : V → R defined by

h(x) = \frac{1}{2π} \int_0^{2π} x(t)g(t)\, dt

is a linear functional on V. In the cases that g(t) equals sin nt or cos nt, h(x) is often called the nth Fourier coefficient of x. ♦
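As an added numerical aside (not part of the original text; NumPy is assumed), the functional h can be approximated by a simple quadrature rule and its linearity spot-checked on sample functions:

```python
import numpy as np

def fourier_coefficient(x, n, samples=20000):
    # h(x) = (1/(2*pi)) * integral_0^{2*pi} x(t) sin(n t) dt, approximated by an
    # equally spaced Riemann sum (our own choice of quadrature).
    t = np.linspace(0.0, 2 * np.pi, samples, endpoint=False)
    return np.mean(x(t) * np.sin(n * t))

# For x(t) = 3 sin(2t) + cos(5t), the coefficient for g(t) = sin(2t) is 3/2.
x = lambda t: 3 * np.sin(2 * t) + np.cos(5 * t)
print(fourier_coefficient(x, 2))        # approximately 1.5

# Linearity: h(a*x1 + x2) = a*h(x1) + h(x2).
x1, x2, a = np.cos, np.sin, 4.0
lhs = fourier_coefficient(lambda t: a * x1(t) + x2(t), 1)
rhs = a * fourier_coefficient(x1, 1) + fourier_coefficient(x2, 1)
print(np.isclose(lhs, rhs))             # True
```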

Example 2

Let V = M_{n×n}(F), and define f : V → F by f(A) = tr(A), the trace of A. By Exercise 6 of Section 1.3, we have that f is a linear functional. ♦

Example 3

Let V be a finite-dimensional vector space, and let β = {x1, x2, . . . , xn} be an ordered basis for V. For each i = 1, 2, . . . , n, define fi(x) = ai, where

[x]_β = \begin{pmatrix} a1 \\ a2 \\ \vdots \\ an \end{pmatrix}

is the coordinate vector of x relative to β. Then fi is a linear functional on V called the ith coordinate function with respect to the basis β. Note that fi(xj) = δij, where δij is the Kronecker delta. These linear functionals play an important role in the theory of dual spaces (see Theorem 2.24). ♦

Definition. For a vector space V over F, we define the dual space of V to be the vector space L(V, F), denoted by V∗.

Thus V∗ is the vector space consisting of all linear functionals on V with the operations of addition and scalar multiplication as defined in Section 2.2. Note that if V is finite-dimensional, then by the corollary to Theorem 2.20 (p. 104)

dim(V∗) = dim(L(V, F)) = dim(V) · dim(F) = dim(V).


Hence by Theorem 2.19 (p. 103), V and V∗ are isomorphic. We also definethe double dual V∗∗ of V to be the dual of V∗. In Theorem 2.26, we show,in fact, that there is a natural identification of V and V∗∗ in the case that Vis finite-dimensional.

Theorem 2.24. Suppose that V is a finite-dimensional vector space with the ordered basis β = {x1, x2, . . . , xn}. Let fi (1 ≤ i ≤ n) be the ith coordinate function with respect to β as just defined, and let β∗ = {f1, f2, . . . , fn}. Then β∗ is an ordered basis for V∗, and, for any f ∈ V∗, we have

f = \sum_{i=1}^{n} f(xi) fi.

Proof. Let f ∈ V∗. Since dim(V∗) = n, we need only show that

f = \sum_{i=1}^{n} f(xi) fi,

from which it follows that β∗ generates V∗, and hence is a basis by Corollary 2(a) to the replacement theorem (p. 47). Let

g = \sum_{i=1}^{n} f(xi) fi.

For 1 ≤ j ≤ n, we have

g(xj) = \left( \sum_{i=1}^{n} f(xi) fi \right)(xj) = \sum_{i=1}^{n} f(xi) fi(xj) = \sum_{i=1}^{n} f(xi) δij = f(xj).

Therefore f = g by the corollary to Theorem 2.6 (p. 72).

Definition. Using the notation of Theorem 2.24, we call the orderedbasis β∗ = {f1, f2, . . . , fn} of V∗ that satisfies fi(xj) = δij (1 ≤ i, j ≤ n) thedual basis of β.

Example 4

Let β = {(2, 1), (3, 1)} be an ordered basis for R^2. Suppose that the dual basis of β is given by β∗ = {f1, f2}. To explicitly determine a formula for f1, we need to consider the equations

1 = f1(2, 1) = f1(2e1 + e2) = 2f1(e1) + f1(e2)
0 = f1(3, 1) = f1(3e1 + e2) = 3f1(e1) + f1(e2).

Solving these equations, we obtain f1(e1) = -1 and f1(e2) = 3; that is, f1(x, y) = -x + 3y. Similarly, it can be shown that f2(x, y) = x - 2y. ♦
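As an added aside (not in the original text), the same computation can be organized as a matrix inversion: if the columns of a matrix B are the vectors of β, then the rows of B^{-1} give the coefficients of f1 and f2, since row i of B^{-1} applied to column j of B is δij. A small sketch, assuming NumPy is available:

```python
import numpy as np

# Columns of B are the vectors of the ordered basis beta = {(2,1), (3,1)}.
B = np.array([[2, 3],
              [1, 1]])

# Row i of B^{-1} gives the coefficients of the dual-basis functional f_i,
# because (row i of B^{-1}) . (column j of B) = delta_ij.
print(np.linalg.inv(B))
# [[-1.  3.]    -> f1(x, y) = -x + 3y
#  [ 1. -2.]]   -> f2(x, y) =  x - 2y
```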


We now assume that V and W are finite-dimensional vector spaces over F with ordered bases β and γ, respectively. In Section 2.4, we proved that there is a one-to-one correspondence between linear transformations T : V → W and m × n matrices (over F) via the correspondence T ↔ [T]^{γ}_{β}. For a matrix of the form A = [T]^{γ}_{β}, the question arises as to whether or not there exists a linear transformation U associated with T in some natural way such that U may be represented in some basis as A^t. Of course, if m ≠ n, it would be impossible for U to be a linear transformation from V into W. We now answer this question by applying what we have already learned about dual spaces.

Theorem 2.25. Let V and W be finite-dimensional vector spaces over F with ordered bases β and γ, respectively. For any linear transformation T : V → W, the mapping T^t : W∗ → V∗ defined by T^t(g) = gT for all g ∈ W∗ is a linear transformation with the property that [T^t]^{β∗}_{γ∗} = ([T]^{γ}_{β})^t.

Proof. For g ∈ W∗, it is clear that T^t(g) = gT is a linear functional on V and hence is in V∗. Thus T^t maps W∗ into V∗. We leave the proof that T^t is linear to the reader.

To complete the proof, let β = {x1, x2, . . . , xn} and γ = {y1, y2, . . . , ym} with dual bases β∗ = {f1, f2, . . . , fn} and γ∗ = {g1, g2, . . . , gm}, respectively. For convenience, let A = [T]^{γ}_{β}. To find the jth column of [T^t]^{β∗}_{γ∗}, we begin by expressing T^t(gj) as a linear combination of the vectors of β∗. By Theorem 2.24, we have

T^t(gj) = gjT = \sum_{s=1}^{n} (gjT)(xs) fs.

So the row i, column j entry of [T^t]^{β∗}_{γ∗} is

(gjT)(xi) = gj(T(xi)) = gj\left( \sum_{k=1}^{m} A_{ki} yk \right) = \sum_{k=1}^{m} A_{ki} gj(yk) = \sum_{k=1}^{m} A_{ki} δjk = A_{ji}.

Hence [T^t]^{β∗}_{γ∗} = A^t.

The linear transformation T^t defined in Theorem 2.25 is called the transpose of T. It is clear that T^t is the unique linear transformation U such that [U]^{β∗}_{γ∗} = ([T]^{γ}_{β})^t. We illustrate Theorem 2.25 with the next example.


Example 5

Define T : P1(R) → R^2 by T(p(x)) = (p(0), p(2)). Let β and γ be the standard ordered bases for P1(R) and R^2, respectively. Clearly,

[T]^{γ}_{β} = \begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix}.

We compute [T^t]^{β∗}_{γ∗} directly from the definition. Let β∗ = {f1, f2} and γ∗ = {g1, g2}. Suppose that [T^t]^{β∗}_{γ∗} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. Then T^t(g1) = af1 + cf2. So

T^t(g1)(1) = (af1 + cf2)(1) = af1(1) + cf2(1) = a(1) + c(0) = a.

But also

(T^t(g1))(1) = g1(T(1)) = g1(1, 1) = 1.

So a = 1. Using similar computations, we obtain that c = 0, b = 1, and d = 2. Hence a direct computation yields

[T^t]^{β∗}_{γ∗} = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix} = \left([T]^{γ}_{β}\right)^t,

as predicted by Theorem 2.25. ♦
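The following NumPy sketch (an added illustration, not part of the text) redoes this computation numerically: representing p(x) = a + bx by the pair (a, b), it builds [T]^{γ}_{β} column by column, computes the matrix of T^t entrywise from gj(T(xi)), and confirms that the result is the transpose.

```python
import numpy as np

# T: P1(R) -> R^2, T(p) = (p(0), p(2)); represent p = a + b*x by (a, b).
def T(p):
    a, b = p
    return np.array([a, a + 2 * b])

# Standard bases: beta = {1, x} for P1(R); the dual basis gamma* of the
# standard basis of R^2 consists of the coordinate functionals g1, g2.
beta = [np.array([1, 0]), np.array([0, 1])]
g = [lambda v: v[0], lambda v: v[1]]

# [T]^gamma_beta: columns are T applied to the vectors of beta.
A = np.column_stack([T(p) for p in beta])

# Entry (i, j) of [T^t] is T^t(g_j)(x_i) = g_j(T(x_i)).
Tt = np.array([[g[j](T(beta[i])) for j in range(2)] for i in range(2)])

print(A)                          # [[1 0], [1 2]]
print(Tt)                         # [[1 1], [0 2]]
print(np.array_equal(Tt, A.T))    # True, as Theorem 2.25 predicts
```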

We now concern ourselves with demonstrating that any finite-dimensional vector space V can be identified in a natural way with its double dual V∗∗. There is, in fact, an isomorphism between V and V∗∗ that does not depend on any choice of bases for the two vector spaces.

For a vector x ∈ V, we define x̂ : V∗ → F by x̂(f) = f(x) for every f ∈ V∗. It is easy to verify that x̂ is a linear functional on V∗, so x̂ ∈ V∗∗. The correspondence x ↔ x̂ allows us to define the desired isomorphism between V and V∗∗.

Lemma. Let V be a finite-dimensional vector space, and let x ∈ V. If x̂(f) = 0 for all f ∈ V∗, then x = 0.

Proof. Let x ≠ 0. We show that there exists f ∈ V∗ such that x̂(f) ≠ 0. Choose an ordered basis β = {x1, x2, . . . , xn} for V such that x1 = x. Let {f1, f2, . . . , fn} be the dual basis of β. Then f1(x1) = 1 ≠ 0. Let f = f1.

Theorem 2.26. Let V be a finite-dimensional vector space, and define ψ : V → V∗∗ by ψ(x) = x̂. Then ψ is an isomorphism.


Proof. (a) ψ is linear: Let x, y ∈ V and c ∈ F. For f ∈ V∗, we have

ψ(cx + y)(f) = f(cx + y) = cf(x) + f(y) = cx̂(f) + ŷ(f) = (cx̂ + ŷ)(f).

Therefore

ψ(cx + y) = cx̂ + ŷ = cψ(x) + ψ(y).

(b) ψ is one-to-one: Suppose that ψ(x) is the zero functional on V∗ for some x ∈ V. Then x̂(f) = 0 for every f ∈ V∗. By the previous lemma, we conclude that x = 0.

(c) ψ is an isomorphism: This follows from (b) and the fact that dim(V) = dim(V∗∗).

Corollary. Let V be a finite-dimensional vector space with dual space V∗. Then every ordered basis for V∗ is the dual basis for some basis for V.

Proof. Let {f1, f2, . . . , fn} be an ordered basis for V∗. We may combine Theorems 2.24 and 2.26 to conclude that for this basis for V∗ there exists a dual basis {x̂1, x̂2, . . . , x̂n} in V∗∗, that is, δij = x̂i(fj) = fj(xi) for all i and j. Thus {f1, f2, . . . , fn} is the dual basis of {x1, x2, . . . , xn}.

Although many of the ideas of this section (e.g., the existence of a dual space) can be extended to the case where V is not finite-dimensional, only a finite-dimensional vector space is isomorphic to its double dual via the map x → x̂. In fact, for infinite-dimensional vector spaces, no two of V, V∗, and V∗∗ are isomorphic.

EXERCISES

1. Label the following statements as true or false. Assume that all vector spaces are finite-dimensional.

(a) Every linear transformation is a linear functional.
(b) A linear functional defined on a field may be represented as a 1 × 1 matrix.
(c) Every vector space is isomorphic to its dual space.
(d) Every vector space is the dual of some other vector space.
(e) If T is an isomorphism from V onto V∗ and β is a finite ordered basis for V, then T(β) = β∗.
(f) If T is a linear transformation from V to W, then the domain of (T^t)^t is V∗∗.
(g) If V is isomorphic to W, then V∗ is isomorphic to W∗.


(h) The derivative of a function may be considered as a linear func-tional on the vector space of differentiable functions.

2. For the following functions f on a vector space V, determine which are linear functionals.

(a) V = P(R); f(p(x)) = 2p′(0) + p′′(1), where ′ denotes differentiation
(b) V = R^2; f(x, y) = (2x, 4y)
(c) V = M_{2×2}(F); f(A) = tr(A)
(d) V = R^3; f(x, y, z) = x^2 + y^2 + z^2
(e) V = P(R); f(p(x)) = \int_0^1 p(t)\, dt
(f) V = M_{2×2}(F); f(A) = A_{11}

3. For each of the following vector spaces V and bases β, find explicit formulas for vectors of the dual basis β∗ for V∗, as in Example 4.

(a) V = R^3; β = {(1, 0, 1), (1, 2, 1), (0, 0, 1)}
(b) V = P2(R); β = {1, x, x^2}

4. Let V = R3, and define f1, f2, f3 ∈ V∗ as follows:

f1(x, y, z) = x − 2y, f2(x, y, z) = x + y + z, f3(x, y, z) = y − 3z.

Prove that {f1, f2, f3} is a basis for V∗, and then find a basis for V forwhich it is the dual basis.

5. Let V = P1(R), and, for p(x) ∈ V, define f1, f2 ∈ V∗ by

f1(p(x)) = \int_0^1 p(t)\, dt  and  f2(p(x)) = \int_0^2 p(t)\, dt.

Prove that {f1, f2} is a basis for V∗, and find a basis for V for which it is the dual basis.

6. Define f ∈ (R^2)∗ by f(x, y) = 2x + y and T : R^2 → R^2 by T(x, y) = (3x + 2y, x).

(a) Compute T^t(f).
(b) Compute [T^t]_{β∗}, where β is the standard ordered basis for R^2 and β∗ = {f1, f2} is the dual basis, by finding scalars a, b, c, and d such that T^t(f1) = af1 + cf2 and T^t(f2) = bf1 + df2.
(c) Compute [T]_β and ([T]_β)^t, and compare your results with (b).

7. Let V = P1(R) and W = R2 with respective standard ordered bases βand γ. Define T : V → W by

T(p(x)) = (p(0) − 2p(1), p(0) + p′(0)),

where p′(x) is the derivative of p(x).


(a) For f ∈ W∗ defined by f(a, b) = a - 2b, compute T^t(f).
(b) Compute [T^t]^{β∗}_{γ∗} without appealing to Theorem 2.25.
(c) Compute [T]^{γ}_{β} and its transpose, and compare your results with (b).

8. Show that every plane through the origin in R3 may be identified withthe null space of a vector in (R3)∗. State an analogous result for R2.

9. Prove that a function T : Fn → Fm is linear if and only if there existf1, f2, . . . , fm ∈ (Fn)∗ such that T(x) = (f1(x), f2(x), . . . , fm(x)) for allx ∈ Fn. Hint: If T is linear, define fi(x) = (giT)(x) for x ∈ Fn; that is,fi = Tt(gi) for 1 ≤ i ≤ m, where {g1, g2, . . . , gm} is the dual basis ofthe standard ordered basis for Fm.

10. Let V = Pn(F ), and let c0, c1, . . . , cn be distinct scalars in F .

(a) For 0 ≤ i ≤ n, define fi ∈ V∗ by fi(p(x)) = p(ci). Prove that {f0, f1, . . . , fn} is a basis for V∗. Hint: Apply any linear combination of this set that equals the zero transformation to p(x) = (x - c1)(x - c2) · · · (x - cn), and deduce that the first coefficient is zero.
(b) Use the corollary to Theorem 2.26 and (a) to show that there exist unique polynomials p0(x), p1(x), . . . , pn(x) such that pi(cj) = δij for 0 ≤ i ≤ n. These polynomials are the Lagrange polynomials defined in Section 1.6.
(c) For any scalars a0, a1, . . . , an (not necessarily distinct), deduce that there exists a unique polynomial q(x) of degree at most n such that q(ci) = ai for 0 ≤ i ≤ n. In fact,

q(x) = \sum_{i=0}^{n} ai pi(x).

(d) Deduce the Lagrange interpolation formula:

p(x) = \sum_{i=0}^{n} p(ci) pi(x)

for any p(x) ∈ V.
(e) Prove that

\int_a^b p(t)\, dt = \sum_{i=0}^{n} p(ci) di,

where

di = \int_a^b pi(t)\, dt.


Suppose now that

ci = a + \frac{i(b - a)}{n}  for i = 0, 1, . . . , n.

For n = 1, the preceding result yields the trapezoidal rule for evaluating the definite integral of a polynomial. For n = 2, this result yields Simpson's rule for evaluating the definite integral of a polynomial.

11. Let V and W be finite-dimensional vector spaces over F, and let ψ1 and ψ2 be the isomorphisms between V and V∗∗ and W and W∗∗, respectively, as defined in Theorem 2.26. Let T : V → W be linear, and define T^{tt} = (T^t)^t. Prove that the diagram depicted in Figure 2.6 commutes (i.e., prove that ψ2T = T^{tt}ψ1).

[Figure 2.6: a commutative square with T : V → W along the top, T^{tt} : V∗∗ → W∗∗ along the bottom, and vertical maps ψ1 : V → V∗∗ and ψ2 : W → W∗∗.]

12. Let V be a finite-dimensional vector space with the ordered basis β. Prove that ψ(β) = β∗∗, where ψ is defined in Theorem 2.26.

In Exercises 13 through 17, V denotes a finite-dimensional vector space over F. For every subset S of V, define the annihilator S^0 of S as

S^0 = {f ∈ V∗ : f(x) = 0 for all x ∈ S}.

13. (a) Prove that S^0 is a subspace of V∗.

(b) If W is a subspace of V and x ∉ W, prove that there exists f ∈ W^0 such that f(x) ≠ 0.
(c) Prove that (S^0)^0 = span(ψ(S)), where ψ is defined as in Theorem 2.26.
(d) For subspaces W1 and W2, prove that W1 = W2 if and only if W1^0 = W2^0.
(e) For subspaces W1 and W2, show that (W1 + W2)^0 = W1^0 ∩ W2^0.

14. Prove that if W is a subspace of V, then dim(W) + dim(W0) = dim(V).Hint: Extend an ordered basis {x1, x2, . . . , xk} of W to an ordered ba-sis β = {x1, x2, . . . , xn} of V. Let β∗ = {f1, f2, . . . , fn}. Prove that{fk+1, fk+2, . . . , fn} is a basis for W0.


15. Suppose that W is a finite-dimensional vector space and that T : V → Wis linear. Prove that N(Tt) = (R(T))0.

16. Use Exercises 14 and 15 to deduce that rank(LAt) = rank(LA) for anyA ∈ Mm×n(F ).

17. Let T be a linear operator on V, and let W be a subspace of V. Provethat W is T-invariant (as defined in the exercises of Section 2.1) if andonly if W0 is Tt-invariant.

18. Let V be a nonzero vector space over a field F , and let S be a basisfor V. (By the corollary to Theorem 1.13 (p. 60) in Section 1.7, everyvector space has a basis.) Let Φ: V∗ → L(S, F ) be the mapping definedby Φ(f) = fS , the restriction of f to S. Prove that Φ is an isomorphism.Hint: Apply Exercise 34 of Section 2.1.

19. Let V be a nonzero vector space, and let W be a proper subspace of V (i.e., W ≠ V). Prove that there exists a nonzero linear functional f ∈ V∗ such that f(x) = 0 for all x ∈ W. Hint: For the infinite-dimensional case, use Exercise 34 of Section 2.1 as well as results about extending linearly independent sets to bases in Section 1.7.

20. Let V and W be nonzero vector spaces over the same field, and let T : V → W be a linear transformation.

(a) Prove that T is onto if and only if T^t is one-to-one.
(b) Prove that T^t is onto if and only if T is one-to-one.

Hint: Parts of the proof require the result of Exercise 19 for the infinite-dimensional case.

2.7∗ HOMOGENEOUS LINEAR DIFFERENTIAL EQUATIONS WITH CONSTANT COEFFICIENTS

As an introduction to this section, consider the following physical problem. A weight of mass m is attached to a vertically suspended spring that is allowed to stretch until the forces acting on the weight are in equilibrium. Suppose that the weight is now motionless and impose an xy-coordinate system with the weight at the origin and the spring lying on the positive y-axis (see Figure 2.7).

Suppose that at a certain time, say t = 0, the weight is lowered a distance s along the y-axis and released. The spring then begins to oscillate.

We describe the motion of the spring. At any time t ≥ 0, let F(t) denote the force acting on the weight and y(t) denote the position of the weight along the y-axis. For example, y(0) = -s. The second derivative of y with respect


[Figure 2.7: a weight hanging from a vertical spring, with the weight at the origin of an xy-coordinate system and the spring along the positive y-axis.]

to time, y′′(t), is the acceleration of the weight at time t; hence, by Newton's second law of motion,

F(t) = my′′(t).  (1)

It is reasonable to assume that the force acting on the weight is due totally to the tension of the spring, and that this force satisfies Hooke's law: The force acting on the weight is proportional to its displacement from the equilibrium position, but acts in the opposite direction. If k > 0 is the proportionality constant, then Hooke's law states that

F(t) = -ky(t).  (2)

Combining (1) and (2), we obtain my′′ = -ky or

y′′ + \frac{k}{m} y = 0.  (3)

The expression (3) is an example of a differential equation. A differential equation in an unknown function y = y(t) is an equation involving y, t, and derivatives of y. If the differential equation is of the form

a_n y^{(n)} + a_{n-1} y^{(n-1)} + · · · + a_1 y^{(1)} + a_0 y = f,  (4)

where a0, a1, . . . , an and f are functions of t and y^{(k)} denotes the kth derivative of y, then the equation is said to be linear. The functions ai are called the coefficients of the differential equation (4). Thus (3) is an example of a linear differential equation in which the coefficients are constants and the function f is identically zero. When f is identically zero, (4) is called homogeneous.

In this section, we apply the linear algebra we have studied to solve homogeneous linear differential equations with constant coefficients. If a_n ≠ 0, we say that differential equation (4) is of order n. In this case, we divide both sides by a_n to obtain a new, but equivalent, equation

y^{(n)} + b_{n-1} y^{(n-1)} + · · · + b_1 y^{(1)} + b_0 y = 0,

where b_i = a_i/a_n for i = 0, 1, . . . , n - 1. Because of this observation, we always assume that the coefficient a_n in (4) is 1.

A solution to (4) is a function that when substituted for y reduces (4) to an identity.

Example 1

The function y(t) = sin \sqrt{k/m}\, t is a solution to (3) since

y′′(t) + \frac{k}{m} y(t) = -\frac{k}{m} \sin\sqrt{\frac{k}{m}}\, t + \frac{k}{m} \sin\sqrt{\frac{k}{m}}\, t = 0

for all t. Notice, however, that substituting y(t) = t into (3) yields

y′′(t) + \frac{k}{m} y(t) = \frac{k}{m} t,

which is not identically zero. Thus y(t) = t is not a solution to (3). ♦

In our study of differential equations, it is useful to regard solutions as complex-valued functions of a real variable even though the solutions that are meaningful to us in a physical sense are real-valued. The convenience of this viewpoint will become clear later. Thus we are concerned with the vector space F(R, C) (as defined in Example 3 of Section 1.2). In order to consider complex-valued functions of a real variable as solutions to differential equations, we must define what it means to differentiate such functions. Given a complex-valued function x ∈ F(R, C) of a real variable t, there exist unique real-valued functions x1 and x2 of t, such that

x(t) = x1(t) + ix2(t) for t ∈ R,

where i is the imaginary number such that i^2 = -1. We call x1 the real part and x2 the imaginary part of x.

Definitions. Given a function x ∈ F(R, C) with real part x1 and imaginary part x2, we say that x is differentiable if x1 and x2 are differentiable. If x is differentiable, we define the derivative x′ of x by

x′ = x′_1 + ix′_2.

We illustrate some computations with complex-valued functions in the following example.


Example 2

Suppose that x(t) = cos 2t + i sin 2t. Then

x′(t) = -2 sin 2t + 2i cos 2t.

We next find the real and imaginary parts of x^2. Since

x^2(t) = (cos 2t + i sin 2t)^2 = (cos^2 2t - sin^2 2t) + i(2 sin 2t cos 2t) = cos 4t + i sin 4t,

the real part of x^2(t) is cos 4t, and the imaginary part is sin 4t. ♦
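These identities are easy to spot-check numerically. The sketch below is an added illustration (not from the text) using NumPy's complex arithmetic:

```python
import numpy as np

# x(t) = cos 2t + i sin 2t at a few sample points.
t = np.linspace(0.0, 2 * np.pi, 7)
x = np.cos(2 * t) + 1j * np.sin(2 * t)

# x^2 should equal cos 4t + i sin 4t.
print(np.allclose(x**2, np.cos(4 * t) + 1j * np.sin(4 * t)))            # True

# x'(t) = -2 sin 2t + 2i cos 2t, which is the same as 2i * x(t).
print(np.allclose(-2 * np.sin(2 * t) + 2j * np.cos(2 * t), 2j * x))     # True
```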

The next theorem indicates that we may limit our investigations to a vector space considerably smaller than F(R, C). Its proof, which is illustrated in Example 3, involves a simple induction argument, which we omit.

Theorem 2.27. Any solution to a homogeneous linear differential equation with constant coefficients has derivatives of all orders; that is, if x is a solution to such an equation, then x^{(k)} exists for every positive integer k.

Example 3

To illustrate Theorem 2.27, consider the equation

y^{(2)} + 4y = 0.

Clearly, to qualify as a solution, a function y must have two derivatives. If y is a solution, however, then

y^{(2)} = -4y.

Thus since y^{(2)} is a constant multiple of a function y that has two derivatives, y^{(2)} must have two derivatives. Hence y^{(4)} exists; in fact,

y^{(4)} = -4y^{(2)}.

Since y^{(4)} is a constant multiple of a function that we have shown has at least two derivatives, it also has at least two derivatives; hence y^{(6)} exists. Continuing in this manner, we can show that any solution has derivatives of all orders. ♦

Definition. We use C∞ to denote the set of all functions in F(R, C) thathave derivatives of all orders.

It is a simple exercise to show that C∞ is a subspace of F(R, C) and hencea vector space over C. In view of Theorem 2.27, it is this vector space that


is of interest to us. For x ∈ C∞, the derivative x′ of x also lies in C∞. We can use the derivative operation to define a mapping D : C∞ → C∞ by

D(x) = x′ for x ∈ C∞.

It is easy to show that D is a linear operator. More generally, consider any polynomial over C of the form

p(t) = a_n t^n + a_{n-1} t^{n-1} + · · · + a_1 t + a_0.

If we define

p(D) = a_n D^n + a_{n-1} D^{n-1} + · · · + a_1 D + a_0 I,

then p(D) is a linear operator on C∞. (See Appendix E.)
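As an added illustration (not part of the original text), the following SymPy sketch builds such an operator p(D) from a coefficient list and applies it to a few functions; the helper name apply_p_of_D is our own, and SymPy is assumed to be available.

```python
import sympy as sp

t = sp.symbols('t', real=True)

def apply_p_of_D(coeffs, y):
    # Apply p(D) = a_0 I + a_1 D + ... + a_n D^n to the expression y(t);
    # coeffs = [a_0, a_1, ..., a_n].
    return sum(a * sp.diff(y, t, k) for k, a in enumerate(coeffs))

# p(t) = t^2 + 4, so p(D) = D^2 + 4I.
print(sp.simplify(apply_p_of_D([4, 0, 1], sp.sin(2 * t))))   # 0
print(sp.expand(apply_p_of_D([4, 0, 1], t**2)))              # 4*t**2 + 2

# Linearity of p(D): p(D)(3*y1 + y2) = 3*p(D)(y1) + p(D)(y2).
y1, y2 = sp.exp(t), sp.cos(t)
lhs = apply_p_of_D([4, 0, 1], 3 * y1 + y2)
rhs = 3 * apply_p_of_D([4, 0, 1], y1) + apply_p_of_D([4, 0, 1], y2)
print(sp.simplify(lhs - rhs))                                 # 0
```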

Definitions. For any polynomial p(t) over C of positive degree, p(D) iscalled a differential operator. The order of the differential operator p(D)is the degree of the polynomial p(t).

Differential operators are useful since they provide us with a means of reformulating a differential equation in the context of linear algebra. Any homogeneous linear differential equation with constant coefficients,

y^{(n)} + a_{n-1} y^{(n-1)} + · · · + a_1 y^{(1)} + a_0 y = 0,

can be rewritten using differential operators as

(D^n + a_{n-1} D^{n-1} + · · · + a_1 D + a_0 I)(y) = 0.

Definition. Given the differential equation above, the complex polynomial

p(t) = t^n + a_{n-1} t^{n-1} + · · · + a_1 t + a_0

is called the auxiliary polynomial associated with the equation.

For example, (3) has the auxiliary polynomial

p(t) = t^2 + \frac{k}{m}.

Any homogeneous linear differential equation with constant coefficients can be rewritten as

p(D)(y) = 0,

where p(t) is the auxiliary polynomial associated with the equation. Clearly, this equation implies the following theorem.


Theorem 2.28. The set of all solutions to a homogeneous linear differen-tial equation with constant coefficients coincides with the null space of p(D),where p(t) is the auxiliary polynomial associated with the equation.

Proof. Exercise.

Corollary. The set of all solutions to a homogeneous linear differentialequation with constant coefficients is a subspace of C∞.

In view of the preceding corollary, we call the set of solutions to a homo-geneous linear differential equation with constant coefficients the solutionspace of the equation. A practical way of describing such a space is in termsof a basis. We now examine a certain class of functions that is of use infinding bases for these solution spaces.

For a real number s, we are familiar with the real number e^s, where e is the unique number whose natural logarithm is 1 (i.e., ln e = 1). We know, for instance, certain properties of exponentiation, namely,

e^{s+t} = e^s e^t  and  e^{-t} = \frac{1}{e^t}

for any real numbers s and t. We now extend the definition of powers of e to include complex numbers in such a way that these properties are preserved.

Definition. Let c = a + ib be a complex number with real part a and imaginary part b. Define

e^c = e^a(cos b + i sin b).

The special case

e^{ib} = cos b + i sin b

is called Euler's formula.

For example, for c = 2 + i(π/3),

e^c = e^2\left(\cos\frac{π}{3} + i \sin\frac{π}{3}\right) = e^2\left(\frac{1}{2} + i\frac{\sqrt{3}}{2}\right).

Clearly, if c is real (b = 0), then we obtain the usual result: e^c = e^a. Using the approach of Example 2, we can show by the use of trigonometric identities that

e^{c+d} = e^c e^d  and  e^{-c} = \frac{1}{e^c}

for any complex numbers c and d.


Definition. A function f : R → C defined by f(t) = e^{ct} for a fixed complex number c is called an exponential function.

The derivative of an exponential function, as described in the next theorem, is consistent with the real version. The proof involves a straightforward computation, which we leave as an exercise.

Theorem 2.29. For any exponential function f(t) = e^{ct}, f′(t) = ce^{ct}.

Proof. Exercise.

We can use exponential functions to describe all solutions to a homogeneous linear differential equation of order 1. Recall that the order of such an equation is the degree of its auxiliary polynomial. Thus an equation of order 1 is of the form

y′ + a_0 y = 0.  (5)

Theorem 2.30. The solution space for (5) is of dimension 1 and has {e^{-a_0 t}} as a basis.

Proof. Clearly (5) has e^{-a_0 t} as a solution. Suppose that x(t) is any solution to (5). Then

x′(t) = -a_0 x(t) for all t ∈ R.

Define

z(t) = e^{a_0 t} x(t).

Differentiating z yields

z′(t) = (e^{a_0 t})′ x(t) + e^{a_0 t} x′(t) = a_0 e^{a_0 t} x(t) - a_0 e^{a_0 t} x(t) = 0.

(Notice that the familiar product rule for differentiation holds for complex-valued functions of a real variable. A justification of this involves a lengthy, although direct, computation.)

Since z′ is identically zero, z is a constant function. (Again, this fact, well known for real-valued functions, is also true for complex-valued functions. The proof, which relies on the real case, involves looking separately at the real and imaginary parts of z.) Thus there exists a complex number k such that

z(t) = e^{a_0 t} x(t) = k for all t ∈ R.

So

x(t) = ke^{-a_0 t}.

We conclude that any solution to (5) is a linear combination of e^{-a_0 t}.


Another way of stating Theorem 2.30 is as follows.

Corollary. For any complex number c, the null space of the differential operator D - cI has {e^{ct}} as a basis.

We next concern ourselves with differential equations of order greater than one. Given an nth order homogeneous linear differential equation with constant coefficients,

y^{(n)} + a_{n-1} y^{(n-1)} + · · · + a_1 y^{(1)} + a_0 y = 0,

its auxiliary polynomial

p(t) = t^n + a_{n-1} t^{n-1} + · · · + a_1 t + a_0

factors into a product of polynomials of degree 1, that is,

p(t) = (t - c1)(t - c2) · · · (t - cn),

where c1, c2, . . . , cn are (not necessarily distinct) complex numbers. (This follows from the fundamental theorem of algebra in Appendix D.) Thus

p(D) = (D - c1 I)(D - c2 I) · · · (D - cn I).

The operators D - ci I commute, and so, by Exercise 9, we have that

N(D - ci I) ⊆ N(p(D)) for all i.

Since N(p(D)) coincides with the solution space of the given differential equation, we can deduce the following result from the preceding corollary.

Theorem 2.31. Let p(t) be the auxiliary polynomial for a homogeneous linear differential equation with constant coefficients. For any complex number c, if c is a zero of p(t), then e^{ct} is a solution to the differential equation.

Example 4

Given the differential equation

y′′ − 3y′ + 2y = 0 ,

its auxiliary polynomial is

p(t) = t^2 - 3t + 2 = (t - 1)(t - 2).

Hence, by Theorem 2.31, e^t and e^{2t} are solutions to the differential equation because c = 1 and c = 2 are zeros of p(t). Since the solution space of the differential equation is a subspace of C∞, span({e^t, e^{2t}}) lies in the solution space. It is a simple matter to show that {e^t, e^{2t}} is linearly independent. Thus if we can show that the solution space is two-dimensional, we can conclude that {e^t, e^{2t}} is a basis for the solution space. This result is a consequence of the next theorem. ♦
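As an added numerical aside (not from the text; NumPy assumed), the zeros of the auxiliary polynomial can be found with a root finder, and the linear independence of {e^t, e^{2t}} can be confirmed by evaluating the two functions at two points:

```python
import numpy as np

# Zeros of the auxiliary polynomial p(t) = t^2 - 3t + 2, entered by decreasing degree.
print(np.roots([1.0, -3.0, 2.0]))        # [2. 1.]

# Substituting y = e^{ct} into y'' - 3y' + 2y gives p(c) e^{ct}, so both zeros give solutions.
# Linear independence of {e^t, e^{2t}}: if a*e^t + b*e^{2t} = 0 as a function, it is zero
# at t = 0 and t = 1 in particular; that evaluation matrix is invertible, so a = b = 0.
samples = np.array([[np.exp(0.0), np.exp(0.0)],
                    [np.exp(1.0), np.exp(2.0)]])
print(np.linalg.det(samples))            # nonzero (about 4.67)
```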


Theorem 2.32. For any differential operator p(D) of order n, the nullspace of p(D) is an n-dimensional subspace of C∞.

As a preliminary to the proof of Theorem 2.32, we establish two lemmas.

Lemma 1. The differential operator D − cI : C∞ → C∞ is onto for anycomplex number c.

Proof. Let v ∈ C∞. We wish to find a u ∈ C∞ such that (D - cI)u = v. Let w(t) = v(t)e^{-ct} for t ∈ R. Clearly, w ∈ C∞ because both v and e^{-ct} lie in C∞. Let w1 and w2 be the real and imaginary parts of w. Then w1 and w2 are continuous because they are differentiable. Hence they have antiderivatives, say, W1 and W2, respectively. Let W : R → C be defined by

W(t) = W1(t) + iW2(t) for t ∈ R.

Then W ∈ C∞, and the real and imaginary parts of W are W1 and W2, respectively. Furthermore, W′ = w. Finally, let u : R → C be defined by u(t) = W(t)e^{ct} for t ∈ R. Clearly u ∈ C∞, and since

(D - cI)u(t) = u′(t) - cu(t) = W′(t)e^{ct} + W(t)ce^{ct} - cW(t)e^{ct} = w(t)e^{ct} = v(t)e^{-ct}e^{ct} = v(t),

we have (D - cI)u = v.

Lemma 2. Let V be a vector space, and suppose that T and U arelinear operators on V such that U is onto and the null spaces of T and U arefinite-dimensional. Then the null space of TU is finite-dimensional, and

dim(N(TU)) = dim(N(T)) + dim(N(U)).

Proof. Let p = dim(N(T)), q = dim(N(U)), and {u1, u2, . . . , up} and {v1, v2, . . . , vq} be bases for N(T) and N(U), respectively. Since U is onto, we can choose for each i (1 ≤ i ≤ p) a vector wi ∈ V such that U(wi) = ui. Note that the wi's are distinct. Furthermore, for any i and j, wi ≠ vj, for otherwise ui = U(wi) = U(vj) = 0, a contradiction. Hence the set

β = {w1, w2, . . . , wp, v1, v2, . . . , vq}

contains p + q distinct vectors. To complete the proof of the lemma, it suffices to show that β is a basis for N(TU).


We first show that β generates N(TU). Since for any wi and vj in β, TU(wi) = T(ui) = 0 and TU(vj) = T(0) = 0, it follows that β ⊆ N(TU). Now suppose that v ∈ N(TU). Then 0 = TU(v) = T(U(v)). Thus U(v) ∈ N(T). So there exist scalars a1, a2, . . . , ap such that

U(v) = a1u1 + a2u2 + · · · + apup = a1U(w1) + a2U(w2) + · · · + apU(wp) = U(a1w1 + a2w2 + · · · + apwp).

Hence

U(v − (a1w1 + a2w2 + · · · + apwp)) = 0 .

Consequently, v − (a1w1 + a2w2 + · · · + apwp) lies in N(U). It follows thatthere exist scalars b1, b2, . . . , bq such that

v − (a1w1 + a2w2 + · · · + apwp) = b1v1 + b2v2 + · · · + bqvq

or

v = a1w1 + a2w2 + · · · + apwp + b1v1 + b2v2 + · · · + bqvq.

Therefore β spans N(TU).

To prove that β is linearly independent, let a1, a2, . . . , ap, b1, b2, . . . , bq be any scalars such that

a1w1 + a2w2 + · · · + apwp + b1v1 + b2v2 + · · · + bqvq = 0 . (6)

Applying U to both sides of (6), we obtain

a1u1 + a2u2 + · · · + apup = 0 .

Since {u1, u2, . . . , up} is linearly independent, the ai’s are all zero. Thus (6)reduces to

b1v1 + b2v2 + · · · + bqvq = 0 .

Again, the linear independence of {v1, v2, . . . , vq} implies that the bi’s areall zero. We conclude that β is a basis for N(TU). Hence N(TU) is finite-dimensional, and dim(N(TU)) = p + q = dim(N(T)) + dim(N(U)).

Proof of Theorem 2.32. The proof is by mathematical induction on the order of the differential operator p(D). The first-order case coincides with Theorem 2.30. For some integer n > 1, suppose that Theorem 2.32 holds for any differential operator of order less than n, and consider a differential operator p(D) of order n. The polynomial p(t) can be factored into a product of two polynomials as follows:

p(t) = q(t)(t - c),

where q(t) is a polynomial of degree n - 1 and c is a complex number. Thus the given differential operator may be rewritten as

p(D) = q(D)(D - cI).

Now, by Lemma 1, D - cI is onto, and by the corollary to Theorem 2.30, dim(N(D - cI)) = 1. Also, by the induction hypothesis, dim(N(q(D))) = n - 1. Thus, by Lemma 2, we conclude that

dim(N(p(D))) = dim(N(q(D))) + dim(N(D - cI)) = (n - 1) + 1 = n.

Corollary. The solution space of any nth-order homogeneous linear dif-ferential equation with constant coefficients is an n-dimensional subspace ofC∞.

The corollary to Theorem 2.32 reduces the problem of finding all solutionsto an nth-order homogeneous linear differential equation with constant coeffi-cients to finding a set of n linearly independent solutions to the equation. Bythe results of Chapter 1, any such set must be a basis for the solution space.The next theorem enables us to find a basis quickly for many such equations.Hints for its proof are provided in the exercises.

Theorem 2.33. Given n distinct complex numbers c1, c2, . . . , cn, the setof exponential functions {ec1t, ec2t, . . . , ecnt} is linearly independent.

Proof. Exercise. (See Exercise 10.)

Corollary. For any nth-order homogeneous linear differential equationwith constant coefficients, if the auxiliary polynomial has n distinct zerosc1, c2, . . . , cn, then {ec1t, ec2t, . . . , ecnt} is a basis for the solution space of thedifferential equation.

Proof. Exercise. (See Exercise 10.)

Example 5

We find all solutions to the differential equation

y′′ + 5y′ + 4y = 0 .


Since the auxiliary polynomial factors as (t + 4)(t + 1), it has two distinct zeros, -1 and -4. Thus {e^{-t}, e^{-4t}} is a basis for the solution space. So any solution to the given equation is of the form

y(t) = b1 e^{-t} + b2 e^{-4t}

for unique scalars b1 and b2. ♦

Example 6

We find all solutions to the differential equation

y′′ + 9y = 0 .

The auxiliary polynomial t^2 + 9 factors as (t - 3i)(t + 3i) and hence has distinct zeros c1 = 3i and c2 = -3i. Thus {e^{3it}, e^{-3it}} is a basis for the solution space. Since

cos 3t = \frac{1}{2}(e^{3it} + e^{-3it})  and  sin 3t = \frac{1}{2i}(e^{3it} - e^{-3it}),

it follows from Exercise 7 that {cos 3t, sin 3t} is also a basis for this solution space. This basis has an advantage over the original one because it consists of the familiar sine and cosine functions and makes no reference to the imaginary number i. Using this latter basis, we see that any solution to the given equation is of the form

y(t) = b1 cos 3t + b2 sin 3t

for unique scalars b1 and b2. ♦

Next consider the differential equation

y′′ + 2y′ + y = 0 ,

for which the auxiliary polynomial is (t + 1)^2. By Theorem 2.31, e^{-t} is a solution to this equation. By the corollary to Theorem 2.32, its solution space is two-dimensional. In order to obtain a basis for the solution space, we need a solution that is linearly independent of e^{-t}. The reader can verify that te^{-t} is such a solution. The following lemma extends this result.

Lemma. For a given complex number c and positive integer n, suppose that (t - c)^n is the auxiliary polynomial of a homogeneous linear differential equation with constant coefficients. Then the set

β = {e^{ct}, te^{ct}, . . . , t^{n-1}e^{ct}}

is a basis for the solution space of the equation.


Proof. Since the solution space is n-dimensional, we need only show that β is linearly independent and lies in the solution space. First, observe that for any positive integer k,

(D - cI)(t^k e^{ct}) = kt^{k-1}e^{ct} + ct^k e^{ct} - ct^k e^{ct} = kt^{k-1}e^{ct}.

Hence for k < n,

(D - cI)^n(t^k e^{ct}) = 0.

It follows that β is a subset of the solution space.

We next show that β is linearly independent. Consider any linear combination of vectors in β such that

b0 e^{ct} + b1 te^{ct} + · · · + b_{n-1} t^{n-1} e^{ct} = 0  (7)

for some scalars b0, b1, . . . , b_{n-1}. Dividing by e^{ct} in (7), we obtain

b0 + b1 t + · · · + b_{n-1} t^{n-1} = 0.  (8)

Thus the left side of (8) must be the zero polynomial function. We conclude that the coefficients b0, b1, . . . , b_{n-1} are all zero. So β is linearly independent and hence is a basis for the solution space.

Example 7

We find all solutions to the differential equation

y^{(4)} - 4y^{(3)} + 6y^{(2)} - 4y^{(1)} + y = 0.

Since the auxiliary polynomial is

t^4 - 4t^3 + 6t^2 - 4t + 1 = (t - 1)^4,

we can immediately conclude by the preceding lemma that {e^t, te^t, t^2 e^t, t^3 e^t} is a basis for the solution space. So any solution y to the given differential equation is of the form

y(t) = b1 e^t + b2 te^t + b3 t^2 e^t + b4 t^3 e^t

for unique scalars b1, b2, b3, and b4. ♦

The most general situation is stated in the following theorem.

Theorem 2.34. Given a homogeneous linear differential equation with constant coefficients and auxiliary polynomial

(t - c1)^{n1}(t - c2)^{n2} · · · (t - ck)^{nk},

where n1, n2, . . . , nk are positive integers and c1, c2, . . . , ck are distinct complex numbers, the following set is a basis for the solution space of the equation:

{e^{c1 t}, te^{c1 t}, . . . , t^{n1 - 1}e^{c1 t}, . . . , e^{ck t}, te^{ck t}, . . . , t^{nk - 1}e^{ck t}}.


Proof. Exercise.

Example 8

The differential equation

y^{(3)} - 4y^{(2)} + 5y^{(1)} - 2y = 0

has the auxiliary polynomial

t^3 - 4t^2 + 5t - 2 = (t - 1)^2(t - 2).

By Theorem 2.34, {e^t, te^t, e^{2t}} is a basis for the solution space of the differential equation. Thus any solution y has the form

y(t) = b1 e^t + b2 te^t + b3 e^{2t}

for unique scalars b1, b2, and b3. ♦
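Theorem 2.34 translates directly into a small computation: find the zeros of the auxiliary polynomial together with their multiplicities and write down the corresponding functions t^j e^{ct}. The sketch below is our own illustration (not part of the text); it assumes NumPy and merges numerically close roots, so it cannot certify exact multiplicities.

```python
import numpy as np

def solution_basis(aux_coeffs, tol=1e-6):
    # aux_coeffs: coefficients of the auxiliary polynomial, highest degree first,
    # with leading coefficient 1.  Returns the basis of Theorem 2.34 as strings
    # "t^j e^(c t)".  Roots within tol of each other are treated as one repeated root.
    clusters = []                                   # pairs (root, multiplicity)
    for r in np.roots(aux_coeffs):
        for i, (c, m) in enumerate(clusters):
            if abs(r - c) < tol:
                clusters[i] = (c, m + 1)
                break
        else:
            clusters.append((r, 1))
    basis = []
    for c, m in clusters:
        c = complex(np.round(c, 6))
        label = c.real if c.imag == 0 else c        # drop a negligible imaginary part
        basis.extend(f"t^{j} e^({label} t)" for j in range(m))
    return basis

# y^(3) - 4y^(2) + 5y^(1) - 2y = 0 has auxiliary polynomial (t - 1)^2 (t - 2),
# so the basis should correspond to {e^t, t e^t, e^{2t}}, matching Example 8.
print(solution_basis([1, -4, 5, -2]))
```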

EXERCISES

1. Label the following statements as true or false.

(a) The set of solutions to an nth-order homogeneous linear differential equation with constant coefficients is an n-dimensional subspace of C∞.
(b) The solution space of a homogeneous linear differential equation with constant coefficients is the null space of a differential operator.
(c) The auxiliary polynomial of a homogeneous linear differential equation with constant coefficients is a solution to the differential equation.
(d) Any solution to a homogeneous linear differential equation with constant coefficients is of the form ae^{ct} or at^k e^{ct}, where a and c are complex numbers and k is a positive integer.
(e) Any linear combination of solutions to a given homogeneous linear differential equation with constant coefficients is also a solution to the given equation.
(f) For any homogeneous linear differential equation with constant coefficients having auxiliary polynomial p(t), if c1, c2, . . . , ck are the distinct zeros of p(t), then {e^{c1 t}, e^{c2 t}, . . . , e^{ck t}} is a basis for the solution space of the given differential equation.
(g) Given any polynomial p(t) ∈ P(C), there exists a homogeneous linear differential equation with constant coefficients whose auxiliary polynomial is p(t).


2. For each of the following parts, determine whether the statement is true or false. Justify your claim with either a proof or a counterexample, whichever is appropriate.

(a) Any finite-dimensional subspace of C∞ is the solution space of a homogeneous linear differential equation with constant coefficients.
(b) There exists a homogeneous linear differential equation with constant coefficients whose solution space has the basis {t, t^2}.
(c) For any homogeneous linear differential equation with constant coefficients, if x is a solution to the equation, so is its derivative x′.

Given two polynomials p(t) and q(t) in P(C), if x ∈ N(p(D)) and y ∈ N(q(D)), then

(d) x + y ∈ N(p(D)q(D)).
(e) xy ∈ N(p(D)q(D)).

3. Find a basis for the solution space of each of the following differential equations.

(a) y′′ + 2y′ + y = 0
(b) y′′′ = y′
(c) y^{(4)} - 2y^{(2)} + y = 0
(d) y′′ + 2y′ + y = 0
(e) y^{(3)} - y^{(2)} + 3y^{(1)} + 5y = 0

4. Find a basis for each of the following subspaces of C∞.

(a) N(D^2 - D - I)
(b) N(D^3 - 3D^2 + 3D - I)
(c) N(D^3 + 6D^2 + 8D)

5. Show that C∞ is a subspace of F(R, C).

6. (a) Show that D : C∞ → C∞ is a linear operator.
   (b) Show that any differential operator is a linear operator on C∞.

7. Prove that if {x, y} is a basis for a vector space over C, then so is

\left\{ \frac{1}{2}(x + y), \frac{1}{2i}(x - y) \right\}.

8. Consider a second-order homogeneous linear differential equation with constant coefficients in which the auxiliary polynomial has distinct conjugate complex roots a + ib and a - ib, where a, b ∈ R. Show that {e^{at} cos bt, e^{at} sin bt} is a basis for the solution space.


9. Suppose that {U1, U2, . . . ,Un} is a collection of pairwise commutativelinear operators on a vector space V (i.e., operators such that UiUj =UjUi for all i, j). Prove that, for any i (1 ≤ i ≤ n),

N(Ui) ⊆ N(U1U2 · · ·Un).

10. Prove Theorem 2.33 and its corollary. Hint: Suppose that

b1 e^{c1 t} + b2 e^{c2 t} + · · · + bn e^{cn t} = 0 (where the ci's are distinct).

To show the bi's are zero, apply mathematical induction on n as follows. Verify the theorem for n = 1. Assuming that the theorem is true for n - 1 functions, apply the operator D - cn I to both sides of the given equation to establish the theorem for n distinct exponential functions.

11. Prove Theorem 2.34. Hint: First verify that the alleged basis lies in the solution space. Then verify that this set is linearly independent by mathematical induction on k as follows. The case k = 1 is the lemma to Theorem 2.34. Assuming that the theorem holds for k - 1 distinct ci's, apply the operator (D - ck I)^{nk} to any linear combination of the alleged basis that equals 0.

12. Let V be the solution space of an nth-order homogeneous linear differ-ential equation with constant coefficients having auxiliary polynomialp(t). Prove that if p(t) = g(t)h(t), where g(t) and h(t) are polynomialsof positive degree, then

N(h(D)) = R(g(DV)) = g(D)(V),

where DV : V → V is defined by DV(x) = x′ for x ∈ V. Hint: First proveg(D)(V) ⊆ N(h(D)). Then prove that the two spaces have the samefinite dimension.

13. A differential equation

y^{(n)} + a_{n-1} y^{(n-1)} + · · · + a_1 y^{(1)} + a_0 y = x

is called a nonhomogeneous linear differential equation with constant coefficients if the ai's are constant and x is a function that is not identically zero.

(a) Prove that for any x ∈ C∞ there exists y ∈ C∞ such that y isa solution to the differential equation. Hint: Use Lemma 1 toTheorem 2.32 to show that for any polynomial p(t), the linearoperator p(D) : C∞ → C∞ is onto.


(b) Let V be the solution space for the homogeneous linear equation

y^{(n)} + a_{n-1} y^{(n-1)} + · · · + a_1 y^{(1)} + a_0 y = 0.

Prove that if z is any solution to the associated nonhomogeneous linear differential equation, then the set of all solutions to the nonhomogeneous linear differential equation is

{z + y : y ∈ V}.

14. Given any nth-order homogeneous linear differential equation with constant coefficients, prove that, for any solution x and any t0 ∈ R, if x(t0) = x′(t0) = · · · = x^{(n-1)}(t0) = 0, then x = 0 (the zero function). Hint: Use mathematical induction on n as follows. First prove the conclusion for the case n = 1. Next suppose that it is true for equations of order n - 1, and consider an nth-order differential equation with auxiliary polynomial p(t). Factor p(t) = q(t)(t - c), and let z = q(D)x. Show that z(t0) = 0 and z′ - cz = 0 to conclude that z = 0. Now apply the induction hypothesis.

15. Let V be the solution space of an nth-order homogeneous linear differential equation with constant coefficients. Fix t0 ∈ R, and define a mapping Φ : V → C^n by

Φ(x) = \begin{pmatrix} x(t0) \\ x′(t0) \\ \vdots \\ x^{(n-1)}(t0) \end{pmatrix}  for each x in V.

(a) Prove that Φ is linear and its null space is the zero subspace of V. Deduce that Φ is an isomorphism. Hint: Use Exercise 14.
(b) Prove the following: For any nth-order homogeneous linear differential equation with constant coefficients, any t0 ∈ R, and any complex numbers c0, c1, . . . , c_{n-1} (not necessarily distinct), there exists exactly one solution, x, to the given differential equation such that x(t0) = c0 and x^{(k)}(t0) = ck for k = 1, 2, . . . , n - 1.

16. Pendular Motion. It is well known that the motion of a pendulum is approximated by the differential equation

θ′′ + \frac{g}{l} θ = 0,

where θ(t) is the angle in radians that the pendulum makes with a vertical line at time t (see Figure 2.8), interpreted so that θ is positive if the pendulum is to the right and negative if the pendulum is to the


[Figure 2.8: a pendulum of length l displaced from the vertical by the angle θ(t).]

left of the vertical line as viewed by the reader. Here l is the lengthof the pendulum and g is the magnitude of acceleration due to gravity.The variable t and constants l and g must be in compatible units (e.g.,t in seconds, l in meters, and g in meters per second per second).

(a) Express an arbitrary solution to this equation as a linear combination of two real-valued solutions.
(b) Find the unique solution to the equation that satisfies the conditions

θ(0) = θ0 > 0 and θ′(0) = 0.

(The significance of these conditions is that at time t = 0 the pendulum is released from a position displaced from the vertical by θ0.)
(c) Prove that it takes 2π\sqrt{l/g} units of time for the pendulum to make one circuit back and forth. (This time is called the period of the pendulum.)

17. Periodic Motion of a Spring without Damping. Find the general solu-tion to (3), which describes the periodic motion of a spring, ignoringfrictional forces.

18. Periodic Motion of a Spring with Damping. The ideal periodic motiondescribed by solutions to (3) is due to the ignoring of frictional forces.In reality, however, there is a frictional force acting on the motion thatis proportional to the speed of motion, but that acts in the oppositedirection. The modification of (3) to account for the frictional force,called the damping force, is given by

my′′ + ry′ + ky = 0 ,

where r > 0 is the proportionality constant.

(a) Find the general solution to this equation.


(b) Find the unique solution in (a) that satisfies the initial conditions y(0) = 0 and y′(0) = v0, the initial velocity.
(c) For y(t) as in (b), show that the amplitude of the oscillation decreases to zero; that is, prove that lim_{t→∞} y(t) = 0.

19. In our study of differential equations, we have regarded solutions ascomplex-valued functions even though functions that are useful in de-scribing physical motion are real-valued. Justify this approach.

20. The following parts, which do not involve linear algebra, are includedfor the sake of completeness.

(a) Prove Theorem 2.27. Hint: Use mathematical induction on the number of derivatives possessed by a solution.
(b) For any c, d ∈ C, prove that

e^{c+d} = e^c e^d  and  e^{-c} = \frac{1}{e^c}.

(c) Prove Theorem 2.28.
(d) Prove Theorem 2.29.
(e) Prove the product rule for differentiating complex-valued functions of a real variable: For any differentiable functions x and y in F(R, C), the product xy is differentiable and

(xy)′ = x′y + xy′.

Hint: Apply the rules of differentiation to the real and imaginary parts of xy.
(f) Prove that if x ∈ F(R, C) and x′ = 0, then x is a constant function.

INDEX OF DEFINITIONS FOR CHAPTER 2

Auxiliary polynomial 131
Change of coordinate matrix 112
Clique 94
Coefficients of a differential equation 128
Coordinate function 119
Coordinate vector relative to a basis 80
Differential equation 128
Differential operator 131
Dimension theorem 69
Dominance relation 95
Double dual 120
Dual basis 120
Dual space 119
Euler's formula 132
Exponential function 133
Fourier coefficient 119
Homogeneous linear differential equation 128
Identity matrix 89
Identity transformation 67


Incidence matrix 94
Inverse of a linear transformation 99
Inverse of a matrix 100
Invertible linear transformation 99
Invertible matrix 100
Isomorphic vector spaces 102
Isomorphism 102
Kronecker delta 89
Left-multiplication transformation 92
Linear functional 119
Linear operator 112
Linear transformation 65
Matrix representing a linear transformation 80
Nonhomogeneous differential equation 142
Nullity of a linear transformation 69
Null space 67
Ordered basis 79
Order of a differential equation 129
Order of a differential operator 131
Product of matrices 87
Projection on a subspace 76
Projection on the x-axis 66
Range 67
Rank of a linear transformation 69
Reflection about the x-axis 66
Rotation 66
Similar matrices 115
Solution to a differential equation 129
Solution space of a homogeneous differential equation 132
Standard ordered basis for Fn 79
Standard ordered basis for Pn(F) 79
Standard representation of a vector space with respect to a basis 104
Transpose of a linear transformation 121
Zero transformation 67


3 Elementary Matrix Operations and Systems of Linear Equations

3.1 Elementary Matrix Operations and Elementary Matrices
3.2 The Rank of a Matrix and Matrix Inverses
3.3 Systems of Linear Equations—Theoretical Aspects
3.4 Systems of Linear Equations—Computational Aspects

This chapter is devoted to two related objectives:

1. the study of certain "rank-preserving" operations on matrices;
2. the application of these operations and the theory of linear transformations to the solution of systems of linear equations.

As a consequence of objective 1, we obtain a simple method for com-puting the rank of a linear transformation between finite-dimensional vectorspaces by applying these rank-preserving matrix operations to a matrix thatrepresents that transformation.

Solving a system of linear equations is probably the most important application of linear algebra. The familiar method of elimination for solving systems of linear equations, which was discussed in Section 1.4, involves the elimination of variables so that a simpler system can be obtained. The technique by which the variables are eliminated utilizes three types of operations:

1. interchanging any two equations in the system;
2. multiplying any equation in the system by a nonzero constant;
3. adding a multiple of one equation to another.

In Section 3.3, we express a system of linear equations as a single matrixequation. In this representation of the system, the three operations aboveare the “elementary row operations” for matrices. These operations providea convenient computational method for determining all solutions to a systemof linear equations.


3.1 ELEMENTARY MATRIX OPERATIONS AND ELEMENTARY MATRICES

In this section, we define the elementary operations that are used throughout the chapter. In subsequent sections, we use these operations to obtain simple computational methods for determining the rank of a linear transformation and the solution of a system of linear equations. There are two types of elementary matrix operations—row operations and column operations. As we will see, the row operations are more useful. They arise from the three operations that can be used to eliminate variables in a system of linear equations.

Definitions. Let A be an m × n matrix. Any one of the following three operations on the rows [columns] of A is called an elementary row [column] operation:

(1) interchanging any two rows [columns] of A;
(2) multiplying any row [column] of A by a nonzero scalar;
(3) adding any scalar multiple of a row [column] of A to another row [column].

Any of these three operations is called an elementary operation. Elemen-tary operations are of type 1, type 2, or type 3 depending on whether theyare obtained by (1), (2), or (3).

Example 1

Let

$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & -1 & 3 \\ 4 & 0 & 1 & 2 \end{pmatrix}.$$

Interchanging the second row of A with the first row is an example of an elementary row operation of type 1. The resulting matrix is

$$B = \begin{pmatrix} 2 & 1 & -1 & 3 \\ 1 & 2 & 3 & 4 \\ 4 & 0 & 1 & 2 \end{pmatrix}.$$

Multiplying the second column of A by 3 is an example of an elementary column operation of type 2. The resulting matrix is

$$C = \begin{pmatrix} 1 & 6 & 3 & 4 \\ 2 & 3 & -1 & 3 \\ 4 & 0 & 1 & 2 \end{pmatrix}.$$

Adding 4 times the third row of A to the first row is an example of an elementary row operation of type 3. In this case, the resulting matrix is

$$M = \begin{pmatrix} 17 & 2 & 7 & 12 \\ 2 & 1 & -1 & 3 \\ 4 & 0 & 1 & 2 \end{pmatrix}. \quad ♦$$
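The three operations are easy to experiment with numerically. The following sketch (an illustration added here, not part of the original exposition; it assumes Python with the numpy library) applies the same three elementary operations of Example 1 to A.

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [2, 1, -1, 3],
              [4, 0, 1, 2]], dtype=float)

# Type 1 (row): interchange rows 1 and 2 (indices 0 and 1)
B = A.copy()
B[[0, 1]] = B[[1, 0]]

# Type 2 (column): multiply the second column by 3
C = A.copy()
C[:, 1] = 3 * C[:, 1]

# Type 3 (row): add 4 times the third row to the first row
M = A.copy()
M[0] = M[0] + 4 * M[2]

print(B)  # the matrix B of Example 1
print(C)  # the matrix C of Example 1
print(M)  # the matrix M of Example 1
```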

Notice that if a matrix Q can be obtained from a matrix P by means of an elementary row operation, then P can be obtained from Q by an elementary row operation of the same type. (See Exercise 8.) So, in Example 1, A can be obtained from M by adding −4 times the third row of M to the first row of M.

Definition. An n × n elementary matrix is a matrix obtained by performing an elementary operation on In. The elementary matrix is said to be of type 1, 2, or 3 according to whether the elementary operation performed on In is a type 1, 2, or 3 operation, respectively.

For example, interchanging the first two rows of I3 produces the elementary matrix

$$E = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Note that E can also be obtained by interchanging the first two columns of I3. In fact, any elementary matrix can be obtained in at least two ways—either by performing an elementary row operation on In or by performing an elementary column operation on In. (See Exercise 4.) Similarly,

$$\begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

is an elementary matrix since it can be obtained from I3 by an elementary column operation of type 3 (adding −2 times the first column of I3 to the third column) or by an elementary row operation of type 3 (adding −2 times the third row to the first row).

Our first theorem shows that performing an elementary row operation on a matrix is equivalent to multiplying the matrix by an elementary matrix.

Theorem 3.1. Let A ∈ Mm×n(F ), and suppose that B is obtained from A by performing an elementary row [column] operation. Then there exists an m × m [n × n] elementary matrix E such that B = EA [B = AE]. In fact, E is obtained from Im [In] by performing the same elementary row [column] operation as that which was performed on A to obtain B. Conversely, if E is an elementary m × m [n × n] matrix, then EA [AE] is the matrix obtained from A by performing the same elementary row [column] operation as that which produces E from Im [In].

The proof, which we omit, requires verifying Theorem 3.1 for each type of elementary row operation. The proof for column operations can then be obtained by using the matrix transpose to transform a column operation into a row operation. The details are left as an exercise. (See Exercise 7.)

The next example illustrates the use of the theorem.

Example 2

Consider the matrices A and B in Example 1. In this case, B is obtained from A by interchanging the first two rows of A. Performing this same operation on I3, we obtain the elementary matrix

$$E = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Note that EA = B.

In the second part of Example 1, C is obtained from A by multiplying the second column of A by 3. Performing this same operation on I4, we obtain the elementary matrix

$$E = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Observe that AE = C. ♦

It is a useful fact that the inverse of an elementary matrix is also an elementary matrix.

Theorem 3.2. Elementary matrices are invertible, and the inverse of an elementary matrix is an elementary matrix of the same type.

Proof. Let E be an elementary n × n matrix. Then E can be obtained by an elementary row operation on In. By reversing the steps used to transform In into E, we can transform E back into In. The result is that In can be obtained from E by an elementary row operation of the same type. By Theorem 3.1, there is an elementary matrix Ē such that ĒE = In. Therefore, by Exercise 10 of Section 2.4, E is invertible and E−1 = Ē.
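Theorems 3.1 and 3.2 can also be checked numerically. The sketch below (illustrative only; it assumes Python with numpy) builds the type 1 elementary matrix E of Example 2, verifies that left multiplication by E performs the row interchange on A, and confirms that its inverse is again an elementary matrix of the same type (here the interchange undoes itself).

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [2, 1, -1, 3],
              [4, 0, 1, 2]], dtype=float)

# Elementary matrix of type 1: interchange rows 1 and 2 of I_3
E = np.eye(3)
E[[0, 1]] = E[[1, 0]]

# Left multiplication by E performs the same row operation on A (Theorem 3.1)
B = A.copy()
B[[0, 1]] = B[[1, 0]]
assert np.allclose(E @ A, B)

# The inverse of E is an elementary matrix of the same type (Theorem 3.2);
# a row interchange is its own inverse, so here E^{-1} = E.
assert np.allclose(np.linalg.inv(E), E)
```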


EXERCISES

1. Label the following statements as true or false.

(a) An elementary matrix is always square.
(b) The only entries of an elementary matrix are zeros and ones.
(c) The n × n identity matrix is an elementary matrix.
(d) The product of two n × n elementary matrices is an elementary matrix.
(e) The inverse of an elementary matrix is an elementary matrix.
(f) The sum of two n × n elementary matrices is an elementary matrix.
(g) The transpose of an elementary matrix is an elementary matrix.
(h) If B is a matrix that can be obtained by performing an elementary row operation on a matrix A, then B can also be obtained by performing an elementary column operation on A.
(i) If B is a matrix that can be obtained by performing an elementary row operation on a matrix A, then A can be obtained by performing an elementary row operation on B.

2. Let

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 0 & 1 \\ 1 & -1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & 3 \\ 1 & -2 & 1 \\ 1 & -3 & 1 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & -2 \\ 1 & -3 & 1 \end{pmatrix}.$$

Find an elementary operation that transforms A into B and an elementary operation that transforms B into C. By means of several additional operations, transform C into I3.

3. Use the proof of Theorem 3.2 to obtain the inverse of each of the following elementary matrices.

$$\text{(a)} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \qquad \text{(b)} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{(c)} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix}$$

4. Prove the assertion made on page 149: Any elementary n × n matrix can be obtained in at least two ways—either by performing an elementary row operation on In or by performing an elementary column operation on In.

5. Prove that E is an elementary matrix if and only if Et is.

6. Let A be an m × n matrix. Prove that if B can be obtained from A by an elementary row [column] operation, then Bt can be obtained from At by the corresponding elementary column [row] operation.

7. Prove Theorem 3.1.


8. Prove that if a matrix Q can be obtained from a matrix P by an elementary row operation, then P can be obtained from Q by an elementary row operation of the same type. Hint: Treat each type of elementary row operation separately.

9. Prove that any elementary row [column] operation of type 1 can be obtained by a succession of three elementary row [column] operations of type 3 followed by one elementary row [column] operation of type 2.

10. Prove that any elementary row [column] operation of type 2 can be obtained by dividing some row [column] by a nonzero scalar.

11. Prove that any elementary row [column] operation of type 3 can be obtained by subtracting a multiple of some row [column] from another row [column].

12. Let A be an m × n matrix. Prove that there exists a sequence of elementary row operations of types 1 and 3 that transforms A into an upper triangular matrix.

3.2 THE RANK OF A MATRIX AND MATRIX INVERSES

In this section, we define the rank of a matrix. We then use elementary operations to compute the rank of a matrix and a linear transformation. The section concludes with a procedure for computing the inverse of an invertible matrix.

Definition. If A ∈ Mm×n(F ), we define the rank of A, denoted rank(A), to be the rank of the linear transformation LA : Fn → Fm.

Many results about the rank of a matrix follow immediately from the corresponding facts about a linear transformation. An important result of this type, which follows from Fact 3 (p. 100) and Corollary 2 to Theorem 2.18 (p. 102), is that an n × n matrix is invertible if and only if its rank is n.

Every matrix A is the matrix representation of the linear transformation LA with respect to the appropriate standard ordered bases. Thus the rank of the linear transformation LA is the same as the rank of one of its matrix representations, namely, A. The next theorem extends this fact to any matrix representation of any linear transformation defined on finite-dimensional vector spaces.

Theorem 3.3. Let T : V → W be a linear transformation between finite-dimensional vector spaces, and let β and γ be ordered bases for V and W, respectively. Then rank(T) = rank([T]γβ).

Proof. This is a restatement of Exercise 20 of Section 2.4.


Now that the problem of finding the rank of a linear transformation has been reduced to the problem of finding the rank of a matrix, we need a result that allows us to perform rank-preserving operations on matrices. The next theorem and its corollary tell us how to do this.

Theorem 3.4. Let A be an m × n matrix. If P and Q are invertible m × m and n × n matrices, respectively, then

(a) rank(AQ) = rank(A),
(b) rank(PA) = rank(A),

and therefore,

(c) rank(PAQ) = rank(A).

Proof. First observe that

R(LAQ) = R(LALQ) = LALQ(Fn) = LA(LQ(Fn)) = LA(Fn) = R(LA)

since LQ is onto. Therefore

rank(AQ) = dim(R(LAQ)) = dim(R(LA)) = rank(A).

This establishes (a). To establish (b), apply Exercise 17 of Section 2.4 to T = LP. We omit the details. Finally, applying (a) and (b), we have

rank(PAQ) = rank(PA) = rank(A).

Corollary. Elementary row and column operations on a matrix are rank-preserving.

Proof. If B is obtained from a matrix A by an elementary row operation, then there exists an elementary matrix E such that B = EA. By Theorem 3.2 (p. 150), E is invertible, and hence rank(B) = rank(A) by Theorem 3.4. The proof that elementary column operations are rank-preserving is left as an exercise.

Now that we have a class of matrix operations that preserve rank, we need a way of examining a transformed matrix to ascertain its rank. The next theorem is the first of several in this direction.

Theorem 3.5. The rank of any matrix equals the maximum number of its linearly independent columns; that is, the rank of a matrix is the dimension of the subspace generated by its columns.

Proof. For any A ∈ Mm×n(F ),

rank(A) = rank(LA) = dim(R(LA)).


Let β be the standard ordered basis for Fn. Then β spans Fn and hence, by Theorem 2.2 (p. 68),

R(LA) = span(LA(β)) = span ({LA(e1), LA(e2), . . . , LA(en)}) .

But, for any j, we have seen in Theorem 2.13(b) (p. 90) that LA(ej) = Aej = aj, where aj is the jth column of A. Hence

R(LA) = span ({a1, a2, . . . , an}) .

Thus

rank(A) = dim(R(LA)) = dim(span({a1, a2, . . . , an})).

Example 1

Let

$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}.$$

Observe that the first and second columns of A are linearly independent and that the third column is a linear combination of the first two. Thus

$$\operatorname{rank}(A) = \dim\left(\operatorname{span}\left(\left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}\right)\right) = 2. \quad ♦$$
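For a quick numerical check of Theorem 3.5, one can compare this with the rank computed by a standard routine. The sketch below is illustrative only and assumes Python with numpy, which estimates the rank from the singular values of A.

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)

# By Theorem 3.5 this equals the number of linearly independent columns.
print(np.linalg.matrix_rank(A))  # 2
```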

To compute the rank of a matrix A, it is frequently useful to postpone the use of Theorem 3.5 until A has been suitably modified by means of appropriate elementary row and column operations so that the number of linearly independent columns is obvious. The corollary to Theorem 3.4 guarantees that the rank of the modified matrix is the same as the rank of A. One such modification of A can be obtained by using elementary row and column operations to introduce zero entries. The next example illustrates this procedure.

Example 2

Let

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \\ 1 & 1 & 2 \end{pmatrix}.$$

If we subtract the first row of A from rows 2 and 3 (type 3 elementary row operations), the result is

$$\begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 2 \\ 0 & -1 & 1 \end{pmatrix}.$$

If we now subtract twice the first column from the second and subtract the first column from the third (type 3 elementary column operations), we obtain

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -2 & 2 \\ 0 & -1 & 1 \end{pmatrix}.$$

It is now obvious that the maximum number of linearly independent columns of this matrix is 2. Hence the rank of A is 2. ♦

The next theorem uses this process to transform a matrix into a particularly simple form. The power of this theorem can be seen in its corollaries.

Theorem 3.6. Let A be an m × n matrix of rank r. Then r ≤ m, r ≤ n, and, by means of a finite number of elementary row and column operations, A can be transformed into the matrix

$$D = \begin{pmatrix} I_r & O_1 \\ O_2 & O_3 \end{pmatrix},$$

where O1, O2, and O3 are zero matrices. Thus Dii = 1 for i ≤ r and Dij = 0 otherwise.

Theorem 3.6 and its corollaries are quite important. Its proof, though easy to understand, is tedious to read. As an aid in following the proof, we first consider an example.

Example 3

Consider the matrix

$$A = \begin{pmatrix} 0 & 2 & 4 & 2 & 2 \\ 4 & 4 & 4 & 8 & 0 \\ 8 & 2 & 0 & 10 & 2 \\ 6 & 3 & 2 & 9 & 1 \end{pmatrix}.$$

By means of a succession of elementary row and column operations, we can transform A into a matrix D as in Theorem 3.6. We list many of the intermediate matrices, but on several occasions a matrix is transformed from the preceding one by means of several elementary operations. The number above each arrow indicates how many elementary operations are involved. Try to identify the nature of each elementary operation (row or column and type) in the following matrix transformations.

$$\begin{pmatrix} 0 & 2 & 4 & 2 & 2 \\ 4 & 4 & 4 & 8 & 0 \\ 8 & 2 & 0 & 10 & 2 \\ 6 & 3 & 2 & 9 & 1 \end{pmatrix} \xrightarrow{1} \begin{pmatrix} 4 & 4 & 4 & 8 & 0 \\ 0 & 2 & 4 & 2 & 2 \\ 8 & 2 & 0 & 10 & 2 \\ 6 & 3 & 2 & 9 & 1 \end{pmatrix} \xrightarrow{1} \begin{pmatrix} 1 & 1 & 1 & 2 & 0 \\ 0 & 2 & 4 & 2 & 2 \\ 8 & 2 & 0 & 10 & 2 \\ 6 & 3 & 2 & 9 & 1 \end{pmatrix} \xrightarrow{2}$$

$$\begin{pmatrix} 1 & 1 & 1 & 2 & 0 \\ 0 & 2 & 4 & 2 & 2 \\ 0 & -6 & -8 & -6 & 2 \\ 0 & -3 & -4 & -3 & 1 \end{pmatrix} \xrightarrow{3} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 2 & 4 & 2 & 2 \\ 0 & -6 & -8 & -6 & 2 \\ 0 & -3 & -4 & -3 & 1 \end{pmatrix} \xrightarrow{1} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 2 & 1 & 1 \\ 0 & -6 & -8 & -6 & 2 \\ 0 & -3 & -4 & -3 & 1 \end{pmatrix} \xrightarrow{2}$$

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 2 & 1 & 1 \\ 0 & 0 & 4 & 0 & 8 \\ 0 & 0 & 2 & 0 & 4 \end{pmatrix} \xrightarrow{3} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 & 8 \\ 0 & 0 & 2 & 0 & 4 \end{pmatrix} \xrightarrow{1} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 2 & 0 & 4 \end{pmatrix} \xrightarrow{1}$$

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow{1} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} = D$$

By the corollary to Theorem 3.4, rank(A) = rank(D). Clearly, however, rank(D) = 3; so rank(A) = 3. ♦
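The reduction of Example 3 can also be scripted. The following sketch (an illustration under the assumption that Python with numpy is available; the function name reduce_to_D is our own) performs the row-and-column reduction of Theorem 3.6 one pivot at a time and reads off the rank as the number of 1's placed on the diagonal.

```python
import numpy as np

def reduce_to_D(A, tol=1e-12):
    """Transform a copy of A into the form D of Theorem 3.6 by
    elementary row and column operations; return (D, rank)."""
    D = A.astype(float).copy()
    m, n = D.shape
    r = 0
    while r < min(m, n):
        # find a nonzero entry in the remaining lower-right submatrix
        idx = np.argwhere(np.abs(D[r:, r:]) > tol)
        if idx.size == 0:
            break
        i, j = idx[0] + r
        D[[r, i]] = D[[i, r]]          # type 1 row operation
        D[:, [r, j]] = D[:, [j, r]]    # type 1 column operation
        D[r] = D[r] / D[r, r]          # type 2 row operation
        for k in range(m):             # type 3 row operations
            if k != r:
                D[k] -= D[k, r] * D[r]
        for k in range(n):             # type 3 column operations
            if k != r:
                D[:, k] -= D[r, k] * D[:, r]
        r += 1
    return D, r

A = np.array([[0, 2, 4, 2, 2],
              [4, 4, 4, 8, 0],
              [8, 2, 0, 10, 2],
              [6, 3, 2, 9, 1]])
D, rank = reduce_to_D(A)
print(rank)   # 3, as in Example 3
```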

Note that the first two elementary operations in Example 3 result in a 1 in the 1,1 position, and the next several operations (type 3) result in 0's everywhere in the first row and first column except for the 1,1 position. Subsequent elementary operations do not change the first row and first column. With this example in mind, we proceed with the proof of Theorem 3.6.

Proof of Theorem 3.6. If A is the zero matrix, r = 0 by Exercise 3. In this case, the conclusion follows with D = A.

Now suppose that A ≠ O and r = rank(A); then r > 0. The proof is by mathematical induction on m, the number of rows of A.

Suppose that m = 1. By means of at most one type 1 column operation and at most one type 2 column operation, A can be transformed into a matrix with a 1 in the 1,1 position. By means of at most n − 1 type 3 column operations, this matrix can in turn be transformed into the matrix

$$\begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}.$$

Note that there is one linearly independent column in D. So rank(D) = rank(A) = 1 by the corollary to Theorem 3.4 and by Theorem 3.5. Thus the theorem is established for m = 1.

Next assume that the theorem holds for any matrix with at most m − 1 rows (for some m > 1). We must prove that the theorem holds for any matrix with m rows.

Suppose that A is any m × n matrix. If n = 1, Theorem 3.6 can be established in a manner analogous to that for m = 1 (see Exercise 10).

We now suppose that n > 1. Since A ≠ O, Aij ≠ 0 for some i, j. By means of at most one elementary row and at most one elementary column operation (each of type 1), we can move the nonzero entry to the 1,1 position (just as was done in Example 3). By means of at most one additional type 2 operation, we can assure a 1 in the 1,1 position. (Look at the second operation in Example 3.) By means of at most m − 1 type 3 row operations and at most n − 1 type 3 column operations, we can eliminate all nonzero entries in the first row and the first column with the exception of the 1 in the 1,1 position. (In Example 3, we used two row and three column operations to do this.)

Thus, with a finite number of elementary operations, A can be transformed into a matrix

$$B = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B' & \\ 0 & & & \end{pmatrix},$$

where B′ is an (m − 1) × (n − 1) matrix. In Example 3, for instance,

$$B' = \begin{pmatrix} 2 & 4 & 2 & 2 \\ -6 & -8 & -6 & 2 \\ -3 & -4 & -3 & 1 \end{pmatrix}.$$

By Exercise 11, B′ has rank one less than B. Since rank(A) = rank(B) = r, rank(B′) = r − 1. Therefore r − 1 ≤ m − 1 and r − 1 ≤ n − 1 by the induction hypothesis. Hence r ≤ m and r ≤ n.

Also by the induction hypothesis, B′ can be transformed by a finite number of elementary row and column operations into the (m − 1) × (n − 1) matrix D′ such that

$$D' = \begin{pmatrix} I_{r-1} & O_4 \\ O_5 & O_6 \end{pmatrix},$$

where O4, O5, and O6 are zero matrices. That is, D′ consists of all zeros except for its first r − 1 diagonal entries, which are ones. Let

$$D = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & D' & \\ 0 & & & \end{pmatrix}.$$

We see that the theorem now follows once we show that D can be obtained from B by means of a finite number of elementary row and column operations. However, this follows by repeated applications of Exercise 12.

Thus, since A can be transformed into B and B can be transformed into D, each by a finite number of elementary operations, A can be transformed into D by a finite number of elementary operations.


Finally, since D′ contains ones as its first r − 1 diagonal entries, D contains ones as its first r diagonal entries and zeros elsewhere. This establishes the theorem.

Corollary 1. Let A be an m × n matrix of rank r. Then there exist invertible matrices B and C of sizes m × m and n × n, respectively, such that D = BAC, where

$$D = \begin{pmatrix} I_r & O_1 \\ O_2 & O_3 \end{pmatrix}$$

is the m × n matrix in which O1, O2, and O3 are zero matrices.

Proof. By Theorem 3.6, A can be transformed by means of a finite number of elementary row and column operations into the matrix D. We can appeal to Theorem 3.1 (p. 149) each time we perform an elementary operation. Thus there exist elementary m × m matrices E1, E2, . . . , Ep and elementary n × n matrices G1, G2, . . . , Gq such that

D = EpEp−1 · · ·E2E1AG1G2 · · ·Gq.

By Theorem 3.2 (p. 150), each Ej and Gj is invertible. Let B = EpEp−1 · · ·E1 and C = G1G2 · · ·Gq. Then B and C are invertible by Exercise 4 of Section 2.4, and D = BAC.

Corollary 2. Let A be an m × n matrix. Then

(a) rank(At) = rank(A).
(b) The rank of any matrix equals the maximum number of its linearly independent rows; that is, the rank of a matrix is the dimension of the subspace generated by its rows.
(c) The rows and columns of any matrix generate subspaces of the same dimension, numerically equal to the rank of the matrix.

Proof. (a) By Corollary 1, there exist invertible matrices B and C such that D = BAC, where D satisfies the stated conditions of the corollary. Taking transposes, we have

Dt = (BAC)t = CtAtBt.

Since B and C are invertible, so are Bt and Ct by Exercise 5 of Section 2.4. Hence by Theorem 3.4,

rank(At) = rank(CtAtBt) = rank(Dt).

Suppose that r = rank(A). Then Dt is an n × m matrix with the form of the matrix D in Corollary 1, and hence rank(Dt) = r by Theorem 3.5. Thus

rank(At) = rank(Dt) = r = rank(A).

This establishes (a). The proofs of (b) and (c) are left as exercises. (See Exercise 13.)


Corollary 3. Every invertible matrix is a product of elementary matrices.

Proof. If A is an invertible n × n matrix, then rank(A) = n. Hence the matrix D in Corollary 1 equals In, and there exist invertible matrices B and C such that In = BAC.

As in the proof of Corollary 1, note that B = EpEp−1 · · ·E1 and C = G1G2 · · ·Gq, where the Ei's and Gi's are elementary matrices. Thus A = B−1InC−1 = B−1C−1, so that

$$A = E_1^{-1} E_2^{-1} \cdots E_p^{-1} G_q^{-1} G_{q-1}^{-1} \cdots G_1^{-1}.$$

The inverses of elementary matrices are elementary matrices, however, and hence A is the product of elementary matrices.

We now use Corollary 2 to relate the rank of a matrix product to the rank of each factor. Notice how the proof exploits the relationship between the rank of a matrix and the rank of a linear transformation.

Theorem 3.7. Let T : V → W and U : W → Z be linear transformations on finite-dimensional vector spaces V, W, and Z, and let A and B be matrices such that the product AB is defined. Then

(a) rank(UT) ≤ rank(U).
(b) rank(UT) ≤ rank(T).
(c) rank(AB) ≤ rank(A).
(d) rank(AB) ≤ rank(B).

Proof. We prove these items in the order: (a), (c), (d), and (b).

(a) Clearly, R(T) ⊆ W. Hence

R(UT) = UT(V) = U(T(V)) = U(R(T)) ⊆ U(W) = R(U).

Thus

rank(UT) = dim(R(UT)) ≤ dim(R(U)) = rank(U).

(c) By (a),

rank(AB) = rank(LAB) = rank(LALB) ≤ rank(LA) = rank(A).

(d) By (c) and Corollary 2 to Theorem 3.6,

rank(AB) = rank((AB)t) = rank(BtAt) ≤ rank(Bt) = rank(B).

(b) Let α, β, and γ be ordered bases for V, W, and Z, respectively, and let A′ = [U]γβ and B′ = [T]βα. Then A′B′ = [UT]γα by Theorem 2.11 (p. 88). Hence, by Theorem 3.3 and (d),

rank(UT) = rank(A′B′) ≤ rank(B′) = rank(T).


It is important to be able to compute the rank of any matrix. We can use the corollary to Theorem 3.4, Theorems 3.5 and 3.6, and Corollary 2 to Theorem 3.6 to accomplish this goal.

The object is to perform elementary row and column operations on a matrix to “simplify” it (so that the transformed matrix has many zero entries) to the point where a simple observation enables us to determine how many linearly independent rows or columns the matrix has, and thus to determine its rank.

Example 4

(a) Let

$$A = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & 1 \end{pmatrix}.$$

Note that the first and second rows of A are linearly independent since one is not a multiple of the other. Thus rank(A) = 2.

(b) Let

$$A = \begin{pmatrix} 1 & 3 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 3 & 0 & 0 \end{pmatrix}.$$

In this case, there are several ways to proceed. Suppose that we begin with an elementary row operation to obtain a zero in the 2,1 position. Subtracting the first row from the second row, we obtain

$$\begin{pmatrix} 1 & 3 & 1 & 1 \\ 0 & -3 & 0 & 0 \\ 0 & 3 & 0 & 0 \end{pmatrix}.$$

Now note that the third row is a multiple of the second row, and the first and second rows are linearly independent. Thus rank(A) = 2.

As an alternative method, note that the first, third, and fourth columns of A are identical and that the first and second columns of A are linearly independent. Hence rank(A) = 2.

(c) Let

$$A = \begin{pmatrix} 1 & 2 & 3 & 1 \\ 2 & 1 & 1 & 1 \\ 1 & -1 & 1 & 0 \end{pmatrix}.$$

Using elementary row operations, we can transform A as follows:

$$A \longrightarrow \begin{pmatrix} 1 & 2 & 3 & 1 \\ 0 & -3 & -5 & -1 \\ 0 & -3 & -2 & -1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2 & 3 & 1 \\ 0 & -3 & -5 & -1 \\ 0 & 0 & 3 & 0 \end{pmatrix}.$$


It is clear that the last matrix has three linearly independent rows and hence has rank 3. ♦

In summary, perform row and column operations until the matrix is simplified enough so that the maximum number of linearly independent rows or columns is obvious.

The Inverse of a Matrix

We have remarked that an n × n matrix is invertible if and only if its rank is n. Since we know how to compute the rank of any matrix, we can always test a matrix to determine whether it is invertible. We now provide a simple technique for computing the inverse of a matrix that utilizes elementary row operations.

Definition. Let A and B be m × n and m × p matrices, respectively. By the augmented matrix (A|B), we mean the m × (n + p) matrix (A B), that is, the matrix whose first n columns are the columns of A, and whose last p columns are the columns of B.

Let A be an invertible n × n matrix, and consider the n × 2n augmented matrix C = (A|In). By Exercise 15, we have

A−1C = (A−1A|A−1In) = (In|A−1). (1)

By Corollary 3 to Theorem 3.6, A−1 is the product of elementary matrices, say A−1 = EpEp−1 · · ·E1. Thus (1) becomes

EpEp−1 · · ·E1(A|In) = A−1C = (In|A−1).

Because multiplying a matrix on the left by an elementary matrix transforms the matrix by an elementary row operation (Theorem 3.1 p. 149), we have the following result: If A is an invertible n × n matrix, then it is possible to transform the matrix (A|In) into the matrix (In|A−1) by means of a finite number of elementary row operations.

Conversely, suppose that A is invertible and that, for some n × n matrix B, the matrix (A|In) can be transformed into the matrix (In|B) by a finite number of elementary row operations. Let E1, E2, . . . , Ep be the elementary matrices associated with these elementary row operations as in Theorem 3.1; then

EpEp−1 · · ·E1(A|In) = (In|B). (2)

Letting M = EpEp−1 · · ·E1, we have from (2) that

(MA|M) = M(A|In) = (In|B).


Hence MA = In and M = B. It follows that M = A−1. So B = A−1. Thus we have the following result: If A is an invertible n × n matrix, and the matrix (A|In) is transformed into a matrix of the form (In|B) by means of a finite number of elementary row operations, then B = A−1.

If, on the other hand, A is an n × n matrix that is not invertible, then rank(A) < n. Hence any attempt to transform (A|In) into a matrix of the form (In|B) by means of elementary row operations must fail because otherwise A can be transformed into In using the same row operations. This is impossible, however, because elementary row operations preserve rank. In fact, A can be transformed into a matrix with a row containing only zero entries, yielding the following result: If A is an n × n matrix that is not invertible, then any attempt to transform (A|In) into a matrix of the form (In|B) produces a row whose first n entries are zeros.

The next two examples demonstrate these comments.

Example 5

We determine whether the matrix

$$A = \begin{pmatrix} 0 & 2 & 4 \\ 2 & 4 & 2 \\ 3 & 3 & 1 \end{pmatrix}$$

is invertible, and if it is, we compute its inverse.

We attempt to use elementary row operations to transform

$$(A|I) = \begin{pmatrix} 0 & 2 & 4 & 1 & 0 & 0 \\ 2 & 4 & 2 & 0 & 1 & 0 \\ 3 & 3 & 1 & 0 & 0 & 1 \end{pmatrix}$$

into a matrix of the form (I|B). One method for accomplishing this transformation is to change each column of A successively, beginning with the first column, into the corresponding column of I. Since we need a nonzero entry in the 1,1 position, we begin by interchanging rows 1 and 2. The result is

$$\begin{pmatrix} 2 & 4 & 2 & 0 & 1 & 0 \\ 0 & 2 & 4 & 1 & 0 & 0 \\ 3 & 3 & 1 & 0 & 0 & 1 \end{pmatrix}.$$

In order to place a 1 in the 1,1 position, we must multiply the first row by 1/2; this operation yields

$$\begin{pmatrix} 1 & 2 & 1 & 0 & \tfrac{1}{2} & 0 \\ 0 & 2 & 4 & 1 & 0 & 0 \\ 3 & 3 & 1 & 0 & 0 & 1 \end{pmatrix}.$$


We now complete work in the first column by adding −3 times row 1 to row 3 to obtain

$$\begin{pmatrix} 1 & 2 & 1 & 0 & \tfrac{1}{2} & 0 \\ 0 & 2 & 4 & 1 & 0 & 0 \\ 0 & -3 & -2 & 0 & -\tfrac{3}{2} & 1 \end{pmatrix}.$$

In order to change the second column of the preceding matrix into the second column of I, we multiply row 2 by 1/2 to obtain a 1 in the 2,2 position. This operation produces

$$\begin{pmatrix} 1 & 2 & 1 & 0 & \tfrac{1}{2} & 0 \\ 0 & 1 & 2 & \tfrac{1}{2} & 0 & 0 \\ 0 & -3 & -2 & 0 & -\tfrac{3}{2} & 1 \end{pmatrix}.$$

We now complete our work on the second column by adding −2 times row 2 to row 1 and 3 times row 2 to row 3. The result is

$$\begin{pmatrix} 1 & 0 & -3 & -1 & \tfrac{1}{2} & 0 \\ 0 & 1 & 2 & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & 4 & \tfrac{3}{2} & -\tfrac{3}{2} & 1 \end{pmatrix}.$$

Only the third column remains to be changed. In order to place a 1 in the 3,3 position, we multiply row 3 by 1/4; this operation yields

$$\begin{pmatrix} 1 & 0 & -3 & -1 & \tfrac{1}{2} & 0 \\ 0 & 1 & 2 & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & \tfrac{3}{8} & -\tfrac{3}{8} & \tfrac{1}{4} \end{pmatrix}.$$

Adding appropriate multiples of row 3 to rows 1 and 2 completes the process and gives

$$\begin{pmatrix} 1 & 0 & 0 & \tfrac{1}{8} & -\tfrac{5}{8} & \tfrac{3}{4} \\ 0 & 1 & 0 & -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{2} \\ 0 & 0 & 1 & \tfrac{3}{8} & -\tfrac{3}{8} & \tfrac{1}{4} \end{pmatrix}.$$

Thus A is invertible, and

$$A^{-1} = \begin{pmatrix} \tfrac{1}{8} & -\tfrac{5}{8} & \tfrac{3}{4} \\ -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{2} \\ \tfrac{3}{8} & -\tfrac{3}{8} & \tfrac{1}{4} \end{pmatrix}. \quad ♦$$
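The augmented-matrix procedure of this example is mechanical enough to automate. The sketch below is an illustration only (not the text's algorithm verbatim; it assumes Python with numpy and that A is invertible). It row-reduces (A|I3) exactly as above and recovers the right half as A−1.

```python
import numpy as np

A = np.array([[0., 2., 4.],
              [2., 4., 2.],
              [3., 3., 1.]])
n = A.shape[0]
aug = np.hstack([A, np.eye(n)])     # the augmented matrix (A | I)

for col in range(n):
    # choose a row with a nonzero entry in this column (type 1 operation);
    # this sketch assumes A is invertible, so such a row always exists
    pivot = col + np.argmax(np.abs(aug[col:, col]))
    aug[[col, pivot]] = aug[[pivot, col]]
    # scale to put a 1 in the pivot position (type 2 operation)
    aug[col] /= aug[col, col]
    # clear the rest of the column (type 3 operations)
    for row in range(n):
        if row != col:
            aug[row] -= aug[row, col] * aug[col]

A_inv = aug[:, n:]
print(A_inv)                              # matches the inverse found above
print(np.allclose(A @ A_inv, np.eye(n)))  # True
```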


Example 6

We determine whether the matrix

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & -1 \\ 1 & 5 & 4 \end{pmatrix}$$

is invertible, and if it is, we compute its inverse. Using a strategy similar to the one used in Example 5, we attempt to use elementary row operations to transform

$$(A|I) = \begin{pmatrix} 1 & 2 & 1 & 1 & 0 & 0 \\ 2 & 1 & -1 & 0 & 1 & 0 \\ 1 & 5 & 4 & 0 & 0 & 1 \end{pmatrix}$$

into a matrix of the form (I|B). We first add −2 times row 1 to row 2 and −1 times row 1 to row 3. We then add row 2 to row 3. The result,

$$\begin{pmatrix} 1 & 2 & 1 & 1 & 0 & 0 \\ 2 & 1 & -1 & 0 & 1 & 0 \\ 1 & 5 & 4 & 0 & 0 & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2 & 1 & 1 & 0 & 0 \\ 0 & -3 & -3 & -2 & 1 & 0 \\ 0 & 3 & 3 & -1 & 0 & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2 & 1 & 1 & 0 & 0 \\ 0 & -3 & -3 & -2 & 1 & 0 \\ 0 & 0 & 0 & -3 & 1 & 1 \end{pmatrix},$$

is a matrix with a row whose first 3 entries are zeros. Therefore A is not invertible. ♦

Being able to test for invertibility and compute the inverse of a matrix allows us, with the help of Theorem 2.18 (p. 101) and its corollaries, to test for invertibility and compute the inverse of a linear transformation. The next example demonstrates this technique.

Example 7

Let T : P2(R) → P2(R) be defined by T(f(x)) = f(x) + f ′(x) + f ′′(x), where f ′(x) and f ′′(x) denote the first and second derivatives of f(x). We use Corollary 1 of Theorem 2.18 (p. 102) to test T for invertibility and compute the inverse if T is invertible. Taking β to be the standard ordered basis of P2(R), we have

$$[T]_\beta = \begin{pmatrix} 1 & 1 & 2 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}.$$


Using the method of Examples 5 and 6, we can show that [T]β is invertible with inverse

$$([T]_\beta)^{-1} = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix}.$$

Thus T is invertible, and ([T]β)−1 = [T−1]β. Hence by Theorem 2.14 (p. 91), we have

$$[T^{-1}(a_0 + a_1x + a_2x^2)]_\beta = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} a_0 - a_1 \\ a_1 - 2a_2 \\ a_2 \end{pmatrix}.$$

Therefore

$$T^{-1}(a_0 + a_1x + a_2x^2) = (a_0 - a_1) + (a_1 - 2a_2)x + a_2x^2. \quad ♦$$
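The same computation can be mirrored numerically by working with the coordinate matrix [T]β directly. The sketch below is an illustration only (assuming Python with numpy); the test polynomial is our own choice.

```python
import numpy as np

# [T]_beta for T(f) = f + f' + f'' on P_2(R), with beta = {1, x, x^2}
T_beta = np.array([[1., 1., 2.],
                   [0., 1., 2.],
                   [0., 0., 1.]])

T_beta_inv = np.linalg.inv(T_beta)
print(T_beta_inv)          # [[1, -1, 0], [0, 1, -2], [0, 0, 1]]

# coordinates of T^{-1}(a0 + a1 x + a2 x^2), here for 2 + 3x + 5x^2
a = np.array([2., 3., 5.])
print(T_beta_inv @ a)      # [a0 - a1, a1 - 2*a2, a2] = [-1, -7, 5]
```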

EXERCISES

1. Label the following statements as true or false.

(a) The rank of a matrix is equal to the number of its nonzero columns.
(b) The product of two matrices always has rank equal to the lesser of the ranks of the two matrices.
(c) The m × n zero matrix is the only m × n matrix having rank 0.
(d) Elementary row operations preserve rank.
(e) Elementary column operations do not necessarily preserve rank.
(f) The rank of a matrix is equal to the maximum number of linearly independent rows in the matrix.
(g) The inverse of a matrix can be computed exclusively by means of elementary row operations.
(h) The rank of an n × n matrix is at most n.
(i) An n × n matrix having rank n is invertible.

2. Find the rank of the following matrices.

$$\text{(a)} \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix} \qquad \text{(b)} \begin{pmatrix} 1 & 1 & 0 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \qquad \text{(c)} \begin{pmatrix} 1 & 0 & 2 \\ 1 & 1 & 4 \end{pmatrix}$$


$$\text{(d)} \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \end{pmatrix} \qquad \text{(e)} \begin{pmatrix} 1 & 2 & 3 & 1 & 1 \\ 1 & 4 & 0 & 1 & 2 \\ 0 & 2 & -3 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}$$

$$\text{(f)} \begin{pmatrix} 1 & 2 & 0 & 1 & 1 \\ 2 & 4 & 1 & 3 & 0 \\ 3 & 6 & 2 & 5 & 1 \\ -4 & -8 & 1 & -3 & 1 \end{pmatrix} \qquad \text{(g)} \begin{pmatrix} 1 & 1 & 0 & 1 \\ 2 & 2 & 0 & 2 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 \end{pmatrix}$$

3. Prove that for any m × n matrix A, rank(A) = 0 if and only if A is the zero matrix.

4. Use elementary row and column operations to transform each of the following matrices into a matrix D satisfying the conditions of Theorem 3.6, and then determine the rank of each matrix.

$$\text{(a)} \begin{pmatrix} 1 & 1 & 1 & 2 \\ 2 & 0 & -1 & 2 \\ 1 & 1 & 1 & 2 \end{pmatrix} \qquad \text{(b)} \begin{pmatrix} 2 & 1 \\ -1 & 2 \\ 2 & 1 \end{pmatrix}$$

5. For each of the following matrices, compute the rank and the inverse if it exists.

$$\text{(a)} \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix} \qquad \text{(b)} \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \qquad \text{(c)} \begin{pmatrix} 1 & 2 & 1 \\ 1 & 3 & 4 \\ 2 & 3 & -1 \end{pmatrix}$$

$$\text{(d)} \begin{pmatrix} 0 & -2 & 4 \\ 1 & 1 & -1 \\ 2 & 4 & -5 \end{pmatrix} \qquad \text{(e)} \begin{pmatrix} 1 & 2 & 1 \\ -1 & 1 & 2 \\ 1 & 0 & 1 \end{pmatrix} \qquad \text{(f)} \begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$

$$\text{(g)} \begin{pmatrix} 1 & 2 & 1 & 0 \\ 2 & 5 & 5 & 1 \\ -2 & -3 & 0 & 3 \\ 3 & 4 & -2 & -3 \end{pmatrix} \qquad \text{(h)} \begin{pmatrix} 1 & 0 & 1 & 1 \\ 1 & 1 & -1 & 2 \\ 2 & 0 & 1 & 0 \\ 0 & -1 & 1 & -3 \end{pmatrix}$$

6. For each of the following linear transformations T, determine whether T is invertible, and compute T−1 if it exists.

(a) T : P2(R) → P2(R) defined by T(f(x)) = f ′′(x) + 2f ′(x) − f(x).
(b) T : P2(R) → P2(R) defined by T(f(x)) = (x + 1)f ′(x).
(c) T : R3 → R3 defined by

T(a1, a2, a3) = (a1 + 2a2 + a3, −a1 + a2 + 2a3, a1 + a3).


(d) T : R3 → P2(R) defined by

T(a1, a2, a3) = (a1 + a2 + a3) + (a1 − a2 + a3)x + a1x2.

(e) T : P2(R) → R3 defined by T(f(x)) = (f(−1), f(0), f(1)).
(f) T : M2×2(R) → R4 defined by

T(A) = (tr(A), tr(At), tr(EA), tr(AE)),

where

$$E = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

7. Express the invertible matrix

$$\begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 2 \end{pmatrix}$$

as a product of elementary matrices.

8. Let A be an m × n matrix. Prove that if c is any nonzero scalar, then rank(cA) = rank(A).

9. Complete the proof of the corollary to Theorem 3.4 by showing that elementary column operations preserve rank.

10. Prove Theorem 3.6 for the case that A is an m × 1 matrix.

11. Let

$$B = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B' & \\ 0 & & & \end{pmatrix},$$

where B′ is an m × n submatrix of B. Prove that if rank(B) = r, then rank(B′) = r − 1.

12. Let B′ and D′ be m × n matrices, and let B and D be (m + 1) × (n + 1) matrices respectively defined by

$$B = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B' & \\ 0 & & & \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & D' & \\ 0 & & & \end{pmatrix}.$$

Prove that if B′ can be transformed into D′ by an elementary row [column] operation, then B can be transformed into D by an elementary row [column] operation.


13. Prove (b) and (c) of Corollary 2 to Theorem 3.6.

14. Let T, U : V → W be linear transformations.

(a) Prove that R(T + U) ⊆ R(T) + R(U). (See the definition of the sum of subsets of a vector space on page 22.)
(b) Prove that if W is finite-dimensional, then rank(T + U) ≤ rank(T) + rank(U).
(c) Deduce from (b) that rank(A + B) ≤ rank(A) + rank(B) for any m × n matrices A and B.

15. Suppose that A and B are matrices having n rows. Prove that M(A|B) = (MA|MB) for any m × n matrix M.

16. Supply the details to the proof of (b) of Theorem 3.4.

17. Prove that if B is a 3 × 1 matrix and C is a 1 × 3 matrix, then the 3 × 3 matrix BC has rank at most 1. Conversely, show that if A is any 3 × 3 matrix having rank 1, then there exist a 3 × 1 matrix B and a 1 × 3 matrix C such that A = BC.

18. Let A be an m × n matrix and B be an n × p matrix. Prove that AB can be written as a sum of n matrices of rank one.

19. Let A be an m × n matrix with rank m and B be an n × p matrix with rank n. Determine the rank of AB. Justify your answer.

20. Let

$$A = \begin{pmatrix} 1 & 0 & -1 & 2 & 1 \\ -1 & 1 & 3 & -1 & 0 \\ -2 & 1 & 4 & -1 & 3 \\ 3 & -1 & -5 & 1 & -6 \end{pmatrix}.$$

(a) Find a 5 × 5 matrix M with rank 2 such that AM = O, where O is the 4 × 5 zero matrix.
(b) Suppose that B is a 5 × 5 matrix such that AB = O. Prove that rank(B) ≤ 2.

21. Let A be an m × n matrix with rank m. Prove that there exists an n × m matrix B such that AB = Im.

22. Let B be an n × m matrix with rank m. Prove that there exists an m × n matrix A such that AB = Im.

3.3 SYSTEMS OF LINEAR EQUATIONS—THEORETICAL ASPECTS

This section and the next are devoted to the study of systems of linear equations, which arise naturally in both the physical and social sciences. In this section, we apply results from Chapter 2 to describe the solution sets of systems of linear equations as subsets of a vector space. In Section 3.4, elementary row operations are used to provide a computational method for finding all solutions to such systems.

The system of equations

$$(S) \qquad \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m, \end{aligned}$$

where aij and bi (1 ≤ i ≤ m and 1 ≤ j ≤ n) are scalars in a field F and x1, x2, . . . , xn are n variables taking values in F, is called a system of m linear equations in n unknowns over the field F.

The m × n matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

is called the coefficient matrix of the system (S).

If we let

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix},$$

then the system (S) may be rewritten as a single matrix equation

Ax = b.

To exploit the results that we have developed, we often consider a system of linear equations as a single matrix equation.

A solution to the system (S) is an n-tuple

$$s = \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{pmatrix} \in F^n$$

such that As = b. The set of all solutions to the system (S) is called the solution set of the system. System (S) is called consistent if its solution set is nonempty; otherwise it is called inconsistent.


Example 1

(a) Consider the system

x1 + x2 = 3
x1 − x2 = 1.

By use of familiar techniques, we can solve the preceding system and conclude that there is only one solution: x1 = 2, x2 = 1; that is,

$$s = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.$$

In matrix form, the system can be written

$$\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix};$$

so

$$A = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 3 \\ 1 \end{pmatrix}.$$

(b) Consider

2x1 + 3x2 + x3 = 1
x1 − x2 + 2x3 = 6;

that is,

$$\begin{pmatrix} 2 & 3 & 1 \\ 1 & -1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 6 \end{pmatrix}.$$

This system has many solutions, such as

$$s = \begin{pmatrix} -6 \\ 2 \\ 7 \end{pmatrix} \quad \text{and} \quad s = \begin{pmatrix} 8 \\ -4 \\ -3 \end{pmatrix}.$$

(c) Consider

x1 + x2 = 0
x1 + x2 = 1;

that is,

$$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

It is evident that this system has no solutions. Thus we see that a system of linear equations can have one, many, or no solutions. ♦


We must be able to recognize when a system has a solution and then be able to describe all its solutions. This section and the next are devoted to this end.

We begin our study of systems of linear equations by examining the class of homogeneous systems of linear equations. Our first result (Theorem 3.8) shows that the set of solutions to a homogeneous system of m linear equations in n unknowns forms a subspace of Fn. We can then apply the theory of vector spaces to this set of solutions. For example, a basis for the solution space can be found, and any solution can be expressed as a linear combination of the vectors in the basis.

Definitions. A system Ax = b of m linear equations in n unknowns is said to be homogeneous if b = 0. Otherwise the system is said to be nonhomogeneous.

Any homogeneous system has at least one solution, namely, the zero vector. The next result gives further information about the set of solutions to a homogeneous system.

Theorem 3.8. Let Ax = 0 be a homogeneous system of m linear equations in n unknowns over a field F. Let K denote the set of all solutions to Ax = 0. Then K = N(LA); hence K is a subspace of Fn of dimension n − rank(LA) = n − rank(A).

Proof. Clearly, K = {s ∈ Fn : As = 0} = N(LA). The second part now follows from the dimension theorem (p. 70).

Corollary. If m < n, the system Ax = 0 has a nonzero solution.

Proof. Suppose that m < n. Then rank(A) = rank(LA) ≤ m. Hence

dim(K) = n − rank(LA) ≥ n − m > 0,

where K = N(LA). Since dim(K) > 0, K ≠ {0}. Thus there exists a nonzero vector s ∈ K; so s is a nonzero solution to Ax = 0.

Example 2

(a) Consider the system

x1 + 2x2 + x3 = 0
x1 − x2 − x3 = 0.

Let

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & -1 \end{pmatrix}$$


be the coefficient matrix of this system. It is clear that rank(A) = 2. If K is the solution set of this system, then dim(K) = 3 − 2 = 1. Thus any nonzero solution constitutes a basis for K. For example, since

$$\begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix}$$

is a solution to the given system,

$$\left\{ \begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix} \right\}$$

is a basis for K. Thus any vector in K is of the form

$$t \begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix} = \begin{pmatrix} t \\ -2t \\ 3t \end{pmatrix},$$

where t ∈ R.

(b) Consider the system x1 − 2x2 + x3 = 0 of one equation in three unknowns. If A = (1 −2 1) is the coefficient matrix, then rank(A) = 1. Hence if K is the solution set, then dim(K) = 3 − 1 = 2. Note that

$$\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$$

are linearly independent vectors in K. Thus they constitute a basis for K, so that

$$K = \left\{ t_1 \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} + t_2 \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} : t_1, t_2 \in R \right\}. \quad ♦$$
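Theorem 3.8 is easy to check numerically for the system of Example 2(a). The following sketch (an illustration assuming Python with numpy) verifies that the proposed basis vector solves Ax = 0 and that dim(K) = n − rank(A).

```python
import numpy as np

A = np.array([[1., 2., 1.],
              [1., -1., -1.]])
n = A.shape[1]

v = np.array([1., -2., 3.])
print(np.allclose(A @ v, 0))         # True: v lies in K = N(L_A)
print(n - np.linalg.matrix_rank(A))  # 1 = dim(K), so {v} is a basis for K
```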

In Section 3.4, explicit computational methods for finding a basis for the solution set of a homogeneous system are discussed.

We now turn to the study of nonhomogeneous systems. Our next result shows that the solution set of a nonhomogeneous system Ax = b can be described in terms of the solution set of the homogeneous system Ax = 0. We refer to the equation Ax = 0 as the homogeneous system corresponding to Ax = b.

Theorem 3.9. Let K be the solution set of a system of linear equations Ax = b, and let KH be the solution set of the corresponding homogeneous system Ax = 0. Then for any solution s to Ax = b

K = {s} + KH = {s + k : k ∈ KH}.


Proof. Let s be any solution to Ax = b. We must show that K = {s} + KH. If w ∈ K, then Aw = b. Hence

A(w − s) = Aw − As = b − b = 0.

So w − s ∈ KH. Thus there exists k ∈ KH such that w − s = k. It follows that w = s + k ∈ {s} + KH, and therefore

K ⊆ {s} + KH.

Conversely, suppose that w ∈ {s} + KH; then w = s + k for some k ∈ KH. But then Aw = A(s + k) = As + Ak = b + 0 = b; so w ∈ K. Therefore {s} + KH ⊆ K, and thus K = {s} + KH.

Example 3

(a) Consider the system

x1 + 2x2 + x3 = 7
x1 − x2 − x3 = −4.

The corresponding homogeneous system is the system in Example 2(a). It is easily verified that

$$s = \begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix}$$

is a solution to the preceding nonhomogeneous system. So the solution set of the system is

$$K = \left\{ \begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix} + t \begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix} : t \in R \right\}$$

by Theorem 3.9.

(b) Consider the system x1 − 2x2 + x3 = 4. The corresponding homogeneous system is the system in Example 2(b). Since

$$s = \begin{pmatrix} 4 \\ 0 \\ 0 \end{pmatrix}$$

is a solution to the given system, the solution set K can be written as

$$K = \left\{ \begin{pmatrix} 4 \\ 0 \\ 0 \end{pmatrix} + t_1 \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} + t_2 \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} : t_1, t_2 \in R \right\}. \quad ♦$$


The following theorem provides us with a means of computing solutions to certain systems of linear equations.

Theorem 3.10. Let Ax = b be a system of n linear equations in n unknowns. If A is invertible, then the system has exactly one solution, namely, A−1b. Conversely, if the system has exactly one solution, then A is invertible.

Proof. Suppose that A is invertible. Substituting A−1b into the system, we have A(A−1b) = (AA−1)b = b. Thus A−1b is a solution. If s is an arbitrary solution, then As = b. Multiplying both sides by A−1 gives s = A−1b. Thus the system has one and only one solution, namely, A−1b.

Conversely, suppose that the system has exactly one solution s. Let KH denote the solution set for the corresponding homogeneous system Ax = 0. By Theorem 3.9, {s} = {s} + KH. But this is so only if KH = {0}. Thus N(LA) = {0}, and hence A is invertible.

Example 4

Consider the following system of three linear equations in three unknowns:

2x2 + 4x3 = 2
2x1 + 4x2 + 2x3 = 3
3x1 + 3x2 + x3 = 1.

In Example 5 of Section 3.2, we computed the inverse of the coefficient matrix A of this system. Thus the system has exactly one solution, namely,

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = A^{-1}b = \begin{pmatrix} \tfrac{1}{8} & -\tfrac{5}{8} & \tfrac{3}{4} \\ -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{2} \\ \tfrac{3}{8} & -\tfrac{3}{8} & \tfrac{1}{4} \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} -\tfrac{7}{8} \\ \tfrac{5}{4} \\ -\tfrac{1}{8} \end{pmatrix}. \quad ♦$$
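In floating-point practice one would normally call a linear solver rather than form A−1 explicitly. The following sketch (an illustration assuming Python with numpy) confirms the solution of Example 4 both ways.

```python
import numpy as np

A = np.array([[0., 2., 4.],
              [2., 4., 2.],
              [3., 3., 1.]])
b = np.array([2., 3., 1.])

x = np.linalg.solve(A, b)      # preferred in practice
print(x)                       # [-0.875, 1.25, -0.125] = (-7/8, 5/4, -1/8)
print(np.linalg.inv(A) @ b)    # same vector, computed as A^{-1}b (Theorem 3.10)
```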

We use this technique for solving systems of linear equations having invertible coefficient matrices in the application that concludes this section.

In Example 1(c), we saw a system of linear equations that has no solutions. We now establish a criterion for determining when a system has solutions. This criterion involves the rank of the coefficient matrix of the system Ax = b and the rank of the matrix (A|b). The matrix (A|b) is called the augmented matrix of the system Ax = b.

Theorem 3.11. Let Ax = b be a system of linear equations. Then the system is consistent if and only if rank(A) = rank(A|b).

Proof. To say that Ax = b has a solution is equivalent to saying that b ∈ R(LA). (See Exercise 9.) In the proof of Theorem 3.5 (p. 153), we saw that

R(LA) = span({a1, a2, . . . , an}),


the span of the columns of A. Thus Ax = b has a solution if and only if b ∈ span({a1, a2, . . . , an}). But b ∈ span({a1, a2, . . . , an}) if and only if span({a1, a2, . . . , an}) = span({a1, a2, . . . , an, b}). This last statement is equivalent to

dim(span({a1, a2, . . . , an})) = dim(span({a1, a2, . . . , an, b})).

So by Theorem 3.5, the preceding equation reduces to

rank(A) = rank(A|b).

Example 5

Recall the system of equations

x1 + x2 = 0
x1 + x2 = 1

in Example 1(c).

Since

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \quad \text{and} \quad (A|b) = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix},$$

rank(A) = 1 and rank(A|b) = 2. Because the two ranks are unequal, the system has no solutions. ♦
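Theorem 3.11 translates directly into a rank comparison. The sketch below (illustrative only; it assumes Python with numpy) reproduces the conclusion of Example 5.

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 1.]])
b = np.array([[0.],
              [1.]])

rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))  # augmented matrix (A|b)
print(rank_A, rank_Ab)       # 1 2
print(rank_A == rank_Ab)     # False, so the system is inconsistent
```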

Example 6

We can use Theorem 3.11 to determine whether (3, 3, 2) is in the range of the linear transformation T : R3 → R3 defined by

T(a1, a2, a3) = (a1 + a2 + a3, a1 − a2 + a3, a1 + a3).

Now (3, 3, 2) ∈ R(T) if and only if there exists a vector s = (x1, x2, x3) in R3 such that T(s) = (3, 3, 2). Such a vector s must be a solution to the system

x1 + x2 + x3 = 3
x1 − x2 + x3 = 3
x1 + x3 = 2.

Since the ranks of the coefficient matrix and the augmented matrix of this system are 2 and 3, respectively, it follows that this system has no solutions. Hence (3, 3, 2) ∉ R(T). ♦


An Application

In 1973, Wassily Leontief won the Nobel prize in economics for his work in developing a mathematical model that can be used to describe various economic phenomena. We close this section by applying some of the ideas we have studied to illustrate two special cases of his work.

We begin by considering a simple society composed of three people (industries)—a farmer who grows all the food, a tailor who makes all the clothing, and a carpenter who builds all the housing. We assume that each person sells to and buys from a central pool and that everything produced is consumed. Since no commodities either enter or leave the system, this case is referred to as the closed model.

Each of these three individuals consumes all three of the commodities produced in the society. Suppose that the proportion of each of the commodities consumed by each person is given in the following table. Notice that each of the columns of the table must sum to 1.

            Food   Clothing   Housing
Farmer      0.40     0.20       0.20
Tailor      0.10     0.70       0.20
Carpenter   0.50     0.10       0.60

Let p1, p2, and p3 denote the incomes of the farmer, tailor, and carpenter, respectively. To ensure that this society survives, we require that the consumption of each individual equals his or her income. Note that the farmer consumes 20% of the clothing. Because the total cost of all clothing is p2, the tailor’s income, the amount spent by the farmer on clothing is 0.20p2. Moreover, the amount spent by the farmer on food, clothing, and housing must equal the farmer’s income, and so we obtain the equation

0.40p1 + 0.20p2 + 0.20p3 = p1.

Similar equations describing the expenditures of the tailor and carpenter produce the following system of linear equations:

0.40p1 + 0.20p2 + 0.20p3 = p1

0.10p1 + 0.70p2 + 0.20p3 = p2

0.50p1 + 0.10p2 + 0.60p3 = p3.

This system can be written as Ap = p, where

$$p = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix}$$


and A is the coefficient matrix of the system. In this context, A is called the input–output (or consumption) matrix, and Ap = p is called the equilibrium condition.

For vectors b = (b1, b2, . . . , bn) and c = (c1, c2, . . . , cn) in Rn, we use the notation b ≥ c [b > c] to mean bi ≥ ci [bi > ci] for all i. The vector b is called nonnegative [positive] if b ≥ 0 [b > 0].

At first, it may seem reasonable to replace the equilibrium condition by the inequality Ap ≤ p, that is, the requirement that consumption not exceed production. But, in fact, Ap ≤ p implies that Ap = p in the closed model. For otherwise, there exists a k for which

$$p_k > \sum_j A_{kj} p_j.$$

Hence, since the columns of A sum to 1,

$$\sum_i p_i > \sum_i \sum_j A_{ij} p_j = \sum_j \left( \sum_i A_{ij} \right) p_j = \sum_j p_j,$$

which is a contradiction.

One solution to the homogeneous system (I − A)x = 0, which is equivalent to the equilibrium condition, is

$$p = \begin{pmatrix} 0.25 \\ 0.35 \\ 0.40 \end{pmatrix}.$$

We may interpret this to mean that the society survives if the farmer, tailor, and carpenter have incomes in the proportions 25 : 35 : 40 (or 5 : 7 : 8).
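One hedged way to find such an equilibrium vector numerically is to take a null-space vector of I − A and rescale it; the text itself does not prescribe this method. The sketch below (illustrative only; it assumes Python with numpy) uses the singular value decomposition, whose last right-singular vector spans N(I − A) here because rank(I − A) = 2, and recovers the proportions 25 : 35 : 40.

```python
import numpy as np

A = np.array([[0.40, 0.20, 0.20],
              [0.10, 0.70, 0.20],
              [0.50, 0.10, 0.60]])

# The right-singular vector for the smallest singular value of (I - A)
# spans its null space, since rank(I - A) = 2 for this input-output matrix.
_, _, Vt = np.linalg.svd(np.eye(3) - A)
p = Vt[-1]
p = p / p.sum()          # normalize so the incomes sum to 1 (also fixes the sign)
print(p)                 # approximately [0.25, 0.35, 0.40]
```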

Notice that we are not simply interested in any nonzero solution to the system, but in one that is nonnegative. Thus we must consider the question of whether the system (I − A)x = 0 has a nonnegative solution, where A is a matrix with nonnegative entries whose columns sum to 1. A useful theorem in this direction (whose proof may be found in “Applications of Matrices to Economic Models and Social Science Relationships,” by Ben Noble, Proceedings of the Summer Conference for College Teachers on Applied Mathematics, 1971, CUPM, Berkeley, California) is stated below.

Theorem 3.12. Let A be an n×n input–output matrix having the form

$$A = \begin{pmatrix} B & C \\ D & E \end{pmatrix},$$

where D is a 1 × (n − 1) positive vector and C is an (n − 1) × 1 positive vector. Then (I − A)x = 0 has a one-dimensional solution set that is generated by a nonnegative vector.


Observe that any input–output matrix with all positive entries satisfies the hypothesis of this theorem. The following matrix does also:

$$\begin{pmatrix} 0.75 & 0.50 & 0.65 \\ 0 & 0.25 & 0.35 \\ 0.25 & 0.25 & 0 \end{pmatrix}.$$

In the open model, we assume that there is an outside demand for each of the commodities produced. Returning to our simple society, let x1, x2, and x3 be the monetary values of food, clothing, and housing produced with respective outside demands d1, d2, and d3. Let A be the 3 × 3 matrix such that Aij represents the amount (in a fixed monetary unit such as the dollar) of commodity i required to produce one monetary unit of commodity j. Then the value of the surplus of food in the society is

x1 − (A11x1 + A12x2 + A13x3),

that is, the value of food produced minus the value of food consumed while producing the three commodities. The assumption that everything produced is consumed gives us a similar equilibrium condition for the open model, namely, that the surplus of each of the three commodities must equal the corresponding outside demands. Hence

$$x_i - \sum_{j=1}^{3} A_{ij} x_j = d_i \quad \text{for } i = 1, 2, 3.$$

In general, we must find a nonnegative solution to (I − A)x = d, where A is a matrix with nonnegative entries such that the sum of the entries of each column of A does not exceed one, and d ≥ 0. It is easy to see that if (I − A)−1 exists and is nonnegative, then the desired solution is (I − A)−1d.

Recall that for a real number a, the series 1 + a + a2 + · · · converges to (1 − a)−1 if |a| < 1. Similarly, it can be shown (using the concept of convergence of matrices developed in Section 5.3) that the series I + A + A2 + · · · converges to (I − A)−1 if {An} converges to the zero matrix. In this case, (I − A)−1 is nonnegative since the matrices I, A, A2, . . . are nonnegative.

To illustrate the open model, suppose that 30 cents worth of food, 10 cents worth of clothing, and 30 cents worth of housing are required for the production of $1 worth of food. Similarly, suppose that 20 cents worth of food, 40 cents worth of clothing, and 20 cents worth of housing are required for the production of $1 of clothing. Finally, suppose that 30 cents worth of food, 10 cents worth of clothing, and 30 cents worth of housing are required for the production of $1 worth of housing. Then the input–output matrix is

$$A = \begin{pmatrix} 0.30 & 0.20 & 0.30 \\ 0.10 & 0.40 & 0.10 \\ 0.30 & 0.20 & 0.30 \end{pmatrix};$$


so

$$I - A = \begin{pmatrix} 0.70 & -0.20 & -0.30 \\ -0.10 & 0.60 & -0.10 \\ -0.30 & -0.20 & 0.70 \end{pmatrix} \quad \text{and} \quad (I - A)^{-1} = \begin{pmatrix} 2.0 & 1.0 & 1.0 \\ 0.5 & 2.0 & 0.5 \\ 1.0 & 1.0 & 2.0 \end{pmatrix}.$$

Since (I − A)−1 is nonnegative, we can find a (unique) nonnegative solution to (I − A)x = d for any demand d. For example, suppose that there are outside demands for $30 billion in food, $20 billion in clothing, and $10 billion in housing. If we set

$$d = \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix},$$

then

$$x = (I - A)^{-1}d = \begin{pmatrix} 90 \\ 60 \\ 70 \end{pmatrix}.$$

So a gross production of $90 billion of food, $60 billion of clothing, and $70 billion of housing is necessary to meet the required demands.
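The open-model computation is a single solve. The sketch below (an illustration assuming Python with numpy) reproduces the production vector found above.

```python
import numpy as np

A = np.array([[0.30, 0.20, 0.30],
              [0.10, 0.40, 0.10],
              [0.30, 0.20, 0.30]])
d = np.array([30., 20., 10.])           # outside demands, in billions of dollars

x = np.linalg.solve(np.eye(3) - A, d)   # same as (I - A)^{-1} d
print(x)                                # [90., 60., 70.]
```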

EXERCISES

1. Label the following statements as true or false.

(a) Any system of linear equations has at least one solution.
(b) Any system of linear equations has at most one solution.
(c) Any homogeneous system of linear equations has at least one solution.
(d) Any system of n linear equations in n unknowns has at most one solution.
(e) Any system of n linear equations in n unknowns has at least one solution.
(f) If the homogeneous system corresponding to a given system of linear equations has a solution, then the given system has a solution.
(g) If the coefficient matrix of a homogeneous system of n linear equations in n unknowns is invertible, then the system has no nonzero solutions.
(h) The solution set of any system of m linear equations in n unknowns is a subspace of Fn.

2. For each of the following homogeneous systems of linear equations, find the dimension of and a basis for the solution set.


(a) x1 + 3x2 = 0
    2x1 + 6x2 = 0

(b) x1 + x2 − x3 = 0
    4x1 + x2 − 2x3 = 0

(c) x1 + 2x2 − x3 = 0
    2x1 + x2 + x3 = 0

(d) 2x1 + x2 − x3 = 0
    x1 − x2 + x3 = 0
    x1 + 2x2 − 2x3 = 0

(e) x1 + 2x2 − 3x3 + x4 = 0

(f) x1 + 2x2 = 0
    x1 − x2 = 0

(g) x1 + 2x2 + x3 + x4 = 0
    x2 − x3 + x4 = 0

3. Using the results of Exercise 2, find all solutions to the following systems.

(a) x1 + 3x2 = 5
    2x1 + 6x2 = 10

(b) x1 + x2 − x3 = 1
    4x1 + x2 − 2x3 = 3

(c) x1 + 2x2 − x3 = 3
    2x1 + x2 + x3 = 6

(d) 2x1 + x2 − x3 = 5
    x1 − x2 + x3 = 1
    x1 + 2x2 − 2x3 = 4

(e) x1 + 2x2 − 3x3 + x4 = 1

(f) x1 + 2x2 = 5
    x1 − x2 = −1

(g) x1 + 2x2 + x3 + x4 = 1
    x2 − x3 + x4 = 1

4. For each system of linear equations with the invertible coefficient matrix A,

(1) Compute A−1.
(2) Use A−1 to solve the system.

(a) x1 + 3x2 = 4
    2x1 + 5x2 = 3

(b) x1 + 2x2 − x3 = 5
    x1 + x2 + x3 = 1
    2x1 − 2x2 + x3 = 4

5. Give an example of a system of n linear equations in n unknowns with infinitely many solutions.

6. Let T : R3 → R2 be defined by T(a, b, c) = (a + b, 2a − c). Determine T−1(1, 11).

7. Determine which of the following systems of linear equations has a solution.


(a) x1 + x2 − x3 + 2x4 = 2
    x1 + x2 + 2x3 = 1
    2x1 + 2x2 + x3 + 2x4 = 4

(b) x1 + x2 − x3 = 1
    2x1 + x2 + 3x3 = 2

(c) x1 + 2x2 + 3x3 = 1
    x1 + x2 − x3 = 0
    x1 + 2x2 + x3 = 3

(d) x1 + x2 + 3x3 − x4 = 0
    x1 + x2 + x3 + x4 = 1
    x1 − 2x2 + x3 − x4 = 1
    4x1 + x2 + 8x3 − x4 = 0

(e) x1 + 2x2 − x3 = 1
    2x1 + x2 + 2x3 = 3
    x1 − 4x2 + 7x3 = 4

8. Let T : R3 → R3 be defined by T(a, b, c) = (a + b, b − 2c, a + 2c). For each vector v in R3, determine whether v ∈ R(T).

(a) v = (1, 3,−2) (b) v = (2, 1, 1)

9. Prove that the system of linear equations Ax = b has a solution if and only if b ∈ R(LA).

10. Prove or give a counterexample to the following statement: If the coefficient matrix of a system of m linear equations in n unknowns has rank m, then the system has a solution.

11. In the closed model of Leontief with food, clothing, and housing as the basic industries, suppose that the input–output matrix is

$$A = \begin{pmatrix} \tfrac{7}{16} & \tfrac{1}{2} & \tfrac{3}{16} \\ \tfrac{5}{16} & \tfrac{1}{6} & \tfrac{5}{16} \\ \tfrac{1}{4} & \tfrac{1}{3} & \tfrac{1}{2} \end{pmatrix}.$$

At what ratio must the farmer, tailor, and carpenter produce in order for equilibrium to be attained?

12. A certain economy consists of two sectors: goods and services. Suppose that 60% of all goods and 30% of all services are used in the production of goods. What proportion of the total economic output is used in the production of goods?

13. In the notation of the open model of Leontief, suppose that

$$A = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{5} \\ \tfrac{1}{3} & \tfrac{1}{5} \end{pmatrix} \quad \text{and} \quad d = \begin{pmatrix} 2 \\ 5 \end{pmatrix}$$

are the input–output matrix and the demand vector, respectively. How much of each commodity must be produced to satisfy this demand?


14. A certain economy consisting of the two sectors of goods and services supports a defense system that consumes $90 billion worth of goods and $20 billion worth of services from the economy but does not contribute to economic production. Suppose that 50 cents worth of goods and 20 cents worth of services are required to produce $1 worth of goods and that 30 cents worth of goods and 60 cents worth of services are required to produce $1 worth of services. What must the total output of the economic system be to support this defense system?

3.4 SYSTEMS OF LINEAR EQUATIONS—COMPUTATIONAL ASPECTS

In Section 3.3, we obtained a necessary and sufficient condition for a system of linear equations to have solutions (Theorem 3.11 p. 174) and learned how to express the solutions to a nonhomogeneous system in terms of solutions to the corresponding homogeneous system (Theorem 3.9 p. 172). The latter result enables us to determine all the solutions to a given system if we can find one solution to the given system and a basis for the solution set of the corresponding homogeneous system. In this section, we use elementary row operations to accomplish these two objectives simultaneously. The essence of this technique is to transform a given system of linear equations into a system having the same solutions, but which is easier to solve (as in Section 1.4).

Definition. Two systems of linear equations are called equivalent if they have the same solution set.

The following theorem and corollary give a useful method for obtaining equivalent systems.

Theorem 3.13. Let Ax = b be a system of m linear equations in n unknowns, and let C be an invertible m × m matrix. Then the system (CA)x = Cb is equivalent to Ax = b.

Proof. Let K be the solution set for Ax = b and K′ the solution set for (CA)x = Cb. If w ∈ K, then Aw = b. So (CA)w = Cb, and hence w ∈ K′. Thus K ⊆ K′.

Conversely, if w ∈ K′, then (CA)w = Cb. Hence

Aw = C−1(CAw) = C−1(Cb) = b;

so w ∈ K. Thus K′ ⊆ K, and therefore, K = K′.

Corollary. Let Ax = b be a system of m linear equations in n unknowns. If (A′|b′) is obtained from (A|b) by a finite number of elementary row operations, then the system A′x = b′ is equivalent to the original system.


Proof. Suppose that (A′|b′) is obtained from (A|b) by elementary row operations. These may be executed by multiplying (A|b) by elementary m × m matrices E1, E2, . . . , Ep. Let C = Ep · · ·E2E1; then

(A′|b′) = C(A|b) = (CA|Cb).

Since each Ei is invertible, so is C. Now A′ = CA and b′ = Cb. Thus by Theorem 3.13, the system A′x = b′ is equivalent to the system Ax = b.

We now describe a method for solving any system of linear equations. Consider, for example, the system of linear equations

3x1 + 2x2 + 3x3 − 2x4 = 1
 x1 +  x2 +  x3       = 3
 x1 + 2x2 +  x3 −  x4 = 2.

First, we form the augmented matrix

( 3  2  3  −2  1 )
( 1  1  1   0  3 )
( 1  2  1  −1  2 ).

By using elementary row operations, we transform the augmented matrix into an upper triangular matrix in which the first nonzero entry of each row is 1, and it occurs in a column to the right of the first nonzero entry of each preceding row. (Recall that matrix A is upper triangular if Aij = 0 whenever i > j.)

1. In the leftmost nonzero column, create a 1 in the first row. In our example, we can accomplish this step by interchanging the first and third rows. The resulting matrix is

( 1  2  1  −1  2 )
( 1  1  1   0  3 )
( 3  2  3  −2  1 ).

2. By means of type 3 row operations, use the first row to obtain zeros in the remaining positions of the leftmost nonzero column. In our example, we must add −1 times the first row to the second row and then add −3 times the first row to the third row to obtain

( 1   2  1  −1   2 )
( 0  −1  0   1   1 )
( 0  −4  0   1  −5 ).

3. Create a 1 in the next row in the leftmost possible column, without using previous row(s). In our example, the second column is the leftmost possible column, and we can obtain a 1 in the second row, second column by multiplying the second row by −1. This operation produces

( 1   2  1  −1   2 )
( 0   1  0  −1  −1 )
( 0  −4  0   1  −5 ).

4. Now use type 3 elementary row operations to obtain zeros below the 1 created in the preceding step. In our example, we must add four times the second row to the third row. The resulting matrix is

( 1  2  1  −1   2 )
( 0  1  0  −1  −1 )
( 0  0  0  −3  −9 ).

5. Repeat steps 3 and 4 on each succeeding row until no nonzero rows remain. In our example, this can be accomplished by multiplying the third row by −1/3. This operation produces

( 1  2  1  −1   2 )
( 0  1  0  −1  −1 )
( 0  0  0   1   3 ).

We have now obtained the desired matrix. To complete the simplification of the augmented matrix, we must make the first nonzero entry in each row the only nonzero entry in its column. (This corresponds to eliminating certain unknowns from all but one of the equations.)

6. Work upward, beginning with the last nonzero row, and add multiples of each row to the rows above. (This creates zeros above the first nonzero entry in each row.) In our example, the third row is the last nonzero row, and the first nonzero entry of this row lies in column 4. Hence we add the third row to the first and second rows to obtain zeros in row 1, column 4 and row 2, column 4. The resulting matrix is

( 1  2  1  0  5 )
( 0  1  0  0  2 )
( 0  0  0  1  3 ).

7. Repeat the process described in step 6 for each preceding row until it is performed with the second row, at which time the reduction process is complete. In our example, we must add −2 times the second row to the first row in order to make the first row, second column entry become zero. This operation produces

( 1  0  1  0  1 )
( 0  1  0  0  2 )
( 0  0  0  1  3 ).


We have now obtained the desired reduction of the augmented matrix. This matrix corresponds to the system of linear equations

x1 + x3 = 1
     x2 = 2
     x4 = 3.

Recall that, by the corollary to Theorem 3.13, this system is equivalent to the original system. But this system is easily solved. Obviously x2 = 2 and x4 = 3. Moreover, x1 and x3 can have any values provided their sum is 1. Letting x3 = t, we then have x1 = 1 − t. Thus an arbitrary solution to the original system has the form

( 1 − t )   ( 1 )     ( −1 )
(   2   ) = ( 2 ) + t (  0 )
(   t   )   ( 0 )     (  1 )
(   3   )   ( 3 )     (  0 ).

Observe that the vector

( −1 )
(  0 )
(  1 )
(  0 )

is a basis for the homogeneous system of equations corresponding to the given system.
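The reduction above can also be checked mechanically. The following small sketch uses SymPy (assuming it is available; it is not part of the text's development) to compute the reduced row echelon form of the augmented matrix and a basis for the solution set of the corresponding homogeneous system.

from sympy import Matrix

# Augmented matrix of 3x1+2x2+3x3-2x4 = 1, x1+x2+x3 = 3, x1+2x2+x3-x4 = 2
aug = Matrix([[3, 2, 3, -2, 1],
              [1, 1, 1,  0, 3],
              [1, 2, 1, -1, 2]])

reduced, pivots = aug.rref()
print(reduced)   # rows (1, 0, 1, 0, 1), (0, 1, 0, 0, 2), (0, 0, 0, 1, 3)
print(pivots)    # (0, 1, 3): SymPy counts columns from 0

A = aug[:, :4]              # coefficient matrix
print(A.nullspace())        # one vector, (-1, 0, 1, 0), as found above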

In the preceding example we performed elementary row operations on the augmented matrix of the system until we obtained the augmented matrix of a system having properties 1, 2, and 3 on page 27. Such a matrix has a special name.

Definition. A matrix is said to be in reduced row echelon form if the following three conditions are satisfied.

(a) Any row containing a nonzero entry precedes any row in which all the entries are zero (if any).

(b) The first nonzero entry in each row is the only nonzero entry in its column.

(c) The first nonzero entry in each row is 1 and it occurs in a column to the right of the first nonzero entry in the preceding row.

Example 1

(a) The matrix on page 184 is in reduced row echelon form. Note that the first nonzero entry of each row is 1 and that the column containing each such entry has all zeros otherwise. Also note that each time we move downward to a new row, we must move to the right one or more columns to find the first nonzero entry of the new row.

(b) The matrix

( 1  1  0 )
( 0  1  0 )
( 1  0  1 )

is not in reduced row echelon form, because the first column, which contains the first nonzero entry in row 1, contains another nonzero entry. Similarly, the matrix

( 0  1  0  2 )
( 1  0  0  1 )
( 0  0  1  1 )

is not in reduced row echelon form, because the first nonzero entry of the second row is not to the right of the first nonzero entry of the first row. Finally, the matrix

( 2  0  0 )
( 0  1  0 )

is not in reduced row echelon form, because the first nonzero entry of the first row is not 1. ♦
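The three conditions of the definition can be checked mechanically. The sketch below is a direct, unoptimized translation of (a)–(c) into plain Python; the function names and the list-of-lists matrix format are our own conventions, not notation from the text.

def first_nonzero(row):
    """Return the column index of the first nonzero entry of a row, or None."""
    for j, entry in enumerate(row):
        if entry != 0:
            return j
    return None

def is_rref(matrix):
    leads = [first_nonzero(row) for row in matrix]

    # (a) every nonzero row precedes every zero row
    seen_zero_row = False
    for lead in leads:
        if lead is None:
            seen_zero_row = True
        elif seen_zero_row:
            return False

    nonzero_leads = [lead for lead in leads if lead is not None]

    # (c) each leading entry is 1 and lies strictly to the right of the one above
    for i, lead in enumerate(nonzero_leads):
        if matrix[i][lead] != 1:
            return False
        if i > 0 and lead <= nonzero_leads[i - 1]:
            return False

    # (b) a leading 1 is the only nonzero entry in its column
    for i, lead in enumerate(nonzero_leads):
        if any(matrix[r][lead] != 0 for r in range(len(matrix)) if r != i):
            return False

    return True

print(is_rref([[1, 0, 1, 0, 1], [0, 1, 0, 0, 2], [0, 0, 0, 1, 3]]))  # True (the matrix on page 184)
print(is_rref([[1, 1, 0], [0, 1, 0], [1, 0, 1]]))                    # False
print(is_rref([[0, 1, 0, 2], [1, 0, 0, 1], [0, 0, 1, 1]]))           # False
print(is_rref([[2, 0, 0], [0, 1, 0]]))                               # False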

It can be shown (see the corollary to Theorem 3.16) that the reduced row echelon form of a matrix is unique; that is, if different sequences of elementary row operations are used to transform a matrix into matrices Q and Q′ in reduced row echelon form, then Q = Q′. Thus, although there are many different sequences of elementary row operations that can be used to transform a given matrix into reduced row echelon form, they all produce the same result.

The procedure described on pages 183–185 for reducing an augmented matrix to reduced row echelon form is called Gaussian elimination. It consists of two separate parts.

1. In the forward pass (steps 1–5), the augmented matrix is transformed into an upper triangular matrix in which the first nonzero entry of each row is 1, and it occurs in a column to the right of the first nonzero entry of each preceding row.

2. In the backward pass or back-substitution (steps 6–7), the upper triangular matrix is transformed into reduced row echelon form by making the first nonzero entry of each row the only nonzero entry of its column.
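A bare-bones implementation of the two passes might look like the following sketch (plain Python with exact Fraction arithmetic, so the roundoff issues mentioned below do not arise). The function name, the list-of-lists representation, and the return values are our own choices.

from fractions import Fraction

def rref(rows):
    """Transform a matrix (list of lists) to reduced row echelon form."""
    m = [[Fraction(x) for x in row] for row in rows]
    num_rows, num_cols = len(m), len(m[0])
    pivot_cols = []
    r = 0

    # Forward pass (steps 1-5): leading 1s, zeros below each leading 1.
    for c in range(num_cols):
        pivot = next((i for i in range(r, num_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]                  # type 1: interchange rows
        m[r] = [x / m[r][c] for x in m[r]]               # type 2: scale to get a leading 1
        for i in range(r + 1, num_rows):                 # type 3: clear entries below
            m[i] = [a - m[i][c] * b for a, b in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
        if r == num_rows:
            break

    # Backward pass (steps 6-7): clear entries above each leading 1.
    for r in reversed(range(len(pivot_cols))):
        c = pivot_cols[r]
        for i in range(r):
            m[i] = [a - m[i][c] * b for a, b in zip(m[i], m[r])]

    return m, pivot_cols

reduced, pivots = rref([[3, 2, 3, -2, 1],
                        [1, 1, 1,  0, 3],
                        [1, 2, 1, -1, 2]])
for row in reduced:
    print([str(x) for x in row])   # rows 1 0 1 0 1 / 0 1 0 0 2 / 0 0 0 1 3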


Of all the methods for transforming a matrix into its reduced row echelon form, Gaussian elimination requires the fewest arithmetic operations. (For large matrices, it requires approximately 50% fewer operations than the Gauss-Jordan method, in which the matrix is transformed into reduced row echelon form by using the first nonzero entry in each row to make zero all other entries in its column.) Because of this efficiency, Gaussian elimination is the preferred method when solving systems of linear equations on a computer. In this context, the Gaussian elimination procedure is usually modified in order to minimize roundoff errors. Since discussion of these techniques is inappropriate here, readers who are interested in such matters are referred to books on numerical analysis.

When a matrix is in reduced row echelon form, the corresponding system of linear equations is easy to solve. We present below a procedure for solving any system of linear equations for which the augmented matrix is in reduced row echelon form. First, however, we note that every matrix can be transformed into reduced row echelon form by Gaussian elimination. In the forward pass, we satisfy conditions (a) and (c) in the definition of reduced row echelon form and thereby make zero all entries below the first nonzero entry in each row. Then in the backward pass, we make zero all entries above the first nonzero entry in each row, thereby satisfying condition (b) in the definition of reduced row echelon form.

Theorem 3.14. Gaussian elimination transforms any matrix into its reduced row echelon form.

We now describe a method for solving a system in which the augmented matrix is in reduced row echelon form. To illustrate this procedure, we consider the system

2x1 + 3x2 +  x3 + 4x4 − 9x5 = 17
 x1 +  x2 +  x3 +  x4 − 3x5 = 6
 x1 +  x2 +  x3 + 2x4 − 5x5 = 8
2x1 + 2x2 + 2x3 + 3x4 − 8x5 = 14,

for which the augmented matrix is

( 2  3  1  4  −9  17 )
( 1  1  1  1  −3   6 )
( 1  1  1  2  −5   8 )
( 2  2  2  3  −8  14 ).

Applying Gaussian elimination to the augmented matrix of the system produces the following sequence of matrices.

( 2  3  1  4  −9  17 )      ( 1  1  1  1  −3   6 )
( 1  1  1  1  −3   6 )  −→  ( 2  3  1  4  −9  17 )  −→
( 1  1  1  2  −5   8 )      ( 1  1  1  2  −5   8 )
( 2  2  2  3  −8  14 )      ( 2  2  2  3  −8  14 )

( 1  1   1  1  −3  6 )      ( 1  1   1  1  −3  6 )
( 0  1  −1  2  −3  5 )  −→  ( 0  1  −1  2  −3  5 )  −→
( 0  0   0  1  −2  2 )      ( 0  0   0  1  −2  2 )
( 0  0   0  1  −2  2 )      ( 0  0   0  0   0  0 )

( 1  1   1  0  −1  4 )      ( 1  0   2  0  −2  3 )
( 0  1  −1  0   1  1 )  −→  ( 0  1  −1  0   1  1 )
( 0  0   0  1  −2  2 )      ( 0  0   0  1  −2  2 )
( 0  0   0  0   0  0 )      ( 0  0   0  0   0  0 ).

The system of linear equations corresponding to this last matrix is

x1      + 2x3      − 2x5 = 3
     x2 −  x3      +  x5 = 1
                x4 − 2x5 = 2.

Notice that we have ignored the last row since it consists entirely of zeros.

To solve a system for which the augmented matrix is in reduced row echelon form, divide the variables into two sets. The first set consists of those variables that appear as leftmost variables in one of the equations of the system (in this case the set is {x1, x2, x4}). The second set consists of all the remaining variables (in this case, {x3, x5}). To each variable in the second set, assign a parametric value t1, t2, . . . (x3 = t1, x5 = t2), and then solve for the variables of the first set in terms of those in the second set:

x1 = −2x3 + 2x5 + 3 = −2t1 + 2t2 + 3
x2 =   x3 −  x5 + 1 =   t1 −  t2 + 1
x4 =        2x5 + 2 =        2t2 + 2.

Thus an arbitrary solution is of the form

( x1 )   ( −2t1 + 2t2 + 3 )   ( 3 )      ( −2 )      (  2 )
( x2 )   (   t1 −  t2 + 1 )   ( 1 )      (  1 )      ( −1 )
( x3 ) = (        t1      ) = ( 0 ) + t1 (  1 ) + t2 (  0 )
( x4 )   (       2t2 + 2  )   ( 2 )      (  0 )      (  2 )
( x5 )   (        t2      )   ( 0 )      (  0 )      (  1 ),

where t1, t2 ∈ R. Notice that the set consisting of the two vectors

( −2 )        (  2 )
(  1 )        ( −1 )
(  1 )  and   (  0 )
(  0 )        (  2 )
(  0 )        (  1 )


is a basis for the solution set of the corresponding homogeneous system of equations and

( 3 )
( 1 )
( 0 )
( 2 )
( 0 )

is a particular solution to the original system.

Therefore, in simplifying the augmented matrix of the system to reduced row echelon form, we are in effect simultaneously finding a particular solution to the original system and a basis for the solution set of the associated homogeneous system. Moreover, this procedure detects when a system is inconsistent, for by Exercise 3, solutions exist if and only if, in the reduction of the augmented matrix to reduced row echelon form, we do not obtain a row in which the only nonzero entry lies in the last column.

Thus to use this procedure for solving a system Ax = b of m linear equations in n unknowns, we need only begin to transform the augmented matrix (A|b) into its reduced row echelon form (A′|b′) by means of Gaussian elimination. If a row is obtained in which the only nonzero entry lies in the last column, then the original system is inconsistent. Otherwise, discard any zero rows from (A′|b′), and write the corresponding system of equations. Solve this system as described above to obtain an arbitrary solution of the form

s = s0 + t1u1 + t2u2 + · · · + tn−run−r,

where r is the number of nonzero rows in A′ (r ≤ m). The preceding equation is called a general solution of the system Ax = b. It expresses an arbitrary solution s of Ax = b in terms of n − r parameters. The following theorem states that s cannot be expressed in fewer than n − r parameters.
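The bookkeeping in this procedure (basic versus free variables, one parameter per free variable, and the inconsistency test of Exercise 3) is mechanical; a sketch of it in plain Python is given below. It assumes the augmented matrix has already been brought to reduced row echelon form, for instance by the rref sketch given earlier; the helper name general_solution is ours.

from fractions import Fraction

def general_solution(rref_rows, num_unknowns):
    """Given an augmented matrix (A'|b') in reduced row echelon form, return
    (s0, [u1, ..., u_{n-r}]) so that every solution is s0 + t1 u1 + ... ,
    or None if the system is inconsistent."""
    rows = [[Fraction(x) for x in row] for row in rref_rows]
    rows = [r for r in rows if any(r)]                    # discard zero rows

    pivot_cols = []
    for r in rows:
        lead = next(j for j, x in enumerate(r) if x != 0)
        if lead == num_unknowns:                          # only nonzero entry in last column
            return None                                   # the system is inconsistent
        pivot_cols.append(lead)

    free_cols = [j for j in range(num_unknowns) if j not in pivot_cols]

    # Particular solution s0: set every free variable equal to 0.
    s0 = [Fraction(0)] * num_unknowns
    for i, c in enumerate(pivot_cols):
        s0[c] = rows[i][num_unknowns]

    # One basis vector of the homogeneous solution set per free variable.
    basis = []
    for f in free_cols:
        u = [Fraction(0)] * num_unknowns
        u[f] = Fraction(1)
        for i, c in enumerate(pivot_cols):
            u[c] = -rows[i][f]
        basis.append(u)
    return s0, basis

rref_aug = [[1, 0, 2, 0, -2, 3],
            [0, 1, -1, 0, 1, 1],
            [0, 0, 0, 1, -2, 2],
            [0, 0, 0, 0, 0, 0]]
s0, basis = general_solution(rref_aug, 5)
print(s0)      # particular solution (3, 1, 0, 2, 0)
print(basis)   # basis vectors (-2, 1, 1, 0, 0) and (2, -1, 0, 2, 1)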

Theorem 3.15. Let Ax = b be a system of r nonzero equations in n unknowns. Suppose that rank(A) = rank(A|b) and that (A|b) is in reduced row echelon form. Then

(a) rank(A) = r.

(b) If the general solution obtained by the procedure above is of the form

s = s0 + t1u1 + t2u2 + · · · + tn−run−r,

then {u1, u2, . . . , un−r} is a basis for the solution set of the corresponding homogeneous system, and s0 is a solution to the original system.

Proof. Since (A|b) is in reduced row echelon form, (A|b) must have r nonzero rows. Clearly these rows are linearly independent by the definition of the reduced row echelon form, and so rank(A|b) = r. Thus rank(A) = r.


Let K be the solution set for Ax = b, and let KH be the solution set for Ax = 0. Setting t1 = t2 = · · · = tn−r = 0, we see that s = s0 ∈ K. But by Theorem 3.9 (p. 172), K = {s0} + KH. Hence

KH = {−s0} + K = span({u1, u2, . . . , un−r}).

Because rank(A) = r, we have dim(KH) = n − r. Thus since dim(KH) = n − r and KH is generated by a set {u1, u2, . . . , un−r} containing at most n − r vectors, we conclude that this set is a basis for KH.

An Interpretation of the Reduced Row Echelon Form

Let A be an m × n matrix with columns a1, a2, . . . , an, and let B be the reduced row echelon form of A. Denote the columns of B by b1, b2, . . . , bn. If the rank of A is r, then the rank of B is also r by the corollary to Theorem 3.4 (p. 153). Because B is in reduced row echelon form, no nonzero row of B can be a linear combination of the other rows of B. Hence B must have exactly r nonzero rows, and if r ≥ 1, the vectors e1, e2, . . . , er must occur among the columns of B. For i = 1, 2, . . . , r, let ji denote a column number of B such that bji = ei. We claim that aj1, aj2, . . . , ajr, the columns of A corresponding to these columns of B, are linearly independent. For suppose that there are scalars c1, c2, . . . , cr such that

c1aj1 + c2aj2 + · · · + crajr = 0.

Because B can be obtained from A by a sequence of elementary row operations, there exists (as in the proof of the corollary to Theorem 3.13) an invertible m × m matrix M such that MA = B. Multiplying the preceding equation by M yields

c1Maj1 + c2Maj2 + · · · + crMajr = 0.

Since Maji = bji = ei, it follows that

c1e1 + c2e2 + · · · + crer = 0.

Hence c1 = c2 = · · · = cr = 0, proving that the vectors aj1, aj2, . . . , ajr are linearly independent.

Because B has only r nonzero rows, every column of B has the form

( d1 )
( d2 )
( ... )
( dr )
(  0 )
( ... )
(  0 )


for scalars d1, d2, . . . , dr. The corresponding column of A must be

M−1(d1e1 + d2e2 + · · · + drer) = d1M−1e1 + d2M−1e2 + · · · + drM−1er
                                = d1M−1bj1 + d2M−1bj2 + · · · + drM−1bjr
                                = d1aj1 + d2aj2 + · · · + drajr.

The next theorem summarizes these results.

Theorem 3.16. Let A be an m × n matrix of rank r, where r > 0, and let B be the reduced row echelon form of A. Then

(a) The number of nonzero rows in B is r.

(b) For each i = 1, 2, . . . , r, there is a column bji of B such that bji = ei.

(c) The columns of A numbered j1, j2, . . . , jr are linearly independent.

(d) For each k = 1, 2, . . . , n, if column k of B is d1e1 + d2e2 + · · · + drer, then column k of A is d1aj1 + d2aj2 + · · · + drajr.

Corollary. The reduced row echelon form of a matrix is unique.

Proof. Exercise. (See Exercise 15.)

Example 2

Let

A = ( 2  4  6  2  4 )
    ( 1  2  3  1  1 )
    ( 2  4  8  0  0 )
    ( 3  6  7  5  9 ).

The reduced row echelon form of A is

B = ( 1  2  0   4  0 )
    ( 0  0  1  −1  0 )
    ( 0  0  0   0  1 )
    ( 0  0  0   0  0 ).

Since B has three nonzero rows, the rank of A is 3. The first, third, and fifth columns of B are e1, e2, and e3; so Theorem 3.16(c) asserts that the first, third, and fifth columns of A are linearly independent.

Let the columns of A be denoted a1, a2, a3, a4, and a5. Because the second column of B is 2e1, it follows from Theorem 3.16(d) that a2 = 2a1, as is easily checked. Moreover, since the fourth column of B is 4e1 + (−1)e2, the same result shows that

a4 = 4a1 + (−1)a3. ♦
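Theorem 3.16 can be checked on the matrix of Example 2 with a few lines of SymPy (assuming it is available); the column relations a2 = 2a1 and a4 = 4a1 − a3 can be read directly off the nonpivot columns of B.

from sympy import Matrix

A = Matrix([[2, 4, 6, 2, 4],
            [1, 2, 3, 1, 1],
            [2, 4, 8, 0, 0],
            [3, 6, 7, 5, 9]])

B, pivots = A.rref()
print(B)        # the matrix B of Example 2
print(pivots)   # (0, 2, 4): columns 1, 3, 5 of A (SymPy counts from 0) are independent

a = [A[:, j] for j in range(5)]
print(a[1] == 2 * a[0])           # True:  a2 = 2a1
print(a[3] == 4 * a[0] - a[2])    # True:  a4 = 4a1 - a3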


In Example 6 of Section 1.6, we extracted a basis for R3 from the generating set

S = {(2,−3, 5), (8,−12, 20), (1, 0,−2), (0, 2,−1), (7, 2, 0)}.

The procedure described there can be streamlined by using Theorem 3.16. We begin by noting that if S were linearly independent, then S would be a basis for R3. In this case, it is clear that S is linearly dependent because S contains more than dim(R3) = 3 vectors. Nevertheless, it is instructive to consider the calculation that is needed to determine whether S is linearly dependent or linearly independent. Recall that S is linearly dependent if there are scalars c1, c2, c3, c4, and c5, not all zero, such that

c1(2,−3, 5) + c2(8,−12, 20) + c3(1, 0,−2) + c4(0, 2,−1) + c5(7, 2, 0) = (0, 0, 0).

Thus S is linearly dependent if and only if the system of linear equations

 2c1 +  8c2 +  c3       + 7c5 = 0
−3c1 − 12c2       + 2c4 + 2c5 = 0
 5c1 + 20c2 − 2c3 −  c4       = 0

has a nonzero solution. The augmented matrix of this system of equations is

A = (  2    8   1   0  7  0 )
    ( −3  −12   0   2  2  0 )
    (  5   20  −2  −1  0  0 ),

and its reduced row echelon form is

B = ( 1  4  0  0  2  0 )
    ( 0  0  1  0  3  0 )
    ( 0  0  0  1  4  0 ).

Using the technique described earlier in this section, we can find nonzero solutions of the preceding system, confirming that S is linearly dependent. However, Theorem 3.16(c) gives us additional information. Since the first, third, and fourth columns of B are e1, e2, and e3, we conclude that the first, third, and fourth columns of A are linearly independent. But the columns of A other than the last column (which is the zero vector) are vectors in S. Hence

β = {(2,−3, 5), (1, 0,−2), (0, 2,−1)}

is a linearly independent subset of S. It follows from (b) of Corollary 2 to the replacement theorem (p. 47) that β is a basis for R3.

Because every finite-dimensional vector space over F is isomorphic to Fn for some n, a similar approach can be used to reduce any finite generating set to a basis. This technique is illustrated in the next example.


Example 3

The set

S = {2 + x + 2x2 + 3x3, 4 + 2x + 4x2 + 6x3, 6 + 3x + 8x2 + 7x3, 2 + x + 5x3, 4 + x + 9x3}

generates a subspace V of P3(R). To find a subset of S that is a basis for V, we consider the subset

S′ = {(2, 1, 2, 3), (4, 2, 4, 6), (6, 3, 8, 7), (2, 1, 0, 5), (4, 1, 0, 9)}

consisting of the images of the polynomials in S under the standard representation of P3(R) with respect to the standard ordered basis. Note that the 4 × 5 matrix in which the columns are the vectors in S′ is the matrix A in Example 2. From the reduced row echelon form of A, which is the matrix B in Example 2, we see that the first, third, and fifth columns of A are linearly independent and the second and fourth columns of A are linear combinations of the first, third, and fifth columns. Hence

{(2, 1, 2, 3), (6, 3, 8, 7), (4, 1, 0, 9)}

is a basis for the subspace of R4 that is generated by S′. It follows that

{2 + x + 2x2 + 3x3, 6 + 3x + 8x2 + 7x3, 4 + x + 9x3}

is a basis for the subspace V of P3(R). ♦

We conclude this section by describing a method for extending a linearly independent subset S of a finite-dimensional vector space V to a basis for V. Recall that this is always possible by (c) of Corollary 2 to the replacement theorem (p. 47). Our approach is based on the replacement theorem and assumes that we can find an explicit basis β for V. Let S′ be the ordered set consisting of the vectors in S followed by those in β. Since β ⊆ S′, the set S′ generates V. We can then apply the technique described above to reduce this generating set to a basis for V containing S.

Example 4

Let

V = {(x1, x2, x3, x4, x5) ∈ R5 : x1 + 7x2 + 5x3 − 4x4 + 2x5 = 0}.

It is easily verified that V is a subspace of R5 and that

S = {(−2, 0, 0,−1,−1), (1, 1,−2,−1,−1), (−5, 1, 0, 1, 1)}

is a linearly independent subset of V.


To extend S to a basis for V, we first obtain a basis β for V. To do so, we solve the system of linear equations that defines V. Since in this case V is defined by a single equation, we need only write the equation as

x1 = −7x2 − 5x3 + 4x4 − 2x5

and assign parametric values to x2, x3, x4, and x5. If x2 = t1, x3 = t2, x4 = t3, and x5 = t4, then the vectors in V have the form

(x1, x2, x3, x4, x5) = (−7t1 − 5t2 + 4t3 − 2t4, t1, t2, t3, t4)
                     = t1(−7, 1, 0, 0, 0) + t2(−5, 0, 1, 0, 0) + t3(4, 0, 0, 1, 0) + t4(−2, 0, 0, 0, 1).

Hence

β = {(−7, 1, 0, 0, 0), (−5, 0, 1, 0, 0), (4, 0, 0, 1, 0), (−2, 0, 0, 0, 1)}

is a basis for V by Theorem 3.15.

The matrix whose columns consist of the vectors in S followed by those in β is

( −2   1  −5  −7  −5  4  −2 )
(  0   1   1   1   0  0   0 )
(  0  −2   0   0   1  0   0 )
( −1  −1   1   0   0  1   0 )
( −1  −1   1   0   0  0   1 ),

and its reduced row echelon form is

( 1  0  0  1    1  0  −1 )
( 0  1  0  0  −.5  0   0 )
( 0  0  1  1   .5  0   0 )
( 0  0  0  0    0  1  −1 )
( 0  0  0  0    0  0   0 ).

Thus

{(−2, 0, 0,−1,−1), (1, 1,−2,−1,−1), (−5, 1, 0, 1, 1), (4, 0, 0, 1, 0)}

is a basis for V containing S. ♦
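The whole computation of Example 4 (find a basis β for V, adjoin it to S, and keep the pivot columns) can be scripted as in the following sketch with SymPy, again assuming its availability; extend_to_basis is a name of our own choosing.

from sympy import Matrix

def extend_to_basis(S, beta):
    """Return a basis, containing S, of the span of S together with beta
    (assumes S is linearly independent, as in the text).
    S and beta are lists of vectors; vectors become matrix columns."""
    columns = [Matrix(v) for v in S + beta]
    M = Matrix.hstack(*columns)
    _, pivots = M.rref()
    return [tuple(columns[j]) for j in pivots]

S = [(-2, 0, 0, -1, -1), (1, 1, -2, -1, -1), (-5, 1, 0, 1, 1)]
beta = [(-7, 1, 0, 0, 0), (-5, 0, 1, 0, 0), (4, 0, 0, 1, 0), (-2, 0, 0, 0, 1)]
print(extend_to_basis(S, beta))
# [(-2, 0, 0, -1, -1), (1, 1, -2, -1, -1), (-5, 1, 0, 1, 1), (4, 0, 0, 1, 0)]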

EXERCISES

1. Label the following statements as true or false.

(a) If (A′|b′) is obtained from (A|b) by a finite sequence of elementary column operations, then the systems Ax = b and A′x = b′ are equivalent.

(b) If (A′|b′) is obtained from (A|b) by a finite sequence of elementary row operations, then the systems Ax = b and A′x = b′ are equivalent.

(c) If A is an n × n matrix with rank n, then the reduced row echelon form of A is In.

(d) Any matrix can be put in reduced row echelon form by means of a finite sequence of elementary row operations.

(e) If (A|b) is in reduced row echelon form, then the system Ax = b is consistent.

(f) Let Ax = b be a system of m linear equations in n unknowns for which the augmented matrix is in reduced row echelon form. If this system is consistent, then the dimension of the solution set of Ax = 0 is n − r, where r equals the number of nonzero rows in A.

(g) If a matrix A is transformed by elementary row operations into a matrix A′ in reduced row echelon form, then the number of nonzero rows in A′ equals the rank of A.

2. Use Gaussian elimination to solve the following systems of linear equations.

(a) x1 + 2x2 − x3 = −1
    2x1 + 2x2 + x3 = 1
    3x1 + 5x2 − 2x3 = −1

(b) x1 − 2x2 − x3 = 1
    2x1 − 3x2 + x3 = 6
    3x1 − 5x2 = 7
    x1 + 5x3 = 9

(c) x1 + 2x2 + 2x4 = 6
    3x1 + 5x2 − x3 + 6x4 = 17
    2x1 + 4x2 + x3 + 2x4 = 12
    2x1 − 7x3 + 11x4 = 7

(d) x1 − x2 − 2x3 + 3x4 = −7
    2x1 − x2 + 6x3 + 6x4 = −2
    −2x1 + x2 − 4x3 − 3x4 = 0
    3x1 − 2x2 + 9x3 + 10x4 = −5

(e) x1 − 4x2 − x3 + x4 = 3
    2x1 − 8x2 + x3 − 4x4 = 9
    −x1 + 4x2 − 2x3 + 5x4 = −6

(f) x1 + 2x2 − x3 + 3x4 = 2
    2x1 + 4x2 − x3 + 6x4 = 5
    x2 + 2x4 = 3

(g) 2x1 − 2x2 − x3 + 6x4 − 2x5 = 1
    x1 − x2 + x3 + 2x4 − x5 = 2
    4x1 − 4x2 + 5x3 + 7x4 − x5 = 6

(h) 3x1 − x2 + x3 − x4 + 2x5 = 5
    x1 − x2 − x3 − 2x4 − x5 = 2
    5x1 − 2x2 + x3 − 3x4 + 3x5 = 10
    2x1 − x2 − 2x4 + x5 = 5

(i) 3x1 − x2 + 2x3 + 4x4 + x5 = 2
    x1 − x2 + 2x3 + 3x4 + x5 = −1
    2x1 − 3x2 + 6x3 + 9x4 + 4x5 = −5
    7x1 − 2x2 + 4x3 + 8x4 + x5 = 6

(j) 2x1 + 3x3 − 4x5 = 5
    3x1 − 4x2 + 8x3 + 3x4 = 8
    x1 − x2 + 2x3 + x4 − x5 = 2
    −2x1 + 5x2 − 9x3 − 3x4 − 5x5 = −8

3. Suppose that the augmented matrix of a system Ax = b is transformed into a matrix (A′|b′) in reduced row echelon form by a finite sequence of elementary row operations.

(a) Prove that rank(A′) ≠ rank(A′|b′) if and only if (A′|b′) contains a row in which the only nonzero entry lies in the last column.

(b) Deduce that Ax = b is consistent if and only if (A′|b′) contains no row in which the only nonzero entry lies in the last column.

4. For each of the systems that follow, apply Exercise 3 to determine whether the system is consistent. If the system is consistent, find all solutions. Finally, find a basis for the solution set of the corresponding homogeneous system.

(a) x1 + 2x2 − x3 + x4 = 2
    2x1 + x2 + x3 − x4 = 3
    x1 + 2x2 − 3x3 + 2x4 = 2

(b) x1 + x2 − 3x3 + x4 = −2
    x1 + x2 + x3 − x4 = 2
    x1 + x2 − x3 = 0

(c) x1 + x2 − 3x3 + x4 = 1
    x1 + x2 + x3 − x4 = 2
    x1 + x2 − x3 = 0

5. Let the reduced row echelon form of A be

( 1  0   2  0  −2 )
( 0  1  −5  0  −3 )
( 0  0   0  1   6 ).

Determine A if the first, second, and fourth columns of A are

(  1 )      (  0 )            (  1 )
( −1 ) ,    ( −1 ) ,   and    ( −2 ) ,
(  3 )      (  1 )            (  0 )

respectively.

6. Let the reduced row echelon form of A be

( 1  −3  0  4  0   5 )
( 0   0  1  3  0   2 )
( 0   0  0  0  1  −1 )
( 0   0  0  0  0   0 ).

Determine A if the first, third, and sixth columns of A are

(  1 )      ( −1 )            (  3 )
( −2 ) ,    (  1 ) ,   and    ( −9 ) ,
( −1 )      (  2 )            (  2 )
(  3 )      ( −4 )            (  5 )

7. It can be shown that the vectors u1 = (2,−3, 1), u2 = (1, 4,−2), u3 = (−8, 12,−4), u4 = (1, 37,−17), and u5 = (−3,−5, 8) generate R3. Find a subset of {u1, u2, u3, u4, u5} that is a basis for R3.

8. Let W denote the subspace of R5 consisting of all vectors having coordinates that sum to zero. The vectors

u1 = (2,−3, 4,−5, 2),    u2 = (−6, 9,−12, 15,−6),
u3 = (3,−2, 7,−9, 1),    u4 = (2,−8, 2,−2, 6),
u5 = (−1, 1, 2, 1,−3),   u6 = (0,−3,−18, 9, 12),
u7 = (1, 0,−2, 3,−2),    and u8 = (2,−1, 1,−9, 7)

generate W. Find a subset of {u1, u2, . . . , u8} that is a basis for W.

9. Let W be the subspace of M2×2(R) consisting of the symmetric 2 × 2 matrices. The set

S = { (  0  −1 )    ( 1  2 )    ( 2  1 )    (  1  −2 )    ( −1   2 ) }
    { ( −1   1 ) ,  ( 2  3 ) ,  ( 1  9 ) ,  ( −2   4 ) ,  (  2  −1 ) }

generates W. Find a subset of S that is a basis for W.

10. Let

V = {(x1, x2, x3, x4, x5) ∈ R5 : x1 − 2x2 + 3x3 − x4 + 2x5 = 0}.

(a) Show that S = {(0, 1, 1, 1, 0)} is a linearly independent subset of V.

(b) Extend S to a basis for V.

11. Let V be as in Exercise 10.

(a) Show that S = {(1, 2, 1, 0, 0)} is a linearly independent subset of V.

(b) Extend S to a basis for V.

12. Let V denote the set of all solutions to the system of linear equations

x1 − x2 + 2x4 − 3x5 + x6 = 0
2x1 − x2 − x3 + 3x4 − 4x5 + 4x6 = 0.


(a) Show that S = {(0,−1, 0, 1, 1, 0), (1, 0, 1, 1, 1, 0)} is a linearly independent subset of V.

(b) Extend S to a basis for V.

13. Let V be as in Exercise 12.

(a) Show that S = {(1, 0, 1, 1, 1, 0), (0, 2, 1, 1, 0, 0)} is a linearly independent subset of V.

(b) Extend S to a basis for V.

14. If (A|b) is in reduced row echelon form, prove that A is also in reduced row echelon form.

15. Prove the corollary to Theorem 3.16: The reduced row echelon form of a matrix is unique.

INDEX OF DEFINITIONS FOR CHAPTER 3

Augmented matrix 161
Augmented matrix of a system of linear equations 174
Backward pass 186
Closed model of a simple economy 176
Coefficient matrix of a system of linear equations 169
Consistent system of linear equations 169
Elementary column operation 148
Elementary matrix 149
Elementary operation 148
Elementary row operation 148
Equilibrium condition for a simple economy 177
Equivalent systems of linear equations 182
Forward pass 186
Gaussian elimination 186
General solution of a system of linear equations 189
Homogeneous system corresponding to a nonhomogeneous system 172
Homogeneous system of linear equations 171
Inconsistent system of linear equations 169
Input–output matrix 177
Nonhomogeneous system of linear equations 171
Nonnegative vector 177
Open model of a simple economy 178
Positive matrix 177
Rank of a matrix 152
Reduced row echelon form of a matrix 185
Solution to a system of linear equations 169
Solution set of a system of equations 169
System of linear equations 169
Type 1, 2, and 3 elementary operations 148


4 Determinants

4.1 Determinants of Order 2
4.2 Determinants of Order n
4.3 Properties of Determinants
4.4 Summary — Important Facts about Determinants
4.5* A Characterization of the Determinant

The determinant, which has played a prominent role in the theory of linear algebra, is a special scalar-valued function defined on the set of square matrices. Although it still has a place in the study of linear algebra and its applications, its role is less central than in former times. Yet no linear algebra book would be complete without a systematic treatment of the determinant, and we present one here. However, the main use of determinants in this book is to compute and establish the properties of eigenvalues, which we discuss in Chapter 5.

Although the determinant is not a linear transformation on Mn×n(F) for n > 1, it does possess a kind of linearity (called n-linearity) as well as other properties that are examined in this chapter. In Section 4.1, we consider the determinant on the set of 2 × 2 matrices and derive its important properties and develop an efficient computational procedure. To illustrate the important role that determinants play in geometry, we also include optional material that explores the applications of the determinant to the study of area and orientation. In Sections 4.2 and 4.3, we extend the definition of the determinant to all square matrices and derive its important properties and develop an efficient computational procedure. For the reader who prefers to treat determinants lightly, Section 4.4 contains the essential properties that are needed in later chapters. Finally, Section 4.5, which is optional, offers an axiomatic approach to determinants by showing how to characterize the determinant in terms of three key properties.

4.1 DETERMINANTS OF ORDER 2

In this section, we define the determinant of a 2 × 2 matrix and investigate its geometric significance in terms of area and orientation.


Definition. If

A = ( a  b )
    ( c  d )

is a 2 × 2 matrix with entries from a field F, then we define the determinant of A, denoted det(A) or |A|, to be the scalar ad − bc.

Example 1

For the matrices

A = ( 1  2 )      and      B = ( 3  2 )
    ( 3  4 )                   ( 6  4 )

in M2×2(R), we have

det(A) = 1 · 4 − 2 · 3 = −2   and   det(B) = 3 · 4 − 2 · 6 = 0. ♦

For the matrices A and B in Example 1, we have

A + B = ( 4  4 )
        ( 9  8 ),

and so

det(A + B) = 4 · 8 − 4 · 9 = −4.

Since det(A + B) ≠ det(A) + det(B), the function det : M2×2(R) → R is not a linear transformation. Nevertheless, the determinant does possess an important linearity property, which is explained in the following theorem.

Theorem 4.1. The function det : M2×2(F) → F is a linear function of each row of a 2 × 2 matrix when the other row is held fixed. That is, if u, v, and w are in F2 and k is a scalar, then

det ( u + kv )  =  det ( u )  +  k det ( v )
    (   w    )         ( w )           ( w )

and

det (   w    )  =  det ( w )  +  k det ( w )
    ( u + kv )         ( u )           ( v ).

Proof. Let u = (a1, a2), v = (b1, b2), and w = (c1, c2) be in F2 and k be a scalar. Then

det ( u )  +  k det ( v )  =  det ( a1  a2 )  +  k det ( b1  b2 )
    ( w )           ( w )         ( c1  c2 )           ( c1  c2 )

                           =  (a1c2 − a2c1) + k(b1c2 − b2c1)

                           =  (a1 + kb1)c2 − (a2 + kb2)c1

                           =  det ( a1 + kb1   a2 + kb2 )
                                  (    c1         c2    )

                           =  det ( u + kv )
                                  (   w    ).

A similar calculation shows that

det ( w )  +  k det ( w )  =  det (   w    )
    ( u )           ( v )         ( u + kv ).

For the 2 × 2 matrices A and B in Example 1, it is easily checked that A is invertible but B is not. Note that det(A) ≠ 0 but det(B) = 0. We now show that this property is true in general.

Theorem 4.2. Let A ∈ M2×2(F). Then the determinant of A is nonzero if and only if A is invertible. Moreover, if A is invertible, then

A−1 = (1/det(A)) (  A22  −A12 )
                 ( −A21   A11 ).

Proof. If det(A) ≠ 0, then we can define a matrix

M = (1/det(A)) (  A22  −A12 )
               ( −A21   A11 ).

A straightforward calculation shows that AM = MA = I, and so A is invertible and M = A−1.

Conversely, suppose that A is invertible. A remark on page 152 shows that the rank of

A = ( A11  A12 )
    ( A21  A22 )

must be 2. Hence A11 ≠ 0 or A21 ≠ 0. If A11 ≠ 0, add −A21/A11 times row 1 of A to row 2 to obtain the matrix

( A11          A12         )
(  0    A22 − A12A21/A11   ).

Because elementary row operations are rank-preserving by the corollary to Theorem 3.4 (p. 153), it follows that

A22 − A12A21/A11 ≠ 0.


Therefore det(A) = A11A22 − A12A21 ≠ 0. On the other hand, if A21 ≠ 0, we see that det(A) ≠ 0 by adding −A11/A21 times row 2 of A to row 1 and applying a similar argument. Thus, in either case, det(A) ≠ 0.
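Theorem 4.2's formula is easy to state in code. The sketch below (plain Python with Fraction arithmetic, and with names of our own choosing) computes det(A) and, when it is nonzero, the inverse given by the theorem.

from fractions import Fraction

def det2(A):
    """Determinant ad - bc of a 2 x 2 matrix A = [[a, b], [c, d]]."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inverse2(A):
    """Inverse of a 2 x 2 matrix via the formula of Theorem 4.2, or None if det(A) = 0."""
    d = Fraction(det2(A))
    if d == 0:
        return None
    return [[ A[1][1] / d, -A[0][1] / d],
            [-A[1][0] / d,  A[0][0] / d]]

A = [[1, 2], [3, 4]]
B = [[3, 2], [6, 4]]
print(det2(A), det2(B))   # -2 and 0, as in Example 1
print(inverse2(A))        # entries -2, 1, 3/2, -1/2
print(inverse2(B))        # None: B is not invertible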

In Sections 4.2 and 4.3, we extend the definition of the determinant to n × n matrices and show that Theorem 4.2 remains true in this more general context. In the remainder of this section, which can be omitted if desired, we explore the geometric significance of the determinant of a 2 × 2 matrix. In particular, we show the importance of the sign of the determinant in the study of orientation.

The Area of a Parallelogram

By the angle between two vectors in R2, we mean the angle with measure θ (0 ≤ θ < π) that is formed by the vectors having the same magnitude and direction as the given vectors but emanating from the origin. (See Figure 4.1.)

[Figure 4.1: Angle between two vectors in R2]

If β = {u, v} is an ordered basis for R2, we define the orientation of β to be the real number

O ( u )  =  det ( u )  /  | det ( u ) | .
  ( v )         ( v )     |     ( v ) |

(The denominator of this fraction is nonzero by Theorem 4.2.) Clearly

O ( u )  =  ±1.
  ( v )

Notice that

O ( e1 )  =  1    and    O (  e1 )  =  −1.
  ( e2 )                   ( −e2 )

Recall that a coordinate system {u, v} is called right-handed if u can be rotated in a counterclockwise direction through an angle θ (0 < θ < π) to coincide with v. Otherwise {u, v} is called a left-handed system. (See Figure 4.2.) In general (see Exercise 12),

[Figure 4.2: A right-handed coordinate system and a left-handed coordinate system]

O ( u )  =  1
  ( v )

if and only if the ordered basis {u, v} forms a right-handed coordinate system. For convenience, we also define

O ( u )  =  1
  ( v )

if {u, v} is linearly dependent.

Any ordered set {u, v} in R2 determines a parallelogram in the following manner. Regarding u and v as arrows emanating from the origin of R2, we call the parallelogram having u and v as adjacent sides the parallelogram determined by u and v. (See Figure 4.3.)

[Figure 4.3: Parallelograms determined by u and v]

Observe that if the set {u, v} is linearly dependent (i.e., if u and v are parallel), then the “parallelogram” determined by u and v is actually a line segment, which we consider to be a degenerate parallelogram having area zero.


There is an interesting relationship between

A ( u ) ,
  ( v )

the area of the parallelogram determined by u and v, and

det ( u ) ,
    ( v )

which we now investigate. Observe first, however, that since

det ( u )
    ( v )

may be negative, we cannot expect that

A ( u )  =  det ( u ) .
  ( v )         ( v )

But we can prove that

A ( u )  =  O ( u ) · det ( u ) ,
  ( v )       ( v )       ( v )

from which it follows that

A ( u )  =  | det ( u ) | .
  ( v )     |     ( v ) |

Our argument that

A ( u )  =  O ( u ) · det ( u )
  ( v )       ( v )       ( v )

employs a technique that, although somewhat indirect, can be generalized to Rn. First, since

O ( u )  =  ±1,
  ( v )

we may multiply both sides of the desired equation by

O ( u )
  ( v )

to obtain the equivalent form

O ( u ) · A ( u )  =  det ( u ) .
  ( v )     ( v )         ( v )


We establish this equation by verifying that the three conditions of Exercise 11are satisfied by the function

δ

(uv

)= O

(uv

)·A

(uv

).

(a) We begin by showing that for any real number c

δ

(ucv

)= c ·δ

(uv

).

Observe that this equation is valid if c = 0 because

δ

(ucv

)= O

(u0

)·A

(u0

)= 1 ·0 = 0.

So assume that c �= 0. Regarding cv as the base of the parallelogram deter-mined by u and cv, we see that

A(

ucv

)= base × altitude = |c|(length of v)(altitude) = |c| ·A

(uv

),

since the altitude h of the parallelogram determined by u and cv is the sameas that in the parallelogram determined by u and v. (See Figure 4.4.) Hence

[Figure 4.4]

δ

(ucv

)= O

(ucv

)·A

(ucv

)=[

c

|c| ·O(

uv

)][|c| ·A

(uv

)]

= c ·O(

uv

)·A

(uv

)= c ·δ

(uv

).

A similar argument shows that

δ

(cuv

)= c ·δ

(uv

).


We next prove that

δ

(u

au + bw

)= b ·δ

(uw

)for any u, w ∈ R2 and any real numbers a and b. Because the parallelogramsdetermined by u and w and by u and u + w have a common base u and thesame altitude (see Figure 4.5), it follows that

[Figure 4.5]

A(

uw

)= A

(u

u + w

).

If a = 0, then

δ

(u

au + bw

)= δ

(ubw

)= b ·δ

(uw

)by the first paragraph of (a). Otherwise, if a �= 0, then

δ

(u

au + bw

)= a ·δ

⎛⎝ u

u +b

aw

⎞⎠ = a ·δ⎛⎝ u

b

aw

⎞⎠ = b ·δ(

u

w

).

So the desired conclusion is obtained in either case.We are now able to show that

δ

(u

v1 + v2

)= δ

(uv1

)+ δ

(uv2

)for all u, v1, v2 ∈ R2. Since the result is immediate if u = 0, we assume thatu �= 0. Choose any vector w ∈ R2 such that {u, w} is linearly independent.Then for any vectors v1, v2 ∈ R2 there exist scalars ai and bi such thatvi = aiu + biw (i = 1, 2). Thus

δ

(u

v1 + v2

)= δ

(u

(a1 + a2)u + (b1 + b2)w

)= (b1 + b2)δ

(uw

)


= δ

(u

a1u + b1w

)+ δ

(u

a2u + b2w

)= δ

(uv1

)+ δ

(uv2

).

A similar argument shows that

δ

(u1 + u2

v

)= δ

(u1

v

)+ δ

(u2

v

)for all u1, u2, v ∈ R2.

(b) Since

A(

uu

)= 0, it follows that δ

(uu

)= O

(uu

)·A

(uu

)= 0

for any u ∈ R2.(c) Because the parallelogram determined by e1 and e2 is the unit square,

δ

(e1

e2

)= O

(e1

e2

)·A

(e1

e2

)= 1 · 1 = 1.

Therefore δ satisfies the three conditions of Exercise 11, and hence δ = det.So the area of the parallelogram determined by u and v equals

O(

uv

)· det

(uv

).

Thus we see, for example, that the area of the parallelogram determined by u = (−1, 5) and v = (4,−2) is

| det ( u ) |  =  | det ( −1   5 ) |  =  18.
|     ( v ) |     |     (  4  −2 ) |
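As a quick numerical check of this formula (using NumPy, and assuming it is available):

import numpy as np

u = np.array([-1.0, 5.0])
v = np.array([4.0, -2.0])
area = abs(np.linalg.det(np.array([u, v])))   # |det| of the matrix with rows u and v
print(area)                                    # 18.0, up to floating-point rounding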

EXERCISES

1. Label the following statements as true or false.

(a) The function det : M2×2(F) → F is a linear transformation.

(b) The determinant of a 2 × 2 matrix is a linear function of each row of the matrix when the other row is held fixed.

(c) If A ∈ M2×2(F) and det(A) = 0, then A is invertible.

(d) If u and v are vectors in R2 emanating from the origin, then the area of the parallelogram having u and v as adjacent sides is

det ( u ) .
    ( v )

(e) A coordinate system is right-handed if and only if its orientation equals 1.

2. Compute the determinants of the following matrices in M2×2(R).

(a) ( 6  −3 )      (b) ( −5  2 )      (c) ( 8   0 )
    ( 2   4 )          (  6  1 )          ( 3  −1 )

3. Compute the determinants of the following matrices in M2×2(C).

(a) ( −1 + i   1 − 4i )      (b) ( 5 − 2i   6 + 4i )      (c) ( 2i   3 )
    (  3 + 2i  2 − 3i )          ( −3 + i     7i   )          (  4  6i )

4. For each of the following pairs of vectors u and v in R2, compute the area of the parallelogram determined by u and v.

(a) u = (3,−2) and v = (2, 5)
(b) u = (1, 3) and v = (−3, 1)
(c) u = (4,−1) and v = (−6,−2)
(d) u = (3, 4) and v = (2,−6)

5. Prove that if B is the matrix obtained by interchanging the rows of a 2 × 2 matrix A, then det(B) = −det(A).

6. Prove that if the two columns of A ∈ M2×2(F) are identical, then det(A) = 0.

7. Prove that det(At) = det(A) for any A ∈ M2×2(F).

8. Prove that if A ∈ M2×2(F) is upper triangular, then det(A) equals the product of the diagonal entries of A.

9. Prove that det(AB) = det(A) · det(B) for any A, B ∈ M2×2(F ).

10. The classical adjoint of a 2 × 2 matrix A ∈ M2×2(F) is the matrix

C = (  A22  −A12 )
    ( −A21   A11 ).

Prove that

(a) CA = AC = [det(A)]I.
(b) det(C) = det(A).
(c) The classical adjoint of At is Ct.
(d) If A is invertible, then A−1 = [det(A)]−1C.

11. Let δ : M2×2(F) → F be a function with the following three properties.

(i) δ is a linear function of each row of the matrix when the other row is held fixed.

(ii) If the two rows of A ∈ M2×2(F) are identical, then δ(A) = 0.


(iii) If I is the 2 × 2 identity matrix, then δ(I) = 1.

Prove that δ(A) = det(A) for all A ∈ M2×2(F). (This result is generalized in Section 4.5.)

12. Let {u, v} be an ordered basis for R2. Prove that

O ( u )  =  1
  ( v )

if and only if {u, v} forms a right-handed coordinate system. Hint: Recall the definition of a rotation given in Example 2 of Section 2.1.

4.2 DETERMINANTS OF ORDER n

In this section, we extend the definition of the determinant to n × n matrices for n ≥ 3. For this definition, it is convenient to introduce the following notation: Given A ∈ Mn×n(F), for n ≥ 2, denote the (n − 1) × (n − 1) matrix obtained from A by deleting row i and column j by Aij. Thus for

A = ( 1  2  3 )
    ( 4  5  6 )  ∈ M3×3(R),
    ( 7  8  9 )

we have

A11 = ( 5  6 ) ,    A13 = ( 4  5 ) ,    and    A32 = ( 1  3 ) ,
      ( 8  9 )            ( 7  8 )                   ( 4  6 )

and for

B = (  1  −1   2  −1 )
    ( −3   4   1  −1 )  ∈ M4×4(R),
    (  2  −5  −3   8 )
    ( −2   6  −4   1 )

we have

B23 = (  1  −1  −1 )      and      B42 = (  1   2  −1 )
      (  2  −5   8 )                     ( −3   1  −1 )
      ( −2   6   1 )                     (  2  −3   8 ).

Definitions. Let A ∈ Mn×n(F). If n = 1, so that A = (A11), we define det(A) = A11. For n ≥ 2, we define det(A) recursively as

det(A) = ∑_{j=1}^{n} (−1)^(1+j) A1j · det(A1j).

The scalar det(A) is called the determinant of A and is also denoted by |A|. The scalar

(−1)^(i+j) det(Aij)

is called the cofactor of the entry of A in row i, column j.

Letting

cij = (−1)^(i+j) det(Aij)

denote the cofactor of the row i, column j entry of A, we can express the formula for the determinant of A as

det(A) = A11c11 + A12c12 + · · · + A1nc1n.

Thus the determinant of A equals the sum of the products of each entry in row 1 of A multiplied by its cofactor. This formula is called cofactor expansion along the first row of A. Note that, for 2 × 2 matrices, this definition of the determinant of A agrees with the one given in Section 4.1 because

det(A) = A11(−1)^(1+1) det(A11) + A12(−1)^(1+2) det(A12) = A11A22 − A12A21.

Example 1

Let

A = (  1   3  −3 )
    ( −3  −5   2 )  ∈ M3×3(R).
    ( −4   4  −6 )

Using cofactor expansion along the first row of A, we obtain

det(A) = (−1)^(1+1)A11 · det(A11) + (−1)^(1+2)A12 · det(A12) + (−1)^(1+3)A13 · det(A13)

       = (−1)^2(1) · det ( −5   2 )  +  (−1)^3(3) · det ( −3   2 )  +  (−1)^4(−3) · det ( −3  −5 )
                         (  4  −6 )                     ( −4  −6 )                      ( −4   4 )

       = 1[−5(−6) − 2(4)] − 3[−3(−6) − 2(−4)] − 3[−3(4) − (−5)(−4)]

       = 1(22) − 3(26) − 3(−32)

       = 40. ♦


Example 2

Let

B = (  0   1   3 )
    ( −2  −3  −5 )  ∈ M3×3(R).
    (  4  −4   4 )

Using cofactor expansion along the first row of B, we obtain

det(B) = (−1)^(1+1)B11 · det(B11) + (−1)^(1+2)B12 · det(B12) + (−1)^(1+3)B13 · det(B13)

       = (−1)^2(0) · det ( −3  −5 )  +  (−1)^3(1) · det ( −2  −5 )  +  (−1)^4(3) · det ( −2  −3 )
                         ( −4   4 )                     (  4   4 )                     (  4  −4 )

       = 0 − 1[−2(4) − (−5)(4)] + 3[−2(−4) − (−3)(4)]

       = 0 − 1(12) + 3(20)

       = 48. ♦

Example 3

Let

C = (  2   0   0   1 )
    (  0   1   3  −3 )  ∈ M4×4(R).
    ( −2  −3  −5   2 )
    (  4  −4   4  −6 )

Using cofactor expansion along the first row of C and the results of Examples 1 and 2, we obtain

det(C) = (−1)^2(2) · det(C11) + (−1)^3(0) · det(C12) + (−1)^4(0) · det(C13) + (−1)^5(1) · det(C14)

       = (−1)^2(2) · det (  1   3  −3 )  +  0  +  0  +  (−1)^5(1) · det (  0   1   3 )
                         ( −3  −5   2 )                                 ( −2  −3  −5 )
                         ( −4   4  −6 )                                 (  4  −4   4 )

       = 2(40) + 0 + 0 − 1(48)

       = 32. ♦
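A literal transcription of the recursive definition into Python follows; it reproduces the values 40, 48, and 32 of Examples 1–3. It is offered only to make the recursion concrete — as discussed at the end of this section, cofactor expansion is far too slow to be a practical algorithm. The names minor and det are our own.

def minor(A, i, j):
    """The matrix obtained from A by deleting row i and column j (0-indexed)."""
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

A = [[1, 3, -3], [-3, -5, 2], [-4, 4, -6]]
B = [[0, 1, 3], [-2, -3, -5], [4, -4, 4]]
C = [[2, 0, 0, 1], [0, 1, 3, -3], [-2, -3, -5, 2], [4, -4, 4, -6]]
print(det(A), det(B), det(C))   # 40 48 32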


Example 4

The determinant of the n × n identity matrix is 1. We prove this assertion by mathematical induction on n. The result is clearly true for the 1 × 1 identity matrix. Assume that the determinant of the (n − 1) × (n − 1) identity matrix is 1 for some n ≥ 2, and let I denote the n × n identity matrix. Using cofactor expansion along the first row of I, we obtain

det(I) = (−1)^2(1) · det(I11) + (−1)^3(0) · det(I12) + · · · + (−1)^(1+n)(0) · det(I1n)
       = 1(1) + 0 + · · · + 0
       = 1

because I11 is the (n − 1) × (n − 1) identity matrix. This shows that the determinant of the n × n identity matrix is 1, and so the determinant of any identity matrix is 1 by the principle of mathematical induction. ♦

As is illustrated in Example 3, the calculation of a determinant using the recursive definition is extremely tedious, even for matrices as small as 4 × 4. Later in this section, we present a more efficient method for evaluating determinants, but we must first learn more about them.

Recall from Theorem 4.1 (p. 200) that, although the determinant of a 2 × 2 matrix is not a linear transformation, it is a linear function of each row when the other row is held fixed. We now show that a similar property is true for determinants of any size.

Theorem 4.3. The determinant of an n × n matrix is a linear function of each row when the remaining rows are held fixed. That is, for 1 ≤ r ≤ n, we have

      ( a1     )         ( a1     )           ( a1     )
      (  ...   )         (  ...   )           (  ...   )
      ( ar−1   )         ( ar−1   )           ( ar−1   )
det   ( u + kv )  =  det (  u     )  +  k det (  v     )
      ( ar+1   )         ( ar+1   )           ( ar+1   )
      (  ...   )         (  ...   )           (  ...   )
      ( an     )         ( an     )           ( an     )

whenever k is a scalar and u, v, and each ai are row vectors in Fn.

Proof. The proof is by mathematical induction on n. The result is immediate if n = 1. Assume that for some integer n ≥ 2 the determinant of any (n − 1) × (n − 1) matrix is a linear function of each row when the remaining rows are held fixed. Let A be an n × n matrix with rows a1, a2, . . . , an, respectively, and suppose that for some r (1 ≤ r ≤ n), we have ar = u + kv for some u, v ∈ Fn and some scalar k. Let u = (b1, b2, . . . , bn) and v = (c1, c2, . . . , cn), and let B and C be the matrices obtained from A by replacing row r of A by u and v, respectively. We must prove that det(A) = det(B) + k det(C). We leave the proof of this fact to the reader for the case r = 1. For r > 1 and 1 ≤ j ≤ n, the rows of A1j, B1j, and C1j are the same except for row r − 1. Moreover, row r − 1 of A1j is

(b1 + kc1, . . . , bj−1 + kcj−1, bj+1 + kcj+1, . . . , bn + kcn),

which is the sum of row r − 1 of B1j and k times row r − 1 of C1j. Since B1j and C1j are (n − 1) × (n − 1) matrices, we have

det(A1j) = det(B1j) + k det(C1j)

by the induction hypothesis. Thus since A1j = B1j = C1j, we have

det(A) = ∑_{j=1}^{n} (−1)^(1+j) A1j · det(A1j)

       = ∑_{j=1}^{n} (−1)^(1+j) A1j · [det(B1j) + k det(C1j)]

       = ∑_{j=1}^{n} (−1)^(1+j) A1j · det(B1j) + k ∑_{j=1}^{n} (−1)^(1+j) A1j · det(C1j)

       = det(B) + k det(C).

This shows that the theorem is true for n × n matrices, and so the theorem is true for all square matrices by mathematical induction.

Corollary. If A ∈ Mn×n(F) has a row consisting entirely of zeros, then det(A) = 0.

Proof. See Exercise 24.

The definition of a determinant requires that the determinant of a matrix be evaluated by cofactor expansion along the first row. Our next theorem shows that the determinant of a square matrix can be evaluated by cofactor expansion along any row. Its proof requires the following technical result.

Lemma. Let B ∈ Mn×n(F), where n ≥ 2. If row i of B equals ek for some k (1 ≤ k ≤ n), then det(B) = (−1)^(i+k) det(Bik).


Proof. The proof is by mathematical induction on n. The lemma is easily proved for n = 2. Assume that for some integer n ≥ 3, the lemma is true for (n − 1) × (n − 1) matrices, and let B be an n × n matrix in which row i of B equals ek for some k (1 ≤ k ≤ n). The result follows immediately from the definition of the determinant if i = 1. Suppose therefore that 1 < i ≤ n. For each j ≠ k (1 ≤ j ≤ n), let Cij denote the (n − 2) × (n − 2) matrix obtained from B by deleting rows 1 and i and columns j and k. For each j, row i − 1 of B1j is the following vector in Fn−1:

    ek−1   if j < k
    0      if j = k
    ek     if j > k.

Hence by the induction hypothesis and the corollary to Theorem 4.3, we have

    det(B1j) = (−1)^((i−1)+(k−1)) det(Cij)   if j < k
    det(B1j) = 0                             if j = k
    det(B1j) = (−1)^((i−1)+k) det(Cij)       if j > k.

Therefore

det(B) = ∑_{j=1}^{n} (−1)^(1+j) B1j · det(B1j)

       = ∑_{j<k} (−1)^(1+j) B1j · det(B1j) + ∑_{j>k} (−1)^(1+j) B1j · det(B1j)

       = ∑_{j<k} (−1)^(1+j) B1j · [(−1)^((i−1)+(k−1)) det(Cij)] + ∑_{j>k} (−1)^(1+j) B1j · [(−1)^((i−1)+k) det(Cij)]

       = (−1)^(i+k) [ ∑_{j<k} (−1)^(1+j) B1j · det(Cij) + ∑_{j>k} (−1)^(1+(j−1)) B1j · det(Cij) ].

Because the expression inside the preceding bracket is the cofactor expansion of Bik along the first row, it follows that

det(B) = (−1)^(i+k) det(Bik).

This shows that the lemma is true for n × n matrices, and so the lemma is true for all square matrices by mathematical induction.


We are now able to prove that cofactor expansion along any row can be used to evaluate the determinant of a square matrix.

Theorem 4.4. The determinant of a square matrix can be evaluated by cofactor expansion along any row. That is, if A ∈ Mn×n(F), then for any integer i (1 ≤ i ≤ n),

det(A) = ∑_{j=1}^{n} (−1)^(i+j) Aij · det(Aij).

Proof. Cofactor expansion along the first row of A gives the determinant of A by definition. So the result is true if i = 1. Fix i > 1. Row i of A can be written as ∑_{j=1}^{n} Aij ej. For 1 ≤ j ≤ n, let Bj denote the matrix obtained from A by replacing row i of A by ej. Then by Theorem 4.3 and the lemma, we have

det(A) = ∑_{j=1}^{n} Aij det(Bj) = ∑_{j=1}^{n} (−1)^(i+j) Aij · det(Aij).

Corollary. If A ∈ Mn×n(F ) has two identical rows, then det(A) = 0.

Proof. The proof is by mathematical induction on n. We leave the proof of the result to the reader in the case that n = 2. Assume that for some integer n ≥ 3, it is true for (n − 1) × (n − 1) matrices, and let rows r and s of A ∈ Mn×n(F) be identical for r ≠ s. Because n ≥ 3, we can choose an integer i (1 ≤ i ≤ n) other than r and s. Now

det(A) = ∑_{j=1}^{n} (−1)^(i+j) Aij · det(Aij)

by Theorem 4.4. Since each Aij is an (n − 1) × (n − 1) matrix with two identical rows, the induction hypothesis implies that each det(Aij) = 0, and hence det(A) = 0. This completes the proof for n × n matrices, and so the corollary is true for all square matrices by mathematical induction.

It is possible to evaluate determinants more efficiently by combining cofactor expansion with the use of elementary row operations. Before such a process can be developed, we need to learn what happens to the determinant of a matrix if we perform an elementary row operation on that matrix. Theorem 4.3 provides this information for elementary row operations of type 2 (those in which a row is multiplied by a nonzero scalar). Next we turn our attention to elementary row operations of type 1 (those in which two rows are interchanged).


Theorem 4.5. If A ∈ Mn×n(F) and B is a matrix obtained from A by interchanging any two rows of A, then det(B) = −det(A).

Proof. Let the rows of A ∈ Mn×n(F) be a1, a2, . . . , an, and let B be the matrix obtained from A by interchanging rows r and s, where r < s. Thus

    ( a1  )                 ( a1  )
    ( ... )                 ( ... )
    ( ar  )                 ( as  )
A = ( ... )     and     B = ( ... )
    ( as  )                 ( ar  )
    ( ... )                 ( ... )
    ( an  )                 ( an  ).

Consider the matrix obtained from A by replacing rows r and s by ar + as. By the corollary to Theorem 4.4 and Theorem 4.3, we have

        ( a1      )
        ( ...     )
        ( ar + as )
0 = det ( ...     )
        ( ar + as )
        ( ...     )
        ( an      )

        ( a1      )         ( a1      )
        ( ...     )         ( ...     )
        ( ar      )         ( as      )
  = det ( ...     )  +  det ( ...     )
        ( ar + as )         ( ar + as )
        ( ...     )         ( ...     )
        ( an      )         ( an      )

        ( a1  )         ( a1  )         ( a1  )         ( a1  )
        ( ... )         ( ... )         ( ... )         ( ... )
        ( ar  )         ( ar  )         ( as  )         ( as  )
  = det ( ... )  +  det ( ... )  +  det ( ... )  +  det ( ... )
        ( ar  )         ( as  )         ( ar  )         ( as  )
        ( ... )         ( ... )         ( ... )         ( ... )
        ( an  )         ( an  )         ( an  )         ( an  )

  = 0 + det(A) + det(B) + 0.

Therefore det(B) = −det(A).

We now complete our investigation of how an elementary row operation affects the determinant of a matrix by showing that elementary row operations of type 3 do not change the determinant of a matrix.

Theorem 4.6. Let A ∈ Mn×n(F), and let B be a matrix obtained by adding a multiple of one row of A to another row of A. Then det(B) = det(A).


Proof. Suppose that B is the n × n matrix obtained from A by adding k times row r to row s, where r ≠ s. Let the rows of A be a1, a2, . . . , an, and the rows of B be b1, b2, . . . , bn. Then bi = ai for i ≠ s and bs = as + kar. Let C be the matrix obtained from A by replacing row s with ar. Applying Theorem 4.3 to row s of B, we obtain

det(B) = det(A) + k det(C) = det(A)

because det(C) = 0 by the corollary to Theorem 4.4.

In Theorem 4.2 (p. 201), we proved that a 2 × 2 matrix is invertible if and only if its determinant is nonzero. As a consequence of Theorem 4.6, we can prove half of the promised generalization of this result in the following corollary. The converse is proved in the corollary to Theorem 4.7.

Corollary. If A ∈ Mn×n(F) has rank less than n, then det(A) = 0.

Proof. If the rank of A is less than n, then the rows a1, a2, . . . , an of A are linearly dependent. By Exercise 14 of Section 1.5, some row of A, say, row r, is a linear combination of the other rows. So there exist scalars ci such that

ar = c1a1 + · · · + cr−1ar−1 + cr+1ar+1 + · · · + cnan.

Let B be the matrix obtained from A by adding −ci times row i to row r for each i ≠ r. Then row r of B consists entirely of zeros, and so det(B) = 0. But by Theorem 4.6, det(B) = det(A). Hence det(A) = 0.

The following rules summarize the effect of an elementary row operation on the determinant of a matrix A ∈ Mn×n(F).

(a) If B is a matrix obtained by interchanging any two rows of A, then det(B) = −det(A).

(b) If B is a matrix obtained by multiplying a row of A by a nonzero scalar k, then det(B) = k det(A).

(c) If B is a matrix obtained by adding a multiple of one row of A to another row of A, then det(B) = det(A).

These facts can be used to simplify the evaluation of a determinant. Consider, for instance, the matrix in Example 1:

A = (  1   3  −3 )
    ( −3  −5   2 )
    ( −4   4  −6 ).

Adding 3 times row 1 of A to row 2 and 4 times row 1 to row 3, we obtain

M = ( 1   3   −3 )
    ( 0   4   −7 )
    ( 0  16  −18 ).


Since M was obtained by performing two type 3 elementary row operations on A, we have det(A) = det(M). The cofactor expansion of M along the first row gives

det(M) = (−1)^(1+1)(1) · det(M11) + (−1)^(1+2)(3) · det(M12) + (−1)^(1+3)(−3) · det(M13).

Both M12 and M13 have a column consisting entirely of zeros, and so det(M12) = det(M13) = 0 by the corollary to Theorem 4.6. Hence

det(M) = (−1)^(1+1)(1) · det(M11)

       = (−1)^(1+1)(1) · det (  4   −7 )
                             ( 16  −18 )

       = 1[4(−18) − (−7)(16)] = 40.

Thus with the use of two elementary row operations of type 3, we have reduced the computation of det(A) to the evaluation of one determinant of a 2 × 2 matrix.

But we can do even better. If we add −4 times row 2 of M to row 3 (another elementary row operation of type 3), we obtain

P = ( 1  3  −3 )
    ( 0  4  −7 )
    ( 0  0  10 ).

Evaluating det(P) by cofactor expansion along the first row, we have

det(P) = (−1)^(1+1)(1) · det(P11)

       = (−1)^(1+1)(1) · det ( 4  −7 )
                             ( 0  10 )

       = 1 · 4 · 10 = 40,

as described earlier. Since det(A) = det(M) = det(P), it follows that det(A) = 40.

The preceding calculation of det(P) illustrates an important general fact. The determinant of an upper triangular matrix is the product of its diagonal entries. (See Exercise 23.) By using elementary row operations of types 1 and 3 only, we can transform any square matrix into an upper triangular matrix, and so we can easily evaluate the determinant of any square matrix. The next two examples illustrate this technique.

Example 5

To evaluate the determinant of the matrix

B = (  0   1   3
      −2  −3  −5
       4  −4   4 )


in Example 2, we must begin with a row interchange. Interchanging rows 1 and 2 of B produces

C = ( −2  −3  −5
       0   1   3
       4  −4   4 ) .

By means of a sequence of elementary row operations of type 3, we can transform C into an upper triangular matrix:

( −2  −3  −5          ( −2  −3  −5          ( −2  −3  −5
   0   1   3    −→       0   1   3    −→       0   1   3
   4  −4   4 )           0 −10  −6 )           0   0  24 ) .

Thus det(C) = −2 · 1 · 24 = −48. Since C was obtained from B by an interchange of rows, it follows that

det(B) = −det(C) = 48. ♦

Example 6

The technique in Example 5 can be used to evaluate the determinant of the matrix

C = (  2   0   0   1
       0   1   3  −3
      −2  −3  −5   2
       4  −4   4  −6 )

in Example 3. This matrix can be transformed into an upper triangular matrix by means of the following sequence of elementary row operations of type 3:

(  2   0   0   1
   0   1   3  −3
  −2  −3  −5   2
   4  −4   4  −6 )
        −→
(  2   0   0   1
   0   1   3  −3
   0  −3  −5   3
   0  −4   4  −8 )
        −→
(  2   0   0   1
   0   1   3  −3
   0   0   4  −6
   0   0  16 −20 )
        −→
(  2   0   0   1
   0   1   3  −3
   0   0   4  −6
   0   0   0   4 ) .

Thus det(C) = 2 · 1 · 4 · 4 = 32. ♦

Using elementary row operations to evaluate the determinant of a matrix, as illustrated in Example 6, is far more efficient than using cofactor expansion. Consider first the evaluation of the determinant of a 2 × 2 matrix. Since

det( a  b
     c  d ) = ad − bc,


the evaluation of the determinant of a 2 × 2 matrix requires 2 multiplications (and 1 subtraction). For n ≥ 3, evaluating the determinant of an n × n matrix by cofactor expansion along any row expresses the determinant as a sum of n products involving determinants of (n−1) × (n−1) matrices. Thus in all, the evaluation of the determinant of an n × n matrix by cofactor expansion along any row requires over n! multiplications, whereas evaluating the determinant of an n × n matrix by elementary row operations as in Examples 5 and 6 can be shown to require only (n³ + 2n − 3)/3 multiplications. To evaluate the determinant of a 20 × 20 matrix, which is not large by present standards, cofactor expansion along a row requires over 20! ≈ 2.4 × 10¹⁸ multiplications. Thus it would take a computer performing one billion multiplications per second over 77 years to evaluate the determinant of a 20 × 20 matrix by this method. By contrast, the method using elementary row operations requires only 2679 multiplications for this calculation and would take the same computer less than three-millionths of a second! It is easy to see why most computer programs for evaluating the determinant of an arbitrary matrix do not use cofactor expansion.
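The operation counts just described are easy to reproduce experimentally. The following sketch (ours, not the text's; Python with NumPy, and the function name det_by_row_reduction is an invented label) evaluates a determinant exactly as in Examples 5 and 6 — reduce to an upper triangular matrix with type 1 and type 3 operations, then multiply the diagonal entries — and confirms the value det(B) = 48 from Example 5.

```python
import numpy as np

def det_by_row_reduction(A):
    """Reduce A to an upper triangular matrix with type 1 and type 3 row
    operations, then multiply the diagonal entries, flipping the sign once
    for every row interchange (Theorems 4.5 and 4.6)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for j in range(n):
        pivot = next((i for i in range(j, n) if U[i, j] != 0), None)
        if pivot is None:
            return 0.0                      # rank < n, so the determinant is 0
        if pivot != j:
            U[[j, pivot]] = U[[pivot, j]]   # type 1 operation: det changes sign
            sign = -sign
        for i in range(j + 1, n):
            U[i] -= (U[i, j] / U[j, j]) * U[j]   # type 3 operation: det unchanged
    return sign * float(np.prod(np.diag(U)))

B = [[0, 1, 3], [-2, -3, -5], [4, -4, 4]]   # the matrix of Example 5
print(det_by_row_reduction(B))              # 48.0, as computed above
```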

In this section, we have defined the determinant of a square matrix in terms of cofactor expansion along the first row. We then showed that the determinant of a square matrix can be evaluated using cofactor expansion along any row. In addition, we showed that the determinant possesses a number of special properties, including properties that enable us to calculate det(B) from det(A) whenever B is a matrix obtained from A by means of an elementary row operation. These properties enable us to evaluate determinants much more efficiently. In the next section, we continue this approach to discover additional properties of determinants.

EXERCISES

1. Label the following statements as true or false.

(a) The function det : Mn×n(F ) → F is a linear transformation.
(b) The determinant of a square matrix can be evaluated by cofactor expansion along any row.
(c) If two rows of a square matrix A are identical, then det(A) = 0.
(d) If B is a matrix obtained from a square matrix A by interchanging any two rows, then det(B) = −det(A).
(e) If B is a matrix obtained from a square matrix A by multiplying a row of A by a scalar, then det(B) = det(A).
(f) If B is a matrix obtained from a square matrix A by adding k times row i to row j, then det(B) = k det(A).
(g) If A ∈ Mn×n(F ) has rank n, then det(A) = 0.
(h) The determinant of an upper triangular matrix equals the product of its diagonal entries.


2. Find the value of k that satisfies the following equation:

det( 3a1  3a2  3a3
     3b1  3b2  3b3
     3c1  3c2  3c3 ) = k det( a1  a2  a3
                              b1  b2  b3
                              c1  c2  c3 ) .

3. Find the value of k that satisfies the following equation:

det( 2a1        2a2        2a3
     3b1 + 5c1  3b2 + 5c2  3b3 + 5c3
     7c1        7c2        7c3 ) = k det( a1  a2  a3
                                          b1  b2  b3
                                          c1  c2  c3 ) .

4. Find the value of k that satisfies the following equation:

det( b1 + c1  b2 + c2  b3 + c3
     a1 + c1  a2 + c2  a3 + c3
     a1 + b1  a2 + b2  a3 + b3 ) = k det( a1  a2  a3
                                          b1  b2  b3
                                          c1  c2  c3 ) .

In Exercises 5–12, evaluate the determinant of the given matrix by cofactor expansion along the indicated row.

5. (  0  1   2
     −1  0  −3
      2  3   0 )   along the first row

6. (  1  0  2
      0  1  5
     −1  3  0 )   along the first row

7. (  0  1   2
     −1  0  −3
      2  3   0 )   along the second row

8. (  1  0  2
      0  1  5
     −1  3  0 )   along the third row

9. (  0    1+i   2
     −2i   0     1−i
      3    4i    0 )   along the third row

10. (  i   2+i   0
      −1   3     2i
       0  −1     1−i )   along the second row

11. (  0  2   1  3
       1  0  −2  2
       3 −1   0  1
      −1  1   2  0 )   along the fourth row

12. (  1  −1   2  −1
      −3   4   1  −1
       2  −5  −3   8
      −2   6  −4   1 )   along the fourth row

In Exercises 13–22, evaluate the determinant of the given matrix by any legitimate method.


13. ( 0  0  1             14. ( 2  3  4
      0  2  3                   5  6  0
      4  5  6 )                 7  0  0 )

15. ( 1  2  3             16. ( −1   3  2
      4  5  6                    4  −8  1
      7  8  9 )                  2   2  5 )

17. ( 0   1   1           18. (  1  −2   3
      1   2  −5                 −1   2  −5
      6  −4   3 )                3  −1   2 )

19. (  i     2    −1      20. ( −1    2+i    3
       3    1+i    2            1−i    i     1
     −2i     1    4−i )         3i     2   −1+i )

21. (  1  0  −2  3        22. (  1   −2    3  −12
      −3  1   1  2              −5   12  −14   19
       0  4  −1  1              −9   22  −20   31
       2  3   0  1 )            −4    9  −14   15 )

23. Prove that the determinant of an upper triangular matrix is the product of its diagonal entries.

24. Prove the corollary to Theorem 4.3.

25. Prove that det(kA) = kⁿ det(A) for any A ∈ Mn×n(F ).

26. Let A ∈ Mn×n(F ). Under what conditions is det(−A) = det(A)?

27. Prove that if A ∈ Mn×n(F ) has two identical columns, then det(A) = 0.

28. Compute det(Ei) if Ei is an elementary matrix of type i.

29.† Prove that if E is an elementary matrix, then det(Eᵗ) = det(E).

30. Let the rows of A ∈ Mn×n(F ) be a1, a2, . . . , an, and let B be the matrix in which the rows are an, an−1, . . . , a1. Calculate det(B) in terms of det(A).

4.3 PROPERTIES OF DETERMINANTS

In Theorem 3.1, we saw that performing an elementary row operation on a matrix can be accomplished by multiplying the matrix by an elementary matrix. This result is very useful in studying the effects on the determinant of applying a sequence of elementary row operations. Because the determinant of the n × n identity matrix is 1 (see Example 4 in Section 4.2), we can interpret the statements on page 217 as the following facts about the determinants of elementary matrices.

(a) If E is an elementary matrix obtained by interchanging any two rows of I, then det(E) = −1.

(b) If E is an elementary matrix obtained by multiplying some row of I by the nonzero scalar k, then det(E) = k.

(c) If E is an elementary matrix obtained by adding a multiple of some row of I to another row, then det(E) = 1.

We now apply these facts about determinants of elementary matrices to prove that the determinant is a multiplicative function.

Theorem 4.7. For any A, B ∈ Mn×n(F ), det(AB) = det(A) · det(B).

Proof. We begin by establishing the result when A is an elementary matrix. If A is an elementary matrix obtained by interchanging two rows of I, then det(A) = −1. But by Theorem 3.1 (p. 149), AB is a matrix obtained by interchanging two rows of B. Hence by Theorem 4.5 (p. 216), det(AB) = −det(B) = det(A) · det(B). Similar arguments establish the result when A is an elementary matrix of type 2 or type 3. (See Exercise 18.)

If A is an n × n matrix with rank less than n, then det(A) = 0 by the corollary to Theorem 4.6 (p. 216). Since rank(AB) ≤ rank(A) < n by Theorem 3.7 (p. 159), we have det(AB) = 0. Thus det(AB) = det(A) · det(B) in this case.

On the other hand, if A has rank n, then A is invertible and hence is the product of elementary matrices (Corollary 3 to Theorem 3.6, p. 159), say, A = Em · · · E2E1. The first paragraph of this proof shows that

det(AB) = det(Em · · · E2E1B)
        = det(Em) · det(Em−1 · · · E2E1B)
        ⋮
        = det(Em) · · · det(E2) · det(E1) · det(B)
        = det(Em · · · E2E1) · det(B)
        = det(A) · det(B).

Corollary. A matrix A ∈ Mn×n(F ) is invertible if and only if det(A) ≠ 0. Furthermore, if A is invertible, then det(A⁻¹) = 1/det(A).

Proof. If A ∈ Mn×n(F ) is not invertible, then the rank of A is less than n. So det(A) = 0 by the corollary to Theorem 4.6 (p. 217). On the other hand, if A ∈ Mn×n(F ) is invertible, then

det(A) · det(A⁻¹) = det(AA⁻¹) = det(I) = 1

by Theorem 4.7. Hence det(A) ≠ 0 and det(A⁻¹) = 1/det(A).

In our discussion of determinants until now, we have used only the rows of a matrix. For example, the recursive definition of a determinant involved cofactor expansion along a row, and the more efficient method developed in Section 4.2 used elementary row operations. Our next result shows that the determinants of A and Aᵗ are always equal. Since the rows of A are the columns of Aᵗ, this fact enables us to translate any statement about determinants that involves the rows of a matrix into a corresponding statement that involves its columns.

Theorem 4.8. For any A ∈ Mn×n(F ), det(Aᵗ) = det(A).

Proof. If A is not invertible, then rank(A) < n. But rank(Aᵗ) = rank(A) by Corollary 2 to Theorem 3.6 (p. 158), and so Aᵗ is not invertible. Thus det(Aᵗ) = 0 = det(A) in this case.

On the other hand, if A is invertible, then A is a product of elementary matrices, say A = Em · · · E2E1. Since det(Ei) = det(Eiᵗ) for every i by Exercise 29 of Section 4.2, by Theorem 4.7 we have

det(Aᵗ) = det(E1ᵗ E2ᵗ · · · Emᵗ)
        = det(E1ᵗ) · det(E2ᵗ) · · · det(Emᵗ)
        = det(E1) · det(E2) · · · det(Em)
        = det(Em) · · · det(E2) · det(E1)
        = det(Em · · · E2E1)
        = det(A).

Thus, in either case, det(Aᵗ) = det(A).

Among the many consequences of Theorem 4.8 are that determinants can be evaluated by cofactor expansion along a column, and that elementary column operations can be used as well as elementary row operations in evaluating a determinant. (The effect on the determinant of performing an elementary column operation is the same as the effect of performing the corresponding elementary row operation.) We conclude our discussion of determinant properties with a well-known result that relates determinants to the solutions of certain types of systems of linear equations.

Theorem 4.9 (Cramer's Rule). Let Ax = b be the matrix form of a system of n linear equations in n unknowns, where x = (x1, x2, . . . , xn)ᵗ. If det(A) ≠ 0, then this system has a unique solution, and for each k (k = 1, 2, . . . , n),

xk = det(Mk) / det(A),


where Mk is the n × n matrix obtained from A by replacing column k of A by b.

Proof. If det(A) ≠ 0, then the system Ax = b has a unique solution by the corollary to Theorem 4.7 and Theorem 3.10 (p. 174). For each integer k (1 ≤ k ≤ n), let ak denote the kth column of A and Xk denote the matrix obtained from the n × n identity matrix by replacing column k by x. Then by Theorem 2.13 (p. 90), AXk is the n × n matrix whose ith column is

Aei = ai if i ≠ k and Ax = b if i = k.

Thus AXk = Mk. Evaluating Xk by cofactor expansion along row k produces

det(Xk) = xk · det(In−1) = xk.

Hence by Theorem 4.7,

det(Mk) = det(AXk) = det(A) · det(Xk) = det(A) ·xk.

Therefore

xk = [det(A)]⁻¹ · det(Mk).

Example 1

We illustrate Theorem 4.9 by using Cramer's rule to solve the following system of linear equations:

x1 + 2x2 + 3x3 = 2
x1       +  x3 = 3
x1 +  x2 −  x3 = 1.

The matrix form of this system of linear equations is Ax = b, where

A = ( 1  2   3
      1  0   1
      1  1  −1 )    and    b = ( 2
                                 3
                                 1 ) .

Because det(A) = 6 ≠ 0, Cramer's rule applies. Using the notation of Theorem 4.9, we have

x1 = det(M1)/det(A) = det( 2  2   3
                           3  0   1
                           1  1  −1 ) / det(A) = 15/6 = 5/2,

x2 = det(M2)/det(A) = det( 1  2   3
                           1  3   1
                           1  1  −1 ) / det(A) = −6/6 = −1,


and

x3 = det(M3)/det(A) = det( 1  2  2
                           1  0  3
                           1  1  1 ) / det(A) = 3/6 = 1/2.

Thus the unique solution to the given system of linear equations is

(x1, x2, x3) = (5/2, −1, 1/2). ♦
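For readers who want to experiment, here is a small Python/NumPy sketch of Cramer's rule as stated in Theorem 4.9 (our own illustration; the function name cramer_solve and the use of np.linalg.det are not from the text). It reproduces the solution (5/2, −1, 1/2) just obtained and, as the next paragraph explains, is meant for illustration rather than serious computation.

```python
import numpy as np

def cramer_solve(A, b):
    # x_k = det(M_k)/det(A), where M_k is A with column k replaced by b (Theorem 4.9).
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("Cramer's rule applies only when det(A) is nonzero")
    x = np.empty(A.shape[1])
    for k in range(A.shape[1]):
        M_k = A.copy()
        M_k[:, k] = b          # replace column k of A by b
        x[k] = np.linalg.det(M_k) / d
    return x

A = [[1, 2, 3], [1, 0, 1], [1, 1, -1]]
b = [2, 3, 1]
print(cramer_solve(A, b))      # [ 2.5 -1.   0.5], i.e. (5/2, -1, 1/2)
```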

In applications involving systems of linear equations, we sometimes need to know that there is a solution in which the unknowns are integers. In this situation, Cramer's rule can be useful because it implies that a system of linear equations with integral coefficients has an integral solution if the determinant of its coefficient matrix is ±1. On the other hand, Cramer's rule is not useful for computation because it requires evaluating n + 1 determinants of n × n matrices to solve a system of n linear equations in n unknowns. The amount of computation to do this is far greater than that required to solve the system by the method of Gaussian elimination, which was discussed in Section 3.4. Thus Cramer's rule is primarily of theoretical and aesthetic interest, rather than of computational value.

As in Section 4.1, it is possible to interpret the determinant of a matrix A ∈ Mn×n(R) geometrically. If the rows of A are a1, a2, . . . , an, respectively, then |det(A)| is the n-dimensional volume (the generalization of area in R2 and volume in R3) of the parallelepiped having the vectors a1, a2, . . . , an as adjacent sides. (For a proof of a more generalized result, see Jerrold E. Marsden and Michael J. Hoffman, Elementary Classical Analysis, W. H. Freeman and Company, New York, 1993, p. 524.)

Example 2

The volume of the parallelepiped having the vectors a1 = (1, −2, 1), a2 = (1, 0, −1), and a3 = (1, 1, 1) as adjacent sides is

| det( 1  −2   1
      1   0  −1
      1   1   1 ) | = 6.

Note that the object in question is a rectangular parallelepiped (see Figure 4.6) with sides of lengths √6, √2, and √3. Hence by the familiar formula for volume, its volume should be √6 · √2 · √3 = 6, as the determinant calculation shows. ♦

In our earlier discussion of the geometric significance of the determinant formed from the vectors in an ordered basis for R2, we also saw that this


[Figure 4.6: Parallelepiped determined by three vectors in R3 — the parallelepiped with adjacent sides (1, −2, 1), (1, 0, −1), and (1, 1, 1), drawn on the x, y, z axes.]

determinant is positive if and only if the basis induces a right-handed coordinate system. A similar statement is true in Rn. Specifically, if γ is any ordered basis for Rn and β is the standard ordered basis for Rn, then γ induces a right-handed coordinate system if and only if det(Q) > 0, where Q is the change of coordinate matrix changing γ-coordinates into β-coordinates. Thus, for instance,

γ = { (1, 1, 0), (1, −1, 0), (0, 0, 1) }

induces a left-handed coordinate system in R3 because

det( 1   1  0
     1  −1  0
     0   0  1 ) = −2 < 0,

whereas

γ′ = { (1, 2, 0), (−2, 1, 0), (0, 0, 1) }

induces a right-handed coordinate system in R3 because

det( 1  −2  0
     2   1  0
     0   0  1 ) = 5 > 0.


More generally, if β and γ are two ordered bases for Rn, then the coordinate systems induced by β and γ have the same orientation (either both are right-handed or both are left-handed) if and only if det(Q) > 0, where Q is the change of coordinate matrix changing γ-coordinates into β-coordinates.
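Both geometric interpretations above — |det(A)| as a volume and the sign of det(Q) as an orientation test — can be checked numerically. The short NumPy sketch below is our own illustration (the variable names are arbitrary), using the parallelepiped of Example 2 and the bases γ and γ′ just discussed.

```python
import numpy as np

# Rows of A are the adjacent sides of the parallelepiped in Example 2.
A = np.array([[1, -2, 1], [1, 0, -1], [1, 1, 1]], dtype=float)
print(abs(np.linalg.det(A)))          # approximately 6.0, the volume found above

# Columns of Q1 and Q2 are the vectors of gamma and gamma', respectively.
Q1 = np.array([[1, 1, 0], [1, -1, 0], [0, 0, 1]], dtype=float)
Q2 = np.array([[1, -2, 0], [2, 1, 0], [0, 0, 1]], dtype=float)
print(np.linalg.det(Q1) > 0)          # False: gamma induces a left-handed system
print(np.linalg.det(Q2) > 0)          # True:  gamma' induces a right-handed system
```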

EXERCISES

1. Label the following statements as true or false.

(a) If E is an elementary matrix, then det(E) = ±1.
(b) For any A, B ∈ Mn×n(F ), det(AB) = det(A) · det(B).
(c) A matrix M ∈ Mn×n(F ) is invertible if and only if det(M) = 0.
(d) A matrix M ∈ Mn×n(F ) has rank n if and only if det(M) ≠ 0.
(e) For any A ∈ Mn×n(F ), det(Aᵗ) = −det(A).
(f) The determinant of a square matrix can be evaluated by cofactor expansion along any column.
(g) Every system of n linear equations in n unknowns can be solved by Cramer's rule.
(h) Let Ax = b be the matrix form of a system of n linear equations in n unknowns, where x = (x1, x2, . . . , xn)ᵗ. If det(A) ≠ 0 and if Mk is the n × n matrix obtained from A by replacing row k of A by bᵗ, then the unique solution of Ax = b is

xk = det(Mk)/det(A)   for k = 1, 2, . . . , n.

In Exercises 2–7, use Cramer's rule to solve the given system of linear equations.

2. a11x1 + a12x2 = b1
   a21x1 + a22x2 = b2
   where a11a22 − a12a21 ≠ 0

3. 2x1 +  x2 − 3x3 = 5
    x1 − 2x2 +  x3 = 10
   3x1 + 4x2 − 2x3 = 0

4. 2x1 +  x2 − 3x3 = 1
    x1 − 2x2 +  x3 = 0
   3x1 + 4x2 − 2x3 = −5

5.   x1 −  x2 + 4x3 = −4
   −8x1 + 3x2 +  x3 = 8
    2x1 −  x2 +  x3 = 0

6.   x1 −  x2 + 4x3 = −2
   −8x1 + 3x2 +  x3 = 0
    2x1 −  x2 +  x3 = 6

7.  3x1 +  x2 +  x3 = 4
   −2x1 −  x2       = 12
     x1 + 2x2 +  x3 = −8

8. Use Theorem 4.8 to prove a result analogous to Theorem 4.3 (p. 212), but for columns.

9. Prove that an upper triangular n × n matrix is invertible if and only if all its diagonal entries are nonzero.


10. A matrix M ∈ Mn×n(C) is called nilpotent if, for some positive integer k, Mᵏ = O, where O is the n × n zero matrix. Prove that if M is nilpotent, then det(M) = 0.

11. A matrix M ∈ Mn×n(C) is called skew-symmetric if Mᵗ = −M. Prove that if M is skew-symmetric and n is odd, then M is not invertible. What happens if n is even?

12. A matrix Q ∈ Mn×n(R) is called orthogonal if QQᵗ = I. Prove that if Q is orthogonal, then det(Q) = ±1.

13. For M ∈ Mn×n(C), let M̄ denote the matrix such that (M̄)ij is the complex conjugate of Mij for all i, j.

(a) Prove that det(M̄) is the complex conjugate of det(M).
(b) A matrix Q ∈ Mn×n(C) is called unitary if QQ* = I, where Q* = M̄ᵗ for M = Q (that is, Q* is the conjugate transpose of Q). Prove that if Q is a unitary matrix, then |det(Q)| = 1.

14. Let β = {u1, u2, . . . , un} be a subset of Fn containing n distinct vectors, and let B be the matrix in Mn×n(F ) having uj as column j. Prove that β is a basis for Fn if and only if det(B) ≠ 0.

15.† Prove that if A, B ∈ Mn×n(F ) are similar, then det(A) = det(B).

16. Use determinants to prove that if A, B ∈ Mn×n(F ) are such that AB = I, then A is invertible (and hence B = A⁻¹).

17. Let A, B ∈ Mn×n(F ) be such that AB = −BA. Prove that if n is odd and F is not a field of characteristic two, then A or B is not invertible.

18. Complete the proof of Theorem 4.7 by showing that if A is an elementary matrix of type 2 or type 3, then det(AB) = det(A) · det(B).

19. A matrix A ∈ Mn×n(F ) is called lower triangular if Aij = 0 for 1 ≤ i < j ≤ n. Suppose that A is a lower triangular matrix. Describe det(A) in terms of the entries of A.

20. Suppose that M ∈ Mn×n(F ) can be written in the form

M = ( A  B
      O  I ) ,

where A is a square matrix. Prove that det(M) = det(A).

21.† Prove that if M ∈ Mn×n(F ) can be written in the form

M = ( A  B
      O  C ) ,

where A and C are square matrices, then det(M) = det(A) · det(C).


22. Let T : Pn(F ) → Fn+1 be the linear transformation defined in Exercise 22 of Section 2.4 by T(f) = (f(c0), f(c1), . . . , f(cn)), where c0, c1, . . . , cn are distinct scalars in an infinite field F . Let β be the standard ordered basis for Pn(F ) and γ be the standard ordered basis for Fn+1.

(a) Show that M = [T]γβ has the form

( 1  c0  c0²  · · ·  c0ⁿ
  1  c1  c1²  · · ·  c1ⁿ
  ⋮   ⋮    ⋮           ⋮
  1  cn  cn²  · · ·  cnⁿ ) .

A matrix with this form is called a Vandermonde matrix.
(b) Use Exercise 22 of Section 2.4 to prove that det(M) ≠ 0.
(c) Prove that

det(M) = ∏ (cj − ci)   (0 ≤ i < j ≤ n),

the product of all terms of the form cj − ci for 0 ≤ i < j ≤ n.

23. Let A ∈ Mn×n(F ) be nonzero. For any m (1 ≤ m ≤ n), an m × m submatrix is obtained by deleting any n − m rows and any n − m columns of A.

(a) Let k (1 ≤ k ≤ n) denote the largest integer such that some k × k submatrix has a nonzero determinant. Prove that rank(A) = k.
(b) Conversely, suppose that rank(A) = k. Prove that there exists a k × k submatrix with a nonzero determinant.

24. Let A ∈ Mn×n(F ) have the form

A = (  0   0   0  · · ·   0   a0
      −1   0   0  · · ·   0   a1
       0  −1   0  · · ·   0   a2
       ⋮    ⋮   ⋮            ⋮   ⋮
       0   0   0  · · ·  −1   an−1 ) .

Compute det(A + tI), where I is the n × n identity matrix.

25. Let cjk denote the cofactor of the row j, column k entry of the matrix A ∈ Mn×n(F ).

(a) Prove that if B is the matrix obtained from A by replacing column k by ej, then det(B) = cjk.


(b) Show that for 1 ≤ j ≤ n, we have

A ( cj1
    cj2
     ⋮
    cjn ) = det(A) · ej .

Hint: Apply Cramer's rule to Ax = ej.
(c) Deduce that if C is the n × n matrix such that Cij = cji, then AC = [det(A)]I.
(d) Show that if det(A) ≠ 0, then A⁻¹ = [det(A)]⁻¹C.

The following definition is used in Exercises 26–27.

Definition. The classical adjoint of a square matrix A is the transpose of the matrix whose ij-entry is the ij-cofactor of A.

26. Find the classical adjoint of each of the following matrices.

(a) ( A11  A12            (b) ( 4  0  0
      A21  A22 )                0  4  0
                                0  0  4 )

(c) ( −4  0  0            (d) ( 3  6  7
       0  2  0                  0  4  8
       0  0  5 )                0  0  5 )

(e) ( 1−i    0      0      (f) (  7   1   4
      4      3i     0             6  −3   0
      2i     1+4i  −1 )          −3   5  −2 )

(g) ( −1  2   5            (h) (  3     2+i    0
       8  0  −3                  −1+i   0      i
       4  6   1 )                 0     1     3−2i )

27. Let C be the classical adjoint of A ∈ Mn×n(F ). Prove the following statements.

(a) det(C) = [det(A)]ⁿ⁻¹.
(b) Cᵗ is the classical adjoint of Aᵗ.
(c) If A is an invertible upper triangular matrix, then C and A⁻¹ are both upper triangular matrices.

28. Let y1, y2, . . . , yn be linearly independent functions in C∞. For each y ∈ C∞, define T(y) ∈ C∞ by

[T(y)](t) = det( y(t)      y1(t)      y2(t)     · · ·  yn(t)
                 y′(t)     y1′(t)     y2′(t)    · · ·  yn′(t)
                  ⋮          ⋮           ⋮                ⋮
                 y⁽ⁿ⁾(t)   y1⁽ⁿ⁾(t)   y2⁽ⁿ⁾(t)  · · ·  yn⁽ⁿ⁾(t) ) .


The preceding determinant is called the Wronskian of y, y1, . . . , yn.

(a) Prove that T : C∞ → C∞ is a linear transformation.
(b) Prove that N(T) = span({y1, y2, . . . , yn}).

4.4 SUMMARY—IMPORTANT FACTS ABOUT DETERMINANTS

In this section, we summarize the important properties of the determinant needed for the remainder of the text. The results contained in this section have been derived in Sections 4.2 and 4.3; consequently, the facts presented here are stated without proofs.

The determinant of an n × n matrix A having entries from a field F is a scalar in F , denoted by det(A) or |A|, and can be computed in the following manner:

1. If A is 1 × 1, then det(A) = A11, the single entry of A.

2. If A is 2 × 2, then det(A) = A11A22 − A12A21. For example,

det( −1  2
      5  3 ) = (−1)(3) − (2)(5) = −13.

3. If A is n × n for n > 2, then

det(A) = Σⱼ₌₁ⁿ (−1)^(i+j) Aij · det(Ãij)

(if the determinant is evaluated by the entries of row i of A) or

det(A) = Σᵢ₌₁ⁿ (−1)^(i+j) Aij · det(Ãij)

(if the determinant is evaluated by the entries of column j of A), where Ãij is the (n−1) × (n−1) matrix obtained by deleting row i and column j from A.

In the formulas above, the scalar (−1)^(i+j) det(Ãij) is called the cofactor of the row i column j entry of A. In this language, the determinant of A is evaluated as the sum of terms obtained by multiplying each entry of some row or column of A by the cofactor of that entry. Thus det(A) is expressed in terms of n determinants of (n−1) × (n−1) matrices. These determinants are then evaluated in terms of determinants of (n−2) × (n−2) matrices, and so forth, until 2 × 2 matrices are obtained. The determinants of the 2 × 2 matrices are then evaluated as in item 2.
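The recursive description in items 1–3 translates directly into code. The following Python sketch is our own rendering of that definition (the function name det_cofactor is invented); it is exponential-time and is included only to mirror the definition, not as a practical method.

```python
def det_cofactor(A, i=0):
    # Cofactor expansion along row i (0-indexed), mirroring items 1-3 above.
    n = len(A)
    if n == 1:
        return A[0][0]
    if n == 2:
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]
    total = 0
    for j in range(n):
        # A-tilde_ij: delete row i and column j
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** (i + j) * A[i][j] * det_cofactor(minor)
    return total

A = [[2, 1, 1, 5], [1, 1, -4, -1], [2, 0, -3, 1], [3, 6, 1, 2]]
print(det_cofactor(A))        # 102, matching the example worked out below
```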


Let us consider two examples of this technique in evaluating the determinant of the 4 × 4 matrix

A = ( 2  1   1   5
      1  1  −4  −1
      2  0  −3   1
      3  6   1   2 ) .

To evaluate the determinant of A by expanding along the fourth row, we must know the cofactors of each entry of that row. The cofactor of A41 = 3 is (−1)^(4+1) det(B), where

B = ( 1   1   5
      1  −4  −1
      0  −3   1 ) .

Let us evaluate this determinant by expanding along the first column. We have

det(B) = (−1)^(1+1)(1) det( −4  −1
                            −3   1 )
       + (−1)^(2+1)(1) det(  1   5
                            −3   1 )
       + (−1)^(3+1)(0) det(  1   5
                            −4  −1 )
       = 1(1)[(−4)(1) − (−1)(−3)] + (−1)(1)[(1)(1) − (5)(−3)] + 0
       = −7 − 16 + 0 = −23.

Thus the cofactor of A41 is (−1)⁵(−23) = 23. Similarly, the cofactors of A42, A43, and A44 are 8, 11, and −13, respectively. We can now evaluate the determinant of A by multiplying each entry of the fourth row by its cofactor; this gives

det(A) = 3(23) + 6(8) + 1(11) + 2(−13) = 102.

For the sake of comparison, let us also compute the determinant of A by expansion along the second column. The reader should verify that the cofactors of A12, A22, and A42 are 14, 40, and 8, respectively. Thus

det(A) = (−1)^(1+2)(1) det( 1  −4  −1
                            2  −3   1
                            3   1   2 )
       + (−1)^(2+2)(1) det( 2   1   5
                            2  −3   1
                            3   1   2 )
       + (−1)^(3+2)(0) det( 2   1   5
                            1  −4  −1
                            3   1   2 )
       + (−1)^(4+2)(6) det( 2   1   5
                            1  −4  −1
                            2  −3   1 )
       = 14 + 40 + 0 + 48 = 102.


Of course, the fact that the value 102 is obtained again is no surprise since the value of the determinant of A is independent of the choice of row or column used in the expansion.

Observe that the computation of det(A) is easier when expanded along the second column than when expanded along the fourth row. The difference is the presence of a zero in the second column, which makes it unnecessary to evaluate one of the cofactors (the cofactor of A32). For this reason, it is beneficial to evaluate the determinant of a matrix by expanding along a row or column of the matrix that contains the largest number of zero entries. In fact, it is often helpful to introduce zeros into the matrix by means of elementary row operations before computing the determinant. This technique utilizes the first three properties of the determinant.

Properties of the Determinant

1. If B is a matrix obtained by interchanging any two rows or interchanging any two columns of an n × n matrix A, then det(B) = −det(A).

2. If B is a matrix obtained by multiplying each entry of some row or column of an n × n matrix A by a scalar k, then det(B) = k · det(A).

3. If B is a matrix obtained from an n × n matrix A by adding a multiple of row i to row j or a multiple of column i to column j for i ≠ j, then det(B) = det(A).

As an example of the use of these three properties in evaluating determinants, let us compute the determinant of the 4 × 4 matrix A considered previously. Our procedure is to introduce zeros into the second column of A by employing property 3, and then to expand along that column. (The elementary row operations used here consist of adding multiples of row 1 to rows 2 and 4.) This procedure yields

det(A) = det( 2  1   1   5          = det(  2  1   1    5
              1  1  −4  −1                 −1  0  −5   −6
              2  0  −3   1                  2  0  −3    1
              3  6   1   2 )               −9  0  −5  −28 )

       = 1(−1)^(1+2) det( −1  −5   −6
                           2  −3    1
                          −9  −5  −28 ) .

The resulting determinant of a 3 × 3 matrix can be evaluated in the same manner: Use type 3 elementary row operations to introduce two zeros into the first column, and then expand along that column. This results in the value −102. Therefore

det(A) = 1(−1)^(1+2)(−102) = 102.


The reader should compare this calculation of det(A) with the preceding ones to see how much less work is required when properties 1, 2, and 3 are employed.

In the chapters that follow, we often have to evaluate the determinant of matrices having special forms. The next two properties of the determinant are useful in this regard:

4. The determinant of an upper triangular matrix is the product of its diagonal entries. In particular, det(I) = 1.

5. If two rows (or columns) of a matrix are identical, then the determinant of the matrix is zero.

As an illustration of property 4, notice that

det( −3  1   2
      0  4   5
      0  0  −6 ) = (−3)(4)(−6) = 72.

Property 4 provides an efficient method for evaluating the determinant of a matrix:

(a) Use Gaussian elimination and properties 1, 2, and 3 above to reduce the matrix to an upper triangular matrix.

(b) Compute the product of the diagonal entries.

For instance,

det( 1  −1    2   1          = det( 1  −1   2   1
     2  −1   −1   4                 0   1  −5   2
    −4   5  −10  −6                 0   1  −2  −2
     3  −2   10  −1 )               0   1   4  −4 )

  = det( 1  −1   2   1       = det( 1  −1   2   1
         0   1  −5   2              0   1  −5   2
         0   0   3  −4              0   0   3  −4
         0   0   9  −6 )            0   0   0   6 )

  = 1 · 1 · 3 · 6 = 18.

The next three properties of the determinant are used frequently in later chapters. Indeed, perhaps the most significant property of the determinant is that it provides a simple characterization of invertible matrices. (See property 7.)

6. For any n × n matrices A and B, det(AB) = det(A) · det(B).


7. An n × n matrix A is invertible if and only if det(A) ≠ 0. Furthermore, if A is invertible, then det(A⁻¹) = 1/det(A).

8. For any n × n matrix A, the determinants of A and Aᵗ are equal.

For example, property 7 guarantees that the matrix A on page 233 is invertible because det(A) = 102.

The final property, stated as Exercise 15 of Section 4.3, is used in Chapter 5. It is a simple consequence of properties 6 and 7.

9. If A and B are similar matrices, then det(A) = det(B).
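Properties 6–9 are easy to spot-check numerically. The following NumPy sketch is our own illustration (the random test matrices and variable names are arbitrary; np.isclose is used because floating-point determinants are only approximate).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)
B = rng.integers(-3, 4, size=(4, 4)).astype(float)
det = np.linalg.det

print(np.isclose(det(A @ B), det(A) * det(B)))              # property 6
print(np.isclose(det(A.T), det(A)))                         # property 8
if not np.isclose(det(A), 0):                                # property 7
    print(np.isclose(det(np.linalg.inv(A)), 1 / det(A)))
    Q = rng.integers(-2, 3, size=(4, 4)).astype(float)
    if not np.isclose(det(Q), 0):                            # property 9: similar matrices
        print(np.isclose(det(np.linalg.inv(Q) @ A @ Q), det(A)))
```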

EXERCISES

1. Label the following statements as true or false.

(a) The determinant of a square matrix may be computed by expanding the matrix along any row or column.
(b) In evaluating the determinant of a matrix, it is wise to expand along a row or column containing the largest number of zero entries.
(c) If two rows or columns of A are identical, then det(A) = 0.
(d) If B is a matrix obtained by interchanging two rows or two columns of A, then det(B) = det(A).
(e) If B is a matrix obtained by multiplying each entry of some row or column of A by a scalar, then det(B) = det(A).
(f) If B is a matrix obtained from A by adding a multiple of some row to a different row, then det(B) = det(A).
(g) The determinant of an upper triangular n × n matrix is the product of its diagonal entries.
(h) For every A ∈ Mn×n(F ), det(Aᵗ) = −det(A).
(i) If A, B ∈ Mn×n(F ), then det(AB) = det(A) · det(B).
(j) If Q is an invertible matrix, then det(Q⁻¹) = [det(Q)]⁻¹.
(k) A matrix Q is invertible if and only if det(Q) ≠ 0.

2. Evaluate the determinant of the following 2 × 2 matrices.

(a) ( 4  −5             (b) ( −1  7
      2   3 )                  3  8 )

(c) ( 2+i   −1+3i       (d) (  3   4i
      1−2i    3−i )          −6i   2i )

3. Evaluate the determinant of the following matrices in the manner indicated.


(a) (  0  1   2
      −1  0  −3
       2  3   0 )   along the first row

(b) (  1  0  2
       0  1  5
      −1  3  0 )   along the first column

(c) (  0  1   2
      −1  0  −3
       2  3   0 )   along the second column

(d) (  1  0  2
       0  1  5
      −1  3  0 )   along the third row

(e) (  0    1+i   2
      −2i   0     1−i
       3    4i    0 )   along the third row

(f) (  i   2+i   0
      −1   3     2i
       0  −1     1−i )   along the third column

(g) (  0  2   1  3
       1  0  −2  2
       3 −1   0  1
      −1  1   2  0 )   along the fourth column

(h) (  1  −1   2  −1
      −3   4   1  −1
       2  −5  −3   8
      −2   6  −4   1 )   along the fourth row

4. Evaluate the determinant of the following matrices by any legitimate method.

(a) ( 1  2  3             (b) ( −1   3  2
      4  5  6                    4  −8  1
      7  8  9 )                  2   2  5 )

(c) ( 0   1   1           (d) (  1  −2   3
      1   2  −5                 −1   2  −5
      6  −4   3 )                3  −1   2 )

(e) (  i     2    −1      (f) ( −1    2+i    3
       3    1+i    2            1−i    i     1
     −2i     1    4−i )         3i     2   −1+i )

(g) (  1  0  −2  3        (h) (  1   −2    3  −12
      −3  1   1  2              −5   12  −14   19
       0  4  −1  1              −9   22  −20   31
       2  3   0  1 )            −4    9  −14   15 )

5. Suppose that M ∈ Mn×n(F ) can be written in the form

M = ( A  B
      O  I ) ,

where A is a square matrix. Prove that det(M) = det(A).


6.† Prove that if M ∈ Mn×n(F ) can be written in the form

M = ( A  B
      O  C ) ,

where A and C are square matrices, then det(M) = det(A) · det(C).

4.5∗ A CHARACTERIZATION OF THE DETERMINANT

In Sections 4.2 and 4.3, we showed that the determinant possesses a number of properties. In this section, we show that three of these properties completely characterize the determinant; that is, the only function δ : Mn×n(F ) → F having these three properties is the determinant. This characterization of the determinant is the one used in Section 4.1 to establish the relationship between

det( u
     v )

and the area of the parallelogram determined by u and v. The first of these properties that characterize the determinant is the one described in Theorem 4.3 (p. 212).

Definition. A function δ : Mn×n(F ) → F is called an n-linear function if it is a linear function of each row of an n × n matrix when the remaining n − 1 rows are held fixed, that is, δ is n-linear if, for every r = 1, 2, . . . , n, we have

  ( a1     )     ( a1   )       ( a1   )
  (  ⋮     )     (  ⋮   )       (  ⋮   )
  ( ar−1   )     ( ar−1 )       ( ar−1 )
δ ( u + kv ) = δ ( u    ) + k δ ( v    )
  ( ar+1   )     ( ar+1 )       ( ar+1 )
  (  ⋮     )     (  ⋮   )       (  ⋮   )
  ( an     )     ( an   )       ( an   )

whenever k is a scalar and u, v, and each ai are vectors in Fn.

Example 1

The function δ : Mn×n(F ) → F defined by δ(A) = 0 for each A ∈ Mn×n(F ) is an n-linear function. ♦

Example 2

For 1 ≤ j ≤ n, define δj : Mn×n(F ) → F by δj(A) = A1jA2j · · · Anj for each A ∈ Mn×n(F ); that is, δj(A) equals the product of the entries of column j of


A. Let A ∈ Mn×n(F ), ai = (Ai1, Ai2, . . . , Ain), and v = (b1, b2, . . . , bn) ∈ Fn. Then each δj is an n-linear function because, for any scalar k, we have

   ( a1      )
   (  ⋮      )
   ( ar−1    )
δj ( ar + kv ) = A1j · · · A(r−1)j (Arj + k bj) A(r+1)j · · · Anj
   ( ar+1    )
   (  ⋮      )
   ( an      )

               = A1j · · · A(r−1)j Arj A(r+1)j · · · Anj + A1j · · · A(r−1)j (k bj) A(r+1)j · · · Anj

               = A1j · · · A(r−1)j Arj A(r+1)j · · · Anj + k (A1j · · · A(r−1)j bj A(r+1)j · · · Anj)

                    ( a1   )        ( a1   )
                    (  ⋮   )        (  ⋮   )
                    ( ar−1 )        ( ar−1 )
               = δj ( ar   ) + k δj ( v    ) .   ♦
                    ( ar+1 )        ( ar+1 )
                    (  ⋮   )        (  ⋮   )
                    ( an   )        ( an   )

Example 3

The function δ : Mn×n(F ) → F defined for each A ∈ Mn×n(F ) by δ(A) = A11A22 · · · Ann (i.e., δ(A) equals the product of the diagonal entries of A) is an n-linear function. ♦

Example 4

The function δ : Mn×n(R) → R defined for each A ∈ Mn×n(R) by δ(A) = tr(A) is not an n-linear function for n ≥ 2. For if I is the n × n identity matrix and A is the matrix obtained by multiplying the first row of I by 2, then δ(A) = n + 1 ≠ 2n = 2 · δ(I). ♦

Theorem 4.3 (p. 212) asserts that the determinant is an n-linear function. For our purposes this is the most important example of an n-linear function. Now we introduce the second of the properties used in the characterization of the determinant.

Definition. An n-linear function δ : Mn×n(F ) → F is called alternating if, for each A ∈ Mn×n(F ), we have δ(A) = 0 whenever two adjacent rows of A are identical.


Theorem 4.10. Let δ : Mn×n(F ) → F be an alternating n-linear function.
(a) If A ∈ Mn×n(F ) and B is a matrix obtained from A by interchanging any two rows of A, then δ(B) = −δ(A).
(b) If A ∈ Mn×n(F ) has two identical rows, then δ(A) = 0.

Proof. (a) Let A ∈ Mn×n(F ), and let B be the matrix obtained from A by interchanging rows r and s, where r < s. We first establish the result in the case that s = r + 1. Because δ : Mn×n(F ) → F is an n-linear function that is alternating, we have

      ( a1        )       ( a1        )       ( a1        )
      (  ⋮        )       (  ⋮        )       (  ⋮        )
0 = δ ( ar + ar+1 ) = δ   ( ar        ) + δ   ( ar+1      )
      ( ar + ar+1 )       ( ar + ar+1 )       ( ar + ar+1 )
      (  ⋮        )       (  ⋮        )       (  ⋮        )
      ( an        )       ( an        )       ( an        )

      ( a1   )       ( a1   )       ( a1   )       ( a1   )
      (  ⋮   )       (  ⋮   )       (  ⋮   )       (  ⋮   )
  = δ ( ar   ) + δ   ( ar   ) + δ   ( ar+1 ) + δ   ( ar+1 )
      ( ar   )       ( ar+1 )       ( ar   )       ( ar+1 )
      (  ⋮   )       (  ⋮   )       (  ⋮   )       (  ⋮   )
      ( an   )       ( an   )       ( an   )       ( an   )

  = 0 + δ(A) + δ(B) + 0.

Thus δ(B) = −δ(A).

Next suppose that s > r + 1, and let the rows of A be a1, a2, . . . , an. Beginning with ar and ar+1, successively interchange ar with the row that follows it until the rows are in the sequence

a1, a2, . . . , ar−1, ar+1, . . . , as, ar, as+1, . . . , an.

In all, s − r interchanges of adjacent rows are needed to produce this sequence. Then successively interchange as with the row that precedes it until the rows are in the order

a1, a2, . . . , ar−1, as, ar+1, . . . , as−1, ar, as+1, . . . , an.

This process requires an additional s − r − 1 interchanges of adjacent rows and produces the matrix B. It follows from the preceding paragraph that

δ(B) = (−1)^((s−r)+(s−r−1)) δ(A) = −δ(A).

(b) Suppose that rows r and s of A ∈ Mn×n(F ) are identical, where r < s. If s = r + 1, then δ(A) = 0 because δ is alternating and two adjacent rows


of A are identical. If s > r + 1, let B be the matrix obtained from A by interchanging rows r + 1 and s. Then δ(B) = 0 because two adjacent rows of B are identical. But δ(B) = −δ(A) by (a). Hence δ(A) = 0.

Corollary 1. Let δ : Mn×n(F ) → F be an alternating n-linear function. If B is a matrix obtained from A ∈ Mn×n(F ) by adding a multiple of some row of A to another row, then δ(B) = δ(A).

Proof. Let B be obtained from A ∈ Mn×n(F ) by adding k times row i of A to row j, where j ≠ i, and let C be obtained from A by replacing row j of A by row i of A. Then the rows of A, B, and C are identical except for row j. Moreover, row j of B is the sum of row j of A and k times row j of C. Since δ is an n-linear function and C has two identical rows, it follows that

δ(B) = δ(A) + kδ(C) = δ(A) + k · 0 = δ(A).

The next result now follows as in the proof of the corollary to Theorem 4.6 (p. 216). (See Exercise 11.)

Corollary 2. Let δ : Mn×n(F ) → F be an alternating n-linear function. If M ∈ Mn×n(F ) has rank less than n, then δ(M) = 0.

Proof. Exercise.

Corollary 3. Let δ : Mn×n(F ) → F be an alternating n-linear function, and let E1, E2, and E3 in Mn×n(F ) be elementary matrices of types 1, 2, and 3, respectively. Suppose that E2 is obtained by multiplying some row of I by the nonzero scalar k. Then δ(E1) = −δ(I), δ(E2) = k · δ(I), and δ(E3) = δ(I).

Proof. Exercise.

We wish to show that under certain circumstances, the only alternating n-linear function δ : Mn×n(F ) → F is the determinant, that is, δ(A) = det(A) for all A ∈ Mn×n(F ). In view of Corollary 3 to Theorem 4.10 and the facts on page 223 about the determinant of an elementary matrix, this can happen only if δ(I) = 1. Hence the third condition that is used in the characterization of the determinant is that the determinant of the n × n identity matrix is 1. Before we can establish the desired characterization of the determinant, we must first show that an alternating n-linear function δ such that δ(I) = 1 is a multiplicative function. The proof of this result is identical to the proof of Theorem 4.7 (p. 223), and so it is omitted. (See Exercise 12.)

Theorem 4.11. Let δ : Mn×n(F ) → F be an alternating n-linear function such that δ(I) = 1. For any A, B ∈ Mn×n(F ), we have δ(AB) = δ(A) · δ(B).


Proof. Exercise.

Theorem 4.12. If δ : Mn×n(F ) → F is an alternating n-linear function such that δ(I) = 1, then δ(A) = det(A) for every A ∈ Mn×n(F ).

Proof. Let δ : Mn×n(F ) → F be an alternating n-linear function such that δ(I) = 1, and let A ∈ Mn×n(F ). If A has rank less than n, then by Corollary 2 to Theorem 4.10, δ(A) = 0. Since the corollary to Theorem 4.6 (p. 217) gives det(A) = 0, we have δ(A) = det(A) in this case. If, on the other hand, A has rank n, then A is invertible and hence is the product of elementary matrices (Corollary 3 to Theorem 3.6, p. 159), say A = Em · · · E2E1. Since δ(I) = 1, it follows from Corollary 3 to Theorem 4.10 and the facts on page 223 that δ(E) = det(E) for every elementary matrix E. Hence by Theorems 4.11 and 4.7 (p. 223), we have

δ(A) = δ(Em · · · E2E1)
     = δ(Em) · · · δ(E2) · δ(E1)
     = det(Em) · · · det(E2) · det(E1)
     = det(Em · · · E2E1)
     = det(A).

Theorem 4.12 provides the desired characterization of the determinant: It is the unique function δ : Mn×n(F ) → F that is n-linear, is alternating, and has the property that δ(I) = 1.
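One concrete way to see Theorem 4.12 at work is to write down a function that is visibly n-linear and alternating with δ(I) = 1 and observe that it agrees with the determinant. The "sum over permutations" formula used below is not introduced in this text, so the sketch is purely illustrative (Python; the function name delta is ours). One can verify directly that this δ is n-linear, alternating, and satisfies δ(I) = 1, so by Theorem 4.12 it must coincide with det; the last line checks this on the matrix of Example 1 of Section 4.2, whose determinant is 40.

```python
from itertools import permutations
import numpy as np

def delta(A):
    # Sum over all permutations sigma of sgn(sigma) * A[0][sigma(0)] * ... * A[n-1][sigma(n-1)].
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

A = [[1, 3, -3], [-3, -5, 2], [-4, 4, -6]]                    # Example 1 of Section 4.2
print(delta(A), round(np.linalg.det(np.array(A, float))))     # 40 40 -- they agree
```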

EXERCISES

1. Label the following statements as true or false.

(a) Any n-linear function δ : Mn×n(F ) → F is a linear transformation.
(b) Any n-linear function δ : Mn×n(F ) → F is a linear function of each row of an n × n matrix when the other n − 1 rows are held fixed.
(c) If δ : Mn×n(F ) → F is an alternating n-linear function and the matrix A ∈ Mn×n(F ) has two identical rows, then δ(A) = 0.
(d) If δ : Mn×n(F ) → F is an alternating n-linear function and B is obtained from A ∈ Mn×n(F ) by interchanging two rows of A, then δ(B) = δ(A).
(e) There is a unique alternating n-linear function δ : Mn×n(F ) → F .
(f) The function δ : Mn×n(F ) → F defined by δ(A) = 0 for every A ∈ Mn×n(F ) is an alternating n-linear function.

2. Determine all the 1-linear functions δ : M1×1(F ) → F .

Determine which of the functions δ : M3×3(F ) → F in Exercises 3–10 are 3-linear functions. Justify each answer.


3. δ(A) = k, where k is any nonzero scalar

4. δ(A) = A22

5. δ(A) = A11A23A32

6. δ(A) = A11 + A23 + A32

7. δ(A) = A11A21A32

8. δ(A) = A11A31A32

9. δ(A) = A11² A22² A33²

10. δ(A) = A11A22A33 − A11A21A32

11. Prove Corollaries 2 and 3 of Theorem 4.10.

12. Prove Theorem 4.11.

13. Prove that det : M2×2(F ) → F is a 2-linear function of the columns of a matrix.

14. Let a, b, c, d ∈ F . Prove that the function δ : M2×2(F ) → F defined by δ(A) = A11A22a + A11A21b + A12A22c + A12A21d is a 2-linear function.

15. Prove that δ : M2×2(F ) → F is a 2-linear function if and only if it has the form

δ(A) = A11A22a + A11A21b + A12A22c + A12A21d

for some scalars a, b, c, d ∈ F .

16. Prove that if δ : Mn×n(F ) → F is an alternating n-linear function, then there exists a scalar k such that δ(A) = k det(A) for all A ∈ Mn×n(F ).

17. Prove that a linear combination of two n-linear functions is an n-linear function, where the sum and scalar product of n-linear functions are as defined in Example 3 of Section 1.2 (p. 9).

18. Prove that the set of all n-linear functions over a field F is a vector space over F under the operations of function addition and scalar multiplication as defined in Example 3 of Section 1.2 (p. 9).

19. Let δ : Mn×n(F ) → F be an n-linear function and F a field that does not have characteristic two. Prove that if δ(B) = −δ(A) whenever B is obtained from A ∈ Mn×n(F ) by interchanging any two rows of A, then δ(M) = 0 whenever M ∈ Mn×n(F ) has two identical rows.

20. Give an example to show that the implication in Exercise 19 need not hold if F has characteristic two.


INDEX OF DEFINITIONS FOR CHAPTER 4

Alternating n-linear function 239
Angle between two vectors 202
Cofactor 210
Cofactor expansion along the first row 210
Cramer's rule 224
Determinant of a 2 × 2 matrix 200
Determinant of a matrix 210
Left-handed coordinate system 203
n-linear function 238
Orientation of an ordered basis 202
Parallelepiped, volume of 226
Parallelogram determined by two vectors 203
Right-handed coordinate system 202


5  Diagonalization

5.1 Eigenvalues and Eigenvectors
5.2 Diagonalizability
5.3* Matrix Limits and Markov Chains
5.4 Invariant Subspaces and the Cayley-Hamilton Theorem

This chapter is concerned with the so-called diagonalization problem. For a given linear operator T on a finite-dimensional vector space V, we seek answers to the following questions.

1. Does there exist an ordered basis β for V such that [T]β is a diagonal matrix?

2. If such a basis exists, how can it be found?

Since computations involving diagonal matrices are simple, an affirmative answer to question 1 leads us to a clearer understanding of how the operator T acts on V, and an answer to question 2 enables us to obtain easy solutions to many practical problems that can be formulated in a linear algebra context. We consider some of these problems and their solutions in this chapter; see, for example, Section 5.3.

A solution to the diagonalization problem leads naturally to the concepts of eigenvalue and eigenvector. Aside from the important role that these concepts play in the diagonalization problem, they also prove to be useful tools in the study of many nondiagonalizable operators, as we will see in Chapter 7.

5.1 EIGENVALUES AND EIGENVECTORS

In Example 3 of Section 2.5, we were able to obtain a formula for the reflection of R2 about the line y = 2x. The key to our success was to find a basis β′ for which [T]β′ is a diagonal matrix. We now introduce the name for an operator or matrix that has such a basis.

Definitions. A linear operator T on a finite-dimensional vector space V is called diagonalizable if there is an ordered basis β for V such that [T]β


is a diagonal matrix. A square matrix A is called diagonalizable if LA is diagonalizable.

We want to determine when a linear operator T on a finite-dimensional vector space V is diagonalizable and, if so, how to obtain an ordered basis β = {v1, v2, . . . , vn} for V such that [T]β is a diagonal matrix. Note that, if D = [T]β is a diagonal matrix, then for each vector vj ∈ β, we have

T(vj) = Σᵢ₌₁ⁿ Dij vi = Djj vj = λj vj ,

where λj = Djj.

Conversely, if β = {v1, v2, . . . , vn} is an ordered basis for V such that T(vj) = λjvj for some scalars λ1, λ2, . . . , λn, then clearly

[T]β = ( λ1  0   · · ·  0
         0   λ2  · · ·  0
         ⋮    ⋮          ⋮
         0   0   · · ·  λn ) .

In the preceding paragraph, each vector v in the basis β satisfies the condition that T(v) = λv for some scalar λ. Moreover, because v lies in a basis, v is nonzero. These computations motivate the following definitions.

Definitions. Let T be a linear operator on a vector space V. A nonzero vector v ∈ V is called an eigenvector of T if there exists a scalar λ such that T(v) = λv. The scalar λ is called the eigenvalue corresponding to the eigenvector v.

Let A be in Mn×n(F ). A nonzero vector v ∈ Fn is called an eigenvector of A if v is an eigenvector of LA; that is, if Av = λv for some scalar λ. The scalar λ is called the eigenvalue of A corresponding to the eigenvector v.

The words characteristic vector and proper vector are also used in place of eigenvector. The corresponding terms for eigenvalue are characteristic value and proper value.

Note that a vector is an eigenvector of a matrix A if and only if it is an eigenvector of LA. Likewise, a scalar λ is an eigenvalue of A if and only if it is an eigenvalue of LA. Using the terminology of eigenvectors and eigenvalues, we can summarize the preceding discussion as follows.

Theorem 5.1. A linear operator T on a finite-dimensional vector space V is diagonalizable if and only if there exists an ordered basis β for V consisting of eigenvectors of T. Furthermore, if T is diagonalizable, β = {v1, v2, . . . , vn} is an ordered basis of eigenvectors of T, and D = [T]β , then D is a diagonal matrix and Djj is the eigenvalue corresponding to vj for 1 ≤ j ≤ n.


To diagonalize a matrix or a linear operator is to find a basis of eigenvectors and the corresponding eigenvalues.

Before continuing our study of the diagonalization problem, we consider three examples of eigenvalues and eigenvectors.

Example 1

Let

A = ( 1  3 ) ,    v1 = (  1 ) ,    and    v2 = ( 3 ) .
    ( 4  2 )           ( −1 )                  ( 4 )

Since

LA(v1) = ( 1  3 ) (  1 )  =  ( −2 )  =  −2 (  1 )  =  −2v1,
          ( 4  2 ) ( −1 )     (  2 )        ( −1 )

v1 is an eigenvector of LA, and hence of A. Here λ1 = −2 is the eigenvalue corresponding to v1. Furthermore,

LA(v2) = ( 1  3 ) ( 3 )  =  ( 15 )  =  5 ( 3 )  =  5v2,
          ( 4  2 ) ( 4 )     ( 20 )       ( 4 )

and so v2 is an eigenvector of LA, and hence of A, with the corresponding eigenvalue λ2 = 5. Note that β = {v1, v2} is an ordered basis for R2 consisting of eigenvectors of both A and LA, and therefore A and LA are diagonalizable. Moreover, by Theorem 5.1,

[LA]β = ( −2  0
           0  5 ) . ♦

Example 2

Let T be the linear operator on R2 that rotates each vector in the plane through an angle of π/2. It is clear geometrically that for any nonzero vector v, the vectors v and T(v) are not collinear; hence T(v) is not a multiple of v. Therefore T has no eigenvectors and, consequently, no eigenvalues. Thus there exist operators (and matrices) with no eigenvalues or eigenvectors. Of course, such operators and matrices are not diagonalizable. ♦

Example 3

Let C∞(R) denote the set of all functions f : R → R having derivatives of all orders. (Thus C∞(R) includes the polynomial functions, the sine and cosine functions, the exponential functions, etc.) Clearly, C∞(R) is a subspace of the vector space F(R, R) of all functions from R to R as defined in Section 1.2. Let T : C∞(R) → C∞(R) be the function defined by T(f) = f ′, the derivative of f . It is easily verified that T is a linear operator on C∞(R). We determine the eigenvalues and eigenvectors of T.


Suppose that f is an eigenvector of T with corresponding eigenvalue λ. Then f ′ = T(f) = λf . This is a first-order differential equation whose solutions are of the form f(t) = ce^(λt) for some constant c. Consequently, every real number λ is an eigenvalue of T, and λ corresponds to eigenvectors of the form ce^(λt) for c ≠ 0. Note that for λ = 0, the eigenvectors are the nonzero constant functions. ♦

In order to obtain a basis of eigenvectors for a matrix (or a linear operator), we need to be able to determine its eigenvalues and eigenvectors. The following theorem gives us a method for computing eigenvalues.

Theorem 5.2. Let A ∈ Mn×n(F ). Then a scalar λ is an eigenvalue of A if and only if det(A − λIn) = 0.

Proof. A scalar λ is an eigenvalue of A if and only if there exists a nonzero vector v ∈ Fn such that Av = λv, that is, (A − λIn)(v) = 0 . By Theorem 2.5 (p. 71), this is true if and only if A − λIn is not invertible. However, this result is equivalent to the statement that det(A − λIn) = 0.

Definition. Let A ∈ Mn×n(F ). The polynomial f(t) = det(A − tIn) is called the characteristic polynomial¹ of A.

Theorem 5.2 states that the eigenvalues of a matrix are the zeros of its characteristic polynomial. When determining the eigenvalues of a matrix or a linear operator, we normally compute its characteristic polynomial, as in the next example.

Example 4

To find the eigenvalues of

A = ( 1  1 ) ∈ M2×2(R),
    ( 4  1 )

we compute its characteristic polynomial:

det(A − tI2) = det( 1−t    1
                    4     1−t ) = t² − 2t − 3 = (t − 3)(t + 1).

It follows from Theorem 5.2 that the only eigenvalues of A are 3 and −1. ♦

¹The observant reader may have noticed that the entries of the matrix A − tIn are not scalars in the field F . They are, however, scalars in another field F (t), the field of quotients of polynomials in t with coefficients from F . Consequently, any results proved about determinants in Chapter 4 remain valid in this context.
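As a quick numerical cross-check of Example 4 (our own sketch, not part of the text), NumPy can produce both the characteristic polynomial and the eigenvalues. Note that np.poly(A) returns the coefficients of the monic polynomial det(tI − A); since det(A − tI) = (−1)ⁿ det(tI − A), the two agree here because n = 2.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [4.0, 1.0]])               # the matrix of Example 4

print(np.poly(A))                        # [ 1. -2. -3.], i.e. t^2 - 2t - 3
print(np.roots(np.poly(A)))              # its zeros, 3 and -1 (in some order)
print(np.linalg.eigvals(A))              # the eigenvalues computed directly
```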


It is easily shown that similar matrices have the same characteristic polynomial (see Exercise 12). This fact enables us to define the characteristic polynomial of a linear operator as follows.

Definition. Let T be a linear operator on an n-dimensional vector space V with ordered basis β. We define the characteristic polynomial f(t) of T to be the characteristic polynomial of A = [T]β . That is,

f(t) = det(A − tIn).

The remark preceding this definition shows that the definition is independent of the choice of ordered basis β. Thus if T is a linear operator on a finite-dimensional vector space V and β is an ordered basis for V, then λ is an eigenvalue of T if and only if λ is an eigenvalue of [T]β . We often denote the characteristic polynomial of an operator T by det(T − tI).

Example 5

Let T be the linear operator on P2(R) defined by T(f(x)) = f(x) + (x + 1)f ′(x), let β be the standard ordered basis for P2(R), and let A = [T]β . Then

A = ( 1  1  0
      0  2  2
      0  0  3 ) .

The characteristic polynomial of T is

det(A − tI3) = det( 1−t    1     0
                    0     2−t    2
                    0     0     3−t )
             = (1 − t)(2 − t)(3 − t)
             = −(t − 1)(t − 2)(t − 3).

Hence λ is an eigenvalue of T (or A) if and only if λ = 1, 2, or 3. ♦

Examples 4 and 5 suggest that the characteristic polynomial of an n × n matrix A is a polynomial of degree n. The next theorem tells us even more. It can be proved by a straightforward induction argument.

Theorem 5.3. Let A ∈ Mn×n(F ).
(a) The characteristic polynomial of A is a polynomial of degree n with leading coefficient (−1)ⁿ.
(b) A has at most n distinct eigenvalues.

Proof. Exercise.


Theorem 5.2 enables us to determine all the eigenvalues of a matrix or a linear operator on a finite-dimensional vector space provided that we can compute the zeros of the characteristic polynomial. Our next result gives us a procedure for determining the eigenvectors corresponding to a given eigenvalue.

Theorem 5.4. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. A vector v ∈ V is an eigenvector of T corresponding to λ if and only if v ≠ 0 and v ∈ N(T − λI).

Proof. Exercise.

Example 6

To find all the eigenvectors of the matrix

A = ( 1  1
      4  1 )

in Example 4, recall that A has two eigenvalues, λ1 = 3 and λ2 = −1. We begin by finding all the eigenvectors corresponding to λ1 = 3. Let

B1 = A − λ1I = ( 1  1    −    ( 3  0    =    ( −2   1
                 4  1 )         0  3 )          4  −2 ) .

Then

x = ( x1
      x2 ) ∈ R2

is an eigenvector corresponding to λ1 = 3 if and only if x ≠ 0 and x ∈ N(LB1); that is, x ≠ 0 and

( −2   1    ( x1    =    ( −2x1 + x2    =    ( 0
   4  −2 )    x2 )          4x1 − 2x2 )        0 ) .

Clearly the set of all solutions to this equation is

{ t ( 1
      2 ) : t ∈ R } .

Hence x is an eigenvector corresponding to λ1 = 3 if and only if

x = t ( 1
        2 )    for some t ≠ 0.

Now suppose that x is an eigenvector of A corresponding to λ2 = −1. Let

B2 = A − λ2I = ( 1  1    −    ( −1   0    =    ( 2  1
                 4  1 )          0  −1 )         4  2 ) .


Then

x = ( x1
      x2 ) ∈ N(LB2)

if and only if x is a solution to the system

2x1 +  x2 = 0
4x1 + 2x2 = 0.

Hence

N(LB2) = { t (  1
               −2 ) : t ∈ R } .

Thus x is an eigenvector corresponding to λ2 = −1 if and only if

x = t (  1
        −2 )    for some t ≠ 0.

Observe that

{ ( 1       (  1
    2 ) ,     −2 ) }

is a basis for R2 consisting of eigenvectors of A. Thus LA, and hence A, is diagonalizable. ♦

Suppose that β is a basis for Fn consisting of eigenvectors of A. The corollary to Theorem 2.23 assures us that if Q is the n × n matrix whose columns are the vectors in β, then Q⁻¹AQ is a diagonal matrix. In Example 6, for instance, if

Q = ( 1   1
      2  −2 ) ,

then

Q⁻¹AQ = ( 3   0
          0  −1 ) .

Of course, the diagonal entries of this matrix are the eigenvalues of A that correspond to the respective columns of Q.
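The computation in Example 6 can be verified numerically. In the sketch below (ours; NumPy), the columns of Q are the eigenvectors (1, 2) and (1, −2) found above, and Q⁻¹AQ comes out diagonal with the eigenvalues 3 and −1 on the diagonal.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [4.0, 1.0]])
Q = np.array([[1.0, 1.0],
              [2.0, -2.0]])              # columns: the eigenvectors (1, 2) and (1, -2)

D = np.linalg.inv(Q) @ A @ Q
print(np.round(D, 10))                   # [[ 3.  0.] [ 0. -1.]]
print(np.allclose(A @ Q, Q @ np.diag([3.0, -1.0])))   # True: A v_j = lambda_j v_j
```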

To find the eigenvectors of a linear operator T on an n-dimensional vector space, select an ordered basis β for V and let A = [T]β . Figure 5.1 is the special case of Figure 2.2 in Section 2.4 in which V = W and β = γ. Recall that for v ∈ V, φβ(v) = [v]β , the coordinate vector of v relative to β. We show that v ∈ V is an eigenvector of T corresponding to λ if and only if φβ(v)


          T
   V ---------→ V
   |            |
 φβ|            |φβ
   ↓            ↓
   Fn --------→ Fn
          LA

        Figure 5.1

is an eigenvector of A corresponding to λ. Suppose that v is an eigenvector of T corresponding to λ. Then T(v) = λv. Hence

Aφβ(v) = LAφβ(v) = φβT(v) = φβ(λv) = λφβ(v).

Now φβ(v) ≠ 0 , since φβ is an isomorphism; hence φβ(v) is an eigenvector of A. This argument is reversible, and so we can establish that if φβ(v) is an eigenvector of A corresponding to λ, then v is an eigenvector of T corresponding to λ. (See Exercise 13.)

An equivalent formulation of the result discussed in the preceding paragraph is that for an eigenvalue λ of A (and hence of T), a vector y ∈ Fn is an eigenvector of A corresponding to λ if and only if φβ⁻¹(y) is an eigenvector of T corresponding to λ.

Thus we have reduced the problem of finding the eigenvectors of a linear operator on a finite-dimensional vector space to the problem of finding the eigenvectors of a matrix. The next example illustrates this procedure.

Example 7

Let T be the linear operator on P2(R) defined in Example 5, and let β be the standard ordered basis for P2(R). Recall that T has eigenvalues 1, 2, and 3 and that
\[ A = [T]_\beta = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}. \]
We consider each eigenvalue separately.

Let λ1 = 1, and define
\[ B_1 = A - \lambda_1 I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 2 \end{pmatrix}. \]
Then
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in R^3 \]
is an eigenvector corresponding to λ1 = 1 if and only if x ≠ 0 and x ∈ N(L_{B_1}); that is, x is a nonzero solution to the system
\[ \begin{aligned} x_2 &= 0 \\ x_2 + 2x_3 &= 0 \\ 2x_3 &= 0. \end{aligned} \]
Notice that this system has three unknowns, x1, x2, and x3, but one of these, x1, does not actually appear in the system. Since the values of x1 do not affect the system, we assign x1 a parametric value, say x1 = a, and solve the system for x2 and x3. Clearly, x2 = x3 = 0, and so the eigenvectors of A corresponding to λ1 = 1 are of the form
\[ a\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = a e_1 \]
for a ≠ 0. Consequently, the eigenvectors of T corresponding to λ1 = 1 are of the form
\[ \varphi_\beta^{-1}(a e_1) = a\varphi_\beta^{-1}(e_1) = a\cdot 1 = a \]
for any a ≠ 0. Hence the nonzero constant polynomials are the eigenvectors of T corresponding to λ1 = 1.

Next let λ2 = 2, and define
\[ B_2 = A - \lambda_2 I = \begin{pmatrix} -1 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 1 \end{pmatrix}. \]
It is easily verified that
\[ N(L_{B_2}) = \left\{ a\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} : a \in R \right\}, \]
and hence the eigenvectors of T corresponding to λ2 = 2 are of the form
\[ \varphi_\beta^{-1}\!\left( a\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \right) = a\varphi_\beta^{-1}(e_1 + e_2) = a(1 + x) \]
for a ≠ 0.

Finally, consider λ3 = 3 and
\[ B_3 = A - \lambda_3 I = \begin{pmatrix} -2 & 1 & 0 \\ 0 & -1 & 2 \\ 0 & 0 & 0 \end{pmatrix}. \]
Since
\[ N(L_{B_3}) = \left\{ a\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} : a \in R \right\}, \]
the eigenvectors of T corresponding to λ3 = 3 are of the form
\[ \varphi_\beta^{-1}\!\left( a\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \right) = a\varphi_\beta^{-1}(e_1 + 2e_2 + e_3) = a(1 + 2x + x^2) \]
for a ≠ 0.

For each eigenvalue, select the corresponding eigenvector with a = 1 in the preceding descriptions to obtain γ = {1, 1 + x, 1 + 2x + x²}, which is an ordered basis for P2(R) consisting of eigenvectors of T. Thus T is diagonalizable, and
\[ [T]_\gamma = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}. \]
♦
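The reduction from the operator T to the matrix A can also be checked numerically. The following sketch (an added illustration, assuming NumPy) verifies that the coordinate vectors of 1, 1 + x, and 1 + 2x + x² relative to the standard basis are indeed eigenvectors of [T]β with eigenvalues 1, 2, and 3.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 3.0]])

# Coordinate vectors (relative to {1, x, x^2}) of the eigenvectors found above.
candidates = {1: np.array([1.0, 0.0, 0.0]),   # the polynomial 1
              2: np.array([1.0, 1.0, 0.0]),   # the polynomial 1 + x
              3: np.array([1.0, 2.0, 1.0])}   # the polynomial 1 + 2x + x^2

for lam, v in candidates.items():
    # A v should equal lam * v for each claimed eigenvector.
    assert np.allclose(A @ v, lam * v)

print("all three coordinate vectors are eigenvectors of [T]_beta")
```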

We close this section with a geometric description of how a linear operator T acts on an eigenvector in the context of a vector space V over R. Let v be an eigenvector of T and λ be the corresponding eigenvalue. We can think of W = span({v}), the one-dimensional subspace of V spanned by v, as a line in V that passes through 0 and v. For any w ∈ W, w = cv for some scalar c, and hence
\[ T(w) = T(cv) = cT(v) = c\lambda v = \lambda w; \]
so T acts on the vectors in W by multiplying each such vector by λ. There are several possible ways for T to act on the vectors in W, depending on the value of λ. We consider several cases. (See Figure 5.2.)

Case 1. If λ > 1, then T moves vectors in W farther from 0 by a factor of λ.

Case 2. If λ = 1, then T acts as the identity operator on W.

Case 3. If 0 < λ < 1, then T moves vectors in W closer to 0 by a factor of λ.

Case 4. If λ = 0, then T acts as the zero transformation on W.

Case 5. If λ < 0, then T reverses the orientation of W; that is, T moves vectors in W from one side of 0 to the other.


[Figure 5.2 (five panels along the line W through 0, one for each case): Case 1: λ > 1; Case 2: λ = 1; Case 3: 0 < λ < 1; Case 4: λ = 0; Case 5: λ < 0. Caption: The action of T on W = span({x}) when x is an eigenvector of T.]

To illustrate these ideas, we consider the linear operators in Examples 3, 4, and 2 of Section 2.1.

For the operator T on R2 defined by T(a1, a2) = (a1, −a2), the reflection about the x-axis, e1 and e2 are eigenvectors of T with corresponding eigenvalues 1 and −1, respectively. Since e1 and e2 span the x-axis and the y-axis, respectively, T acts as the identity on the x-axis and reverses the orientation of the y-axis.

For the operator T on R2 defined by T(a1, a2) = (a1, 0), the projection on the x-axis, e1 and e2 are eigenvectors of T with corresponding eigenvalues 1 and 0, respectively. Thus, T acts as the identity on the x-axis and as the zero operator on the y-axis.

Finally, we generalize Example 2 of this section by considering the operator that rotates the plane through the angle θ, which is defined by
\[ T_\theta(a_1, a_2) = (a_1\cos\theta - a_2\sin\theta,\ a_1\sin\theta + a_2\cos\theta). \]
Suppose that 0 < θ < π. Then for any nonzero vector v, the vectors v and Tθ(v) are not collinear, and hence Tθ maps no one-dimensional subspace of R2 into itself. But this implies that Tθ has no eigenvectors and therefore no eigenvalues. To confirm this conclusion, we note that the characteristic polynomial of Tθ is
\[ \det(T_\theta - tI) = \det\begin{pmatrix} \cos\theta - t & -\sin\theta \\ \sin\theta & \cos\theta - t \end{pmatrix} = t^2 - (2\cos\theta)t + 1, \]
which has no real zeros because, for 0 < θ < π, the discriminant 4cos²θ − 4 is negative.
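As a numerical aside (added here, assuming NumPy), asking a computer for the eigenvalues of the rotation matrix makes the same point: over C they are cos θ ± i sin θ, and neither is real when 0 < θ < π.

```python
import numpy as np

theta = np.pi / 3            # any angle with 0 < theta < pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# The eigenvalues are the complex numbers e^{±i theta}; neither is real,
# so the rotation has no real eigenvalues (and hence no eigenvectors in R^2).
vals, vecs = np.linalg.eig(R)
print(vals)                  # approximately [0.5+0.866j, 0.5-0.866j]
```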

EXERCISES

1. Label the following statements as true or false.

(a) Every linear operator on an n-dimensional vector space has n distinct eigenvalues.
(b) If a real matrix has one eigenvector, then it has an infinite number of eigenvectors.
(c) There exists a square matrix with no eigenvectors.
(d) Eigenvalues must be nonzero scalars.
(e) Any two eigenvectors are linearly independent.
(f) The sum of two eigenvalues of a linear operator T is also an eigenvalue of T.
(g) Linear operators on infinite-dimensional vector spaces never have eigenvalues.
(h) An n × n matrix A with entries from a field F is similar to a diagonal matrix if and only if there is a basis for Fn consisting of eigenvectors of A.
(i) Similar matrices always have the same eigenvalues.
(j) Similar matrices always have the same eigenvectors.
(k) The sum of two eigenvectors of an operator T is always an eigenvector of T.

2. For each of the following linear operators T on a vector space V and ordered bases β, compute [T]β, and determine whether β is a basis consisting of eigenvectors of T.

(a) V = R2, \( T\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 10a - 6b \\ 17a - 10b \end{pmatrix} \), and β = \( \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 3 \end{pmatrix} \right\} \)

(b) V = P1(R), T(a + bx) = (6a − 6b) + (12a − 11b)x, and β = {3 + 4x, 2 + 3x}

(c) V = R3, \( T\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 3a + 2b - 2c \\ -4a - 3b + 2c \\ -c \end{pmatrix} \), and β = \( \left\{ \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} \right\} \)

(d) V = P2(R), T(a + bx + cx²) = (−4a + 2b − 2c) − (7a + 3b + 7c)x + (7a + b + 5c)x², and β = {x − x², −1 + x², −1 − x + x²}

(e) V = P3(R), T(a + bx + cx² + dx³) = −d + (−c + d)x + (a + b − 2c)x² + (−b + c − 2d)x³, and β = {1 − x + x³, 1 + x², 1, x + x²}

(f) V = M2×2(R), \( T\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} -7a - 4b + 4c - 4d & b \\ -8a - 4b + 5c - 4d & d \end{pmatrix} \), and β = \( \left\{ \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} -1 & 2 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 2 & 0 \end{pmatrix}, \begin{pmatrix} -1 & 0 \\ 0 & 2 \end{pmatrix} \right\} \)

3. For each of the following matrices A ∈ Mn×n(F),

(i) Determine all the eigenvalues of A.
(ii) For each eigenvalue λ of A, find the set of eigenvectors corresponding to λ.
(iii) If possible, find a basis for Fn consisting of eigenvectors of A.
(iv) If successful in finding such a basis, determine an invertible matrix Q and a diagonal matrix D such that Q⁻¹AQ = D.

(a) \( A = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} \) for F = R

(b) \( A = \begin{pmatrix} 0 & -2 & -3 \\ -1 & 1 & -1 \\ 2 & 2 & 5 \end{pmatrix} \) for F = R

(c) \( A = \begin{pmatrix} i & 1 \\ 2 & -i \end{pmatrix} \) for F = C

(d) \( A = \begin{pmatrix} 2 & 0 & -1 \\ 4 & 1 & -4 \\ 2 & 0 & -1 \end{pmatrix} \) for F = R

4. For each linear operator T on V, find the eigenvalues of T and an ordered basis β for V such that [T]β is a diagonal matrix.

(a) V = R2 and T(a, b) = (−2a + 3b, −10a + 9b)
(b) V = R3 and T(a, b, c) = (7a − 4b + 10c, 4a − 3b + 8c, −2a + b − 2c)
(c) V = R3 and T(a, b, c) = (−4a + 3b − 6c, 6a − 7b + 12c, 6a − 6b + 11c)
(d) V = P1(R) and T(ax + b) = (−6a + 2b)x + (−6a + b)
(e) V = P2(R) and T(f(x)) = xf′(x) + f(2)x + f(3)
(f) V = P3(R) and T(f(x)) = f(x) + f(2)x
(g) V = P3(R) and T(f(x)) = xf′(x) + f″(x) − f(2)
(h) V = M2×2(R) and \( T\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} d & b \\ c & a \end{pmatrix} \)
(i) V = M2×2(R) and \( T\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} c & d \\ a & b \end{pmatrix} \)
(j) V = M2×2(R) and T(A) = Aᵗ + 2 · tr(A) · I2

5. Prove Theorem 5.4.

6. Let T be a linear operator on a finite-dimensional vector space V, and let β be an ordered basis for V. Prove that λ is an eigenvalue of T if and only if λ is an eigenvalue of [T]β.

7. Let T be a linear operator on a finite-dimensional vector space V. We define the determinant of T, denoted det(T), as follows: Choose any ordered basis β for V, and define det(T) = det([T]β).
(a) Prove that the preceding definition is independent of the choice of an ordered basis for V. That is, prove that if β and γ are two ordered bases for V, then det([T]β) = det([T]γ).
(b) Prove that T is invertible if and only if det(T) ≠ 0.
(c) Prove that if T is invertible, then det(T⁻¹) = [det(T)]⁻¹.
(d) Prove that if U is also a linear operator on V, then det(TU) = det(T) · det(U).
(e) Prove that det(T − λIV) = det([T]β − λI) for any scalar λ and any ordered basis β for V.

8. (a) Prove that a linear operator T on a finite-dimensional vector space is invertible if and only if zero is not an eigenvalue of T.
(b) Let T be an invertible linear operator. Prove that a scalar λ is an eigenvalue of T if and only if λ⁻¹ is an eigenvalue of T⁻¹.
(c) State and prove results analogous to (a) and (b) for matrices.

9. Prove that the eigenvalues of an upper triangular matrix M are the diagonal entries of M.

10. Let V be a finite-dimensional vector space, and let λ be any scalar.
(a) For any ordered basis β for V, prove that [λIV]β = λI.
(b) Compute the characteristic polynomial of λIV.
(c) Show that λIV is diagonalizable and has only one eigenvalue.

11. A scalar matrix is a square matrix of the form λI for some scalar λ; that is, a scalar matrix is a diagonal matrix in which all the diagonal entries are equal.
(a) Prove that if a square matrix A is similar to a scalar matrix λI, then A = λI.
(b) Show that a diagonalizable matrix having only one eigenvalue is a scalar matrix.


(c) Prove that \( \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \) is not diagonalizable.

12. (a) Prove that similar matrices have the same characteristic polynomial.
(b) Show that the definition of the characteristic polynomial of a linear operator on a finite-dimensional vector space V is independent of the choice of basis for V.

13. Let T be a linear operator on a finite-dimensional vector space V over a field F, let β be an ordered basis for V, and let A = [T]β. In reference to Figure 5.1, prove the following.
(a) If v ∈ V and φβ(v) is an eigenvector of A corresponding to the eigenvalue λ, then v is an eigenvector of T corresponding to λ.
(b) If λ is an eigenvalue of A (and hence of T), then a vector y ∈ Fn is an eigenvector of A corresponding to λ if and only if φβ⁻¹(y) is an eigenvector of T corresponding to λ.

14.† For any square matrix A, prove that A and Aᵗ have the same characteristic polynomial (and hence the same eigenvalues).

15.† (a) Let T be a linear operator on a vector space V, and let x be an eigenvector of T corresponding to the eigenvalue λ. For any positive integer m, prove that x is an eigenvector of Tᵐ corresponding to the eigenvalue λᵐ.
(b) State and prove the analogous result for matrices.

16. (a) Prove that similar matrices have the same trace. Hint: Use Exercise 13 of Section 2.3.
(b) How would you define the trace of a linear operator on a finite-dimensional vector space? Justify that your definition is well-defined.

17. Let T be the linear operator on Mn×n(R) defined by T(A) = Aᵗ.
(a) Show that ±1 are the only eigenvalues of T.
(b) Describe the eigenvectors corresponding to each eigenvalue of T.
(c) Find an ordered basis β for M2×2(R) such that [T]β is a diagonal matrix.
(d) Find an ordered basis β for Mn×n(R) such that [T]β is a diagonal matrix for n > 2.

18. Let A, B ∈ Mn×n(C).
(a) Prove that if B is invertible, then there exists a scalar c ∈ C such that A + cB is not invertible. Hint: Examine det(A + cB).


(b) Find nonzero 2 × 2 matrices A and B such that both A and A + cB are invertible for all c ∈ C.

19.† Let A and B be similar n × n matrices. Prove that there exists an n-dimensional vector space V, a linear operator T on V, and ordered bases β and γ for V such that A = [T]β and B = [T]γ. Hint: Use Exercise 14 of Section 2.5.

20. Let A be an n × n matrix with characteristic polynomial
\[ f(t) = (-1)^n t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0. \]
Prove that f(0) = a0 = det(A). Deduce that A is invertible if and only if a0 ≠ 0.

21. Let A and f(t) be as in Exercise 20.
(a) Prove that f(t) = (A11 − t)(A22 − t) · · · (Ann − t) + q(t), where q(t) is a polynomial of degree at most n − 2. Hint: Apply mathematical induction to n.
(b) Show that tr(A) = (−1)ⁿ⁻¹a_{n−1}.

22.† (a) Let T be a linear operator on a vector space V over the field F, and let g(t) be a polynomial with coefficients from F. Prove that if x is an eigenvector of T with corresponding eigenvalue λ, then g(T)(x) = g(λ)x. That is, x is an eigenvector of g(T) with corresponding eigenvalue g(λ).
(b) State and prove a comparable result for matrices.
(c) Verify (b) for the matrix A in Exercise 3(a) with polynomial g(t) = 2t² − t + 1, eigenvector \( x = \begin{pmatrix} 2 \\ 3 \end{pmatrix} \), and corresponding eigenvalue λ = 4.

23. Use Exercise 22 to prove that if f(t) is the characteristic polynomial of a diagonalizable linear operator T, then f(T) = T0, the zero operator. (In Section 5.4 we prove that this result does not depend on the diagonalizability of T.)

24. Use Exercise 21(a) to prove Theorem 5.3.

25. Prove Corollaries 1 and 2 of Theorem 5.3.

26. Determine the number of distinct characteristic polynomials of matrices in M2×2(Z2).


5.2 DIAGONALIZABILITY

In Section 5.1, we presented the diagonalization problem and observed that not all linear operators or matrices are diagonalizable. Although we are able to diagonalize operators and matrices and even obtain a necessary and sufficient condition for diagonalizability (Theorem 5.1 p. 246), we have not yet solved the diagonalization problem. What is still needed is a simple test to determine whether an operator or a matrix can be diagonalized, as well as a method for actually finding a basis of eigenvectors. In this section, we develop such a test and method.

In Example 6 of Section 5.1, we obtained a basis of eigenvectors by choosing one eigenvector corresponding to each eigenvalue. In general, such a procedure does not yield a basis, but the following theorem shows that any set constructed in this manner is linearly independent.

Theorem 5.5. Let T be a linear operator on a vector space V, and let λ1, λ2, . . . , λk be distinct eigenvalues of T. If v1, v2, . . . , vk are eigenvectors of T such that λi corresponds to vi (1 ≤ i ≤ k), then {v1, v2, . . . , vk} is linearly independent.

Proof. The proof is by mathematical induction on k. Suppose that k = 1. Then v1 ≠ 0 since v1 is an eigenvector, and hence {v1} is linearly independent. Now assume that the theorem holds for k − 1 distinct eigenvalues, where k − 1 ≥ 1, and that we have k eigenvectors v1, v2, . . . , vk corresponding to the distinct eigenvalues λ1, λ2, . . . , λk. We wish to show that {v1, v2, . . . , vk} is linearly independent. Suppose that a1, a2, . . . , ak are scalars such that
\[ a_1v_1 + a_2v_2 + \cdots + a_kv_k = 0. \tag{1} \]
Applying T − λkI to both sides of (1), we obtain
\[ a_1(\lambda_1 - \lambda_k)v_1 + a_2(\lambda_2 - \lambda_k)v_2 + \cdots + a_{k-1}(\lambda_{k-1} - \lambda_k)v_{k-1} = 0. \]
By the induction hypothesis {v1, v2, . . . , vk−1} is linearly independent, and hence
\[ a_1(\lambda_1 - \lambda_k) = a_2(\lambda_2 - \lambda_k) = \cdots = a_{k-1}(\lambda_{k-1} - \lambda_k) = 0. \]
Since λ1, λ2, . . . , λk are distinct, it follows that λi − λk ≠ 0 for 1 ≤ i ≤ k − 1. So a1 = a2 = · · · = ak−1 = 0, and (1) therefore reduces to akvk = 0. But vk ≠ 0 and therefore ak = 0. Consequently a1 = a2 = · · · = ak = 0, and it follows that {v1, v2, . . . , vk} is linearly independent.

Corollary. Let T be a linear operator on an n-dimensional vector space V. If T has n distinct eigenvalues, then T is diagonalizable.

Proof. Suppose that T has n distinct eigenvalues λ1, . . . , λn. For each i choose an eigenvector vi corresponding to λi. By Theorem 5.5, {v1, . . . , vn} is linearly independent, and since dim(V) = n, this set is a basis for V. Thus, by Theorem 5.1 (p. 246), T is diagonalizable.

Example 1

Let
\[ A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \in M_{2\times 2}(R). \]
The characteristic polynomial of A (and hence of LA) is
\[ \det(A - tI) = \det\begin{pmatrix} 1-t & 1 \\ 1 & 1-t \end{pmatrix} = t(t-2), \]
and thus the eigenvalues of LA are 0 and 2. Since LA is a linear operator on the two-dimensional vector space R2, we conclude from the preceding corollary that LA (and hence A) is diagonalizable. ♦

The converse of Theorem 5.5 is false. That is, it is not true that if T is diagonalizable, then it has n distinct eigenvalues. For example, the identity operator is diagonalizable even though it has only one eigenvalue, namely, λ = 1.

We have seen that diagonalizability requires the existence of eigenvalues. Actually, diagonalizability imposes a stronger condition on the characteristic polynomial.

Definition. A polynomial f(t) in P(F) splits over F if there are scalars c, a1, . . . , an (not necessarily distinct) in F such that
\[ f(t) = c(t - a_1)(t - a_2)\cdots(t - a_n). \]
For example, t² − 1 = (t + 1)(t − 1) splits over R, but (t² + 1)(t − 2) does not split over R because t² + 1 cannot be factored into a product of linear factors. However, (t² + 1)(t − 2) does split over C because it factors into the product (t + i)(t − i)(t − 2). If f(t) is the characteristic polynomial of a linear operator or a matrix over a field F, then the statement that f(t) splits is understood to mean that it splits over F.

Theorem 5.6. The characteristic polynomial of any diagonalizable linear operator splits.

Proof. Let T be a diagonalizable linear operator on the n-dimensional vector space V, and let β be an ordered basis for V such that [T]β = D is a diagonal matrix. Suppose that
\[ D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}, \]
and let f(t) be the characteristic polynomial of T. Then
\[ f(t) = \det(D - tI) = \det\begin{pmatrix} \lambda_1 - t & 0 & \cdots & 0 \\ 0 & \lambda_2 - t & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n - t \end{pmatrix} = (\lambda_1 - t)(\lambda_2 - t)\cdots(\lambda_n - t) = (-1)^n(t - \lambda_1)(t - \lambda_2)\cdots(t - \lambda_n). \]

From this theorem, it is clear that if T is a diagonalizable linear operator on an n-dimensional vector space that fails to have distinct eigenvalues, then the characteristic polynomial of T must have repeated zeros.

The converse of Theorem 5.6 is false; that is, the characteristic polynomial of T may split, but T need not be diagonalizable. (See Example 3, which follows.) The following concept helps us determine when an operator whose characteristic polynomial splits is diagonalizable.

Definition. Let λ be an eigenvalue of a linear operator or matrix with characteristic polynomial f(t). The (algebraic) multiplicity of λ is the largest positive integer k for which (t − λ)ᵏ is a factor of f(t).

Example 2

Let
\[ A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 4 \\ 0 & 0 & 4 \end{pmatrix}, \]
which has characteristic polynomial f(t) = −(t − 3)²(t − 4). Hence λ = 3 is an eigenvalue of A with multiplicity 2, and λ = 4 is an eigenvalue of A with multiplicity 1. ♦
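A rough numerical check of multiplicities (an added sketch, assuming NumPy) is to look at the characteristic polynomial of A; np.poly returns the coefficients of det(tI − A), which differs from the text's det(A − tI) by the sign (−1)ⁿ, and np.roots exhibits the repeated zero.

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 4.0]])

coeffs = np.poly(A)       # coefficients of det(tI - A): t^3 - 10t^2 + 33t - 36
print(coeffs)             # [  1. -10.  33. -36.]
print(np.roots(coeffs))   # approximately [4., 3., 3.]  (3 occurs twice)
```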

If T is a diagonalizable linear operator on a finite-dimensional vector space V, then there is an ordered basis β for V consisting of eigenvectors of T. We know from Theorem 5.1 (p. 246) that [T]β is a diagonal matrix in which the diagonal entries are the eigenvalues of T. Since the characteristic polynomial of T is det([T]β − tI), it is easily seen that each eigenvalue of T must occur as a diagonal entry of [T]β exactly as many times as its multiplicity. Hence β contains as many (linearly independent) eigenvectors corresponding to an eigenvalue as the multiplicity of that eigenvalue. So the number of linearly independent eigenvectors corresponding to a given eigenvalue is of interest in determining whether an operator can be diagonalized. Recalling from Theorem 5.4 (p. 250) that the eigenvectors of T corresponding to the eigenvalue λ are the nonzero vectors in the null space of T − λI, we are led naturally to the study of this set.

Definition. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. Define Eλ = {x ∈ V : T(x) = λx} = N(T − λIV). The set Eλ is called the eigenspace of T corresponding to the eigenvalue λ. Analogously, we define the eigenspace of a square matrix A to be the eigenspace of LA.

Clearly, Eλ is a subspace of V consisting of the zero vector and the eigenvectors of T corresponding to the eigenvalue λ. The maximum number of linearly independent eigenvectors of T corresponding to the eigenvalue λ is therefore the dimension of Eλ. Our next result relates this dimension to the multiplicity of λ.

Theorem 5.7. Let T be a linear operator on a finite-dimensional vector space V, and let λ be an eigenvalue of T having multiplicity m. Then 1 ≤ dim(Eλ) ≤ m.

Proof. Choose an ordered basis {v1, v2, . . . , vp} for Eλ, extend it to an ordered basis β = {v1, v2, . . . , vp, vp+1, . . . , vn} for V, and let A = [T]β. Observe that vi (1 ≤ i ≤ p) is an eigenvector of T corresponding to λ, and therefore
\[ A = \begin{pmatrix} \lambda I_p & B \\ O & C \end{pmatrix}. \]
By Exercise 21 of Section 4.3, the characteristic polynomial of T is
\[ f(t) = \det(A - tI_n) = \det\begin{pmatrix} (\lambda - t)I_p & B \\ O & C - tI_{n-p} \end{pmatrix} = \det((\lambda - t)I_p)\,\det(C - tI_{n-p}) = (\lambda - t)^p g(t), \]
where g(t) is a polynomial. Thus (λ − t)ᵖ is a factor of f(t), and hence the multiplicity of λ is at least p. But dim(Eλ) = p, and so dim(Eλ) ≤ m.

Example 3

Let T be the linear operator on P2(R) defined by T(f(x)) = f′(x). The matrix representation of T with respect to the standard ordered basis β for P2(R) is
\[ [T]_\beta = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}. \]
Consequently, the characteristic polynomial of T is
\[ \det([T]_\beta - tI) = \det\begin{pmatrix} -t & 1 & 0 \\ 0 & -t & 2 \\ 0 & 0 & -t \end{pmatrix} = -t^3. \]
Thus T has only one eigenvalue (λ = 0) with multiplicity 3. Solving T(f(x)) = f′(x) = 0 shows that Eλ = N(T − λI) = N(T) is the subspace of P2(R) consisting of the constant polynomials. So {1} is a basis for Eλ, and therefore dim(Eλ) = 1. Consequently, there is no basis for P2(R) consisting of eigenvectors of T, and therefore T is not diagonalizable. ♦
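In matrix terms the failure can be read off from a rank computation alone, the same computation that reappears in the test for diagonalization later in this section. The sketch below (added, assuming NumPy) shows that dim(E0) = 3 − rank([T]β − 0·I) = 1, which is smaller than the multiplicity 3.

```python
import numpy as np

B = np.array([[0.0, 1.0, 0.0],     # [T]_beta for T(f) = f' on P_2(R)
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

lam = 0.0                          # the only eigenvalue, multiplicity 3
dim_eigenspace = 3 - np.linalg.matrix_rank(B - lam * np.eye(3))
print(dim_eigenspace)              # 1, so no basis of eigenvectors exists
```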

Example 4

Let T be the linear operator on R3 defined by
\[ T\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} 4a_1 + a_3 \\ 2a_1 + 3a_2 + 2a_3 \\ a_1 + 4a_3 \end{pmatrix}. \]
We determine the eigenspace of T corresponding to each eigenvalue. Let β be the standard ordered basis for R3. Then
\[ [T]_\beta = \begin{pmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{pmatrix}, \]
and hence the characteristic polynomial of T is
\[ \det([T]_\beta - tI) = \det\begin{pmatrix} 4-t & 0 & 1 \\ 2 & 3-t & 2 \\ 1 & 0 & 4-t \end{pmatrix} = -(t-5)(t-3)^2. \]
So the eigenvalues of T are λ1 = 5 and λ2 = 3 with multiplicities 1 and 2, respectively.

Since
\[ E_{\lambda_1} = N(T - \lambda_1 I) = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in R^3 : \begin{pmatrix} -1 & 0 & 1 \\ 2 & -2 & 2 \\ 1 & 0 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \right\}, \]
Eλ1 is the solution space of the system of linear equations
\[ \begin{aligned} -x_1 + x_3 &= 0 \\ 2x_1 - 2x_2 + 2x_3 &= 0 \\ x_1 - x_3 &= 0. \end{aligned} \]
It is easily seen (using the techniques of Chapter 3) that
\[ \left\{ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \right\} \]
is a basis for Eλ1. Hence dim(Eλ1) = 1.

Similarly, Eλ2 = N(T − λ2I) is the solution space of the system
\[ \begin{aligned} x_1 + x_3 &= 0 \\ 2x_1 + 2x_3 &= 0 \\ x_1 + x_3 &= 0. \end{aligned} \]
Since the unknown x2 does not appear in this system, we assign it a parametric value, say x2 = s, and solve the system for x1 and x3, introducing another parameter t. The result is the general solution to the system
\[ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = s\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \quad \text{for } s, t \in R. \]
It follows that
\[ \left\{ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \right\} \]
is a basis for Eλ2, and dim(Eλ2) = 2.

In this case, the multiplicity of each eigenvalue λi is equal to the dimension of the corresponding eigenspace Eλi. Observe that the union of the two bases just derived, namely,
\[ \left\{ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \right\}, \]
is linearly independent and hence is a basis for R3 consisting of eigenvectors of T. Consequently, T is diagonalizable. ♦
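The same rank computation confirms Example 4 numerically (an added sketch, assuming NumPy): dim(E5) = 1 and dim(E3) = 2, matching the multiplicities.

```python
import numpy as np

M = np.array([[4.0, 0.0, 1.0],     # [T]_beta from Example 4
              [2.0, 3.0, 2.0],
              [1.0, 0.0, 4.0]])

for lam, multiplicity in [(5.0, 1), (3.0, 2)]:
    dim = 3 - np.linalg.matrix_rank(M - lam * np.eye(3))
    print(lam, dim, dim == multiplicity)   # 5.0 1 True, then 3.0 2 True
```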


Examples 3 and 4 suggest that an operator whose characteristic polynomial splits is diagonalizable if and only if the dimension of each eigenspace is equal to the multiplicity of the corresponding eigenvalue. This is indeed true, as we now show. We begin with the following lemma, which is a slight variation of Theorem 5.5.

Lemma. Let T be a linear operator, and let λ1, λ2, . . . , λk be distinct eigenvalues of T. For each i = 1, 2, . . . , k, let vi ∈ Eλi, the eigenspace corresponding to λi. If
\[ v_1 + v_2 + \cdots + v_k = 0, \]
then vi = 0 for all i.

Proof. Suppose otherwise. By renumbering if necessary, suppose that, for 1 ≤ m ≤ k, we have vi ≠ 0 for 1 ≤ i ≤ m, and vi = 0 for i > m. Then, for each i ≤ m, vi is an eigenvector of T corresponding to λi and
\[ v_1 + v_2 + \cdots + v_m = 0. \]
But this contradicts Theorem 5.5, which states that these vi's are linearly independent. We conclude, therefore, that vi = 0 for all i.

Theorem 5.8. Let T be a linear operator on a vector space V, and let λ1, λ2, . . . , λk be distinct eigenvalues of T. For each i = 1, 2, . . . , k, let Si be a finite linearly independent subset of the eigenspace Eλi. Then S = S1 ∪ S2 ∪ · · · ∪ Sk is a linearly independent subset of V.

Proof. Suppose that for each i
\[ S_i = \{v_{i1}, v_{i2}, \ldots, v_{in_i}\}. \]
Then S = {vij : 1 ≤ j ≤ ni, and 1 ≤ i ≤ k}. Consider any scalars {aij} such that
\[ \sum_{i=1}^{k}\sum_{j=1}^{n_i} a_{ij}v_{ij} = 0. \]
For each i, let
\[ w_i = \sum_{j=1}^{n_i} a_{ij}v_{ij}. \]
Then wi ∈ Eλi for each i, and w1 + · · · + wk = 0. Therefore, by the lemma, wi = 0 for all i. But each Si is linearly independent, and hence aij = 0 for all j. We conclude that S is linearly independent.


Theorem 5.8 tells us how to construct a linearly independent subset of eigenvectors, namely, by collecting bases for the individual eigenspaces. The next theorem tells us when the resulting set is a basis for the entire space.

Theorem 5.9. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. Let λ1, λ2, . . . , λk be the distinct eigenvalues of T. Then
(a) T is diagonalizable if and only if the multiplicity of λi is equal to dim(Eλi) for all i.
(b) If T is diagonalizable and βi is an ordered basis for Eλi for each i, then β = β1 ∪ β2 ∪ · · · ∪ βk is an ordered basis² for V consisting of eigenvectors of T.

Proof. For each i, let mi denote the multiplicity of λi, di = dim(Eλi), and n = dim(V).

First, suppose that T is diagonalizable. Let β be a basis for V consisting of eigenvectors of T. For each i, let βi = β ∩ Eλi, the set of vectors in β that are eigenvectors corresponding to λi, and let ni denote the number of vectors in βi. Then ni ≤ di for each i because βi is a linearly independent subset of a subspace of dimension di, and di ≤ mi by Theorem 5.7. The ni's sum to n because β contains n vectors. The mi's also sum to n because the degree of the characteristic polynomial of T is equal to the sum of the multiplicities of the eigenvalues. Thus
\[ n = \sum_{i=1}^{k} n_i \le \sum_{i=1}^{k} d_i \le \sum_{i=1}^{k} m_i = n. \]
It follows that
\[ \sum_{i=1}^{k} (m_i - d_i) = 0. \]
Since (mi − di) ≥ 0 for all i, we conclude that mi = di for all i.

Conversely, suppose that mi = di for all i. We simultaneously show that T is diagonalizable and prove (b). For each i, let βi be an ordered basis for Eλi, and let β = β1 ∪ β2 ∪ · · · ∪ βk. By Theorem 5.8, β is linearly independent. Furthermore, since di = mi for all i, β contains
\[ \sum_{i=1}^{k} d_i = \sum_{i=1}^{k} m_i = n \]
vectors. Therefore β is an ordered basis for V consisting of eigenvectors of T, and we conclude that T is diagonalizable.

²We regard β1 ∪ β2 ∪ · · · ∪ βk as an ordered basis in the natural way—the vectors in β1 are listed first (in the same order as in β1), then the vectors in β2 (in the same order as in β2), etc.

This theorem completes our study of the diagonalization problem. We summarize our results.

Test for Diagonalization

Let T be a linear operator on an n-dimensional vector space V. Then T is diagonalizable if and only if both of the following conditions hold.

1. The characteristic polynomial of T splits.
2. For each eigenvalue λ of T, the multiplicity of λ equals n − rank(T − λI).

These same conditions can be used to test if a square matrix A is diagonalizable because diagonalizability of A is equivalent to diagonalizability of the operator LA.

If T is a diagonalizable operator and β1, β2, . . . , βk are ordered bases for the eigenspaces of T, then the union β = β1 ∪ β2 ∪ · · · ∪ βk is an ordered basis for V consisting of eigenvectors of T, and hence [T]β is a diagonal matrix.

When testing T for diagonalizability, it is usually easiest to choose a convenient basis α for V and work with B = [T]α. If the characteristic polynomial of B splits, then use condition 2 above to check if the multiplicity of each repeated eigenvalue of B equals n − rank(B − λI). (By Theorem 5.7, condition 2 is automatically satisfied for eigenvalues of multiplicity 1.) If so, then B, and hence T, is diagonalizable.

If T is diagonalizable and a basis β for V consisting of eigenvectors of T is desired, then we first find a basis for each eigenspace of B. The union of these bases is a basis γ for Fn consisting of eigenvectors of B. Each vector in γ is the coordinate vector relative to α of an eigenvector of T. The set consisting of these n eigenvectors of T is the desired basis β.

Furthermore, if A is an n × n diagonalizable matrix, we can use the corollary to Theorem 2.23 (p. 115) to find an invertible n × n matrix Q and a diagonal n × n matrix D such that Q⁻¹AQ = D. The matrix Q has as its columns the vectors in a basis of eigenvectors of A, and D has as its jth diagonal entry the eigenvalue of A corresponding to the jth column of Q.
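For concrete matrices over R or C, the test can be carried out numerically along the following lines. This is only an added sketch (assuming NumPy), not part of the original text: it treats eigenvalues that agree to within a tolerance as equal, so it is a floating-point heuristic rather than the exact algebraic test; over C the characteristic polynomial always splits, so only condition 2 is checked.

```python
import numpy as np

def is_diagonalizable(A, tol=1e-8):
    """Heuristic version of the test: for each (approximately) distinct
    eigenvalue, compare its multiplicity with n - rank(A - lam I)."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    eigenvalues = np.linalg.eigvals(A)

    checked = []
    for lam in eigenvalues:
        if any(abs(lam - mu) < tol for mu in checked):
            continue                         # eigenvalue already handled
        checked.append(lam)
        multiplicity = np.sum(np.abs(eigenvalues - lam) < tol)
        dim_eigenspace = n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
        if dim_eigenspace != multiplicity:
            return False
    return True

# The matrices of Examples 5 and 6 below:
print(is_diagonalizable([[3, 1, 0], [0, 3, 0], [0, 0, 4]]))   # False
print(is_diagonalizable([[1, 1, 1], [0, 1, 0], [0, 1, 2]]))   # True
```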

We now consider some examples illustrating the preceding ideas.

Example 5

We test the matrix
\[ A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix} \in M_{3\times 3}(R) \]
for diagonalizability.

The characteristic polynomial of A is det(A − tI) = −(t − 4)(t − 3)², which splits, and so condition 1 of the test for diagonalization is satisfied. Also A has eigenvalues λ1 = 4 and λ2 = 3 with multiplicities 1 and 2, respectively. Since λ1 has multiplicity 1, condition 2 is satisfied for λ1. Thus we need only test condition 2 for λ2. Because
\[ A - \lambda_2 I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
has rank 2, we see that 3 − rank(A − λ2I) = 1, which is not the multiplicity of λ2. Thus condition 2 fails for λ2, and A is therefore not diagonalizable. ♦

Example 6

Let T be the linear operator on P2(R) defined by
\[ T(f(x)) = f(1) + f'(0)x + (f'(0) + f''(0))x^2. \]
We first test T for diagonalizability. Let α denote the standard ordered basis for P2(R) and B = [T]α. Then
\[ B = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 2 \end{pmatrix}. \]
The characteristic polynomial of B, and hence of T, is −(t − 1)²(t − 2), which splits. Hence condition 1 of the test for diagonalization is satisfied. Also B has the eigenvalues λ1 = 1 and λ2 = 2 with multiplicities 2 and 1, respectively. Condition 2 is satisfied for λ2 because it has multiplicity 1. So we need only verify condition 2 for λ1 = 1. For this case,
\[ 3 - \operatorname{rank}(B - \lambda_1 I) = 3 - \operatorname{rank}\begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix} = 3 - 1 = 2, \]
which is equal to the multiplicity of λ1. Therefore T is diagonalizable.

We now find an ordered basis γ for R3 of eigenvectors of B. We consider each eigenvalue separately.

The eigenspace corresponding to λ1 = 1 is
\[ E_{\lambda_1} = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in R^3 : \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 0 \right\}, \]
which is the solution space for the system
\[ x_2 + x_3 = 0, \]
and has
\[ \gamma_1 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \right\} \]
as a basis.

The eigenspace corresponding to λ2 = 2 is
\[ E_{\lambda_2} = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in R^3 : \begin{pmatrix} -1 & 1 & 1 \\ 0 & -1 & 0 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 0 \right\}, \]
which is the solution space for the system
\[ \begin{aligned} -x_1 + x_2 + x_3 &= 0 \\ x_2 &= 0, \end{aligned} \]
and has
\[ \gamma_2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \right\} \]
as a basis.

Let
\[ \gamma = \gamma_1 \cup \gamma_2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \right\}. \]
Then γ is an ordered basis for R3 consisting of eigenvectors of B.

Finally, observe that the vectors in γ are the coordinate vectors relative to α of the vectors in the set
\[ \beta = \{1, -x + x^2, 1 + x^2\}, \]
which is an ordered basis for P2(R) consisting of eigenvectors of T. Thus
\[ [T]_\beta = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \]
♦


Our next example is an application of diagonalization that is of interest in Section 5.3.

Example 7

Let
\[ A = \begin{pmatrix} 0 & -2 \\ 1 & 3 \end{pmatrix}. \]
We show that A is diagonalizable and find a 2 × 2 matrix Q such that Q⁻¹AQ is a diagonal matrix. We then show how to use this result to compute Aⁿ for any positive integer n.

First observe that the characteristic polynomial of A is (t − 1)(t − 2), and hence A has two distinct eigenvalues, λ1 = 1 and λ2 = 2. By applying the corollary to Theorem 5.5 to the operator LA, we see that A is diagonalizable. Moreover,
\[ \gamma_1 = \left\{ \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right\} \quad\text{and}\quad \gamma_2 = \left\{ \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right\} \]
are bases for the eigenspaces Eλ1 and Eλ2, respectively. Therefore
\[ \gamma = \gamma_1 \cup \gamma_2 = \left\{ \begin{pmatrix} -2 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right\} \]
is an ordered basis for R2 consisting of eigenvectors of A. Let
\[ Q = \begin{pmatrix} -2 & -1 \\ 1 & 1 \end{pmatrix}, \]
the matrix whose columns are the vectors in γ. Then, by the corollary to Theorem 2.23 (p. 115),
\[ D = Q^{-1}AQ = [L_A]_\gamma = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. \]
To find Aⁿ for any positive integer n, observe that A = QDQ⁻¹. Therefore
\[ A^n = (QDQ^{-1})^n = (QDQ^{-1})(QDQ^{-1})\cdots(QDQ^{-1}) = QD^nQ^{-1} = Q\begin{pmatrix} 1^n & 0 \\ 0 & 2^n \end{pmatrix}Q^{-1} \]
\[ = \begin{pmatrix} -2 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2^n \end{pmatrix}\begin{pmatrix} -1 & -1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 2 - 2^n & 2 - 2^{n+1} \\ -1 + 2^n & -1 + 2^{n+1} \end{pmatrix}. \]
♦
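A quick numerical cross-check of this formula (an added sketch, assuming NumPy) compares Q Dⁿ Q⁻¹ with repeated multiplication and with the closed form just derived.

```python
import numpy as np

A = np.array([[0.0, -2.0],
              [1.0,  3.0]])
Q = np.array([[-2.0, -1.0],
              [ 1.0,  1.0]])
D = np.diag([1.0, 2.0])

n = 10
An_diag = Q @ np.diag(np.diag(D) ** n) @ np.linalg.inv(Q)
An_power = np.linalg.matrix_power(A, n)
An_formula = np.array([[ 2 - 2**n,   2 - 2**(n + 1)],
                       [-1 + 2**n,  -1 + 2**(n + 1)]])

print(np.allclose(An_diag, An_power))     # True
print(np.allclose(An_diag, An_formula))   # True
```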


We now consider an application that uses diagonalization to solve a system of differential equations.

Systems of Differential Equations

Consider the system of differential equations
\[ \begin{aligned} x_1' &= 3x_1 + x_2 + x_3 \\ x_2' &= 2x_1 + 4x_2 + 2x_3 \\ x_3' &= -x_1 - x_2 + x_3, \end{aligned} \]
where, for each i, xi = xi(t) is a differentiable real-valued function of the real variable t. Clearly, this system has a solution, namely, the solution in which each xi(t) is the zero function. We determine all of the solutions to this system.

Let x: R → R3 be the function defined by
\[ x(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{pmatrix}. \]
The derivative of x, denoted x′, is defined by
\[ x'(t) = \begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix}. \]
Let
\[ A = \begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{pmatrix} \]
be the coefficient matrix of the given system, so that we can rewrite the system as the matrix equation x′ = Ax.

It can be verified that for
\[ Q = \begin{pmatrix} -1 & 0 & -1 \\ 0 & -1 & -2 \\ 1 & 1 & 1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}, \]
we have Q⁻¹AQ = D. Substitute A = QDQ⁻¹ into x′ = Ax to obtain x′ = QDQ⁻¹x or, equivalently, Q⁻¹x′ = DQ⁻¹x. The function y: R → R3 defined by y(t) = Q⁻¹x(t) can be shown to be differentiable, and y′ = Q⁻¹x′ (see Exercise 16). Hence the original system can be written as y′ = Dy.

Since D is a diagonal matrix, the system y′ = Dy is easy to solve. Setting
\[ y(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{pmatrix}, \]
we can rewrite y′ = Dy as
\[ \begin{pmatrix} y_1'(t) \\ y_2'(t) \\ y_3'(t) \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}\begin{pmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{pmatrix} = \begin{pmatrix} 2y_1(t) \\ 2y_2(t) \\ 4y_3(t) \end{pmatrix}. \]
The three equations
\[ \begin{aligned} y_1' &= 2y_1 \\ y_2' &= 2y_2 \\ y_3' &= 4y_3 \end{aligned} \]
are independent of each other, and thus can be solved individually. It is easily seen (as in Example 3 of Section 5.1) that the general solution to these equations is y1(t) = c1e^{2t}, y2(t) = c2e^{2t}, and y3(t) = c3e^{4t}, where c1, c2, and c3 are arbitrary constants. Finally,
\[ \begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{pmatrix} = x(t) = Qy(t) = \begin{pmatrix} -1 & 0 & -1 \\ 0 & -1 & -2 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} c_1e^{2t} \\ c_2e^{2t} \\ c_3e^{4t} \end{pmatrix} = \begin{pmatrix} -c_1e^{2t} - c_3e^{4t} \\ -c_2e^{2t} - 2c_3e^{4t} \\ c_1e^{2t} + c_2e^{2t} + c_3e^{4t} \end{pmatrix} \]
yields the general solution of the original system. Note that this solution can be written as
\[ x(t) = e^{2t}\left[ c_1\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + c_2\begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \right] + e^{4t}\left[ c_3\begin{pmatrix} -1 \\ -2 \\ 1 \end{pmatrix} \right]. \]
The expressions in brackets are arbitrary vectors in Eλ1 and Eλ2, respectively, where λ1 = 2 and λ2 = 4. Thus the general solution of the original system is x(t) = e^{2t}z1 + e^{4t}z2, where z1 ∈ Eλ1 and z2 ∈ Eλ2. This result is generalized in Exercise 15.
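The general solution can be sanity-checked numerically. The sketch below (an added illustration, assuming NumPy only) builds x(t) in the eigenvector form above for an arbitrary choice of the constants and verifies, by a central-difference quotient, that x′ ≈ Ax.

```python
import numpy as np

A = np.array([[ 3.0,  1.0,  1.0],
              [ 2.0,  4.0,  2.0],
              [-1.0, -1.0,  1.0]])

c1, c2, c3 = 1.0, -2.0, 0.5                                              # arbitrary constants
z1 = c1 * np.array([-1.0, 0.0, 1.0]) + c2 * np.array([0.0, -1.0, 1.0])   # vector in E_2
z2 = c3 * np.array([-1.0, -2.0, 1.0])                                    # vector in E_4

def x(t):
    return np.exp(2 * t) * z1 + np.exp(4 * t) * z2

t, h = 0.7, 1e-6
derivative = (x(t + h) - x(t - h)) / (2 * h)            # central difference
print(np.allclose(derivative, A @ x(t), rtol=1e-5))     # True
```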

Direct Sums*

Let T be a linear operator on a finite-dimensional vector space V. There is a way of decomposing V into simpler subspaces that offers insight into the behavior of T. This approach is especially useful in Chapter 7, where we study nondiagonalizable linear operators. In the case of diagonalizable operators, the simpler subspaces are the eigenspaces of the operator.

Definition. Let W1, W2, . . . , Wk be subspaces of a vector space V. We define the sum of these subspaces to be the set
\[ \{v_1 + v_2 + \cdots + v_k : v_i \in W_i \text{ for } 1 \le i \le k\}, \]
which we denote by W1 + W2 + · · · + Wk or \( \sum_{i=1}^{k} W_i \).

It is a simple exercise to show that the sum of subspaces of a vector space is also a subspace.

Example 8

Let V = R3, let W1 denote the xy-plane, and let W2 denote the yz-plane. Then R3 = W1 + W2 because, for any vector (a, b, c) ∈ R3, we have
\[ (a, b, c) = (a, 0, 0) + (0, b, c), \]
where (a, 0, 0) ∈ W1 and (0, b, c) ∈ W2. ♦

Notice that in Example 8 the representation of (a, b, c) as a sum of vectors in W1 and W2 is not unique. For example, (a, b, c) = (a, b, 0) + (0, 0, c) is another representation. Because we are often interested in sums for which representations are unique, we introduce a condition that assures this outcome. The definition of direct sum that follows is a generalization of the definition given in the exercises of Section 1.3.

Definition. Let W1, W2, . . . , Wk be subspaces of a vector space V. We call V the direct sum of the subspaces W1, W2, . . . , Wk and write V = W1 ⊕ W2 ⊕ · · · ⊕ Wk, if
\[ V = \sum_{i=1}^{k} W_i \]
and
\[ W_j \cap \sum_{i \ne j} W_i = \{0\} \quad\text{for each } j\ (1 \le j \le k). \]

Example 9

Let V = R4, W1 = {(a, b, 0, 0) : a, b ∈ R}, W2 = {(0, 0, c, 0) : c ∈ R}, and W3 = {(0, 0, 0, d) : d ∈ R}. For any (a, b, c, d) ∈ V,
\[ (a, b, c, d) = (a, b, 0, 0) + (0, 0, c, 0) + (0, 0, 0, d) \in W_1 + W_2 + W_3. \]
Thus
\[ V = \sum_{i=1}^{3} W_i. \]
To show that V is the direct sum of W1, W2, and W3, we must prove that W1 ∩ (W2 + W3) = W2 ∩ (W1 + W3) = W3 ∩ (W1 + W2) = {0}. But these equalities are obvious, and so V = W1 ⊕ W2 ⊕ W3. ♦

Our next result contains several conditions that are equivalent to the definition of a direct sum.

Theorem 5.10. Let W1, W2, . . . , Wk be subspaces of a finite-dimensional vector space V. The following conditions are equivalent.
(a) V = W1 ⊕ W2 ⊕ · · · ⊕ Wk.
(b) V = \( \sum_{i=1}^{k} W_i \) and, for any vectors v1, v2, . . . , vk such that vi ∈ Wi (1 ≤ i ≤ k), if v1 + v2 + · · · + vk = 0, then vi = 0 for all i.
(c) Each vector v ∈ V can be uniquely written as v = v1 + v2 + · · · + vk, where vi ∈ Wi.
(d) If γi is an ordered basis for Wi (1 ≤ i ≤ k), then γ1 ∪ γ2 ∪ · · · ∪ γk is an ordered basis for V.
(e) For each i = 1, 2, . . . , k, there exists an ordered basis γi for Wi such that γ1 ∪ γ2 ∪ · · · ∪ γk is an ordered basis for V.

Proof. Assume (a). We prove (b). Clearly
\[ V = \sum_{i=1}^{k} W_i. \]
Now suppose that v1, v2, . . . , vk are vectors such that vi ∈ Wi for all i and v1 + v2 + · · · + vk = 0. Then for any j
\[ -v_j = \sum_{i \ne j} v_i \in \sum_{i \ne j} W_i. \]
But −vj ∈ Wj and hence
\[ -v_j \in W_j \cap \sum_{i \ne j} W_i = \{0\}. \]
So vj = 0, proving (b).

Now assume (b). We prove (c). Let v ∈ V. By (b), there exist vectors v1, v2, . . . , vk such that vi ∈ Wi and v = v1 + v2 + · · · + vk. We must show that this representation is unique. Suppose also that v = w1 + w2 + · · · + wk, where wi ∈ Wi for all i. Then
\[ (v_1 - w_1) + (v_2 - w_2) + \cdots + (v_k - w_k) = 0. \]
But vi − wi ∈ Wi for all i, and therefore vi − wi = 0 for all i by (b). Thus vi = wi for all i, proving the uniqueness of the representation.

Now assume (c). We prove (d). For each i, let γi be an ordered basis for Wi. Since
\[ V = \sum_{i=1}^{k} W_i \]
by (c), it follows that γ1 ∪ γ2 ∪ · · · ∪ γk generates V. To show that this set is linearly independent, consider vectors vij ∈ γi (j = 1, 2, . . . , mi and i = 1, 2, . . . , k) and scalars aij such that
\[ \sum_{i,j} a_{ij}v_{ij} = 0. \]
For each i, set
\[ w_i = \sum_{j=1}^{m_i} a_{ij}v_{ij}. \]
Then for each i, wi ∈ span(γi) = Wi and
\[ w_1 + w_2 + \cdots + w_k = \sum_{i,j} a_{ij}v_{ij} = 0. \]
Since 0 ∈ Wi for each i and 0 + 0 + · · · + 0 = w1 + w2 + · · · + wk, (c) implies that wi = 0 for all i. Thus
\[ 0 = w_i = \sum_{j=1}^{m_i} a_{ij}v_{ij} \]
for each i. But each γi is linearly independent, and hence aij = 0 for all i and j. Consequently γ1 ∪ γ2 ∪ · · · ∪ γk is linearly independent and therefore is a basis for V.

Clearly (e) follows immediately from (d).

Finally, we assume (e) and prove (a). For each i, let γi be an ordered basis for Wi such that γ1 ∪ γ2 ∪ · · · ∪ γk is an ordered basis for V. Then
\[ V = \operatorname{span}(\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k) = \operatorname{span}(\gamma_1) + \operatorname{span}(\gamma_2) + \cdots + \operatorname{span}(\gamma_k) = \sum_{i=1}^{k} W_i \]
by repeated applications of Exercise 14 of Section 1.4. Fix j (1 ≤ j ≤ k), and suppose that, for some nonzero vector v ∈ V,
\[ v \in W_j \cap \sum_{i \ne j} W_i. \]
Then
\[ v \in W_j = \operatorname{span}(\gamma_j) \quad\text{and}\quad v \in \sum_{i \ne j} W_i = \operatorname{span}\Bigl(\bigcup_{i \ne j} \gamma_i\Bigr). \]
Hence v is a nontrivial linear combination of both γj and \( \bigcup_{i \ne j} \gamma_i \), so that v can be expressed as a linear combination of γ1 ∪ γ2 ∪ · · · ∪ γk in more than one way. But these representations contradict Theorem 1.8 (p. 43), and so we conclude that
\[ W_j \cap \sum_{i \ne j} W_i = \{0\}, \]
proving (a).

With the aid of Theorem 5.10, we are able to characterize diagonalizability in terms of direct sums.

Theorem 5.11. A linear operator T on a finite-dimensional vector space V is diagonalizable if and only if V is the direct sum of the eigenspaces of T.

Proof. Let λ1, λ2, . . . , λk be the distinct eigenvalues of T.

First suppose that T is diagonalizable, and for each i choose an ordered basis γi for the eigenspace Eλi. By Theorem 5.9, γ1 ∪ γ2 ∪ · · · ∪ γk is a basis for V, and hence V is a direct sum of the Eλi's by Theorem 5.10.

Conversely, suppose that V is a direct sum of the eigenspaces of T. For each i, choose an ordered basis γi of Eλi. By Theorem 5.10, the union γ1 ∪ γ2 ∪ · · · ∪ γk is a basis for V. Since this basis consists of eigenvectors of T, we conclude that T is diagonalizable.

Example 10

Let T be the linear operator on R4 defined by
\[ T(a, b, c, d) = (a, b, 2c, 3d). \]
It is easily seen that T is diagonalizable with eigenvalues λ1 = 1, λ2 = 2, and λ3 = 3. Furthermore, the corresponding eigenspaces coincide with the subspaces W1, W2, and W3 of Example 9. Thus Theorem 5.11 provides us with another proof that R4 = W1 ⊕ W2 ⊕ W3. ♦

EXERCISES

1. Label the following statements as true or false.

(a) Any linear operator on an n-dimensional vector space that has fewer than n distinct eigenvalues is not diagonalizable.
(b) Two distinct eigenvectors corresponding to the same eigenvalue are always linearly dependent.
(c) If λ is an eigenvalue of a linear operator T, then each vector in Eλ is an eigenvector of T.
(d) If λ1 and λ2 are distinct eigenvalues of a linear operator T, then Eλ1 ∩ Eλ2 = {0}.
(e) Let A ∈ Mn×n(F) and β = {v1, v2, . . . , vn} be an ordered basis for Fn consisting of eigenvectors of A. If Q is the n × n matrix whose jth column is vj (1 ≤ j ≤ n), then Q⁻¹AQ is a diagonal matrix.
(f) A linear operator T on a finite-dimensional vector space is diagonalizable if and only if the multiplicity of each eigenvalue λ equals the dimension of Eλ.
(g) Every diagonalizable linear operator on a nonzero vector space has at least one eigenvalue.

The following two items relate to the optional subsection on direct sums.

(h) If a vector space is the direct sum of subspaces W1, W2, . . . , Wk, then Wi ∩ Wj = {0} for i ≠ j.
(i) If
\[ V = \sum_{i=1}^{k} W_i \quad\text{and}\quad W_i \cap W_j = \{0\} \text{ for } i \ne j, \]
then V = W1 ⊕ W2 ⊕ · · · ⊕ Wk.

2. For each of the following matrices A ∈ Mn×n(R), test A for diagonalizability, and if A is diagonalizable, find an invertible matrix Q and a diagonal matrix D such that Q⁻¹AQ = D.

(a) \( \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \)  (b) \( \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix} \)  (c) \( \begin{pmatrix} 1 & 4 \\ 3 & 2 \end{pmatrix} \)

(d) \( \begin{pmatrix} 7 & -4 & 0 \\ 8 & -5 & 0 \\ 6 & -6 & 3 \end{pmatrix} \)  (e) \( \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \end{pmatrix} \)  (f) \( \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 3 \end{pmatrix} \)

(g) \( \begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{pmatrix} \)

3. For each of the following linear operators T on a vector space V, test T for diagonalizability, and if T is diagonalizable, find a basis β for V such that [T]β is a diagonal matrix.

(a) V = P3(R) and T is defined by T(f(x)) = f′(x) + f″(x).
(b) V = P2(R) and T is defined by T(ax² + bx + c) = cx² + bx + a.
(c) V = R3 and T is defined by
\[ T\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} a_2 \\ -a_1 \\ 2a_3 \end{pmatrix}. \]
(d) V = P2(R) and T is defined by T(f(x)) = f(0) + f(1)(x + x²).
(e) V = C2 and T is defined by T(z, w) = (z + iw, iz + w).
(f) V = M2×2(R) and T is defined by T(A) = Aᵗ.

4. Prove the matrix version of the corollary to Theorem 5.5: If A ∈ Mn×n(F) has n distinct eigenvalues, then A is diagonalizable.

5. State and prove the matrix version of Theorem 5.6.

6. (a) Justify the test for diagonalizability and the method for diagonalization stated in this section.
(b) Formulate the results in (a) for matrices.

7. For
\[ A = \begin{pmatrix} 1 & 4 \\ 2 & 3 \end{pmatrix} \in M_{2\times 2}(R), \]
find an expression for Aⁿ, where n is an arbitrary positive integer.

8. Suppose that A ∈ Mn×n(F) has two distinct eigenvalues, λ1 and λ2, and that dim(Eλ1) = n − 1. Prove that A is diagonalizable.

9. Let T be a linear operator on a finite-dimensional vector space V, and suppose there exists an ordered basis β for V such that [T]β is an upper triangular matrix.
(a) Prove that the characteristic polynomial for T splits.
(b) State and prove an analogous result for matrices.
The converse of (a) is treated in Exercise 32 of Section 5.4.


10. Let T be a linear operator on a finite-dimensional vector space V with the distinct eigenvalues λ1, λ2, . . . , λk and corresponding multiplicities m1, m2, . . . , mk. Suppose that β is a basis for V such that [T]β is an upper triangular matrix. Prove that the diagonal entries of [T]β are λ1, λ2, . . . , λk and that each λi occurs mi times (1 ≤ i ≤ k).

11. Let A be an n × n matrix that is similar to an upper triangular matrix and has the distinct eigenvalues λ1, λ2, . . . , λk with corresponding multiplicities m1, m2, . . . , mk. Prove the following statements.
(a) tr(A) = \( \sum_{i=1}^{k} m_i\lambda_i \)
(b) det(A) = (λ1)^{m1}(λ2)^{m2} · · · (λk)^{mk}.

12. Let T be an invertible linear operator on a finite-dimensional vector space V.
(a) Recall that for any eigenvalue λ of T, λ⁻¹ is an eigenvalue of T⁻¹ (Exercise 8 of Section 5.1). Prove that the eigenspace of T corresponding to λ is the same as the eigenspace of T⁻¹ corresponding to λ⁻¹.
(b) Prove that if T is diagonalizable, then T⁻¹ is diagonalizable.

13. Let A ∈ Mn×n(F). Recall from Exercise 14 of Section 5.1 that A and Aᵗ have the same characteristic polynomial and hence share the same eigenvalues with the same multiplicities. For any eigenvalue λ of A and Aᵗ, let Eλ and E′λ denote the corresponding eigenspaces for A and Aᵗ, respectively.
(a) Show by way of example that for a given common eigenvalue, these two eigenspaces need not be the same.
(b) Prove that for any eigenvalue λ, dim(Eλ) = dim(E′λ).
(c) Prove that if A is diagonalizable, then Aᵗ is also diagonalizable.

14. Find the general solution to each system of differential equations.
(a) x′ = x + y, y′ = 3x − y
(b) x1′ = 8x1 + 10x2, x2′ = −5x1 − 7x2
(c) x1′ = x1 + x3, x2′ = x2 + x3, x3′ = 2x3

15. Let
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \]
be the coefficient matrix of the system of differential equations
\[ \begin{aligned} x_1' &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ x_2' &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ &\ \,\vdots \\ x_n' &= a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n. \end{aligned} \]
Suppose that A is diagonalizable and that the distinct eigenvalues of A are λ1, λ2, . . . , λk. Prove that a differentiable function x: R → Rⁿ is a solution to the system if and only if x is of the form
\[ x(t) = e^{\lambda_1 t}z_1 + e^{\lambda_2 t}z_2 + \cdots + e^{\lambda_k t}z_k, \]
where zi ∈ Eλi for i = 1, 2, . . . , k. Use this result to prove that the set of solutions to the system is an n-dimensional real vector space.

16. Let C ∈ Mm×n(R), and let Y be an n × p matrix of differentiable functions. Prove (CY)′ = CY′, where (Y′)ij = Y′ij for all i, j.

Exercises 17 through 19 are concerned with simultaneous diagonalization.

Definitions. Two linear operators T and U on a finite-dimensional vector space V are called simultaneously diagonalizable if there exists an ordered basis β for V such that both [T]β and [U]β are diagonal matrices. Similarly, A, B ∈ Mn×n(F) are called simultaneously diagonalizable if there exists an invertible matrix Q ∈ Mn×n(F) such that both Q⁻¹AQ and Q⁻¹BQ are diagonal matrices.

17. (a) Prove that if T and U are simultaneously diagonalizable linear operators on a finite-dimensional vector space V, then the matrices [T]β and [U]β are simultaneously diagonalizable for any ordered basis β.
(b) Prove that if A and B are simultaneously diagonalizable matrices, then LA and LB are simultaneously diagonalizable linear operators.

18. (a) Prove that if T and U are simultaneously diagonalizable operators, then T and U commute (i.e., TU = UT).
(b) Show that if A and B are simultaneously diagonalizable matrices, then A and B commute.
The converses of (a) and (b) are established in Exercise 25 of Section 5.4.

19. Let T be a diagonalizable linear operator on a finite-dimensional vector space, and let m be any positive integer. Prove that T and Tᵐ are simultaneously diagonalizable.

Exercises 20 through 23 are concerned with direct sums.


20. Let W1, W2, . . . , Wk be subspaces of a finite-dimensional vector space V such that
\[ \sum_{i=1}^{k} W_i = V. \]
Prove that V is the direct sum of W1, W2, . . . , Wk if and only if
\[ \dim(V) = \sum_{i=1}^{k} \dim(W_i). \]

21. Let V be a finite-dimensional vector space with a basis β, and let β1, β2, . . . , βk be a partition of β (i.e., β1, β2, . . . , βk are subsets of β such that β = β1 ∪ β2 ∪ · · · ∪ βk and βi ∩ βj = ∅ if i ≠ j). Prove that V = span(β1) ⊕ span(β2) ⊕ · · · ⊕ span(βk).

22. Let T be a linear operator on a finite-dimensional vector space V, and suppose that the distinct eigenvalues of T are λ1, λ2, . . . , λk. Prove that
\[ \operatorname{span}(\{x \in V : x \text{ is an eigenvector of } T\}) = E_{\lambda_1} \oplus E_{\lambda_2} \oplus \cdots \oplus E_{\lambda_k}. \]

23. Let W1, W2, K1, K2, . . . , Kp, M1, M2, . . . , Mq be subspaces of a vector space V such that W1 = K1 ⊕ K2 ⊕ · · · ⊕ Kp and W2 = M1 ⊕ M2 ⊕ · · · ⊕ Mq. Prove that if W1 ∩ W2 = {0}, then
\[ W_1 + W_2 = W_1 \oplus W_2 = K_1 \oplus K_2 \oplus \cdots \oplus K_p \oplus M_1 \oplus M_2 \oplus \cdots \oplus M_q. \]

5.3∗ MATRIX LIMITS AND MARKOV CHAINS

In this section, we apply what we have learned thus far in Chapter 5 to study the limit of a sequence of powers A, A², . . . , Aⁿ, . . . , where A is a square matrix with complex entries. Such sequences and their limits have practical applications in the natural and social sciences.

We assume familiarity with limits of sequences of real numbers. The limit of a sequence of complex numbers {zm : m = 1, 2, . . .} can be defined in terms of the limits of the sequences of the real and imaginary parts: If zm = rm + i·sm, where rm and sm are real numbers, and i is the imaginary number such that i² = −1, then
\[ \lim_{m\to\infty} z_m = \lim_{m\to\infty} r_m + i\lim_{m\to\infty} s_m, \]
provided that \( \lim_{m\to\infty} r_m \) and \( \lim_{m\to\infty} s_m \) exist.


Definition. Let L, A1, A2, . . . be n × p matrices having complex entries. The sequence A1, A2, . . . is said to converge to the n × p matrix L, called the limit of the sequence, if
\[ \lim_{m\to\infty} (A_m)_{ij} = L_{ij} \]
for all 1 ≤ i ≤ n and 1 ≤ j ≤ p. To designate that L is the limit of the sequence, we write
\[ \lim_{m\to\infty} A_m = L. \]

Example 1

If
\[ A_m = \begin{pmatrix} 1 - \dfrac{1}{m} & \left(-\dfrac{3}{4}\right)^{m} & \dfrac{3m^2}{m^2+1} + i\left(\dfrac{2m+1}{m-1}\right) \\[2ex] \left(\dfrac{i}{2}\right)^{m} & 2 & \left(1 + \dfrac{1}{m}\right)^{m} \end{pmatrix}, \]
then
\[ \lim_{m\to\infty} A_m = \begin{pmatrix} 1 & 0 & 3 + 2i \\ 0 & 2 & e \end{pmatrix}, \]
where e is the base of the natural logarithm. ♦

A simple, but important, property of matrix limits is contained in the next theorem. Note the analogy with the familiar property of limits of sequences of real numbers that asserts that if \( \lim_{m\to\infty} a_m \) exists, then
\[ \lim_{m\to\infty} c\,a_m = c\left(\lim_{m\to\infty} a_m\right). \]

Theorem 5.12. Let A1, A2, . . . be a sequence of n × p matrices with complex entries that converges to the matrix L. Then for any P ∈ Mr×n(C) and Q ∈ Mp×s(C),
\[ \lim_{m\to\infty} PA_m = PL \quad\text{and}\quad \lim_{m\to\infty} A_mQ = LQ. \]

Proof. For any i (1 ≤ i ≤ r) and j (1 ≤ j ≤ p),
\[ \lim_{m\to\infty} (PA_m)_{ij} = \lim_{m\to\infty} \sum_{k=1}^{n} P_{ik}(A_m)_{kj} = \sum_{k=1}^{n} P_{ik}\cdot\lim_{m\to\infty} (A_m)_{kj} = \sum_{k=1}^{n} P_{ik}L_{kj} = (PL)_{ij}. \]
Hence \( \lim_{m\to\infty} PA_m = PL \). The proof that \( \lim_{m\to\infty} A_mQ = LQ \) is similar.

Corollary. Let A ∈ Mn×n(C) be such that \( \lim_{m\to\infty} A^m = L \). Then for any invertible matrix Q ∈ Mn×n(C),
\[ \lim_{m\to\infty} (QAQ^{-1})^m = QLQ^{-1}. \]

Proof. Since
\[ (QAQ^{-1})^m = (QAQ^{-1})(QAQ^{-1})\cdots(QAQ^{-1}) = QA^mQ^{-1}, \]
we have
\[ \lim_{m\to\infty} (QAQ^{-1})^m = \lim_{m\to\infty} QA^mQ^{-1} = Q\left(\lim_{m\to\infty} A^m\right)Q^{-1} = QLQ^{-1} \]
by applying Theorem 5.12 twice.

In the discussion that follows, we frequently encounter the set
\[ S = \{\lambda \in C : |\lambda| < 1 \text{ or } \lambda = 1\}. \]
Geometrically, this set consists of the complex number 1 and the interior of the unit disk (the disk of radius 1 centered at the origin). This set is of interest because if λ is a complex number, then \( \lim_{m\to\infty} \lambda^m \) exists if and only if λ ∈ S. This fact, which is obviously true if λ is real, can be shown to be true for complex numbers also.

The following important result gives necessary and sufficient conditions for the existence of the type of limit under consideration.

Theorem 5.13. Let A be a square matrix with complex entries. Then \( \lim_{m\to\infty} A^m \) exists if and only if both of the following conditions hold.
(a) Every eigenvalue of A is contained in S.
(b) If 1 is an eigenvalue of A, then the dimension of the eigenspace corresponding to 1 equals the multiplicity of 1 as an eigenvalue of A.

One proof of this theorem, which relies on the theory of Jordan canonical forms (Section 7.2), can be found in Exercise 19 of Section 7.2. A second proof, which makes use of Schur's theorem (Theorem 6.14 of Section 6.4), can be found in the article by S. H. Friedberg and A. J. Insel, "Convergence of matrix powers," Int. J. Math. Educ. Sci. Technol., 1992, Vol. 23, no. 5, pp. 765–769.


The necessity of condition (a) is easily justified. For suppose that λ is an eigenvalue of A such that λ ∉ S. Let v be an eigenvector of A corresponding to λ. Regarding v as an n × 1 matrix, we see that
\[ \lim_{m\to\infty} (A^m v) = \left(\lim_{m\to\infty} A^m\right)v = Lv \]
by Theorem 5.12, where \( L = \lim_{m\to\infty} A^m \). But \( \lim_{m\to\infty} (A^m v) = \lim_{m\to\infty} (\lambda^m v) \) diverges because \( \lim_{m\to\infty} \lambda^m \) does not exist. Hence if \( \lim_{m\to\infty} A^m \) exists, then condition (a) of Theorem 5.13 must hold.

Although we are unable to prove the necessity of condition (b) here, we consider an example for which this condition fails. Observe that the characteristic polynomial for the matrix
\[ B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \]
is (t − 1)², and hence B has eigenvalue λ = 1 with multiplicity 2. It can easily be verified that dim(Eλ) = 1, so that condition (b) of Theorem 5.13 is violated. A simple mathematical induction argument can be used to show that
\[ B^m = \begin{pmatrix} 1 & m \\ 0 & 1 \end{pmatrix}, \]
and therefore that \( \lim_{m\to\infty} B^m \) does not exist. We see in Chapter 7 that if A is a matrix for which condition (b) fails, then A is similar to a matrix whose upper left 2 × 2 submatrix is precisely this matrix B.

is a matrix for which condition (b) fails, then A is similar to a matrix whoseupper left 2 × 2 submatrix is precisely this matrix B.

In most of the applications involving matrix limits, the matrix is diag-onalizable, and so condition (b) of Theorem 5.13 is automatically satisfied.In this case, Theorem 5.13 reduces to the following theorem, which can beproved using our previous results.

Theorem 5.14. Let A ∈ Mn×n(C) satisfy the following two conditions.

(i) Every eigenvalue of A is contained in S.

(ii) A is diagonalizable.

Then limm→∞Am exists.

Proof. Since A is diagonalizable, there exists an invertible matrix Q suchthat Q−1AQ = D is a diagonal matrix. Suppose that

D =

⎛⎜⎜⎜⎝λ1 0 · · · 00 λ2 · · · 0...

......

0 0 · · · λn

⎞⎟⎟⎟⎠ .

Page 302: Linear Algebra - cloudflare-ipfs.com

Sec. 5.3 Matrix Limits and Markov Chains 287

Because λ1, λ2, . . . , λn are the eigenvalues of A, condition (i) requires that foreach i, either λi = 1 or |λi| < 1. Thus

limm→∞λi

m =

{1 if λi = 10 otherwise.

But since

Dm =

⎛⎜⎜⎜⎝λ1

m 0 · · · 00 λ2

m · · · 0...

......

0 0 · · · λnm

⎞⎟⎟⎟⎠ ,

the sequence D, D2, . . . converges to a limit L. Hence

limm→∞Am = lim

m→∞(QDQ−1)m = QLQ−1

by the corollary to Theorem 5.12.

The technique for computing lim_{m→∞} A^m used in the proof of Theorem 5.14 can be employed in actual computations, as we now illustrate. Let

A = \begin{pmatrix} 7/4 & -9/4 & -15/4 \\ 3/4 & 7/4 & 3/4 \\ 3/4 & -9/4 & -11/4 \end{pmatrix}.

Using the methods in Sections 5.1 and 5.2, we obtain

Q = \begin{pmatrix} 1 & 3 & -1 \\ -3 & -2 & 1 \\ 2 & 3 & -1 \end{pmatrix}  and  D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1/2 & 0 \\ 0 & 0 & 1/4 \end{pmatrix}

such that Q^{-1}AQ = D. Hence

lim_{m→∞} A^m = lim_{m→∞} (QDQ^{-1})^m = lim_{m→∞} QD^mQ^{-1} = Q( lim_{m→∞} D^m )Q^{-1}

= \begin{pmatrix} 1 & 3 & -1 \\ -3 & -2 & 1 \\ 2 & 3 & -1 \end{pmatrix} \left[ lim_{m→∞} \begin{pmatrix} 1 & 0 & 0 \\ 0 & (-1/2)^m & 0 \\ 0 & 0 & (1/4)^m \end{pmatrix} \right] \begin{pmatrix} -1 & 0 & 1 \\ -1 & 1 & 2 \\ -5 & 3 & 7 \end{pmatrix}

= \begin{pmatrix} 1 & 3 & -1 \\ -3 & -2 & 1 \\ 2 & 3 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} -1 & 0 & 1 \\ -1 & 1 & 2 \\ -5 & 3 & 7 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 1 \\ 3 & 0 & -3 \\ -2 & 0 & 2 \end{pmatrix}.
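The computation above is easy to check numerically. The following sketch (Python with NumPy; ours, not the authors') compares a large power of A with Q diag(1, 0, 0) Q^{-1}.

import numpy as np

A = np.array([[ 7, -9, -15],
              [ 3,  7,   3],
              [ 3, -9, -11]]) / 4.0
Q = np.array([[ 1,  3, -1],
              [-3, -2,  1],
              [ 2,  3, -1]], dtype=float)

L_power = np.linalg.matrix_power(A, 60)                  # A^60 is essentially the limit
L_diag  = Q @ np.diag([1.0, 0.0, 0.0]) @ np.linalg.inv(Q)
print(np.round(L_power, 6))   # both print [[-1, 0, 1], [3, 0, -3], [-2, 0, 2]]
print(np.round(L_diag, 6))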


Next, we consider an application that uses the limit of powers of a matrix. Suppose that the population of a certain metropolitan area remains constant but there is a continual movement of people between the city and the suburbs. Specifically, let the entries of the following matrix A represent the probabilities that someone living in the city or in the suburbs on January 1 will be living in each region on January 1 of the next year.

A = \begin{pmatrix} 0.90 & 0.02 \\ 0.10 & 0.98 \end{pmatrix},

where the columns correspond to currently living in the city and currently living in the suburbs, and the rows correspond to living next year in the city and living next year in the suburbs.

For instance, the probability that someone living in the city (on January 1) will be living in the suburbs next year (on January 1) is 0.10. Notice that since the entries of A are probabilities, they are nonnegative. Moreover, the assumption of a constant population in the metropolitan area requires that the sum of the entries of each column of A be 1.

Any square matrix having these two properties (nonnegative entries and columns that sum to 1) is called a transition matrix or a stochastic matrix. For an arbitrary n × n transition matrix M, the rows and columns correspond to n states, and the entry M_{ij} represents the probability of moving from state j to state i in one stage.

In our example, there are two states (residing in the city and residing in the suburbs). So, for example, A_{21} is the probability of moving from the city to the suburbs in one stage, that is, in one year.

[Figure 5.3: a diagram of the two states, City and Suburbs, with arrows labeled by the one-year transition probabilities.]

We now determine the probability that a city resident will be living in the suburbs after 2 years. There are two different ways in which such a move can be made: remaining in the city for 1 year and then moving to the suburbs, or moving to the suburbs during the first year and remaining there the second year. (See


Figure 5.3.) The probability that a city dweller remains in the city for the first year is 0.90, whereas the probability that the city dweller moves to the suburbs during the first year is 0.10. Hence the probability that a city dweller stays in the city for the first year and then moves to the suburbs during the second year is the product (0.90)(0.10). Likewise, the probability that a city dweller moves to the suburbs in the first year and remains in the suburbs during the second year is the product (0.10)(0.98). Thus the probability that a city dweller will be living in the suburbs after 2 years is the sum of these products, (0.90)(0.10) + (0.10)(0.98) = 0.188. Observe that this number is obtained by the same calculation as that which produces (A^2)_{21}, and hence (A^2)_{21} represents the probability that a city dweller will be living in the suburbs after 2 years. In general, for any transition matrix M, the entry (M^m)_{ij} represents the probability of moving from state j to state i in m stages.

Suppose additionally that 70% of the 2000 population of the metropolitan area lived in the city and 30% lived in the suburbs. We record these data as a column vector:

P = \begin{pmatrix} 0.70 \\ 0.30 \end{pmatrix},

where the first coordinate is the proportion of city dwellers and the second coordinate is the proportion of suburb residents.

Notice that the rows of P correspond to the states of residing in the city and residing in the suburbs, respectively, and that these states are listed in the same order as the listing in the transition matrix A. Observe also that the column vector P contains nonnegative entries that sum to 1; such a vector is called a probability vector. In this terminology, each column of a transition matrix is a probability vector. It is often convenient to regard the entries of a transition matrix or a probability vector as proportions or percentages instead of probabilities, as we have already done with the probability vector P.

In the vector AP, the first coordinate is the sum (0.90)(0.70) + (0.02)(0.30). The first term of this sum, (0.90)(0.70), represents the proportion of the 2000 metropolitan population that remained in the city during the next year, and the second term, (0.02)(0.30), represents the proportion of the 2000 metropolitan population that moved into the city during the next year. Hence the first coordinate of AP represents the proportion of the metropolitan population that was living in the city in 2001. Similarly, the second coordinate of

AP = \begin{pmatrix} 0.636 \\ 0.364 \end{pmatrix}

represents the proportion of the metropolitan population that was living in the suburbs in 2001. This argument can be easily extended to show that the coordinates of

A^2P = A(AP) = \begin{pmatrix} 0.57968 \\ 0.42032 \end{pmatrix}


represent the proportions of the metropolitan population that were living in each location in 2002. In general, the coordinates of A^mP represent the proportion of the metropolitan population that will be living in the city and suburbs, respectively, after m stages (m years after 2000).

Will the city eventually be depleted if this trend continues? In view of the preceding discussion, it is natural to define the eventual proportion of the city dwellers and suburbanites to be the first and second coordinates, respectively, of lim_{m→∞} A^mP. We now compute this limit. It is easily shown that A is diagonalizable, and so there is an invertible matrix Q and a diagonal matrix D such that Q^{-1}AQ = D. In fact,

Q = \begin{pmatrix} 1/6 & -1/6 \\ 5/6 & 1/6 \end{pmatrix}  and  D = \begin{pmatrix} 1 & 0 \\ 0 & 0.88 \end{pmatrix}.

Therefore

L = lim_{m→∞} A^m = lim_{m→∞} QD^mQ^{-1} = Q \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} Q^{-1} = \begin{pmatrix} 1/6 & 1/6 \\ 5/6 & 5/6 \end{pmatrix}.

Consequently

lim_{m→∞} A^mP = LP = \begin{pmatrix} 1/6 \\ 5/6 \end{pmatrix}.

Thus, eventually, 1/6 of the population will live in the city and 5/6 will live in the suburbs each year. Note that the vector LP satisfies A(LP) = LP. Hence LP is both a probability vector and an eigenvector of A corresponding to the eigenvalue 1. Since the eigenspace of A corresponding to the eigenvalue 1 is one-dimensional, there is only one such vector, and LP is independent of the initial choice of probability vector P. (See Exercise 15.) For example, had the 2000 metropolitan population consisted entirely of city dwellers, the limiting outcome would be the same.
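A numerical check of the city–suburb chain (a sketch in Python with NumPy; the code is ours) confirms both the one-year vector AP and the limiting vector (1/6, 5/6), and shows how the latter can be read off from an eigenvector of A for the eigenvalue 1.

import numpy as np

A = np.array([[0.90, 0.02],
              [0.10, 0.98]])
P = np.array([0.70, 0.30])

print(A @ P)                                   # [0.636, 0.364]
print(np.linalg.matrix_power(A, 500) @ P)      # approximately [0.1667, 0.8333]

# The same limit from the eigenvector of A corresponding to the eigenvalue 1:
w, V = np.linalg.eig(A)
v = V[:, np.argmin(np.abs(w - 1))].real
print(v / v.sum())                             # approximately [0.1667, 0.8333] = (1/6, 5/6)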

In analyzing the city–suburb problem, we gave probabilistic interpretations of A^2 and AP, showing that A^2 is a transition matrix and AP is a probability vector. In fact, the product of any two transition matrices is a transition matrix, and the product of any transition matrix and probability vector is a probability vector. A proof of these facts is a simple corollary of the next theorem, which characterizes transition matrices and probability vectors.

Theorem 5.15. Let M be an n × n matrix having real nonnegative entries, let v be a column vector in R^n having nonnegative coordinates, and let u ∈ R^n be the column vector in which each coordinate equals 1. Then


(a) M is a transition matrix if and only if M^t u = u;
(b) v is a probability vector if and only if u^t v = (1).

Proof. Exercise.

Corollary.
(a) The product of two n × n transition matrices is an n × n transition matrix. In particular, any power of a transition matrix is a transition matrix.
(b) The product of a transition matrix and a probability vector is a probability vector.

Proof. Exercise.

The city–suburb problem is an example of a process in which elements of a set are each classified as being in one of several fixed states that can switch over time. In general, such a process is called a stochastic process. The switching to a particular state is described by a probability, and in general this probability depends on such factors as the state in question, the time in question, some or all of the previous states in which the object has been (including the current state), and the states that other objects are in or have been in.

For instance, the object could be an American voter, and the state of the object could be his or her preference of political party; or the object could be a molecule of H2O, and the states could be the three physical states in which H2O can exist (solid, liquid, and gas). In these examples, all four of the factors mentioned above influence the probability that an object is in a particular state at a particular time.

If, however, the probability that an object in one state changes to a different state in a fixed interval of time depends only on the two states (and not on the time, earlier states, or other factors), then the stochastic process is called a Markov process. If, in addition, the number of possible states is finite, then the Markov process is called a Markov chain. We treated the city–suburb example as a two-state Markov chain. Of course, a Markov process is usually only an idealization of reality because the probabilities involved are almost never constant over time.

With this in mind, we consider another Markov chain. A certain community college would like to obtain information about the likelihood that students in various categories will graduate. The school classifies a student as a sophomore or a freshman depending on the number of credits that the student has earned. Data from the school indicate that, from one fall semester to the next, 40% of the sophomores will graduate, 30% will remain sophomores, and 30% will quit permanently. For freshmen, the data show that 10% will graduate by next fall, 50% will become sophomores, 20% will remain freshmen, and 20% will quit permanently. During the present year,


50% of the students at the school are sophomores and 50% are freshmen. Assuming that the trend indicated by the data continues indefinitely, the school would like to know

1. the percentage of the present students who will graduate, the percentage who will be sophomores, the percentage who will be freshmen, and the percentage who will quit school permanently by next fall;

2. the same percentages as in item 1 for the fall semester two years hence; and

3. the probability that one of its present students will eventually graduate.

The preceding paragraph describes a four-state Markov chain with the following states:

1. having graduated
2. being a sophomore
3. being a freshman
4. having quit permanently.

The given data provide us with the transition matrix

A = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix}

of the Markov chain. (Notice that students who have graduated or have quit permanently are assumed to remain indefinitely in those respective states. Thus a freshman who quits the school and returns during a later semester is not regarded as having changed states—the student is assumed to have remained in the state of being a freshman during the time he or she was not enrolled.) Moreover, we are told that the present distribution of students is half in each of states 2 and 3 and none in states 1 and 4. The vector

P = \begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix}

that describes the initial probability of being in each state is called the initial probability vector for the Markov chain.

To answer question 1, we must determine the probabilities that a present student will be in each state by next fall. As we have seen, these probabilities are the coordinates of the vector

AP = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.25 \\ 0.40 \\ 0.10 \\ 0.25 \end{pmatrix}.


Hence by next fall, 25% of the present students will graduate, 40% will be sophomores, 10% will be freshmen, and 25% will quit the school permanently. Similarly,

A^2P = A(AP) = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix} \begin{pmatrix} 0.25 \\ 0.40 \\ 0.10 \\ 0.25 \end{pmatrix} = \begin{pmatrix} 0.42 \\ 0.17 \\ 0.02 \\ 0.39 \end{pmatrix}

provides the information needed to answer question 2: within two years 42% of the present students will graduate, 17% will be sophomores, 2% will be freshmen, and 39% will quit school.

Finally, the answer to question 3 is provided by the vector LP, where L = lim_{m→∞} A^m. For the matrices

Q = \begin{pmatrix} 1 & 4 & 19 & 0 \\ 0 & -7 & -40 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 3 & 13 & 1 \end{pmatrix}  and  D = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0.3 & 0 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},

we have Q^{-1}AQ = D. Thus

L = lim_{m→∞} A^m = Q( lim_{m→∞} D^m )Q^{-1}

= \begin{pmatrix} 1 & 4 & 19 & 0 \\ 0 & -7 & -40 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 3 & 13 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 4/7 & 27/56 & 0 \\ 0 & -1/7 & -5/7 & 0 \\ 0 & 0 & 1/8 & 0 \\ 0 & 3/7 & 29/56 & 1 \end{pmatrix}

= \begin{pmatrix} 1 & 4/7 & 27/56 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 3/7 & 29/56 & 1 \end{pmatrix}.

So

LP = \begin{pmatrix} 1 & 4/7 & 27/56 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 3/7 & 29/56 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix} = \begin{pmatrix} 59/112 \\ 0 \\ 0 \\ 53/112 \end{pmatrix},

and hence the probability that one of the present students will graduate is 59/112.
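As a check, the following sketch (Python with NumPy; ours, not part of the text) approximates LP by applying a large power of A to the initial probability vector and compares it with 59/112 and 53/112.

import numpy as np

A = np.array([[1.0, 0.4, 0.1, 0.0],
              [0.0, 0.3, 0.5, 0.0],
              [0.0, 0.0, 0.2, 0.0],
              [0.0, 0.3, 0.2, 1.0]])
P = np.array([0.0, 0.5, 0.5, 0.0])

print(np.linalg.matrix_power(A, 200) @ P)   # approximately [0.5268, 0, 0, 0.4732]
print(59/112, 53/112)                       # 0.52678..., 0.47321...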


In the preceding two examples, we saw that lim_{m→∞} A^mP, where A is the transition matrix and P is the initial probability vector of the Markov chain, gives the eventual proportions in each state. In general, however, the limit of powers of a transition matrix need not exist. For example, if

M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},

then lim_{m→∞} M^m does not exist because odd powers of M equal M and even powers of M equal I. The reason that the limit fails to exist is that condition (a) of Theorem 5.13 does not hold for M (−1 is an eigenvalue). In fact, it can be shown (see Exercise 20 of Section 7.2) that the only transition matrices A such that lim_{m→∞} A^m does not exist are precisely those matrices for which condition (a) of Theorem 5.13 fails to hold.

But even if the limit of powers of the transition matrix exists, the computation of the limit may be quite difficult. (The reader is encouraged to work Exercise 6 to appreciate the truth of the last sentence.) Fortunately, there is a large and important class of transition matrices for which this limit exists and is easily computed—this is the class of regular transition matrices.

Definition. A transition matrix is called regular if some power of the matrix contains only positive entries.

Example 2

The transition matrix

\begin{pmatrix} 0.90 & 0.02 \\ 0.10 & 0.98 \end{pmatrix}

of the Markov chain used in the city–suburb problem is clearly regular because each entry is positive. On the other hand, the transition matrix

A = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix}

of the Markov chain describing community college enrollments is not regular because the first column of A^m is

\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}

for any power m.


Observe that a regular transition matrix may contain zero entries. For example,

M = \begin{pmatrix} 0.9 & 0.5 & 0 \\ 0 & 0.5 & 0.4 \\ 0.1 & 0 & 0.6 \end{pmatrix}

is regular because every entry of M^2 is positive. ♦
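Deciding regularity amounts to checking whether some power of the matrix has only positive entries. The sketch below (Python with NumPy; ours) does this by brute force; the bound (n − 1)^2 + 1 on how far one must look is a classical result of Wielandt for primitive matrices and is an assumption we add here, not a fact stated in the text.

import numpy as np

def is_regular(M):
    # M is assumed to be an n x n transition matrix.
    n = M.shape[0]
    P = np.eye(n)
    for _ in range((n - 1) ** 2 + 1):       # Wielandt's bound for primitive matrices
        P = P @ M
        if np.all(P > 0):
            return True
    return False

M = np.array([[0.9, 0.5, 0.0],
              [0.0, 0.5, 0.4],
              [0.1, 0.0, 0.6]])
print(is_regular(M))                        # True: every entry of M^2 is positive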

The remainder of this section is devoted to proving that, for a regular transition matrix A, the limit of the sequence of powers of A exists and has identical columns. From this fact, it is easy to compute this limit. In the course of proving this result, we obtain some interesting bounds for the magnitudes of eigenvalues of any square matrix. These bounds are given in terms of the sum of the absolute values of the rows and columns of the matrix. The necessary terminology is introduced in the definitions that follow.

Definitions. Let A ∈ M_{n×n}(C). For 1 ≤ i, j ≤ n, define ρ_i(A) to be the sum of the absolute values of the entries of row i of A, and define ν_j(A) to be equal to the sum of the absolute values of the entries of column j of A. Thus

ρ_i(A) = \sum_{j=1}^{n} |A_{ij}|  for i = 1, 2, . . . , n

and

ν_j(A) = \sum_{i=1}^{n} |A_{ij}|  for j = 1, 2, . . . , n.

The row sum of A, denoted ρ(A), and the column sum of A, denoted ν(A), are defined as

ρ(A) = max{ρ_i(A) : 1 ≤ i ≤ n}  and  ν(A) = max{ν_j(A) : 1 ≤ j ≤ n}.

Example 3

For the matrix

A = \begin{pmatrix} 1 & -i & 3-4i \\ -2+i & 0 & 6 \\ 3 & 2 & i \end{pmatrix},

ρ_1(A) = 7, ρ_2(A) = 6 + √5, ρ_3(A) = 6, ν_1(A) = 4 + √5, ν_2(A) = 3, and ν_3(A) = 12. Hence ρ(A) = 6 + √5 and ν(A) = 12. ♦
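These row and column sums are easy to compute by machine. The sketch below (Python with NumPy; ours) reproduces the numbers of Example 3 and also lists the eigenvalue magnitudes, which, as Corollary 2 below asserts, do not exceed min{ρ(A), ν(A)}.

import numpy as np

A = np.array([[ 1,      -1j, 3 - 4j],
              [-2 + 1j,  0,  6     ],
              [ 3,       2,  1j    ]])

rho = np.abs(A).sum(axis=1)           # row sums:    [7, 6 + sqrt(5), 6]
nu  = np.abs(A).sum(axis=0)           # column sums: [4 + sqrt(5), 3, 12]
print(rho.max(), nu.max())            # rho(A) = 6 + sqrt(5), nu(A) = 12
print(np.abs(np.linalg.eigvals(A)))   # each magnitude is at most min(rho(A), nu(A))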


Our next results show that the smaller of ρ(A) and ν(A) is an upper bound for the absolute values of eigenvalues of A. In the preceding example, for instance, A has no eigenvalue with absolute value greater than 6 + √5.

To obtain a geometric view of the following theorem, we introduce some terminology. For an n × n matrix A, we define the ith Gerschgorin disk C_i to be the disk in the complex plane with center A_{ii} and radius r_i = ρ_i(A) − |A_{ii}|; that is,

C_i = {z ∈ C : |z − A_{ii}| ≤ r_i}.

For example, consider the matrix

A = \begin{pmatrix} 1+2i & 1 \\ 2i & -3 \end{pmatrix}.

For this matrix, C_1 is the disk with center 1 + 2i and radius 1, and C_2 is the disk with center −3 and radius 2. (See Figure 5.4.)

[Figure 5.4: the Gerschgorin disks C_1 (center 1 + 2i, radius 1) and C_2 (center −3, radius 2) plotted in the complex plane, with the real and imaginary axes shown.]

Gerschgorin's disk theorem, stated below, tells us that all the eigenvalues of A are located within these two disks. In particular, we see that 0 is not an eigenvalue, and hence by Exercise 8(c) of Section 5.1, A is invertible.
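The claim is easy to verify numerically for this matrix. The sketch below (Python with NumPy; ours, not part of the text) computes the two disks and checks that each eigenvalue lies in at least one of them.

import numpy as np

A = np.array([[1 + 2j, 1],
              [2j,    -3]])

centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)   # r_i = rho_i(A) - |A_ii|

for lam in np.linalg.eigvals(A):
    print(lam, np.any(np.abs(lam - centers) <= radii))   # True for every eigenvalue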

Theorem 5.16 (Gerschgorin's Disk Theorem). Let A ∈ M_{n×n}(C). Then every eigenvalue of A is contained in a Gerschgorin disk.

Proof. Let λ be an eigenvalue of A with the corresponding eigenvector

v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}.


Then v satisfies the matrix equation Av = λv, which can be written

\sum_{j=1}^{n} A_{ij}v_j = λv_i   (i = 1, 2, . . . , n).   (2)

Suppose that v_k is the coordinate of v having the largest absolute value; note that v_k ≠ 0 because v is an eigenvector of A.

We show that λ lies in C_k, that is, |λ − A_{kk}| ≤ r_k. For i = k, it follows from (2) that

|λv_k − A_{kk}v_k| = \left| \sum_{j=1}^{n} A_{kj}v_j − A_{kk}v_k \right| = \left| \sum_{j≠k} A_{kj}v_j \right|
≤ \sum_{j≠k} |A_{kj}||v_j| ≤ \sum_{j≠k} |A_{kj}||v_k|
= |v_k| \sum_{j≠k} |A_{kj}| = |v_k|r_k.

Thus

|v_k||λ − A_{kk}| ≤ |v_k|r_k;

so

|λ − A_{kk}| ≤ r_k

because |v_k| > 0.

Corollary 1. Let λ be any eigenvalue of A ∈ M_{n×n}(C). Then |λ| ≤ ρ(A).

Proof. By Gerschgorin's disk theorem, |λ − A_{kk}| ≤ r_k for some k. Hence

|λ| = |(λ − A_{kk}) + A_{kk}| ≤ |λ − A_{kk}| + |A_{kk}| ≤ r_k + |A_{kk}| = ρ_k(A) ≤ ρ(A).

Corollary 2. Let λ be any eigenvalue of A ∈ M_{n×n}(C). Then

|λ| ≤ min{ρ(A), ν(A)}.

Proof. Since |λ| ≤ ρ(A) by Corollary 1, it suffices to show that |λ| ≤ ν(A). By Exercise 14 of Section 5.1, λ is an eigenvalue of A^t, and so |λ| ≤ ρ(A^t) by Corollary 1. But the rows of A^t are the columns of A; consequently ρ(A^t) = ν(A). Therefore |λ| ≤ ν(A).

The next corollary is immediate from Corollary 2.


Corollary 3. If λ is an eigenvalue of a transition matrix, then |λ| ≤ 1.

The next result asserts that the upper bound in Corollary 3 is attained.

Theorem 5.17. Every transition matrix has 1 as an eigenvalue.

Proof. Let A be an n × n transition matrix, and let u ∈ R^n be the column vector in which each coordinate is 1. Then A^t u = u by Theorem 5.15, and hence u is an eigenvector of A^t corresponding to the eigenvalue 1. But since A and A^t have the same eigenvalues, it follows that 1 is also an eigenvalue of A.

Suppose that A is a transition matrix for which some eigenvector corresponding to the eigenvalue 1 has only nonnegative coordinates. Then some multiple of this vector is a probability vector P as well as an eigenvector of A corresponding to eigenvalue 1. It is interesting to observe that if P is the initial probability vector of a Markov chain having A as its transition matrix, then the Markov chain is completely static. For in this situation, A^mP = P for every positive integer m; hence the probability of being in each state never changes. Consider, for instance, the city–suburb problem with

P = \begin{pmatrix} 1/6 \\ 5/6 \end{pmatrix}.

Theorem 5.18. Let A ∈ M_{n×n}(C) be a matrix in which each entry is positive, and let λ be an eigenvalue of A such that |λ| = ρ(A). Then λ = ρ(A) and {u} is a basis for E_λ, where u ∈ C^n is the column vector in which each coordinate equals 1.

Proof. Let v be an eigenvector of A corresponding to λ, with coordinates v_1, v_2, . . . , v_n. Suppose that v_k is the coordinate of v having the largest absolute value, and let b = |v_k|. Then

|λ|b = |λ||v_k| = |λv_k| = \left| \sum_{j=1}^{n} A_{kj}v_j \right| ≤ \sum_{j=1}^{n} |A_{kj}v_j|
= \sum_{j=1}^{n} |A_{kj}||v_j| ≤ \sum_{j=1}^{n} |A_{kj}|b = ρ_k(A)b ≤ ρ(A)b.   (3)

Since |λ| = ρ(A), the three inequalities in (3) are actually equalities; that is,

(a) \left| \sum_{j=1}^{n} A_{kj}v_j \right| = \sum_{j=1}^{n} |A_{kj}v_j|,


(b) \sum_{j=1}^{n} |A_{kj}||v_j| = \sum_{j=1}^{n} |A_{kj}|b, and

(c) ρ_k(A) = ρ(A).

We see in Exercise 15(b) of Section 6.1 that (a) holds if and only if all the terms A_{kj}v_j (j = 1, 2, . . . , n) are nonnegative multiples of some nonzero complex number z. Without loss of generality, we assume that |z| = 1. Thus there exist nonnegative real numbers c_1, c_2, . . . , c_n such that

A_{kj}v_j = c_j z.   (4)

By (b) and the assumption that A_{kj} ≠ 0 for all k and j, we have

|v_j| = b  for j = 1, 2, . . . , n.   (5)

Combining (4) and (5), we obtain

b = |v_j| = \left| \frac{c_j}{A_{kj}} z \right| = \frac{c_j}{A_{kj}}  for j = 1, 2, . . . , n,

and therefore by (4), we have v_j = bz for all j. So

v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} bz \\ bz \\ \vdots \\ bz \end{pmatrix} = bzu,

and hence {u} is a basis for E_λ.

Finally, observe that all of the entries of Au are positive because the same is true for the entries of both A and u. But Au = λu, and hence λ > 0. Therefore, λ = |λ| = ρ(A).

Corollary 1. Let A ∈ M_{n×n}(C) be a matrix in which each entry is positive, and let λ be an eigenvalue of A such that |λ| = ν(A). Then λ = ν(A), and dim(E_λ) = 1.

Proof. Exercise.

Corollary 2. Let A ∈ M_{n×n}(C) be a transition matrix in which each entry is positive, and let λ be any eigenvalue of A other than 1. Then |λ| < 1. Moreover, the eigenspace corresponding to the eigenvalue 1 has dimension 1.

Proof. Exercise.

Our next result extends Corollary 2 to regular transition matrices and thus shows that regular transition matrices satisfy condition (a) of Theorems 5.13 and 5.14.


Theorem 5.19. Let A be a regular transition matrix, and let λ be an eigenvalue of A. Then

(a) |λ| ≤ 1.
(b) If |λ| = 1, then λ = 1, and dim(E_λ) = 1.

Proof. Statement (a) was proved as Corollary 3 to Theorem 5.16.

(b) Since A is regular, there exists a positive integer s such that A^s has only positive entries. Because A is a transition matrix and the entries of A^s are positive, the entries of A^{s+1} = A^s A are positive. Suppose that |λ| = 1. Then λ^s and λ^{s+1} are eigenvalues of A^s and A^{s+1}, respectively, having absolute value 1. So by Corollary 2 to Theorem 5.18, λ^s = λ^{s+1} = 1. Thus λ = 1. Let E_λ and E′_λ denote the eigenspaces of A and A^s, respectively, corresponding to λ = 1. Then E_λ ⊆ E′_λ and, by Corollary 2 to Theorem 5.18, dim(E′_λ) = 1. Hence E_λ = E′_λ, and dim(E_λ) = 1.

Corollary. Let A be a regular transition matrix that is diagonalizable. Then lim_{m→∞} A^m exists.

The preceding corollary, which follows immediately from Theorems 5.19 and 5.14, is not the best possible result. In fact, it can be shown that if A is a regular transition matrix, then the multiplicity of 1 as an eigenvalue of A is 1. Thus, by Theorem 5.7 (p. 264), condition (b) of Theorem 5.13 is satisfied. So if A is a regular transition matrix, lim_{m→∞} A^m exists regardless of whether A is or is not diagonalizable. As with Theorem 5.13, however, the fact that the multiplicity of 1 as an eigenvalue of A is 1 cannot be proved at this time. Nevertheless, we state this result here (leaving the proof until Exercise 20 of Section 7.2) and deduce further facts about lim_{m→∞} A^m when A is a regular transition matrix.

Theorem 5.20. Let A be an n × n regular transition matrix. Then
(a) The multiplicity of 1 as an eigenvalue of A is 1.
(b) lim_{m→∞} A^m exists.
(c) L = lim_{m→∞} A^m is a transition matrix.
(d) AL = LA = L.
(e) The columns of L are identical. In fact, each column of L is equal to the unique probability vector v that is also an eigenvector of A corresponding to the eigenvalue 1.
(f) For any probability vector w, lim_{m→∞} (A^m w) = v.

Proof. (a) See Exercise 20 of Section 7.2.
(b) This follows from (a) and Theorems 5.19 and 5.13.
(c) By Theorem 5.15, we must show that u^tL = u^t. Now A^m is a transition matrix by the corollary to Theorem 5.15, so

u^tL = u^t lim_{m→∞} A^m = lim_{m→∞} u^tA^m = lim_{m→∞} u^t = u^t,


and it follows that L is a transition matrix.
(d) By Theorem 5.12,

AL = A lim_{m→∞} A^m = lim_{m→∞} AA^m = lim_{m→∞} A^{m+1} = L.

Similarly, LA = L.
(e) Since AL = L by (d), each column of L is an eigenvector of A corresponding to the eigenvalue 1. Moreover, by (c), each column of L is a probability vector. Thus, by (a), each column of L is equal to the unique probability vector v corresponding to the eigenvalue 1 of A.
(f) Let w be any probability vector, and set y = lim_{m→∞} A^m w = Lw. Then y is a probability vector by the corollary to Theorem 5.15, and also Ay = ALw = Lw = y by (d). Hence y is also an eigenvector corresponding to the eigenvalue 1 of A. So y = v by (e).

Definition. The vector v in Theorem 5.20(e) is called the fixed probability vector or stationary vector of the regular transition matrix A.

Theorem 5.20 can be used to deduce information about the eventual distribution in each state of a Markov chain having a regular transition matrix.

Example 4

A survey in Persia showed that on a particular day 50% of the Persians preferred a loaf of bread, 30% preferred a jug of wine, and 20% preferred "thou beside me in the wilderness." A subsequent survey 1 month later yielded the following data: Of those who preferred a loaf of bread on the first survey, 40% continued to prefer a loaf of bread, 10% now preferred a jug of wine, and 50% preferred "thou"; of those who preferred a jug of wine on the first survey, 20% now preferred a loaf of bread, 70% continued to prefer a jug of wine, and 10% now preferred "thou"; of those who preferred "thou" on the first survey, 20% now preferred a loaf of bread, 20% now preferred a jug of wine, and 60% continued to prefer "thou."

Assuming that this trend continues, the situation described in the preceding paragraph is a three-state Markov chain in which the states are the three possible preferences. We can predict the percentage of Persians in each state for each month following the original survey. Letting the first, second, and third states be preferences for bread, wine, and "thou", respectively, we see that the probability vector that gives the initial probability of being in each state is

P = \begin{pmatrix} 0.50 \\ 0.30 \\ 0.20 \end{pmatrix},


and the transition matrix is

A = \begin{pmatrix} 0.40 & 0.20 & 0.20 \\ 0.10 & 0.70 & 0.20 \\ 0.50 & 0.10 & 0.60 \end{pmatrix}.

The probabilities of being in each state m months after the original survey are the coordinates of the vector A^mP. The reader may check that

AP = \begin{pmatrix} 0.30 \\ 0.30 \\ 0.40 \end{pmatrix},  A^2P = \begin{pmatrix} 0.26 \\ 0.32 \\ 0.42 \end{pmatrix},  A^3P = \begin{pmatrix} 0.252 \\ 0.334 \\ 0.414 \end{pmatrix},  and  A^4P = \begin{pmatrix} 0.2504 \\ 0.3418 \\ 0.4078 \end{pmatrix}.

Note the apparent convergence of A^mP.

Since A is regular, the long-range prediction concerning the Persians' preferences can be found by computing the fixed probability vector for A. This vector is the unique probability vector v such that (A − I)v = 0. Letting

v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix},

we see that the matrix equation (A − I)v = 0 yields the following system of linear equations:

−0.60v_1 + 0.20v_2 + 0.20v_3 = 0
 0.10v_1 − 0.30v_2 + 0.20v_3 = 0
 0.50v_1 + 0.10v_2 − 0.40v_3 = 0.

It is easily shown that

\begin{pmatrix} 5 \\ 7 \\ 8 \end{pmatrix}

is a basis for the solution space of this system. Hence the unique fixed probability vector for A is

\begin{pmatrix} 5/(5+7+8) \\ 7/(5+7+8) \\ 8/(5+7+8) \end{pmatrix} = \begin{pmatrix} 0.25 \\ 0.35 \\ 0.40 \end{pmatrix}.

Thus, in the long run, 25% of the Persians prefer a loaf of bread, 35% prefer a jug of wine, and 40% prefer "thou beside me in the wilderness."


Note that if

Q = \begin{pmatrix} 5 & 0 & -3 \\ 7 & -1 & -1 \\ 8 & 1 & 4 \end{pmatrix},

then

Q^{-1}AQ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.2 \end{pmatrix}.

So

lim_{m→∞} A^m = Q \left[ lim_{m→∞} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.2 \end{pmatrix}^m \right] Q^{-1} = Q \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} Q^{-1}

= \begin{pmatrix} 0.25 & 0.25 & 0.25 \\ 0.35 & 0.35 & 0.35 \\ 0.40 & 0.40 & 0.40 \end{pmatrix}. ♦
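The fixed probability vector can also be obtained by machine. The sketch below (Python with NumPy; ours, not part of the text) normalizes an eigenvector of A for the eigenvalue 1 and compares the result with a high power of A, whose columns all approach that vector.

import numpy as np

A = np.array([[0.40, 0.20, 0.20],
              [0.10, 0.70, 0.20],
              [0.50, 0.10, 0.60]])

w, V = np.linalg.eig(A)
v = V[:, np.argmin(np.abs(w - 1))].real
print(v / v.sum())                            # [0.25, 0.35, 0.40]
print(np.linalg.matrix_power(A, 100)[:, 0])   # first column of A^100: the same vector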

Example 5

Farmers in Lamron plant one crop per year—either corn, soybeans, or wheat. Because they believe in the necessity of rotating their crops, these farmers do not plant the same crop in successive years. In fact, of the total acreage on which a particular crop is planted, exactly half is planted with each of the other two crops during the succeeding year. This year, 300 acres of corn, 200 acres of soybeans, and 100 acres of wheat were planted.

The situation just described is another three-state Markov chain in which the three states correspond to the planting of corn, soybeans, and wheat, respectively. In this problem, however, the amount of land devoted to each crop, rather than the percentage of the total acreage (600 acres), is given. By converting these amounts into fractions of the total acreage, we see that the transition matrix A and the initial probability vector P of the Markov chain are

A = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{pmatrix}  and  P = \begin{pmatrix} 300/600 \\ 200/600 \\ 100/600 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1/3 \\ 1/6 \end{pmatrix}.

The fraction of the total acreage devoted to each crop in m years is given by the coordinates of A^mP, and the eventual proportions of the total acreage used for each crop are the coordinates of lim_{m→∞} A^mP. Thus the eventual


amounts of land devoted to each crop are found by multiplying this limit by the total acreage; that is, the eventual amounts of land used for each crop are the coordinates of 600 · lim_{m→∞} A^mP.

Since A is a regular transition matrix, Theorem 5.20 shows that lim_{m→∞} A^m is a matrix L in which each column equals the unique fixed probability vector for A. It is easily seen that the fixed probability vector for A is

\begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix}.

Hence

L = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix};

so

600 · lim_{m→∞} A^mP = 600LP = \begin{pmatrix} 200 \\ 200 \\ 200 \end{pmatrix}.

Thus, in the long run, we expect 200 acres of each crop to be planted each year. (For a direct computation of 600 · lim_{m→∞} A^mP, see Exercise 14.) ♦

In this section, we have concentrated primarily on the theory of regular transition matrices. There is another interesting class of transition matrices that can be represented in the form

\begin{pmatrix} I & B \\ O & C \end{pmatrix},

where I is an identity matrix and O is a zero matrix. (Such transition matrices are not regular since the lower left block remains O in any power of the matrix.) The states corresponding to the identity submatrix are called absorbing states because such a state is never left once it is entered. A Markov chain is called an absorbing Markov chain if it is possible to go from an arbitrary state into an absorbing state in a finite number of stages. Observe that the Markov chain that describes the enrollment pattern in a community college is an absorbing Markov chain with states 1 and 4 as its absorbing states. Readers interested in learning more about absorbing Markov chains are referred to Introduction to Finite Mathematics (third edition) by


J. Kemeny, J. Snell, and G. Thompson (Prentice-Hall, Inc., Englewood Cliffs, N. J., 1974) or Discrete Mathematical Models by Fred S. Roberts (Prentice-Hall, Inc., Englewood Cliffs, N. J., 1976).

An Application

In species that reproduce sexually, the characteristics of an offspring with respect to a particular genetic trait are determined by a pair of genes, one inherited from each parent. The genes for a particular trait are of two types, which are denoted by G and g. The gene G represents the dominant characteristic, and g represents the recessive characteristic. Offspring with genotypes GG or Gg exhibit the dominant characteristic, whereas offspring with genotype gg exhibit the recessive characteristic. For example, in humans, brown eyes are a dominant characteristic and blue eyes are the corresponding recessive characteristic; thus the offspring with genotypes GG or Gg are brown-eyed, whereas those of type gg are blue-eyed.

Let us consider the probability of offspring of each genotype for a male parent of genotype Gg. (We assume that the population under consideration is large, that mating is random with respect to genotype, and that the distribution of each genotype within the population is independent of sex and life expectancy.) Let

P = \begin{pmatrix} p \\ q \\ r \end{pmatrix}

denote the proportion of the adult population with genotypes GG, Gg, and gg, respectively, at the start of the experiment. This experiment describes a three-state Markov chain with the following transition matrix:

B = \begin{pmatrix} 1/2 & 1/4 & 0 \\ 1/2 & 1/2 & 1/2 \\ 0 & 1/4 & 1/2 \end{pmatrix},

where the columns correspond to the genotype (GG, Gg, or gg) of the female parent and the rows correspond to the genotype (GG, Gg, or gg) of the offspring.

It is easily checked that B^2 contains only positive entries; so B is regular. Thus, by permitting only males of genotype Gg to reproduce, the proportion of offspring in the population having a certain genotype will stabilize at the fixed probability vector for B, which is

\begin{pmatrix} 1/4 \\ 1/2 \\ 1/4 \end{pmatrix}.


Now suppose that similar experiments are to be performed with males of genotypes GG and gg. As already mentioned, these experiments are three-state Markov chains with transition matrices

A = \begin{pmatrix} 1 & 1/2 & 0 \\ 0 & 1/2 & 1 \\ 0 & 0 & 0 \end{pmatrix}  and  C = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1/2 & 0 \\ 0 & 1/2 & 1 \end{pmatrix},

respectively. In order to consider the case where all male genotypes are permitted to reproduce, we must form the transition matrix M = pA + qB + rC, which is the linear combination of A, B, and C weighted by the proportion of males of each genotype. Thus

M = \begin{pmatrix} p + q/2 & p/2 + q/4 & 0 \\ q/2 + r & p/2 + q/2 + r/2 & p + q/2 \\ 0 & q/4 + r/2 & q/2 + r \end{pmatrix}.

To simplify the notation, let a = p + q/2 and b = q/2 + r. (The numbers a and b represent the proportions of G and g genes, respectively, in the population.) Then

M = \begin{pmatrix} a & a/2 & 0 \\ b & 1/2 & a \\ 0 & b/2 & b \end{pmatrix},

where a + b = p + q + r = 1.

Let p′, q′, and r′ denote the proportions of the first-generation offspring

having genotypes GG, Gg, and gg, respectively. Then

\begin{pmatrix} p′ \\ q′ \\ r′ \end{pmatrix} = MP = \begin{pmatrix} ap + aq/2 \\ bp + q/2 + ar \\ bq/2 + br \end{pmatrix} = \begin{pmatrix} a^2 \\ 2ab \\ b^2 \end{pmatrix}.

In order to consider the effects of unrestricted matings among the first-generation offspring, a new transition matrix M′ must be determined based upon the distribution of first-generation genotypes. As before, we find that

M′ = \begin{pmatrix} p′ + q′/2 & p′/2 + q′/4 & 0 \\ q′/2 + r′ & p′/2 + q′/2 + r′/2 & p′ + q′/2 \\ 0 & q′/4 + r′/2 & q′/2 + r′ \end{pmatrix} = \begin{pmatrix} a′ & a′/2 & 0 \\ b′ & 1/2 & a′ \\ 0 & b′/2 & b′ \end{pmatrix},


where a′ = p′ + q′/2 and b′ = q′/2 + r′. However

a′ = a^2 + (1/2)(2ab) = a(a + b) = a  and  b′ = (1/2)(2ab) + b^2 = b(a + b) = b.

Thus M′ = M; so the distribution of second-generation offspring among the three genotypes is

M(MP) = M^2P = \begin{pmatrix} a^3 + a^2 b \\ a^2 b + ab + ab^2 \\ ab^2 + b^3 \end{pmatrix} = \begin{pmatrix} a^2(a + b) \\ ab(a + 1 + b) \\ b^2(a + b) \end{pmatrix} = \begin{pmatrix} a^2 \\ 2ab \\ b^2 \end{pmatrix} = MP,

the same as the first-generation offspring. In other words, MP is the fixed probability vector for M, and genetic equilibrium is achieved in the population after only one generation. (This result is called the Hardy–Weinberg law.) Notice that in the important special case that a = b (or equivalently, that p = r), the distribution at equilibrium is

MP = \begin{pmatrix} a^2 \\ 2ab \\ b^2 \end{pmatrix} = \begin{pmatrix} 1/4 \\ 1/2 \\ 1/4 \end{pmatrix}.
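The algebra above can be confirmed symbolically. The sketch below (Python with SymPy; ours, not part of the text) checks that, once a + b = 1, applying M to the first-generation vector (a^2, 2ab, b^2)^t returns the same vector.

import sympy as sp

a, b = sp.symbols('a b', nonnegative=True)
M = sp.Matrix([[a, a/2,               0],
               [b, sp.Rational(1, 2), a],
               [0, b/2,               b]])
P1 = sp.Matrix([a**2, 2*a*b, b**2])                 # first-generation distribution MP

print(sp.simplify((M * P1 - P1).subs(b, 1 - a)))    # Matrix([[0], [0], [0]])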

EXERCISES

1. Label the following statements as true or false.

(a) If A ∈ M_{n×n}(C) and lim_{m→∞} A^m = L, then, for any invertible matrix Q ∈ M_{n×n}(C), we have lim_{m→∞} QA^mQ^{-1} = QLQ^{-1}.
(b) If 2 is an eigenvalue of A ∈ M_{n×n}(C), then lim_{m→∞} A^m does not exist.
(c) Any vector (x_1, x_2, . . . , x_n)^t ∈ R^n such that x_1 + x_2 + · · · + x_n = 1 is a probability vector.
(d) The sum of the entries of each row of a transition matrix equals 1.
(e) The product of a transition matrix and a probability vector is a probability vector.


(f) Let z be any complex number such that |z| < 1. Then the matrix

\begin{pmatrix} 1 & z & -1 \\ z & 1 & 1 \\ -1 & 1 & z \end{pmatrix}

does not have 3 as an eigenvalue.
(g) Every transition matrix has 1 as an eigenvalue.
(h) No transition matrix can have −1 as an eigenvalue.
(i) If A is a transition matrix, then lim_{m→∞} A^m exists.
(j) If A is a regular transition matrix, then lim_{m→∞} A^m exists and has rank 1.

2. Determine whether lim_{m→∞} A^m exists for each of the following matrices A, and compute the limit if it exists.

(a) \begin{pmatrix} 0.1 & 0.7 \\ 0.7 & 0.1 \end{pmatrix}   (b) \begin{pmatrix} -1.4 & 0.8 \\ -2.4 & 1.8 \end{pmatrix}   (c) \begin{pmatrix} 0.4 & 0.7 \\ 0.6 & 0.3 \end{pmatrix}

(d) \begin{pmatrix} -1.8 & 4.8 \\ -0.8 & 2.2 \end{pmatrix}   (e) \begin{pmatrix} -2 & -1 \\ 4 & 3 \end{pmatrix}   (f) \begin{pmatrix} 2.0 & -0.5 \\ 3.0 & -0.5 \end{pmatrix}

(g) \begin{pmatrix} -1.8 & 0 & -1.4 \\ -5.6 & 1 & -2.8 \\ 2.8 & 0 & 2.4 \end{pmatrix}   (h) \begin{pmatrix} 3.4 & -0.2 & 0.8 \\ 3.9 & 1.8 & 1.3 \\ -16.5 & -2.0 & -4.5 \end{pmatrix}

(i) \begin{pmatrix} -1/2 - 2i & 4i & 1/2 + 5i \\ 1 + 2i & -3i & -1 - 4i \\ -1 - 2i & 4i & 1 + 5i \end{pmatrix}

(j) \begin{pmatrix} (-26 + i)/3 & (-28 - 4i)/3 & 28 \\ (-7 + 2i)/3 & (-5 + i)/3 & 7 - 2i \\ (-13 + 6i)/6 & (-5 + 6i)/6 & (35 - 20i)/6 \end{pmatrix}

3. Prove that if A_1, A_2, . . . is a sequence of n × p matrices with complex entries such that lim_{m→∞} A_m = L, then lim_{m→∞} (A_m)^t = L^t.

4. Prove that if A ∈ M_{n×n}(C) is diagonalizable and L = lim_{m→∞} A^m exists, then either L = I_n or rank(L) < n.


5. Find 2 × 2 matrices A and B having real entries such that lim_{m→∞} A^m, lim_{m→∞} B^m, and lim_{m→∞} (AB)^m all exist, but

lim_{m→∞} (AB)^m ≠ ( lim_{m→∞} A^m )( lim_{m→∞} B^m ).

6. A hospital trauma unit has determined that 30% of its patients are ambulatory and 70% are bedridden at the time of arrival at the hospital. A month after arrival, 60% of the ambulatory patients have recovered, 20% remain ambulatory, and 20% have become bedridden. After the same amount of time, 10% of the bedridden patients have recovered, 20% have become ambulatory, 50% remain bedridden, and 20% have died. Determine the percentages of patients who have recovered, are ambulatory, are bedridden, and have died 1 month after arrival. Also determine the eventual percentages of patients of each type.

7. A player begins a game of chance by placing a marker in box 2, marked Start. (See Figure 5.5.) A die is rolled, and the marker is moved one square to the left if a 1 or a 2 is rolled and one square to the right if a 3, 4, 5, or 6 is rolled. This process continues until the marker lands in square 1, in which case the player wins the game, or in square 4, in which case the player loses the game. What is the probability of winning this game? Hint: Instead of diagonalizing the appropriate transition matrix A, it is easier to represent e_2 as a linear combination of eigenvectors of A and then apply A^n to the result.

[Figure 5.5: a row of four squares numbered 1 through 4, with square 1 labeled Win, square 2 labeled Start, and square 4 labeled Lose.]

8. Which of the following transition matrices are regular?

(a) \begin{pmatrix} 0.2 & 0.3 & 0.5 \\ 0.3 & 0.2 & 0.5 \\ 0.5 & 0.5 & 0 \end{pmatrix}   (b) \begin{pmatrix} 0.5 & 0 & 1 \\ 0.5 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}   (c) \begin{pmatrix} 0.5 & 0 & 0 \\ 0.5 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}

(d) \begin{pmatrix} 0.5 & 0 & 1 \\ 0.5 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}   (e) \begin{pmatrix} 1/3 & 0 & 0 \\ 1/3 & 1 & 0 \\ 1/3 & 0 & 1 \end{pmatrix}   (f) \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.7 & 0.2 \\ 0 & 0.3 & 0.8 \end{pmatrix}


(g) \begin{pmatrix} 0 & 1/2 & 0 & 0 \\ 1/2 & 0 & 0 & 0 \\ 1/4 & 1/4 & 1 & 0 \\ 1/4 & 1/4 & 0 & 1 \end{pmatrix}   (h) \begin{pmatrix} 1/4 & 1/4 & 0 & 0 \\ 1/4 & 1/4 & 0 & 0 \\ 1/4 & 1/4 & 1 & 0 \\ 1/4 & 1/4 & 0 & 1 \end{pmatrix}

9. Compute lim_{m→∞} A^m, if it exists, for each matrix A in Exercise 8.

10. Each of the matrices that follow is a regular transition matrix for a three-state Markov chain. In all cases, the initial probability vector is

P = \begin{pmatrix} 0.3 \\ 0.3 \\ 0.4 \end{pmatrix}.

For each transition matrix, compute the proportions of objects in each state after two stages and the eventual proportions of objects in each state by determining the fixed probability vector.

(a) \begin{pmatrix} 0.6 & 0.1 & 0.1 \\ 0.1 & 0.9 & 0.2 \\ 0.3 & 0 & 0.7 \end{pmatrix}   (b) \begin{pmatrix} 0.8 & 0.1 & 0.2 \\ 0.1 & 0.8 & 0.2 \\ 0.1 & 0.1 & 0.6 \end{pmatrix}   (c) \begin{pmatrix} 0.9 & 0.1 & 0.1 \\ 0.1 & 0.6 & 0.1 \\ 0 & 0.3 & 0.8 \end{pmatrix}

(d) \begin{pmatrix} 0.4 & 0.2 & 0.2 \\ 0.1 & 0.7 & 0.2 \\ 0.5 & 0.1 & 0.6 \end{pmatrix}   (e) \begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.2 & 0.5 & 0.3 \\ 0.3 & 0.2 & 0.5 \end{pmatrix}   (f) \begin{pmatrix} 0.6 & 0 & 0.4 \\ 0.2 & 0.8 & 0.2 \\ 0.2 & 0.2 & 0.4 \end{pmatrix}

11. In 1940, a county land-use survey showed that 10% of the county land was urban, 50% was unused, and 40% was agricultural. Five years later, a follow-up survey revealed that 70% of the urban land had remained urban, 10% had become unused, and 20% had become agricultural. Likewise, 20% of the unused land had become urban, 60% had remained unused, and 20% had become agricultural. Finally, the 1945 survey showed that 20% of the agricultural land had become unused while 80% remained agricultural. Assuming that the trends indicated by the 1945 survey continue, compute the percentages of urban, unused, and agricultural land in the county in 1950 and the corresponding eventual percentages.

12. A diaper liner is placed in each diaper worn by a baby. If, after a diaper change, the liner is soiled, then it is discarded and replaced by a new liner. Otherwise, the liner is washed with the diapers and reused, except that each liner is discarded and replaced after its third use (even if it has never been soiled). The probability that the baby will soil any diaper liner is one-third. If there are only new diaper liners at first, eventually what proportions of the diaper liners being used will be new,


once used, and twice used? Hint: Assume that a diaper liner ready for use is in one of three states: new, once used, and twice used. After its use, it then transforms into one of the three states described.

13. In 1975, the automobile industry determined that 40% of American car owners drove large cars, 20% drove intermediate-sized cars, and 40% drove small cars. A second survey in 1985 showed that 70% of the large-car owners in 1975 still owned large cars in 1985, but 30% had changed to an intermediate-sized car. Of those who owned intermediate-sized cars in 1975, 10% had switched to large cars, 70% continued to drive intermediate-sized cars, and 20% had changed to small cars in 1985. Finally, of the small-car owners in 1975, 10% owned intermediate-sized cars and 90% owned small cars in 1985. Assuming that these trends continue, determine the percentages of Americans who own cars of each size in 1995 and the corresponding eventual percentages.

14. Show that if A and P are as in Example 5, then

A^m = \begin{pmatrix} r_m & r_{m+1} & r_{m+1} \\ r_{m+1} & r_m & r_{m+1} \\ r_{m+1} & r_{m+1} & r_m \end{pmatrix},

where

r_m = \frac{1}{3}\left[ 1 + \frac{(-1)^m}{2^{m-1}} \right].

Deduce that

600(A^mP) = A^m \begin{pmatrix} 300 \\ 200 \\ 100 \end{pmatrix} = \begin{pmatrix} 200 + \dfrac{(-1)^m}{2^m}(100) \\ 200 \\ 200 + \dfrac{(-1)^{m+1}}{2^m}(100) \end{pmatrix}.

15. Prove that if a 1-dimensional subspace W of R^n contains a nonzero vector with all nonnegative entries, then W contains a unique probability vector.

16. Prove Theorem 5.15 and its corollary.

17. Prove the two corollaries of Theorem 5.18.

18. Prove the corollary of Theorem 5.19.

19. Suppose that M and M′ are n × n transition matrices.


(a) Prove that if M is regular, N is any n × n transition matrix, and c is a real number such that 0 < c ≤ 1, then cM + (1 − c)N is a regular transition matrix.
(b) Suppose that for all i, j, we have that M′_{ij} > 0 whenever M_{ij} > 0. Prove that there exists a transition matrix N and a real number c with 0 < c ≤ 1 such that M′ = cM + (1 − c)N.
(c) Deduce that if the nonzero entries of M and M′ occur in the same positions, then M is regular if and only if M′ is regular.

The following definition is used in Exercises 20–24.

Definition. For A ∈ M_{n×n}(C), define e^A = lim_{m→∞} B_m, where

B_m = I + A + \frac{A^2}{2!} + · · · + \frac{A^m}{m!}

(see Exercise 22). Thus e^A is the sum of the infinite series

I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + · · · ,

and B_m is the mth partial sum of this series. (Note the analogy with the power series

e^a = 1 + a + \frac{a^2}{2!} + \frac{a^3}{3!} + · · · ,

which is valid for all complex numbers a.)
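Readers with access to a computer can watch the partial sums B_m converge. The sketch below (Python with NumPy and SciPy; ours, not part of the text) compares B_29 with SciPy's matrix exponential for one particular matrix.

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

B = np.eye(2)
term = np.eye(2)
for k in range(1, 30):        # accumulate B_29 = I + A + A^2/2! + ... + A^29/29!
    term = term @ A / k
    B = B + term

print(np.round(B, 8))
print(np.round(expm(A), 8))   # agrees with the partial sum to many decimal places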

20. Compute e^O and e^I, where O and I denote the n × n zero and identity matrices, respectively.

21. Let P^{-1}AP = D be a diagonal matrix. Prove that e^A = Pe^DP^{-1}.

22. Let A ∈ M_{n×n}(C) be diagonalizable. Use the result of Exercise 21 to show that e^A exists. (Exercise 21 of Section 7.2 shows that e^A exists for every A ∈ M_{n×n}(C).)

23. Find A, B ∈ M_{2×2}(R) such that e^Ae^B ≠ e^{A+B}.

24. Prove that a differentiable function x: R → R^n is a solution to the system of differential equations defined in Exercise 15 of Section 5.2 if and only if x(t) = e^{tA}v for some v ∈ R^n, where A is defined in that exercise.


5.4 INVARIANT SUBSPACES AND THE CAYLEY–HAMILTON THEOREM

In Section 5.1, we observed that if v is an eigenvector of a linear operator T, then T maps the span of {v} into itself. Subspaces that are mapped into themselves are of great importance in the study of linear operators (see, e.g., Exercises 28–32 of Section 2.1).

Definition. Let T be a linear operator on a vector space V. A subspace W of V is called a T-invariant subspace of V if T(W) ⊆ W, that is, if T(v) ∈ W for all v ∈ W.

Example 1

Suppose that T is a linear operator on a vector space V. Then the following subspaces of V are T-invariant:

1. {0}
2. V
3. R(T)
4. N(T)
5. E_λ, for any eigenvalue λ of T.

The proofs that these subspaces are T-invariant are left as exercises. (See Exercise 3.) ♦

Example 2

Let T be the linear operator on R3 defined by

T(a, b, c) = (a + b, b + c, 0).

Then the xy-plane = {(x, y, 0) : x, y ∈ R} and the x-axis = {(x, 0, 0) : x ∈ R} are T-invariant subspaces of R^3. ♦

Let T be a linear operator on a vector space V, and let x be a nonzero vector in V. The subspace

W = span({x, T(x), T^2(x), . . .})

is called the T-cyclic subspace of V generated by x. It is a simple matter to show that W is T-invariant. In fact, W is the "smallest" T-invariant subspace of V containing x. That is, any T-invariant subspace of V containing x must also contain W (see Exercise 11). Cyclic subspaces have various uses. We apply them in this section to establish the Cayley–Hamilton theorem. In Exercise 31, we outline a method for using cyclic subspaces to compute the characteristic polynomial of a linear operator without resorting to determinants. Cyclic subspaces also play an important role in Chapter 7, where we study matrix representations of nondiagonalizable linear operators.


Example 3

Let T be the linear operator on R3 defined by

T(a, b, c) = (−b + c, a + c, 3c).

We determine the T-cyclic subspace generated by e1 = (1, 0, 0). Since

T(e1) = T(1, 0, 0) = (0, 1, 0) = e2

and

T^2(e1) = T(T(e1)) = T(e2) = (−1, 0, 0) = −e1,

it follows that

span({e1, T(e1), T^2(e1), . . .}) = span({e1, e2}) = {(s, t, 0) : s, t ∈ R}. ♦

Example 4

Let T be the linear operator on P(R) defined by T(f(x)) = f′(x). Then the T-cyclic subspace generated by x^2 is span({x^2, 2x, 2}) = P_2(R). ♦

The existence of a T-invariant subspace provides the opportunity to define a new linear operator whose domain is this subspace. If T is a linear operator on V and W is a T-invariant subspace of V, then the restriction T_W of T to W (see Appendix B) is a mapping from W to W, and it follows that T_W is a linear operator on W (see Exercise 7). As a linear operator, T_W inherits certain properties from its parent operator T. The following result illustrates one way in which the two operators are linked.

Theorem 5.21. Let T be a linear operator on a finite-dimensional vector space V, and let W be a T-invariant subspace of V. Then the characteristic polynomial of T_W divides the characteristic polynomial of T.

Proof. Choose an ordered basis γ = {v_1, v_2, . . . , v_k} for W, and extend it to an ordered basis β = {v_1, v_2, . . . , v_k, v_{k+1}, . . . , v_n} for V. Let A = [T]_β and B_1 = [T_W]_γ. Then, by Exercise 12, A can be written in the form

A = \begin{pmatrix} B_1 & B_2 \\ O & B_3 \end{pmatrix}.

Let f(t) be the characteristic polynomial of T and g(t) the characteristic polynomial of T_W. Then

f(t) = det(A − tI_n) = det \begin{pmatrix} B_1 − tI_k & B_2 \\ O & B_3 − tI_{n−k} \end{pmatrix} = g(t) · det(B_3 − tI_{n−k})

by Exercise 21 of Section 4.3. Thus g(t) divides f(t).


Example 5

Let T be the linear operator on R4 defined by

T(a, b, c, d) = (a + b + 2c − d, b + d, 2c − d, c + d),

and let W = {(t, s, 0, 0) : t, s ∈ R}. Observe that W is a T-invariant subspace of R^4 because, for any vector (a, b, 0, 0) ∈ R^4,

T(a, b, 0, 0) = (a + b, b, 0, 0) ∈ W.

Let γ = {e1, e2}, which is an ordered basis for W. Extend γ to the standard ordered basis β for R^4. Then

B_1 = [T_W]_γ = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}  and  A = [T]_β = \begin{pmatrix} 1 & 1 & 2 & -1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & 1 & 1 \end{pmatrix}

in the notation of Theorem 5.21. Let f(t) be the characteristic polynomial of T and g(t) be the characteristic polynomial of T_W. Then

f(t) = det(A − tI_4) = det \begin{pmatrix} 1−t & 1 & 2 & −1 \\ 0 & 1−t & 0 & 1 \\ 0 & 0 & 2−t & −1 \\ 0 & 0 & 1 & 1−t \end{pmatrix}
= det \begin{pmatrix} 1−t & 1 \\ 0 & 1−t \end{pmatrix} · det \begin{pmatrix} 2−t & −1 \\ 1 & 1−t \end{pmatrix} = g(t) · det \begin{pmatrix} 2−t & −1 \\ 1 & 1−t \end{pmatrix}. ♦

In view of Theorem 5.21, we may use the characteristic polynomial of T_W to gain information about the characteristic polynomial of T itself. In this regard, cyclic subspaces are useful because the characteristic polynomial of the restriction of a linear operator T to a cyclic subspace is readily computable.

Theorem 5.22. Let T be a linear operator on a finite-dimensional vector space V, and let W denote the T-cyclic subspace of V generated by a nonzero vector v ∈ V. Let k = dim(W). Then

(a) {v, T(v), T^2(v), . . . , T^{k−1}(v)} is a basis for W.
(b) If a_0v + a_1T(v) + · · · + a_{k−1}T^{k−1}(v) + T^k(v) = 0, then the characteristic polynomial of T_W is f(t) = (−1)^k(a_0 + a_1t + · · · + a_{k−1}t^{k−1} + t^k).


Proof. (a) Since v ≠ 0, the set {v} is linearly independent. Let j be the largest positive integer for which

β = {v, T(v), . . . , T^{j−1}(v)}

is linearly independent. Such a j must exist because V is finite-dimensional. Let Z = span(β). Then β is a basis for Z. Furthermore, T^j(v) ∈ Z by Theorem 1.7 (p. 39). We use this information to show that Z is a T-invariant subspace of V. Let w ∈ Z. Since w is a linear combination of the vectors of β, there exist scalars b_0, b_1, . . . , b_{j−1} such that

w = b_0v + b_1T(v) + · · · + b_{j−1}T^{j−1}(v),

and hence

T(w) = b_0T(v) + b_1T^2(v) + · · · + b_{j−1}T^j(v).

Thus T(w) is a linear combination of vectors in Z, and hence belongs to Z. So Z is T-invariant. Furthermore, v ∈ Z. By Exercise 11, W is the smallest T-invariant subspace of V that contains v, so that W ⊆ Z. Clearly, Z ⊆ W, and so we conclude that Z = W. It follows that β is a basis for W, and therefore dim(W) = j. Thus j = k. This proves (a).

(b) Now view β (from (a)) as an ordered basis for W. Let a_0, a_1, . . . , a_{k−1} be the scalars such that

a_0v + a_1T(v) + · · · + a_{k−1}T^{k−1}(v) + T^k(v) = 0.

Observe that

[T_W]_β = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -a_{k-1} \end{pmatrix},

which has the characteristic polynomial

f(t) = (−1)^k(a_0 + a_1t + · · · + a_{k−1}t^{k−1} + t^k)

by Exercise 19. Thus f(t) is the characteristic polynomial of T_W, proving (b).

Example 6

Let T be the linear operator of Example 3, and let W = span({e1, e2}), the T-cyclic subspace generated by e1. We compute the characteristic polynomial f(t) of T_W in two ways: by means of Theorem 5.22 and by means of determinants.


(a) By means of Theorem 5.22. From Example 3, we have that {e1, e2} is a cycle that generates W, and that T^2(e1) = −e1. Hence

1e1 + 0T(e1) + T^2(e1) = 0.

Therefore, by Theorem 5.22(b),

f(t) = (−1)^2(1 + 0t + t^2) = t^2 + 1.

(b) By means of determinants. Let β = {e1, e2}, which is an ordered basis for W. Since T(e1) = e2 and T(e2) = −e1, we have

[T_W]_β = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}

and therefore,

f(t) = det \begin{pmatrix} -t & -1 \\ 1 & -t \end{pmatrix} = t^2 + 1. ♦

The Cayley–Hamilton Theorem

As an illustration of the importance of Theorem 5.22, we prove a well-known result that is used in Chapter 7. The reader should refer to Appendix E for the definition of f(T), where T is a linear operator and f(x) is a polynomial.

Theorem 5.23 (Cayley–Hamilton). Let T be a linear operator on a finite-dimensional vector space V, and let f(t) be the characteristic polynomial of T. Then f(T) = T_0, the zero transformation. That is, T "satisfies" its characteristic equation.

Proof. We show that f(T)(v) = 0 for all v ∈ V. This is obvious if v = 0 because f(T) is linear; so suppose that v ≠ 0. Let W be the T-cyclic subspace generated by v, and suppose that dim(W) = k. By Theorem 5.22(a), there exist scalars a_0, a_1, . . . , a_{k−1} such that

a_0v + a_1T(v) + · · · + a_{k−1}T^{k−1}(v) + T^k(v) = 0.

Hence Theorem 5.22(b) implies that

g(t) = (−1)^k(a_0 + a_1t + · · · + a_{k−1}t^{k−1} + t^k)

is the characteristic polynomial of T_W. Combining these two equations yields

g(T)(v) = (−1)^k(a_0I + a_1T + · · · + a_{k−1}T^{k−1} + T^k)(v) = 0.

By Theorem 5.21, g(t) divides f(t); hence there exists a polynomial q(t) such that f(t) = q(t)g(t). So

f(T)(v) = q(T)g(T)(v) = q(T)(g(T)(v)) = q(T)(0) = 0.


Example 7

Let T be the linear operator on R^2 defined by T(a, b) = (a + 2b, −2a + b), and let β = {e1, e2}. Then

A = \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix},

where A = [T]_β. The characteristic polynomial of T is, therefore,

f(t) = det(A − tI) = det \begin{pmatrix} 1−t & 2 \\ −2 & 1−t \end{pmatrix} = t^2 − 2t + 5.

It is easily verified that T_0 = f(T) = T^2 − 2T + 5I. Similarly,

f(A) = A^2 − 2A + 5I = \begin{pmatrix} -3 & 4 \\ -4 & -3 \end{pmatrix} + \begin{pmatrix} -2 & -4 \\ 4 & -2 \end{pmatrix} + \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. ♦
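A quick numerical check of this example (and of the corollary that follows) is possible as well. The sketch below (Python with NumPy; ours, not part of the text) recovers the characteristic polynomial of the matrix of Example 7 and evaluates it at A.

import numpy as np

A = np.array([[ 1.0, 2.0],
              [-2.0, 1.0]])

coeffs = np.poly(A)           # [1, -2, 5], i.e. t^2 - 2t + 5, highest power first
f_of_A = sum(c * np.linalg.matrix_power(A, len(coeffs) - 1 - i)
             for i, c in enumerate(coeffs))
print(np.round(f_of_A, 10))   # the 2 x 2 zero matrix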

Example 7 suggests the following result.

Corollary (Cayley–Hamilton Theorem for Matrices). Let A be an n × n matrix, and let f(t) be the characteristic polynomial of A. Then f(A) = O, the n × n zero matrix.

Proof. See Exercise 15.

Invariant Subspaces and Direct Sums*

It is useful to decompose a finite-dimensional vector space V into a direct sum of as many T-invariant subspaces as possible because the behavior of T on V can be inferred from its behavior on the direct summands. For example, T is diagonalizable if and only if V can be decomposed into a direct sum of one-dimensional T-invariant subspaces (see Exercise 36). In Chapter 7, we consider alternate ways of decomposing V into direct sums of T-invariant subspaces if T is not diagonalizable. We proceed to gather a few facts about direct sums of T-invariant subspaces that are used in Section 7.4. The first of these facts is about characteristic polynomials.

Theorem 5.24. Let T be a linear operator on a finite-dimensional vector space V, and suppose that V = W_1 ⊕ W_2 ⊕ · · · ⊕ W_k, where W_i is a T-invariant subspace of V for each i (1 ≤ i ≤ k). Suppose that f_i(t) is the characteristic polynomial of T_{W_i} (1 ≤ i ≤ k). Then f_1(t) · f_2(t) · · · · · f_k(t) is the characteristic polynomial of T.

*This subsection uses optional material on direct sums from Section 5.2.


Proof. The proof is by mathematical induction on k. In what follows, f(t) denotes the characteristic polynomial of T. Suppose first that k = 2. Let β_1 be an ordered basis for W_1, β_2 an ordered basis for W_2, and β = β_1 ∪ β_2. Then β is an ordered basis for V by Theorem 5.10(d) (p. 276). Let A = [T]_β, B_1 = [T_{W_1}]_{β_1}, and B_2 = [T_{W_2}]_{β_2}. By Exercise 34, it follows that

A = \begin{pmatrix} B_1 & O \\ O′ & B_2 \end{pmatrix},

where O and O′ are zero matrices of the appropriate sizes. Then

f(t) = det(A − tI) = det(B_1 − tI) · det(B_2 − tI) = f_1(t) · f_2(t)

as in the proof of Theorem 5.21, proving the result for k = 2.

Now assume that the theorem is valid for k − 1 summands, where k − 1 ≥ 2, and suppose that V is a direct sum of k subspaces, say,

V = W_1 ⊕ W_2 ⊕ · · · ⊕ W_k.

Let W = W_1 + W_2 + · · · + W_{k−1}. It is easily verified that W is T-invariant and that V = W ⊕ W_k. So by the case for k = 2, f(t) = g(t) · f_k(t), where g(t) is the characteristic polynomial of T_W. Clearly W = W_1 ⊕ W_2 ⊕ · · · ⊕ W_{k−1}, and therefore g(t) = f_1(t) · f_2(t) · · · · · f_{k−1}(t) by the induction hypothesis. We conclude that f(t) = g(t) · f_k(t) = f_1(t) · f_2(t) · · · · · f_k(t).

As an illustration of this result, suppose that T is a diagonalizable linear operator on a finite-dimensional vector space V with distinct eigenvalues λ1, λ2, . . . , λk. By Theorem 5.11 (p. 278), V is a direct sum of the eigenspaces of T. Since each eigenspace is T-invariant, we may view this situation in the context of Theorem 5.24. For each eigenvalue λi, the restriction of T to Eλi has characteristic polynomial (λi − t)^mi, where mi is the dimension of Eλi. By Theorem 5.24, the characteristic polynomial f(t) of T is the product

f(t) = (λ1 − t)^m1 (λ2 − t)^m2 · · · (λk − t)^mk.

It follows that the multiplicity of each eigenvalue is equal to the dimension of the corresponding eigenspace, as expected.

Example 8

Let T be the linear operator on R4 defined by

T(a, b, c, d) = (2a − b, a + b, c − d, c + d),

and let W1 = {(s, t, 0, 0) : s, t ∈ R} and W2 = {(0, 0, s, t) : s, t ∈ R}. Noticethat W1 and W2 are each T-invariant and that R4 = W1 ⊕ W2. Let β1 ={e1, e2}, β2 = {e3, e4}, and β = β1 ∪ β2 = {e1, e2, e3, e4}. Then β1 is an


ordered basis for W1, β2 is an ordered basis for W2, and β is an ordered basis for R⁴. Let A = [T]β, B1 = [TW1]β1, and B2 = [TW2]β2. Then

B1 = ( 2  −1 ; 1  1 ),    B2 = ( 1  −1 ; 1  1 ),

and

A = ( B1  O ; O  B2 ) = ( 2  −1  0  0 ; 1  1  0  0 ; 0  0  1  −1 ; 0  0  1  1 ).

Let f(t), f1(t), and f2(t) denote the characteristic polynomials of T, TW1, and TW2, respectively. Then

f(t) = det(A − tI) = det(B1 − tI) · det(B2 − tI) = f1(t) · f2(t). ♦

The matrix A in Example 8 can be obtained by joining the matrices B1

and B2 in the manner explained in the next definition.

Definition. Let B1 ∈ Mm×m(F), and let B2 ∈ Mn×n(F). We define the direct sum of B1 and B2, denoted B1 ⊕ B2, as the (m + n) × (m + n) matrix A such that

Aij = (B1)ij                 for 1 ≤ i, j ≤ m,
Aij = (B2)(i−m),(j−m)        for m + 1 ≤ i, j ≤ n + m,
Aij = 0                      otherwise.

If B1, B2, . . . , Bk are square matrices with entries from F, then we define the direct sum of B1, B2, . . . , Bk recursively by

B1 ⊕ B2 ⊕ · · · ⊕ Bk = (B1 ⊕ B2 ⊕ · · · ⊕ Bk−1) ⊕ Bk.

If A = B1 ⊕ B2 ⊕ · · · ⊕ Bk, then we often write

A = ( B1  O  · · ·  O ;
      O   B2 · · ·  O ;
      ⋮   ⋮          ⋮ ;
      O   O  · · ·  Bk ).

Example 9

Let

B1 = ( 1  2 ; 1  1 ),    B2 = ( 3 ),    and    B3 = ( 1  2  1 ; 1  2  3 ; 1  1  1 ).


Then

B1 ⊕ B2 ⊕ B3 =
( 1  2  0  0  0  0 ;
  1  1  0  0  0  0 ;
  0  0  3  0  0  0 ;
  0  0  0  1  2  1 ;
  0  0  0  1  2  3 ;
  0  0  0  1  1  1 ). ♦
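The definition is easy to carry out mechanically. The following minimal Python sketch (NumPy assumed; the function name direct_sum is ours, not the book's) builds B1 ⊕ · · · ⊕ Bk exactly as in the definition and reproduces the 6 × 6 matrix of Example 9.

    import numpy as np

    def direct_sum(*blocks):
        # Place each square block down the diagonal; every other entry is 0.
        m = sum(B.shape[0] for B in blocks)
        A = np.zeros((m, m), dtype=blocks[0].dtype)
        r = 0
        for B in blocks:
            k = B.shape[0]
            A[r:r+k, r:r+k] = B
            r += k
        return A

    B1 = np.array([[1, 2], [1, 1]])
    B2 = np.array([[3]])
    B3 = np.array([[1, 2, 1], [1, 2, 3], [1, 1, 1]])
    print(direct_sum(B1, B2, B3))   # the 6 x 6 matrix displayed in Example 9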

The final result of this section relates direct sums of matrices to directsums of invariant subspaces. It is an extension of Exercise 34 to the casek ≥ 2.

Theorem 5.25. Let T be a linear operator on a finite-dimensional vectorspace V, and let W1, W2, . . . ,Wk be T-invariant subspaces of V such thatV = W1 ⊕ W2 ⊕ · · · ⊕ Wk. For each i, let βi be an ordered basis for Wi, andlet β = β1 ∪ β2 ∪ · · · ∪ βk. Let A = [T]β and Bi = [TWi ]βi for i = 1, 2, . . . , k.Then A = B1 ⊕ B2 ⊕ · · · ⊕ Bk.

Proof. See Exercise 35.

EXERCISES

1. Label the following statements as true or false.

(a) There exists a linear operator T with no T-invariant subspace.
(b) If T is a linear operator on a finite-dimensional vector space V and W is a T-invariant subspace of V, then the characteristic polynomial of TW divides the characteristic polynomial of T.
(c) Let T be a linear operator on a finite-dimensional vector space V, and let v and w be in V. If W is the T-cyclic subspace generated by v, W′ is the T-cyclic subspace generated by w, and W = W′, then v = w.
(d) If T is a linear operator on a finite-dimensional vector space V, then for any v ∈ V the T-cyclic subspace generated by v is the same as the T-cyclic subspace generated by T(v).
(e) Let T be a linear operator on an n-dimensional vector space. Then there exists a polynomial g(t) of degree n such that g(T) = T0.
(f) Any polynomial of degree n with leading coefficient (−1)^n is the characteristic polynomial of some linear operator.
(g) If T is a linear operator on a finite-dimensional vector space V, and if V is the direct sum of k T-invariant subspaces, then there is an ordered basis β for V such that [T]β is a direct sum of k matrices.


2. For each of the following linear operators T on the vector space V, determine whether the given subspace W is a T-invariant subspace of V.

(a) V = P3(R), T(f(x)) = f′(x), and W = P2(R)
(b) V = P(R), T(f(x)) = xf(x), and W = P2(R)
(c) V = R³, T(a, b, c) = (a + b + c, a + b + c, a + b + c), and W = {(t, t, t) : t ∈ R}
(d) V = C([0, 1]), T(f(t)) = [∫_0^1 f(x) dx] t, and W = {f ∈ V : f(t) = at + b for some a and b}
(e) V = M2×2(R), T(A) = ( 0  1 ; 1  0 ) A, and W = {A ∈ V : Aᵗ = A}

3. Let T be a linear operator on a finite-dimensional vector space V. Prove that the following subspaces are T-invariant.

(a) {0} and V
(b) N(T) and R(T)
(c) Eλ, for any eigenvalue λ of T

4. Let T be a linear operator on a vector space V, and let W be a T-invariant subspace of V. Prove that W is g(T)-invariant for any poly-nomial g(t).

5. Let T be a linear operator on a vector space V. Prove that the inter-section of any collection of T-invariant subspaces of V is a T-invariantsubspace of V.

6. For each linear operator T on the vector space V, find an ordered basis for the T-cyclic subspace generated by the vector z.

(a) V = R⁴, T(a, b, c, d) = (a + b, b − c, a + c, a + d), and z = e1.
(b) V = P3(R), T(f(x)) = f′′(x), and z = x³.
(c) V = M2×2(R), T(A) = Aᵗ, and z = ( 0  1 ; 1  0 ).
(d) V = M2×2(R), T(A) = ( 1  1 ; 2  2 ) A, and z = ( 0  1 ; 1  0 ).

7. Prove that the restriction of a linear operator T to a T-invariant sub-space is a linear operator on that subspace.

8. Let T be a linear operator on a vector space with a T-invariant subspaceW. Prove that if v is an eigenvector of TW with corresponding eigenvalueλ, then the same is true for T.

9. For each linear operator T and cyclic subspace W in Exercise 6, computethe characteristic polynomial of TW in two ways, as in Example 6.


10. For each linear operator in Exercise 6, find the characteristic polynomialf(t) of T, and verify that the characteristic polynomial of TW (computedin Exercise 9) divides f(t).

11. Let T be a linear operator on a vector space V, let v be a nonzero vectorin V, and let W be the T-cyclic subspace of V generated by v. Provethat

(a) W is T-invariant.
(b) Any T-invariant subspace of V containing v also contains W.

12. Prove that A = ( B1  B2 ; O  B3 ) in the proof of Theorem 5.21.

13. Let T be a linear operator on a vector space V, let v be a nonzero vectorin V, and let W be the T-cyclic subspace of V generated by v. For anyw ∈ V, prove that w ∈ W if and only if there exists a polynomial g(t)such that w = g(T)(v).

14. Prove that the polynomial g(t) of Exercise 13 can always be chosen sothat its degree is less than or equal to dim(W).

15. Use the Cayley–Hamilton theorem (Theorem 5.23) to prove its corol-lary for matrices. Warning: If f(t) = det(A − tI) is the characteristicpolynomial of A, it is tempting to “prove” that f(A) = O by saying“f(A) = det(A − AI) = det(O) = 0.” But this argument is nonsense.Why?

16. Let T be a linear operator on a finite-dimensional vector space V.

(a) Prove that if the characteristic polynomial of T splits, then sodoes the characteristic polynomial of the restriction of T to anyT-invariant subspace of V.

(b) Deduce that if the characteristic polynomial of T splits, then anynontrivial T-invariant subspace of V contains an eigenvector of T.

17. Let A be an n × n matrix. Prove that

dim(span({In, A, A2, . . .})) ≤ n.

18. Let A be an n × n matrix with characteristic polynomial

f(t) = (−1)^n t^n + an−1 t^(n−1) + · · · + a1t + a0.

(a) Prove that A is invertible if and only if a0 ≠ 0.
(b) Prove that if A is invertible, then

A⁻¹ = (−1/a0)[(−1)^n A^(n−1) + an−1 A^(n−2) + · · · + a1 In].


(c) Use (b) to compute A⁻¹ for

A = ( 1  2  1 ; 0  2  3 ; 0  0  −1 ).

19. Let A denote the k × k matrix

( 0  0  · · ·  0  −a0 ;
  1  0  · · ·  0  −a1 ;
  0  1  · · ·  0  −a2 ;
  ⋮  ⋮         ⋮   ⋮ ;
  0  0  · · ·  0  −ak−2 ;
  0  0  · · ·  1  −ak−1 ),

where a0, a1, . . . , ak−1 are arbitrary scalars. Prove that the characteristic polynomial of A is

(−1)^k(a0 + a1t + · · · + ak−1 t^(k−1) + t^k).

Hint: Use mathematical induction on k, expanding the determinant along the first row.

20. Let T be a linear operator on a vector space V, and suppose that V isa T-cyclic subspace of itself. Prove that if U is a linear operator on V,then UT = TU if and only if U = g(T) for some polynomial g(t). Hint:Suppose that V is generated by v. Choose g(t) according to Exercise 13so that g(T)(v) = U(v).

21. Let T be a linear operator on a two-dimensional vector space V. Provethat either V is a T-cyclic subspace of itself or T = cI for some scalar c.

22. Let T be a linear operator on a two-dimensional vector space V andsuppose that T �= cI for any scalar c. Show that if U is any linearoperator on V such that UT = TU, then U = g(T) for some polynomialg(t).

23. Let T be a linear operator on a finite-dimensional vector space V, andlet W be a T-invariant subspace of V. Suppose that v1, v2, . . . , vk areeigenvectors of T corresponding to distinct eigenvalues. Prove that ifv1 +v2 + · · ·+vk is in W, then vi ∈ W for all i. Hint: Use mathematicalinduction on k.

24. Prove that the restriction of a diagonalizable linear operator T to anynontrivial T-invariant subspace is also diagonalizable. Hint: Use theresult of Exercise 23.


25. (a) Prove the converse to Exercise 18(a) of Section 5.2: If T and Uare diagonalizable linear operators on a finite-dimensional vectorspace V such that UT = TU, then T and U are simultaneouslydiagonalizable. (See the definitions in the exercises of Section 5.2.)Hint: For any eigenvalue λ of T, show that Eλ is U-invariant, andapply Exercise 24 to obtain a basis for Eλ of eigenvectors of U.

(b) State and prove a matrix version of (a).

26. Let T be a linear operator on an n-dimensional vector space V such thatT has n distinct eigenvalues. Prove that V is a T-cyclic subspace of itself.Hint: Use Exercise 23 to find a vector v such that {v,T(v), . . . ,Tn−1(v)}is linearly independent.

Exercises 27 through 32 require familiarity with quotient spaces as definedin Exercise 31 of Section 1.3. Before attempting these exercises, the readershould first review the other exercises treating quotient spaces: Exercise 35of Section 1.6, Exercise 40 of Section 2.1, and Exercise 24 of Section 2.4.

For the purposes of Exercises 27 through 32, T is a fixed linear operator ona finite-dimensional vector space V, and W is a nonzero T-invariant subspaceof V. We require the following definition.

Definition. Let T be a linear operator on a vector space V, and let W be a T-invariant subspace of V. Define T̄ : V/W → V/W by

T̄(v + W) = T(v) + W    for any v + W ∈ V/W.

27. (a) Prove that T̄ is well defined. That is, show that T̄(v + W) = T̄(v′ + W) whenever v + W = v′ + W.
(b) Prove that T̄ is a linear operator on V/W.
(c) Let η : V → V/W be the linear transformation defined in Exercise 40 of Section 2.1 by η(v) = v + W. Show that the diagram of Figure 5.6 commutes; that is, prove that ηT = T̄η. (This exercise does not require the assumption that V is finite-dimensional.)

           T
      V ──────→ V
      │          │
    η │          │ η
      ↓          ↓
     V/W ─────→ V/W
           T̄

    Figure 5.6

28. Let f(t), g(t), and h(t) be the characteristic polynomials of T, TW, and T̄, respectively. Prove that f(t) = g(t)h(t). Hint: Extend an ordered basis γ = {v1, v2, . . . , vk} for W to an ordered basis β = {v1, v2, . . . , vk, vk+1, . . . , vn} for V. Then show that the collection of


cosets α = {vk+1 + W, vk+2 + W, . . . , vn + W} is an ordered basis for V/W, and prove that

[T]β = ( B1  B2 ; O  B3 ),

where B1 = [TW]γ and B3 = [T̄]α.

29. Use the hint in Exercise 28 to prove that if T is diagonalizable, then so is T̄.

30. Prove that if both TW and T̄ are diagonalizable and have no common eigenvalues, then T is diagonalizable.

The results of Theorem 5.22 and Exercise 28 are useful in devising methodsfor computing characteristic polynomials without the use of determinants.This is illustrated in the next exercise.

31. Let A = ( 1  1  −3 ; 2  3  4 ; 1  2  1 ), let T = LA, and let W be the cyclic subspace of R³ generated by e1.

(a) Use Theorem 5.22 to compute the characteristic polynomial of TW.
(b) Show that {e2 + W} is a basis for R³/W, and use this fact to compute the characteristic polynomial of T̄.
(c) Use the results of (a) and (b) to find the characteristic polynomial of A.

32. Prove the converse to Exercise 9(a) of Section 5.2: If the characteristic polynomial of T splits, then there is an ordered basis β for V such that [T]β is an upper triangular matrix. Hints: Apply mathematical induction to dim(V). First prove that T has an eigenvector v, let W = span({v}), and apply the induction hypothesis to T̄ : V/W → V/W. Exercise 35(b) of Section 1.6 is helpful here.

Exercises 33 through 40 are concerned with direct sums.

33. Let T be a linear operator on a vector space V, and let W1, W2, . . . ,Wk

be T-invariant subspaces of V. Prove that W1 + W2 + · · · + Wk is alsoa T-invariant subspace of V.

34. Give a direct proof of Theorem 5.25 for the case k = 2. (This result isused in the proof of Theorem 5.24.)

35. Prove Theorem 5.25. Hint: Begin with Exercise 34 and extend it usingmathematical induction on k, the number of subspaces.


36. Let T be a linear operator on a finite-dimensional vector space V.Prove that T is diagonalizable if and only if V is the direct sum ofone-dimensional T-invariant subspaces.

37. Let T be a linear operator on a finite-dimensional vector space V,and let W1, W2, . . . ,Wk be T-invariant subspaces of V such that V =W1 ⊕ W2 ⊕ · · · ⊕ Wk. Prove that

det(T) = det(TW1) det(TW2) · · ·det(TWk).

38. Let T be a linear operator on a finite-dimensional vector space V,and let W1, W2, . . . ,Wk be T-invariant subspaces of V such that V =W1 ⊕ W2 ⊕ · · · ⊕ Wk. Prove that T is diagonalizable if and only if TWi

is diagonalizable for all i.

39. Let C be a collection of diagonalizable linear operators on a finite-dimensional vector space V. Prove that there is an ordered basis βsuch that [T]β is a diagonal matrix for all T ∈ C if and only if theoperators of C commute under composition. (This is an extension ofExercise 25.) Hints for the case that the operators commute: The resultis trivial if each operator has only one eigenvalue. Otherwise, establishthe general result by mathematical induction on dim(V), using the factthat V is the direct sum of the eigenspaces of some operator in C thathas more than one eigenvalue.

40. Let B1, B2, . . . , Bk be square matrices with entries in the same field, andlet A = B1 ⊕ B2 ⊕ · · · ⊕ Bk. Prove that the characteristic polynomialof A is the product of the characteristic polynomials of the Bi’s.

41. Let

A = ( 1             2             · · ·  n ;
      n + 1         n + 2         · · ·  2n ;
      ⋮             ⋮                    ⋮ ;
      n² − n + 1    n² − n + 2    · · ·  n² ).

Find the characteristic polynomial of A. Hint: First prove that A has rank 2 and that span({(1, 1, . . . , 1), (1, 2, . . . , n)}) is LA-invariant.

42. Let A ∈ Mn×n(R) be the matrix defined by Aij = 1 for all i and j.Find the characteristic polynomial of A.


INDEX OF DEFINITIONS FOR CHAPTER 5

Absorbing Markov chain 304
Absorbing state 304
Characteristic polynomial of a linear operator 249
Characteristic polynomial of a matrix 248
Column sum of a matrix 295
Convergence of matrices 284
Cyclic subspace 313
Diagonalizable linear operator 245
Diagonalizable matrix 246
Direct sum of matrices 320
Direct sum of subspaces 275
Eigenspace of a linear operator 264
Eigenspace of a matrix 264
Eigenvalue of a linear operator 246
Eigenvalue of a matrix 246
Eigenvector of a linear operator 246
Eigenvector of a matrix 246
Fixed probability vector 301
Generator of a cyclic subspace 313
Gerschgorin disk 296
Initial probability vector for a Markov chain 292
Invariant subspace 313
Limit of a sequence of matrices 284
Markov chain 291
Markov process 291
Multiplicity of an eigenvalue 263
Probability vector 289
Regular transition matrix 294
Row sum of a matrix 295
Splits 262
Stochastic process 288
Sum of subspaces 275
Transition matrix 288


6 Inner Product Spaces

6.1 Inner Products and Norms
6.2 The Gram–Schmidt Orthogonalization Process and Orthogonal Complements
6.3 The Adjoint of a Linear Operator
6.4 Normal and Self-Adjoint Operators
6.5 Unitary and Orthogonal Operators and Their Matrices
6.6 Orthogonal Projections and the Spectral Theorem
6.7* The Singular Value Decomposition and the Pseudoinverse
6.8* Bilinear and Quadratic Forms
6.9* Einstein's Special Theory of Relativity
6.10* Conditioning and the Rayleigh Quotient
6.11* The Geometry of Orthogonal Operators

Most applications of mathematics are involved with the concept of mea-surement and hence of the magnitude or relative size of various quantities. Soit is not surprising that the fields of real and complex numbers, which have abuilt-in notion of distance, should play a special role. Except for Section 6.8,we assume that all vector spaces are over the field F , where F denotes eitherR or C. (See Appendix D for properties of complex numbers.)

We introduce the idea of distance or length into vector spaces via a muchricher structure, the so-called inner product space structure. This addedstructure provides applications to geometry (Sections 6.5 and 6.11), physics(Section 6.9), conditioning in systems of linear equations (Section 6.10), leastsquares (Section 6.3), and quadratic forms (Section 6.8).

6.1 INNER PRODUCTS AND NORMS

Many geometric notions such as angle, length, and perpendicularity in R2

and R3 may be extended to more general real and complex vector spaces. Allof these ideas are related to the concept of inner product.

Definition. Let V be a vector space over F. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F, denoted 〈x, y〉, such that for all x, y, and z in V and all c in F, the following hold:

(a) 〈x + z, y〉 = 〈x, y〉 + 〈z, y〉.
(b) 〈cx, y〉 = c〈x, y〉.
(c) 〈x, y〉 = \overline{〈y, x〉}, where the bar denotes complex conjugation.
(d) 〈x, x〉 > 0 if x ≠ 0.

Note that (c) reduces to 〈x, y〉 = 〈y, x〉 if F = R. Conditions (a) and (b)simply require that the inner product be linear in the first component.

It is easily shown that if a1, a2, . . . , an ∈ F and y, v1, v2, . . . , vn ∈ V, then

〈 ∑_{i=1}^n ai vi, y 〉 = ∑_{i=1}^n ai 〈vi, y〉.

Example 1

For x = (a1, a2, . . . , an) and y = (b1, b2, . . . , bn) in Fⁿ, define

〈x, y〉 = ∑_{i=1}^n ai b̄i.

The verification that 〈· , ·〉 satisfies conditions (a) through (d) is easy. For example, if z = (c1, c2, . . . , cn), we have for (a)

〈x + z, y〉 = ∑_{i=1}^n (ai + ci) b̄i = ∑_{i=1}^n ai b̄i + ∑_{i=1}^n ci b̄i = 〈x, y〉 + 〈z, y〉.

Thus, for x = (1 + i, 4) and y = (2 − 3i, 4 + 5i) in C²,

〈x, y〉 = (1 + i)(2 + 3i) + 4(4 − 5i) = 15 − 15i. ♦

The inner product in Example 1 is called the standard inner product on Fⁿ. When F = R the conjugations are not needed, and in early courses this standard inner product is usually called the dot product and is denoted by x · y instead of 〈x, y〉.
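For readers who compute, a minimal Python check of the computation in Example 1 (NumPy assumed; note that np.vdot conjugates its first argument, so the standard inner product 〈x, y〉 = ∑ ai b̄i is obtained as vdot(y, x)):

    import numpy as np

    x = np.array([1 + 1j, 4])
    y = np.array([2 - 3j, 4 + 5j])
    # <x, y> = sum_i x_i * conj(y_i); np.vdot conjugates its FIRST argument.
    print(np.vdot(y, x))   # (15-15j), agreeing with Example 1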

Example 2

If 〈x, y〉 is any inner product on a vector space V and r > 0, we may define another inner product by the rule 〈x, y〉′ = r〈x, y〉. If r ≤ 0, then (d) would not hold. ♦


Example 3

Let V = C([0, 1]), the vector space of real-valued continuous functions on [0, 1]. For f, g ∈ V, define 〈f, g〉 = ∫_0^1 f(t)g(t) dt. Since the preceding integral is linear in f, (a) and (b) are immediate, and (c) is trivial. If f ≠ 0, then f² is bounded away from zero on some subinterval of [0, 1] (continuity is used here), and hence 〈f, f〉 = ∫_0^1 [f(t)]² dt > 0. ♦

Definition. Let A ∈ Mm×n(F). We define the conjugate transpose or adjoint of A to be the n × m matrix A∗ such that (A∗)ij = Āji for all i, j.

Example 4

Let

A = ( i       1 + 2i ;
      2       3 + 4i ).

Then

A∗ = ( −i        2 ;
       1 − 2i    3 − 4i ). ♦

Notice that if x and y are viewed as column vectors in Fn, then 〈x, y〉 =y∗x.

The conjugate transpose of a matrix plays a very important role in theremainder of this chapter. In the case that A has real entries, A∗ is simplythe transpose of A.

Example 5

Let V = Mn×n(F), and define 〈A, B〉 = tr(B∗A) for A, B ∈ V. (Recall that the trace of a matrix A is defined by tr(A) = ∑_{i=1}^n Aii.) We verify that (a) and (d) of the definition of inner product hold and leave (b) and (c) to the reader. For this purpose, let A, B, C ∈ V. Then (using Exercise 6 of Section 1.3)

〈A + B, C〉 = tr(C∗(A + B)) = tr(C∗A + C∗B) = tr(C∗A) + tr(C∗B) = 〈A, C〉 + 〈B, C〉.

Also

〈A, A〉 = tr(A∗A) = ∑_{i=1}^n (A∗A)ii = ∑_{i=1}^n ∑_{k=1}^n (A∗)ik Aki
        = ∑_{i=1}^n ∑_{k=1}^n Āki Aki = ∑_{i=1}^n ∑_{k=1}^n |Aki|².

Now if A ≠ O, then Aki ≠ 0 for some k and i. So 〈A, A〉 > 0. ♦


The inner product on Mn×n(F) in Example 5 is called the Frobenius inner product.
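A brief numerical sketch of the Frobenius inner product (Python with NumPy assumed; the helper name frob and the random test matrices are ours): it computes tr(B∗A) directly and checks that 〈A, A〉 equals the sum of the squared absolute values of the entries of A.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

    def frob(X, Y):
        # <X, Y> = tr(Y* X), where Y* is the conjugate transpose of Y
        return np.trace(Y.conj().T @ X)

    print(frob(A, B))
    print(np.isclose(frob(A, A).real, np.linalg.norm(A) ** 2))  # <A, A> = sum of |A_ki|^2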

A vector space V over F endowed with a specific inner product is called an inner product space. If F = C, we call V a complex inner product space, whereas if F = R, we call V a real inner product space.

It is clear that if V has an inner product 〈x, y〉 and W is a subspace ofV, then W is also an inner product space when the same function 〈x, y〉 isrestricted to the vectors x, y ∈ W.

Thus Examples 1, 3, and 5 also provide examples of inner product spaces. For the remainder of this chapter, Fⁿ denotes the inner product space with the standard inner product as defined in Example 1. Likewise, Mn×n(F) denotes the inner product space with the Frobenius inner product as defined in Example 5. The reader is cautioned that two distinct inner products on a given vector space yield two distinct inner product spaces. For instance, it can be shown that both

〈f(x), g(x)〉₁ = ∫_0^1 f(t)g(t) dt    and    〈f(x), g(x)〉₂ = ∫_{−1}^1 f(t)g(t) dt

are inner products on the vector space P(R). Even though the underlying vector space is the same, however, these two inner products yield two different inner product spaces. For example, the polynomials f(x) = x and g(x) = x² are orthogonal in the second inner product space, but not in the first.

A very important inner product space that resembles C([0, 1]) is the space H of continuous complex-valued functions defined on the interval [0, 2π] with the inner product

〈f, g〉 = (1/2π) ∫_0^{2π} f(t) ḡ(t) dt.

The reason for the constant 1/2π will become evident later. This inner product space, which arises often in the context of physical situations, is examined more closely in later sections.

At this point, we mention a few facts about integration of complex-valued functions. First, the imaginary number i can be treated as a constant under the integration sign. Second, every complex-valued function f may be written as f = f1 + i f2, where f1 and f2 are real-valued functions. Thus we have

∫ f = ∫ f1 + i ∫ f2    and    ∫ f̄ = \overline{∫ f}.

From these properties, as well as the assumption of continuity, it followsthat H is an inner product space (see Exercise 16(a)).

Some properties that follow easily from the definition of an inner productare contained in the next theorem.


Theorem 6.1. Let V be an inner product space. Then for x, y, z ∈ V and c ∈ F, the following statements are true.

(a) 〈x, y + z〉 = 〈x, y〉 + 〈x, z〉.
(b) 〈x, cy〉 = c̄〈x, y〉.
(c) 〈x, 0〉 = 〈0, x〉 = 0.
(d) 〈x, x〉 = 0 if and only if x = 0.
(e) If 〈x, y〉 = 〈x, z〉 for all x ∈ V, then y = z.

Proof. (a) We have

〈x, y + z〉 = \overline{〈y + z, x〉} = \overline{〈y, x〉} + \overline{〈z, x〉} = 〈x, y〉 + 〈x, z〉.

The proofs of (b), (c), (d), and (e) are left as exercises.

The reader should observe that (a) and (b) of Theorem 6.1 show that theinner product is conjugate linear in the second component.

In order to generalize the notion of length in R³ to arbitrary inner product spaces, we need only observe that the length of x = (a, b, c) ∈ R³ is given by √(a² + b² + c²) = √〈x, x〉. This leads to the following definition.

Definition. Let V be an inner product space. For x ∈ V, we define the norm or length of x by ‖x‖ = √〈x, x〉.

Example 6

Let V = Fⁿ. If x = (a1, a2, . . . , an), then

‖x‖ = ‖(a1, a2, . . . , an)‖ = [ ∑_{i=1}^n |ai|² ]^{1/2}

is the Euclidean definition of length. Note that if n = 1, we have ‖a‖ = |a|. ♦

As we might expect, the well-known properties of Euclidean length in R3

hold in general, as shown next.

Theorem 6.2. Let V be an inner product space over F. Then for all x, y ∈ V and c ∈ F, the following statements are true.

(a) ‖cx‖ = |c| · ‖x‖.
(b) ‖x‖ = 0 if and only if x = 0. In any case, ‖x‖ ≥ 0.
(c) (Cauchy–Schwarz Inequality) |〈x, y〉| ≤ ‖x‖ · ‖y‖.
(d) (Triangle Inequality) ‖x + y‖ ≤ ‖x‖ + ‖y‖.


Proof. We leave the proofs of (a) and (b) as exercises.

(c) If y = 0, then the result is immediate. So assume that y ≠ 0. For any c ∈ F, we have

0 ≤ ‖x − cy‖² = 〈x − cy, x − cy〉 = 〈x, x − cy〉 − c〈y, x − cy〉
  = 〈x, x〉 − c̄〈x, y〉 − c〈y, x〉 + c c̄〈y, y〉.

In particular, if we set

c = 〈x, y〉 / 〈y, y〉,

the inequality becomes

0 ≤ 〈x, x〉 − |〈x, y〉|²/〈y, y〉 = ‖x‖² − |〈x, y〉|²/‖y‖²,

from which (c) follows.

(d) We have

‖x + y‖² = 〈x + y, x + y〉 = 〈x, x〉 + 〈y, x〉 + 〈x, y〉 + 〈y, y〉
         = ‖x‖² + 2ℜ〈x, y〉 + ‖y‖²
         ≤ ‖x‖² + 2|〈x, y〉| + ‖y‖²
         ≤ ‖x‖² + 2‖x‖ · ‖y‖ + ‖y‖²
         = (‖x‖ + ‖y‖)²,

where ℜ〈x, y〉 denotes the real part of the complex number 〈x, y〉. Note that we used (c) to prove (d).

The case when equality results in (c) and (d) is considered in Exercise 15.

Example 7

For Fⁿ, we may apply (c) and (d) of Theorem 6.2 to the standard inner product to obtain the following well-known inequalities:

| ∑_{i=1}^n ai b̄i | ≤ [ ∑_{i=1}^n |ai|² ]^{1/2} [ ∑_{i=1}^n |bi|² ]^{1/2}

and

[ ∑_{i=1}^n |ai + bi|² ]^{1/2} ≤ [ ∑_{i=1}^n |ai|² ]^{1/2} + [ ∑_{i=1}^n |bi|² ]^{1/2}. ♦
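These two inequalities are easy to test numerically. A minimal Python sketch (NumPy assumed; the vectors are arbitrary ones of our own choosing):

    import numpy as np

    rng = np.random.default_rng(1)
    a = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    b = rng.standard_normal(5) + 1j * rng.standard_normal(5)

    inner = np.vdot(b, a)   # <a, b> = sum_i a_i * conj(b_i)
    print(abs(inner) <= np.linalg.norm(a) * np.linalg.norm(b))             # Cauchy-Schwarz
    print(np.linalg.norm(a + b) <= np.linalg.norm(a) + np.linalg.norm(b))  # triangle inequality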


The reader may recall from earlier courses that, for x and y in R3 or R2,we have that 〈x, y〉 = ‖x‖ ·‖y‖ cos θ, where θ (0 ≤ θ ≤ π) denotes the anglebetween x and y. This equation implies (c) immediately since | cos θ| ≤ 1.Notice also that nonzero vectors x and y are perpendicular if and only ifcos θ = 0, that is, if and only if 〈x, y〉 = 0.

We are now at the point where we can generalize the notion of perpendic-ularity to arbitrary inner product spaces.

Definitions. Let V be an inner product space. Vectors x and y in V are orthogonal (perpendicular) if 〈x, y〉 = 0. A subset S of V is orthogonal if any two distinct vectors in S are orthogonal. A vector x in V is a unit vector if ‖x‖ = 1. Finally, a subset S of V is orthonormal if S is orthogonal and consists entirely of unit vectors.

Note that if S = {v1, v2, . . .}, then S is orthonormal if and only if 〈vi, vj〉 =δij , where δij denotes the Kronecker delta. Also, observe that multiplyingvectors by nonzero scalars does not affect their orthogonality and that if x isany nonzero vector, then (1/‖x‖)x is a unit vector. The process of multiplyinga nonzero vector by the reciprocal of its length is called normalizing.

Example 8

In F³, {(1, 1, 0), (1, −1, 1), (−1, 1, 2)} is an orthogonal set of nonzero vectors, but it is not orthonormal; however, if we normalize the vectors in the set, we obtain the orthonormal set

{ (1/√2)(1, 1, 0),  (1/√3)(1, −1, 1),  (1/√6)(−1, 1, 2) }. ♦

Our next example is of an infinite orthonormal set that is important inanalysis. This set is used in later examples in this chapter.

Example 9

Recall the inner product space H (defined on page 332). We introduce an important orthonormal subset S of H. For what follows, i is the imaginary number such that i² = −1. For any integer n, let fn(t) = e^{int}, where 0 ≤ t ≤ 2π. (Recall that e^{int} = cos nt + i sin nt.) Now define S = {fn : n is an integer}. Clearly S is a subset of H. Using the property that \overline{e^{it}} = e^{−it} for every real number t, we have, for m ≠ n,

〈fm, fn〉 = (1/2π) ∫_0^{2π} e^{imt} e^{−int} dt = (1/2π) ∫_0^{2π} e^{i(m−n)t} dt
          = [1/(2πi(m − n))] e^{i(m−n)t} |_0^{2π} = 0.


Also,

〈fn, fn〉 = (1/2π) ∫_0^{2π} e^{i(n−n)t} dt = (1/2π) ∫_0^{2π} 1 dt = 1.

In other words, 〈fm, fn〉 = δmn. ♦

EXERCISES

1. Label the following statements as true or false.

(a) An inner product is a scalar-valued function on the set of ordered pairs of vectors.
(b) An inner product space must be over the field of real or complex numbers.
(c) An inner product is linear in both components.
(d) There is exactly one inner product on the vector space Rⁿ.
(e) The triangle inequality only holds in finite-dimensional inner product spaces.
(f) Only square matrices have a conjugate-transpose.
(g) If x, y, and z are vectors in an inner product space such that 〈x, y〉 = 〈x, z〉, then y = z.
(h) If 〈x, y〉 = 0 for all x in an inner product space, then y = 0.

2. Let x = (2, 1 + i, i) and y = (2− i, 2, 1 + 2i) be vectors in C3. Compute〈x, y〉, ‖x‖, ‖y‖, and ‖x + y‖. Then verify both the Cauchy–Schwarzinequality and the triangle inequality.

3. In C([0, 1]), let f(t) = t and g(t) = et. Compute 〈f, g〉 (as defined inExample 3), ‖f‖, ‖g‖, and ‖f + g‖. Then verify both the Cauchy–Schwarz inequality and the triangle inequality.

4. (a) Complete the proof in Example 5 that 〈· , ·〉 is an inner product (the Frobenius inner product) on Mn×n(F).
(b) Use the Frobenius inner product to compute ‖A‖, ‖B‖, and 〈A, B〉 for

A = ( 1  2 + i ; 3  i )    and    B = ( 1 + i  0 ; i  −i ).

5. In C², show that 〈x, y〉 = xAy∗ is an inner product, where

A = ( 1  i ; −i  2 ).

Compute 〈x, y〉 for x = (1 − i, 2 + 3i) and y = (2 + i, 3 − 2i).


6. Complete the proof of Theorem 6.1.

7. Complete the proof of Theorem 6.2.

8. Provide reasons why each of the following is not an inner product on the given vector spaces.

(a) 〈(a, b), (c, d)〉 = ac − bd on R².
(b) 〈A, B〉 = tr(A + B) on M2×2(R).
(c) 〈f(x), g(x)〉 = ∫_0^1 f′(t)g(t) dt on P(R), where ′ denotes differentiation.

9. Let β be a basis for a finite-dimensional inner product space.

(a) Prove that if 〈x, z〉 = 0 for all z ∈ β, then x = 0.
(b) Prove that if 〈x, z〉 = 〈y, z〉 for all z ∈ β, then x = y.

10.† Let V be an inner product space, and suppose that x and y are orthog-onal vectors in V. Prove that ‖x + y‖2 = ‖x‖2 + ‖y‖2. Deduce thePythagorean theorem in R2.

11. Prove the parallelogram law on an inner product space V; that is, showthat

‖x + y‖2 + ‖x − y‖2 = 2‖x‖2 + 2‖y‖2 for all x, y ∈ V.

What does this equation state about parallelograms in R2?

12.† Let {v1, v2, . . . , vk} be an orthogonal set in V, and let a1, a2, . . . , ak be scalars. Prove that

‖ ∑_{i=1}^k ai vi ‖² = ∑_{i=1}^k |ai|² ‖vi‖².

13. Suppose that 〈 · , ·〉1 and 〈 · , ·〉2 are two inner products on a vector spaceV. Prove that 〈 · , ·〉 = 〈 · , ·〉1 + 〈 · , ·〉2 is another inner product on V.

14. Let A and B be n × n matrices, and let c be a scalar. Prove that (A + cB)∗ = A∗ + c̄B∗.

15. (a) Prove that if V is an inner product space, then |〈x, y〉| = ‖x‖ · ‖y‖ if and only if one of the vectors x or y is a multiple of the other. Hint: If the identity holds and y ≠ 0, let

a = 〈x, y〉 / ‖y‖²,


and let z = x − ay. Prove that y and z are orthogonal and

|a| = ‖x‖ / ‖y‖.

Then apply Exercise 10 to ‖x‖² = ‖ay + z‖² to obtain ‖z‖ = 0.

(b) Derive a similar result for the equality ‖x + y‖ = ‖x‖ + ‖y‖, and generalize it to the case of n vectors.

16. (a) Show that the vector space H with 〈· , ·〉 defined on page 332 is an inner product space.
(b) Let V = C([0, 1]), and define

〈f, g〉 = ∫_0^{1/2} f(t)g(t) dt.

Is this an inner product on V?

17. Let T be a linear operator on an inner product space V, and supposethat ‖T(x)‖ = ‖x‖ for all x. Prove that T is one-to-one.

18. Let V be a vector space over F , where F = R or F = C, and let W bean inner product space over F with inner product 〈 · , ·〉. If T : V → Wis linear, prove that 〈x, y〉′ = 〈T(x), T(y)〉 defines an inner product onV if and only if T is one-to-one.

19. Let V be an inner product space. Prove that

(a) ‖x ± y‖² = ‖x‖² ± 2ℜ〈x, y〉 + ‖y‖² for all x, y ∈ V, where ℜ〈x, y〉 denotes the real part of the complex number 〈x, y〉.
(b) | ‖x‖ − ‖y‖ | ≤ ‖x − y‖ for all x, y ∈ V.

20. Let V be an inner product space over F. Prove the polar identities: For all x, y ∈ V,

(a) 〈x, y〉 = (1/4)‖x + y‖² − (1/4)‖x − y‖² if F = R;
(b) 〈x, y〉 = (1/4) ∑_{k=1}^4 i^k ‖x + i^k y‖² if F = C, where i² = −1.

21. Let A be an n × n matrix. Define

A1 = (1/2)(A + A∗)    and    A2 = (1/(2i))(A − A∗).

(a) Prove that A1∗ = A1, A2∗ = A2, and A = A1 + iA2. Would it be reasonable to define A1 and A2 to be the real and imaginary parts, respectively, of the matrix A?
(b) Let A be an n × n matrix. Prove that the representation in (a) is unique. That is, prove that if A = B1 + iB2, where B1∗ = B1 and B2∗ = B2, then B1 = A1 and B2 = A2.


22. Let V be a real or complex vector space (possibly infinite-dimensional), and let β be a basis for V. For x, y ∈ V there exist v1, v2, . . . , vn ∈ β such that

x = ∑_{i=1}^n ai vi    and    y = ∑_{i=1}^n bi vi.

Define

〈x, y〉 = ∑_{i=1}^n ai b̄i.

(a) Prove that 〈· , ·〉 is an inner product on V and that β is an orthonormal basis for V. Thus every real or complex vector space may be regarded as an inner product space.
(b) Prove that if V = Rⁿ or V = Cⁿ and β is the standard ordered basis, then the inner product defined above is the standard inner product.

23. Let V = Fⁿ, and let A ∈ Mn×n(F).

(a) Prove that 〈x, Ay〉 = 〈A∗x, y〉 for all x, y ∈ V.
(b) Suppose that for some B ∈ Mn×n(F), we have 〈x, Ay〉 = 〈Bx, y〉 for all x, y ∈ V. Prove that B = A∗.
(c) Let α be the standard ordered basis for V. For any orthonormal basis β for V, let Q be the n × n matrix whose columns are the vectors in β. Prove that Q∗ = Q⁻¹.
(d) Define linear operators T and U on V by T(x) = Ax and U(x) = A∗x. Show that [U]β = [T]β∗ for any orthonormal basis β for V.

The following definition is used in Exercises 24–27.

Definition. Let V be a vector space over F, where F is either R or C. Regardless of whether V is or is not an inner product space, we may still define a norm ‖·‖ as a real-valued function on V satisfying the following three conditions for all x, y ∈ V and a ∈ F:

(1) ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0.
(2) ‖ax‖ = |a| · ‖x‖.
(3) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

24. Prove that the following are norms on the given vector spaces V.

(a) V = Mm×n(F); ‖A‖ = max_{i,j} |Aij| for all A ∈ V
(b) V = C([0, 1]); ‖f‖ = max_{t∈[0,1]} |f(t)| for all f ∈ V
(c) V = C([0, 1]); ‖f‖ = ∫_0^1 |f(t)| dt for all f ∈ V
(d) V = R²; ‖(a, b)‖ = max{|a|, |b|} for all (a, b) ∈ V

25. Use Exercise 20 to show that there is no inner product 〈 · , ·〉 on R2

such that ‖x‖2 = 〈x, x〉 for all x ∈ R2 if the norm is defined as inExercise 24(d).

26. Let ‖·‖ be a norm on a vector space V, and define, for each ordered pair of vectors, the scalar d(x, y) = ‖x − y‖, called the distance between x and y. Prove the following results for all x, y, z ∈ V.

(a) d(x, y) ≥ 0.
(b) d(x, y) = d(y, x).
(c) d(x, y) ≤ d(x, z) + d(z, y).
(d) d(x, x) = 0.
(e) d(x, y) ≠ 0 if x ≠ y.

27. Let ‖·‖ be a norm on a real vector space V satisfying the parallelogram law given in Exercise 11. Define

〈x, y〉 = (1/4)[ ‖x + y‖² − ‖x − y‖² ].

Prove that 〈· , ·〉 defines an inner product on V such that ‖x‖² = 〈x, x〉 for all x ∈ V.

Hints:

(a) Prove 〈x, 2y〉 = 2 〈x, y〉 for all x, y ∈ V.(b) Prove 〈x + u, y〉 = 〈x, y〉 + 〈u, y〉 for all x, u, y ∈ V.(c) Prove 〈nx, y〉 = n 〈x, y〉 for every positive integer n and every

x, y ∈ V.(d) Prove m

⟨1mx, y

⟩= 〈x, y〉 for every positive integer m and every

x, y ∈ V.(e) Prove 〈rx, y〉 = r 〈x, y〉 for every rational number r and every

x, y ∈ V.(f) Prove | 〈x, y〉 | ≤ ‖x‖‖y‖ for every x, y ∈ V. Hint: Condition (3) in

the definition of norm can be helpful.(g) Prove that for every c ∈ R, every rational number r, and every

x, y ∈ V,

|c 〈x, y〉 − 〈cx, y〉 | = |(c−r) 〈x, y〉 − 〈(c−r)x, y〉 | ≤ 2|c−r|‖x‖‖y‖.

(h) Use the fact that for any c ∈ R, |c − r| can be made arbitrarilysmall, where r varies over the set of rational numbers, to establishitem (b) of the definition of inner product.


28. Let V be a complex inner product space with an inner product 〈 · , ·〉.Let [ · , · ] be the real-valued function such that [x, y] is the real part ofthe complex number 〈x, y〉 for all x, y ∈ V. Prove that [ · , · ] is an innerproduct for V, where V is regarded as a vector space over R. Prove,furthermore, that [x, ix] = 0 for all x ∈ V.

29. Let V be a vector space over C, and suppose that [ · , · ] is a real innerproduct on V, where V is regarded as a vector space over R, such that[x, ix] = 0 for all x ∈ V. Let 〈 · , ·〉 be the complex-valued functiondefined by

〈x, y〉 = [x, y] + i[x, iy] for x, y ∈ V.

Prove that 〈 · , ·〉 is a complex inner product on V.

30. Let ‖ ·‖ be a norm (as defined in Exercise 24) on a complex vectorspace V satisfying the parallelogram law given in Exercise 11. Provethat there is an inner product 〈 · , ·〉 on V such that ‖x‖2 = 〈x, x〉 forall x ∈ V.

Hint: Apply Exercise 27 to V regarded as a vector space over R. Thenapply Exercise 29.

6.2 THE GRAM–SCHMIDT ORTHOGONALIZATION PROCESS AND ORTHOGONAL COMPLEMENTS

In previous chapters, we have seen the special role of the standard orderedbases for Cn and Rn. The special properties of these bases stem from the factthat the basis vectors form an orthonormal set. Just as bases are the buildingblocks of vector spaces, bases that are also orthonormal sets are the buildingblocks of inner product spaces. We now name such bases.

Definition. Let V be an inner product space. A subset of V is anorthonormal basis for V if it is an ordered basis that is orthonormal.

Example 1

The standard ordered basis for Fⁿ is an orthonormal basis for Fⁿ. ♦

Example 2

The set

{ (1/√5, 2/√5),  (2/√5, −1/√5) }

is an orthonormal basis for R². ♦


The next theorem and its corollaries illustrate why orthonormal sets and,in particular, orthonormal bases are so important.

Theorem 6.3. Let V be an inner product space and S = {v1, v2, . . . , vk} be an orthogonal subset of V consisting of nonzero vectors. If y ∈ span(S), then

y = ∑_{i=1}^k (〈y, vi〉 / ‖vi‖²) vi.

Proof. Write y = ∑_{i=1}^k ai vi, where a1, a2, . . . , ak ∈ F. Then, for 1 ≤ j ≤ k, we have

〈y, vj〉 = 〈 ∑_{i=1}^k ai vi, vj 〉 = ∑_{i=1}^k ai〈vi, vj〉 = aj〈vj, vj〉 = aj‖vj‖².

So aj = 〈y, vj〉 / ‖vj‖², and the result follows.

The next corollary follows immediately from Theorem 6.3.

Corollary 1. If, in addition to the hypotheses of Theorem 6.3, S is orthonormal and y ∈ span(S), then

y = ∑_{i=1}^k 〈y, vi〉 vi.

If V possesses a finite orthonormal basis, then Corollary 1 allows us tocompute the coefficients in a linear combination very easily. (See Example 3.)

Corollary 2. Let V be an inner product space, and let S be an orthogonalsubset of V consisting of nonzero vectors. Then S is linearly independent.

Proof. Suppose that v1, v2, . . . , vk ∈ S and

∑_{i=1}^k ai vi = 0.

As in the proof of Theorem 6.3 with y = 0, we have aj = 〈0, vj〉/‖vj‖² = 0 for all j. So S is linearly independent.


Example 3

By Corollary 2, the orthonormal set

{ (1/√2)(1, 1, 0),  (1/√3)(1, −1, 1),  (1/√6)(−1, 1, 2) }

obtained in Example 8 of Section 6.1 is an orthonormal basis for R³. Let x = (2, 1, 3). The coefficients given by Corollary 1 to Theorem 6.3 that express x as a linear combination of the basis vectors are

a1 = (1/√2)(2 + 1) = 3/√2,    a2 = (1/√3)(2 − 1 + 3) = 4/√3,

and

a3 = (1/√6)(−2 + 1 + 6) = 5/√6.

As a check, we have

(2, 1, 3) = (3/2)(1, 1, 0) + (4/3)(1, −1, 1) + (5/6)(−1, 1, 2). ♦

Corollary 2 tells us that the vector space H in Section 6.1 contains aninfinite linearly independent set, and hence H is not a finite-dimensional vectorspace.

Of course, we have not yet shown that every finite-dimensional inner prod-uct space possesses an orthonormal basis. The next theorem takes us mostof the way in obtaining this result. It tells us how to construct an orthogonalset from a linearly independent set of vectors in such a way that both setsgenerate the same subspace.

Before stating this theorem, let us consider a simple case. Suppose that{w1, w2} is a linearly independent subset of an inner product space (andhence a basis for some two-dimensional subspace). We want to constructan orthogonal set from {w1, w2} that spans the same subspace. Figure 6.1suggests that the set {v1, v2}, where v1 = w1 and v2 = w2 − cw1, has thisproperty if c is chosen so that v2 is orthogonal to W1.

To find c, we need only solve the following equation:

0 = 〈v2, w1〉 = 〈w2 − cw1, w1〉 = 〈w2, w1〉 − c〈w1, w1〉.

So

c = 〈w2, w1〉 / ‖w1‖².

Thus

v2 = w2 − (〈w2, w1〉 / ‖w1‖²) w1.


[Figure 6.1: the vectors w1 = v1, cw1, w2, and v2 = w2 − cw1, with v2 orthogonal to w1.]

The next theorem shows us that this process can be extended to any finitelinearly independent subset.

Theorem 6.4. Let V be an inner product space and S = {w1, w2, . . . , wn} be a linearly independent subset of V. Define S′ = {v1, v2, . . . , vn}, where v1 = w1 and

vk = wk − ∑_{j=1}^{k−1} (〈wk, vj〉 / ‖vj‖²) vj    for 2 ≤ k ≤ n.        (1)

Then S′ is an orthogonal set of nonzero vectors such that span(S′) = span(S).

Proof. The proof is by mathematical induction on n, the number of vectors in S. For k = 1, 2, . . . , n, let Sk = {w1, w2, . . . , wk}. If n = 1, then the theorem is proved by taking S′1 = S1; i.e., v1 = w1 ≠ 0. Assume then that the set S′k−1 = {v1, v2, . . . , vk−1} with the desired properties has been constructed by the repeated use of (1). We show that the set S′k = {v1, v2, . . . , vk−1, vk} also has the desired properties, where vk is obtained from S′k−1 by (1). If vk = 0, then (1) implies that wk ∈ span(S′k−1) = span(Sk−1), which contradicts the assumption that Sk is linearly independent. For 1 ≤ i ≤ k − 1, it follows from (1) that

〈vk, vi〉 = 〈wk, vi〉 − ∑_{j=1}^{k−1} (〈wk, vj〉 / ‖vj‖²) 〈vj, vi〉 = 〈wk, vi〉 − (〈wk, vi〉 / ‖vi‖²) ‖vi‖² = 0,

since 〈vj, vi〉 = 0 if i ≠ j by the induction assumption that S′k−1 is orthogonal. Hence S′k is an orthogonal set of nonzero vectors. Now, by (1), we have that span(S′k) ⊆ span(Sk). But by Corollary 2 to Theorem 6.3, S′k is linearly independent; so dim(span(S′k)) = dim(span(Sk)) = k. Therefore span(S′k) = span(Sk).

The construction of {v1, v2, . . . , vn} by the use of Theorem 6.4 is called the Gram–Schmidt process.


Example 4

In R⁴, let w1 = (1, 0, 1, 0), w2 = (1, 1, 1, 1), and w3 = (0, 1, 2, 1). Then {w1, w2, w3} is linearly independent. We use the Gram–Schmidt process to compute the orthogonal vectors v1, v2, and v3, and then we normalize these vectors to obtain an orthonormal set.

Take v1 = w1 = (1, 0, 1, 0). Then

v2 = w2 − (〈w2, v1〉 / ‖v1‖²) v1
   = (1, 1, 1, 1) − (2/2)(1, 0, 1, 0)
   = (0, 1, 0, 1).

Finally,

v3 = w3 − (〈w3, v1〉 / ‖v1‖²) v1 − (〈w3, v2〉 / ‖v2‖²) v2
   = (0, 1, 2, 1) − (2/2)(1, 0, 1, 0) − (2/2)(0, 1, 0, 1)
   = (−1, 0, 1, 0).

These vectors can be normalized to obtain the orthonormal basis {u1, u2, u3}, where

u1 = (1/‖v1‖) v1 = (1/√2)(1, 0, 1, 0),
u2 = (1/‖v2‖) v2 = (1/√2)(0, 1, 0, 1),

and

u3 = v3/‖v3‖ = (1/√2)(−1, 0, 1, 0). ♦
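For comparison, here is a minimal Python sketch of the process of Theorem 6.4 for real vectors (NumPy assumed; the function name gram_schmidt is ours). Applied to w1, w2, w3 above, it reproduces v1, v2, v3 and the orthonormal basis of Example 4.

    import numpy as np

    def gram_schmidt(ws):
        # Classical Gram-Schmidt, following equation (1), for real vectors.
        vs = []
        for w in ws:
            v = w.astype(float).copy()
            for u in vs:
                v -= (np.dot(w, u) / np.dot(u, u)) * u   # subtract <w_k, v_j>/||v_j||^2 v_j
            vs.append(v)
        return vs

    w1, w2, w3 = np.array([1, 0, 1, 0]), np.array([1, 1, 1, 1]), np.array([0, 1, 2, 1])
    vs = gram_schmidt([w1, w2, w3])
    print(vs)                                    # [1,0,1,0], [0,1,0,1], [-1,0,1,0]
    print([v / np.linalg.norm(v) for v in vs])   # the orthonormal vectors u1, u2, u3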

Example 5

Let V = P(R) with the inner product 〈f(x), g(x)〉 = ∫_{−1}^1 f(t)g(t) dt, and consider the subspace P2(R) with the standard ordered basis β. We use the Gram–Schmidt process to replace β by an orthogonal basis {v1, v2, v3} for P2(R), and then use this orthogonal basis to obtain an orthonormal basis for P2(R).

Take v1 = 1. Then ‖v1‖² = ∫_{−1}^1 1² dt = 2, and 〈x, v1〉 = ∫_{−1}^1 t · 1 dt = 0. Thus

v2 = x − (〈x, v1〉 / ‖v1‖²) v1 = x − (0/2) · 1 = x.

Furthermore,

〈x², v1〉 = ∫_{−1}^1 t² · 1 dt = 2/3    and    〈x², v2〉 = ∫_{−1}^1 t² · t dt = 0.

Therefore

v3 = x² − (〈x², v1〉 / ‖v1‖²) v1 − (〈x², v2〉 / ‖v2‖²) v2
   = x² − (1/3) · 1 − 0 · x
   = x² − 1/3.

We conclude that {1, x, x² − 1/3} is an orthogonal basis for P2(R).

To obtain an orthonormal basis, we normalize v1, v2, and v3 to obtain

u1 = 1 / √(∫_{−1}^1 1² dt) = 1/√2,

u2 = x / √(∫_{−1}^1 t² dt) = √(3/2) x,

and similarly,

u3 = v3/‖v3‖ = √(5/8)(3x² − 1).

Thus {u1, u2, u3} is the desired orthonormal basis for P2(R). ♦

If we continue applying the Gram–Schmidt orthogonalization process to the basis {1, x, x², . . .} for P(R), we obtain an orthogonal basis whose elements are called the Legendre polynomials. The orthogonal polynomials v1, v2, and v3 in Example 5 are the first three Legendre polynomials.
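Readers with a symbolic package can verify Example 5 directly. A minimal SymPy sketch (the variable names are ours) checks that {1, x, x² − 1/3} is orthogonal under ∫_{−1}^1 f(t)g(t) dt and recovers the normalized u3.

    import sympy as sp

    t = sp.symbols('t')
    v1, v2, v3 = sp.Integer(1), t, t**2 - sp.Rational(1, 3)

    def ip(f, g):
        # the inner product <f, g> = integral of f(t) g(t) from -1 to 1
        return sp.integrate(f * g, (t, -1, 1))

    print(ip(v1, v2), ip(v1, v3), ip(v2, v3))      # 0 0 0: the set is orthogonal
    print(sp.simplify(v3 / sp.sqrt(ip(v3, v3))))   # equals sqrt(5/8)*(3*t**2 - 1), the u3 of Example 5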

The following result gives us a simple method of representing a vector asa linear combination of the vectors in an orthonormal basis.

Theorem 6.5. Let V be a nonzero finite-dimensional inner product space. Then V has an orthonormal basis β. Furthermore, if β = {v1, v2, . . . , vn} and x ∈ V, then

x = ∑_{i=1}^n 〈x, vi〉 vi.


Proof. Let β0 be an ordered basis for V. Apply Theorem 6.4 to obtainan orthogonal set β′ of nonzero vectors with span(β′) = span(β0) = V. Bynormalizing each vector in β′, we obtain an orthonormal set β that generatesV. By Corollary 2 to Theorem 6.3, β is linearly independent; therefore βis an orthonormal basis for V. The remainder of the theorem follows fromCorollary 1 to Theorem 6.3.

Example 6

We use Theorem 6.5 to represent the polynomial f(x) = 1 + 2x + 3x² as a linear combination of the vectors in the orthonormal basis {u1, u2, u3} for P2(R) obtained in Example 5. Observe that

〈f(x), u1〉 = ∫_{−1}^1 (1/√2)(1 + 2t + 3t²) dt = 2√2,

〈f(x), u2〉 = ∫_{−1}^1 √(3/2) t (1 + 2t + 3t²) dt = 2√6/3,

and

〈f(x), u3〉 = ∫_{−1}^1 √(5/8)(3t² − 1)(1 + 2t + 3t²) dt = 2√10/5.

Therefore f(x) = 2√2 u1 + (2√6/3) u2 + (2√10/5) u3. ♦

Theorem 6.5 gives us a simple method for computing the entries of thematrix representation of a linear operator with respect to an orthonormalbasis.

Corollary. Let V be a finite-dimensional inner product space with anorthonormal basis β = {v1, v2, . . . , vn}. Let T be a linear operator on V, andlet A = [T]β . Then for any i and j, Aij = 〈T(vj), vi〉.

Proof. From Theorem 6.5, we have

T(vj) = ∑_{i=1}^n 〈T(vj), vi〉 vi.

Hence Aij = 〈T(vj), vi〉.

The scalars 〈x, vi〉 given in Theorem 6.5 have been studied extensively for special inner product spaces. Although the vectors v1, v2, . . . , vn were chosen from an orthonormal basis, we introduce a terminology associated with orthonormal sets β in more general inner product spaces.

for special inner product spaces. Although the vectors v1, v2, . . . , vn werechosen from an orthonormal basis, we introduce a terminology associatedwith orthonormal sets β in more general inner product spaces.


Definition. Let β be an orthonormal subset (possibly infinite) of aninner product space V, and let x ∈ V. We define the Fourier coefficientsof x relative to β to be the scalars 〈x, y〉, where y ∈ β.

In the first half of the 19th century, the French mathematician Jean Baptiste Fourier was associated with the study of the scalars

∫_0^{2π} f(t) sin nt dt    and    ∫_0^{2π} f(t) cos nt dt,

or more generally,

cn = (1/2π) ∫_0^{2π} f(t) e^{−int} dt,

for a function f. In the context of Example 9 of Section 6.1, we see that cn = 〈f, fn〉, where fn(t) = e^{int}; that is, cn is the nth Fourier coefficient for a continuous function f ∈ V relative to S. These coefficients are the “classical” Fourier coefficients of a function, and the literature concerning the behavior of these coefficients is extensive. We learn more about these Fourier coefficients in the remainder of this chapter.

Example 7

Let S = {e^{int} : n is an integer}. In Example 9 of Section 6.1, S was shown to be an orthonormal set in H. We compute the Fourier coefficients of f(t) = t relative to S. Using integration by parts, we have, for n ≠ 0,

〈f, fn〉 = (1/2π) ∫_0^{2π} t \overline{e^{int}} dt = (1/2π) ∫_0^{2π} t e^{−int} dt = −1/(in),

and, for n = 0,

〈f, 1〉 = (1/2π) ∫_0^{2π} t(1) dt = π.

As a result of these computations, and using Exercise 16 of this section, we obtain an upper bound for the sum of a special infinite series as follows:

‖f‖² ≥ ∑_{n=−k}^{−1} |〈f, fn〉|² + |〈f, 1〉|² + ∑_{n=1}^{k} |〈f, fn〉|²
     = ∑_{n=−k}^{−1} 1/n² + π² + ∑_{n=1}^{k} 1/n²
     = 2 ∑_{n=1}^{k} 1/n² + π²


for every k. Now, using the fact that ‖f‖² = (4/3)π², we obtain

(4/3)π² ≥ 2 ∑_{n=1}^{k} 1/n² + π²,

or

π²/6 ≥ ∑_{n=1}^{k} 1/n².

Because this inequality holds for all k, we may let k → ∞ to obtain

π²/6 ≥ ∑_{n=1}^{∞} 1/n².

Additional results may be produced by replacing f by other functions. ♦

We are now ready to proceed with the concept of an orthogonal complement.

Definition. Let S be a nonempty subset of an inner product space V. Wedefine S⊥ (read “S perp”) to be the set of all vectors in V that are orthogonalto every vector in S; that is, S⊥ = {x ∈ V : 〈x, y〉 = 0 for all y ∈ S}. The setS⊥ is called the orthogonal complement of S.

It is easily seen that S⊥ is a subspace of V for any subset S of V.

Example 8

The reader should verify that {0}⊥ = V and V⊥ = {0} for any inner product space V. ♦

Example 9

If V = R³ and S = {e3}, then S⊥ equals the xy-plane (see Exercise 5). ♦

Exercise 18 provides an interesting example of an orthogonal complement in an infinite-dimensional inner product space.

Consider the problem in R³ of finding the distance from a point P to a plane W. (See Figure 6.2.) Problems of this type arise in many settings. If we let y be the vector determined by 0 and P, we may restate the problem as follows: Determine the vector u in W that is “closest” to y. The desired distance is clearly given by ‖y − u‖. Notice from the figure that the vector z = y − u is orthogonal to every vector in W, and so z ∈ W⊥.

The next result presents a practical method of finding u in the case that W is a finite-dimensional subspace of an inner product space.


[Figure 6.2: the point P, the vector y from 0 to P, the vector u in the plane W, and z = y − u orthogonal to W.]

Theorem 6.6. Let W be a finite-dimensional subspace of an inner product space V, and let y ∈ V. Then there exist unique vectors u ∈ W and z ∈ W⊥ such that y = u + z. Furthermore, if {v1, v2, . . . , vk} is an orthonormal basis for W, then

u = ∑_{i=1}^k 〈y, vi〉 vi.

Proof. Let {v1, v2, . . . , vk} be an orthonormal basis for W, let u be as defined in the preceding equation, and let z = y − u. Clearly u ∈ W and y = u + z.

To show that z ∈ W⊥, it suffices to show, by Exercise 7, that z is orthogonal to each vj. For any j, we have

〈z, vj〉 = 〈 (y − ∑_{i=1}^k 〈y, vi〉 vi), vj 〉 = 〈y, vj〉 − ∑_{i=1}^k 〈y, vi〉〈vi, vj〉 = 〈y, vj〉 − 〈y, vj〉 = 0.

To show uniqueness of u and z, suppose that y = u + z = u′ + z′, where u′ ∈ W and z′ ∈ W⊥. Then u − u′ = z′ − z ∈ W ∩ W⊥ = {0}. Therefore, u = u′ and z = z′.

Corollary. In the notation of Theorem 6.6, the vector u is the uniquevector in W that is “closest” to y; that is, for any x ∈ W, ‖y − x‖ ≥ ‖y − u‖,and this inequality is an equality if and only if x = u.

Proof. As in Theorem 6.6, we have that y = u + z, where z ∈ W⊥. Let x ∈ W. Then u − x is orthogonal to z, so, by Exercise 10 of Section 6.1, we have

‖y − x‖² = ‖u + z − x‖² = ‖(u − x) + z‖² = ‖u − x‖² + ‖z‖² ≥ ‖z‖² = ‖y − u‖².

Now suppose that ‖y − x‖ = ‖y − u‖. Then the inequality above becomes an equality, and therefore ‖u − x‖² + ‖z‖² = ‖z‖². It follows that ‖u − x‖ = 0, and hence x = u. The proof of the converse is obvious.

The vector u in the corollary is called the orthogonal projection of y on W. We will see the importance of orthogonal projections of vectors in the application to least squares in Section 6.3.
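In the finite-dimensional case the formula of Theorem 6.6 is a one-line computation. A minimal Python sketch (NumPy assumed; the subspace W and the vector y are our own choices, not taken from the text) projects y onto W = span{(1, 0, 1, 0), (0, 1, 0, 1)} in R⁴ and confirms that y − u is orthogonal to W.

    import numpy as np

    # Orthonormal basis of W = span{(1,0,1,0), (0,1,0,1)} in R^4.
    v1 = np.array([1, 0, 1, 0]) / np.sqrt(2)
    v2 = np.array([0, 1, 0, 1]) / np.sqrt(2)
    y = np.array([1.0, 2.0, 3.0, 4.0])

    u = np.dot(y, v1) * v1 + np.dot(y, v2) * v2   # u = <y,v1> v1 + <y,v2> v2
    z = y - u
    print(u)                               # [2. 3. 2. 3.], the orthogonal projection of y on W
    print(np.dot(z, v1), np.dot(z, v2))    # both (numerically) 0, so z lies in W-perp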

Example 10

Let V = P3(R) with the inner product

〈f(x), g(x)〉 = ∫_{−1}^1 f(t)g(t) dt    for all f(x), g(x) ∈ V.

We compute the orthogonal projection f1(x) of f(x) = x³ on P2(R).

By Example 5,

{u1, u2, u3} = { 1/√2,  √(3/2) x,  √(5/8)(3x² − 1) }

is an orthonormal basis for P2(R). For these vectors, we have

〈f(x), u1〉 = ∫_{−1}^1 t³ (1/√2) dt = 0,    〈f(x), u2〉 = ∫_{−1}^1 t³ √(3/2) t dt = √6/5,

and

〈f(x), u3〉 = ∫_{−1}^1 t³ √(5/8)(3t² − 1) dt = 0.

Hence

f1(x) = 〈f(x), u1〉u1 + 〈f(x), u2〉u2 + 〈f(x), u3〉u3 = (3/5)x. ♦

It was shown (Corollary 2 to the replacement theorem, p. 47) that any lin-early independent set in a finite-dimensional vector space can be extended toa basis. The next theorem provides an interesting analog for an orthonormalsubset of a finite-dimensional inner product space.


Theorem 6.7. Suppose that S = {v1, v2, . . . , vk} is an orthonormal set in an n-dimensional inner product space V. Then

(a) S can be extended to an orthonormal basis {v1, v2, . . . , vk, vk+1, . . . , vn} for V.
(b) If W = span(S), then S1 = {vk+1, vk+2, . . . , vn} is an orthonormal basis for W⊥ (using the preceding notation).
(c) If W is any subspace of V, then dim(V) = dim(W) + dim(W⊥).

Proof. (a) By Corollary 2 to the replacement theorem (p. 47), S can be extended to an ordered basis S′ = {v1, v2, . . . , vk, wk+1, . . . , wn} for V. Now apply the Gram–Schmidt process to S′. The first k vectors resulting from this process are the vectors in S by Exercise 8, and this new set spans V. Normalizing the last n − k vectors of this set produces an orthonormal set that spans V. The result now follows.

(b) Because S1 is a subset of a basis, it is linearly independent. Since S1 is clearly a subset of W⊥, we need only show that it spans W⊥. Note that, for any x ∈ V, we have

x = ∑_{i=1}^n 〈x, vi〉 vi.

If x ∈ W⊥, then 〈x, vi〉 = 0 for 1 ≤ i ≤ k. Therefore,

x = ∑_{i=k+1}^n 〈x, vi〉 vi ∈ span(S1).

(c) Let W be a subspace of V. It is a finite-dimensional inner product space because V is, and so it has an orthonormal basis {v1, v2, . . . , vk}. By (a) and (b), we have

dim(V) = n = k + (n − k) = dim(W) + dim(W⊥).

Example 11

Let W = span({e1, e2}) in F3. Then x = (a, b, c) ∈ W⊥ if and only if 0 =〈x, e1〉 = a and 0 = 〈x, e2〉 = b. So x = (0, 0, c), and therefore W⊥ =span({e3}). One can deduce the same result by noting that e3 ∈ W⊥ and,from (c), that dim(W⊥) = 3 − 2 = 1. ♦

EXERCISES

1. Label the following statements as true or false.

(a) The Gram–Schmidt orthogonalization process allows us to con-struct an orthonormal set from an arbitrary set of vectors.


(b) Every nonzero finite-dimensional inner product space has an or-thonormal basis.

(c) The orthogonal complement of any set is a subspace.(d) If {v1, v2, . . . , vn} is a basis for an inner product space V, then for

any x ∈ V the scalars 〈x, vi〉 are the Fourier coefficients of x.(e) An orthonormal basis must be an ordered basis.(f) Every orthogonal set is linearly independent.(g) Every orthonormal set is linearly independent.

2. In each part, apply the Gram–Schmidt process to the given subset S of the inner product space V to obtain an orthogonal basis for span(S). Then normalize the vectors in this basis to obtain an orthonormal basis β for span(S), and compute the Fourier coefficients of the given vector relative to β. Finally, use Theorem 6.5 to verify your result.

(a) V = R3, S = {(1, 0, 1), (0, 1, 1), (1, 3, 3)}, and x = (1, 1, 2)
(b) V = R3, S = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}, and x = (1, 0, 1)
(c) V = P2(R) with the inner product 〈f(x), g(x)〉 = ∫_{0}^{1} f(t)g(t) dt, S = {1, x, x²}, and h(x) = 1 + x
(d) V = span(S), where S = {(1, i, 0), (1 − i, 2, 4i)}, and x = (3 + i, 4i, −4)
(e) V = R4, S = {(2, −1, −2, 4), (−2, 1, −5, 5), (−1, 3, 7, 11)}, and x = (−11, 8, −4, 18)
(f) V = R4, S = {(1, −2, −1, 3), (3, 6, 3, −1), (1, 4, 2, 8)}, and x = (−1, 2, 1, 1)
(g) V = M2×2(R), S = {(3, 5; −1, 1), (−1, 9; 5, −1), (7, −17; 2, −6)}, and A = (−1, 27; −4, 8), where a 2 × 2 matrix is written row by row, with rows separated by semicolons
(h) V = M2×2(R), S = {(2, 2; 2, 1), (11, 4; 2, 5), (4, −12; 3, −16)}, and A = (8, 6; 25, −13)
(i) V = span(S) with the inner product 〈f, g〉 = ∫_{0}^{π} f(t)g(t) dt, S = {sin t, cos t, 1, t}, and h(t) = 2t + 1
(j) V = C4, S = {(1, i, 2 − i, −1), (2 + 3i, 3i, 1 − i, 2i), (−1 + 7i, 6 + 10i, 11 − 4i, 3 + 4i)}, and x = (−2 + 7i, 6 + 9i, 9 − 3i, 4 + 4i)
(k) V = C4, S = {(−4, 3 − 2i, i, 1 − 4i), (−1 − 5i, 5 − 4i, −3 + 5i, 7 − 2i), (−27 − i, −7 − 6i, −15 + 25i, −7 − 6i)}, and x = (−13 − 7i, −12 + 3i, −39 − 11i, −26 + 5i)


(l) V = M2×2(C), S = {(1 − i, −2 − 3i; 2 + 2i, 4 + i), (8i, 4; −3 − 3i, −4 + 4i), (−25 − 38i, −2 − 13i; 12 − 78i, −7 + 24i)}, and A = (−2 + 8i, −13 + i; 10 − 10i, 9 − 9i)
(m) V = M2×2(C), S = {(−1 + i, −i; 2 − i, 1 + 3i), (−1 − 7i, −9 − 8i; 1 + 10i, −6 − 2i), (−11 − 132i, −34 − 31i; 7 − 126i, −71 − 5i)}, and A = (−7 + 5i, 3 + 18i; 9 − 6i, −3 + 7i)

3. In R2, let

β = { (1/√2, 1/√2), (1/√2, −1/√2) }.

Find the Fourier coefficients of (3, 4) relative to β.

4. Let S = {(1, 0, i), (1, 2, 1)} in C3. Compute S⊥.

5. Let S0 = {x0}, where x0 is a nonzero vector in R3. Describe S0⊥ geometrically. Now suppose that S = {x1, x2} is a linearly independent subset of R3. Describe S⊥ geometrically.

6. Let V be an inner product space, and let W be a finite-dimensionalsubspace of V. If x /∈ W, prove that there exists y ∈ V such thaty ∈ W⊥, but 〈x, y〉 �= 0. Hint: Use Theorem 6.6.

7. Let β be a basis for a subspace W of an inner product space V, and letz ∈ V. Prove that z ∈ W⊥ if and only if 〈z, v〉 = 0 for every v ∈ β.

8. Prove that if {w1, w2, . . . , wn} is an orthogonal set of nonzero vectors,then the vectors v1, v2, . . . , vn derived from the Gram–Schmidt processsatisfy vi = wi for i = 1, 2, . . . , n. Hint: Use mathematical induction.

9. Let W = span({(i, 0, 1)}) in C3. Find orthonormal bases for W and W⊥.

10. Let W be a finite-dimensional subspace of an inner product space V.Prove that there exists a projection T on W along W⊥ that satisfiesN(T) = W⊥. In addition, prove that ‖T(x)‖ ≤ ‖x‖ for all x ∈ V.Hint: Use Theorem 6.6 and Exercise 10 of Section 6.1. (Projections aredefined in the exercises of Section 2.1.)

11. Let A be an n× n matrix with complex entries. Prove that AA∗ = I ifand only if the rows of A form an orthonormal basis for Cn.

12. Prove that for any matrix A ∈ Mm×n(F ), (R(LA∗))⊥ = N(LA).


13. Let V be an inner product space, S and S0 be subsets of V, and W be a finite-dimensional subspace of V. Prove the following results.

(a) S0 ⊆ S implies that S⊥ ⊆ S0⊥.
(b) S ⊆ (S⊥)⊥; so span(S) ⊆ (S⊥)⊥.
(c) W = (W⊥)⊥. Hint: Use Exercise 6.
(d) V = W ⊕ W⊥. (See the exercises of Section 1.3.)

14. Let W1 and W2 be subspaces of a finite-dimensional inner product space. Prove that (W1 + W2)⊥ = W1⊥ ∩ W2⊥ and (W1 ∩ W2)⊥ = W1⊥ + W2⊥. (See the definition of the sum of subsets of a vector space on page 22.) Hint for the second equation: Apply Exercise 13(c) to the first equation.

15. Let V be a finite-dimensional inner product space over F .

(a) Parseval's Identity. Let {v1, v2, . . . , vn} be an orthonormal basis for V. For any x, y ∈ V prove that

〈x, y〉 = ∑_{i=1}^{n} 〈x, vi〉 〈y, vi〉‾,

where the bar denotes complex conjugation.

(b) Use (a) to prove that if β is an orthonormal basis for V with inner product 〈 · , ·〉, then for any x, y ∈ V

〈φβ(x), φβ(y)〉′ = 〈[x]β, [y]β〉′ = 〈x, y〉,

where 〈 · , ·〉′ is the standard inner product on Fn.

16. (a) Bessel's Inequality. Let V be an inner product space, and let S = {v1, v2, . . . , vn} be an orthonormal subset of V. Prove that for any x ∈ V we have

‖x‖² ≥ ∑_{i=1}^{n} |〈x, vi〉|².

Hint: Apply Theorem 6.6 to x ∈ V and W = span(S). Then use Exercise 10 of Section 6.1.

(b) In the context of (a), prove that Bessel's inequality is an equality if and only if x ∈ span(S).

17. Let T be a linear operator on an inner product space V. If 〈T(x), y〉 = 0for all x, y ∈ V, prove that T = T0. In fact, prove this result if theequality holds for all x and y in some basis for V.

18. Let V = C([−1, 1]). Suppose that We and Wo denote the subspaces of Vconsisting of the even and odd functions, respectively. (See Exercise 22


of Section 1.3.) Prove that We⊥ = Wo, where the inner product on V is defined by

〈f, g〉 = ∫_{−1}^{1} f(t)g(t) dt.

19. In each of the following parts, find the orthogonal projection of the given vector on the given subspace W of the inner product space V.

(a) V = R2, u = (2, 6), and W = {(x, y) : y = 4x}.
(b) V = R3, u = (2, 1, 3), and W = {(x, y, z) : x + 3y − 2z = 0}.
(c) V = P(R) with the inner product 〈f(x), g(x)〉 = ∫_{0}^{1} f(t)g(t) dt, h(x) = 4 + 3x − 2x², and W = P1(R).

20. In each part of Exercise 19, find the distance from the given vector to the subspace W.

21. Let V = C([−1, 1]) with the inner product 〈f, g〉 = ∫_{−1}^{1} f(t)g(t) dt, and let W be the subspace P2(R), viewed as a space of functions. Use the orthonormal basis obtained in Example 5 to compute the "best" (closest) second-degree polynomial approximation of the function h(t) = e^t on the interval [−1, 1].

22. Let V = C([0, 1]) with the inner product 〈f, g〉 = ∫_{0}^{1} f(t)g(t) dt. Let W be the subspace spanned by the linearly independent set {t, √t}.

(a) Find an orthonormal basis for W.
(b) Let h(t) = t². Use the orthonormal basis obtained in (a) to obtain the "best" (closest) approximation of h in W.

23. Let V be the vector space defined in Example 5 of Section 1.2, the space of all sequences σ in F (where F = R or F = C) such that σ(n) ≠ 0 for only finitely many positive integers n. For σ, μ ∈ V, we define 〈σ, μ〉 = ∑_{n=1}^{∞} σ(n)μ̄(n). Since all but a finite number of terms of the series are zero, the series converges.

(a) Prove that 〈 · , ·〉 is an inner product on V, and hence V is an inner product space.
(b) For each positive integer n, let en be the sequence defined by en(k) = δn,k, where δn,k is the Kronecker delta. Prove that {e1, e2, . . .} is an orthonormal basis for V.
(c) Let σn = e1 + en and W = span({σn : n ≥ 2}).
    (i) Prove that e1 ∉ W, so W ≠ V.
    (ii) Prove that W⊥ = {0}, and conclude that W ≠ (W⊥)⊥.


Thus the assumption in Exercise 13(c) that W is finite-dimensional is essential.

6.3 THE ADJOINT OF A LINEAR OPERATOR

In Section 6.1, we defined the conjugate transpose A∗ of a matrix A. Fora linear operator T on an inner product space V, we now define a relatedlinear operator on V called the adjoint of T, whose matrix representationwith respect to any orthonormal basis β for V is [T]∗β . The analogy betweenconjugation of complex numbers and adjoints of linear operators will becomeapparent. We first need a preliminary result.

Let V be an inner product space, and let y ∈ V. The function g : V → Fdefined by g(x) = 〈x, y〉 is clearly linear. More interesting is the fact that ifV is finite-dimensional, every linear transformation from V into F is of thisform.

Theorem 6.8. Let V be a finite-dimensional inner product space over F ,and let g : V → F be a linear transformation. Then there exists a uniquevector y ∈ V such that g(x) = 〈x, y〉 for all x ∈ V.

Proof. Let β = {v1, v2, . . . , vn} be an orthonormal basis for V, and let

y = ∑_{i=1}^{n} g(vi)‾ vi.

Define h : V → F by h(x) = 〈x, y〉, which is clearly linear. Furthermore, for 1 ≤ j ≤ n we have

h(vj) = 〈vj, y〉 = 〈vj, ∑_{i=1}^{n} g(vi)‾ vi〉 = ∑_{i=1}^{n} g(vi) 〈vj, vi〉 = ∑_{i=1}^{n} g(vi) δji = g(vj).

Since g and h both agree on β, we have that g = h by the corollary to Theorem 2.6 (p. 73).

To show that y is unique, suppose that g(x) = 〈x, y′〉 for all x. Then 〈x, y〉 = 〈x, y′〉 for all x; so by Theorem 6.1(e) (p. 333), we have y = y′.

Example 1

Define g : R2 → R by g(a1, a2) = 2a1 + a2; clearly g is a linear transformation. Let β = {e1, e2}, and let y = g(e1)e1 + g(e2)e2 = 2e1 + e2 = (2, 1), as in the proof of Theorem 6.8. Then g(a1, a2) = 〈(a1, a2), (2, 1)〉 = 2a1 + a2. ♦
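The construction of y in the proof of Theorem 6.8 does not depend on which orthonormal basis is used. The short sketch below is illustrative only (Python with numpy is an assumption, as is the particular rotated basis); it repeats Example 1 with a different orthonormal basis for R2 and recovers the same vector y = (2, 1).

```python
# Build y = sum_i g(v_i) v_i for g(a1, a2) = 2*a1 + a2 over R (no conjugation
# is needed), using a rotated orthonormal basis of R^2 instead of {e1, e2}.
import numpy as np

def g(v):
    return 2 * v[0] + v[1]

v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([1.0, -1.0]) / np.sqrt(2)

y = g(v1) * v1 + g(v2) * v2
print(y)                     # [2. 1.], the same y as with the standard basis
x = np.array([3.0, -4.0])
print(g(x), x @ y)           # both 2.0, so g(x) = <x, y>
```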


Theorem 6.9. Let V be a finite-dimensional inner product space, and letT be a linear operator on V. Then there exists a unique function T∗ : V → Vsuch that 〈T(x), y〉 = 〈x,T∗(y)〉 for all x, y ∈ V. Furthermore, T∗ is linear.

Proof. Let y ∈ V. Define g : V → F by g(x) = 〈T(x), y〉 for all x ∈ V. We first show that g is linear. Let x1, x2 ∈ V and c ∈ F . Then

g(cx1 + x2) = 〈T(cx1 + x2), y〉 = 〈cT(x1) + T(x2), y〉 = c 〈T(x1), y〉 + 〈T(x2), y〉 = cg(x1) + g(x2).

Hence g is linear.

We now apply Theorem 6.8 to obtain a unique vector y′ ∈ V such that g(x) = 〈x, y′〉; that is, 〈T(x), y〉 = 〈x, y′〉 for all x ∈ V. Defining T∗ : V → V by T∗(y) = y′, we have 〈T(x), y〉 = 〈x, T∗(y)〉.

To show that T∗ is linear, let y1, y2 ∈ V and c ∈ F . Then for any x ∈ V, we have

〈x, T∗(cy1 + y2)〉 = 〈T(x), cy1 + y2〉 = c̄ 〈T(x), y1〉 + 〈T(x), y2〉 = c̄ 〈x, T∗(y1)〉 + 〈x, T∗(y2)〉 = 〈x, cT∗(y1) + T∗(y2)〉.

Since x is arbitrary, T∗(cy1 + y2) = cT∗(y1) + T∗(y2) by Theorem 6.1(e) (p. 333).

Finally, we need to show that T∗ is unique. Suppose that U : V → V is linear and that it satisfies 〈T(x), y〉 = 〈x, U(y)〉 for all x, y ∈ V. Then 〈x, T∗(y)〉 = 〈x, U(y)〉 for all x, y ∈ V, so T∗ = U.

The linear operator T∗ described in Theorem 6.9 is called the adjoint ofthe operator T. The symbol T∗ is read “T star.”

Thus T∗ is the unique operator on V satisfying 〈T(x), y〉 = 〈x,T∗(y)〉 forall x, y ∈ V. Note that we also have

〈x, T(y)〉 = 〈T(y), x〉‾ = 〈y, T∗(x)〉‾ = 〈T∗(x), y〉;

so 〈x, T(y)〉 = 〈T∗(x), y〉 for all x, y ∈ V. We may view these equations symbolically as adding a * to T when shifting its position inside the inner product symbol.

For an infinite-dimensional inner product space, the adjoint of a linear op-erator T may be defined to be the function T∗ such that 〈T(x), y〉 = 〈x,T∗(y)〉for all x, y ∈ V, provided it exists. Although the uniqueness and linearity ofT∗ follow as before, the existence of the adjoint is not guaranteed (see Exer-cise 24). The reader should observe the necessity of the hypothesis of finite-dimensionality in the proof of Theorem 6.8. Many of the theorems we prove


about adjoints, nevertheless, do not depend on V being finite-dimensional.Thus, unless stated otherwise, for the remainder of this chapter we adopt theconvention that a reference to the adjoint of a linear operator on an infinite-dimensional inner product space assumes its existence.

Theorem 6.10 is a useful result for computing adjoints.

Theorem 6.10. Let V be a finite-dimensional inner product space, and let β be an orthonormal basis for V. If T is a linear operator on V, then

[T∗]β = ([T]β)∗.

Proof. Let A = [T]β, B = [T∗]β, and β = {v1, v2, . . . , vn}. Then from the corollary to Theorem 6.5 (p. 346), we have

Bij = 〈T∗(vj), vi〉 = 〈vi, T∗(vj)〉‾ = 〈T(vi), vj〉‾ = Āji = (A∗)ij.

Hence B = A∗.

Corollary. Let A be an n × n matrix. Then LA∗ = (LA)∗.

Proof. If β is the standard ordered basis for Fn, then, by Theorem 2.16(p. 93), we have [LA]β = A. Hence [(LA)∗]β = [LA]∗β = A∗ = [LA∗ ]β , and so(LA)∗ = LA∗ .

As an illustration of Theorem 6.10, we compute the adjoint of a specificlinear operator.

Example 2

Let T be the linear operator on C2 defined by T(a1, a2) = (2ia1 + 3a2, a1 − a2). If β is the standard ordered basis for C2, then

[T]β = ( 2i   3 )
       (  1  −1 ).

So

[T∗]β = ([T]β)∗ = ( −2i   1 )
                  (   3  −1 ).

Hence

T∗(a1, a2) = (−2ia1 + a2, 3a1 − a2). ♦
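Theorem 6.10 reduces the computation in Example 2 to taking a conjugate transpose. The following sketch is illustrative only (it assumes Python with numpy, which the text does not use):

```python
# [T]_beta for T(a1, a2) = (2i*a1 + 3*a2, a1 - a2) on C^2, beta the standard
# basis; by Theorem 6.10, the adjoint is represented by the conjugate transpose.
import numpy as np

A = np.array([[2j, 3], [1, -1]], dtype=complex)   # [T]_beta
A_star = A.conj().T                               # [T*]_beta
print(A_star)   # [[-2j, 1], [3, -1]], so T*(a1, a2) = (-2i*a1 + a2, 3*a1 - a2)

# spot-check <T(x), y> = <x, T*(y)> for the standard inner product
rng = np.random.default_rng(0)
x = rng.normal(size=2) + 1j * rng.normal(size=2)
y = rng.normal(size=2) + 1j * rng.normal(size=2)
print(np.allclose(np.vdot(y, A @ x), np.vdot(A_star @ y, x)))   # True
```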

The following theorem suggests an analogy between the conjugates ofcomplex numbers and the adjoints of linear operators.

Theorem 6.11. Let V be an inner product space, and let T and U belinear operators on V. Then


(a) (T + U)∗ = T∗ + U∗;
(b) (cT)∗ = c̄T∗ for any c ∈ F ;
(c) (TU)∗ = U∗T∗;
(d) T∗∗ = T;
(e) I∗ = I.

Proof. We prove (a) and (d); the rest are proved similarly. Let x, y ∈ V.(a) Because

〈x, (T + U)∗(y)〉 = 〈(T + U)(x), y〉 = 〈T(x) + U(x), y〉= 〈T(x), y〉 + 〈U(x), y〉 = 〈x,T∗(y)〉 + 〈x,U∗(y)〉= 〈x,T∗(y) + U∗(y)〉 = 〈x, (T∗ + U∗)(y)〉 ,

T∗ + U∗ has the property unique to (T + U)∗. Hence T∗ + U∗ = (T + U)∗.(d) Similarly, since

〈x,T(y)〉 = 〈T∗(x), y〉 = 〈x,T∗∗(y)〉 ,

(d) follows.

The same proof works in the infinite-dimensional case, provided that theexistence of T∗ and U∗ is assumed.

Corollary. Let A and B be n × n matrices. Then
(a) (A + B)∗ = A∗ + B∗;
(b) (cA)∗ = c̄A∗ for all c ∈ F ;
(c) (AB)∗ = B∗A∗;
(d) A∗∗ = A;
(e) I∗ = I.

Proof. We prove only (c); the remaining parts can be proved similarly.Since L(AB)∗ = (LAB)∗ = (LALB)∗ = (LB)∗(LA)∗ = LB∗LA∗ = LB∗A∗ , we

have (AB)∗ = B∗A∗.

In the preceding proof, we relied on the corollary to Theorem 6.10. Analternative proof, which holds even for nonsquare matrices, can be given byappealing directly to the definition of the conjugate transpose of a matrix(see Exercise 5).

Least Squares Approximation

Consider the following problem: An experimenter collects data by takingmeasurements y1, y2, . . . , ym at times t1, t2, . . . , tm, respectively. For example,he or she may be measuring unemployment at various times during someperiod. Suppose that the data (t1, y1), (t2, y2), . . . , (tm, ym) are plotted aspoints in the plane. (See Figure 6.3.) From this plot, the experimenter


feels that there exists an essentially linear relationship between y and t, say y = ct + d, and would like to find the constants c and d so that the line y = ct + d represents the best possible fit to the data collected. One such estimate of fit is to calculate the error E that represents the sum of the squares of the vertical distances from the points to the line; that is,

E = ∑_{i=1}^{m} (yi − cti − d)².

[Figure 6.3: the data points (t1, y1), . . . , (ti, yi), . . . plotted against the line y = ct + d, with the point (ti, cti + d) on the line directly above or below (ti, yi).]

Thus the problem is reduced to finding the constants c and d that minimize E. (For this reason the line y = ct + d is called the least squares line.) If we let

A = ( t1  1 )
    ( t2  1 )
    (  ⋮  ⋮ )
    ( tm  1 ),

x = ( c )
    ( d ),

and

y = ( y1 )
    ( y2 )
    (  ⋮ )
    ( ym ),

then it follows that E = ‖y − Ax‖².

We develop a general method for finding an explicit vector x0 ∈ Fn that minimizes E; that is, given an m × n matrix A, we find x0 ∈ Fn such that ‖y − Ax0‖ ≤ ‖y − Ax‖ for all vectors x ∈ Fn. This method not only allows us to find the linear function that best fits the data, but also, for any positive integer n, the best fit using a polynomial of degree at most n.


First, we need some notation and two simple lemmas. For x, y ∈ Fn, let〈x, y〉n denote the standard inner product of x and y in Fn. Recall that if xand y are regarded as column vectors, then 〈x, y〉n = y∗x.

Lemma 1. Let A ∈ Mm×n(F ), x ∈ Fn, and y ∈ Fm. Then

〈Ax, y〉m = 〈x, A∗y〉n .

Proof. By a generalization of the corollary to Theorem 6.11 (see Exer-cise 5(b)), we have

〈Ax, y〉m = y∗(Ax) = (y∗A)x = (A∗y)∗x = 〈x, A∗y〉n .

Lemma 2. Let A ∈ Mm×n(F ). Then rank(A∗A) = rank(A).

Proof. By the dimension theorem, we need only show that, for x ∈ Fn,we have A∗Ax = 0 if and only if Ax = 0 . Clearly, Ax = 0 implies thatA∗Ax = 0 . So assume that A∗Ax = 0 . Then

0 = 〈A∗Ax, x〉n = 〈Ax, A∗∗x〉m = 〈Ax, Ax〉m ,

so that Ax = 0 .

Corollary. If A is an m × n matrix such that rank(A) = n, then A∗A isinvertible.

Now let A be an m × n matrix and y ∈ Fm. Define W = {Ax : x ∈ Fn};that is, W = R(LA). By the corollary to Theorem 6.6 (p. 350), there exists aunique vector in W that is closest to y. Call this vector Ax0, where x0 ∈ Fn.Then ‖Ax0 − y‖ ≤ ‖Ax − y‖ for all x ∈ Fn; so x0 has the property thatE = ‖Ax0 − y‖ is minimal, as desired.

To develop a practical method for finding such an x0, we note from The-orem 6.6 and its corollary that Ax0 − y ∈ W⊥; so 〈Ax, Ax0 − y〉m = 0 forall x ∈ Fn. Thus, by Lemma 1, we have that 〈x, A∗(Ax0 − y)〉n = 0 for allx ∈ Fn; that is, A∗(Ax0 − y) = 0 . So we need only find a solution x0 toA∗Ax = A∗y. If, in addition, we assume that rank(A) = n, then by Lemma 2we have x0 = (A∗A)−1A∗y. We summarize this discussion in the followingtheorem.

Theorem 6.12. Let A ∈ Mm×n(F ) and y ∈ Fm. Then there existsx0 ∈ Fn such that (A∗A)x0 = A∗y and ‖Ax0 − y‖ ≤ ‖Ax− y‖ for all x ∈ Fn.Furthermore, if rank(A) = n, then x0 = (A∗A)−1A∗y.


To return to our experimenter, let us suppose that the data collected are (1, 2), (2, 3), (3, 5), and (4, 7). Then

A = ( 1  1 )
    ( 2  1 )
    ( 3  1 )
    ( 4  1 )

and

y = ( 2 )
    ( 3 )
    ( 5 )
    ( 7 );

hence

A∗A = (1, 2, 3, 4; 1, 1, 1, 1)(1, 1; 2, 1; 3, 1; 4, 1) = (30, 10; 10, 4),

where each matrix is written row by row with rows separated by semicolons. Thus

(A∗A)⁻¹ = (1/20)(4, −10; −10, 30).

Therefore

(c; d) = x0 = (1/20)(4, −10; −10, 30)(1, 2, 3, 4; 1, 1, 1, 1)(2; 3; 5; 7) = (1.7; 0).

It follows that the line y = 1.7t is the least squares line. The error E may be computed directly as ‖Ax0 − y‖² = 0.3.
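The computation above is easy to reproduce numerically. The following sketch is illustrative only (numpy is an assumption of the sketch, not of the text); it solves the normal equations (A∗A)x0 = A∗y for the same data.

```python
# Least squares line for the data (1, 2), (2, 3), (3, 5), (4, 7).
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 7.0])
A = np.column_stack([t, np.ones_like(t)])        # rows (t_i, 1)

x0 = np.linalg.solve(A.T @ A, A.T @ y)           # x0 = (A*A)^{-1} A* y, rank(A) = 2
c, d = x0
print(c, d)                                      # 1.7  0.0  ->  line y = 1.7 t
print(np.sum((A @ x0 - y) ** 2))                 # E = 0.3
```

The same code handles the polynomial fit discussed at the end of this subsection: one simply adds a column of ti² (and higher powers, if desired) to A.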

Suppose that the experimenter chose the times ti (1 ≤ i ≤ m) to satisfy

∑_{i=1}^{m} ti = 0.

Then the two columns of A would be orthogonal, so A∗A would be a diagonal matrix (see Exercise 19). In this case, the computations are greatly simplified.

In practice, the m× 2 matrix A in our least squares application has rankequal to two, and hence A∗A is invertible by the corollary to Lemma 2. For,otherwise, the first column of A is a multiple of the second column, whichconsists only of ones. But this would occur only if the experimenter collectsall the data at exactly one time.

Finally, the method above may also be applied if, for some k, the experimenter wants to fit a polynomial of degree at most k to the data. For instance, if a polynomial y = ct² + dt + e of degree at most 2 is desired, the appropriate model is

x = ( c )
    ( d )
    ( e ),

y = ( y1 )
    ( y2 )
    (  ⋮ )
    ( ym ),

and

A = ( t1²  t1  1 )
    (  ⋮    ⋮   ⋮ )
    ( tm²  tm  1 ).


Minimal Solutions to Systems of Linear Equations

Even when a system of linear equations Ax = b is consistent, there maybe no unique solution. In such cases, it may be desirable to find a solution ofminimal norm. A solution s to Ax = b is called a minimal solution if ‖s‖ ≤‖u‖ for all other solutions u. The next theorem assures that every consistentsystem of linear equations has a unique minimal solution and provides amethod for computing it.

Theorem 6.13. Let A ∈ Mm×n(F ) and b ∈ Fm. Suppose that Ax = b is consistent. Then the following statements are true.

(a) There exists exactly one minimal solution s of Ax = b, and s ∈ R(LA∗).
(b) The vector s is the only solution to Ax = b that lies in R(LA∗); that is, if u satisfies (AA∗)u = b, then s = A∗u.

Proof. (a) For simplicity of notation, we let W = R(LA∗) and W′ = N(LA).Let x be any solution to Ax = b. By Theorem 6.6 (p. 350), x = s + y forsome s ∈ W and y ∈ W⊥. But W⊥ = W′ by Exercise 12, and thereforeb = Ax = As + Ay = As. So s is a solution to Ax = b that lies in W. Toprove (a), we need only show that s is the unique minimal solution. Let v beany solution to Ax = b. By Theorem 3.9 (p. 172), we have that v = s + u,where u ∈ W′. Since s ∈ W, which equals W′⊥ by Exercise 12, we have

‖v‖2 = ‖s + u‖2 = ‖s‖2 + ‖u‖2 ≥ ‖s‖2

by Exercise 10 of Section 6.1. Thus s is a minimal solution. We can also seefrom the preceding calculation that if ‖v‖ = ‖s‖, then u = 0 ; hence v = s.Therefore s is the unique minimal solution to Ax = b, proving (a).

(b) Assume that v is also a solution to Ax = b that lies in W. Then

v − s ∈ W ∩ W′ = W ∩ W⊥ = {0};so v = s.

Finally, suppose that (AA∗)u = b, and let v = A∗u. Then v ∈ W andAv = b. Therefore s = v = A∗u by the discussion above.

Example 3

Consider the system

 x + 2y +  z =   4
 x −  y + 2z = −11
 x + 5y      =  19.

Let

A = ( 1  2  1 )
    ( 1 −1  2 )
    ( 1  5  0 )

and

b = (   4 )
    ( −11 )
    (  19 ).


To find the minimal solution to this system, we must first find some solution u to AA∗x = b. Now

AA∗ = (  6   1  11 )
      (  1   6  −4 )
      ( 11  −4  26 );

so we consider the system

 6x +  y + 11z =   4
  x + 6y −  4z = −11
11x − 4y + 26z =  19,

for which one solution is

u = (  1 )
    ( −2 )
    (  0 ).

(Any solution will suffice.) Hence

s = A∗u = ( −1 )
          (  4 )
          ( −3 )

is the minimal solution to the given system. ♦
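Numerically, the minimal solution can be computed exactly as in Theorem 6.13(b). The sketch below is illustrative only (it assumes Python with numpy); note that numpy.linalg.lstsq returns a minimum-norm solution, so it can also be applied to Ax = b directly.

```python
# Minimal solution of the system in Example 3.
import numpy as np

A = np.array([[1.,  2., 1.],
              [1., -1., 2.],
              [1.,  5., 0.]])
b = np.array([4., -11., 19.])

u = np.linalg.lstsq(A @ A.T, b, rcond=None)[0]   # some solution of (AA*)u = b
s = A.T @ u                                      # s = A*u, the minimal solution
print(np.round(s, 6))                            # [-1.  4. -3.]

# lstsq applied to the original system gives the same minimum-norm solution
print(np.round(np.linalg.lstsq(A, b, rcond=None)[0], 6))
```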

EXERCISES

1. Label the following statements as true or false. Assume that the underlying inner product spaces are finite-dimensional.

(a) Every linear operator has an adjoint.
(b) Every linear operator on V has the form x → 〈x, y〉 for some y ∈ V.
(c) For every linear operator T on V and every ordered basis β for V, we have [T∗]β = ([T]β)∗.
(d) The adjoint of a linear operator is unique.
(e) For any linear operators T and U and scalars a and b, (aT + bU)∗ = āT∗ + b̄U∗.
(f) For any n × n matrix A, we have (LA)∗ = LA∗.
(g) For any linear operator T, we have (T∗)∗ = T.

2. For each of the following inner product spaces V (over F ) and lineartransformations g : V → F , find a vector y such that g(x) = 〈x, y〉 forall x ∈ V.


(a) V = R3, g(a1, a2, a3) = a1 − 2a2 + 4a3

(b) V = C2, g(z1, z2) = z1 − 2z2

(c) V = P2(R) with 〈f, h〉 = ∫_{0}^{1} f(t)h(t) dt, g(f) = f(0) + f′(1)

3. For each of the following inner product spaces V and linear operators T on V, evaluate T∗ at the given vector in V.

(a) V = R2, T(a, b) = (2a + b, a − 3b), x = (3, 5).
(b) V = C2, T(z1, z2) = (2z1 + iz2, (1 − i)z1), x = (3 − i, 1 + 2i).
(c) V = P1(R) with 〈f, g〉 = ∫_{−1}^{1} f(t)g(t) dt, T(f) = f′ + 3f, f(t) = 4 − 2t

4. Complete the proof of Theorem 6.11.

5. (a) Complete the proof of the corollary to Theorem 6.11 by usingTheorem 6.11, as in the proof of (c).

(b) State a result for nonsquare matrices that is analogous to the corol-lary to Theorem 6.11, and prove it using a matrix argument.

6. Let T be a linear operator on an inner product space V. Let U1 = T + T∗ and U2 = TT∗. Prove that U1 = U1∗ and U2 = U2∗.

7. Give an example of a linear operator T on an inner product space Vsuch that N(T) �= N(T∗).

8. Let V be a finite-dimensional inner product space, and let T be a linearoperator on V. Prove that if T is invertible, then T∗ is invertible and(T∗)−1 = (T−1)∗.

9. Prove that if V = W ⊕ W⊥ and T is the projection on W along W⊥,then T = T∗. Hint: Recall that N(T) = W⊥. (For definitions, see theexercises of Sections 1.3 and 2.1.)

10. Let T be a linear operator on an inner product space V. Prove that‖T(x)‖ = ‖x‖ for all x ∈ V if and only if 〈T(x), T(y)〉 = 〈x, y〉 for allx, y ∈ V. Hint: Use Exercise 20 of Section 6.1.

11. For a linear operator T on an inner product space V, prove that T∗T =T0 implies T = T0. Is the same result true if we assume that TT∗ = T0?

12. Let V be an inner product space, and let T be a linear operator on V. Prove the following results.

(a) R(T∗)⊥ = N(T).
(b) If V is finite-dimensional, then R(T∗) = N(T)⊥. Hint: Use Exercise 13(c) of Section 6.2.


13. Let T be a linear operator on a finite-dimensional vector space V. Prove the following results.

(a) N(T∗T) = N(T). Deduce that rank(T∗T) = rank(T).
(b) rank(T) = rank(T∗). Deduce from (a) that rank(TT∗) = rank(T).
(c) For any n × n matrix A, rank(A∗A) = rank(AA∗) = rank(A).

14. Let V be an inner product space, and let y, z ∈ V. Define T : V → V byT(x) = 〈x, y〉z for all x ∈ V. First prove that T is linear. Then showthat T∗ exists, and find an explicit expression for it.

The following definition is used in Exercises 15–17 and is an extension of thedefinition of the adjoint of a linear operator.

Definition. Let T : V → W be a linear transformation, where V and Ware finite-dimensional inner product spaces with inner products 〈 · , ·〉1 and〈 · , ·〉2, respectively. A function T∗ : W → V is called an adjoint of T if〈T(x), y〉2 = 〈x,T∗(y)〉1 for all x ∈ V and y ∈ W.

15. Let T : V → W be a linear transformation, where V and W are finite-dimensional inner product spaces with inner products 〈 · , ·〉1 and 〈 · , ·〉2, respectively. Prove the following results.

(a) There is a unique adjoint T∗ of T, and T∗ is linear.
(b) If β and γ are orthonormal bases for V and W, respectively, then [T∗]βγ = ([T]γβ)∗.
(c) rank(T∗) = rank(T).
(d) 〈T∗(x), y〉1 = 〈x, T(y)〉2 for all x ∈ W and y ∈ V.
(e) For all x ∈ V, T∗T(x) = 0 if and only if T(x) = 0.

16. State and prove a result that extends the first four parts of Theorem 6.11using the preceding definition.

17. Let T : V → W be a linear transformation, where V and W are finite-dimensional inner product spaces. Prove that (R(T∗))⊥ = N(T), usingthe preceding definition.

18.† Let A be an n × n matrix. Prove that det(A∗) = det(A)‾.

19. Suppose that A is an m×n matrix in which no two columns are identical.Prove that A∗A is a diagonal matrix if and only if every pair of columnsof A is orthogonal.

20. For each of the sets of data that follows, use the least squares approx-imation to find the best fits with both (i) a linear function and (ii) aquadratic function. Compute the error E in both cases.

(a) {(−3, 9), (−2, 6), (0, 2), (1, 1)}


(b) {(1, 2), (3, 4), (5, 7), (7, 9), (9, 12)}
(c) {(−2, 4), (−1, 3), (0, 1), (1, −1), (2, −3)}

21. In physics, Hooke's law states that (within certain limits) there is a linear relationship between the length x of a spring and the force y applied to (or exerted by) the spring. That is, y = cx + d, where c is called the spring constant. Use the following data to estimate the spring constant (the length is given in inches and the force is given in pounds).

Length x    Force y
3.5         1.0
4.0         2.2
4.5         2.8
5.0         4.3

22. Find the minimal solution to each of the following systems of linear equations.

(a) x + 2y − z = 12

(b) x + 2y − z = 1
    2x + 3y + z = 2
    4x + 7y − z = 4

(c) x + y − z = 0
    2x − y + z = 3
    x − y + z = 2

(d) x + y + z − w = 1
    2x − y + w = 1

23. Consider the problem of finding the least squares line y = ct + d corresponding to the m observations (t1, y1), (t2, y2), . . . , (tm, ym).

(a) Show that the equation (A∗A)x0 = A∗y of Theorem 6.12 takes the form of the normal equations:

( ∑_{i=1}^{m} ti² ) c + ( ∑_{i=1}^{m} ti ) d = ∑_{i=1}^{m} ti yi

and

( ∑_{i=1}^{m} ti ) c + md = ∑_{i=1}^{m} yi.

These equations may also be obtained from the error E by setting the partial derivatives of E with respect to both c and d equal to zero.


(b) Use the second normal equation of (a) to show that the least squares line must pass through the center of mass, (t̄, ȳ), where

t̄ = (1/m) ∑_{i=1}^{m} ti   and   ȳ = (1/m) ∑_{i=1}^{m} yi.

24. Let V and {e1, e2, . . .} be defined as in Exercise 23 of Section 6.2. Define T : V → V by

T(σ)(k) = ∑_{i=k}^{∞} σ(i)   for every positive integer k.

Notice that the infinite series in the definition of T converges because σ(i) ≠ 0 for only finitely many i.

(a) Prove that T is a linear operator on V.
(b) Prove that for any positive integer n, T(en) = ∑_{i=1}^{n} ei.
(c) Prove that T has no adjoint. Hint: By way of contradiction, suppose that T∗ exists. Prove that for any positive integer n, T∗(en)(k) ≠ 0 for infinitely many k.

6.4 NORMAL AND SELF-ADJOINT OPERATORS

We have seen the importance of diagonalizable operators in Chapter 5. Forthese operators, it is necessary and sufficient for the vector space V to possessa basis of eigenvectors. As V is an inner product space in this chapter, itis reasonable to seek conditions that guarantee that V has an orthonormalbasis of eigenvectors. A very important result that helps achieve our goal isSchur’s theorem (Theorem 6.14). The formulation that follows is in terms oflinear operators. The next section contains the more familiar matrix form.We begin with a lemma.

Lemma. Let T be a linear operator on a finite-dimensional inner product space V. If T has an eigenvector, then so does T∗.

Proof. Suppose that v is an eigenvector of T with corresponding eigenvalue λ. Then for any x ∈ V,

0 = 〈0, x〉 = 〈(T − λI)(v), x〉 = 〈v, (T − λI)∗(x)〉 = 〈v, (T∗ − λ̄I)(x)〉,

and hence v is orthogonal to the range of T∗ − λ̄I. So T∗ − λ̄I is not onto and hence is not one-to-one. Thus T∗ − λ̄I has a nonzero null space, and any nonzero vector in this null space is an eigenvector of T∗ with corresponding eigenvalue λ̄.


Recall (see the exercises of Section 2.1 and see Section 5.4) that a subspaceW of V is said to be T-invariant if T(W) is contained in W. If W is T-invariant, we may define the restriction TW : W → W by TW(x) = T(x) for allx ∈ W. It is clear that TW is a linear operator on W. Recall from Section 5.2that a polynomial is said to split if it factors into linear polynomials.

Theorem 6.14 (Schur). Let T be a linear operator on a finite-dimensional inner product space V. Suppose that the characteristic poly-nomial of T splits. Then there exists an orthonormal basis β for V such thatthe matrix [T]β is upper triangular.

Proof. The proof is by mathematical induction on the dimension n of V.The result is immediate if n = 1. So suppose that the result is true for linearoperators on (n − 1)-dimensional inner product spaces whose characteristicpolynomials split. By the lemma, we can assume that T∗ has a unit eigen-vector z. Suppose that T∗(z) = λz and that W = span({z}). We show thatW⊥ is T-invariant. If y ∈ W⊥ and x = cz ∈ W, then

〈T(y), x〉 = 〈T(y), cz〉 = 〈y, T∗(cz)〉 = 〈y, cT∗(z)〉 = 〈y, cλz〉 = c̄λ̄ 〈y, z〉 = c̄λ̄(0) = 0.

So T(y) ∈ W⊥. It is easy to show (see Theorem 5.21 p. 314, or as a con-sequence of Exercise 6 of Section 4.4) that the characteristic polynomial ofTW⊥ divides the characteristic polynomial of T and hence splits. By Theo-rem 6.7(c) (p. 352), dim(W⊥) = n − 1, so we may apply the induction hy-pothesis to TW⊥ and obtain an orthonormal basis γ of W⊥ such that [TW⊥ ]γis upper triangular. Clearly, β = γ ∪ {z} is an orthonormal basis for V suchthat [T]β is upper triangular.

We now return to our original goal of finding an orthonormal basis ofeigenvectors of a linear operator T on a finite-dimensional inner product spaceV. Note that if such an orthonormal basis β exists, then [T]β is a diagonalmatrix, and hence [T∗]β = [T]∗β is also a diagonal matrix. Because diagonalmatrices commute, we conclude that T and T∗ commute. Thus if V possessesan orthonormal basis of eigenvectors of T, then TT∗ = T∗T .

Definitions. Let V be an inner product space, and let T be a linearoperator on V. We say that T is normal if TT∗ = T∗T. An n × n real orcomplex matrix A is normal if AA∗ = A∗A.

It follows immediately from Theorem 6.10 (p. 359) that T is normal if andonly if [T]β is normal, where β is an orthonormal basis.


Example 1

Let T : R2 → R2 be rotation by θ, where 0 < θ < π. The matrix representation of T in the standard ordered basis is given by

A = ( cos θ  −sin θ )
    ( sin θ   cos θ ).

Note that AA∗ = I = A∗A; so A, and hence T, is normal. ♦

Example 2

Suppose that A is a real skew-symmetric matrix; that is, At = −A. Then A is normal because both AAt and AtA are equal to −A². ♦

Clearly, the operator T in Example 1 does not even possess one eigenvec-tor. So in the case of a real inner product space, we see that normality is notsufficient to guarantee an orthonormal basis of eigenvectors. All is not lost,however. We show that normality suffices if V is a complex inner productspace.

Before we prove the promised result for normal operators, we need somegeneral properties of normal operators.

Theorem 6.15. Let V be an inner product space, and let T be a normal operator on V. Then the following statements are true.

(a) ‖T(x)‖ = ‖T∗(x)‖ for all x ∈ V.
(b) T − cI is normal for every c ∈ F .
(c) If x is an eigenvector of T, then x is also an eigenvector of T∗. In fact, if T(x) = λx, then T∗(x) = λ̄x.
(d) If λ1 and λ2 are distinct eigenvalues of T with corresponding eigenvectors x1 and x2, then x1 and x2 are orthogonal.

Proof. (a) For any x ∈ V, we have

‖T(x)‖² = 〈T(x), T(x)〉 = 〈T∗T(x), x〉 = 〈TT∗(x), x〉 = 〈T∗(x), T∗(x)〉 = ‖T∗(x)‖².

The proof of (b) is left as an exercise.

(c) Suppose that T(x) = λx for some x ∈ V. Let U = T − λI. Then U(x) = 0, and U is normal by (b). Thus (a) implies that

0 = ‖U(x)‖ = ‖U∗(x)‖ = ‖(T∗ − λ̄I)(x)‖ = ‖T∗(x) − λ̄x‖.

Hence T∗(x) = λ̄x. So x is an eigenvector of T∗.

(d) Let λ1 and λ2 be distinct eigenvalues of T with corresponding eigenvectors x1 and x2. Then, using (c), we have

λ1 〈x1, x2〉 = 〈λ1x1, x2〉 = 〈T(x1), x2〉 = 〈x1, T∗(x2)〉 = 〈x1, λ̄2x2〉 = λ2 〈x1, x2〉.

Since λ1 ≠ λ2, we conclude that 〈x1, x2〉 = 0.

Theorem 6.16. Let T be a linear operator on a finite-dimensional com-plex inner product space V. Then T is normal if and only if there exists anorthonormal basis for V consisting of eigenvectors of T.

Proof. Suppose that T is normal. By the fundamental theorem of algebra (Theorem D.4), the characteristic polynomial of T splits. So we may apply Schur's theorem to obtain an orthonormal basis β = {v1, v2, . . . , vn} for V such that [T]β = A is upper triangular. We know that v1 is an eigenvector of T because A is upper triangular. Assume that v1, v2, . . . , vk−1 are eigenvectors of T. We claim that vk is also an eigenvector of T. It then follows by mathematical induction on k that all of the vi's are eigenvectors of T. Consider any j < k, and let λj denote the eigenvalue of T corresponding to vj. By Theorem 6.15, T∗(vj) = λ̄jvj. Since A is upper triangular,

T(vk) = A1kv1 + A2kv2 + · · · + Ajkvj + · · · + Akkvk.

Furthermore, by the corollary to Theorem 6.5 (p. 347),

Ajk = 〈T(vk), vj〉 = 〈vk, T∗(vj)〉 = 〈vk, λ̄jvj〉 = λj 〈vk, vj〉 = 0.

It follows that T(vk) = Akkvk, and hence vk is an eigenvector of T. So byinduction, all the vectors in β are eigenvectors of T.

The converse was already proved on page 370.
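For matrices, Theorem 6.16 can be observed numerically: the upper triangular matrix produced by Schur's theorem is already diagonal when the matrix is normal. The sketch below is illustrative only (Python with numpy and scipy are assumptions of the sketch); it applies this to the rotation matrix of Example 1, viewed as a complex matrix.

```python
# For a normal matrix, the Schur form is diagonal, and the unitary factor's
# columns form an orthonormal basis of eigenvectors (Theorem 6.16).
import numpy as np
from scipy.linalg import schur

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)

T, Z = schur(A, output='complex')                 # A = Z T Z*, Z unitary
print(np.round(T, 12))                            # diagonal; eigenvalues e^{+-i*theta}
print(np.allclose(Z.conj().T @ Z, np.eye(2)))     # True: columns of Z are orthonormal
print(np.allclose(Z @ T @ Z.conj().T, A))         # True
```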

Interestingly, as the next example shows, Theorem 6.16 does not extendto infinite-dimensional complex inner product spaces.

Example 3

Consider the inner product space H with the orthonormal set S from Exam-ple 9 in Section 6.1. Let V = span(S), and let T and U be the linear operatorson V defined by T(f) = f1f and U(f) = f−1f . Then

T(fn) = fn+1 and U(fn) = fn−1

for all integers n. Thus

〈T(fm), fn〉 = 〈fm+1, fn〉 = δ(m+1),n = δm,(n−1) = 〈fm, fn−1〉 = 〈fm, U(fn)〉 .

It follows that U = T∗. Furthermore, TT∗ = I = T∗T; so T is normal.

We show that T has no eigenvectors. Suppose that f is an eigenvector ofT, say, T(f) = λf for some λ. Since V equals the span of S, we may write

f = ∑_{i=n}^{m} ai fi,   where am ≠ 0.


Hence

∑_{i=n}^{m} ai fi+1 = T(f) = λf = ∑_{i=n}^{m} λai fi.

Since am ≠ 0, we can write fm+1 as a linear combination of fn, fn+1, . . . , fm. But this is a contradiction because S is linearly independent. ♦

Example 1 illustrates that normality is not sufficient to guarantee theexistence of an orthonormal basis of eigenvectors for real inner product spaces.For real inner product spaces, we must replace normality by the strongercondition that T = T∗ in order to guarantee such a basis.

Definitions. Let T be a linear operator on an inner product space V.We say that T is self-adjoint (Hermitian) if T = T∗. An n × n real orcomplex matrix A is self-adjoint (Hermitian) if A = A∗.

It follows immediately that if β is an orthonormal basis, then T is self-adjoint if and only if [T]β is self-adjoint. For real matrices, this conditionreduces to the requirement that A be symmetric.

Before we state our main result for self-adjoint operators, we need somepreliminary work.

By definition, a linear operator on a real inner product space has onlyreal eigenvalues. The lemma that follows shows that the same can be saidfor self-adjoint operators on a complex inner product space. Similarly, thecharacteristic polynomial of every linear operator on a complex inner productspace splits, and the same is true for self-adjoint operators on a real innerproduct space.

Lemma. Let T be a self-adjoint operator on a finite-dimensional inner product space V. Then

(a) Every eigenvalue of T is real.
(b) Suppose that V is a real inner product space. Then the characteristic polynomial of T splits.

Proof. (a) Suppose that T(x) = λx for x ≠ 0. Because a self-adjoint operator is also normal, we can apply Theorem 6.15(c) to obtain

λx = T(x) = T∗(x) = λ̄x.

So λ = λ̄; that is, λ is real.

(b) Let n = dim(V), β be an orthonormal basis for V, and A = [T]β .

Then A is self-adjoint. Let TA be the linear operator on Cn defined byTA(x) = Ax for all x ∈ Cn. Note that TA is self-adjoint because [TA]γ = A,where γ is the standard ordered (orthonormal) basis for Cn. So, by (a),the eigenvalues of TA are real. By the fundamental theorem of algebra, the


characteristic polynomial of TA splits into factors of the form t−λ. Since eachλ is real, the characteristic polynomial splits over R. But TA has the samecharacteristic polynomial as A, which has the same characteristic polynomialas T. Therefore the characteristic polynomial of T splits.

We are now able to establish one of the major results of this chapter.

Theorem 6.17. Let T be a linear operator on a finite-dimensional realinner product space V. Then T is self-adjoint if and only if there exists anorthonormal basis β for V consisting of eigenvectors of T.

Proof. Suppose that T is self-adjoint. By the lemma, we may apply Schur’stheorem to obtain an orthonormal basis β for V such that the matrix A = [T]βis upper triangular. But

A∗ = [T]∗β = [T∗]β = [T]β = A.

So A and A∗ are both upper triangular, and therefore A is a diagonal matrix.Thus β must consist of eigenvectors of T.

The converse is left as an exercise.

Theorem 6.17 is used extensively in many areas of mathematics and statis-tics. We restate this theorem in matrix form in the next section.

Example 4

As we noted earlier, real symmetric matrices are self-adjoint, and self-adjoint matrices are normal. The following matrix A is complex and symmetric:

A = ( i  i )
    ( i  1 )

and

A∗ = ( −i  −i )
     ( −i   1 ).

But A is not normal, because (AA∗)12 = 1 + i and (A∗A)12 = 1 − i. Therefore complex symmetric matrices need not be normal. ♦
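A two-line numerical check of Example 4 (an illustrative sketch only, assuming Python with numpy):

```python
import numpy as np

A = np.array([[1j, 1j],
              [1j, 1 ]])
print((A @ A.conj().T)[0, 1])    # (1+1j)
print((A.conj().T @ A)[0, 1])    # (1-1j), so AA* != A*A and A is not normal
```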

EXERCISES

1. Label the following statements as true or false. Assume that the underlying inner product spaces are finite-dimensional.

(a) Every self-adjoint operator is normal.
(b) Operators and their adjoints have the same eigenvectors.
(c) If T is an operator on an inner product space V, then T is normal if and only if [T]β is normal, where β is any ordered basis for V.
(d) A real or complex matrix A is normal if and only if LA is normal.
(e) The eigenvalues of a self-adjoint operator must all be real.


(f) The identity and zero operators are self-adjoint.
(g) Every normal operator is diagonalizable.
(h) Every self-adjoint operator is diagonalizable.

2. For each linear operator T on an inner product space V, determine whether T is normal, self-adjoint, or neither. If possible, produce an orthonormal basis of eigenvectors of T for V and list the corresponding eigenvalues.

(a) V = R2 and T is defined by T(a, b) = (2a − 2b, −2a + 5b).
(b) V = R3 and T is defined by T(a, b, c) = (−a + b, 5b, 4a − 2b + 5c).
(c) V = C2 and T is defined by T(a, b) = (2a + ib, a + 2b).
(d) V = P2(R) and T is defined by T(f) = f′, where 〈f, g〉 = ∫_{0}^{1} f(t)g(t) dt.
(e) V = M2×2(R) and T is defined by T(A) = At.
(f) V = M2×2(R) and T is defined by T(a, b; c, d) = (c, d; a, b), where a 2 × 2 matrix is written row by row with rows separated by a semicolon.

3. Give an example of a linear operator T on R2 and an ordered basis forR2 that provides a counterexample to the statement in Exercise 1(c).

4. Let T and U be self-adjoint operators on an inner product space V.Prove that TU is self-adjoint if and only if TU = UT.

5. Prove (b) of Theorem 6.15.

6. Let V be a complex inner product space, and let T be a linear operator on V. Define

T1 = (1/2)(T + T∗)   and   T2 = (1/(2i))(T − T∗).

(a) Prove that T1 and T2 are self-adjoint and that T = T1 + iT2.
(b) Suppose also that T = U1 + iU2, where U1 and U2 are self-adjoint. Prove that U1 = T1 and U2 = T2.
(c) Prove that T is normal if and only if T1T2 = T2T1.

7. Let T be a linear operator on an inner product space V, and let W be a T-invariant subspace of V. Prove the following results.

(a) If T is self-adjoint, then TW is self-adjoint.
(b) W⊥ is T∗-invariant.
(c) If W is both T- and T∗-invariant, then (TW)∗ = (T∗)W.
(d) If W is both T- and T∗-invariant and T is normal, then TW is normal.


8. Let T be a normal operator on a finite-dimensional complex innerproduct space V, and let W be a subspace of V. Prove that if W isT-invariant, then W is also T∗-invariant. Hint: Use Exercise 24 of Sec-tion 5.4.

9. Let T be a normal operator on a finite-dimensional inner product spaceV. Prove that N(T) = N(T∗) and R(T) = R(T∗). Hint: Use Theo-rem 6.15 and Exercise 12 of Section 6.3.

10. Let T be a self-adjoint operator on a finite-dimensional inner product space V. Prove that for all x ∈ V

‖T(x) ± ix‖² = ‖T(x)‖² + ‖x‖².

Deduce that T − iI is invertible and that [(T − iI)⁻¹]∗ = (T + iI)⁻¹.

11. Assume that T is a linear operator on a complex (not necessarily finite-dimensional) inner product space V with an adjoint T∗. Prove the following results.

(a) If T is self-adjoint, then 〈T(x), x〉 is real for all x ∈ V.
(b) If T satisfies 〈T(x), x〉 = 0 for all x ∈ V, then T = T0. Hint: Replace x by x + y and then by x + iy, and expand the resulting inner products.
(c) If 〈T(x), x〉 is real for all x ∈ V, then T = T∗.

12. Let T be a normal operator on a finite-dimensional real inner productspace V whose characteristic polynomial splits. Prove that V has anorthonormal basis of eigenvectors of T. Hence prove that T is self-adjoint.

13. An n × n real matrix A is said to be a Gramian matrix if there exists a real (square) matrix B such that A = BtB. Prove that A is a Gramian matrix if and only if A is symmetric and all of its eigenvalues are nonnegative. Hint: Apply Theorem 6.17 to T = LA to obtain an orthonormal basis {v1, v2, . . . , vn} of eigenvectors with the associated eigenvalues λ1, λ2, . . . , λn. Define the linear operator U by U(vi) = √λi vi.

14. Simultaneous Diagonalization. Let V be a finite-dimensional real innerproduct space, and let U and T be self-adjoint linear operators on Vsuch that UT = TU. Prove that there exists an orthonormal basis forV consisting of vectors that are eigenvectors of both U and T. (Thecomplex version of this result appears as Exercise 10 of Section 6.6.)Hint: For any eigenspace W = Eλ of T, we have that W is both T- andU-invariant. By Exercise 7, we have that W⊥ is both T- and U-invariant.Apply Theorem 6.17 and Theorem 6.6 (p. 350).


15. Let A and B be symmetric n × n matrices such that AB = BA. UseExercise 14 to prove that there exists an orthogonal matrix P such thatP tAP and P tBP are both diagonal matrices.

16. Prove the Cayley–Hamilton theorem for a complex n × n matrix A. That is, if f(t) is the characteristic polynomial of A, prove that f(A) = O. Hint: Use Schur's theorem to show that A may be assumed to be upper triangular, in which case

f(t) = ∏_{i=1}^{n} (Aii − t).

Now if T = LA, we have (Ajj I − T)(ej) ∈ span({e1, e2, . . . , ej−1}) for j ≥ 2, where {e1, e2, . . . , en} is the standard ordered basis for Cn. (The general case is proved in Section 5.4.)

The following definitions are used in Exercises 17 through 23.

Definitions. A linear operator T on a finite-dimensional inner productspace is called positive definite [positive semidefinite] if T is self-adjointand 〈T(x), x〉 > 0 [〈T(x), x〉 ≥ 0] for all x �= 0 .

An n × n matrix A with entries from R or C is called positive definite[positive semidefinite] if LA is positive definite [positive semidefinite].

17. Let T and U be a self-adjoint linear operators on an n-dimensional innerproduct space V, and let A = [T]β , where β is an orthonormal basis forV. Prove the following results.

(a) T is positive definite [semidefinite] if and only if all of its eigenval-ues are positive [nonnegative].

(b) T is positive definite if and only if ∑_{i,j} Aij aj āi > 0 for all nonzero n-tuples (a1, a2, . . . , an).

(c) T is positive semidefinite if and only if A = B∗B for some squarematrix B.

(d) If T and U are positive semidefinite operators such that T2 = U2,then T = U.

(e) If T and U are positive definite operators such that TU = UT, thenTU is positive definite.

(f) T is positive definite [semidefinite] if and only if A is positive def-inite [semidefinite].

Because of (f), results analogous to items (a) through (d) hold for ma-trices as well as operators.


18. Let T : V → W be a linear transformation, where V and W are finite-dimensional inner product spaces. Prove the following results.

(a) T∗T and TT∗ are positive semidefinite. (See Exercise 15 of Sec-tion 6.3.)

(b) rank(T∗T) = rank(TT∗) = rank(T).

19. Let T and U be positive definite operators on an inner product space V. Prove the following results.

(a) T + U is positive definite.
(b) If c > 0, then cT is positive definite.
(c) T⁻¹ is positive definite.

20. Let V be an inner product space with inner product 〈 · , ·〉, and let T bea positive definite linear operator on V. Prove that 〈x, y〉′ = 〈T(x), y〉defines another inner product on V.

21. Let V be a finite-dimensional inner product space, and let T and U beself-adjoint operators on V such that T is positive definite. Prove thatboth TU and UT are diagonalizable linear operators that have only realeigenvalues. Hint: Show that UT is self-adjoint with respect to the innerproduct 〈x, y〉′ = 〈T(x), y〉. To show that TU is self-adjoint, repeat theargument with T−1 in place of T.

22. This exercise provides a converse to Exercise 20. Let V be a finite-dimensional inner product space with inner product 〈 · , ·〉, and let 〈 · , ·〉′be any other inner product on V.

(a) Prove that there exists a unique linear operator T on V suchthat 〈x, y〉′ = 〈T(x), y〉 for all x and y in V. Hint: Let β ={v1, v2, . . . , vn} be an orthonormal basis for V with respect to〈 · , ·〉, and define a matrix A by Aij = 〈vj , vi〉′ for all i and j.Let T be the unique linear operator on V such that [T]β = A.

(b) Prove that the operator T of (a) is positive definite with respectto both inner products.

23. Let U be a diagonalizable linear operator on a finite-dimensional inner product space V such that all of the eigenvalues of U are real. Prove that there exist positive definite linear operators T1 and T1′ and self-adjoint linear operators T2 and T2′ such that U = T2T1 = T1′T2′. Hint: Let 〈 · , ·〉 be the inner product associated with V, β a basis of eigenvectors for U, 〈 · , ·〉′ the inner product on V with respect to which β is orthonormal (see Exercise 22(a) of Section 6.1), and T1 the positive definite operator according to Exercise 22. Show that U is self-adjoint with respect to 〈 · , ·〉′ and U = T1⁻¹U∗T1 (the adjoint is with respect to 〈 · , ·〉). Let T2 = T1⁻¹U∗.


24. This argument gives another proof of Schur’s theorem. Let T be a linearoperator on a finite dimensional inner product space V.

(a) Suppose that β is an ordered basis for V such that [T]β is an uppertriangular matrix. Let γ be the orthonormal basis for V obtainedby applying the Gram–Schmidt orthogonalization process to β andthen normalizing the resulting vectors. Prove that [T]γ is an uppertriangular matrix.

(b) Use Exercise 32 of Section 5.4 and (a) to obtain an alternate proofof Schur’s theorem.

6.5 UNITARY AND ORTHOGONAL OPERATORS AND THEIR MATRICES

In this section, we continue our analogy between complex numbers and linear operators. Recall that the adjoint of a linear operator acts similarly to the conjugate of a complex number (see, for example, Theorem 6.11 p. 359). A complex number z has length 1 if zz̄ = 1. In this section, we study those linear operators T on an inner product space V such that TT∗ = T∗T = I. We will see that these are precisely the linear operators that "preserve length" in the sense that ‖T(x)‖ = ‖x‖ for all x ∈ V. As another characterization, we prove that, on a finite-dimensional complex inner product space, these are the normal operators whose eigenvalues all have absolute value 1.

In past chapters, we were interested in studying those functions that pre-serve the structure of the underlying space. In particular, linear operatorspreserve the operations of vector addition and scalar multiplication, and iso-morphisms preserve all the vector space structure. It is now natural to con-sider those linear operators T on an inner product space that preserve length.We will see that this condition guarantees, in fact, that T preserves the innerproduct.

Definitions. Let T be a linear operator on a finite-dimensional innerproduct space V (over F ). If ‖T(x)‖ = ‖x‖ for all x ∈ V, we call T a unitaryoperator if F = C and an orthogonal operator if F = R.

It should be noted that, in the infinite-dimensional case, an operator sat-isfying the preceding norm requirement is generally called an isometry. If,in addition, the operator is onto (the condition guarantees one-to-one), thenthe operator is called a unitary or orthogonal operator.

Clearly, any rotation or reflection in R2 preserves length and hence isan orthogonal operator. We study these operators in much more detail inSection 6.11.


Example 1

Let h ∈ H satisfy |h(x)| = 1 for all x. Define the linear operator T on H by T(f) = hf. Then

‖T(f)‖² = ‖hf‖² = (1/2π) ∫_{0}^{2π} h(t)f(t) (h(t)f(t))‾ dt = ‖f‖²

since |h(t)|² = 1 for all t. So T is a unitary operator. ♦

Theorem 6.18. Let T be a linear operator on a finite-dimensional inner product space V. Then the following statements are equivalent.

(a) TT∗ = T∗T = I.
(b) 〈T(x), T(y)〉 = 〈x, y〉 for all x, y ∈ V.
(c) If β is an orthonormal basis for V, then T(β) is an orthonormal basis for V.
(d) There exists an orthonormal basis β for V such that T(β) is an orthonormal basis for V.
(e) ‖T(x)‖ = ‖x‖ for all x ∈ V.

Thus all the conditions above are equivalent to the definition of a unitary or orthogonal operator. From (a), it follows that unitary or orthogonal operators are normal.

Thus all the conditions above are equivalent to the definition of a uni-tary or orthogonal operator. From (a), it follows that unitary or orthogonaloperators are normal.

Before proving the theorem, we first prove a lemma. Compare this lemmato Exercise 11(b) of Section 6.4.

Lemma. Let U be a self-adjoint operator on a finite-dimensional innerproduct space V. If 〈x,U(x)〉 = 0 for all x ∈ V, then U = T0.

Proof. By either Theorem 6.16 (p. 372) or 6.17 (p. 374), we may choosean orthonormal basis β for V consisting of eigenvectors of U. If x ∈ β, thenU(x) = λx for some λ. Thus

0 = 〈x,U(x)〉 = 〈x, λx〉 = λ 〈x, x〉 ;

so λ = 0. Hence U(x) = 0 for all x ∈ β, and thus U = T0.

Proof of Theorem 6.18. We prove first that (a) implies (b). Let x, y ∈ V.Then 〈x, y〉 = 〈T∗T(x), y〉 = 〈T(x), T(y)〉.

Second, we prove that (b) implies (c). Let β = {v1, v2, . . . , vn} be anorthonormal basis for V; so T(β) = {T(v1), T(v2), . . . ,T(vn)}. It follows that〈T(vi), T(vj)〉 = 〈vi, vj〉 = δij . Therefore T(β) is an orthonormal basis for V.

That (c) implies (d) is obvious.Next we prove that (d) implies (e). Let x ∈ V, and let β = {v1, v2, . . . , vn}.

Now

x =n∑

i=1

aivi


for some scalars ai, and so

‖x‖² = 〈∑_{i=1}^{n} ai vi, ∑_{j=1}^{n} aj vj〉 = ∑_{i=1}^{n} ∑_{j=1}^{n} ai āj 〈vi, vj〉 = ∑_{i=1}^{n} ∑_{j=1}^{n} ai āj δij = ∑_{i=1}^{n} |ai|²

since β is orthonormal.

Applying the same manipulations to

T(x) = ∑_{i=1}^{n} ai T(vi)

and using the fact that T(β) is also orthonormal, we obtain

‖T(x)‖² = ∑_{i=1}^{n} |ai|².

Hence ‖T(x)‖ = ‖x‖.

Finally, we prove that (e) implies (a). For any x ∈ V, we have

〈x, x〉 = ‖x‖² = ‖T(x)‖² = 〈T(x), T(x)〉 = 〈x, T∗T(x)〉.

So 〈x, (I − T∗T)(x)〉 = 0 for all x ∈ V. Let U = I − T∗T; then U is self-adjoint, and 〈x, U(x)〉 = 0 for all x ∈ V. Hence, by the lemma, we have T0 = U = I − T∗T, and therefore T∗T = I. Since V is finite-dimensional, we may use Exercise 10 of Section 2.4 to conclude that TT∗ = I.

It follows immediately from the definition that every eigenvalue of a uni-tary or orthogonal operator has absolute value 1. In fact, even more is true.

Corollary 1. Let T be a linear operator on a finite-dimensional realinner product space V. Then V has an orthonormal basis of eigenvectors ofT with corresponding eigenvalues of absolute value 1 if and only if T is bothself-adjoint and orthogonal.

Proof. Suppose that V has an orthonormal basis {v1, v2, . . . , vn} such that T(vi) = λivi and |λi| = 1 for all i. By Theorem 6.17 (p. 374), T is self-adjoint. Thus (TT∗)(vi) = T(λivi) = λiλivi = λi²vi = vi for each i. So TT∗ = I, and again by Exercise 10 of Section 2.4, T is orthogonal by Theorem 6.18(a).

If T is self-adjoint, then, by Theorem 6.17, we have that V possesses an orthonormal basis {v1, v2, . . . , vn} such that T(vi) = λivi for all i. If T is also orthogonal, we have

|λi| · ‖vi‖ = ‖λivi‖ = ‖T(vi)‖ = ‖vi‖;

so |λi| = 1 for every i.


Corollary 2. Let T be a linear operator on a finite-dimensional complexinner product space V. Then V has an orthonormal basis of eigenvectors of Twith corresponding eigenvalues of absolute value 1 if and only if T is unitary.

Proof. The proof is similar to the proof of Corollary 1.

Example 2

Let T : R2 → R2 be a rotation by θ, where 0 < θ < π. It is clear geometrically that T "preserves length", that is, that ‖T(x)‖ = ‖x‖ for all x ∈ R2. The fact that rotations by a fixed angle preserve perpendicularity not only can be seen geometrically but now follows from (b) of Theorem 6.18. Perhaps the fact that such a transformation preserves the inner product is not so obvious; however, we obtain this fact from (b) also. Finally, an inspection of the matrix representation of T with respect to the standard ordered basis, which is

( cos θ  −sin θ )
( sin θ   cos θ ),

reveals that T is not self-adjoint for the given restriction on θ. As we mentioned earlier, this fact also follows from the geometric observation that T has no eigenvectors and from Theorem 6.15 (p. 371). It is seen easily from the preceding matrix that T∗ is the rotation by −θ. ♦

Definition. Let L be a one-dimensional subspace of R2. We may view Las a line in the plane through the origin. A linear operator T on R2 is calleda reflection of R2 about L if T(x) = x for all x ∈ L and T(x) = −x for allx ∈ L⊥.

As an example of a reflection, consider the operator defined in Example 3 ofSection 2.5.

Example 3

Let T be a reflection of R2 about a line L through the origin. We show thatT is an orthogonal operator. Select vectors v1 ∈ L and v2 ∈ L⊥ such that‖v1‖ = ‖v2‖ = 1. Then T(v1) = v1 and T(v2) = −v2. Thus v1 and v2

are eigenvectors of T with corresponding eigenvalues 1 and −1, respectively.Furthermore, {v1, v2} is an orthonormal basis for R2. It follows that T is anorthogonal operator by Corollary 1 to Theorem 6.18. ♦

We now examine the matrices that represent unitary and orthogonal trans-formations.

Definitions. A square matrix A is called an orthogonal matrix if AtA = AAt = I and unitary if A∗A = AA∗ = I.


Since for a real matrix A we have A∗ = At, a real unitary matrix is alsoorthogonal. In this case, we call A orthogonal rather than unitary.

Note that the condition AA∗ = I is equivalent to the statement that the rows of A form an orthonormal basis for Fn because

δij = Iij = (AA∗)ij = ∑_{k=1}^{n} Aik(A∗)kj = ∑_{k=1}^{n} Aik Ājk,

and the last term represents the inner product of the ith and jth rows of A. A similar remark can be made about the columns of A and the condition A∗A = I.

It also follows from the definition above and from Theorem 6.10 (p. 359) that a linear operator T on an inner product space V is unitary [orthogonal] if and only if [T]β is unitary [orthogonal] for some orthonormal basis β for V.

Example 4

From Example 2, the matrix (cos θ − sin θsin θ cos θ

)is clearly orthogonal. One can easily see that the rows of the matrix forman orthonormal basis for R2. Similarly, the columns of the matrix form anorthonormal basis for R2. ♦Example 5

Let T be a reflection of R2 about a line L through the origin, let β be thestandard ordered basis for R2, and let A = [T]β . Then T = LA. Since T isan orthogonal operator and β is an orthonormal basis, A is an orthogonalmatrix. We describe A.

Suppose that α is the angle from the positive x-axis to L. Let v1 =(cos α, sin α) and v2 = (− sin α, cos α). Then ‖v1‖ = ‖v2‖ = 1, v1 ∈ L,and v2 ∈ L⊥. Hence γ = {v1, v2} is an orthonormal basis for R2. BecauseT(v1) = v1 and T(v2) = −v2, we have

[T ]γ = [LA]γ =(

1 00 −1

).

Let

Q =(

cos α − sin αsin α cos α

).

By the corollary to Theorem 2.23 (p. 115),

A = Q[LA]γQ−1

Page 399: Linear Algebra - cloudflare-ipfs.com

384 Chap. 6 Inner Product Spaces

=(

cos α − sin αsin α cos α

)(1 00 −1

)(cos α sin α

− sin α cos α

)

=(

cos2 α − sin2 α 2 sin α cos α2 sin α cos α −(cos2 α − sin2 α)

)

=(

cos 2α sin 2αsin 2α − cos 2α

). ♦

We know that, for a complex normal [real symmetric] matrix A, thereexists an orthonormal basis β for Fn consisting of eigenvectors of A. Hence Ais similar to a diagonal matrix D. By the corollary to Theorem 2.23 (p. 115),the matrix Q whose columns are the vectors in β is such that D = Q−1AQ.But since the columns of Q are an orthonormal basis for Fn, it follows that Qis unitary [orthogonal]. In this case, we say that A is unitarily equivalent[orthogonally equivalent] to D. It is easily seen (see Exercise 18) that thisrelation is an equivalence relation on Mn×n(C) [Mn×n(R)]. More generally,A and B are unitarily equivalent [orthogonally equivalent ] if and only if thereexists a unitary [orthogonal ] matrix P such that A = P ∗BP .

The preceding paragraph has proved half of each of the next two theo-rems.

Theorem 6.19. Let A be a complex n × n matrix. Then A is normal ifand only if A is unitarily equivalent to a diagonal matrix.

Proof. By the preceding remarks, we need only prove that if A is unitarilyequivalent to a diagonal matrix, then A is normal.

Suppose that A = P ∗DP , where P is a unitary matrix and D is a diagonalmatrix. Then

AA∗ = (P ∗DP )(P ∗DP )∗ = (P ∗DP )(P ∗D∗P ) = P ∗DID∗P = P ∗DD∗P.

Similarly, A∗A = P ∗D∗DP . Since D is a diagonal matrix, however, we haveDD∗ = D∗D. Thus AA∗ = A∗A.

Theorem 6.20. Let A be a real n × n matrix. Then A is symmetric ifand only if A is orthogonally equivalent to a real diagonal matrix.

Proof. The proof is similar to the proof of Theorem 6.19 and is left as anexercise.

Example 6

Let

A =

⎛⎝4 2 22 4 22 2 4

⎞⎠ .

Page 400: Linear Algebra - cloudflare-ipfs.com

Sec. 6.5 Unitary and Orthogonal Operators and Their Matrices 385

Since A is symmetric, Theorem 6.20 tells us that A is orthogonally equivalentto a diagonal matrix. We find an orthogonal matrix P and a diagonal matrixD such that P tAP = D.

To find P , we obtain an orthonormal basis of eigenvectors. It is easy toshow that the eigenvalues of A are 2 and 8. The set {(−1, 1, 0), (−1, 0, 1)}is a basis for the eigenspace corresponding to 2. Because this set is notorthogonal, we apply the Gram–Schmidt process to obtain the orthogonalset {(−1, 1, 0),−1

2 (1, 1,−2)}. The set {(1, 1, 1)} is a basis for the eigenspacecorresponding to 8. Notice that (1, 1, 1) is orthogonal to the preceding twovectors, as predicted by Theorem 6.15(d) (p. 371). Taking the union of thesetwo bases and normalizing the vectors, we obtain the following orthonormalbasis for R3 consisting of eigenvectors of A:{

1√2(−1, 1, 0),

1√6(1, 1,−2),

1√3(1, 1, 1)

}.

Thus one possible choice for P is

P =

⎛⎜⎜⎝−1√

21√6

1√3

1√2

1√6

1√3

0 −2√6

1√3

⎞⎟⎟⎠ , and D =

⎛⎝2 0 00 2 00 0 8

⎞⎠ . ♦

Because of Schur’s theorem (Theorem 6.14 p. 370), the next result isimmediate. As it is the matrix form of Schur’s theorem, we also refer to it asSchur’s theorem.

Theorem 6.21 (Schur). Let A ∈ Mn×n(F ) be a matrix whose charac-teristic polynomial splits over F .

(a) If F = C, then A is unitarily equivalent to a complex upper triangularmatrix.

(b) If F = R, then A is orthogonally equivalent to a real upper triangularmatrix.

Rigid Motions*

The purpose of this application is to characterize the so-called rigid mo-tions of a finite-dimensional real inner product space. One may think intu-itively of such a motion as a transformation that does not affect the shape ofa figure under its action, hence the term rigid. The key requirement for sucha transformation is that it preserves distances.

Definition. Let V be a real inner product space. A function f : V → Vis called a rigid motion if

‖f(x) − f(y)‖ = ‖x − y‖

Page 401: Linear Algebra - cloudflare-ipfs.com

386 Chap. 6 Inner Product Spaces

for all x, y ∈ V.

For example, any orthogonal operator on a finite-dimensional real innerproduct space is a rigid motion.

Another class of rigid motions are the translations. A function g : V → V,where V is a real inner product space, is called a translation if there existsa vector v0 ∈ V such that g(x) = x + v0 for all x ∈ V. We say that g isthe translation by v0. It is a simple exercise to show that translations, aswell as composites of rigid motions on a real inner product space, are alsorigid motions. (See Exercise 22.) Thus an orthogonal operator on a finite-dimensional real inner product space V followed by a translation on V is arigid motion on V. Remarkably, every rigid motion on V may be characterizedin this way.

Theorem 6.22. Let f : V → V be a rigid motion on a finite-dimensionalreal inner product space V. Then there exists a unique orthogonal operatorT on V and a unique translation g on V such that f = g ◦ T .

Any orthogonal operator is a special case of this composite, in whichthe translation is by 0 . Any translation is also a special case, in which theorthogonal operator is the identity operator.

Proof. Let T : V → V be defined by

T(x) = f(x) − f(0 )

for all x ∈ V. We show that T is an orthogonal operator, from which itfollows that f = g ◦ T , where g is the translation by f(0 ). Observe that T isthe composite of f and the translation by −f(0 ); hence T is a rigid motion.Furthermore, for any x ∈ V

‖T(x)‖2 = ‖f(x) − f(0 )‖2 = ‖x − 0‖2 = ‖x‖2,

and consequently ‖T(x)‖ = ‖x‖ for any x ∈ V. Thus for any x, y ∈ V,

‖T (x) − T (y)‖2 = ‖T(x)‖2 − 2 〈T(x), T(y)〉 + ‖T(y)‖2

= ‖x‖2 − 2 〈T(x), T(y)〉 + ‖y‖2

and

‖x − y‖2 = ‖x‖2 − 2 〈x, y〉 + ‖y‖2.

But ‖T (x) − T (y)‖2 = ‖x − y‖2; so 〈T(x), T(y)〉 = 〈x, y〉 for all x, y ∈ V.We are now in a position to show that T is a linear transformation. Let

x, y ∈ V, and let a ∈ R. Then

‖T(x + ay) − T(x) − aT(y)‖2 = ‖[T(x + ay) − T(x)] − aT(y)‖2

Page 402: Linear Algebra - cloudflare-ipfs.com

Sec. 6.5 Unitary and Orthogonal Operators and Their Matrices 387

= ‖T(x + ay) − T(x)‖2 + a2‖T(y)‖2 − 2a 〈T(x + ay) − T(x), T(y)〉= ‖(x + ay) − x‖2 + a2‖y‖2 − 2a[〈T(x + ay), T(y)〉 − 〈T(x), T(y)〉]= a2‖y‖2 + a2‖y‖2 − 2a[〈x + ay, y〉 − 〈x, y〉]= 2a2‖y‖2 − 2a[〈x, y〉 + a‖y‖2 − 〈x, y〉]= 0.

Thus T(x+ay) = T(x)+aT(y), and hence T is linear. Since T also preservesinner products, T is an orthogonal operator.

To prove uniqueness, suppose that u0 and v0 are in V and T and U areorthogonal operators on V such that

f(x) = T(x) + u0 = U(x) + v0

for all x ∈ V. Substituting x = 0 in the preceding equation yields u0 = v0,and hence the translation is unique. This equation, therefore, reduces toT(x) = U(x) for all x ∈ V, and hence T = U.

Orthogonal Operators on R2

Because of Theorem 6.22, an understanding of rigid motions requires acharacterization of orthogonal operators. The next result characterizes or-thogonal operators on R2. We postpone the case of orthogonal operators onmore general spaces to Section 6.11.

Theorem 6.23. Let T be an orthogonal operator on R2, and let A = [T]β ,where β is the standard ordered basis for R2. Then exactly one of the followingconditions is satisfied:

(a) T is a rotation, and det(A) = 1.(b) T is a reflection about a line through the origin, and det(A) = −1.

Proof. Because T is an orthogonal operator, T(β) = {T(e1), T(e2)} is anorthonormal basis for R2 by Theorem 6.18(c). Since T(e1) is a unit vector,there is a unique angle θ, 0 ≤ θ < 2π, such that T(e1) = (cos θ, sin θ). SinceT(e2) is a unit vector and is orthogonal to T(e1), there are only two possiblechoices for T(e2). Either

T(e2) = (− sin θ, cos θ) or T(e2) = (sin θ,− cos θ).

First, suppose that T(e2) = (− sin θ, cos θ). Then A =(

cos θ − sin θsin θ cos θ

).

It follows from Example 1 of Section 6.4 that T is a rotation by the angle θ.Also

det(A) = cos2 θ + sin2 θ = 1.

Page 403: Linear Algebra - cloudflare-ipfs.com

388 Chap. 6 Inner Product Spaces

Now suppose that T(e2) = (sin θ,− cos θ). Then A =(

cos θ sin θsin θ − cos θ

).

Comparing this matrix to the matrix A of Example 5, we see that T is thereflection of R2 about a line L, so that α = θ/2 is the angle from the positivex-axis to L. Furthermore,

det(A) = − cos2 θ − sin2 θ = −1.

Combining Theorems 6.22 and 6.23, we obtain the following characteriza-tion of rigid motions on R2.

Corollary. Any rigid motion on R2 is either a rotation followed by a trans-lation or a reflection about a line through the origin followed by a translation.

Example 7

Let

A =

⎛⎜⎜⎝1√5

2√5

2√5

−1√5

⎞⎟⎟⎠ .

We show that LA is the reflection of R2 about a line L through the origin, andthen describe L.

Clearly AA∗ = A∗A = I, and therefore A is an orthogonal matrix. HenceLA is an orthogonal operator. Furthermore,

det(A) = −15− 4

5= −1,

and thus LA is a reflection of R2 about a line L through the origin by The-orem 6.23. Since L is the one-dimensional eigenspace corresponding to theeigenvalue 1 of LA, it suffices to find an eigenvector of LA corresponding to 1.One such vector is v = (2,

√5 − 1). Thus L is the span of {v}. Alternatively,

L is the line through the origin with slope (√

5 − 1)/2, and hence is the linewith the equation

y =√

5 − 12

x. ♦

Conic Sections

As an application of Theorem 6.20, we consider the quadratic equation

ax2 + 2bxy + cy2 + dx + ey + f = 0. (2)

Page 404: Linear Algebra - cloudflare-ipfs.com

Sec. 6.5 Unitary and Orthogonal Operators and Their Matrices 389

For special choices of the coefficients in (2), we obtain the various conicsections. For example, if a = c = 1, b = d = e = 0, and f = −1, weobtain the circle x2 + y2 = 1 with center at the origin. The remainingconic sections, namely, the ellipse, parabola, and hyperbola, are obtainedby other choices of the coefficients. If b = 0, then it is easy to graph theequation by the method of completing the square because the xy-term isabsent. For example, the equation x2 +2x+y2 +4y+2 = 0 may be rewrittenas (x + 1)2 + (y + 2)2 = 3, which describes a circle with radius

√3 and center

at (−1,−2) in the xy-coordinate system. If we consider the transformationof coordinates (x, y) → (x′, y′), where x′ = x + 1 and y′ = y + 2, then ourequation simplifies to (x′)2 + (y′)2 = 3. This change of variable allows us toeliminate the x- and y-terms.

We now concentrate solely on the elimination of the xy-term. To accom-plish this, we consider the expression

ax2 + 2bxy + cy2, (3)

which is called the associated quadratic form of (2). Quadratic forms arestudied in more generality in Section 6.8.

If we let

A =(

a bb c

)and X =

(xy

),

then (3) may be written as XtAX = 〈AX, X〉. For example, the quadraticform 3x2 + 4xy + 6y2 may be written as

Xt

(3 22 6

)X.

The fact that A is symmetric is crucial in our discussion. For, by Theo-rem 6.20, we may choose an orthogonal matrix P and a diagonal matrix Dwith real diagonal entries λ1 and λ2 such that P tAP = D. Now define

X ′ =(

x′

y′

)by X ′ = P tX or, equivalently, by PX ′ = PP tX = X. Then

XtAX = (PX ′)tA(PX ′) = X ′t(P tAP )X ′ = X ′tDX ′ = λ1(x′)2 + λ2(y′)2.

Thus the transformation (x, y) → (x′, y′) allows us to eliminate the xy-termin (3), and hence in (2).

Furthermore, since P is orthogonal, we have by Theorem 6.23 (with T =LP ) that det(P ) = ±1. If det(P ) = −1, we may interchange the columns

Page 405: Linear Algebra - cloudflare-ipfs.com

390 Chap. 6 Inner Product Spaces

of P to obtain a matrix Q. Because the columns of P form an orthonormalbasis of eigenvectors of A, the same is true of the columns of Q. Therefore,

QtAQ =(

λ2 00 λ1

).

Notice that det(Q) = −det(P ) = 1. So, if det(P ) = −1, we can take Q forour new P ; consequently, we may always choose P so that det(P ) = 1. ByLemma 4 to Theorem 6.22 (with T = LP ), it follows that matrix P representsa rotation.

In summary, the xy-term in (2) may be eliminated by a rotation of thex-axis and y-axis to new axes x′ and y′ given by X = PX ′, where P is anorthogonal matrix and det(P ) = 1. Furthermore, the coefficients of (x′)2 and(y′)2 are the eigenvalues of

A =(

a bb c

).

This result is a restatement of a result known as the principal axis theoremfor R2. The arguments above, of course, are easily extended to quadraticequations in n variables. For example, in the case n = 3, by special choicesof the coefficients, we obtain the quadratic surfaces—the elliptic cone, theellipsoid, the hyperbolic paraboloid, etc.

As an illustration of the preceding transformation, consider the quadraticequation

2x2 − 4xy + 5y2 − 36 = 0,

for which the associated quadratic form is 2x2 − 4xy + 5y2. In the notationwe have been using,

A =(

2 −2−2 5

),

so that the eigenvalues of A are 1 and 6 with associated eigenvectors(21

)and

(−12

).

As expected (from Theorem 6.15(d) p. 371), these vectors are orthogonal.The corresponding orthonormal basis of eigenvectors

β =

⎧⎪⎪⎨⎪⎪⎩⎛⎜⎜⎝

2√5

1√5

⎞⎟⎟⎠ ,

⎛⎜⎜⎝−1√

52√5

⎞⎟⎟⎠⎫⎪⎪⎬⎪⎪⎭

Page 406: Linear Algebra - cloudflare-ipfs.com

Sec. 6.5 Unitary and Orthogonal Operators and Their Matrices 391

determines new axes x′ and y′ as in Figure 6.4. Hence if

P =

⎛⎜⎜⎝2√5

−1√5

1√5

2√5

⎞⎟⎟⎠ =1√5

(2 −11 2

),

then

P tAP =(

1 00 6

).

Under the transformation X = PX ′ or

x =2√5x′ − 1√

5y′

y =1√5x′ +

2√5y′ ,

we have the new quadratic form (x′)2 + 6(y′)2. Thus the original equation2x2−4xy+5y2 = 36 may be written in the form (x′)2+6(y′)2 = 36 relative toa new coordinate system with the x′- and y′-axes in the directions of the firstand second vectors of β, respectively. It is clear that this equation represents

x

x′

yy′

Figure 6.4

an ellipse. (See Figure 6.4.) Note that the preceding matrix P has the form(cos θ − sin θsin θ cos θ

),

where θ = cos−1 2√5≈ 26.6◦. So P is the matrix representation of a rotation

of R2 through the angle θ. Thus the change of variable X = PX ′ can be ac-complished by this rotation of the x- and y-axes. There is another possibility

Page 407: Linear Algebra - cloudflare-ipfs.com

392 Chap. 6 Inner Product Spaces

for P , however. If the eigenvector of A corresponding to the eigenvalue 6 istaken to be (1,−2) instead of (−1, 2), and the eigenvalues are interchanged,then we obtain the matrix ⎛⎜⎜⎝

1√5

2√5

−2√5

1√5

⎞⎟⎟⎠,

which is the matrix representation of a rotation through the angle θ =

sin−1

(− 2√

5

)≈ −63.4◦. This possibility produces the same ellipse as the

one in Figure 6.4, but interchanges the names of the x′- and y′-axes.

EXERCISES

1. Label the following statements as true or false. Assume that the under-lying inner product spaces are finite-dimensional.

(a) Every unitary operator is normal.(b) Every orthogonal operator is diagonalizable.(c) A matrix is unitary if and only if it is invertible.(d) If two matrices are unitarily equivalent, then they are also similar.(e) The sum of unitary matrices is unitary.(f) The adjoint of a unitary operator is unitary.(g) If T is an orthogonal operator on V, then [T]β is an orthogonal

matrix for any ordered basis β for V.(h) If all the eigenvalues of a linear operator are 1, then the operator

must be unitary or orthogonal.(i) A linear operator may preserve the norm, but not the inner prod-

uct.

2. For each of the following matrices A, find an orthogonal or unitarymatrix P and a diagonal matrix D such that P ∗AP = D.

(a)(

1 22 1

)(b)

(0 −11 0

)(c)

(2 3 − 3i

3 + 3i 5

)

(d)

⎛⎝0 2 22 0 22 2 0

⎞⎠ (e)

⎛⎝2 1 11 2 11 1 2

⎞⎠3. Prove that the composite of unitary [orthogonal] operators is unitary

[orthogonal].

Page 408: Linear Algebra - cloudflare-ipfs.com

Sec. 6.5 Unitary and Orthogonal Operators and Their Matrices 393

4. For z ∈ C, define Tz : C → C by Tz(u) = zu. Characterize those z forwhich Tz is normal, self-adjoint, or unitary.

5. Which of the following pairs of matrices are unitarily equivalent?

(a)(

1 00 1

)and

(0 11 0

)(b)

(0 11 0

)and

⎛⎝0 12

12 0

⎞⎠(c)

⎛⎝ 0 1 0−1 0 0

0 0 1

⎞⎠ and

⎛⎝2 0 00 −1 00 0 0

⎞⎠(d)

⎛⎝ 0 1 0−1 0 0

0 0 1

⎞⎠ and

⎛⎝1 0 00 i 00 0 −i

⎞⎠(e)

⎛⎝1 1 00 2 20 0 3

⎞⎠ and

⎛⎝1 0 00 2 00 0 3

⎞⎠6. Let V be the inner product space of complex-valued continuous func-

tions on [0, 1] with the inner product

〈f, g〉 =∫ 1

0

f(t)g(t) dt.

Let h ∈ V, and define T : V → V by T(f) = hf . Prove that T is aunitary operator if and only if |h(t)| = 1 for 0 ≤ t ≤ 1.

7. Prove that if T is a unitary operator on a finite-dimensional inner prod-uct space V, then T has a unitary square root ; that is, there exists aunitary operator U such that T = U2.

8. Let T be a self-adjoint linear operator on a finite-dimensional innerproduct space. Prove that (T+iI)(T−iI)−1 is unitary using Exercise 10of Section 6.4.

9. Let U be a linear operator on a finite-dimensional inner product spaceV. If ‖U(x)‖ = ‖x‖ for all x in some orthonormal basis for V, must Ube unitary? Justify your answer with a proof or a counterexample.

10. Let A be an n × n real symmetric or complex normal matrix. Provethat

tr(A) =n∑

i=1

λi and tr(A∗A) =n∑

i=1

|λi|2,

where the λi’s are the (not necessarily distinct) eigenvalues of A.

Page 409: Linear Algebra - cloudflare-ipfs.com

394 Chap. 6 Inner Product Spaces

11. Find an orthogonal matrix whose first row is (13 , 2

3 , 23 ).

12. Let A be an n × n real symmetric or complex normal matrix. Provethat

det(A) =n∏

i=1

λi,

where the λi’s are the (not necessarily distinct) eigenvalues of A.

13. Suppose that A and B are diagonalizable matrices. Prove or disprovethat A is similar to B if and only if A and B are unitarily equivalent.

14. Prove that if A and B are unitarily equivalent matrices, then A is pos-itive definite [semidefinite] if and only if B is positive definite [semidef-inite]. (See the definitions in the exercises in Section 6.4.)

15. Let U be a unitary operator on an inner product space V, and let W bea finite-dimensional U-invariant subspace of V. Prove that

(a) U(W) = W;(b) W⊥ is U-invariant.

Contrast (b) with Exercise 16.

16. Find an example of a unitary operator U on an inner product space anda U-invariant subspace W such that W⊥ is not U-invariant.

17. Prove that a matrix that is both unitary and upper triangular must bea diagonal matrix.

18. Show that “is unitarily equivalent to” is an equivalence relation onMn×n(C).

19. Let W be a finite-dimensional subspace of an inner product space V.By Theorem 6.7 (p. 352) and the exercises of Section 1.3, V = W⊕W⊥.Define U : V → V by U(v1 + v2) = v1 − v2, where v1 ∈ W and v2 ∈ W⊥.Prove that U is a self-adjoint unitary operator.

20. Let V be a finite-dimensional inner product space. A linear operator Uon V is called a partial isometry if there exists a subspace W of Vsuch that ‖U(x)‖ = ‖x‖ for all x ∈ W and U(x) = 0 for all x ∈ W⊥.Observe that W need not be U-invariant. Suppose that U is such anoperator and {v1, v2, . . . , vk} is an orthonormal basis for W. Prove thefollowing results.

(a) 〈U(x), U(y)〉 = 〈x, y〉 for all x, y ∈ W. Hint: Use Exercise 20 ofSection 6.1.

(b) {U(v1), U(v2), . . . ,U(vk)} is an orthonormal basis for R(U).

Page 410: Linear Algebra - cloudflare-ipfs.com

Sec. 6.5 Unitary and Orthogonal Operators and Their Matrices 395

(c) There exists an orthonormal basis γ for V such that the firstk columns of [U]γ form an orthonormal set and the remainingcolumns are zero.

(d) Let {w1, w2, . . . , wj} be an orthonormal basis for R(U)⊥ and β ={U(v1), U(v2), . . . ,U(vk), w1, . . . , wj}. Then β is an orthonormalbasis for V.

(e) Let T be the linear operator on V that satisfies T(U(vi)) = vi

(1 ≤ i ≤ k) and T(wi) = 0 (1 ≤ i ≤ j). Then T is well defined,and T = U∗. Hint: Show that 〈U(x), y〉 = 〈x,T(y)〉 for all x, y ∈ β.There are four cases.

(f) U∗ is a partial isometry.

This exercise is continued in Exercise 9 of Section 6.6.

21. Let A and B be n × n matrices that are unitarily equivalent.

(a) Prove that tr(A∗A) = tr(B∗B).(b) Use (a) to prove that

n∑i,j=1

|Aij |2 =n∑

i,j=1

|Bij |2.

(c) Use (b) to show that the matrices(1 22 i

)and

(i 41 1

)are not unitarily equivalent.

22. Let V be a real inner product space.

(a) Prove that any translation on V is a rigid motion.(b) Prove that the composite of any two rigid motions on V is a rigid

motion on V.

23. Prove the following variation of Theorem 6.22: If f : V → V is a rigidmotion on a finite-dimensional real inner product space V, then thereexists a unique orthogonal operator T on V and a unique translation gon V such that f = T ◦ g.

24. Let T and U be orthogonal operators on R2. Use Theorem 6.23 to provethe following results.

(a) If T and U are both reflections about lines through the origin, thenUT is a rotation.

(b) If T is a rotation and U is a reflection about a line through theorigin, then both UT and TU are reflections about lines throughthe origin.

Page 411: Linear Algebra - cloudflare-ipfs.com

396 Chap. 6 Inner Product Spaces

25. Suppose that T and U are reflections of R2 about the respective linesL and L′ through the origin and that φ and ψ are the angles fromthe positive x-axis to L and L′, respectively. By Exercise 24, UT is arotation. Find its angle of rotation.

26. Suppose that T and U are orthogonal operators on R2 such that T isthe rotation by the angle φ and U is the reflection about the line Lthrough the origin. Let ψ be the angle from the positive x-axis to L.By Exercise 24, both UT and TU are reflections about lines L1 and L2,respectively, through the origin.

(a) Find the angle θ from the positive x-axis to L1.(b) Find the angle θ from the positive x-axis to L2.

27. Find new coordinates x′, y′ so that the following quadratic forms canbe written as λ1(x′)2 + λ2(y′)2.

(a) x2 + 4xy + y2

(b) 2x2 + 2xy + 2y2

(c) x2 − 12xy − 4y2

(d) 3x2 + 2xy + 3y2

(e) x2 − 2xy + y2

28. Consider the expression XtAX, where Xt = (x, y, z) and A is as definedin Exercise 2(e). Find a change of coordinates x′, y′, z′ so that thepreceding expression is of the form λ1(x′)2 + λ2(y′)2 + λ3(z′)2.

29. QR-Factorization. Let w1, w2, . . . , wn be linearly independent vectorsin Fn, and let v1, v2, . . . , vn be the orthogonal vectors obtained fromw1, w2, . . . , wn by the Gram–Schmidt process. Let u1, u2, . . . , un be theorthonormal basis obtained by normalizing the vi’s.

(a) Solving (1) in Section 6.2 for wk in terms of uk, show that

wk = ‖vk‖uk +k−1∑j=1

〈wk, uj〉uj (1 ≤ k ≤ n).

(b) Let A and Q denote the n × n matrices in which the kth columnsare wk and uk, respectively. Define R ∈ Mn×n(F ) by

Rjk =

⎧⎪⎨⎪⎩‖vj‖ if j = k

〈wk, uj〉 if j < k

0 if j > k.

Prove A = QR.(c) Compute Q and R as in (b) for the 3×3 matrix whose columns are

the vectors w1, w2, w3, respectively, in Example 4 of Section 6.2.

Page 412: Linear Algebra - cloudflare-ipfs.com

Sec. 6.5 Unitary and Orthogonal Operators and Their Matrices 397

(d) Since Q is unitary [orthogonal] and R is upper triangular in (b),we have shown that every invertible matrix is the product of a uni-tary [orthogonal] matrix and an upper triangular matrix. Supposethat A ∈ Mn×n(F ) is invertible and A = Q1R1 = Q2R2, whereQ1, Q2 ∈ Mn×n(F ) are unitary and R1, R2 ∈ Mn×n(F ) are uppertriangular. Prove that D = R2R

−11 is a unitary diagonal matrix.

Hint: Use Exercise 17.(e) The QR factorization described in (b) provides an orthogonaliza-

tion method for solving a linear system Ax = b when A is in-vertible. Decompose A to QR, by the Gram–Schmidt process orother means, where Q is unitary and R is upper triangular. ThenQRx = b, and hence Rx = Q∗b. This last system can be easilysolved since R is upper triangular. 1

Use the orthogonalization method and (c) to solve the system

x1 + 2x2 + 2x3 = 1x1 + 2x3 = 11

x2 + x3 = −1.

30. Suppose that β and γ are ordered bases for an n-dimensional real [com-plex] inner product space V. Prove that if Q is an orthogonal [unitary]n × n matrix that changes γ-coordinates into β-coordinates, then β isorthonormal if and only if γ is orthonormal.

The following definition is used in Exercises 31 and 32.

Definition. Let V be a finite-dimensional complex [real] inner productspace, and let u be a unit vector in V. Define the Householder operatorHu : V → V by Hu(x) = x − 2 〈x, u〉u for all x ∈ V.

31. Let Hu be a Householder operator on a finite-dimensional inner productspace V. Prove the following results.

(a) Hu is linear.(b) Hu(x) = x if and only if x is orthogonal to u.(c) Hu(u) = −u.(d) H∗

u = Hu and H2u = I, and hence Hu is a unitary [orthogonal]

operator on V.

(Note: If V is a real inner product space, then in the language of Sec-tion 6.11, Hu is a reflection.)

1At one time, because of its great stability, this method for solving large sys-tems of linear equations with a computer was being advocated as a better methodthan Gaussian elimination even though it requires about three times as much work.(Later, however, J. H. Wilkinson showed that if Gaussian elimination is done “prop-erly,” then it is nearly as stable as the orthogonalization method.)

Page 413: Linear Algebra - cloudflare-ipfs.com

398 Chap. 6 Inner Product Spaces

32. Let V be a finite-dimensional inner product space over F . Let x and ybe linearly independent vectors in V such that ‖x‖ = ‖y‖.(a) If F = C, prove that there exists a unit vector u in V and a complex

number θ with |θ| = 1 such that Hu(x) = θy. Hint: Choose θ so

that 〈x, θy〉 is real, and set u =1

‖x − θy‖ (x − θy).

(b) If F = R, prove that there exists a unit vector u in V such thatHu(x) = y.

6.6 ORTHOGONAL PROJECTIONSAND THE SPECTRAL THEOREM

In this section, we rely heavily on Theorems 6.16 (p. 372) and 6.17 (p. 374) todevelop an elegant representation of a normal (if F = C) or a self-adjoint (ifF = R) operator T on a finite-dimensional inner product space. We prove thatT can be written in the form λ1T1 + λ2T2 + · · ·+ λkTk, where λ1, λ2, . . . , λk

are the distinct eigenvalues of T and T1, T2, . . . ,Tk are orthogonal projections.We must first develop some results about these special projections.

We assume that the reader is familiar with the results about direct sumsdeveloped at the end of Section 5.2. The special case where V is a direct sumof two subspaces is considered in the exercises of Section 1.3.

Recall from the exercises of Section 2.1 that if V = W1⊕W2, then a linearoperator T on V is the projection on W1 along W2 if, whenever x = x1+x2,with x1 ∈ W1 and x2 ∈ W2, we have T(x) = x1. By Exercise 26 of Section 2.1,we have

R(T) = W1 = {x ∈ V : T(x) = x} and N(T) = W2.

So V = R(T) ⊕ N(T). Thus there is no ambiguity if we refer to T as a“projection on W1” or simply as a “projection.” In fact, it can be shown(see Exercise 17 of Section 2.3) that T is a projection if and only if T = T2.Because V = W1⊕W2 = W1⊕W3 does not imply that W2 = W3, we see thatW1 does not uniquely determine T. For an orthogonal projection T, however,T is uniquely determined by its range.

Definition. Let V be an inner product space, and let T : V → V be aprojection. We say that T is an orthogonal projection if R(T)⊥ = N(T)and N(T)⊥ = R(T).

Note that by Exercise 13(c) of Section 6.2, if V is finite-dimensional, weneed only assume that one of the preceding conditions holds. For example, ifR(T)⊥ = N(T), then R(T) = R(T)⊥⊥ = N(T)⊥.

Now assume that W is a finite-dimensional subspace of an inner productspace V. In the notation of Theorem 6.6 (p. 350), we can define a function

Page 414: Linear Algebra - cloudflare-ipfs.com

Sec. 6.6 Orthogonal Projections and the Spectral Theorem 399

T : V → V by T(y) = u. It is easy to show that T is an orthogonal projectionon W. We can say even more—there exists exactly one orthogonal projectionon W. For if T and U are orthogonal projections on W, then R(T) = W =R(U). Hence N(T) = R(T)⊥ = R(U)⊥ = N(U), and since every projection isuniquely determined by its range and null space, we have T = U. We call Tthe orthogonal projection of V on W.

To understand the geometric difference between an arbitrary projectionon W and the orthogonal projection on W, let V = R2 and W = span{(1, 1)}.Define U and T as in Figure 6.5, where T(v) is the foot of a perpendicularfrom v on the line y = x and U(a1, a2) = (a1, a1). Then T is the orthogo-nal projection of V on W, and U is a different projection on W. Note thatv − T(v) ∈ W⊥, whereas v − U(v) /∈ W⊥.

��

��

���

��

��

��

��

��

�����

U(v)

T(v)

W

v

Figure 6.5

From Figure 6.5, we see that T(v) is the “best approximation in W to v”;that is, if w ∈ W, then ‖w − v‖ ≥ ‖T(v) − v‖. In fact, this approximationproperty characterizes T. These results follow immediately from the corollaryto Theorem 6.6 (p. 350).

As an application to Fourier analysis, recall the inner product space H andthe orthonormal set S in Example 9 of Section 6.1. Define a trigonometricpolynomial of degree n to be a function g ∈ H of the form

g(t) =n∑

j=−n

ajfj(t) =n∑

j=−n

ajeijt,

where an or a−n is nonzero.Let f ∈ H. We show that the best approximation to f by a trigonometric

polynomial of degree less than or equal to n is the trigonometric polynomial

Page 415: Linear Algebra - cloudflare-ipfs.com

400 Chap. 6 Inner Product Spaces

whose coefficients are the Fourier coefficients of f relative to the orthonormalset S. For this result, let W = span({fj : |j| ≤ n}), and let T be the orthogo-nal projection of H on W. The corollary to Theorem 6.6 (p. 350) tells us thatthe best approximation to f by a function in W is

T(f) =n∑

j=−n

〈f, fj〉 fj .

An algebraic characterization of orthogonal projections follows in the nexttheorem.

Theorem 6.24. Let V be an inner product space, and let T be a linearoperator on V. Then T is an orthogonal projection if and only if T has anadjoint T∗ and T2 = T = T∗.

Proof. Suppose that T is an orthogonal projection. Since T2 = T becauseT is a projection, we need only show that T∗ exists and T = T∗. NowV = R(T) ⊕ N(T) and R(T)⊥ = N(T). Let x, y ∈ V. Then we can writex = x1 + x2 and y = y1 + y2, where x1, y1 ∈ R(T) and x2, y2 ∈ N(T). Hence

〈x,T(y)〉 = 〈x1 + x2, y1〉 = 〈x1, y1〉 + 〈x2, y1〉 = 〈x1, y1〉and

〈T(x), y〉 = 〈x1, y1 + y2〉 = 〈x1, y1〉 + 〈x1, y2〉 = 〈x1, y1〉 .

So 〈x,T(y)〉 = 〈T(x), y〉 for all x, y ∈ V; thus T∗ exists and T = T∗.Now suppose that T2 = T = T∗. Then T is a projection by Exercise 17 of

Section 2.3, and hence we must show that R(T) = N(T)⊥ and R(T)⊥ = N(T).Let x ∈ R(T) and y ∈ N(T). Then x = T(x) = T∗(x), and so

〈x, y〉 = 〈T∗(x), y〉 = 〈x,T(y)〉 = 〈x, 0 〉 = 0.

Therefore x ∈ N(T)⊥, from which it follows that R(T) ⊆ N(T)⊥.Let y ∈ N(T)⊥. We must show that y ∈ R(T), that is, T(y) = y. Now

‖y − T(y)‖2 = 〈y − T(y), y − T(y)〉= 〈y, y − T(y)〉 − 〈T(y), y − T(y)〉 .

Since y − T(y) ∈ N(T), the first term must equal zero. But also

〈T(y), y − T(y)〉 = 〈y, T∗(y − T(y))〉 = 〈y, T(y − T(y))〉 = 〈y, 0 〉 = 0.

Thus y − T(y) = 0 ; that is, y = T(y) ∈ R(T). Hence R(T) = N(T)⊥.Using the preceding results, we have R(T)⊥ = N(T)⊥⊥ ⊇ N(T) by Exer-

cise 13(b) of Section 6.2. Now suppose that x ∈ R(T)⊥. For any y ∈ V, wehave 〈T(x), y〉 = 〈x,T∗(y)〉 = 〈x,T(y)〉 = 0. So T(x) = 0 , and thus x ∈ N(T).Hence R(T)⊥ = N(T).

Page 416: Linear Algebra - cloudflare-ipfs.com

Sec. 6.6 Orthogonal Projections and the Spectral Theorem 401

Let V be a finite-dimensional inner product space, W be a subspace of V,and T be the orthogonal projection of V on W. We may choose an orthonormalbasis β = {v1, v2, . . . , vn} for V such that {v1, v2, . . . , vk} is a basis for W.Then [T]β is a diagonal matrix with ones as the first k diagonal entries andzeros elsewhere. In fact, [T]β has the form(

Ik O1

O2 O3

).

If U is any projection on W, we may choose a basis γ for V such that [U]γ hasthe form above; however γ is not necessarily orthonormal.

We are now ready for the principal theorem of this section.

Theorem 6.25 (The Spectral Theorem). Suppose that T is a linearoperator on a finite-dimensional inner product space V over F with the dis-tinct eigenvalues λ1, λ2, . . . , λk. Assume that T is normal if F = C and thatT is self-adjoint if F = R. For each i (1 ≤ i ≤ k), let Wi be the eigenspace ofT corresponding to the eigenvalue λi, and let Ti be the orthogonal projectionof V on Wi. Then the following statements are true.

(a) V = W1 ⊕ W2 ⊕ · · · ⊕ Wk.(b) If W′

i denotes the direct sum of the subspaces Wj for j �= i, thenW⊥

i = W′i.

(c) TiTj = δijTi for 1 ≤ i, j ≤ k.(d) I = T1 + T2 + · · · + Tk.(e) T = λ1T1 + λ2T2 + · · · + λkTk.

Proof. (a) By Theorems 6.16 (p. 372) and 6.17 (p. 374), T is diagonalizable;so

V = W1 ⊕ W2 ⊕ · · · ⊕ Wk

by Theorem 5.11 (p. 278).(b) If x ∈ Wi and y ∈ Wj for some i �= j, then 〈x, y〉 = 0 by The-

orem 6.15(d) (p. 371). It follows easily from this result that W′i ⊆ W⊥

i .From (a), we have

dim(W′i) =

∑j �=i

dim(Wj) = dim(V) − dim(Wi).

On the other hand, we have dim(W⊥i ) = dim(V)−dim(Wi) by Theorem 6.7(c)

(p. 352). Hence W′i = W⊥

i , proving (b).(c) The proof of (c) is left as an exercise.(d) Since Ti is the orthogonal projection of V on Wi, it follows from

(b) that N(Ti) = R(Ti)⊥ = W⊥i = W′

i. Hence, for x ∈ V, we have x =x1 + x2 + · · · + xk, where Ti(x) = xi ∈ Wi, proving (d).

Page 417: Linear Algebra - cloudflare-ipfs.com

402 Chap. 6 Inner Product Spaces

(e) For x ∈ V, write x = x1 + x2 + · · · + xk, where xi ∈ Wi. Then

T(x) = T(x1) + T(x2) + · · · + T(xk)= λ1x1 + λ2x2 + · · · + λkxk

= λ1T1(x) + λ2T2(x) + · · · + λkTk(x)

= (λ1T1 + λ2T2 + · · · + λkTk)(x).

The set {λ1, λ2, . . . , λk} of eigenvalues of T is called the spectrum of T,the sum I = T1+T2+ · · ·+Tk in (d) is called the resolution of the identityoperator induced by T, and the sum T = λ1T1 + λ2T2 + · · · + λkTk in (e)is called the spectral decomposition of T. The spectral decomposition ofT is unique up to the order of its eigenvalues.

With the preceding notation, let β be the union of orthonormal bases ofthe Wi’s and let mi = dim(Wi). (Thus mi is the multiplicity of λi.) Then[T]β has the form ⎛⎜⎜⎜⎝

λ1Im1 O · · · OO λ2Im2 · · · O...

......

O O · · · λkImk

⎞⎟⎟⎟⎠ ;

that is, [T]β is a diagonal matrix in which the diagonal entries are the eigen-values λi of T, and each λi is repeated mi times. If λ1T1 +λ2T2 + · · ·+λkTk

is the spectral decomposition of T, then it follows (from Exercise 7) thatg(T) = g(λ1)T1 + g(λ2)T2 + · · ·+ g(λk)Tk for any polynomial g. This fact isused below.

We now list several interesting corollaries of the spectral theorem; manymore results are found in the exercises. For what follows, we assume that Tis a linear operator on a finite-dimensional inner product space V over F .

Corollary 1. If F = C, then T is normal if and only if T∗ = g(T) forsome polynomial g.

Proof. Suppose first that T is normal. Let T = λ1T1 + λ2T2 + · · ·+ λkTk

be the spectral decomposition of T. Taking the adjoint of both sides of thepreceding equation, we have T∗ = λ1T1 + λ2T2 + · · ·+ λkTk since each Ti isself-adjoint. Using the Lagrange interpolation formula (see page 52), we maychoose a polynomial g such that g(λi) = λi for 1 ≤ i ≤ k. Then

g(T)=g(λ1)T1 + g(λ2)T2 + · · · + g(λk)Tk =λ1T1 + λ2T2 + · · · + λkTk =T∗.

Conversely, if T∗ = g(T) for some polynomial g, then T commutes withT∗ since T commutes with every polynomial in T. So T is normal.

Page 418: Linear Algebra - cloudflare-ipfs.com

Sec. 6.6 Orthogonal Projections and the Spectral Theorem 403

Corollary 2. If F = C, then T is unitary if and only if T is normal and|λ| = 1 for every eigenvalue λ of T.

Proof. If T is unitary, then T is normal and every eigenvalue of T hasabsolute value 1 by Corollary 2 to Theorem 6.18 (p. 382).

Let T = λ1T1 + λ2T2 + · · ·+ λkTk be the spectral decomposition of T. If|λ| = 1 for every eigenvalue λ of T, then by (c) of the spectral theorem,

TT∗ = (λ1T1 + λ2T2 + · · · + λkTk)(λ1T1 + λ2T2 + · · · + λkTk)

= |λ1|2T1 + |λ2|2T2 + · · · + |λk|2Tk

= T1 + T2 + · · · + Tk

= I.

Hence T is unitary.

Corollary 3. If F = C and T is normal, then T is self-adjoint if andonly if every eigenvalue of T is real.

Proof. Let T = λ1T1 + λ2T2 + · · · + λkTk be the spectral decompositionof T. Suppose that every eigenvalue of T is real. Then

T∗ = λ1T1 + λ2T2 + · · · + λkTk = λ1T1 + λ2T2 + · · · + λkTk = T.

The converse has been proved in the lemma to Theorem 6.17 (p. 374).

Corollary 4. Let T be as in the spectral theorem with spectral decom-position T = λ1T1 + λ2T2 + · · · + λkTk. Then each Tj is a polynomial inT.

Proof. Choose a polynomial gj (1 ≤ j ≤ k) such that gj(λi) = δij . Then

gj(T) = gj(λ1)T1 + gj(λ2)T2 + · · · + gj(λk)Tk

= δ1jT1 + δ2jT2 + · · · + δkjTk = Tj .

EXERCISES

1. Label the following statements as true or false. Assume that the under-lying inner product spaces are finite-dimensional.

(a) All projections are self-adjoint.(b) An orthogonal projection is uniquely determined by its range.(c) Every self-adjoint operator is a linear combination of orthogonal

projections.

Page 419: Linear Algebra - cloudflare-ipfs.com

404 Chap. 6 Inner Product Spaces

(d) If T is a projection on W, then T(x) is the vector in W that isclosest to x.

(e) Every orthogonal projection is a unitary operator.

2. Let V = R2, W = span({(1, 2)}), and β be the standard ordered basisfor V. Compute [T]β , where T is the orthogonal projection of V on W.Do the same for V = R3 and W = span({(1, 0, 1)}).

3. For each of the matrices A in Exercise 2 of Section 6.5:

(1) Verify that LA possesses a spectral decomposition.(2) For each eigenvalue of LA, explicitly define the orthogonal projec-

tion on the corresponding eigenspace.(3) Verify your results using the spectral theorem.

4. Let W be a finite-dimensional subspace of an inner product space V.Show that if T is the orthogonal projection of V on W, then I−T is theorthogonal projection of V on W⊥.

5. Let T be a linear operator on a finite-dimensional inner product spaceV.

(a) If T is an orthogonal projection, prove that ‖T(x)‖ ≤ ‖x‖ for allx ∈ V. Give an example of a projection for which this inequalitydoes not hold. What can be concluded about a projection forwhich the inequality is actually an equality for all x ∈ V?

(b) Suppose that T is a projection such that ‖T(x)‖ ≤ ‖x‖ for x ∈ V.Prove that T is an orthogonal projection.

6. Let T be a normal operator on a finite-dimensional inner product space.Prove that if T is a projection, then T is also an orthogonal projection.

7. Let T be a normal operator on a finite-dimensional complex inner prod-uct space V. Use the spectral decomposition λ1T1 + λ2T2 + · · ·+ λkTk

of T to prove the following results.

(a) If g is a polynomial, then

g(T) =k∑

i=1

g(λi)Ti.

(b) If Tn = T0 for some n, then T = T0.(c) Let U be a linear operator on V. Then U commutes with T if and

only if U commutes with each Ti.(d) There exists a normal operator U on V such that U2 = T.(e) T is invertible if and only if λi �= 0 for 1 ≤ i ≤ k.(f) T is a projection if and only if every eigenvalue of T is 1 or 0.

Page 420: Linear Algebra - cloudflare-ipfs.com

Sec. 6.7 The Singular Value Decomposition and the Pseudoinverse 405

(g) T = −T∗ if and only if every λi is an imaginary number.

8. Use Corollary 1 of the spectral theorem to show that if T is a normaloperator on a complex finite-dimensional inner product space and U isa linear operator that commutes with T, then U commutes with T∗.

9. Referring to Exercise 20 of Section 6.5, prove the following facts abouta partial isometry U.

(a) U∗U is an orthogonal projection on W.(b) UU∗U = U.

10. Simultaneous diagonalization. Let U and T be normal operators on afinite-dimensional complex inner product space V such that TU = UT.Prove that there exists an orthonormal basis for V consisting of vectorsthat are eigenvectors of both T and U. Hint: Use the hint of Exercise 14of Section 6.4 along with Exercise 8.

11. Prove (c) of the spectral theorem.

6.7∗ THE SINGULAR VALUE DECOMPOSITIONAND THE PSEUDOINVERSE

In Section 6.4, we characterized normal operators on complex spaces and self-adjoint operators on real spaces in terms of orthonormal bases of eigenvectorsand their corresponding eigenvalues (Theorems 6.16, p. 372, and 6.17, p. 374).In this section, we establish a comparable theorem whose scope is the entireclass of linear transformations on both complex and real finite-dimensionalinner product spaces—the singular value theorem for linear transformations(Theorem 6.26). There are similarities and differences among these theorems.All rely on the use of orthonormal bases and numerical invariants. However,because of its general scope, the singular value theorem is concerned withtwo (usually distinct) inner product spaces and with two (usually distinct)orthonormal bases. If the two spaces and the two bases are identical, then thetransformation would, in fact, be a normal or self-adjoint operator. Anotherdifference is that the numerical invariants in the singular value theorem, thesingular values, are nonnegative, in contrast to their counterparts, the eigen-values, for which there is no such restriction. This property is necessary toguarantee the uniqueness of singular values.

The singular value theorem encompasses both real and complex spaces.For brevity, in this section we use the terms unitary operator and unitarymatrix to include orthogonal operators and orthogonal matrices in the contextof real spaces. Thus any operator T for which 〈T(x), T(y)〉 = 〈x, y〉, or anymatrix A for which 〈Ax, Ay〉 = 〈x, y〉, for all x and y is called unitary for thepurposes of this section.

Page 421: Linear Algebra - cloudflare-ipfs.com

406 Chap. 6 Inner Product Spaces

In Exercise 15 of Section 6.3, the definition of the adjoint of an operatoris extended to any linear transformation T : V → W, where V and W arefinite-dimensional inner product spaces. By this exercise, the adjoint T∗ ofT is a linear transformation from W to V and [T∗]βγ = ([T]γβ)∗, where β andγ are orthonormal bases for V and W, respectively. Furthermore, the linearoperator T∗T on V is positive semidefinite and rank(T∗T) = rank(T) byExercise 18 of Section 6.4.

With these facts in mind, we begin with the principal result.

Theorem 6.26 (Singular Value Theorem for Linear Transforma-tions). Let V and W be finite-dimensional inner product spaces, and letT : V → W be a linear transformation of rank r. Then there exist orthonormalbases {v1, v2, . . . , vn} for V and {u1, u2, . . . , um} for W and positive scalarsσ1 ≥ σ2 ≥ · · · ≥ σr such that

T(vi) =

{σiui if 1 ≤ i ≤ r

0 if i > r.(4)

Conversely, suppose that the preceding conditions are satisfied. Then for1 ≤ i ≤ n, vi is an eigenvector of T∗T with corresponding eigenvalue σ2

i if1 ≤ i ≤ r and 0 if i > r. Therefore the scalars σ1, σ2, . . . , σr are uniquelydetermined by T.

Proof. We first establish the existence of the bases and scalars. By Ex-ercises 18 of Section 6.4 and 15(d) of Section 6.3, T∗T is a positive semidef-inite linear operator of rank r on V; hence there is an orthonormal basis{v1, v2, . . . , vn} for V consisting of eigenvectors of T∗T with correspondingeigenvalues λi, where λ1 ≥ λ2 ≥ · · · ≥ λr > 0, and λi = 0 for i > r. For

1 ≤ i ≤ r, define σi =√

λi and ui =1σi

T(vi). We show that {u1, u2, . . . , ur}is an orthonormal subset of W. Suppose 1 ≤ i, j ≤ r. Then

〈ui, uj〉 =

⟨1σ i

T(vi),1σ j

T(vj)

=1

σiσj

⟨T∗T(vi), vj

=1

σiσj〈λivi, vj〉

=σ2

i

σiσj〈vi, vj〉

= δij ,

Page 422: Linear Algebra - cloudflare-ipfs.com

Sec. 6.7 The Singular Value Decomposition and the Pseudoinverse 407

and hence {u1, u2, . . . , ur} is orthonormal. By Theorem 6.7(a) (p. 352), thisset extends to an orthonormal basis {u1, u2, . . . , ur, . . . , um} for W. ClearlyT(vi) = σiui if 1 ≤ i ≤ r. If i > r, then T∗T(vi) = 0 , and so T(vi) = 0 byExercise 15(d) of Section 6.3.

To establish uniqueness, suppose that {v1, v2, . . . , vn}, {u1, u2, . . . , um},and σ1 ≥ σ2 ≥ · · · ≥ σr > 0 satisfy the properties stated in the first part ofthe theorem. Then for 1 ≤ i ≤ m and 1 ≤ j ≤ n,

〈T∗(ui), vj〉 = 〈ui, T(vj)〉

=

{σi if i = j ≤ r

0 otherwise,

and hence for any 1 ≤ i ≤ m,

T∗(ui) =n∑

j=1

〈T∗(ui), vj〉 vj =

{σivi if i = j ≤ r

0 otherwise.(5)

So for i ≤ r,

T∗T(vi) = T∗(σiui) = σiT∗(ui) = σ2

i ui

and T∗T(vi) = T∗(0 ) = 0 for i > r. Therefore each vi is an eigenvector ofT∗T with corresponding eigenvalue σ2

i if i ≤ r and 0 if i > r.

Definition. The unique scalars σ1, σ2, . . . , σr in Theorem 6.26 are calledthe singular values of T. If r is less than both m and n, then the termsingular value is extended to include σr+1 = · · · = σk = 0, where k is theminimum of m and n.

Although the singular values of a linear transformation T are uniquely de-termined by T, the orthonormal bases given in the statement of Theorem 6.26are not uniquely determined because there is more than one orthonormal basisof eigenvectors of T∗T.

In view of (5), the singular values of a linear transformation T : V → Wand its adjoint T∗ are identical. Furthermore, the orthonormal bases for Vand W given in Theorem 6.26 are simply reversed for T∗.

Example 1

Let P2(R) and P1(R) be the polynomial spaces with inner products definedby

〈f(x), g(x)〉 =∫ 1

−1

f(t)g(t) dt.

Page 423: Linear Algebra - cloudflare-ipfs.com

408 Chap. 6 Inner Product Spaces

Let T : P2(R) → P1(R) be the linear transformation defined by T(f(x)) =f ′(x). Find orthonormal bases β = {v1, v2, v3} for P2(R) and γ = {u1, u2} forP1(R) such that T(vi) = σiui for i = 1, 2 and T(v3) = 0 , where σ1 ≥ σ2 > 0are the nonzero singular values of T.

To facilitate the computations, we translate this problem into the corre-sponding problem for a matrix representation of T. Caution is advised herebecause not any matrix representation will do. Since the adjoint is definedin terms of inner products, we must use a matrix representation constructedfrom orthonormal bases for P2(R) and P1(R) to guarantee that the adjointof the matrix representation of T is the same as the matrix representation ofthe adjoint of T. (See Exercise 15 of Section 6.3.) For this purpose, we usethe results of Exercise 21(a) of Section 6.2 to obtain orthonormal bases

α =

{1√2,

√32

x,

√58

(3x2 − 1)

}and α′ =

{1√2,

√32

x

}

for P2(R) and P1(R), respectively.

Let

A = [T]α′

α =(

0√

3 00 0

√15

).

Then

A∗A =

⎛⎝ 0 0√3 0

0√

15

⎞⎠(0

√3 0

0 0√

15

)=

⎛⎝0 0 00 3 00 0 15

⎞⎠ ,

which has eigenvalues (listed in descending order of size) λ1 = 15, λ2 = 3,and λ3 = 0. These eigenvalues correspond, respectively, to the orthonormaleigenvectors e3 = (0, 0, 1), e2 = (0, 1, 0), and e1 = (1, 0, 0) in R3. Translatingeverything into the context of T, P2(R), and P1(R), let

v1 =

√58

(3x2 − 1), v2 =

√32

x, and v3 =1√2.

Then β = {v1, v2, v3} is an orthonormal basis for P2(R) consisting of eigen-vectors of T∗T with corresponding eigenvalues λ1, λ2, and λ3. Now setσ1 =

√λ1 =

√15 and σ2 =

√λ2 =

√3, the nonzero singular values of T,

and take

u1 =1σ1

T(v1) =

√32

x and u2 =1σ2

T(v2) =1√2,

to obtain the required basis γ = {u1, u2} for P1(R). ♦

Page 424: Linear Algebra - cloudflare-ipfs.com

Sec. 6.7 The Singular Value Decomposition and the Pseudoinverse 409

We can use singular values to describe how a figure is distorted by a lineartransformation. This is illustrated in the next example.

Example 2

Let T be an invertible linear operator on R2 and S = {x ∈ R2 : ‖x‖ = 1}, theunit circle in R2. We apply Theorem 6.26 to describe S′ = T(S).

Since T is invertible, it has rank equal to 2 and hence has singular valuesσ1 ≥ σ2 > 0. Let {v1, v2} and β = {u1, u2} be orthonormal bases for R2 sothat T(v1) = σ1u1 and T(v2) = σ2u2, as in Theorem 6.26. Then β determinesa coordinate system, which we shall call the x′y′-coordinate system for R2,where the x′-axis contains u1 and the y′-axis contains u2. For any vector

u ∈ R2, if u = x′1u1 + x′

2u2, then [u]β =(

x′1

x′2

)is the coordinate vector of u

relative to β. We characterize S′ in terms of an equation relating x′1 and x′

2.

For any vector v = x1v1 + x2v2 ∈ R2, the equation u = T(v) means that

u = T(x1v1 + x2v2) = x1T(v1) + x2T(v2) = x1σ1u1 + x2σ2u2.

Thus for u = x′1u1 + x′

2u2, we have x′1 = x1σ1 and x′

2 = x2σ2. Furthermore,u ∈ S′ if and only if v ∈ S if and only if

(x′1)

2

σ21

+(x′

2)2

σ22

= x21 + x2

2 = 1.

If σ1 = σ2, this is the equation of a circle of radius σ1, and if σ1 > σ2, this isthe equation of an ellipse with major axis and minor axis oriented along thex′-axis and the y′-axis, respectively. (See Figure 6.6.) ♦

............

........................................................

............................................................................................................................................................................................................................................................................................................................................................................................................................................................................�

���

���"v1

v2

S

v = x1v1 + x2v2

�T

��

��

��

��

��

���

��

��

��

��

��

����

���

��

�� σ1σ2

..............................................................

..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

................................................................................................................................................................................................................................................................................................................................

��

���

x′y′

u1u2

S′

u = x′1u1 + x′

2u2

Figure 6.6

Page 425: Linear Algebra - cloudflare-ipfs.com

410 Chap. 6 Inner Product Spaces

The singular value theorem for linear transformations is useful in its ma-trix form because we can perform numerical computations on matrices. Webegin with the definition of the singular values of a matrix.

Definition. Let A be an m× n matrix. We define the singular valuesof A to be the singular values of the linear transformation LA.

Theorem 6.27 (Singular Value Decomposition Theorem for Ma-trices). Let A be an m × n matrix of rank r with the positive singularvalues σ1 ≥ σ2 ≥ · · · ≥ σr, and let Σ be the m × n matrix defined by

Σij =

{σi if i = j ≤ r

0 otherwise.

Then there exists an m × m unitary matrix U and an n × n unitary matrixV such that

A = UΣV ∗.

Proof. Let T = LA : Fn → Fm. By Theorem 6.26, there exist orthonormalbases β = {v1, v2, . . . , vn} for Fn and γ = {u1, u2, . . . , um} for Fm such thatT(vi) = σiui for 1 ≤ i ≤ r and T(vi) = 0 for i > r. Let U be the m × mmatrix whose jth column is uj for all j, and let V be the n×n matrix whosejth column is vj for all j. Note that both U and V are unitary matrices.

By Theorem 2.13(a) (p. 90), the jth column of AV is Avj = σjuj . Observethat the jth column of Σ is σjej , where ej is the jth standard vector of Fm.So by Theorem 2.13(a) and (b), the jth column of UΣ is given by

U(σjej) = σjU(ej) = σjuj .

It follows that AV and UΣ are m×n matrices whose corresponding columnsare equal, and hence AV = UΣ. Therefore A = AV V ∗ = UΣV ∗.

Definition. Let A be an m × n matrix of rank r with positive singularvalues σ1 ≥ σ2 ≥ · · · ≥ σr. A factorization A = UΣV ∗ where U and V areunitary matrices and Σ is the m × n matrix defined as in Theorem 6.27 iscalled a singular value decomposition of A.

In the proof of Theorem 6.27, the columns of V are the vectors in β, andthe columns of U are the vectors in γ. Furthermore, the nonzero singularvalues of A are the same as those of LA; hence they are the square roots ofthe nonzero eigenvalues of A∗A or of AA∗. (See Exercise 9.)

Page 426: Linear Algebra - cloudflare-ipfs.com

Sec. 6.7 The Singular Value Decomposition and the Pseudoinverse 411

Example 3

We find a singular value decomposition for A =(

1 1 −11 1 −1

).

First observe that for

v1 =1√3

⎛⎝ 11

−1

⎞⎠ , v2 =1√2

⎛⎝ 1−1

0

⎞⎠ , and v3 =1√6

⎛⎝112

⎞⎠ ,

the set β = {v1, v2, v3} is an orthonormal basis for R3 consisting of eigen-vectors of A∗A with corresponding eigenvalues λ1 = 6, and λ2 = λ3 = 0.Consequently, σ1 =

√6 is the only nonzero singular value of A. Hence, as in

the proof of Theorem 6.27, we let V be the matrix whose columns are thevectors in β. Then

Σ =(√

6 0 00 0 0

)and V =

⎛⎜⎜⎝1√3

1√2

1√6

1√3

−1√2

1√6

−1√3

0 2√6

⎞⎟⎟⎠ .

Also, as in Theorem 6.27, we take

u1 =1σi

LA(v1) =1σi

Av1 =1√2

(11

).

Next choose u2 =1√2

(1

−1

), a unit vector orthogonal to u1, to obtain the

orthonormal basis γ = {u1, u2} for R2, and set

U =

(1√2

1√2

1√2

−1√2

).

Then A = UΣV ∗ is the desired singular value decomposition. ♦

The Polar Decomposition of a Square Matrix

A singular value decomposition of a matrix can be used to factor a squarematrix in a manner analogous to the factoring of a complex number as theproduct of a complex number of length 1 and a nonnegative number. In thecase of matrices, the complex number of length 1 is replaced by a unitarymatrix, and the nonnegative number is replaced by a positive semidefinitematrix.

Theorem 6.28 (Polar Decomposition). For any square matrix A,there exists a unitary matrix W and a positive semidefinite matrix P suchthat

A = WP.

Page 427: Linear Algebra - cloudflare-ipfs.com

412 Chap. 6 Inner Product Spaces

Furthermore, if A is invertible, then the representation is unique.

Proof. By Theorem 6.27, there exist unitary matrices U and V and adiagonal matrix Σ with nonnegative diagonal entries such that A = UΣV ∗.So

A = UΣV ∗ = UV ∗V ΣV ∗ = WP,

where W = UV ∗ and P = V ΣV ∗. Since W is the product of unitary matrices,W is unitary, and since Σ is positive semidefinite and P is unitarily equivalentto Σ, P is positive semidefinite by Exercise 14 of Section 6.5.

Now suppose that A is invertible and factors as the products

A = WP = ZQ,

where W and Z are unitary and P and Q are positive semidefinite. Since Ais invertible, it follows that P and Q are positive definite and invertible, andtherefore Z∗W = QP−1. Thus QP−1 is unitary, and so

I = (QP−1)∗(QP−1) = P−1Q2P−1.

Hence P 2 = Q2. Since both P and Q are positive definite, it follows thatP = Q by Exercise 17 of Section 6.4. Therefore W = Z, and consequentlythe factorization is unique.

The factorization of a square matrix A as WP where W is unitary and Pis positive semidefinite, is called a polar decomposition of A.

Example 4

To find the polar decomposition of A =(

11 −5−2 10

), we begin by finding a sin-

gular value decomposition UΣV ∗ of A. The object is to find an orthonormalbasis β for R2 consisting of eigenvectors of A∗A. It can be shown that

v1 =1√2

(1

−1

)and v2 =

1√2

(11

)are orthonormal eigenvectors of A∗A with corresponding eigenvalues λ1 = 200and λ2 = 50. So β = {v1, v2} is an appropriate basis. Thus σ1 =

√200 =

10√

2 and σ2 =√

50 = 5√

2 are the singular values of A. So we have

V =

(1√2

1√2

−1√2

1√2

)and Σ =

(10√

2 00 5

√2

).

Next, we find the columns u1 and u2 of U :

u1 =1σ1

Av1 =15

(4

−3

)and u2 =

1σ2

Av2 =15

(34

).

Page 428: Linear Algebra - cloudflare-ipfs.com

Sec. 6.7 The Singular Value Decomposition and the Pseudoinverse 413

Thus

U =

(45

35

− 35

45

).

Therefore, in the notation of Theorem 6.28, we have

W = UV ∗ =

(45

35

− 35

45

)(1√2

−1√2

1√2

1√2

)=

15√

2

(7 −11 7

),

and

P = V ΣV ∗ =

(1√2

1√2

−1√2

1√2

)(10√

2 00 5

√2

)( 1√2

−1√2

1√2

1√2

)=

5√2

(3 −1

−1 3

).

The Pseudoinverse

Let V and W be finite-dimensional inner product spaces over the same field, and let T : V → W be a linear transformation. It is desirable to have a linear transformation from W to V that captures some of the essence of an inverse of T even if T is not invertible. A simple approach to this problem is to focus on the "part" of T that is invertible, namely, the restriction of T to N(T)⊥. Let L : N(T)⊥ → R(T) be the linear transformation defined by L(x) = T(x) for all x ∈ N(T)⊥. Then L is invertible, and we can use the inverse of L to construct a linear transformation from W to V that salvages some of the benefits of an inverse of T.

Definition. Let V and W be finite-dimensional inner product spaces over the same field, and let T : V → W be a linear transformation. Let L : N(T)⊥ → R(T) be the linear transformation defined by L(x) = T(x) for all x ∈ N(T)⊥. The pseudoinverse (or Moore-Penrose generalized inverse) of T, denoted by T†, is defined as the unique linear transformation from W to V such that

T^{\dagger}(y) = \begin{cases} L^{-1}(y) & \text{for } y \in R(T) \\ 0 & \text{for } y \in R(T)^{\perp}. \end{cases}

The pseudoinverse of a linear transformation T on a finite-dimensional inner product space exists even if T is not invertible. Furthermore, if T is invertible, then T† = T⁻¹ because N(T)⊥ = V, and L (as just defined) coincides with T.

As an extreme example, consider the zero transformation T0 : V → W between two finite-dimensional inner product spaces V and W. Then R(T0) = {0}, and therefore T0† is the zero transformation from W to V.


We can use the singular value theorem to describe the pseudoinverse of a linear transformation. Suppose that V and W are finite-dimensional vector spaces and T : V → W is a linear transformation of rank r. Let {v1, v2, . . . , vn} and {u1, u2, . . . , um} be orthonormal bases for V and W, respectively, and let σ1 ≥ σ2 ≥ · · · ≥ σr be the nonzero singular values of T satisfying (4) in Theorem 6.26. Then {v1, v2, . . . , vr} is a basis for N(T)⊥, {vr+1, vr+2, . . . , vn} is a basis for N(T), {u1, u2, . . . , ur} is a basis for R(T), and {ur+1, ur+2, . . . , um} is a basis for R(T)⊥. Let L be the restriction of T to N(T)⊥, as in the definition of pseudoinverse. Then L⁻¹(ui) = (1/σi)vi for 1 ≤ i ≤ r. Therefore

T^{\dagger}(u_i) = \begin{cases} \dfrac{1}{\sigma_i} v_i & \text{if } 1 \le i \le r \\ 0 & \text{if } r < i \le m. \end{cases} \qquad (6)

Example 5

Let T : P2(R) → P1(R) be the linear transformation defined by T(f(x)) = f′(x), as in Example 1. Let β = {v1, v2, v3} and γ = {u1, u2} be the orthonormal bases for P2(R) and P1(R) in Example 1. Then σ1 = √15 and σ2 = √3 are the nonzero singular values of T. It follows that

T^{\dagger}\!\left(\sqrt{\tfrac{3}{2}}\,x\right) = T^{\dagger}(u_1) = \frac{1}{\sigma_1} v_1 = \frac{1}{\sqrt{15}}\sqrt{\tfrac{5}{8}}\,(3x^2 - 1),

and hence

T^{\dagger}(x) = \frac{1}{6}(3x^2 - 1).

Similarly, T†(1) = x. Thus, for any polynomial a + bx ∈ P1(R),

T^{\dagger}(a + bx) = a\,T^{\dagger}(1) + b\,T^{\dagger}(x) = ax + \frac{b}{6}(3x^2 - 1). ♦

The Pseudoinverse of a Matrix

Let A be an m × n matrix. Then there exists a unique n × m matrix B such that (LA)† : Fm → Fn is equal to the left-multiplication transformation LB. We call B the pseudoinverse of A and denote it by B = A†. Thus

(LA)† = LA†.

Let A be an m × n matrix of rank r. The pseudoinverse of A can be computed with the aid of a singular value decomposition A = UΣV*. Let β and γ be the ordered bases whose vectors are the columns of V and U,


respectively, and let σ1 ≥ σ2 ≥ · · · ≥ σr be the nonzero singular values of A. Then β and γ are orthonormal bases for Fn and Fm, respectively, and (4) and (6) are satisfied for T = LA. Reversing the roles of β and γ in the proof of Theorem 6.27, we obtain the following result.

Theorem 6.29. Let A be an m × n matrix of rank r with a singular value decomposition A = UΣV* and nonzero singular values σ1 ≥ σ2 ≥ · · · ≥ σr. Let Σ† be the n × m matrix defined by

\Sigma^{\dagger}_{ij} = \begin{cases} \dfrac{1}{\sigma_i} & \text{if } i = j \le r \\ 0 & \text{otherwise.} \end{cases}

Then A† = VΣ†U*, and this is a singular value decomposition of A†.

Notice that Σ† as defined in Theorem 6.29 is actually the pseudoinverse of Σ.

Example 6

We find A† for the matrix

A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 1 & -1 \end{pmatrix}.

Since A is the matrix of Example 3, we can use the singular value decomposition obtained in that example:

A = UΣV* = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}\begin{pmatrix} \sqrt{6} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \\ -1/\sqrt{3} & 0 & 2/\sqrt{6} \end{pmatrix}^{*}.

By Theorem 6.29, we have

A† = VΣ†U* = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \\ -1/\sqrt{3} & 0 & 2/\sqrt{6} \end{pmatrix}\begin{pmatrix} 1/\sqrt{6} & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}. ♦

Notice that the linear transformation T of Example 5 is LA, where A is the matrix of Example 6, and that T† = LA†.
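The following sketch (an added illustration, not from the text) carries out the construction of Theorem 6.29 for the matrix of Example 6 and compares the result with NumPy's built-in pseudoinverse; the tolerance used to decide which singular values count as zero is our own choice.

import numpy as np

A = np.array([[1.0, 1.0, -1.0],
              [1.0, 1.0, -1.0]])

U, s, Vh = np.linalg.svd(A)

# Sigma-dagger has the transposed shape of Sigma, with the nonzero singular
# values inverted on its diagonal.
tol = 1e-12
Sigma_dag = np.zeros((A.shape[1], A.shape[0]))
for i, sigma in enumerate(s):
    if sigma > tol:
        Sigma_dag[i, i] = 1.0 / sigma

A_dag = Vh.conj().T @ Sigma_dag @ U.conj().T      # A† = V Sigma† U*
print(A_dag)                                      # (1/6) [[1,1],[1,1],[-1,-1]]
print(np.allclose(A_dag, np.linalg.pinv(A)))      # True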

The Pseudoinverse and Systems of Linear Equations

Let A be an m × n matrix with entries in F. Then for any b ∈ Fm, the matrix equation Ax = b is a system of linear equations, and so it either has no solutions, a unique solution, or infinitely many solutions. We know that the


system has a unique solution for every b ∈ Fm if and only if A is invertible, in which case the solution is given by A⁻¹b. Furthermore, if A is invertible, then A⁻¹ = A†, and so the solution can be written as x = A†b. If, on the other hand, A is not invertible or the system Ax = b is inconsistent, then A†b still exists. We therefore pose the following question: In general, how is the vector A†b related to the system of linear equations Ax = b?

In order to answer this question, we need the following lemma.

Lemma. Let V and W be finite-dimensional inner product spaces, and let T : V → W be linear. Then
(a) T†T is the orthogonal projection of V on N(T)⊥.
(b) TT† is the orthogonal projection of W on R(T).

Proof. As in the earlier discussion, we define L : N(T)⊥ → W by L(x) = T(x) for all x ∈ N(T)⊥. If x ∈ N(T)⊥, then T†T(x) = L⁻¹L(x) = x, and if x ∈ N(T), then T†T(x) = T†(0) = 0. Consequently T†T is the orthogonal projection of V on N(T)⊥. This proves (a).

The proof of (b) is similar and is left as an exercise.

Theorem 6.30. Consider the system of linear equations Ax = b, where A is an m × n matrix and b ∈ Fm. If z = A†b, then z has the following properties.
(a) If Ax = b is consistent, then z is the unique solution to the system having minimum norm. That is, z is a solution to the system, and if y is any solution to the system, then ‖z‖ ≤ ‖y‖ with equality if and only if z = y.
(b) If Ax = b is inconsistent, then z is the unique best approximation to a solution having minimum norm. That is, ‖Az − b‖ ≤ ‖Ay − b‖ for any y ∈ Fn, with equality if and only if Az = Ay. Furthermore, if Az = Ay, then ‖z‖ ≤ ‖y‖ with equality if and only if z = y.

Proof. For convenience, let T = LA.

(a) Suppose that Ax = b is consistent, and let z = A†b. Observe that b ∈ R(T), and therefore Az = AA†b = TT†(b) = b by part (b) of the lemma. Thus z is a solution to the system. Now suppose that y is any solution to the system. Then

T†T(y) = A†Ay = A†b = z,

and hence z is the orthogonal projection of y on N(T)⊥ by part (a) of the lemma. Therefore, by the corollary to Theorem 6.6 (p. 350), we have that ‖z‖ ≤ ‖y‖ with equality if and only if z = y.

(b) Suppose that Ax = b is inconsistent. By the lemma, Az = AA†b = TT†(b) is the orthogonal projection of b on R(T); therefore, by the corollary to Theorem 6.6 (p. 350), Az is the vector in R(T) nearest b. That is, if


Ay is any other vector in R(T), then ‖Az − b‖ ≤ ‖Ay − b‖ with equality if and only if Az = Ay.

Finally, suppose that y is any vector in Fn such that Az = Ay = c. Then

A†c = A†Az = A†AA†b = A†b = z

by Exercise 23; hence we may apply part (a) of this theorem to the system Ax = c to conclude that ‖z‖ ≤ ‖y‖ with equality if and only if z = y.

Note that the vector z = A†b in Theorem 6.30 is the vector x0 described in Theorem 6.12 that arises in the least squares application on pages 360–364.

Example 7

Consider the linear systems

x1 + x2 − x3 = 1        x1 + x2 − x3 = 1
x1 + x2 − x3 = 1  and   x1 + x2 − x3 = 2.

The first system has infinitely many solutions. Let

A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 1 & -1 \end{pmatrix},

the coefficient matrix of the system, and let b = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. By Example 6,

A† = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix},

and therefore

z = A†b = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}

is the solution of minimal norm by Theorem 6.30(a).

The second system is obviously inconsistent. Let b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}. Thus, although

z = A†b = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}

is not a solution to the second system, it is the "best approximation" to a solution having minimum norm, as described in Theorem 6.30(b). ♦


EXERCISES

1. Label the following statements as true or false.

(a) The singular values of any linear operator on a finite-dimensional vector space are also eigenvalues of the operator.
(b) The singular values of any matrix A are the eigenvalues of A*A.
(c) For any matrix A and any scalar c, if σ is a singular value of A, then |c|σ is a singular value of cA.
(d) The singular values of any linear operator are nonnegative.
(e) If λ is an eigenvalue of a self-adjoint matrix A, then λ is a singular value of A.
(f) For any m × n matrix A and any b ∈ Fn, the vector A†b is a solution to Ax = b.
(g) The pseudoinverse of any linear operator exists even if the operator is not invertible.

2. Let T : V → W be a linear transformation of rank r, where V and W are finite-dimensional inner product spaces. In each of the following, find orthonormal bases {v1, v2, . . . , vn} for V and {u1, u2, . . . , um} for W, and the nonzero singular values σ1 ≥ σ2 ≥ · · · ≥ σr of T such that T(vi) = σiui for 1 ≤ i ≤ r.
(a) T : R2 → R3 defined by T(x1, x2) = (x1, x1 + x2, x1 − x2)
(b) T : P2(R) → P1(R), where T(f(x)) = f′′(x), and the inner products are defined as in Example 1
(c) Let V = W = span({1, sin x, cos x}) with the inner product defined by ⟨f, g⟩ = ∫₀^{2π} f(t)g(t) dt, and T is defined by T(f) = f′ + 2f
(d) T : C2 → C2 defined by T(z1, z2) = ((1 − i)z2, (1 + i)z1 + z2)

3. Find a singular value decomposition for each of the following matrices.

(a) \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}   (b) \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & -1 \end{pmatrix}   (c) \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}

(d) \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}   (e) \begin{pmatrix} 1+i & 1 \\ 1-i & -i \end{pmatrix}   (f) \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & -2 & 1 \\ 1 & -1 & 1 & 1 \end{pmatrix}

4. Find a polar decomposition for each of the following matrices.

(a) \begin{pmatrix} 1 & 1 \\ 2 & -2 \end{pmatrix}   (b) \begin{pmatrix} 20 & 4 & 0 \\ 0 & 0 & 1 \\ 4 & 20 & 0 \end{pmatrix}

5. Find an explicit formula for each of the following expressions.


(a) T†(x1, x2, x3), where T is the linear transformation of Exercise 2(a)
(b) T†(a + bx + cx²), where T is the linear transformation of Exercise 2(b)
(c) T†(a + b sin x + c cos x), where T is the linear transformation of Exercise 2(c)
(d) T†(z1, z2), where T is the linear transformation of Exercise 2(d)

6. Use the results of Exercise 3 to find the pseudoinverse of each of the following matrices.

(a) \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}   (b) \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & -1 \end{pmatrix}   (c) \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}

(d) \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}   (e) \begin{pmatrix} 1+i & 1 \\ 1-i & -i \end{pmatrix}   (f) \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & -2 & 1 \\ 1 & -1 & 1 & 1 \end{pmatrix}

7. For each of the given linear transformations T : V → W,
(i) Describe the subspace Z1 of V such that T†T is the orthogonal projection of V on Z1.
(ii) Describe the subspace Z2 of W such that TT† is the orthogonal projection of W on Z2.
(a) T is the linear transformation of Exercise 2(a)
(b) T is the linear transformation of Exercise 2(b)
(c) T is the linear transformation of Exercise 2(c)
(d) T is the linear transformation of Exercise 2(d)

8. For each of the given systems of linear equations,
(i) If the system is consistent, find the unique solution having minimum norm.
(ii) If the system is inconsistent, find the "best approximation to a solution" having minimum norm, as described in Theorem 6.30(b). (Use your answers to parts (a) and (f) of Exercise 6.)

(a) x1 + x2 = 1
    x1 + x2 = 2
    −x1 − x2 = 0

(b) x1 + x2 + x3 + x4 = 2
    x1 − 2x3 + x4 = −1
    x1 − x2 + x3 + x4 = 2

9. Let V and W be finite-dimensional inner product spaces over F, and suppose that {v1, v2, . . . , vn} and {u1, u2, . . . , um} are orthonormal bases for V and W, respectively. Let T : V → W be a linear transformation of rank r, and suppose that σ1 ≥ σ2 ≥ · · · ≥ σr > 0 are such that

T(v_i) = \begin{cases} \sigma_i u_i & \text{if } 1 \le i \le r \\ 0 & \text{if } r < i. \end{cases}


(a) Prove that {u1, u2, . . . , um} is a set of eigenvectors of TT* with corresponding eigenvalues λ1, λ2, . . . , λm, where

\lambda_i = \begin{cases} \sigma_i^2 & \text{if } 1 \le i \le r \\ 0 & \text{if } r < i. \end{cases}

(b) Let A be an m × n matrix with real or complex entries. Prove that the nonzero singular values of A are the positive square roots of the nonzero eigenvalues of AA*, including repetitions.
(c) Prove that TT* and T*T have the same nonzero eigenvalues, including repetitions.
(d) State and prove a result for matrices analogous to (c).

10. Use Exercise 8 of Section 2.5 to obtain another proof of Theorem 6.27, the singular value decomposition theorem for matrices.

11. This exercise relates the singular values of a well-behaved linear operator or matrix to its eigenvalues.
(a) Let T be a normal linear operator on an n-dimensional inner product space with eigenvalues λ1, λ2, . . . , λn. Prove that the singular values of T are |λ1|, |λ2|, . . . , |λn|.
(b) State and prove a result for matrices analogous to (a).

12. Let A be a normal matrix with an orthonormal basis of eigenvectors β = {v1, v2, . . . , vn} and corresponding eigenvalues λ1, λ2, . . . , λn. Let V be the n × n matrix whose columns are the vectors in β. Prove that for each i there is a scalar θi of absolute value 1 such that if U is the n × n matrix with θivi as column i and Σ is the diagonal matrix such that Σii = |λi| for each i, then UΣV* is a singular value decomposition of A.

13. Prove that if A is a positive semidefinite matrix, then the singular values of A are the same as the eigenvalues of A.

14. Prove that if A is a positive definite matrix and A = UΣV* is a singular value decomposition of A, then U = V.

15. Let A be a square matrix with a polar decomposition A = WP.
(a) Prove that A is normal if and only if WP² = P²W.
(b) Use (a) to prove that A is normal if and only if WP = PW.

16. Let A be a square matrix. Prove an alternate form of the polar decomposition for A: There exists a unitary matrix W and a positive semidefinite matrix P such that A = PW.


17. Let T and U be linear operators on R2 defined for all (x1, x2) ∈ R2 by

T(x1, x2) = (x1, 0) and U(x1, x2) = (x1 + x2, 0).

(a) Prove that (UT)† ≠ T†U†.
(b) Exhibit matrices A and B such that AB is defined, but (AB)† ≠ B†A†.

18. Let A be an m × n matrix. Prove the following results.
(a) For any m × m unitary matrix G, (GA)† = A†G*.
(b) For any n × n unitary matrix H, (AH)† = H*A†.

19. Let A be a matrix with real or complex entries. Prove the following results.
(a) The nonzero singular values of A are the same as the nonzero singular values of A*, which are the same as the nonzero singular values of At.
(b) (A†)* = (A*)†.
(c) (A†)t = (At)†.

20. Let A be a square matrix such that A² = O. Prove that (A†)² = O.

21. Let V and W be finite-dimensional inner product spaces, and let T : V → W be linear. Prove the following results.
(a) TT†T = T.
(b) T†TT† = T†.
(c) Both T†T and TT† are self-adjoint.
The preceding three statements are called the Penrose conditions, and they characterize the pseudoinverse of a linear transformation as shown in Exercise 22.

22. Let V and W be finite-dimensional inner product spaces. Let T : V → W and U : W → V be linear transformations such that TUT = T, UTU = U, and both UT and TU are self-adjoint. Prove that U = T†.

23. State and prove a result for matrices that is analogous to the result of Exercise 21.

24. State and prove a result for matrices that is analogous to the result of Exercise 22.

25. Let V and W be finite-dimensional inner product spaces, and let T : V → W be linear. Prove the following results.
(a) If T is one-to-one, then T*T is invertible and T† = (T*T)⁻¹T*.
(b) If T is onto, then TT* is invertible and T† = T*(TT*)⁻¹.


26. Let V and W be finite-dimensional inner product spaces with orthonormal bases β and γ, respectively, and let T : V → W be linear. Prove that ([T]γβ)† = [T†]βγ.

27. Let V and W be finite-dimensional inner product spaces, and let T : V → W be a linear transformation. Prove part (b) of the lemma to Theorem 6.30: TT† is the orthogonal projection of W on R(T).

6.8∗ BILINEAR AND QUADRATIC FORMS

There is a certain class of scalar-valued functions of two variables defined on a vector space that arises in the study of such diverse subjects as geometry and multivariable calculus. This is the class of bilinear forms. We study the basic properties of this class with a special emphasis on symmetric bilinear forms, and we consider some of its applications to quadratic surfaces and multivariable calculus.

Bilinear Forms

Definition. Let V be a vector space over a field F. A function H from the set V × V of ordered pairs of vectors to F is called a bilinear form on V if H is linear in each variable when the other variable is held fixed; that is, H is a bilinear form on V if
(a) H(ax1 + x2, y) = aH(x1, y) + H(x2, y) for all x1, x2, y ∈ V and a ∈ F
(b) H(x, ay1 + y2) = aH(x, y1) + H(x, y2) for all x, y1, y2 ∈ V and a ∈ F.

We denote the set of all bilinear forms on V by B(V). Observe that an inner product on a vector space is a bilinear form if the underlying field is real, but not if the underlying field is complex.

Example 1

Define a function H : R2 × R2 → R by

H\!\left(\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\right) = 2a_1b_1 + 3a_1b_2 + 4a_2b_1 - a_2b_2 \quad \text{for } \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in R^2.

We could verify directly that H is a bilinear form on R2. However, it is more enlightening and less tedious to observe that if

A = \begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix}, \quad x = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \quad \text{and} \quad y = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix},

then

H(x, y) = xtAy.

The bilinearity of H now follows directly from the distributive property of matrix multiplication over matrix addition. ♦

Page 438: Linear Algebra - cloudflare-ipfs.com

Sec. 6.8 Bilinear and Quadratic Forms 423

The preceding bilinear form is a special case of the next example.

Example 2

Let V = Fn, where the vectors are considered as column vectors. For any A ∈ Mn×n(F), define H : V × V → F by

H(x, y) = xtAy for x, y ∈ V.

Notice that since x and y are n × 1 matrices and A is an n × n matrix, H(x, y) is a 1 × 1 matrix. We identify this matrix with its single entry. The bilinearity of H follows as in Example 1. For example, for a ∈ F and x1, x2, y ∈ V, we have

H(ax_1 + x_2, y) = (ax_1 + x_2)^t A y = (a x_1^t + x_2^t) A y = a x_1^t A y + x_2^t A y = aH(x_1, y) + H(x_2, y). ♦

We list several properties possessed by all bilinear forms. Their proofs are left to the reader (see Exercise 2).

For any bilinear form H on a vector space V over a field F, the following properties hold.
1. If, for any x ∈ V, the functions Lx, Rx : V → F are defined by

Lx(y) = H(x, y) and Rx(y) = H(y, x) for all y ∈ V,

then Lx and Rx are linear.
2. H(0, x) = H(x, 0) = 0 for all x ∈ V.
3. For all x, y, z, w ∈ V,

H(x + y, z + w) = H(x, z) + H(x, w) + H(y, z) + H(y, w).

4. If J : V × V → F is defined by J(x, y) = H(y, x), then J is a bilinear form.

Definitions. Let V be a vector space, let H1 and H2 be bilinear forms on V, and let a be a scalar. We define the sum H1 + H2 and the scalar product aH1 by the equations

(H1 + H2)(x, y) = H1(x, y) + H2(x, y)

and

(aH1)(x, y) = a(H1(x, y)) for all x, y ∈ V.

The following theorem is an immediate consequence of the definitions.

Page 439: Linear Algebra - cloudflare-ipfs.com

424 Chap. 6 Inner Product Spaces

Theorem 6.31. For any vector space V, the sum of two bilinear forms and the product of a scalar and a bilinear form on V are again bilinear forms on V. Furthermore, B(V) is a vector space with respect to these operations.

Proof. Exercise.

Let β = {v1, v2, . . . , vn} be an ordered basis for an n-dimensional vector space V, and let H ∈ B(V). We can associate with H an n × n matrix A whose entry in row i and column j is defined by

Aij = H(vi, vj) for i, j = 1, 2, . . . , n.

Definition. The matrix A above is called the matrix representation of H with respect to the ordered basis β and is denoted by ψβ(H).

We can therefore regard ψβ as a mapping from B(V) to Mn×n(F), where F is the field of scalars for V, that takes a bilinear form H into its matrix representation ψβ(H). We first consider an example and then show that ψβ is an isomorphism.

Example 3

Consider the bilinear form H of Example 1, and let

β = \left\{\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\} \quad \text{and} \quad B = ψβ(H).

Then

B_{11} = H\!\left(\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix}\right) = 2 + 3 + 4 - 1 = 8,

B_{12} = H\!\left(\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right) = 2 - 3 + 4 + 1 = 4,

B_{21} = H\!\left(\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix}\right) = 2 + 3 - 4 + 1 = 2,

and

B_{22} = H\!\left(\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right) = 2 - 3 - 4 - 1 = -6.

So

ψβ(H) = \begin{pmatrix} 8 & 4 \\ 2 & -6 \end{pmatrix}.

If γ is the standard ordered basis for R2, the reader can verify that

ψγ(H) = \begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix}. ♦
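For readers who like to check such computations by machine, here is a small sketch (added for illustration; not part of the text). If Q is the matrix whose columns are the vectors of β written in standard coordinates, then the entries H(vi, vj) = vi^t A vj are exactly the entries of QtAQ, which is the congruence relation made precise in Theorem 6.33 below.

import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, -1.0]])     # psi_gamma(H) for the standard basis gamma
Q = np.array([[1.0, 1.0],
              [1.0, -1.0]])     # columns are the vectors of beta

B = Q.T @ A @ Q                 # B_ij = v_i^t A v_j = H(v_i, v_j)
print(B)                        # [[ 8.  4.] [ 2. -6.]]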


Theorem 6.32. For any n-dimensional vector space V over F and any ordered basis β for V, ψβ : B(V) → Mn×n(F) is an isomorphism.

Proof. We leave the proof that ψβ is linear to the reader.

To show that ψβ is one-to-one, suppose that ψβ(H) = O for some H ∈ B(V). Fix vi ∈ β, and recall the mapping Lvi : V → F, which is linear by property 1 on page 423. By hypothesis, Lvi(vj) = H(vi, vj) = 0 for all vj ∈ β. Hence Lvi is the zero transformation from V to F. So

H(vi, x) = Lvi(x) = 0 for all x ∈ V and vi ∈ β. (7)

Next fix an arbitrary y ∈ V, and recall the linear mapping Ry : V → F defined in property 1 on page 423. By (7), Ry(vi) = H(vi, y) = 0 for all vi ∈ β, and hence Ry is the zero transformation. So H(x, y) = Ry(x) = 0 for all x, y ∈ V. Thus H is the zero bilinear form, and therefore ψβ is one-to-one.

To show that ψβ is onto, consider any A ∈ Mn×n(F). Recall the isomorphism φβ : V → Fn defined in Section 2.4. For x ∈ V, we view φβ(x) ∈ Fn as a column vector. Let H : V × V → F be the mapping defined by

H(x, y) = [φβ(x)]tA[φβ(y)] for all x, y ∈ V.

A slight embellishment of the method of Example 2 can be used to prove that H ∈ B(V). We show that ψβ(H) = A. Let vi, vj ∈ β. Then φβ(vi) = ei and φβ(vj) = ej; hence, for any i and j,

H(v_i, v_j) = [\phi_\beta(v_i)]^t A\,[\phi_\beta(v_j)] = e_i^t A e_j = A_{ij}.

We conclude that ψβ(H) = A and ψβ is onto.

Corollary 1. For any n-dimensional vector space V, B(V) has dimension n².

Proof. Exercise.

The following corollary is easily established by reviewing the proof of Theorem 6.32.

Corollary 2. Let V be an n-dimensional vector space over F with ordered basis β. If H ∈ B(V) and A ∈ Mn×n(F), then ψβ(H) = A if and only if H(x, y) = [φβ(x)]tA[φβ(y)] for all x, y ∈ V.

The following result is now an immediate consequence of Corollary 2.

Corollary 3. Let F be a field, n a positive integer, and β be the standard ordered basis for Fn. Then for any H ∈ B(Fn), there exists a unique matrix A ∈ Mn×n(F), namely, A = ψβ(H), such that

H(x, y) = xtAy for all x, y ∈ Fn.


Example 4

Define a function H : R2 × R2 → R by

H\!\left(\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\right) = \det\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = a_1b_2 - a_2b_1 \quad \text{for } \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in R^2.

It can be shown that H is a bilinear form. We find the matrix A in Corollary 3 such that H(x, y) = xtAy for all x, y ∈ R2.

Since Aij = H(ei, ej) for all i and j, we have

A_{11} = \det\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} = 0, \quad A_{12} = \det\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 1,

A_{21} = \det\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = -1, \quad \text{and} \quad A_{22} = \det\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} = 0.

Therefore A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. ♦

There is an analogy between bilinear forms and linear operators on finite-dimensional vector spaces in that both are associated with unique square matrices and the correspondences depend on the choice of an ordered basis for the vector space. As in the case of linear operators, one can pose the following question: How does the matrix corresponding to a fixed bilinear form change when the ordered basis is changed? As we have seen, the corresponding question for matrix representations of linear operators leads to the definition of the similarity relation on square matrices. In the case of bilinear forms, the corresponding question leads to another relation on square matrices, the congruence relation.

Definition. Let A, B ∈ Mn×n(F). Then B is said to be congruent to A if there exists an invertible matrix Q ∈ Mn×n(F) such that B = QtAQ.

Observe that the relation of congruence is an equivalence relation (see Exercise 12).

The next theorem relates congruence to the matrix representation of a bilinear form.

Theorem 6.33. Let V be a finite-dimensional vector space with ordered bases β = {v1, v2, . . . , vn} and γ = {w1, w2, . . . , wn}, and let Q be the change of coordinate matrix changing γ-coordinates into β-coordinates. Then, for any H ∈ B(V), we have ψγ(H) = Qtψβ(H)Q. Therefore ψγ(H) is congruent to ψβ(H).

Proof. There are essentially two proofs of this theorem. One involves a direct computation, while the other follows immediately from a clever observation. We give the more direct proof here, leaving the other proof for the exercises (see Exercise 13).


Suppose that A = ψβ(H) and B = ψγ(H). Then for 1 ≤ i, j ≤ n,

w_i = \sum_{k=1}^{n} Q_{ki} v_k \quad \text{and} \quad w_j = \sum_{r=1}^{n} Q_{rj} v_r.

Thus

B_{ij} = H(w_i, w_j) = H\!\left(\sum_{k=1}^{n} Q_{ki} v_k,\; w_j\right)
= \sum_{k=1}^{n} Q_{ki} H(v_k, w_j)
= \sum_{k=1}^{n} Q_{ki} H\!\left(v_k,\; \sum_{r=1}^{n} Q_{rj} v_r\right)
= \sum_{k=1}^{n} Q_{ki} \sum_{r=1}^{n} Q_{rj} H(v_k, v_r)
= \sum_{k=1}^{n} Q_{ki} \sum_{r=1}^{n} Q_{rj} A_{kr}
= \sum_{k=1}^{n} Q_{ki} \sum_{r=1}^{n} A_{kr} Q_{rj}
= \sum_{k=1}^{n} Q_{ki} (AQ)_{kj}
= \sum_{k=1}^{n} Q^t_{ik} (AQ)_{kj} = (Q^t A Q)_{ij}.

Hence B = QtAQ.

The following result is the converse of Theorem 6.33.

Corollary. Let V be an n-dimensional vector space with ordered basis β, and let H be a bilinear form on V. For any n × n matrix B, if B is congruent to ψβ(H), then there exists an ordered basis γ for V such that ψγ(H) = B. Furthermore, if B = Qtψβ(H)Q for some invertible matrix Q, then Q changes γ-coordinates into β-coordinates.

Proof. Suppose that B = Qtψβ(H)Q for some invertible matrix Q and that β = {v1, v2, . . . , vn}. Let γ = {w1, w2, . . . , wn}, where

w_j = \sum_{i=1}^{n} Q_{ij} v_i \quad \text{for } 1 \le j \le n.


Since Q is invertible, γ is an ordered basis for V, and Q is the change of coordinate matrix that changes γ-coordinates into β-coordinates. Therefore, by Theorem 6.33,

B = Qtψβ(H)Q = ψγ(H).

Symmetric Bilinear Forms

Like the diagonalization problem for linear operators, there is an analogous diagonalization problem for bilinear forms, namely, the problem of determining those bilinear forms for which there are diagonal matrix representations. As we will see, there is a close relationship between diagonalizable bilinear forms and those that are called symmetric.

Definition. A bilinear form H on a vector space V is symmetric if H(x, y) = H(y, x) for all x, y ∈ V.

As the name suggests, symmetric bilinear forms correspond to symmetric matrices.

Theorem 6.34. Let H be a bilinear form on a finite-dimensional vector space V, and let β be an ordered basis for V. Then H is symmetric if and only if ψβ(H) is symmetric.

Proof. Let β = {v1, v2, . . . , vn} and B = ψβ(H).

First assume that H is symmetric. Then for 1 ≤ i, j ≤ n,

Bij = H(vi, vj) = H(vj, vi) = Bji,

and it follows that B is symmetric.

Conversely, suppose that B is symmetric. Let J : V × V → F, where F is the field of scalars for V, be the mapping defined by J(x, y) = H(y, x) for all x, y ∈ V. By property 4 on page 423, J is a bilinear form. Let C = ψβ(J). Then, for 1 ≤ i, j ≤ n,

Cij = J(vi, vj) = H(vj, vi) = Bji = Bij.

Thus C = B. Since ψβ is one-to-one, we have J = H. Hence H(y, x) = J(x, y) = H(x, y) for all x, y ∈ V, and therefore H is symmetric.

Definition. A bilinear form H on a finite-dimensional vector space V is called diagonalizable if there is an ordered basis β for V such that ψβ(H) is a diagonal matrix.

Corollary. Let H be a diagonalizable bilinear form on a finite-dimensional vector space V. Then H is symmetric.


Proof. Suppose that H is diagonalizable. Then there is an ordered basis β for V such that ψβ(H) = D is a diagonal matrix. Trivially, D is a symmetric matrix, and hence, by Theorem 6.34, H is symmetric.

Unfortunately, the converse is not true, as is illustrated by the following example.

Example 5

Let F = Z2, V = F2, and H : V × V → F be the bilinear form defined by

H\!\left(\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\right) = a_1b_2 + a_2b_1.

Clearly H is symmetric. In fact, if β is the standard ordered basis for V, then

A = ψβ(H) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},

a symmetric matrix. We show that H is not diagonalizable.

By way of contradiction, suppose that H is diagonalizable. Then there is an ordered basis γ for V such that B = ψγ(H) is a diagonal matrix. So by Theorem 6.33, there exists an invertible matrix Q such that B = QtAQ. Since Q is invertible, it follows that rank(B) = rank(A) = 2, and consequently the diagonal entries of B are nonzero. Since the only nonzero scalar of F is 1,

B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.

Suppose that

Q = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.

Then

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = B = Q^tAQ = \begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} ac + ac & bc + ad \\ bc + ad & bd + bd \end{pmatrix}.

But p + p = 0 for all p ∈ F; hence ac + ac = 0. Thus, comparing the row 1, column 1 entries of the matrices in the equation above, we conclude that 1 = 0, a contradiction. Therefore H is not diagonalizable. ♦

The bilinear form of Example 5 is an anomaly. Its failure to be diagonalizable is due to the fact that the scalar field Z2 is of characteristic two. Recall


from Appendix C that a field F is of characteristic two if 1 + 1 = 0 in F. If F is not of characteristic two, then 1 + 1 = 2 has a multiplicative inverse, which we denote by 1/2.

Before proving the converse of the corollary to Theorem 6.34 for scalar fields that are not of characteristic two, we establish the following lemma.

Lemma. Let H be a nonzero symmetric bilinear form on a vector space V over a field F not of characteristic two. Then there is a vector x in V such that H(x, x) ≠ 0.

Proof. Since H is nonzero, we can choose vectors u, v ∈ V such that H(u, v) ≠ 0. If H(u, u) ≠ 0 or H(v, v) ≠ 0, there is nothing to prove. Otherwise, set x = u + v. Then

H(x, x) = H(u, u) + H(u, v) + H(v, u) + H(v, v) = 2H(u, v) ≠ 0

because 2 ≠ 0 and H(u, v) ≠ 0.

Theorem 6.35. Let V be a finite-dimensional vector space over a field F not of characteristic two. Then every symmetric bilinear form on V is diagonalizable.

Proof. We use mathematical induction on n = dim(V). If n = 1, then every element of B(V) is diagonalizable. Now suppose that the theorem is valid for vector spaces of dimension less than n for some fixed integer n > 1, and suppose that dim(V) = n. If H is the zero bilinear form on V, then trivially H is diagonalizable; so suppose that H is a nonzero symmetric bilinear form on V. By the lemma, there exists a nonzero vector x in V such that H(x, x) ≠ 0. Recall the function Lx : V → F defined by Lx(y) = H(x, y) for all y ∈ V. By property 1 on page 423, Lx is linear. Furthermore, since Lx(x) = H(x, x) ≠ 0, Lx is nonzero. Consequently, rank(Lx) = 1, and hence dim(N(Lx)) = n − 1.

The restriction of H to N(Lx) is obviously a symmetric bilinear form on a vector space of dimension n − 1. Thus, by the induction hypothesis, there exists an ordered basis {v1, v2, . . . , vn−1} for N(Lx) such that H(vi, vj) = 0 for i ≠ j (1 ≤ i, j ≤ n − 1). Set vn = x. Then vn ∉ N(Lx), and so β = {v1, v2, . . . , vn} is an ordered basis for V. In addition, H(vi, vn) = H(vn, vi) = 0 for i = 1, 2, . . . , n − 1. We conclude that ψβ(H) is a diagonal matrix, and therefore H is diagonalizable.

Corollary. Let F be a field that is not of characteristic two. If A ∈ Mn×n(F) is a symmetric matrix, then A is congruent to a diagonal matrix.

Proof. Exercise.


Diagonalization of Symmetric Matrices

Let A be a symmetric n × n matrix with entries from a field F not of characteristic two. By the corollary to Theorem 6.35, there are matrices Q, D ∈ Mn×n(F) such that Q is invertible, D is diagonal, and QtAQ = D. We now give a method for computing Q and D. This method requires familiarity with elementary matrices and their properties, which the reader may wish to review in Section 3.1.

If E is an elementary n × n matrix, then AE can be obtained by performing an elementary column operation on A. By Exercise 21, EtA can be obtained by performing the same operation on the rows of A rather than on its columns. Thus EtAE can be obtained from A by performing an elementary operation on the columns of A and then performing the same operation on the rows of AE. (Note that the order of the operations can be reversed because of the associative property of matrix multiplication.) Suppose that Q is an invertible matrix and D is a diagonal matrix such that QtAQ = D. By Corollary 3 to Theorem 3.6 (p. 159), Q is a product of elementary matrices, say Q = E1E2 · · · Ek. Thus

D = Q^tAQ = E_k^t E_{k-1}^t \cdots E_1^t A E_1 E_2 \cdots E_k.

From the preceding equation, we conclude that by means of several elementary column operations and the corresponding row operations, A can be transformed into a diagonal matrix D. Furthermore, if E1, E2, . . . , Ek are the elementary matrices corresponding to these elementary column operations indexed in the order performed, and if Q = E1E2 · · · Ek, then QtAQ = D.

Example 6

Let A be the symmetric matrix in M3×3(R) defined by

A = \begin{pmatrix} 1 & -1 & 3 \\ -1 & 2 & 1 \\ 3 & 1 & 1 \end{pmatrix}.

We use the procedure just described to find an invertible matrix Q and a diagonal matrix D such that QtAQ = D.

We begin by eliminating all of the nonzero entries in the first row and first column except for the entry in column 1 and row 1. To this end, we add the first column of A to the second column to produce a zero in row 1 and column 2. The elementary matrix that corresponds to this elementary column operation is

E_1 = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

We perform the corresponding elementary operation on the rows of AE1 to obtain

E_1^t A E_1 = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 4 \\ 3 & 4 & 1 \end{pmatrix}.

We now use the first column of E1tAE1 to eliminate the 3 in row 1 column 3, and follow this operation with the corresponding row operation. The corresponding elementary matrix E2 and the result of the elementary operations E2tE1tAE1E2 are, respectively,

E_2 = \begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad E_2^t E_1^t A E_1 E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 4 \\ 0 & 4 & -8 \end{pmatrix}.

Finally, we subtract 4 times the second column of E2tE1tAE1E2 from the third column and follow this with the corresponding row operation. The corresponding elementary matrix E3 and the result of the elementary operations E3tE2tE1tAE1E2E3 are, respectively,

E_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad E_3^t E_2^t E_1^t A E_1 E_2 E_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -24 \end{pmatrix}.

Since we have obtained a diagonal matrix, the process is complete. So we let

Q = E_1 E_2 E_3 = \begin{pmatrix} 1 & 1 & -7 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -24 \end{pmatrix}

to obtain the desired diagonalization QtAQ = D. ♦

The reader should justify the following method for computing Q without recording each elementary matrix separately. The method is inspired by the algorithm for computing the inverse of a matrix developed in Section 3.2. We use a sequence of elementary column operations and corresponding row operations to change the n × 2n matrix (A|I) into the form (D|B), where D is a diagonal matrix and B = Qt. It then follows that D = QtAQ.

Starting with the matrix A of the preceding example, this method produces the following sequence of matrices:

(A|I) = \left(\begin{array}{rrr|rrr} 1 & -1 & 3 & 1 & 0 & 0 \\ -1 & 2 & 1 & 0 & 1 & 0 \\ 3 & 1 & 1 & 0 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{rrr|rrr} 1 & 0 & 3 & 1 & 0 & 0 \\ -1 & 1 & 1 & 0 & 1 & 0 \\ 3 & 4 & 1 & 0 & 0 & 1 \end{array}\right)

\longrightarrow \left(\begin{array}{rrr|rrr} 1 & 0 & 3 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 1 & 0 \\ 3 & 4 & 1 & 0 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 1 & 0 \\ 3 & 4 & -8 & 0 & 0 & 1 \end{array}\right)

\longrightarrow \left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 1 & 0 \\ 0 & 4 & -8 & -3 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 4 & -24 & -3 & 0 & 1 \end{array}\right)

\longrightarrow \left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & -24 & -7 & -4 & 1 \end{array}\right) = (D|Q^t).

Therefore

D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -24 \end{pmatrix}, \quad Q^t = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ -7 & -4 & 1 \end{pmatrix}, \quad \text{and} \quad Q = \begin{pmatrix} 1 & 1 & -7 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix}.

Quadratic Forms

Associated with symmetric bilinear forms are functions called quadratic forms.

Definition. Let V be a vector space over F. A function K : V → F is called a quadratic form if there exists a symmetric bilinear form H ∈ B(V) such that

K(x) = H(x, x) for all x ∈ V. (8)

If the field F is not of characteristic two, there is a one-to-one correspondence between symmetric bilinear forms and quadratic forms given by (8). In fact, if K is a quadratic form on a vector space V over a field F not of characteristic two, and K(x) = H(x, x) for some symmetric bilinear form H on V, then we can recover H from K because

H(x, y) = \frac{1}{2}\left[K(x + y) - K(x) - K(y)\right]. \qquad (9)

(See Exercise 16.)

Example 7

The classic example of a quadratic form is the homogeneous second-degree polynomial of several variables. Given the variables t1, t2, . . . , tn that take values in a field F not of characteristic two and given (not necessarily distinct) scalars aij (1 ≤ i ≤ j ≤ n), define the polynomial

f(t_1, t_2, \ldots, t_n) = \sum_{i \le j} a_{ij}\, t_i t_j.


Any such polynomial is a quadratic form. In fact, if β is the standard ordered basis for Fn, then the symmetric bilinear form H corresponding to the quadratic form f has the matrix representation ψβ(H) = A, where

A_{ij} = A_{ji} = \begin{cases} a_{ii} & \text{if } i = j \\ \tfrac{1}{2}a_{ij} & \text{if } i \neq j. \end{cases}

To see this, apply (9) to obtain H(ei, ej) = Aij from the quadratic form K, and verify that f is computable from H by (8) using f in place of K.

For example, given the polynomial

f(t_1, t_2, t_3) = 2t_1^2 - t_2^2 + 6t_1t_2 - 4t_2t_3

with real coefficients, let

A = \begin{pmatrix} 2 & 3 & 0 \\ 3 & -1 & -2 \\ 0 & -2 & 0 \end{pmatrix}.

Setting H(x, y) = xtAy for all x, y ∈ R3, we see that

f(t_1, t_2, t_3) = (t_1, t_2, t_3)\, A \begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} \quad \text{for } \begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} \in R^3. ♦
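A quick sketch (added here; not part of the text) builds this symmetric matrix from the coefficients of the polynomial, splitting each off-diagonal coefficient in half as described above, and spot-checks that f(t) = tᵗAt at a few points.

import numpy as np

# Coefficients a_ij of f(t1,t2,t3) = 2t1^2 - t2^2 + 6t1t2 - 4t2t3, indexed from 0.
coeffs = {(0, 0): 2.0, (1, 1): -1.0, (0, 1): 6.0, (1, 2): -4.0}

A = np.zeros((3, 3))
for (i, j), a in coeffs.items():
    if i == j:
        A[i, i] = a
    else:
        A[i, j] = A[j, i] = a / 2.0      # off-diagonal coefficients are halved
print(A)                                 # [[ 2.  3.  0.] [ 3. -1. -2.] [ 0. -2.  0.]]

f = lambda t: 2*t[0]**2 - t[1]**2 + 6*t[0]*t[1] - 4*t[1]*t[2]
for t in (np.array([1.0, 2.0, 3.0]), np.array([-1.0, 0.5, 2.0])):
    print(np.isclose(f(t), t @ A @ t))   # True, True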

Quadratic Forms Over the Field R

Since symmetric matrices over R are orthogonally diagonalizable (see Theorem 6.20, p. 384), the theory of symmetric bilinear forms and quadratic forms on finite-dimensional vector spaces over R is especially nice. The following theorem and its corollary are useful.

Theorem 6.36. Let V be a finite-dimensional real inner product space, and let H be a symmetric bilinear form on V. Then there exists an orthonormal basis β for V such that ψβ(H) is a diagonal matrix.

Proof. Choose any orthonormal basis γ = {v1, v2, . . . , vn} for V, and let A = ψγ(H). Since A is symmetric, there exists an orthogonal matrix Q and a diagonal matrix D such that D = QtAQ by Theorem 6.20. Let β = {w1, w2, . . . , wn} be defined by

w_j = \sum_{i=1}^{n} Q_{ij} v_i \quad \text{for } 1 \le j \le n.

By Theorem 6.33, ψβ(H) = D. Furthermore, since Q is orthogonal and γ is orthonormal, β is orthonormal by Exercise 30 of Section 6.5.


Corollary. Let K be a quadratic form on a finite-dimensional real inner product space V. There exists an orthonormal basis β = {v1, v2, . . . , vn} for V and scalars λ1, λ2, . . . , λn (not necessarily distinct) such that if x ∈ V and

x = \sum_{i=1}^{n} s_i v_i, \quad s_i \in R,

then

K(x) = \sum_{i=1}^{n} \lambda_i s_i^2.

In fact, if H is the symmetric bilinear form determined by K, then β can be chosen to be any orthonormal basis for V such that ψβ(H) is a diagonal matrix.

Proof. Let H be the symmetric bilinear form for which K(x) = H(x, x) for all x ∈ V. By Theorem 6.36, there exists an orthonormal basis β = {v1, v2, . . . , vn} for V such that ψβ(H) is the diagonal matrix

D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.

Let x ∈ V, and suppose that x = \sum_{i=1}^{n} s_i v_i. Then

K(x) = H(x, x) = [\phi_\beta(x)]^t D\,[\phi_\beta(x)] = (s_1, s_2, \ldots, s_n)\, D \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{pmatrix} = \sum_{i=1}^{n} \lambda_i s_i^2.

Example 8

For the homogeneous real polynomial of degree 2 defined by

f(t_1, t_2) = 5t_1^2 + 2t_2^2 + 4t_1t_2, \qquad (10)

we find an orthonormal basis γ = {x1, x2} for R2 and scalars λ1 and λ2 such that if

\begin{pmatrix} t_1 \\ t_2 \end{pmatrix} \in R^2 \quad \text{and} \quad \begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = s_1 x_1 + s_2 x_2,

then f(t1, t2) = λ1s1² + λ2s2². We can think of s1 and s2 as the coordinates of (t1, t2) relative to γ. Thus the polynomial f(t1, t2), as an expression involving the coordinates of a point with respect to the standard ordered basis for R2, is transformed into a new polynomial g(s1, s2) = λ1s1² + λ2s2² interpreted as an expression involving the coordinates of a point relative to the new ordered basis γ.

Let H denote the symmetric bilinear form corresponding to the quadratic form defined by (10), let β be the standard ordered basis for R2, and let A = ψβ(H). Then

A = ψβ(H) = \begin{pmatrix} 5 & 2 \\ 2 & 2 \end{pmatrix}.

Next, we find an orthogonal matrix Q such that QtAQ is a diagonal matrix. For this purpose, observe that λ1 = 6 and λ2 = 1 are the eigenvalues of A with corresponding orthonormal eigenvectors

v_1 = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ 1 \end{pmatrix} \quad \text{and} \quad v_2 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ -2 \end{pmatrix}.

Let γ = {v1, v2}. Then γ is an orthonormal basis for R2 consisting of eigenvectors of A. Hence, setting

Q = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & 1 \\ 1 & -2 \end{pmatrix},

we see that Q is an orthogonal matrix and

Q^tAQ = \begin{pmatrix} 6 & 0 \\ 0 & 1 \end{pmatrix}.

Clearly Q is also a change of coordinate matrix. Consequently,

ψγ(H) = Q^tψβ(H)Q = Q^tAQ = \begin{pmatrix} 6 & 0 \\ 0 & 1 \end{pmatrix}.

Thus by the corollary to Theorem 6.36,

K(x) = 6s_1^2 + s_2^2

for any x = s1v1 + s2v2 ∈ R2. So g(s1, s2) = 6s1² + s2². ♦

The next example illustrates how the theory of quadratic forms can be

applied to the problem of describing quadratic surfaces in R3.

Example 9

Let S be the surface in R3 defined by the equation

2t_1^2 + 6t_1t_2 + 5t_2^2 - 2t_2t_3 + 2t_3^2 + 3t_1 - 2t_2 - t_3 + 14 = 0. \qquad (11)


Then (11) describes the points of S in terms of their coordinates relative to β, the standard ordered basis for R3. We find a new orthonormal basis γ for R3 so that the equation describing the coordinates of S relative to γ is simpler than (11).

We begin with the observation that the terms of second degree on the left side of (11) add to form a quadratic form K on R3:

K\begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} = 2t_1^2 + 6t_1t_2 + 5t_2^2 - 2t_2t_3 + 2t_3^2.

Next, we diagonalize K. Let H be the symmetric bilinear form corresponding to K, and let A = ψβ(H). Then

A = \begin{pmatrix} 2 & 3 & 0 \\ 3 & 5 & -1 \\ 0 & -1 & 2 \end{pmatrix}.

The characteristic polynomial of A is (−1)(t − 2)(t − 7)t; hence A has the eigenvalues λ1 = 2, λ2 = 7, and λ3 = 0. Corresponding unit eigenvectors are

v_1 = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}, \quad v_2 = \frac{1}{\sqrt{35}}\begin{pmatrix} 3 \\ 5 \\ -1 \end{pmatrix}, \quad \text{and} \quad v_3 = \frac{1}{\sqrt{14}}\begin{pmatrix} -3 \\ 2 \\ 1 \end{pmatrix}.

Set γ = {v1, v2, v3} and

Q = \begin{pmatrix} \dfrac{1}{\sqrt{10}} & \dfrac{3}{\sqrt{35}} & \dfrac{-3}{\sqrt{14}} \\[6pt] 0 & \dfrac{5}{\sqrt{35}} & \dfrac{2}{\sqrt{14}} \\[6pt] \dfrac{3}{\sqrt{10}} & \dfrac{-1}{\sqrt{35}} & \dfrac{1}{\sqrt{14}} \end{pmatrix}.

As in Example 8, Q is a change of coordinate matrix changing γ-coordinates to β-coordinates, and

ψγ(H) = Q^tψβ(H)Q = Q^tAQ = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

By the corollary to Theorem 6.36, if x = s1v1 + s2v2 + s3v3, then

K(x) = 2s_1^2 + 7s_2^2. \qquad (12)


[Figure 6.7: the surface S sketched relative to axes x′, y′, and z′ in the directions of v1, v2, and v3.]

We are now ready to transform (11) into an equation involving coordinates relative to γ. Let x = (t1, t2, t3) ∈ R3, and suppose that x = s1v1 + s2v2 + s3v3. Then, by Theorem 2.22 (p. 111),

x = \begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} = Q\begin{pmatrix} s_1 \\ s_2 \\ s_3 \end{pmatrix},

and therefore

t_1 = \frac{s_1}{\sqrt{10}} + \frac{3s_2}{\sqrt{35}} - \frac{3s_3}{\sqrt{14}}, \qquad
t_2 = \frac{5s_2}{\sqrt{35}} + \frac{2s_3}{\sqrt{14}}, \qquad
t_3 = \frac{3s_1}{\sqrt{10}} - \frac{s_2}{\sqrt{35}} + \frac{s_3}{\sqrt{14}}.

Page 454: Linear Algebra - cloudflare-ipfs.com

Sec. 6.8 Bilinear and Quadratic Forms 439

Thus

3t_1 - 2t_2 - t_3 = \frac{-14 s_3}{\sqrt{14}} = -\sqrt{14}\, s_3.

Combining (11), (12), and the preceding equation, we conclude that if x ∈ R3 and x = s1v1 + s2v2 + s3v3, then x ∈ S if and only if

2s_1^2 + 7s_2^2 - \sqrt{14}\, s_3 + 14 = 0 \quad \text{or} \quad s_3 = \frac{\sqrt{14}}{7} s_1^2 + \frac{\sqrt{14}}{2} s_2^2 + \sqrt{14}.

Consequently, if we draw new axes x′, y′, and z′ in the directions of v1, v2, and v3, respectively, the graph of the equation, rewritten as

z' = \frac{\sqrt{14}}{7}(x')^2 + \frac{\sqrt{14}}{2}(y')^2 + \sqrt{14},

coincides with the surface S. We recognize S to be an elliptic paraboloid.

Figure 6.7 is a sketch of the surface S drawn so that the vectors v1, v2, and v3 are oriented to lie in the principal directions. For practical purposes, the scale of the z′ axis has been adjusted so that the figure fits the page. ♦
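The diagonalization used in this example can be checked numerically; the sketch below (added for illustration, not part of the text) uses numpy.linalg.eigh, which returns the eigenvalues of a symmetric matrix in ascending order together with orthonormal eigenvectors as the columns of Q, possibly differing in sign from the vectors chosen above.

import numpy as np

A = np.array([[2.0, 3.0, 0.0],
              [3.0, 5.0, -1.0],
              [0.0, -1.0, 2.0]])

eigvals, Q = np.linalg.eigh(A)
print(eigvals)                                       # approximately [0, 2, 7]
print(np.allclose(Q.T @ A @ Q, np.diag(eigvals)))    # True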

The Second Derivative Test for Functions of Several Variables

We now consider an application of the theory of quadratic forms to multivariable calculus—the derivation of the second derivative test for local extrema of a function of several variables. We assume an acquaintance with the calculus of functions of several variables to the extent of Taylor's theorem. The reader is undoubtedly familiar with the one-variable version of Taylor's theorem. For a statement and proof of the multivariable version, consult, for example, An Introduction to Analysis, 2d ed., by William R. Wade (Prentice Hall, Upper Saddle River, N.J., 2000).

Let z = f(t1, t2, . . . , tn) be a fixed real-valued function of n real variables for which all third-order partial derivatives exist and are continuous. The function f is said to have a local maximum at a point p ∈ Rn if there exists a δ > 0 such that f(p) ≥ f(x) whenever ||x − p|| < δ. Likewise, f has a local minimum at p ∈ Rn if there exists a δ > 0 such that f(p) ≤ f(x) whenever ||x − p|| < δ. If f has either a local minimum or a local maximum at p, we say that f has a local extremum at p. A point p ∈ Rn is called a critical point of f if ∂f(p)/∂ti = 0 for i = 1, 2, . . . , n. It is a well-known fact that if f has a local extremum at a point p ∈ Rn, then p is a critical point of f. For, if f has a local extremum at p = (p1, p2, . . . , pn), then for any i = 1, 2, . . . , n the


function φi defined by φi(t) = f(p1, p2, . . . , pi−1, t, pi+1, . . . , pn) has a local extremum at t = pi. So, by an elementary single-variable argument,

\frac{\partial f(p)}{\partial t_i} = \frac{d\varphi_i(p_i)}{dt} = 0.

Thus p is a critical point of f. But critical points are not necessarily local extrema.

The second-order partial derivatives of f at a critical point p can often be used to test for a local extremum at p. These partials determine a matrix A(p) in which the row i, column j entry is

\frac{\partial^2 f(p)}{(\partial t_i)(\partial t_j)}.

This matrix is called the Hessian matrix of f at p. Note that if the third-order partial derivatives of f are continuous, then the mixed second-order partials of f at p are independent of the order in which they are taken, and hence A(p) is a symmetric matrix. In this case, all of the eigenvalues of A(p) are real.

Theorem 6.37 (The Second Derivative Test). Let f(t1, t2, . . . , tn) be a real-valued function in n real variables for which all third-order partial derivatives exist and are continuous. Let p = (p1, p2, . . . , pn) be a critical point of f, and let A(p) be the Hessian of f at p.
(a) If all eigenvalues of A(p) are positive, then f has a local minimum at p.
(b) If all eigenvalues of A(p) are negative, then f has a local maximum at p.
(c) If A(p) has at least one positive and at least one negative eigenvalue, then f has no local extremum at p (p is called a saddle-point of f).
(d) If rank(A(p)) < n and A(p) does not have both positive and negative eigenvalues, then the second derivative test is inconclusive.

Proof. If p ≠ 0, we may define a function g : Rn → R by

g(t_1, t_2, \ldots, t_n) = f(t_1 + p_1, t_2 + p_2, \ldots, t_n + p_n) - f(p).

The following facts are easily verified.

1. The function f has a local maximum [minimum] at p if and only if g has a local maximum [minimum] at 0 = (0, 0, . . . , 0).
2. The partial derivatives of g at 0 are equal to the corresponding partial derivatives of f at p.
3. 0 is a critical point of g.
4. A_{ij}(p) = \dfrac{\partial^2 g(0)}{(\partial t_i)(\partial t_j)} for all i and j.

Page 456: Linear Algebra - cloudflare-ipfs.com

Sec. 6.8 Bilinear and Quadratic Forms 441

In view of these facts, we may assume without loss of generality that p = 0 and f(p) = 0.

Now we apply Taylor's theorem to f to obtain the second-order approximation of f around 0. We have

f(t_1, t_2, \ldots, t_n) = f(0) + \sum_{i=1}^{n} \frac{\partial f(0)}{\partial t_i}\, t_i + \frac{1}{2}\sum_{i,j=1}^{n} \frac{\partial^2 f(0)}{(\partial t_i)(\partial t_j)}\, t_i t_j + S(t_1, t_2, \ldots, t_n)

= \frac{1}{2}\sum_{i,j=1}^{n} \frac{\partial^2 f(0)}{(\partial t_i)(\partial t_j)}\, t_i t_j + S(t_1, t_2, \ldots, t_n), \qquad (13)

where S is a real-valued function on Rn such that

\lim_{x \to 0} \frac{S(x)}{\|x\|^2} = \lim_{(t_1, t_2, \ldots, t_n) \to 0} \frac{S(t_1, t_2, \ldots, t_n)}{t_1^2 + t_2^2 + \cdots + t_n^2} = 0. \qquad (14)

Let K : Rn → R be the quadratic form defined by

K\begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix} = \frac{1}{2}\sum_{i,j=1}^{n} \frac{\partial^2 f(0)}{(\partial t_i)(\partial t_j)}\, t_i t_j, \qquad (15)

H be the symmetric bilinear form corresponding to K, and β be the standard ordered basis for Rn. It is easy to verify that ψβ(H) = ½A(p). Since A(p) is symmetric, Theorem 6.20 (p. 384) implies that there exists an orthogonal matrix Q such that

Q^tA(p)Q = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}

is a diagonal matrix whose diagonal entries are the eigenvalues of A(p). Let γ = {v1, v2, . . . , vn} be the orthonormal basis for Rn whose ith vector is the ith column of Q. Then Q is the change of coordinate matrix changing γ-coordinates into β-coordinates, and by Theorem 6.33,

ψγ(H) = Q^tψβ(H)Q = \frac{1}{2}Q^tA(p)Q = \begin{pmatrix} \dfrac{\lambda_1}{2} & 0 & \cdots & 0 \\ 0 & \dfrac{\lambda_2}{2} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \dfrac{\lambda_n}{2} \end{pmatrix}.


Suppose that A(p) is not the zero matrix. Then A(p) has nonzero eigenvalues. Choose ε > 0 such that ε < |λi|/2 for all λi ≠ 0. By (14), there exists δ > 0 such that for any x ∈ Rn satisfying 0 < ||x|| < δ, we have |S(x)| < ε||x||². Consider any x ∈ Rn such that 0 < ||x|| < δ. Then, by (13) and (15),

|f(x) − K(x)| = |S(x)| < ε||x||²,

and hence

K(x) − ε||x||² < f(x) < K(x) + ε||x||². (16)

Suppose that x = \sum_{i=1}^{n} s_i v_i. Then

\|x\|^2 = \sum_{i=1}^{n} s_i^2 \quad \text{and} \quad K(x) = \frac{1}{2}\sum_{i=1}^{n} \lambda_i s_i^2.

Combining these equations with (16), we obtain

\sum_{i=1}^{n} \left(\tfrac{1}{2}\lambda_i - \varepsilon\right) s_i^2 < f(x) < \sum_{i=1}^{n} \left(\tfrac{1}{2}\lambda_i + \varepsilon\right) s_i^2. \qquad (17)

Now suppose that all eigenvalues of A(p) are positive. Then ½λi − ε > 0 for all i, and hence, by the left inequality in (17),

f(0) = 0 \le \sum_{i=1}^{n} \left(\tfrac{1}{2}\lambda_i - \varepsilon\right) s_i^2 < f(x).

Thus f(0) ≤ f(x) for ||x|| < δ, and so f has a local minimum at 0. By a similar argument using the right inequality in (17), we have that if all of the eigenvalues of A(p) are negative, then f has a local maximum at 0. This establishes (a) and (b) of the theorem.

Next, suppose that A(p) has both a positive and a negative eigenvalue, say, λi > 0 and λj < 0 for some i and j. Then ½λi − ε > 0 and ½λj + ε < 0. Let s be any real number such that 0 < |s| < δ. Substituting x = svi and x = svj into the left inequality and the right inequality of (17), respectively, we obtain

f(0) = 0 < \left(\tfrac{1}{2}\lambda_i - \varepsilon\right)s^2 < f(sv_i) \quad \text{and} \quad f(sv_j) < \left(\tfrac{1}{2}\lambda_j + \varepsilon\right)s^2 < 0 = f(0).

Thus f attains both positive and negative values arbitrarily close to 0; so f has neither a local maximum nor a local minimum at 0. This establishes (c).


To show that the second derivative test is inconclusive under the conditions stated in (d), consider the functions

f(t_1, t_2) = t_1^2 - t_2^4 \quad \text{and} \quad g(t_1, t_2) = t_1^2 + t_2^4

at p = 0. In both cases, the function has a critical point at p, and

A(p) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}.

However, f does not have a local extremum at 0, whereas g has a local minimum at 0.
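In practice the test reduces to inspecting the signs of the Hessian's eigenvalues. The sketch below (our own illustrative example; the function and critical point are not taken from the chapter) classifies the critical point (0, 0) of f(t1, t2) = t1² + 3t1t2 + t2².

import numpy as np

def classify(hessian, tol=1e-10):
    """Classify a critical point from the eigenvalues of its Hessian."""
    eig = np.linalg.eigvalsh(hessian)
    if np.all(eig > tol):
        return "local minimum"
    if np.all(eig < -tol):
        return "local maximum"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point"
    return "test inconclusive"

# Hessian of f(t1,t2) = t1^2 + 3 t1 t2 + t2^2 at its critical point (0, 0).
H = np.array([[2.0, 3.0],
              [3.0, 2.0]])
print(classify(H))          # saddle point (eigenvalues 5 and -1)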

Sylvester’s Law of Inertia

Any two matrix representations of a bilinear form have the same rank because rank is preserved under congruence. We can therefore define the rank of a bilinear form to be the rank of any of its matrix representations. If a matrix representation is a diagonal matrix, then the rank is equal to the number of nonzero diagonal entries of the matrix.

We confine our analysis to symmetric bilinear forms on finite-dimensional real vector spaces. Each such form has a diagonal matrix representation in which the diagonal entries may be positive, negative, or zero. Although these entries are not unique, we show that the number of entries that are positive and the number that are negative are unique. That is, they are independent of the choice of diagonal representation. This result is called Sylvester's law of inertia. We prove the law and apply it to describe the equivalence classes of congruent symmetric real matrices.

Theorem 6.38 (Sylvester's Law of Inertia). Let H be a symmetric bilinear form on a finite-dimensional real vector space V. Then the number of positive diagonal entries and the number of negative diagonal entries in any diagonal matrix representation of H are each independent of the diagonal representation.

Proof. Suppose that β and γ are ordered bases for V that determine diagonal representations of H. Without loss of generality, we may assume that β and γ are ordered so that on each diagonal the entries are in the order of positive, negative, and zero. It suffices to show that both representations have the same number of positive entries because the number of negative entries is equal to the difference between the rank and the number of positive entries. Let p and q be the number of positive diagonal entries in the matrix representations of H with respect to β and γ, respectively. We suppose that p ≠ q and arrive at a contradiction. Without loss of generality, assume that p < q. Let

β = {v1, v2, . . . , vp, . . . , vr, . . . , vn} and γ = {w1, w2, . . . , wq, . . . , wr, . . . , wn},


where r is the rank of H and n = dim(V). Let L : V → R^{p+r−q} be the mapping defined by

L(x) = (H(x, v1), H(x, v2), . . . , H(x, vp), H(x, wq+1), . . . , H(x, wr)).

It is easily verified that L is linear and rank(L) ≤ p + r − q. Hence

nullity(L) ≥ n − (p + r − q) > n − r.

So there exists a nonzero vector v0 such that v0 ∉ span({vr+1, vr+2, . . . , vn}), but v0 ∈ N(L). Since v0 ∈ N(L), it follows that H(v0, vi) = 0 for i ≤ p and H(v0, wi) = 0 for q < i ≤ r. Suppose that

v_0 = \sum_{j=1}^{n} a_j v_j = \sum_{j=1}^{n} b_j w_j.

For any i ≤ p,

H(v_0, v_i) = H\!\left(\sum_{j=1}^{n} a_j v_j,\; v_i\right) = \sum_{j=1}^{n} a_j H(v_j, v_i) = a_i H(v_i, v_i).

But for i ≤ p, we have H(vi, vi) > 0 and H(v0, vi) = 0, so that ai = 0. Similarly, bi = 0 for q + 1 ≤ i ≤ r. Since v0 is not in the span of {vr+1, vr+2, . . . , vn}, it follows that ai ≠ 0 for some p < i ≤ r. Thus

H(v_0, v_0) = H\!\left(\sum_{j=1}^{n} a_j v_j,\; \sum_{i=1}^{n} a_i v_i\right) = \sum_{j=1}^{n} a_j^2 H(v_j, v_j) = \sum_{j=p+1}^{r} a_j^2 H(v_j, v_j) < 0.

Furthermore,

H(v_0, v_0) = H\!\left(\sum_{j=1}^{n} b_j w_j,\; \sum_{i=1}^{n} b_i w_i\right) = \sum_{j=1}^{n} b_j^2 H(w_j, w_j) = \sum_{j=1}^{q} b_j^2 H(w_j, w_j) \ge 0.

So H(v0, v0) < 0 and H(v0, v0) ≥ 0, which is a contradiction. We conclude that p = q.

Definitions. The number of positive diagonal entries in a diagonal representation of a symmetric bilinear form on a real vector space is called the index of the form. The difference between the number of positive and the number of negative diagonal entries in a diagonal representation of a symmetric bilinear form is called the signature of the form. The three terms rank, index, and signature are called the invariants of the bilinear form because they are invariant with respect to matrix representations. These same terms apply to the associated quadratic form. Notice that the values of any two of these invariants determine the value of the third.


Example 10

The bilinear form corresponding to the quadratic form K of Example 9 hasa 3 × 3 diagonal matrix representation with diagonal entries of 2, 7, and 0.Therefore the rank, index, and signature of K are each 2. ♦Example 11

The matrix representation of the bilinear form corresponding to the quadraticform K(x, y) = x2 − y2 on R2 with respect to the standard ordered basis isthe diagonal matrix with diagonal entries of 1 and −1. Therefore the rank ofK is 2, the index of K is 1, and the signature of K is 0. ♦

Since the congruence relation is intimately associated with bilinear forms,we can apply Sylvester’s law of inertia to study this relation on the set of realsymmetric matrices. Let A be an n × n real symmetric matrix, and supposethat D and E are each diagonal matrices congruent to A. By Corollary 3to Theorem 6.32, A is the matrix representation of the bilinear form H onRn defined by H(x, y) = xtAy with respect to the standard ordered basis forRn. Therefore Sylvester’s law of inertia tells us that D and E have the samenumber of positive and negative diagonal entries. We can state this result asthe matrix version of Sylvester’s law.

Corollary 1 (Sylvester’s Law of Inertia for Matrices). Let A bea real symmetric matrix. Then the number of positive diagonal entries andthe number of negative diagonal entries in any diagonal matrix congruent toA is independent of the choice of the diagonal matrix.

Definitions. Let A be a real symmetric matrix, and let D be a diagonalmatrix that is congruent to A. The number of positive diagonal entries ofD is called the index of A. The difference between the number of positivediagonal entries and the number of negative diagonal entries of D is calledthe signature of A. As before, the rank, index, and signature of a matrixare called the invariants of the matrix, and the values of any two of theseinvariants determine the value of the third.

Any two of these invariants can be used to determine an equivalence classof congruent real symmetric matrices.

Corollary 2. Two real symmetric n × n matrices are congruent if and only if they have the same invariants.

Proof. If A and B are congruent n × n symmetric matrices, then they are both congruent to the same diagonal matrix, and it follows that they have the same invariants.

Conversely, suppose that A and B are n × n symmetric matrices with the same invariants. Let D and E be diagonal matrices congruent to A and B,


respectively, chosen so that the diagonal entries are in the order of positive, negative, and zero. (Exercise 23 allows us to do this.) Since A and B have the same invariants, so do D and E. Let p and r denote the index and the rank, respectively, of both D and E. Let di denote the ith diagonal entry of D, and let Q be the n × n diagonal matrix whose ith diagonal entry qi is given by

    qi = 1/√di       if 1 ≤ i ≤ p,
    qi = 1/√(−di)    if p < i ≤ r,
    qi = 1           if r < i.

Then QᵗDQ = Jpr, where

    Jpr = ( Ip      O       O )
          ( O    −I_{r−p}   O )
          ( O       O       O ).

It follows that A is congruent to Jpr. Similarly, B is congruent to Jpr, and hence A is congruent to B.

The matrix Jpr acts as a canonical form for the theory of real symmetric matrices. The next corollary, whose proof is contained in the proof of Corollary 2, describes the role of Jpr.

Corollary 3. A real symmetric n × n matrix A has index p and rank r if and only if A is congruent to Jpr (as just defined).
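As an informal illustration of the scaling step in the proof of Corollary 2, the following sketch (Python/NumPy, not part of the text; the helper names J and scaling_matrix are ours) builds Q from the entries of an ordered diagonal matrix D and checks that QᵗDQ = Jpr.

    import numpy as np

    def J(p, r, n):
        """The canonical form Jpr: p entries 1, then r - p entries -1, then zeros."""
        return np.diag([1.0] * p + [-1.0] * (r - p) + [0.0] * (n - r))

    def scaling_matrix(d, p, r):
        """Diagonal Q with q_i = 1/sqrt(d_i) for i <= p, 1/sqrt(-d_i) for p < i <= r, 1 otherwise."""
        q = []
        for i, di in enumerate(d):
            if i < p:
                q.append(1.0 / np.sqrt(di))
            elif i < r:
                q.append(1.0 / np.sqrt(-di))
            else:
                q.append(1.0)
        return np.diag(q)

    # A diagonal matrix with entries ordered positive, negative, zero (p = 2, r = 3, n = 4):
    D = np.diag([4.0, 9.0, -16.0, 0.0])
    Q = scaling_matrix(np.diag(D), p=2, r=3)
    print(np.allclose(Q.T @ D @ Q, J(2, 3, 4)))   # True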

Example 12

Let

    A = (  1  −1   3 )
        ( −1   2   1 ),
        (  3   1   1 )

    B = ( 1  2  1 )
        ( 2  3  2 ),
        ( 1  2  1 )

    and C = ( 1  0  1 )
            ( 0  1  2 )
            ( 1  2  1 ).

We apply Corollary 2 to determine which pairs of the matrices A, B, and C are congruent.

The matrix A is the 3 × 3 matrix of Example 6, where it is shown that A is congruent to a diagonal matrix with diagonal entries 1, 1, and −24. Therefore, A has rank 3 and index 2. Using the methods of Example 6 (it is not necessary to compute Q), it can be shown that B and C are congruent, respectively, to the diagonal matrices

    ( 1   0   0 )            ( 1  0   0 )
    ( 0  −1   0 )    and     ( 0  1   0 ).
    ( 0   0  −1 )            ( 0  0  −4 )


It follows that both A and C have rank 3 and index 2, while B has rank 3 and index 1. We conclude that A and C are congruent but that B is congruent to neither A nor C. ♦
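The "methods of Example 6" referred to above amount to performing the same elementary operation on the rows and on the columns (compare Exercises 21 and 22 below). A rough sketch of that procedure in Python/NumPy (not part of the text; the helper name is ours) is given here; it assumes, for simplicity, that every pivot encountered is nonzero, which is the case for the matrix C of this example.

    import numpy as np

    def congruence_diagonalize(A):
        """Return (D, Q) with Q invertible and Q^t A Q = D diagonal.

        Simultaneous row/column elimination; assumes each pivot D[i, i]
        is nonzero when it is needed (no pivoting is performed).
        """
        n = A.shape[0]
        D = A.astype(float).copy()
        Q = np.eye(n)
        for i in range(n - 1):
            for j in range(i + 1, n):
                if D[i, i] == 0:
                    continue                    # sketch only: skip zero pivots
                m = D[j, i] / D[i, i]
                D[j, :] -= m * D[i, :]          # row operation
                D[:, j] -= m * D[:, i]          # the same column operation
                Q[:, j] -= m * Q[:, i]          # record the column operation in Q
        return D, Q

    C = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 2.0],
                  [1.0, 2.0, 1.0]])
    D, Q = congruence_diagonalize(C)
    print(np.round(D, 10))                      # the diagonal matrix diag(1, 1, -4) found above
    print(np.allclose(Q.T @ C @ Q, D))          # True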

EXERCISES

1. Label the following statements as true or false.

(a) Every quadratic form is a bilinear form.
(b) If two matrices are congruent, they have the same eigenvalues.
(c) Symmetric bilinear forms have symmetric matrix representations.
(d) Any symmetric matrix is congruent to a diagonal matrix.
(e) The sum of two symmetric bilinear forms is a symmetric bilinear form.
(f) Two symmetric matrices with the same characteristic polynomial are matrix representations of the same bilinear form.
(g) There exists a bilinear form H such that H(x, y) ≠ 0 for all x and y.
(h) If V is a vector space of dimension n, then dim(B(V)) = 2n.
(i) Let H be a bilinear form on a finite-dimensional vector space V with dim(V) > 1. For any x ∈ V, there exists y ∈ V such that y ≠ 0, but H(x, y) = 0.
(j) If H is any bilinear form on a finite-dimensional real inner product space V, then there exists an ordered basis β for V such that ψβ(H) is a diagonal matrix.

2. Prove properties 1, 2, 3, and 4 on page 423.

3. (a) Prove that the sum of two bilinear forms is a bilinear form.
   (b) Prove that the product of a scalar and a bilinear form is a bilinear form.
   (c) Prove Theorem 6.31.

4. Determine which of the mappings that follow are bilinear forms. Justify your answers.

(a) Let V = C[0, 1] be the space of continuous real-valued functions on the closed interval [0, 1]. For f, g ∈ V, define

    H(f, g) = ∫₀¹ f(t)g(t) dt.

(b) Let V be a vector space over F, and let J ∈ B(V) be nonzero. Define H : V × V → F by

    H(x, y) = [J(x, y)]²   for all x, y ∈ V.


(c) Define H : R × R → R by H(t1, t2) = t1 + 2t2.
(d) Consider the vectors of R2 as column vectors, and let H : R2 × R2 → R be the function defined by H(x, y) = det(x, y), the determinant of the 2 × 2 matrix with columns x and y.
(e) Let V be a real inner product space, and let H : V × V → R be the function defined by H(x, y) = 〈x, y〉 for x, y ∈ V.
(f) Let V be a complex inner product space, and let H : V × V → C be the function defined by H(x, y) = 〈x, y〉 for x, y ∈ V.

5. Verify that each of the given mappings is a bilinear form. Then compute its matrix representation with respect to the given ordered basis β.

(a) H : R3 × R3 → R, where

    H((a1, a2, a3), (b1, b2, b3)) = a1b1 − 2a1b2 + a2b1 − a3b3

and

    β = { (1, 0, 1), (1, 0, −1), (0, 1, 0) }.

(b) Let V = M2×2(R) and

    β = { ( 1 0 ) , ( 0 1 ) , ( 0 0 ) , ( 0 0 ) } .
          ( 0 0 )   ( 0 0 )   ( 1 0 )   ( 0 1 )

Define H : V × V → R by H(A, B) = tr(A) · tr(B).

(c) Let β = {cos t, sin t, cos 2t, sin 2t}. Then β is an ordered basis for V = span(β), a four-dimensional subspace of the space of all continuous functions on R. Let H : V × V → R be the function defined by H(f, g) = f′(0) · g″(0).

6. Let H : R2 × R2 → R be the function defined by

    H((a1, a2), (b1, b2)) = a1b2 + a2b1   for (a1, a2), (b1, b2) ∈ R2.

(a) Prove that H is a bilinear form.
(b) Find the 2 × 2 matrix A such that H(x, y) = xᵗAy for all x, y ∈ R2.

For a 2 × 2 matrix M with columns x and y, the bilinear form H(M) = H(x, y) is called the permanent of M.

7. Let V and W be vector spaces over the same field F, and let T : V → W be a linear transformation. For any H ∈ B(W), define T(H) : V × V → F by T(H)(x, y) = H(T(x), T(y)) for all x, y ∈ V. Prove the following results.


(a) If H ∈ B(W), then T(H) ∈ B(V).
(b) T : B(W) → B(V) is a linear transformation.
(c) If T is an isomorphism, then so is T.

8. Assume the notation of Theorem 6.32.

(a) Prove that for any ordered basis β, ψβ is linear.
(b) Let β be an ordered basis for an n-dimensional space V over F, and let φβ : V → Fn be the standard representation of V with respect to β. For A ∈ Mn×n(F), define H : V × V → F by H(x, y) = [φβ(x)]ᵗA[φβ(y)]. Prove that H ∈ B(V). Can you establish this as a corollary to Exercise 7?
(c) Prove the converse of (b): Let H be a bilinear form on V. If A = ψβ(H), then H(x, y) = [φβ(x)]ᵗA[φβ(y)].

9. (a) Prove Corollary 1 to Theorem 6.32.
   (b) For a finite-dimensional vector space V, describe a method for finding an ordered basis for B(V).

10. Prove Corollary 2 to Theorem 6.32.

11. Prove Corollary 3 to Theorem 6.32.

12. Prove that the relation of congruence is an equivalence relation.

13. The following outline provides an alternative proof to Theorem 6.33.

(a) Suppose that β and γ are ordered bases for a finite-dimensional vector space V, and let Q be the change of coordinate matrix changing γ-coordinates to β-coordinates. Prove that φβ = LQφγ, where φβ and φγ are the standard representations of V with respect to β and γ, respectively.
(b) Apply Corollary 2 to Theorem 6.32 to (a) to obtain an alternative proof of Theorem 6.33.

14. Let V be a finite-dimensional vector space and H ∈ B(V). Prove that, for any ordered bases β and γ of V, rank(ψβ(H)) = rank(ψγ(H)).

15. Prove the following results.

(a) Any square diagonal matrix is symmetric.
(b) Any matrix congruent to a diagonal matrix is symmetric.
(c) the corollary to Theorem 6.35

16. Let V be a vector space over a field F not of characteristic two, and let H be a symmetric bilinear form on V. Prove that if K(x) = H(x, x) is the quadratic form associated with H, then, for all x, y ∈ V,

    H(x, y) = (1/2)[K(x + y) − K(x) − K(y)].


17. For each of the given quadratic forms K on a real inner product space V, find a symmetric bilinear form H such that K(x) = H(x, x) for all x ∈ V. Then find an orthonormal basis β for V such that ψβ(H) is a diagonal matrix.

(a) K : R2 → R defined by K(t1, t2) = −2t1² + 4t1t2 + t2²
(b) K : R2 → R defined by K(t1, t2) = 7t1² − 8t1t2 + t2²
(c) K : R3 → R defined by K(t1, t2, t3) = 3t1² + 3t2² + 3t3² − 2t1t3

18. Let S be the set of all (t1, t2, t3) ∈ R3 for which

    3t1² + 3t2² + 3t3² − 2t1t3 + 2√2 (t1 + t3) + 1 = 0.

Find an orthonormal basis β for R3 for which the equation relating the coordinates of points of S relative to β is simpler. Describe S geometrically.

19. Prove the following refinement of Theorem 6.37(d).

(a) If 0 < rank(A) < n and A has no negative eigenvalues, then f has no local maximum at p.
(b) If 0 < rank(A) < n and A has no positive eigenvalues, then f has no local minimum at p.

20. Prove the following variation of the second-derivative test for the case n = 2: Define

    D = [∂²f(p)/∂t1²][∂²f(p)/∂t2²] − [∂²f(p)/∂t1∂t2]².

(a) If D > 0 and ∂²f(p)/∂t1² > 0, then f has a local minimum at p.
(b) If D > 0 and ∂²f(p)/∂t1² < 0, then f has a local maximum at p.
(c) If D < 0, then f has no local extremum at p.
(d) If D = 0, then the test is inconclusive.

Hint: Observe that, as in Theorem 6.37, D = det(A) = λ1λ2, where λ1 and λ2 are the eigenvalues of A.

21. Let A and E be in Mn×n(F), with E an elementary matrix. In Section 3.1, it was shown that AE can be obtained from A by means of an elementary column operation. Prove that EᵗA can be obtained by means of the same elementary operation performed on the rows rather than on the columns of A. Hint: Note that EᵗA = (AᵗE)ᵗ.


22. For each of the following matrices A with entries from R, find a diagonal matrix D and an invertible matrix Q such that QᵗAQ = D.

    (a) ( 1  3 )     (b) ( 0  1 )     (c) ( 3  1   2 )
        ( 3  2 )         ( 1  0 )         ( 1  4   0 )
                                          ( 2  0  −1 )

Hint for (b): Use an elementary operation other than interchanging columns.

23. Prove that if the diagonal entries of a diagonal matrix are permuted, then the resulting diagonal matrix is congruent to the original one.

24. Let T be a linear operator on a real inner product space V, and define H : V × V → R by H(x, y) = 〈x, T(y)〉 for all x, y ∈ V.

(a) Prove that H is a bilinear form.
(b) Prove that H is symmetric if and only if T is self-adjoint.
(c) What properties must T have for H to be an inner product on V?
(d) Explain why H may fail to be a bilinear form if V is a complex inner product space.

25. Prove the converse to Exercise 24(a): Let V be a finite-dimensional real inner product space, and let H be a bilinear form on V. Then there exists a unique linear operator T on V such that H(x, y) = 〈x, T(y)〉 for all x, y ∈ V. Hint: Choose an orthonormal basis β for V, let A = ψβ(H), and let T be the linear operator on V such that [T]β = A. Apply Exercise 8(c) of this section and Exercise 15 of Section 6.2 (p. 355).

26. Prove that the number of distinct equivalence classes of congruent n × n real symmetric matrices is

    (n + 1)(n + 2)/2.

6.9∗ EINSTEIN’S SPECIAL THEORY OF RELATIVITY

As a consequence of physical experiments performed in the latter half of the nineteenth century (most notably the Michelson–Morley experiment of 1887), physicists concluded that the results obtained in measuring the speed of light are independent of the velocity of the instrument used to measure the speed of light. For example, suppose that while on Earth, an experimenter measures the speed of light emitted from the sun and finds it to be 186,000 miles per second. Now suppose that the experimenter places the measuring equipment in a spaceship that leaves Earth traveling at 100,000 miles per second in a direction away from the sun. A repetition of the same experiment from the spaceship yields the same result: Light is traveling at 186,000 miles per second


relative to the spaceship, rather than 86,000 miles per second as one might expect!

This revelation led to a new way of relating coordinate systems used to locate events in space–time. The result was Albert Einstein's special theory of relativity. In this section, we develop via a linear algebra viewpoint the essence of Einstein's theory.

[Figure 6.8: the coordinate systems S and S′, with parallel axes x, y, z and x′, y′, z′, and the clocks C and C′.]

The basic problem is to compare two different inertial (nonaccelerating) coordinate systems S and S′ in three-space (R3) that are in motion relative to each other under the assumption that the speed of light is the same when measured in either system. We assume that S′ moves at a constant velocity in relation to S as measured from S. (See Figure 6.8.) To simplify matters, let us suppose that the following conditions hold:

1. The corresponding axes of S and S′ (x and x′, y and y′, z and z′) are parallel, and the origin of S′ moves in the positive direction of the x-axis of S at a constant velocity v > 0 relative to S.

2. Two clocks C and C′ are placed in space—the first stationary relative to the coordinate system S and the second stationary relative to the coordinate system S′. These clocks are designed to give real numbers in units of seconds as readings. The clocks are calibrated so that at the instant the origins of S and S′ coincide, both clocks give the reading zero.

3. The unit of length is the light second (the distance light travels in 1 second), and the unit of time is the second. Note that, with respect to these units, the speed of light is 1 light second per second.

Given any event (something whose position and time of occurrence can be described), we may assign a set of space–time coordinates to it. For example,


if p is an event that occurs at position (x, y, z) relative to S and at time t as read on clock C, we can assign to p the set of coordinates

    (x, y, z, t).

This ordered 4-tuple is called the space–time coordinates of p relative to S and C. Likewise, p has a set of space–time coordinates

    (x′, y′, z′, t′)

relative to S′ and C′.

For a fixed velocity v, let Tv : R4 → R4 be the mapping defined by

    Tv(x, y, z, t) = (x′, y′, z′, t′),

where (x, y, z, t) and (x′, y′, z′, t′) are the space–time coordinates of the same event with respect to S and C and with respect to S′ and C′, respectively.

Einstein made certain assumptions about Tv that led to his special theory of relativity. We formulate an equivalent set of assumptions.

Axioms of the Special Theory of Relativity

(R 1) The speed of any light beam, when measured in either coordinate system using a clock stationary relative to that coordinate system, is 1.


(R 2) The mapping Tv : R4 → R4 is an isomorphism.

(R 3) If

    Tv(x, y, z, t) = (x′, y′, z′, t′),

then y′ = y and z′ = z.

(R 4) If

    Tv(x, y1, z1, t) = (x′, y′, z′, t′)   and   Tv(x, y2, z2, t) = (x′′, y′′, z′′, t′′),

then x′′ = x′ and t′′ = t′.

(R 5) The origin of S moves in the negative direction of the x′-axis of S′ at the constant velocity −v < 0 as measured from S′.

Axioms (R 3) and (R 4) tell us that for p ∈ R4, the second and third coordinates of Tv(p) are unchanged and the first and fourth coordinates of Tv(p) are independent of the second and third coordinates of p.

As we will see, these five axioms completely characterize Tv. The operator Tv is called the Lorentz transformation in direction x. We intend to compute Tv and use it to study the curious phenomenon of time contraction.

Theorem 6.39. On R4, the following statements are true.
(a) Tv(ei) = ei for i = 2, 3.
(b) span({e2, e3}) is Tv-invariant.
(c) span({e1, e4}) is Tv-invariant.
(d) Both span({e2, e3}) and span({e1, e4}) are T∗v-invariant.
(e) T∗v(ei) = ei for i = 2, 3.

Proof. (a) By axiom (R 2),

    Tv(0, 0, 0, 0) = (0, 0, 0, 0),

and hence, by axiom (R 4), the first and fourth coordinates of

    Tv(0, a, b, 0)

are both zero for any a, b ∈ R. Thus, by axiom (R 3),

    Tv(0, 1, 0, 0) = (0, 1, 0, 0)   and   Tv(0, 0, 1, 0) = (0, 0, 1, 0).

The proofs of (b), (c), and (d) are left as exercises.

(e) For any j ≠ 2, 〈T∗v(e2), ej〉 = 〈e2, Tv(ej)〉 = 0 by (a) and (c); for j = 2, 〈T∗v(e2), ej〉 = 〈e2, Tv(e2)〉 = 〈e2, e2〉 = 1 by (a). We conclude that T∗v(e2) is a multiple of e2 (i.e., that T∗v(e2) = ke2 for some k ∈ R). Thus,

    1 = 〈e2, e2〉 = 〈e2, Tv(e2)〉 = 〈T∗v(e2), e2〉 = 〈ke2, e2〉 = k,

and hence T∗v(e2) = e2. Similarly, T∗v(e3) = e3.

Suppose that, at the instant the origins of S and S′ coincide, a light flash is emitted from their common origin. The event of the light flash when measured either relative to S and C or relative to S′ and C′ has space–time coordinates (0, 0, 0, 0).

Let P be the set of all events whose space–time coordinates (x, y, z, t) relative to S and C are such that the flash is observable from the point with coordinates (x, y, z) (as measured relative to S) at the time t (as measured on C). Let us characterize P in terms of x, y, z, and t. Since the speed of light is 1, at any time t ≥ 0 the light flash is observable from any point whose distance to the origin of S (as measured on S) is t · 1 = t. These are precisely the points that lie on the sphere of radius t with center at the origin. The coordinates (relative to


S) of such points satisfy the equation x² + y² + z² − t² = 0. Hence an event lies in P if and only if its space–time coordinates

    (x, y, z, t)   (t ≥ 0)

relative to S and C satisfy the equation x² + y² + z² − t² = 0. By virtue of axiom (R 1), we can characterize P in terms of the space–time coordinates relative to S′ and C′ similarly: An event lies in P if and only if, relative to S′ and C′, its space–time coordinates

    (x′, y′, z′, t′)   (t ≥ 0)

satisfy the equation (x′)² + (y′)² + (z′)² − (t′)² = 0.

Let

    A = ( 1  0  0   0 )
        ( 0  1  0   0 )
        ( 0  0  1   0 )
        ( 0  0  0  −1 ).

Theorem 6.40. If 〈LA(w), w〉 = 0 for some w ∈ R4, then 〈T∗vLATv(w), w〉 = 0.

Proof. Let w = (x, y, z, t) ∈ R4, and suppose that 〈LA(w), w〉 = 0.

Case 1. t ≥ 0. Since 〈LA(w), w〉 = x² + y² + z² − t², the vector w gives the coordinates of an event in P relative to S and C. Because (x, y, z, t) and (x′, y′, z′, t′)


are the space–time coordinates of the same event relative to S and C and relative to S′ and C′, respectively, the discussion preceding Theorem 6.40 yields

    (x′)² + (y′)² + (z′)² − (t′)² = 0.

Thus 〈T∗vLATv(w), w〉 = 〈LATv(w), Tv(w)〉 = (x′)² + (y′)² + (z′)² − (t′)² = 0, and the conclusion follows.

Case 2. t < 0. The proof follows by applying case 1 to −w.

We now proceed to deduce information about Tv. Let

    w1 = (1, 0, 0, 1)   and   w2 = (1, 0, 0, −1).

By Exercise 3, {w1, w2} is an orthogonal basis for span({e1, e4}), and span({e1, e4}) is T∗vLATv-invariant. The next result tells us even more.

Theorem 6.41. There exist nonzero scalars a and b such that
(a) T∗vLATv(w1) = aw2.
(b) T∗vLATv(w2) = bw1.

Proof. (a) Because 〈LA(w1), w1〉 = 0, 〈T∗vLATv(w1), w1〉 = 0 by Theorem 6.40. Thus T∗vLATv(w1) is orthogonal to w1. Since span({e1, e4}) = span({w1, w2}) is T∗vLATv-invariant, T∗vLATv(w1) must lie in this set. But {w1, w2} is an orthogonal basis for this subspace, and so T∗vLATv(w1) must be a multiple of w2. Thus T∗vLATv(w1) = aw2 for some scalar a. Since Tv and A are invertible, so is T∗vLATv. Thus a ≠ 0, proving (a).

The proof of (b) is similar to (a).

Corollary. Let Bv = [Tv]β, where β is the standard ordered basis for R4. Then
(a) B∗vABv = A.
(b) T∗vLATv = LA.

We leave the proof of the corollary as an exercise. For hints, see Exercise 4.

Now consider the situation 1 second after the origins of S and S′ have coincided as measured by the clock C. Since the origin of S′ is moving along the x-axis at a velocity v as measured in S, its space–time coordinates relative to S and C are (v, 0, 0, 1).


Similarly, the space–time coordinates for the origin of S′ relative to S′ and C′ must be (0, 0, 0, t′) for some t′ > 0. Thus we have

    Tv(v, 0, 0, 1) = (0, 0, 0, t′)   for some t′ > 0.   (18)

By the corollary to Theorem 6.41,

    〈T∗vLATv(v, 0, 0, 1), (v, 0, 0, 1)〉 = 〈LA(v, 0, 0, 1), (v, 0, 0, 1)〉 = v² − 1.   (19)

But also

    〈T∗vLATv(v, 0, 0, 1), (v, 0, 0, 1)〉 = 〈LATv(v, 0, 0, 1), Tv(v, 0, 0, 1)〉
                                        = 〈LA(0, 0, 0, t′), (0, 0, 0, t′)〉 = −(t′)².   (20)

Combining (19) and (20), we conclude that v² − 1 = −(t′)², or

    t′ = √(1 − v²).   (21)

Thus, from (18) and (21), we obtain

    Tv(v, 0, 0, 1) = (0, 0, 0, √(1 − v²)).   (22)

Next recall that the origin of S moves in the negative direction of the x′-axis of S′ at the constant velocity −v < 0 as measured from S′. [This fact


is axiom (R 5).] Consequently, 1 second after the origins of S and S′ have coincided as measured on clock C, there exists a time t′′ > 0 as measured on clock C′ such that

    Tv(0, 0, 0, 1) = (−vt′′, 0, 0, t′′).   (23)

From (23), it follows in a manner similar to the derivation of (22) that

    t′′ = 1/√(1 − v²);   (24)

hence, from (23) and (24),

    Tv(0, 0, 0, 1) = (−v/√(1 − v²), 0, 0, 1/√(1 − v²)).   (25)

The following result is now easily proved using (22), (25), and Theorem 6.39.

Theorem 6.42. Let β be the standard ordered basis for R4. Then

    [Tv]β = Bv = (  1/√(1 − v²)    0   0   −v/√(1 − v²) )
                 (       0         1   0         0      )
                 (       0         0   1         0      )
                 ( −v/√(1 − v²)    0   0    1/√(1 − v²) ).
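A quick numerical check of this result (a sketch in Python/NumPy, not part of the text; the helper name lorentz_matrix is ours) confirms the two facts used to derive Bv: that Bv preserves the form defined by A (the corollary to Theorem 6.41), and that it sends the coordinates (v, 0, 0, 1) of the origin of S′ to (0, 0, 0, √(1 − v²)) as in (22).

    import numpy as np

    def lorentz_matrix(v):
        """The matrix Bv of Theorem 6.42 for |v| < 1 (units of light seconds and seconds)."""
        g = 1.0 / np.sqrt(1.0 - v * v)
        return np.array([[g,      0.0, 0.0, -v * g],
                         [0.0,    1.0, 0.0,  0.0],
                         [0.0,    0.0, 1.0,  0.0],
                         [-v * g, 0.0, 0.0,  g]])

    A = np.diag([1.0, 1.0, 1.0, -1.0])
    v = 0.6
    Bv = lorentz_matrix(v)

    print(np.allclose(Bv.T @ A @ Bv, A))          # True: Bv* A Bv = A
    print(Bv @ np.array([v, 0.0, 0.0, 1.0]))      # [0, 0, 0, 0.8] = (0, 0, 0, sqrt(1 - v^2))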

Time Contraction

A most curious and paradoxical conclusion follows if we accept Einstein's theory. Suppose that an astronaut leaves our solar system in a space vehicle traveling at a fixed velocity v as measured relative to our solar system. It follows from Einstein's theory that, at the end of time t as measured on Earth, the time that passes on the space vehicle is only t√(1 − v²). To establish this result, consider the coordinate systems S and S′ and clocks C and C′ that we have been studying. Suppose that the origin of S′ coincides with the space vehicle and the origin of S coincides with a point in the solar system


(stationary relative to the sun) so that the origins of S and S′ coincide and clocks C and C′ read zero at the moment the astronaut embarks on the trip.

As viewed from S, the space–time coordinates of the vehicle at any time t > 0 as measured by C are (vt, 0, 0, t), whereas, as viewed from S′, the space–time coordinates of the vehicle at any time t′ > 0 as measured by C′ are (0, 0, 0, t′).

But if two sets of space–time coordinates (vt, 0, 0, t) and (0, 0, 0, t′) are to describe the same event, it must follow that

    Tv(vt, 0, 0, t) = (0, 0, 0, t′).

Thus

    Bv(vt, 0, 0, t) = (0, 0, 0, t′),

where Bv = [Tv]β is the matrix of Theorem 6.42.

From the preceding equation, we obtain

    −v²t/√(1 − v²) + t/√(1 − v²) = t′,

or

    t′ = t√(1 − v²).   (26)


This is the desired result.

A dramatic consequence of time contraction is that distances are contracted along the line of motion (see Exercise 9).

Let us make one additional point. Suppose that we consider units of distance and time more commonly used than the light second and second, such as the mile and hour, or the kilometer and second. Let c denote the speed of light relative to our chosen units of distance. It is easily seen that if an object travels at a velocity v relative to a set of units, then it is traveling at a velocity v/c in units of light seconds per second. Thus, for an arbitrary set of units of distance and time, (26) becomes

    t′ = t√(1 − v²/c²).
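As a numerical illustration of this last formula (a hypothetical example, not from the text; the speeds chosen below are ours), the following sketch computes the time elapsed on a moving vehicle for one Earth second at several speeds. At ordinary speeds the correction is negligible, which is why time contraction escapes everyday experience.

    import math

    def contracted_time(t, v, c=1.0):
        """Time elapsed on the moving vehicle when time t elapses in S: t' = t*sqrt(1 - v^2/c^2)."""
        return t * math.sqrt(1.0 - (v / c) ** 2)

    c = 186000.0                                # speed of light in miles per second
    for v in (30.0, 0.1 * c, 0.99 * c):         # hypothetical speeds, in miles per second
        print(v, contracted_time(1.0, v, c))    # seconds elapsed on the vehicle per Earth second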

EXERCISES

1. Prove (b), (c), and (d) of Theorem 6.39.

2. Complete the proof of Theorem 6.40 for the case t < 0.

3. For

    w1 = (1, 0, 0, 1)   and   w2 = (1, 0, 0, −1),

show that

(a) {w1, w2} is an orthogonal basis for span({e1, e4});
(b) span({e1, e4}) is T∗vLATv-invariant.

4. Prove the corollary to Theorem 6.41.

Hints:

(a) Prove that

    B∗vABv = (  p  0  0   q )
             (  0  1  0   0 )
             (  0  0  1   0 )
             ( −q  0  0  −p ),

where

    p = (a + b)/2   and   q = (a − b)/2.


(b) Show that q = 0 by using the fact that B∗vABv is self-adjoint.
(c) Apply Theorem 6.40 to w = (0, 1, 0, 1) to show that p = 1.

5. Derive (24), and prove that

    Tv(0, 0, 0, 1) = (−v/√(1 − v²), 0, 0, 1/√(1 − v²)).   (25)

Hint: Use a technique similar to the derivation of (22).

6. Consider three coordinate systems S, S′, and S′′ with the corresponding axes (x, x′, x′′; y, y′, y′′; and z, z′, z′′) parallel and such that the x-, x′-, and x′′-axes coincide. Suppose that S′ is moving past S at a velocity v1 > 0 (as measured on S), S′′ is moving past S′ at a velocity v2 > 0 (as measured on S′), and S′′ is moving past S at a velocity v3 > 0 (as measured on S), and that there are three clocks C, C′, and C′′ such that C is stationary relative to S, C′ is stationary relative to S′, and C′′ is stationary relative to S′′. Suppose that when measured on any of the three clocks, all the origins of S, S′, and S′′ coincide at time 0. Assuming that Tv3 = Tv2Tv1 (i.e., Bv3 = Bv2Bv1), prove that

    v3 = (v1 + v2)/(1 + v1v2).

Note that substituting v2 = 1 in this equation yields v3 = 1. This tells us that the speed of light as measured in S or S′ is the same. Why would we be surprised if this were not the case?

7. Compute (Bv)⁻¹. Show (Bv)⁻¹ = B(−v). Conclude that if S′ moves at a negative velocity v relative to S, then [Tv]β = Bv, where Bv is of the form given in Theorem 6.42.

8. Suppose that an astronaut left Earth in the year 2000 and traveled to a star 99 light years away from Earth at 99% of the speed of light and that upon reaching the star immediately turned around and returned to Earth at the same speed. Assuming Einstein's special theory of


relativity, show that if the astronaut was 20 years old at the time of departure, then he or she would return to Earth at age 48.2 in the year 2200. Explain the use of Exercise 7 in solving this problem.

9. Recall the moving space vehicle considered in the study of time contraction. Suppose that the vehicle is moving toward a fixed star located on the x-axis of S at a distance b units from the origin of S. If the space vehicle moves toward the star at velocity v, Earthlings (who remain "almost" stationary relative to S) compute the time it takes for the vehicle to reach the star as t = b/v. Due to the phenomenon of time contraction, the astronaut perceives a time span of t′ = t√(1 − v²) = (b/v)√(1 − v²). A paradox appears in that the astronaut perceives a time span inconsistent with a distance of b and a velocity of v. The paradox is resolved by observing that the distance from the solar system to the star as measured by the astronaut is less than b.

Assuming that the coordinate systems S and S′ and clocks C and C ′

are as in the discussion of time contraction, prove the following results.

(a) At time t (as measured on C), the space–time coordinates of the star relative to S and C are (b, 0, 0, t).

(b) At time t (as measured on C), the space–time coordinates of the star relative to S′ and C′ are

    ( (b − vt)/√(1 − v²), 0, 0, (t − bv)/√(1 − v²) ).

(c) For

    x′ = (b − tv)/√(1 − v²)   and   t′ = (t − bv)/√(1 − v²),

we have x′ = b√(1 − v²) − t′v.

This result may be interpreted to mean that at time t′ as measured by the astronaut, the distance from the astronaut to the star as measured by the astronaut (see Figure 6.9) is

    b√(1 − v²) − t′v.


[Figure 6.9: the coordinate systems S and S′ of Figure 6.8, with the star on the x-axis; its coordinates are (b, 0, 0) relative to S and (x′, 0, 0) relative to S′.]

(d) Conclude from the preceding equation that
    (1) the speed of the space vehicle relative to the star, as measured by the astronaut, is v;
    (2) the distance from Earth to the star, as measured by the astronaut, is b√(1 − v²).

Thus distances along the line of motion of the space vehicle appear to be contracted by a factor of √(1 − v²).

6.10∗ CONDITIONING AND THE RAYLEIGH QUOTIENT

In Section 3.4, we studied specific techniques that allow us to solve systems of linear equations in the form Ax = b, where A is an m × n matrix and b is an m × 1 vector. Such systems often arise in applications to the real world. The coefficients in the system are frequently obtained from experimental data, and, in many cases, both m and n are so large that a computer must be used in the calculation of the solution. Thus two types of errors must be considered. First, experimental errors arise in the collection of data since no instruments can provide completely accurate measurements. Second, computers introduce roundoff errors. One might intuitively feel that small relative changes in the coefficients of the system cause small relative errors in the solution. A system that has this property is called well-conditioned; otherwise, the system is called ill-conditioned.

We now consider several examples of these types of errors, concentrating primarily on changes in b rather than on changes in the entries of A. In addition, we assume that A is a square, complex (or real), invertible matrix since this is the case most frequently encountered in applications.


Example 1

Consider the system

    x1 + x2 = 5
    x1 − x2 = 1.

The solution to this system is (3, 2).

Now suppose that we change the system somewhat and consider the new system

    x1 + x2 = 5
    x1 − x2 = 1.0001.

This modified system has the solution

    (3.00005, 1.99995).

We see that a change of 10⁻⁴ in one coefficient has caused a change of less than 10⁻⁴ in each coordinate of the new solution. More generally, the system

    x1 + x2 = 5
    x1 − x2 = 1 + h

has the solution

    (3 + h/2, 2 − h/2).

Hence small changes in b introduce small changes in the solution. Of course, we are really interested in relative changes since a change in the solution of, say, 10, is considered large if the original solution is of the order 10⁻², but small if the original solution is of the order 10⁶.

We use the notation δb to denote the vector b′ − b, where b is the vector in the original system and b′ is the vector in the modified system. Thus we have

    δb = (5, 1 + h) − (5, 1) = (0, h).

We now define the relative change in b to be the scalar ‖δb‖/‖b‖, where ‖ · ‖ denotes the standard norm on Cn (or Rn); that is, ‖b‖ = √〈b, b〉. Most


of what follows, however, is true for any norm. Similar definitions hold for the relative change in x. In this example,

    ‖δb‖/‖b‖ = |h|/√26

and

    ‖δx‖/‖x‖ = ‖(3 + h/2, 2 − h/2) − (3, 2)‖ / ‖(3, 2)‖ = |h|/√26.

Thus the relative change in x equals, coincidentally, the relative change in b; so the system is well-conditioned. ♦

Example 2

Consider the system

    x1 +        x2 = 3
    x1 + 1.00001x2 = 3.00001,

which has (2, 1) as its solution. The solution to the related system

    x1 +        x2 = 3
    x1 + 1.00001x2 = 3.00001 + h

is

    (2 − 10⁵h, 1 + 10⁵h).

Hence,

    ‖δx‖/‖x‖ = 10⁵ √(2/5) |h| ≥ 10⁴|h|,

while

    ‖δb‖/‖b‖ ≈ |h|/(3√2).

Thus the relative change in x is at least 10⁴ times the relative change in b! This system is very ill-conditioned. Observe that the lines defined by the two equations are nearly coincident. So a small change in either line could greatly alter the point of intersection, that is, the solution to the system. ♦
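The computations of Examples 1 and 2 are easy to reproduce numerically. The following sketch (Python/NumPy, not part of the text) solves the perturbed system of Example 2 for a particular value of h and compares the relative change in x with the relative change in b.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 1.00001]])
    b = np.array([3.0, 3.00001])
    h = 1e-6
    b_perturbed = b + np.array([0.0, h])          # delta-b = (0, h)

    x = np.linalg.solve(A, b)                     # (2, 1)
    x_perturbed = np.linalg.solve(A, b_perturbed)

    rel_change_b = np.linalg.norm(b_perturbed - b) / np.linalg.norm(b)
    rel_change_x = np.linalg.norm(x_perturbed - x) / np.linalg.norm(x)
    print(rel_change_b)                           # about 2.4e-7
    print(rel_change_x)                           # about 6.3e-2, more than 10^5 times larger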


To apply the full strength of the theory of self-adjoint matrices to the study of conditioning, we need the notion of the norm of a matrix. (See Exercise 24 of Section 6.1 for further results about norms.)

Definition. Let A be a complex (or real) n × n matrix. Define the (Euclidean) norm of A by

    ‖A‖ = max_{x ≠ 0} ‖Ax‖/‖x‖,

where x ∈ Cn or x ∈ Rn.

Intuitively, ‖A‖ represents the maximum magnification of a vector by the matrix A. The question of whether or not this maximum exists, as well as the problem of how to compute it, can be answered by the use of the so-called Rayleigh quotient.

Definition. Let B be an n × n self-adjoint matrix. The Rayleigh quotient for x ≠ 0 is defined to be the scalar R(x) = 〈Bx, x〉/‖x‖².

The following result characterizes the extreme values of the Rayleigh quotient of a self-adjoint matrix.

Theorem 6.43. For a self-adjoint matrix B ∈ Mn×n(F), we have that max_{x ≠ 0} R(x) is the largest eigenvalue of B and min_{x ≠ 0} R(x) is the smallest eigenvalue of B.

Proof. By Theorems 6.19 (p. 384) and 6.20 (p. 384), we may choose an orthonormal basis {v1, v2, . . . , vn} of eigenvectors of B such that Bvi = λivi (1 ≤ i ≤ n), where λ1 ≥ λ2 ≥ · · · ≥ λn. (Recall that by the lemma to Theorem 6.17, p. 373, the eigenvalues of B are real.) Now, for x ∈ Fn, there exist scalars a1, a2, . . . , an such that

    x = ∑_{i=1}^{n} ai vi.

Hence

    R(x) = 〈Bx, x〉/‖x‖² = 〈∑_{i=1}^{n} aiλivi, ∑_{j=1}^{n} ajvj〉/‖x‖² = (∑_{i=1}^{n} λi|ai|²)/‖x‖²
         ≤ λ1 (∑_{i=1}^{n} |ai|²)/‖x‖² = λ1‖x‖²/‖x‖² = λ1.

It is easy to see that R(v1) = λ1, so we have demonstrated the first half of the theorem. The second half is proved similarly.


Corollary 1. For any square matrix A, ‖A‖ is finite and, in fact, equals √λ, where λ is the largest eigenvalue of A∗A.

Proof. Let B be the self-adjoint matrix A∗A, and let λ be the largest eigenvalue of B. Since, for x ≠ 0,

    0 ≤ ‖Ax‖²/‖x‖² = 〈Ax, Ax〉/‖x‖² = 〈A∗Ax, x〉/‖x‖² = 〈Bx, x〉/‖x‖² = R(x),

it follows from Theorem 6.43 that ‖A‖² = λ.

Observe that the proof of Corollary 1 shows that all the eigenvalues of A∗A are nonnegative. For our next result, we need the following lemma.

Lemma. For any square matrix A, λ is an eigenvalue of A∗A if and only if λ is an eigenvalue of AA∗.

Proof. Let λ be an eigenvalue of A∗A. If λ = 0, then A∗A is not invertible. Hence A and A∗ are not invertible, so that λ is also an eigenvalue of AA∗. The proof of the converse is similar.

Suppose now that λ ≠ 0. Then there exists x ≠ 0 such that A∗Ax = λx. Apply A to both sides to obtain (AA∗)(Ax) = λ(Ax). Since Ax ≠ 0 (lest λx = 0), we have that λ is an eigenvalue of AA∗. The proof of the converse is left as an exercise.

Corollary 2. Let A be an invertible matrix. Then ‖A⁻¹‖ = 1/√λ, where λ is the smallest eigenvalue of A∗A.

Proof. Recall that λ is an eigenvalue of an invertible matrix if and only if λ⁻¹ is an eigenvalue of its inverse.

Now let λ1 ≥ λ2 ≥ · · · ≥ λn be the eigenvalues of A∗A, which by the lemma are the eigenvalues of AA∗. Then ‖A⁻¹‖² equals the largest eigenvalue of (A⁻¹)∗A⁻¹ = (AA∗)⁻¹, which equals 1/λn.

For many applications, it is only the largest and smallest eigenvalues that are of interest. For example, in the case of vibration problems, the smallest eigenvalue represents the lowest frequency at which vibrations can occur.

We see the role of both of these eigenvalues in our study of conditioning.

Example 3

Let

    A = (  1  0  1 )
        ( −1  1  0 )
        (  0  1  1 ).


Then

    B = A∗A = (  2  −1   1 )
              ( −1   2   1 )
              (  1   1   2 ).

The eigenvalues of B are 3, 3, and 0. Therefore, ‖A‖ = √3. For any

    x = (a, b, c) ≠ 0,

we may compute R(x) for the matrix B as

    3 ≥ R(x) = 〈Bx, x〉/‖x‖² = 2(a² + b² + c² − ab + ac + bc)/(a² + b² + c²). ♦
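Corollary 1 gives a practical way to compute ‖A‖: it is the square root of the largest eigenvalue of A∗A. The following sketch (Python/NumPy, not part of the text; the helper name is ours) checks the value found in Example 3.

    import numpy as np

    def matrix_norm(A):
        """Euclidean norm of A: square root of the largest eigenvalue of A*A (Corollary 1)."""
        eigenvalues = np.linalg.eigvalsh(A.conj().T @ A)   # eigenvalues of A*A, ascending order
        return np.sqrt(eigenvalues[-1])

    A = np.array([[1.0, 0.0, 1.0],
                  [-1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])
    print(matrix_norm(A))            # 1.732... = sqrt(3), as computed in Example 3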

Now that we know ‖A‖ exists for every square matrix A, we can make use of the inequality ‖Ax‖ ≤ ‖A‖ · ‖x‖, which holds for every x.

Assume in what follows that A is invertible, b ≠ 0, and Ax = b. For a given δb, let δx be the vector that satisfies A(x + δx) = b + δb. Then A(δx) = δb, and so δx = A⁻¹(δb). Hence

    ‖b‖ = ‖Ax‖ ≤ ‖A‖ · ‖x‖   and   ‖δx‖ = ‖A⁻¹(δb)‖ ≤ ‖A⁻¹‖ · ‖δb‖.

Thus

    ‖δx‖/‖x‖ ≤ ‖A⁻¹‖ · ‖δb‖/‖x‖ ≤ ‖A⁻¹‖ · ‖δb‖ · ‖A‖/‖b‖ = ‖A‖ · ‖A⁻¹‖ · (‖δb‖/‖b‖).

Similarly (see Exercise 9),

    (1/(‖A‖ · ‖A⁻¹‖)) (‖δb‖/‖b‖) ≤ ‖δx‖/‖x‖.

The number ‖A‖ · ‖A⁻¹‖ is called the condition number of A and is denoted cond(A). It should be noted that the definition of cond(A) depends on how the norm of A is defined. There are many reasonable ways of defining the norm of a matrix. In fact, the only property needed to establish the inequalities above is that ‖Ax‖ ≤ ‖A‖ · ‖x‖ for all x. We summarize these results in the following theorem.

Theorem 6.44. For the system Ax = b, where A is invertible and b ≠ 0, the following statements are true.
(a) For any norm ‖ · ‖, we have

    (1/cond(A)) (‖δb‖/‖b‖) ≤ ‖δx‖/‖x‖ ≤ cond(A) (‖δb‖/‖b‖).


(b) If ‖ · ‖ is the Euclidean norm, then cond(A) = √(λ1/λn), where λ1 and λn are the largest and smallest eigenvalues, respectively, of A∗A.

Proof. Statement (a) follows from the previous inequalities, and (b) follows from Corollaries 1 and 2 to Theorem 6.43.

It is clear from Theorem 6.44 that cond(A) ≥ 1. It is left as an exercise to prove that cond(A) = 1 if and only if A is a scalar multiple of a unitary or orthogonal matrix. Moreover, it can be shown with some work that equality can be obtained in (a) by an appropriate choice of b and δb.

We can see immediately from (a) that if cond(A) is close to 1, then a small relative error in b forces a small relative error in x. If cond(A) is large, however, then the relative error in x may be small even though the relative error in b is large, or the relative error in x may be large even though the relative error in b is small! In short, cond(A) merely indicates the potential for large relative errors.
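Using Theorem 6.44(b), the condition number with respect to the Euclidean norm can be computed from the extreme eigenvalues of A∗A. A sketch (Python/NumPy, not part of the text; the helper name is ours) applied to the coefficient matrix of Example 2 shows why that system behaved so badly.

    import numpy as np

    def condition_number(A):
        """cond(A) = sqrt(lambda_1 / lambda_n), the extreme eigenvalues of A*A (Theorem 6.44(b))."""
        eigenvalues = np.linalg.eigvalsh(A.conj().T @ A)   # ascending order
        return np.sqrt(eigenvalues[-1] / eigenvalues[0])

    A_example2 = np.array([[1.0, 1.0],
                           [1.0, 1.00001]])
    print(condition_number(A_example2))   # about 4 * 10^5, so large relative errors are possible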

We have so far considered only errors in the vector b. If there is an error δA in the coefficient matrix of the system Ax = b, the situation is more complicated. For example, A + δA may fail to be invertible. But under the appropriate assumptions, it can be shown that a bound for the relative error in x can be given in terms of cond(A). For example, Charles Cullen (Charles G. Cullen, An Introduction to Numerical Linear Algebra, PWS Publishing Co., Boston 1994, p. 60) shows that if A + δA is invertible, then

    ‖δx‖/‖x + δx‖ ≤ cond(A) · (‖δA‖/‖A‖).

It should be mentioned that, in practice, one never computes cond(A) from its definition, for it would be an unnecessary waste of time to compute A⁻¹ merely to determine its norm. In fact, if a computer is used to find A⁻¹, the computed inverse of A in all likelihood only approximates A⁻¹, and the error in the computed inverse is affected by the size of cond(A). So we are caught in a vicious circle! There are, however, some situations in which a usable approximation of cond(A) can be found. Thus, in most cases, the estimate of the relative error in x is based on an estimate of cond(A).

EXERCISES

1. Label the following statements as true or false.

(a) If Ax = b is well-conditioned, then cond(A) is small.
(b) If cond(A) is large, then Ax = b is ill-conditioned.
(c) If cond(A) is small, then Ax = b is well-conditioned.
(d) The norm of A equals the Rayleigh quotient.
(e) The norm of A always equals the largest eigenvalue of A.


2. Compute the norms of the following matrices.

    (a) ( 4  0 )     (b) (  5  3 )     (c) ( 1  −2/√3   0 )
        ( 1  3 )         ( −3  3 )         ( 0  −2/√3   1 )
                                           ( 0   2/√3   1 )

⎞⎟⎟⎠3. Prove that if B is symmetric, then ‖B‖ is the largest eigenvalue of B.

4. Let A and A−1 be as follows:

    A = (   6   13  −17 )                (  6  −4  −1 )
        (  13   29  −38 )   and  A⁻¹ =   ( −4  11   7 )
        ( −17  −38   50 )                ( −1   7   5 ).

The eigenvalues of A are approximately 84.74, 0.2007, and 0.0588.

(a) Approximate ‖A‖, ‖A⁻¹‖, and cond(A). (Note Exercise 3.)
(b) Suppose that we have vectors x and x̃ such that Ax = b and ‖b − Ax̃‖ ≤ 0.001. Use (a) to determine upper bounds for ‖x̃ − A⁻¹b‖ (the absolute error) and ‖x̃ − A⁻¹b‖/‖A⁻¹b‖ (the relative error).

5. Suppose that x is the actual solution of Ax = b and that a computer arrives at an approximate solution x̃. If cond(A) = 100, ‖b‖ = 1, and ‖b − Ax̃‖ = 0.1, obtain upper and lower bounds for ‖x − x̃‖/‖x‖.

6. Let

    B = ( 2  1  1 )
        ( 1  2  1 )
        ( 1  1  2 ).

Compute

    R((1, −2, 3)), ‖B‖, and cond(B).

7. Let B be a symmetric matrix. Prove that min_{x ≠ 0} R(x) equals the smallest eigenvalue of B.

8. Prove that if λ is an eigenvalue of AA∗, then λ is an eigenvalue of A∗A. This completes the proof of the lemma to Corollary 2 to Theorem 6.43.

9. Prove that if A is an invertible matrix and Ax = b, then

    (1/(‖A‖ · ‖A⁻¹‖)) (‖δb‖/‖b‖) ≤ ‖δx‖/‖x‖.


10. Prove the left inequality of (a) in Theorem 6.44.

11. Prove that cond(A) = 1 if and only if A is a scalar multiple of a unitary or orthogonal matrix.

12. (a) Let A and B be square matrices that are unitarily equivalent. Prove that ‖A‖ = ‖B‖.
    (b) Let T be a linear operator on a finite-dimensional inner product space V. Define

        ‖T‖ = max_{x ≠ 0} ‖T(x)‖/‖x‖.

    Prove that ‖T‖ = ‖[T]β‖, where β is any orthonormal basis for V.
    (c) Let V be an infinite-dimensional inner product space with an orthonormal basis {v1, v2, . . .}. Let T be the linear operator on V such that T(vk) = kvk. Prove that ‖T‖ (defined in (b)) does not exist.

The next exercise assumes the definitions of singular value and pseudoinverse and the results of Section 6.7.

13. Let A be an n × n matrix of rank r with the nonzero singular values σ1 ≥ σ2 ≥ · · · ≥ σr. Prove each of the following results.
    (a) ‖A‖ = σ1.
    (b) ‖A†‖ = 1/σr.
    (c) If A is invertible (and hence r = n), then cond(A) = σ1/σn.

6.11∗ THE GEOMETRY OF ORTHOGONAL OPERATORS

By Theorem 6.22 (p. 386), any rigid motion on a finite-dimensional real inner product space is the composite of an orthogonal operator and a translation. Thus, to understand the geometry of rigid motions thoroughly, we must analyze the structure of orthogonal operators. Such is the aim of this section. We show that any orthogonal operator on a finite-dimensional real inner product space is the composite of rotations and reflections.

This material assumes familiarity with the results about direct sums developed at the end of Section 5.2, and familiarity with the definition and elementary properties of the determinant of a linear operator defined in Exercise 7 of Section 5.1.

Definitions. Let T be a linear operator on a finite-dimensional real inner product space V. The operator T is called a rotation if T is the identity on


V or if there exists a two-dimensional subspace W of V, an orthonormal basis β = {x1, x2} for W, and a real number θ such that

    T(x1) = (cos θ)x1 + (sin θ)x2,    T(x2) = (− sin θ)x1 + (cos θ)x2,

and T(y) = y for all y ∈ W⊥. In this context, T is called a rotation of W about W⊥. The subspace W⊥ is called the axis of rotation.

Rotations are defined in Section 2.1 for the special case that V = R2.

Definitions. Let T be a linear operator on a finite-dimensional real inner product space V. The operator T is called a reflection if there exists a one-dimensional subspace W of V such that T(x) = −x for all x ∈ W and T(y) = y for all y ∈ W⊥. In this context, T is called a reflection of V about W⊥.

It should be noted that rotations and reflections (or composites of these) are orthogonal operators (see Exercise 2). The principal aim of this section is to establish that the converse is also true, that is, any orthogonal operator on a finite-dimensional real inner product space is the composite of rotations and reflections.

Example 1

A Characterization of Orthogonal Operators on a One-Dimensional Real Inner Product Space

Let T be an orthogonal operator on a one-dimensional inner product space V. Choose any nonzero vector x in V. Then V = span({x}), and so T(x) = λx for some λ ∈ R. Since T is orthogonal and λ is an eigenvalue of T, λ = ±1. If λ = 1, then T is the identity on V, and hence T is a rotation. If λ = −1, then T(x) = −x for all x ∈ V; so T is a reflection of V about V⊥ = {0}. Thus T is either a rotation or a reflection. Note that in the first case, det(T) = 1, and in the second case, det(T) = −1. ♦

Example 2

Some Typical Reflections

(a) Define T : R2 → R2 by T(a, b) = (−a, b), and let W = span({e1}). Then T(x) = −x for all x ∈ W, and T(y) = y for all y ∈ W⊥. Thus T is a reflection of R2 about W⊥ = span({e2}), the y-axis.

(b) Let T : R3 → R3 be defined by T(a, b, c) = (a, b, −c), and let W = span({e3}). Then T(x) = −x for all x ∈ W, and T(y) = y for all y ∈ W⊥ = span({e1, e2}), the xy-plane. Hence T is a reflection of R3 about W⊥. ♦

Example 1 characterizes all orthogonal operators on a one-dimensional real inner product space. The following theorem characterizes all orthogonal


operators on a two-dimensional real inner product space V. The proof follows from Theorem 6.23 (p. 387) since all two-dimensional real inner product spaces are structurally identical. For a rigorous justification, apply Theorem 2.21 (p. 104), where β is an orthonormal basis for V. By Exercise 15 of Section 6.2, the resulting isomorphism φβ : V → R2 preserves inner products. (See Exercise 8.)

Theorem 6.45. Let T be an orthogonal operator on a two-dimensional real inner product space V. Then T is either a rotation or a reflection. Furthermore, T is a rotation if and only if det(T) = 1, and T is a reflection if and only if det(T) = −1.

A complete description of the reflections of R2 is given in Section 6.5.

Corollary. Let V be a two-dimensional real inner product space. The composite of a reflection and a rotation on V is a reflection on V.

Proof. If T1 is a reflection on V and T2 is a rotation on V, then by Theorem 6.45, det(T1) = −1 and det(T2) = 1. Let T = T2T1 be the composite. Since T2 and T1 are orthogonal, so is T. Moreover, det(T) = det(T2) · det(T1) = −1. Thus, by Theorem 6.45, T is a reflection. The proof for T1T2 is similar.
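In matrix terms, Theorem 6.45 says that a 2 × 2 orthogonal matrix represents a rotation when its determinant is 1 and a reflection when its determinant is −1. The following sketch (Python/NumPy, not part of the text; the helper name is ours) classifies a 2 × 2 orthogonal matrix on this basis and, for a rotation, recovers the angle θ.

    import numpy as np

    def classify_orthogonal_2x2(Q, tol=1e-10):
        """Classify an orthogonal 2x2 matrix as a rotation or a reflection (Theorem 6.45)."""
        assert np.allclose(Q.T @ Q, np.eye(2), atol=tol), "Q must be orthogonal"
        if abs(np.linalg.det(Q) - 1.0) < tol:
            theta = np.arctan2(Q[1, 0], Q[0, 0])    # Q = [[cos t, -sin t], [sin t, cos t]]
            return f"rotation through {theta:.4f} radians"
        return "reflection"

    theta = np.pi / 3
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    F = np.array([[1.0, 0.0],
                  [0.0, -1.0]])                     # reflection about the x-axis
    print(classify_orthogonal_2x2(R))               # rotation through 1.0472 radians
    print(classify_orthogonal_2x2(F))               # reflection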

We now study orthogonal operators on spaces of higher dimension.

Lemma. If T is a linear operator on a nonzero finite-dimensional real vector space V, then there exists a T-invariant subspace W of V such that 1 ≤ dim(W) ≤ 2.

Proof. Fix an ordered basis β = {y1, y2, . . . , yn} for V, and let A = [T]β. Let φβ : V → Rn be the linear transformation defined by φβ(yi) = ei for i = 1, 2, . . . , n. Then φβ is an isomorphism, and, as we have seen in Section 2.4, the diagram in Figure 6.10 commutes, that is, LAφβ = φβT. As a consequence, it suffices to show that there exists an LA-invariant subspace Z of Rn such that 1 ≤ dim(Z) ≤ 2. If we then define W = φβ⁻¹(Z), it follows that W satisfies the conclusions of the lemma (see Exercise 13).

              T
        V ---------> V
        |            |
     φβ |            | φβ
        ↓            ↓
        Rn --------> Rn
              LA

           Figure 6.10


The matrix A can be considered as an n × n matrix over C and, as such, can be used to define a linear operator U on Cn by U(v) = Av. Since U is a linear operator on a finite-dimensional vector space over C, it has an eigenvalue λ ∈ C. Let x ∈ Cn be an eigenvector corresponding to λ. We may write λ = λ1 + iλ2, where λ1 and λ2 are real, and

    x = (a1 + ib1, a2 + ib2, . . . , an + ibn),

where the ai's and bi's are real. Thus, setting

    x1 = (a1, a2, . . . , an)   and   x2 = (b1, b2, . . . , bn),

we have x = x1 + ix2, where x1 and x2 have real entries. Note that at least one of x1 or x2 is nonzero since x ≠ 0. Hence

    U(x) = λx = (λ1 + iλ2)(x1 + ix2) = (λ1x1 − λ2x2) + i(λ1x2 + λ2x1).

Similarly,

    U(x) = A(x1 + ix2) = Ax1 + iAx2.

Comparing the real and imaginary parts of these two expressions for U(x), we conclude that

    Ax1 = λ1x1 − λ2x2   and   Ax2 = λ1x2 + λ2x1.

Finally, let Z = span({x1, x2}), the span being taken as a subspace of Rn. Since x1 ≠ 0 or x2 ≠ 0, Z is a nonzero subspace. Thus 1 ≤ dim(Z) ≤ 2, and the preceding pair of equations shows that Z is LA-invariant.
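The argument in this lemma is constructive, and a small numerical sketch (Python/NumPy, not part of the text; the example matrix is ours) makes it concrete: take a complex eigenvector of A, split it into real and imaginary parts x1 and x2, and check the pair of equations just derived, which show that span({x1, x2}) is LA-invariant.

    import numpy as np

    # A rotation of R^3 about the z-axis has no real eigenvectors in the xy-plane,
    # so the invariant subspace produced by the lemma is two-dimensional.
    t = 0.7
    A = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t),  np.cos(t), 0.0],
                  [0.0,        0.0,       1.0]])

    eigenvalues, eigenvectors = np.linalg.eig(A.astype(complex))
    k = np.argmax(np.abs(eigenvalues.imag))     # pick a genuinely complex eigenvalue
    lam, x = eigenvalues[k], eigenvectors[:, k]
    lam1, lam2 = lam.real, lam.imag
    x1, x2 = x.real, x.imag

    # A x1 = lam1 x1 - lam2 x2 and A x2 = lam1 x2 + lam2 x1, so Z = span({x1, x2}) is LA-invariant.
    print(np.allclose(A @ x1, lam1 * x1 - lam2 * x2))   # True
    print(np.allclose(A @ x2, lam1 * x2 + lam2 * x1))   # True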

Theorem 6.46. Let T be an orthogonal operator on a nonzero finite-dimensional real inner product space V. Then there exists a collection of pairwise orthogonal T-invariant subspaces {W1, W2, . . . , Wm} of V such that

(a) 1 ≤ dim(Wi) ≤ 2 for i = 1, 2, . . . , m.

(b) V = W1 ⊕ W2 ⊕ · · · ⊕ Wm.

Proof. The proof is by mathematical induction on dim(V). If dim(V) = 1, the result is obvious. So assume that the result is true whenever dim(V) < n for some fixed integer n > 1.


Suppose dim(V) = n. By the lemma, there exists a T-invariant subspace W1 of V such that 1 ≤ dim(W1) ≤ 2. If W1 = V, the result is established. Otherwise, W1⊥ ≠ {0}. By Exercise 14, W1⊥ is T-invariant and the restriction of T to W1⊥ is orthogonal. Since dim(W1⊥) < n, we may apply the induction hypothesis to this restriction and conclude that there exists a collection of pairwise orthogonal T-invariant subspaces {W2, W3, . . . , Wm} of W1⊥ such that 1 ≤ dim(Wi) ≤ 2 for i = 2, 3, . . . , m and W1⊥ = W2 ⊕ W3 ⊕ · · · ⊕ Wm. Thus {W1, W2, . . . , Wm} is pairwise orthogonal, and by Exercise 13(d) of Section 6.2,

    V = W1 ⊕ W1⊥ = W1 ⊕ W2 ⊕ · · · ⊕ Wm.

Applying Example 1 and Theorem 6.45 in the context of Theorem 6.46, we conclude that the restriction of T to Wi is either a rotation or a reflection for each i = 1, 2, . . . , m. Thus, in some sense, T is composed of rotations and reflections. Unfortunately, very little can be said about the uniqueness of the decomposition of V in Theorem 6.46. For example, the Wi's, the number m of Wi's, and the number of Wi's for which TWi is a reflection are not unique. Although the number of Wi's for which TWi is a reflection is not unique, whether this number is even or odd is an intrinsic property of T. Moreover, we can always decompose V so that TWi is a reflection for at most one Wi. These facts are established in the following result.

Theorem 6.47. Let T, V, W1, . . . , Wm be as in Theorem 6.46.
(a) The number of Wi's for which TWi is a reflection is even or odd according to whether det(T) = 1 or det(T) = −1.
(b) It is always possible to decompose V as in Theorem 6.46 so that the number of Wi's for which TWi is a reflection is zero or one according to whether det(T) = 1 or det(T) = −1. Furthermore, if TWi is a reflection, then dim(Wi) = 1.

Proof. (a) Let r denote the number of Wi's in the decomposition for which TWi is a reflection. Then, by Exercise 15,

    det(T) = det(TW1) · det(TW2) · · · · · det(TWm) = (−1)ʳ,

proving (a).

(b) Let E = {x ∈ V : T(x) = −x}; then E is a T-invariant subspace of V. If W = E⊥, then W is T-invariant. So by applying Theorem 6.46 to TW, we obtain a collection of pairwise orthogonal T-invariant subspaces {W1, W2, . . . , Wk} of W such that W = W1 ⊕ W2 ⊕ · · · ⊕ Wk and for 1 ≤ i ≤ k, the dimension of each Wi is either 1 or 2. Observe that, for each i = 1, 2, . . . , k, TWi is a rotation. For otherwise, if TWi is a reflection, there exists a nonzero x ∈ Wi for which T(x) = −x. But then, x ∈ Wi ∩ E ⊆ E⊥ ∩ E = {0}, a contradiction. If E = {0}, the result follows. Otherwise,


choose an orthonormal basis β for E containing p vectors (p > 0). It is possible to decompose β into a pairwise disjoint union β = β1 ∪ β2 ∪ · · · ∪ βr such that each βi contains exactly two vectors for i < r, and βr contains two vectors if p is even and one vector if p is odd. For each i = 1, 2, . . . , r, let Wk+i = span(βi). Then, clearly, {W1, W2, . . . , Wk, . . . , Wk+r} is pairwise orthogonal, and

    V = W1 ⊕ W2 ⊕ · · · ⊕ Wk ⊕ · · · ⊕ Wk+r.   (27)

Moreover, if any βi contains two vectors, then

    det(TWk+i) = det([TWk+i]βi) = det( −1   0 ) = 1.
                                      (  0  −1 )

So TWk+i is a rotation, and hence TWj is a rotation for j < k + r. If βr consists of one vector, then dim(Wk+r) = 1 and

    det(TWk+r) = det([TWk+r]βr) = det(−1) = −1.

Thus TWk+r is a reflection by Theorem 6.46, and we conclude that the decomposition in (27) satisfies the condition of (b).

As a consequence of the preceding theorem, an orthogonal operator can be factored as a product of rotations and reflections.

Corollary. Let T be an orthogonal operator on a finite-dimensional real inner product space V. Then there exists a collection {T1, T2, . . . , Tm} of orthogonal operators on V such that the following statements are true.
(a) For each i, Ti is either a reflection or a rotation.
(b) For at most one i, Ti is a reflection.
(c) TiTj = TjTi for all i and j.
(d) T = T1T2 · · · Tm.
(e) det(T) = 1 if Ti is a rotation for each i, and det(T) = −1 otherwise.

Proof. As in the proof of Theorem 6.47(b), we can write

    V = W1 ⊕ W2 ⊕ · · · ⊕ Wm,

where TWi is a rotation for i < m. For each i = 1, 2, . . . , m, define Ti : V → V by

    Ti(x1 + x2 + · · · + xm) = x1 + x2 + · · · + xi−1 + T(xi) + xi+1 + · · · + xm,

where xj ∈ Wj for all j. It is easily shown that each Ti is an orthogonal operator on V. In fact, Ti is a rotation or a reflection according to whether TWi is a rotation or a reflection. This establishes (a) and (b). The proofs of (c), (d), and (e) are left as exercises. (See Exercise 16.)


Example 3

Orthogonal Operators on a Three-Dimensional Real Inner Product Space

Let T be an orthogonal operator on a three-dimensional real inner product space V. We show that T can be decomposed into the composite of a rotation and at most one reflection. Let

V = W1 ⊕ W2 ⊕ · · · ⊕ Wm

be a decomposition as in Theorem 6.47(b). Clearly, m = 2 or m = 3.

If m = 2, then V = W1 ⊕ W2. Without loss of generality, suppose that dim(W1) = 1 and dim(W2) = 2. Thus TW1 is a reflection or the identity on W1, and TW2 is a rotation. Defining T1 and T2 as in the proof of the corollary to Theorem 6.47, we have that T = T1T2 is the composite of a rotation and at most one reflection. (Note that if TW1 is not a reflection, then T1 is the identity on V and T = T2.)

If m = 3, then V = W1 ⊕ W2 ⊕ W3 and dim(Wi) = 1 for all i. For each i, let Ti be as in the proof of the corollary to Theorem 6.47. If TWi is not a reflection, then Ti is the identity on Wi. Otherwise, Ti is a reflection. Since TWi is a reflection for at most one i, we conclude that T is either a single reflection or the identity (a rotation). ♦

EXERCISES

1. Label the following statements as true or false. Assume that the underlying vector spaces are finite-dimensional real inner product spaces.

(a) Any orthogonal operator is either a rotation or a reflection.
(b) The composite of any two rotations on a two-dimensional space is a rotation.
(c) The composite of any two rotations on a three-dimensional space is a rotation.
(d) The composite of any two rotations on a four-dimensional space is a rotation.
(e) The identity operator is a rotation.
(f) The composite of two reflections is a reflection.
(g) Any orthogonal operator is a composite of rotations.
(h) For any orthogonal operator T, if det(T) = −1, then T is a reflection.
(i) Reflections always have eigenvalues.
(j) Rotations always have eigenvalues.

2. Prove that rotations, reflections, and composites of rotations and re-flections are orthogonal operators.


3. Let

A = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{pmatrix} and B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.

(a) Prove that LA is a reflection.
(b) Find the axis in R2 about which LA reflects, that is, the subspace of R2 on which LA acts as the identity.
(c) Prove that LAB and LBA are rotations.

4. For any real number φ, let

A = \begin{pmatrix} \cos φ & \sin φ \\ \sin φ & -\cos φ \end{pmatrix}.

(a) Prove that LA is a reflection.
(b) Find the axis in R2 about which LA reflects.

5. For any real number φ, define Tφ = LA, where

A = \begin{pmatrix} \cos φ & -\sin φ \\ \sin φ & \cos φ \end{pmatrix}.

(a) Prove that any rotation on R2 is of the form Tφ for some φ.
(b) Prove that TφTψ = T(φ+ψ) for any φ, ψ ∈ R.
(c) Deduce that any two rotations on R2 commute.

6. Prove that the composite of any two rotations on R3 is a rotation on R3.

7. Given real numbers φ and ψ, define matrices

A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos φ & -\sin φ \\ 0 & \sin φ & \cos φ \end{pmatrix} and B = \begin{pmatrix} \cos ψ & -\sin ψ & 0 \\ \sin ψ & \cos ψ & 0 \\ 0 & 0 & 1 \end{pmatrix}.

(a) Prove that LA and LB are rotations.
(b) Prove that LAB is a rotation.
(c) Find the axis of rotation for LAB.

8. Prove Theorem 6.45 using the hints preceding the statement of the theorem.

9. Prove that no orthogonal operator can be both a rotation and a reflection.


10. Prove that if V is a two- or three-dimensional real inner product space, then the composite of two reflections on V is a rotation of V.

11. Give an example of an orthogonal operator that is neither a reflection nor a rotation.

12. Let V be a finite-dimensional real inner product space. Define T : V → V by T(x) = −x. Prove that T is a product of rotations if and only if dim(V) is even.

13. Complete the proof of the lemma to Theorem 6.46 by showing that W = φ_β^{−1}(Z) satisfies the required conditions.

14. Let T be an orthogonal [unitary] operator on a finite-dimensional real [complex] inner product space V. If W is a T-invariant subspace of V, prove the following results.

(a) T_W is an orthogonal [unitary] operator on W.
(b) W⊥ is a T-invariant subspace of V. Hint: Use the fact that T_W is one-to-one and onto to conclude that, for any y ∈ W, T*(y) = T^{−1}(y) ∈ W.
(c) T_{W⊥} is an orthogonal [unitary] operator on W⊥.

15. Let T be a linear operator on a finite-dimensional vector space V, where V is a direct sum of T-invariant subspaces, say, V = W1 ⊕ W2 ⊕ · · · ⊕ Wk. Prove that det(T) = det(T_{W1}) · det(T_{W2}) · · · · · det(T_{Wk}).

16. Complete the proof of the corollary to Theorem 6.47.

17. Let T be an orthogonal operator on an n-dimensional real inner product space V. Suppose that T is not the identity. Prove the following results.

(a) If n is odd, then T can be expressed as the composite of at most one reflection and at most (1/2)(n − 1) rotations.
(b) If n is even, then T can be expressed as the composite of at most (1/2)n rotations or as the composite of one reflection and at most (1/2)(n − 2) rotations.

18. Let V be a real inner product space of dimension 2. For any x, y ∈ V such that x ≠ y and ‖x‖ = ‖y‖ = 1, show that there exists a unique rotation T on V such that T(x) = y.

INDEX OF DEFINITIONS FOR CHAPTER 6

Adjoint of a linear operator 358
Adjoint of a matrix 331
Axis of rotation 473
Bilinear form 422
Complex inner product space 332
Condition number 469


Congruent matrices 426
Conjugate transpose (adjoint) of a matrix 331
Critical point 439
Diagonalizable bilinear form 428
Fourier coefficients of a vector relative to an orthonormal set 348
Frobenius inner product 332
Gram-Schmidt orthogonalization process 344
Hessian matrix 440
Index of a bilinear form 444
Index of a matrix 445
Inner product 329
Inner product space 332
Invariants of a bilinear form 444
Invariants of a matrix 445
Least squares line 361
Legendre polynomials 346
Local extremum 439
Local maximum 439
Local minimum 439
Lorentz transformation 454
Matrix representation of a bilinear form 424
Minimal solution of a system of equations 364
Norm of a matrix 467
Norm of a vector 333
Normal matrix 370
Normal operator 370
Normalizing a vector 335
Orthogonal complement of a subset of an inner product space 349
Orthogonally equivalent matrices 384
Orthogonal matrix 382
Orthogonal operator 379
Orthogonal projection 398
Orthogonal projection on a subspace 351
Orthogonal subset of an inner product space 335
Orthogonal vectors 335
Orthonormal basis 341
Orthonormal set 335
Penrose conditions 421
Permanent of a 2 × 2 matrix 448
Polar decomposition of a matrix 412
Pseudoinverse of a linear transformation 413
Pseudoinverse of a matrix 414
Quadratic form 433
Rank of a bilinear form 443
Rayleigh quotient 467
Real inner product space 332
Reflection 473
Resolution of the identity operator induced by a linear transformation 402
Rigid motion 385
Rotation 472
Self-adjoint matrix 373
Self-adjoint operator 373
Signature of a form 444
Signature of a matrix 445
Singular value decomposition of a matrix 410
Singular value of a linear transformation 407
Singular value of a matrix 410
Space-time coordinates 453
Spectral decomposition of a linear operator 402
Spectrum of a linear operator 402
Standard inner product 330
Symmetric bilinear form 428
Translation 386
Trigonometric polynomial 399
Unitarily equivalent matrices 384
Unitary matrix 382
Unitary operator 379
Unit vector 335


7 Canonical Forms

7.1 The Jordan Canonical Form I
7.2 The Jordan Canonical Form II
7.3 The Minimal Polynomial
7.4* The Rational Canonical Form

As we learned in Chapter 5, the advantage of a diagonalizable linear operator lies in the simplicity of its description. Such an operator has a diagonal matrix representation, or, equivalently, there is an ordered basis for the underlying vector space consisting of eigenvectors of the operator. However, not every linear operator is diagonalizable, even if its characteristic polynomial splits. Example 3 of Section 5.2 describes such an operator.

It is the purpose of this chapter to consider alternative matrix representations for nondiagonalizable operators. These representations are called canonical forms. There are different kinds of canonical forms, and their advantages and disadvantages depend on how they are applied. The choice of a canonical form is determined by the appropriate choice of an ordered basis. Naturally, the canonical forms of a linear operator are not diagonal matrices if the linear operator is not diagonalizable.

In this chapter, we treat two common canonical forms. The first of these, the Jordan canonical form, requires that the characteristic polynomial of the operator splits. This form is always available if the underlying field is algebraically closed, that is, if every polynomial with coefficients from the field splits. For example, the field of complex numbers is algebraically closed by the fundamental theorem of algebra (see Appendix D). The first two sections deal with this form. The rational canonical form, treated in Section 7.4, does not require such a factorization.

7.1 THE JORDAN CANONICAL FORM I

Let T be a linear operator on a finite-dimensional vector space V, and suppose that the characteristic polynomial of T splits. Recall from Section 5.2 that the diagonalizability of T depends on whether the union of ordered bases for the distinct eigenspaces of T is an ordered basis for V. So a lack of diagonalizability means that at least one eigenspace of T is too “small.”


In this section, we extend the definition of eigenspace to generalized eigenspace. From these subspaces, we select ordered bases whose union is an ordered basis β for V such that

[T]β = \begin{pmatrix} A1 & O & \cdots & O \\ O & A2 & \cdots & O \\ \vdots & \vdots & & \vdots \\ O & O & \cdots & Ak \end{pmatrix},

where each O is a zero matrix, and each Ai is a square matrix of the form (λ) or

\begin{pmatrix} λ & 1 & 0 & \cdots & 0 & 0 \\ 0 & λ & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & λ & 1 \\ 0 & 0 & 0 & \cdots & 0 & λ \end{pmatrix}

for some eigenvalue λ of T. Such a matrix Ai is called a Jordan block corresponding to λ, and the matrix [T]β is called a Jordan canonical form of T. We also say that the ordered basis β is a Jordan canonical basis for T. Observe that each Jordan block Ai is “almost” a diagonal matrix; in fact, [T]β is a diagonal matrix if and only if each Ai is of the form (λ).

Example 1

Suppose that T is a linear operator on C8, and β = {v1, v2, . . . , v8} is an ordered basis for C8 such that

J = [T]β = \begin{pmatrix}
2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}

is a Jordan canonical form of T. Notice that the characteristic polynomial of T is det(J − tI) = (t − 2)^4(t − 3)^2 t^2, and hence the multiplicity of each eigenvalue is the number of times that the eigenvalue appears on the diagonal of J. Also observe that v1, v4, v5, and v7 are the only vectors in β that are eigenvectors of T. These are the vectors corresponding to the columns of J with no 1 above the diagonal entry. ♦
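The assertions of Example 1 are easy to verify with a computer algebra system. The following SymPy sketch (added for illustration; it assumes SymPy is available) computes the characteristic polynomial of J and the dimension of each eigenspace.

```python
from sympy import Matrix, eye, symbols, factor

t = symbols('t')
J = Matrix([
    [2, 1, 0, 0, 0, 0, 0, 0],
    [0, 2, 1, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 1, 0, 0],
    [0, 0, 0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
])

# characteristic polynomial: (t - 2)**4 * (t - 3)**2 * t**2
print(factor((J - t * eye(8)).det()))

# dim E_lambda = 8 - rank(J - lambda*I); the values 2, 1, 1 match the
# eigenvectors v1, v4 (lambda = 2), v5 (lambda = 3), and v7 (lambda = 0) in beta
for lam in (2, 3, 0):
    print(lam, 8 - (J - lam * eye(8)).rank())
```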


In Sections 7.1 and 7.2, we prove that every linear operator whose characteristic polynomial splits has a Jordan canonical form that is unique up to the order of the Jordan blocks. Nevertheless, it is not the case that the Jordan canonical form is completely determined by the characteristic polynomial of the operator. For example, let T′ be the linear operator on C8 such that [T′]β = J′, where β is the ordered basis in Example 1 and

J′ = \begin{pmatrix}
2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.

Then the characteristic polynomial of T′ is also (t − 2)^4(t − 3)^2 t^2. But the operator T′ has the Jordan canonical form J′, which is different from J, the Jordan canonical form of the linear operator T of Example 1.

Consider again the matrix J and the ordered basis β of Example 1. Notice that T(v2) = v1 + 2v2 and therefore (T − 2I)(v2) = v1. Similarly, (T − 2I)(v3) = v2. Since v1 and v4 are eigenvectors of T corresponding to λ = 2, it follows that (T − 2I)^3(vi) = 0 for i = 1, 2, 3, and 4. Similarly (T − 3I)^2(vi) = 0 for i = 5, 6, and (T − 0I)^2(vi) = 0 for i = 7, 8.

Because of the structure of each Jordan block in a Jordan canonical form, we can generalize these observations: If v lies in a Jordan canonical basis for a linear operator T and is associated with a Jordan block with diagonal entry λ, then (T − λI)^p(v) = 0 for sufficiently large p. Eigenvectors satisfy this condition for p = 1.

Definition. Let T be a linear operator on a vector space V, and let λ be a scalar. A nonzero vector x in V is called a generalized eigenvector of T corresponding to λ if (T − λI)^p(x) = 0 for some positive integer p.

Notice that if x is a generalized eigenvector of T corresponding to λ, and p is the smallest positive integer for which (T − λI)^p(x) = 0, then (T − λI)^{p−1}(x) is an eigenvector of T corresponding to λ. Therefore λ is an eigenvalue of T.

In the context of Example 1, each vector in β is a generalized eigenvector of T. In fact, v1, v2, v3, and v4 correspond to the scalar 2, v5 and v6 correspond to the scalar 3, and v7 and v8 correspond to the scalar 0.

Just as eigenvectors lie in eigenspaces, generalized eigenvectors lie in “generalized eigenspaces.”

Definition. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. The generalized eigenspace of T corresponding to


λ, denoted Kλ, is the subset of V defined by

Kλ = {x ∈ V : (T − λI)^p(x) = 0 for some positive integer p}.

Note that Kλ consists of the zero vector and all generalized eigenvectors corresponding to λ.

Recall that a subspace W of V is T-invariant for a linear operator T if T(W) ⊆ W. In the development that follows, we assume the results of Exercises 3 and 4 of Section 5.4. In particular, for any polynomial g(x), if W is T-invariant, then it is also g(T)-invariant. Furthermore, the range of a linear operator T is T-invariant.

Theorem 7.1. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. Then

(a) Kλ is a T-invariant subspace of V containing Eλ (the eigenspace of T corresponding to λ).
(b) For any scalar μ ≠ λ, the restriction of T − μI to Kλ is one-to-one.

Proof. (a) Clearly, 0 ∈ Kλ. Suppose that x and y are in Kλ. Then there exist positive integers p and q such that

(T − λI)^p(x) = (T − λI)^q(y) = 0.

Therefore

(T − λI)^{p+q}(x + y) = (T − λI)^{p+q}(x) + (T − λI)^{p+q}(y) = (T − λI)^q(0) + (T − λI)^p(0) = 0,

and hence x + y ∈ Kλ. The proof that Kλ is closed under scalar multiplication is straightforward.

To show that Kλ is T-invariant, consider any x ∈ Kλ. Choose a positive integer p such that (T − λI)^p(x) = 0. Then

(T − λI)^p T(x) = T(T − λI)^p(x) = T(0) = 0.

Therefore T(x) ∈ Kλ. Finally, it is a simple observation that Eλ is contained in Kλ.

(b) Let x ∈ Kλ and (T − μI)(x) = 0. By way of contradiction, suppose that x ≠ 0. Let p be the smallest integer for which (T − λI)^p(x) = 0, and let y = (T − λI)^{p−1}(x). Then

(T − λI)(y) = (T − λI)^p(x) = 0,

and hence y ∈ Eλ. Furthermore,

(T − μI)(y) = (T − μI)(T − λI)^{p−1}(x) = (T − λI)^{p−1}(T − μI)(x) = 0,

so that y ∈ Eμ. But Eλ ∩ Eμ = {0}, and thus y = 0, contrary to the hypothesis. So x = 0, and the restriction of T − μI to Kλ is one-to-one.


Theorem 7.2. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. Suppose that λ is an eigenvalue of T with multiplicity m. Then

(a) dim(Kλ) ≤ m.
(b) Kλ = N((T − λI)^m).

Proof. (a) Let W = Kλ, and let h(t) be the characteristic polynomial of T_W. By Theorem 5.21 (p. 314), h(t) divides the characteristic polynomial of T, and by Theorem 7.1(b), λ is the only eigenvalue of T_W. Hence h(t) = (−1)^d (t − λ)^d, where d = dim(W), and d ≤ m.

(b) Clearly N((T − λI)^m) ⊆ Kλ. Now let W and h(t) be as in (a). Then h(T_W) is identically zero by the Cayley–Hamilton theorem (p. 317); therefore (T − λI)^d(x) = 0 for all x ∈ W. Since d ≤ m, we have Kλ ⊆ N((T − λI)^m).
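Theorem 7.2(b) gives a concrete way to compute a generalized eigenspace: Kλ is the null space of (T − λI)^m. As a small illustration (a sketch only, using SymPy and the matrix A that appears in Example 2 later in this section), one can compare dim(Eλ) with dim(Kλ) for the eigenvalue λ = 2, whose multiplicity is m = 2.

```python
from sympy import Matrix, eye

A = Matrix([[ 3,  1, -2],
            [-1,  0,  5],
            [-1, -1,  4]])                    # the matrix of Example 2 below

lam, m = 2, 2                                 # eigenvalue 2 has multiplicity 2
E = (A - lam * eye(3)).nullspace()            # eigenspace E_2
K = ((A - lam * eye(3)) ** m).nullspace()     # K_2 = N((A - 2I)^2), by Theorem 7.2(b)
print(len(E), len(K))                         # 1 2, so E_2 is strictly smaller than K_2
```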

Theorem 7.3. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits, and let λ1, λ2, . . . , λk be the distinct eigenvalues of T. Then, for every x ∈ V, there exist vectors vi ∈ Kλi, 1 ≤ i ≤ k, such that

x = v1 + v2 + · · · + vk.

Proof. The proof is by mathematical induction on the number k of distinct eigenvalues of T. First suppose that k = 1, and let m be the multiplicity of λ1. Then (λ1 − t)^m is the characteristic polynomial of T, and hence (λ1I − T)^m = T0 by the Cayley–Hamilton theorem (p. 317). Thus V = Kλ1, and the result follows.

Now suppose that for some integer k > 1, the result is established whenever T has fewer than k distinct eigenvalues, and suppose that T has k distinct eigenvalues. Let m be the multiplicity of λk, and let f(t) be the characteristic polynomial of T. Then f(t) = (t − λk)^m g(t) for some polynomial g(t) not divisible by (t − λk). Let W = R((T − λkI)^m). Clearly W is T-invariant. Observe that (T − λkI)^m maps Kλi onto itself for i < k. For suppose that i < k. Since (T − λkI)^m maps Kλi into itself and λk ≠ λi, the restriction of T − λkI to Kλi is one-to-one (by Theorem 7.1(b)) and hence is onto. One consequence of this is that for i < k, Kλi is contained in W; hence λi is an eigenvalue of T_W for i < k.

Next, observe that λk is not an eigenvalue of T_W. For suppose that T(v) = λkv for some v ∈ W. Then v = (T − λkI)^m(y) for some y ∈ V, and it follows that

0 = (T − λkI)(v) = (T − λkI)^{m+1}(y).

Therefore y ∈ Kλk. So by Theorem 7.2, v = (T − λkI)^m(y) = 0.

Since every eigenvalue of T_W is an eigenvalue of T, the distinct eigenvalues of T_W are λ1, λ2, . . . , λk−1.


Now let x ∈ V. Then (T − λkI)^m(x) ∈ W. Since T_W has the k − 1 distinct eigenvalues λ1, λ2, . . . , λk−1, the induction hypothesis applies. Hence there are vectors wi ∈ K′λi, 1 ≤ i ≤ k − 1, such that

(T − λkI)^m(x) = w1 + w2 + · · · + wk−1.

Since K′λi ⊆ Kλi for i < k and (T − λkI)^m maps Kλi onto itself for i < k, there exist vectors vi ∈ Kλi such that (T − λkI)^m(vi) = wi for i < k. Thus we have

(T − λkI)^m(x) = (T − λkI)^m(v1) + (T − λkI)^m(v2) + · · · + (T − λkI)^m(vk−1),

and it follows that x − (v1 + v2 + · · · + vk−1) ∈ Kλk. Therefore there exists a vector vk ∈ Kλk such that

x = v1 + v2 + · · · + vk.

The next result extends Theorem 5.9(b) (p. 268) to all linear operators whose characteristic polynomials split. In this case, the eigenspaces are replaced by generalized eigenspaces.

Theorem 7.4. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits, and let λ1, λ2, . . . , λk be the distinct eigenvalues of T with corresponding multiplicities m1, m2, . . . , mk. For 1 ≤ i ≤ k, let βi be an ordered basis for Kλi. Then the following statements are true.

(a) βi ∩ βj = ∅ for i ≠ j.
(b) β = β1 ∪ β2 ∪ · · · ∪ βk is an ordered basis for V.
(c) dim(Kλi) = mi for all i.

Proof. (a) Suppose that x ∈ βi ∩ βj ⊆ Kλi ∩ Kλj, where i ≠ j. By Theorem 7.1(b), T − λiI is one-to-one on Kλj, and therefore (T − λiI)^p(x) ≠ 0 for any positive integer p. But this contradicts the fact that x ∈ Kλi, and the result follows.

(b) Let x ∈ V. By Theorem 7.3, for 1 ≤ i ≤ k, there exist vectors vi ∈ Kλi such that x = v1 + v2 + · · · + vk. Since each vi is a linear combination of the vectors of βi, it follows that x is a linear combination of the vectors of β. Therefore β spans V. Let q be the number of vectors in β. Then dim(V) ≤ q. For each i, let di = dim(Kλi). Then, by Theorem 7.2(a),

q = \sum_{i=1}^{k} di ≤ \sum_{i=1}^{k} mi = dim(V).

Hence q = dim(V). Consequently β is a basis for V by Corollary 2 to the replacement theorem (p. 47).


(c) Using the notation and result of (b), we see that \sum_{i=1}^{k} di = \sum_{i=1}^{k} mi. But di ≤ mi by Theorem 7.2(a), and therefore di = mi for all i.

Corollary. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. Then T is diagonalizable if and only if Eλ = Kλ for every eigenvalue λ of T.

Proof. Combining Theorems 7.4 and 5.9(a) (p. 268), we see that T is diagonalizable if and only if dim(Eλ) = dim(Kλ) for each eigenvalue λ of T. But Eλ ⊆ Kλ, and hence these subspaces have the same dimension if and only if they are equal.

We now focus our attention on the problem of selecting suitable bases for the generalized eigenspaces of a linear operator so that we may use Theorem 7.4 to obtain a Jordan canonical basis for the operator. For this purpose, we consider again the basis β of Example 1. We have seen that the first four vectors of β lie in the generalized eigenspace K2. Observe that the vectors in β that determine the first Jordan block of J are of the form

{v1, v2, v3} = {(T − 2I)^2(v3), (T − 2I)(v3), v3}.

Furthermore, observe that (T − 2I)^3(v3) = 0. The relation between these vectors is the key to finding Jordan canonical bases. This leads to the following definitions.

Definitions. Let T be a linear operator on a vector space V, and let x be a generalized eigenvector of T corresponding to the eigenvalue λ. Suppose that p is the smallest positive integer for which (T − λI)^p(x) = 0. Then the ordered set

{(T − λI)^{p−1}(x), (T − λI)^{p−2}(x), . . . , (T − λI)(x), x}

is called a cycle of generalized eigenvectors of T corresponding to λ. The vectors (T − λI)^{p−1}(x) and x are called the initial vector and the end vector of the cycle, respectively. We say that the length of the cycle is p.

Notice that the initial vector of a cycle of generalized eigenvectors of a linear operator T is the only eigenvector of T in the cycle. Also observe that if x is an eigenvector of T corresponding to the eigenvalue λ, then the set {x} is a cycle of generalized eigenvectors of T corresponding to λ of length 1.
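Given an end vector, the whole cycle is produced mechanically by repeated application of T − λI. The following sketch (an illustration in SymPy; the helper `cycle` and the 3 × 3 matrix are hypothetical, not from the text) generates a cycle from its end vector.

```python
from sympy import Matrix, eye

def cycle(A, lam, x):
    """Cycle of generalized eigenvectors of A with end vector x:
    [(A - lam*I)^(p-1) x, ..., (A - lam*I) x, x], where p is the smallest
    positive integer with (A - lam*I)^p x = 0."""
    B = A - lam * eye(A.rows)
    chain = [x]
    while any(e != 0 for e in B * chain[-1]):
        chain.append(B * chain[-1])
    return list(reversed(chain))     # initial vector first, end vector last

# e.g., a single 3x3 Jordan block with eigenvalue 5 and end vector e3
J = Matrix([[5, 1, 0], [0, 5, 1], [0, 0, 5]])
print(cycle(J, 5, Matrix([0, 0, 1])))   # [e1, e2, e3], a cycle of length 3
```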

In Example 1, the subsets β1 = {v1, v2, v3}, β2 = {v4}, β3 = {v5, v6}, and β4 = {v7, v8} are the cycles of generalized eigenvectors of T that occur in β. Notice that β is a disjoint union of these cycles. Furthermore, setting Wi = span(βi) for 1 ≤ i ≤ 4, we see that βi is a basis for Wi and [T_{Wi}]βi is the ith Jordan block of the Jordan canonical form of T. This is precisely the condition that is required for a Jordan canonical basis.


Theorem 7.5. Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial splits, and suppose that β is a basis for V such that β is a disjoint union of cycles of generalized eigenvectors of T. Then the following statements are true.

(a) For each cycle γ of generalized eigenvectors contained in β, W = span(γ) is T-invariant, and [T_W]γ is a Jordan block.
(b) β is a Jordan canonical basis for V.

Proof. (a) Suppose that γ corresponds to λ, γ has length p, and x is the end vector of γ. Then γ = {v1, v2, . . . , vp}, where

vi = (T − λI)^{p−i}(x) for i < p and vp = x.

So

(T − λI)(v1) = (T − λI)^p(x) = 0,

and hence T(v1) = λv1. For i > 1,

(T − λI)(vi) = (T − λI)^{p−(i−1)}(x) = vi−1.

Therefore T maps W into itself, and, by the preceding equations, we see that [T_W]γ is a Jordan block.

For (b), simply repeat the arguments of (a) for each cycle in β in order to obtain [T]β. We leave the details as an exercise.

In view of this result, we must show that, under appropriate conditions, there exist bases that are disjoint unions of cycles of generalized eigenvectors. Since the characteristic polynomial of a Jordan canonical form splits, this is a necessary condition. We will soon see that it is also sufficient. The next result moves us toward the desired existence theorem.

Theorem 7.6. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. Suppose that γ1, γ2, . . . , γq are cycles of generalized eigenvectors of T corresponding to λ such that the initial vectors of the γi's are distinct and form a linearly independent set. Then the γi's are disjoint, and their union γ = γ1 ∪ γ2 ∪ · · · ∪ γq is linearly independent.

Proof. Exercise 5 shows that the γi's are disjoint.

The proof that γ is linearly independent is by mathematical induction on the number of vectors in γ. If this number is less than 2, then the result is clear. So assume that, for some integer n > 1, the result is valid whenever γ has fewer than n vectors, and suppose that γ has exactly n vectors. Let W be the subspace of V generated by γ. Clearly W is (T − λI)-invariant, and dim(W) ≤ n. Let U denote the restriction of T − λI to W.


For each i, let γ′i denote the cycle obtained from γi by deleting the end vector. Note that if γi has length one, then γ′i = ∅. In the case that γ′i ≠ ∅, each vector of γ′i is the image under U of a vector in γi, and conversely, every nonzero image under U of a vector of γi is contained in γ′i. Let γ′ = γ′1 ∪ γ′2 ∪ · · · ∪ γ′q. Then by the last statement, γ′ generates R(U). Furthermore, γ′ consists of n − q vectors, and the initial vectors of the γ′i's are also initial vectors of the γi's. Thus we may apply the induction hypothesis to conclude that γ′ is linearly independent. Therefore γ′ is a basis for R(U). Hence dim(R(U)) = n − q. Since the q initial vectors of the γi's form a linearly independent set and lie in N(U), we have dim(N(U)) ≥ q. From these inequalities and the dimension theorem, we obtain

n ≥ dim(W) = dim(R(U)) + dim(N(U)) ≥ (n − q) + q = n.

We conclude that dim(W) = n. Since γ generates W and consists of n vectors, it must be a basis for W. Hence γ is linearly independent.

Corollary. Every cycle of generalized eigenvectors of a linear operator is linearly independent.

Theorem 7.7. Let T be a linear operator on a finite-dimensional vector space V, and let λ be an eigenvalue of T. Then Kλ has an ordered basis consisting of a union of disjoint cycles of generalized eigenvectors corresponding to λ.

Proof. The proof is by mathematical induction on n = dim(Kλ). The result is clear for n = 1. So suppose that for some integer n > 1 the result is valid whenever dim(Kλ) < n, and assume that dim(Kλ) = n. Let U denote the restriction of T − λI to Kλ. Then R(U) is a subspace of Kλ of lesser dimension, and R(U) is the space of generalized eigenvectors corresponding to λ for the restriction of T to R(U). Therefore, by the induction hypothesis, there exist disjoint cycles γ1, γ2, . . . , γq of generalized eigenvectors of this restriction, and hence of T itself, corresponding to λ for which γ = γ1 ∪ γ2 ∪ · · · ∪ γq is a basis for R(U).

For 1 ≤ i ≤ q, the end vector of γi is the image under U of a vector vi ∈ Kλ, and so we can extend each γi to a larger cycle γ̃i = γi ∪ {vi} of generalized eigenvectors of T corresponding to λ. For 1 ≤ i ≤ q, let wi be the initial vector of γ̃i (and hence of γi). Since {w1, w2, . . . , wq} is a linearly independent subset of Eλ, this set can be extended to a basis {w1, w2, . . . , wq, u1, u2, . . . , us}


for Eλ. Then γ̃1, γ̃2, . . . , γ̃q, {u1}, {u2}, . . . , {us} are disjoint cycles of generalized eigenvectors of T corresponding to λ such that the initial vectors of these cycles are linearly independent. Therefore their union γ̃ is a linearly independent subset of Kλ by Theorem 7.6.

We show that γ̃ is a basis for Kλ. Suppose that γ consists of r = rank(U) vectors. Then γ̃ consists of r + q + s vectors. Furthermore, since {w1, w2, . . . , wq, u1, u2, . . . , us} is a basis for Eλ = N(U), it follows that nullity(U) = q + s. Therefore

dim(Kλ) = rank(U) + nullity(U) = r + q + s.

So γ̃ is a linearly independent subset of Kλ containing dim(Kλ) vectors. It follows that γ̃ is a basis for Kλ.

The following corollary is immediate.

Corollary 1. Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial splits. Then T has a Jordan canonical form.

Proof. Let λ1, λ2, . . . , λk be the distinct eigenvalues of T. By Theorem 7.7, for each i there is an ordered basis βi consisting of a disjoint union of cycles of generalized eigenvectors corresponding to λi. Let β = β1 ∪ β2 ∪ · · · ∪ βk. Then, by Theorem 7.4(b), β is an ordered basis for V.

The Jordan canonical form also can be studied from the viewpoint of matrices.

Definition. Let A ∈ Mn×n(F) be such that the characteristic polynomial of A (and hence of LA) splits. Then the Jordan canonical form of A is defined to be the Jordan canonical form of the linear operator LA on F^n.

The next result is an immediate consequence of this definition and Corollary 1.

Corollary 2. Let A be an n × n matrix whose characteristic polynomial splits. Then A has a Jordan canonical form J, and A is similar to J.

Proof. Exercise.

We can now compute the Jordan canonical forms of matrices and linear operators in some simple cases, as is illustrated in the next two examples. The tools necessary for computing the Jordan canonical forms in general are developed in the next section.


Example 2

Let

A = \begin{pmatrix} 3 & 1 & -2 \\ -1 & 0 & 5 \\ -1 & -1 & 4 \end{pmatrix} ∈ M3×3(R).

To find the Jordan canonical form for A, we need to find a Jordan canonical basis for T = LA.

The characteristic polynomial of A is

f(t) = det(A − tI) = −(t − 3)(t − 2)^2.

Hence λ1 = 3 and λ2 = 2 are the eigenvalues of A with multiplicities 1 and 2, respectively. By Theorem 7.4, dim(Kλ1) = 1, and dim(Kλ2) = 2. By Theorem 7.2, Kλ1 = N(T − 3I), and Kλ2 = N((T − 2I)^2). Since Eλ1 = N(T − 3I), we have that Eλ1 = Kλ1. Observe that (−1, 2, 1) is an eigenvector of T corresponding to λ1 = 3; therefore

β1 = {(−1, 2, 1)}

is a basis for Kλ1.

Since dim(Kλ2) = 2 and a generalized eigenspace has a basis consisting of a union of cycles, this basis is either a union of two cycles of length 1 or a single cycle of length 2. The former case is impossible because the vectors in the basis would be eigenvectors, contradicting the fact that dim(Eλ2) = 1. Therefore the desired basis is a single cycle of length 2. A vector v is the end vector of such a cycle if and only if (A − 2I)v ≠ 0, but (A − 2I)^2 v = 0. It can easily be shown that

{(1, −3, −1), (−1, 2, 0)}

is a basis for the solution space of the homogeneous system (A − 2I)^2 x = 0. Now choose a vector v in this set so that (A − 2I)v ≠ 0. The vector v = (−1, 2, 0) is an acceptable candidate. Since (A − 2I)v = (1, −3, −1), we obtain the cycle of generalized eigenvectors

β2 = {(A − 2I)v, v} = {(1, −3, −1), (−1, 2, 0)}


as a basis for Kλ2. Finally, we take the union of these two bases to obtain

β = β1 ∪ β2 = {(−1, 2, 1), (1, −3, −1), (−1, 2, 0)},

which is a Jordan canonical basis for A. Therefore,

J = [T]β = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}

is a Jordan canonical form for A. Notice that A is similar to J. In fact, J = Q^{−1}AQ, where Q is the matrix whose columns are the vectors in β. ♦

Example 3

Let T be the linear operator on P2(R) defined by T(g(x)) = −g(x) − g′(x). We find a Jordan canonical form of T and a Jordan canonical basis for T.

Let β be the standard ordered basis for P2(R). Then

[T]β = \begin{pmatrix} -1 & -1 & 0 \\ 0 & -1 & -2 \\ 0 & 0 & -1 \end{pmatrix},

which has the characteristic polynomial f(t) = −(t + 1)^3. Thus λ = −1 is the only eigenvalue of T, and hence Kλ = P2(R) by Theorem 7.4. So β is a basis for Kλ. Now

dim(Eλ) = 3 − rank([T]β + I) = 3 − rank \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & -2 \\ 0 & 0 & 0 \end{pmatrix} = 3 − 2 = 1.

Therefore a basis for Kλ cannot be a union of two or three cycles because the initial vector of each cycle is an eigenvector, and there do not exist two or more linearly independent eigenvectors. So the desired basis must consist of a single cycle of length 3. If γ is such a cycle, then γ determines a single Jordan block

[T]γ = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix},

which is a Jordan canonical form of T.

The end vector h(x) of such a cycle must satisfy (T + I)^2(h(x)) ≠ 0. In any basis for Kλ, there must be a vector that satisfies this condition, or else


no vector in Kλ satisfies this condition, contrary to our reasoning. Testing the vectors in β, we see that h(x) = x^2 is acceptable. Therefore

γ = {(T + I)^2(x^2), (T + I)(x^2), x^2} = {2, −2x, x^2}

is a Jordan canonical basis for T. ♦
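The same check works for operators once a matrix representation is chosen. A sketch for Example 3 (SymPy; added for illustration): with A = [T]β for the standard basis β = {1, x, x^2} and Q holding the coordinate vectors of the cycle {2, −2x, x^2}, the matrix Q^{−1}AQ is the single Jordan block found above.

```python
from sympy import Matrix

A = Matrix([[-1, -1,  0],
            [ 0, -1, -2],
            [ 0,  0, -1]])           # [T]_beta for T(g) = -g - g'
Q = Matrix([[ 2,  0,  0],
            [ 0, -2,  0],
            [ 0,  0,  1]])           # coordinates of 2, -2x, x^2 relative to beta

print(Q.inv() * A * Q)               # Matrix([[-1, 1, 0], [0, -1, 1], [0, 0, -1]])
```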

In the next section, we develop a computational approach for finding a Jordan canonical form and a Jordan canonical basis. In the process, we prove that Jordan canonical forms are unique up to the order of the Jordan blocks.

Let T be a linear operator on a finite-dimensional vector space V, and suppose that the characteristic polynomial of T splits. By Theorem 5.11 (p. 278), T is diagonalizable if and only if V is the direct sum of the eigenspaces of T. If T is diagonalizable, then the eigenspaces and the generalized eigenspaces coincide. The next result, which is optional, extends Theorem 5.11 to the nondiagonalizable case.

Theorem 7.8. Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial splits. Then V is the direct sum of the generalized eigenspaces of T.

Proof. Exercise.

EXERCISES

1. Label the following statements as true or false.

(a) Eigenvectors of a linear operator T are also generalized eigenvectors of T.
(b) It is possible for a generalized eigenvector of a linear operator T to correspond to a scalar that is not an eigenvalue of T.
(c) Any linear operator on a finite-dimensional vector space has a Jordan canonical form.
(d) A cycle of generalized eigenvectors is linearly independent.
(e) There is exactly one cycle of generalized eigenvectors corresponding to each eigenvalue of a linear operator on a finite-dimensional vector space.
(f) Let T be a linear operator on a finite-dimensional vector space whose characteristic polynomial splits, and let λ1, λ2, . . . , λk be the distinct eigenvalues of T. If, for each i, βi is a basis for Kλi, then β1 ∪ β2 ∪ · · · ∪ βk is a Jordan canonical basis for T.
(g) For any Jordan block J, the operator LJ has Jordan canonical form J.
(h) Let T be a linear operator on an n-dimensional vector space whose characteristic polynomial splits. Then, for any eigenvalue λ of T, Kλ = N((T − λI)^n).


2. For each matrix A, find a basis for each generalized eigenspace of LA consisting of a union of disjoint cycles of generalized eigenvectors. Then find a Jordan canonical form J of A.

(a) A = \begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix}
(b) A = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}
(c) A = \begin{pmatrix} 11 & -4 & -5 \\ 21 & -8 & -11 \\ 3 & -1 & 0 \end{pmatrix}
(d) A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 1 & -1 & 3 \end{pmatrix}

3. For each linear operator T, find a basis for each generalized eigenspace of T consisting of a union of disjoint cycles of generalized eigenvectors. Then find a Jordan canonical form J of T.

(a) T is the linear operator on P2(R) defined by T(f(x)) = 2f(x) − f′(x).
(b) V is the real vector space of functions spanned by the set of real-valued functions {1, t, t^2, e^t, te^t}, and T is the linear operator on V defined by T(f) = f′.
(c) T is the linear operator on M2×2(R) defined by T(A) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} · A for all A ∈ M2×2(R).
(d) T(A) = 2A + A^t for all A ∈ M2×2(R).

4.† Let T be a linear operator on a vector space V, and let γ be a cycle of generalized eigenvectors that corresponds to the eigenvalue λ. Prove that span(γ) is a T-invariant subspace of V.

5. Let γ1, γ2, . . . , γp be cycles of generalized eigenvectors of a linear operator T corresponding to an eigenvalue λ. Prove that if the initial eigenvectors are distinct, then the cycles are disjoint.

6. Let T : V → W be a linear transformation. Prove the following results.

(a) N(T) = N(−T).
(b) N(T^k) = N((−T)^k).
(c) If V = W (so that T is a linear operator on V) and λ is an eigenvalue of T, then for any positive integer k

N((T − λI_V)^k) = N((λI_V − T)^k).

7. Let U be a linear operator on a finite-dimensional vector space V. Prove the following results.

(a) N(U) ⊆ N(U^2) ⊆ · · · ⊆ N(U^k) ⊆ N(U^{k+1}) ⊆ · · · .


(b) If rank(U^m) = rank(U^{m+1}) for some positive integer m, then rank(U^m) = rank(U^k) for any positive integer k ≥ m.
(c) If rank(U^m) = rank(U^{m+1}) for some positive integer m, then N(U^m) = N(U^k) for any positive integer k ≥ m.
(d) Let T be a linear operator on V, and let λ be an eigenvalue of T. Prove that if rank((T − λI)^m) = rank((T − λI)^{m+1}) for some integer m, then Kλ = N((T − λI)^m).
(e) Second Test for Diagonalizability. Let T be a linear operator on V whose characteristic polynomial splits, and let λ1, λ2, . . . , λk be the distinct eigenvalues of T. Then T is diagonalizable if and only if rank(T − λiI) = rank((T − λiI)^2) for 1 ≤ i ≤ k.
(f) Use (e) to obtain a simpler proof of Exercise 24 of Section 5.4: If T is a diagonalizable linear operator on a finite-dimensional vector space V and W is a T-invariant subspace of V, then T_W is diagonalizable.

8. Use Theorem 7.4 to prove that the vectors v1, v2, . . . , vk in the statement of Theorem 7.3 are unique.

9. Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial splits.

(a) Prove Theorem 7.5(b).
(b) Suppose that β is a Jordan canonical basis for T, and let λ be an eigenvalue of T. Let β′ = β ∩ Kλ. Prove that β′ is a basis for Kλ.

10. Let T be a linear operator on a finite-dimensional vector space whose characteristic polynomial splits, and let λ be an eigenvalue of T.

(a) Suppose that γ is a basis for Kλ consisting of the union of q disjoint cycles of generalized eigenvectors. Prove that q ≤ dim(Eλ).
(b) Let β be a Jordan canonical basis for T, and suppose that J = [T]β has q Jordan blocks with λ in the diagonal positions. Prove that q ≤ dim(Eλ).

11. Prove Corollary 2 to Theorem 7.7.

Exercises 12 and 13 are concerned with direct sums of matrices, defined in Section 5.4 on page 320.

12. Prove Theorem 7.8.

13. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits, and let λ1, λ2, . . . , λk be the distinct eigenvalues of T. For each i, let Ji be the Jordan canonical form of the restriction of T to Kλi. Prove that

J = J1 ⊕ J2 ⊕ · · · ⊕ Jk

is the Jordan canonical form of T.


7.2 THE JORDAN CANONICAL FORM II

For the purposes of this section, we fix a linear operator T on an n-dimensional vector space V such that the characteristic polynomial of T splits. Let λ1, λ2, . . . , λk be the distinct eigenvalues of T.

By Theorem 7.7 (p. 490), each generalized eigenspace Kλi contains an ordered basis βi consisting of a union of disjoint cycles of generalized eigenvectors corresponding to λi. So by Theorems 7.4(b) (p. 487) and 7.5 (p. 489), the union β = β1 ∪ β2 ∪ · · · ∪ βk is a Jordan canonical basis for T. For each i, let Ti be the restriction of T to Kλi, and let Ai = [Ti]βi. Then Ai is the Jordan canonical form of Ti, and

J = [T]β = \begin{pmatrix} A1 & O & \cdots & O \\ O & A2 & \cdots & O \\ \vdots & \vdots & & \vdots \\ O & O & \cdots & Ak \end{pmatrix}

is the Jordan canonical form of T. In this matrix, each O is a zero matrix of appropriate size.

In this section, we compute the matrices Ai and the bases βi, thereby computing J and β as well. While developing a method for finding J, it becomes evident that in some sense the matrices Ai are unique.

To aid in formulating the uniqueness theorem for J, we adopt the following convention: The basis βi for Kλi will henceforth be ordered in such a way that the cycles appear in order of decreasing length. That is, if βi is a disjoint union of cycles γ1, γ2, . . . , γni and if the length of the cycle γj is pj, we index the cycles so that p1 ≥ p2 ≥ · · · ≥ pni. This ordering of the cycles limits the possible orderings of vectors in βi, which in turn determines the matrix Ai. It is in this sense that Ai is unique. It then follows that the Jordan canonical form for T is unique up to an ordering of the eigenvalues of T. As we will see, there is no uniqueness theorem for the bases βi or for β. Specifically, we show that for each i, the number ni of cycles that form βi, and the length pj (j = 1, 2, . . . , ni) of each cycle, is completely determined by T.

Example 1

To illustrate the discussion above, suppose that, for some i, the ordered basis βi for Kλi is the union of four cycles βi = γ1 ∪ γ2 ∪ γ3 ∪ γ4 with respective


lengths p1 = 3, p2 = 3, p3 = 2, and p4 = 1. Then

Ai = \begin{pmatrix}
λi & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & λi & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & λi & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & λi & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & λi & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & λi & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & λi & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & λi & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & λi
\end{pmatrix}. ♦

To help us visualize each of the matrices Ai and ordered bases βi, we use an array of dots called a dot diagram of Ti, where Ti is the restriction of T to Kλi. Suppose that βi is a disjoint union of cycles of generalized eigenvectors γ1, γ2, . . . , γni with lengths p1 ≥ p2 ≥ · · · ≥ pni, respectively. The dot diagram of Ti contains one dot for each vector in βi, and the dots are configured according to the following rules.

1. The array consists of ni columns (one column for each cycle).
2. Counting from left to right, the jth column consists of the pj dots that correspond to the vectors of γj, starting with the initial vector at the top and continuing down to the end vector.

Denote the end vectors of the cycles by v1, v2, . . . , vni. In the following dot diagram of Ti, each dot is labeled with the name of the vector in βi to which it corresponds.

• (T − λiI)^{p1−1}(v1)    • (T − λiI)^{p2−1}(v2)    · · ·    • (T − λiI)^{pni−1}(vni)
• (T − λiI)^{p1−2}(v1)    • (T − λiI)^{p2−2}(v2)    · · ·    • (T − λiI)^{pni−2}(vni)
      ⋮                         ⋮                                 ⋮
      ⋮                         ⋮                            • (T − λiI)(vni)
      ⋮                         ⋮                            • vni
      ⋮                    • (T − λiI)(v2)
      ⋮                    • v2
• (T − λiI)(v1)
• v1

Notice that the dot diagram of Ti has ni columns (one for each cycle) and p1 rows. Since p1 ≥ p2 ≥ · · · ≥ pni, the columns of the dot diagram become shorter (or at least not longer) as we move from left to right.

Now let rj denote the number of dots in the jth row of the dot diagram. Observe that r1 ≥ r2 ≥ · · · ≥ rp1. Furthermore, the diagram can be reconstructed from the values of the rj's. The proofs of these facts, which are combinatorial in nature, are treated in Exercise 9.


In Example 1, with ni = 4, p1 = p2 = 3, p3 = 2, and p4 = 1, the dot diagram of Ti is as follows:

• • • •
• • •
• •

Here r1 = 4, r2 = 3, and r3 = 2.

We now devise a method for computing the dot diagram of Ti using the ranks of linear operators determined by T and λi. Hence the dot diagram is completely determined by T, from which it follows that it is unique. On the other hand, βi is not unique. For example, see Exercise 8. (It is for this reason that we associate the dot diagram with Ti rather than with βi.)

To determine the dot diagram of Ti, we devise a method for computing each rj, the number of dots in the jth row of the dot diagram, using only T and λi. The next three results give us the required method. To facilitate our arguments, we fix a basis βi for Kλi so that βi is a disjoint union of ni cycles of generalized eigenvectors with lengths p1 ≥ p2 ≥ · · · ≥ pni.

Theorem 7.9. For any positive integer r, the vectors in βi that are associated with the dots in the first r rows of the dot diagram of Ti constitute a basis for N((T − λiI)^r). Hence the number of dots in the first r rows of the dot diagram equals nullity((T − λiI)^r).

Proof. Clearly, N((T − λiI)^r) ⊆ Kλi, and Kλi is invariant under (T − λiI)^r. Let U denote the restriction of (T − λiI)^r to Kλi. By the preceding remarks, N((T − λiI)^r) = N(U), and hence it suffices to establish the theorem for U. Now define

S1 = {x ∈ βi : U(x) = 0} and S2 = {x ∈ βi : U(x) ≠ 0}.

Let a and b denote the number of vectors in S1 and S2, respectively, and let mi = dim(Kλi). Then a + b = mi. For any x ∈ βi, x ∈ S1 if and only if x is one of the first r vectors of a cycle, and this is true if and only if x corresponds to a dot in the first r rows of the dot diagram. Hence a is the number of dots in the first r rows of the dot diagram. For any x ∈ S2, the effect of applying U to x is to move the dot corresponding to x exactly r places up its column to another dot. It follows that U maps S2 in a one-to-one fashion into βi. Thus {U(x) : x ∈ S2} is a basis for R(U) consisting of b vectors. Hence rank(U) = b, and so nullity(U) = mi − b = a. But S1 is a linearly independent subset of N(U) consisting of a vectors; therefore S1 is a basis for N(U).

In the case that r = 1, Theorem 7.9 yields the following corollary.

Corollary. The dimension of Eλi is ni. Hence in a Jordan canonical form of T, the number of Jordan blocks corresponding to λi equals the dimension of Eλi.


Proof. Exercise.

We are now able to devise a method for describing the dot diagram in terms of the ranks of operators.

Theorem 7.10. Let rj denote the number of dots in the jth row of the dot diagram of Ti, the restriction of T to Kλi. Then the following statements are true.

(a) r1 = dim(V) − rank(T − λiI).
(b) rj = rank((T − λiI)^{j−1}) − rank((T − λiI)^j) if j > 1.

Proof. By Theorem 7.9, for 1 ≤ j ≤ p1, we have

r1 + r2 + · · · + rj = nullity((T − λiI)^j) = dim(V) − rank((T − λiI)^j).

Hence

r1 = dim(V) − rank(T − λiI),

and for j > 1,

rj = (r1 + r2 + · · · + rj) − (r1 + r2 + · · · + rj−1)
   = [dim(V) − rank((T − λiI)^j)] − [dim(V) − rank((T − λiI)^{j−1})]
   = rank((T − λiI)^{j−1}) − rank((T − λiI)^j).
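Theorem 7.10 translates directly into a computation with ranks. The sketch below (SymPy; the helper name `dot_diagram_rows` is ours, and the sample matrix is the one used in Example 2 below) returns the row lengths r1, r2, . . . of the dot diagram for a given eigenvalue.

```python
from sympy import Matrix, eye

def dot_diagram_rows(A, lam):
    """Row lengths r_1, r_2, ... of the dot diagram of A for the eigenvalue lam,
    computed as in Theorem 7.10:
    r_j = rank((A - lam*I)^(j-1)) - rank((A - lam*I)^j)."""
    B = A - lam * eye(A.rows)
    rows, j = [], 1
    while True:
        r = (B ** (j - 1)).rank() - (B ** j).rank()
        if r == 0:
            return rows
        rows.append(r)
        j += 1

A = Matrix([[2, -1,  0, 1],
            [0,  3, -1, 0],
            [0,  1,  1, 0],
            [0, -1,  0, 3]])         # the matrix of Example 2 below
print(dot_diagram_rows(A, 2))        # [2, 1]
print(dot_diagram_rows(A, 3))        # [1]
```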

Theorem 7.10 shows that the dot diagram of Ti is completely determined by T and λi. Hence we have proved the following result.

Corollary. For any eigenvalue λi of T, the dot diagram of Ti is unique. Thus, subject to the convention that the cycles of generalized eigenvectors for the bases of each generalized eigenspace are listed in order of decreasing length, the Jordan canonical form of a linear operator or a matrix is unique up to the ordering of the eigenvalues.

We apply these results to find the Jordan canonical forms of two matrices and a linear operator.

Example 2

Let

A = \begin{pmatrix} 2 & -1 & 0 & 1 \\ 0 & 3 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 0 & 3 \end{pmatrix}.


We find the Jordan canonical form of A and a Jordan canonical basis for the linear operator T = LA. The characteristic polynomial of A is

det(A − tI) = (t − 2)^3(t − 3).

Thus A has two distinct eigenvalues, λ1 = 2 and λ2 = 3, with multiplicities 3 and 1, respectively. Let T1 and T2 be the restrictions of LA to the generalized eigenspaces Kλ1 and Kλ2, respectively.

Suppose that β1 is a Jordan canonical basis for T1. Since λ1 has multiplicity 3, it follows that dim(Kλ1) = 3 by Theorem 7.4(c) (p. 487); hence the dot diagram of T1 has three dots. As we did earlier, let rj denote the number of dots in the jth row of this dot diagram. Then, by Theorem 7.10,

r1 = 4 − rank(A − 2I) = 4 − rank \begin{pmatrix} 0 & -1 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 0 & 1 \end{pmatrix} = 4 − 2 = 2,

and

r2 = rank(A − 2I) − rank((A − 2I)^2) = 2 − 1 = 1.

(Actually, the computation of r2 is unnecessary in this case because r1 = 2 and the dot diagram only contains three dots.) Hence the dot diagram associated with β1 is

• •
•

So

A1 = [T1]β1 = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.

Since λ2 = 3 has multiplicity 1, it follows that dim(Kλ2) = 1, and consequently any basis β2 for Kλ2 consists of a single eigenvector corresponding to λ2 = 3. Therefore

A2 = [T2]β2 = (3).

Setting β = β1 ∪ β2, we have

J = [LA]β = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix},


and so J is the Jordan canonical form of A.

We now find a Jordan canonical basis for T = LA. We begin by determining a Jordan canonical basis β1 for T1. Since the dot diagram of T1 has two columns, each corresponding to a cycle of generalized eigenvectors, there are two such cycles. Let v1 and v2 denote the end vectors of the first and second cycles, respectively. We reprint below the dot diagram with the dots labeled with the names of the vectors to which they correspond.

• (T − 2I)(v1) • v2

• v1

From this diagram we see that v1 ∈ N((T − 2I)^2) but v1 ∉ N(T − 2I). Now

A − 2I = \begin{pmatrix} 0 & -1 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 0 & 1 \end{pmatrix} and (A − 2I)^2 = \begin{pmatrix} 0 & -2 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & -2 & 1 & 1 \end{pmatrix}.

It is easily seen that

{(1, 0, 0, 0), (0, 1, 2, 0), (0, 1, 0, 2)}

is a basis for N((T − 2I)^2) = Kλ1. Of these three basis vectors, the last two do not belong to N(T − 2I), and hence we select one of these for v1. Suppose that we choose

v1 = (0, 1, 2, 0).

Then

(T − 2I)(v1) = (A − 2I)(v1) = \begin{pmatrix} 0 & -1 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ -1 \\ -1 \end{pmatrix}.

Now simply choose v2 to be a vector in Eλ1 that is linearly independent of (T − 2I)(v1); for example, select

v2 = (1, 0, 0, 0).


Thus we have associated the Jordan canonical basis

β1 = {(−1, −1, −1, −1), (0, 1, 2, 0), (1, 0, 0, 0)}

with the dot diagram in the following manner.

• (−1, −1, −1, −1)    • (1, 0, 0, 0)
• (0, 1, 2, 0)

By Theorem 7.6 (p. 489), the linear independence of β1 is guaranteed since v2 was chosen to be linearly independent of (T − 2I)(v1).

Since λ2 = 3 has multiplicity 1, dim(Kλ2) = dim(Eλ2) = 1. Hence any eigenvector of LA corresponding to λ2 = 3 constitutes an appropriate basis β2. For example,

β2 = {(1, 0, 0, 1)}.

Thus

β = β1 ∪ β2 = {(−1, −1, −1, −1), (0, 1, 2, 0), (1, 0, 0, 0), (1, 0, 0, 1)}

is a Jordan canonical basis for LA.

Notice that if

Q = \begin{pmatrix} -1 & 0 & 1 & 1 \\ -1 & 1 & 0 & 0 \\ -1 & 2 & 0 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix},

then J = Q^{−1}AQ. ♦
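As in Section 7.1, the result is easy to verify: with Q as above, Q^{−1}AQ should reproduce J. A minimal SymPy sketch (illustrative only):

```python
from sympy import Matrix

A = Matrix([[2, -1,  0, 1],
            [0,  3, -1, 0],
            [0,  1,  1, 0],
            [0, -1,  0, 3]])
Q = Matrix([[-1, 0, 1, 1],
            [-1, 1, 0, 0],
            [-1, 2, 0, 0],
            [-1, 0, 0, 1]])          # columns: the Jordan canonical basis beta

print(Q.inv() * A * Q)               # the Jordan canonical form J found above
```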


Example 3

Let

A = \begin{pmatrix} 2 & -4 & 2 & 2 \\ -2 & 0 & 1 & 3 \\ -2 & -2 & 3 & 3 \\ -2 & -6 & 3 & 7 \end{pmatrix}.

We find the Jordan canonical form J of A, a Jordan canonical basis for LA, and a matrix Q such that J = Q^{−1}AQ.

The characteristic polynomial of A is det(A − tI) = (t − 2)^2(t − 4)^2. Let T = LA, λ1 = 2, and λ2 = 4, and let Ti be the restriction of LA to Kλi for i = 1, 2.

We begin by computing the dot diagram of T1. Let r1 denote the number of dots in the first row of this diagram. Then

r1 = 4 − rank(A − 2I) = 4 − 2 = 2;

hence the dot diagram of T1 is as follows.

• •

Therefore

A1 = [T1]β1 = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix},

where β1 is any basis corresponding to the dots. In this case, β1 is an arbitrary basis for Eλ1 = N(T − 2I), for example,

β1 = {(2, 1, 0, 2), (0, 1, 2, 0)}.

Next we compute the dot diagram of T2. Since rank(A − 4I) = 3, there is only 4 − 3 = 1 dot in the first row of the diagram. Since λ2 = 4 has multiplicity 2, we have dim(Kλ2) = 2, and hence this dot diagram has the following form:

•
•

Thus

A2 = [T2]β2 = \begin{pmatrix} 4 & 1 \\ 0 & 4 \end{pmatrix},


where β2 is any basis for Kλ2 corresponding to the dots. In this case, β2 is a cycle of length 2. The end vector of this cycle is a vector v ∈ Kλ2 = N((T − 4I)^2) such that v ∉ N(T − 4I). One way of finding such a vector was used to select the vector v1 in Example 2. In this example, we illustrate another method. A simple calculation shows that a basis for the null space of LA − 4I is

{(0, 1, 1, 1)}.

Choose v to be any solution to the system of linear equations

(A − 4I)x = (0, 1, 1, 1),

for example,

v = (1, −1, −1, 0).

Thus

β2 = {(LA − 4I)(v), v} = {(0, 1, 1, 1), (1, −1, −1, 0)}.

Therefore

β = β1 ∪ β2 = {(2, 1, 0, 2), (0, 1, 2, 0), (0, 1, 1, 1), (1, −1, −1, 0)}

is a Jordan canonical basis for LA. The corresponding Jordan canonical form is given by

J = [LA]β = \begin{pmatrix} A1 & O \\ O & A2 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 4 & 1 \\ 0 & 0 & 0 & 4 \end{pmatrix}.


Finally, we define Q to be the matrix whose columns are the vectors of β listed in the same order, namely,

Q = \begin{pmatrix} 2 & 0 & 0 & 1 \\ 1 & 1 & 1 & -1 \\ 0 & 2 & 1 & -1 \\ 2 & 0 & 1 & 0 \end{pmatrix}.

Then J = Q^{−1}AQ. ♦

Example 4

Let V be the vector space of polynomial functions in two real variables x and y of degree at most 2. Then V is a vector space over R and α = {1, x, y, x^2, y^2, xy} is an ordered basis for V. Let T be the linear operator on V defined by

T(f(x, y)) = (∂/∂x) f(x, y).

For example, if f(x, y) = x + 2x^2 − 3xy + y, then

T(f(x, y)) = (∂/∂x)(x + 2x^2 − 3xy + y) = 1 + 4x − 3y.

We find the Jordan canonical form and a Jordan canonical basis for T.

Let A = [T]α. Then

A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},

and hence the characteristic polynomial of T is

det(A − tI) = det \begin{pmatrix} -t & 1 & 0 & 0 & 0 & 0 \\ 0 & -t & 0 & 2 & 0 & 0 \\ 0 & 0 & -t & 0 & 0 & 1 \\ 0 & 0 & 0 & -t & 0 & 0 \\ 0 & 0 & 0 & 0 & -t & 0 \\ 0 & 0 & 0 & 0 & 0 & -t \end{pmatrix} = t^6.

Thus λ = 0 is the only eigenvalue of T, and Kλ = V. For each j, let rj denote the number of dots in the jth row of the dot diagram of T. By Theorem 7.10,

r1 = 6 − rank(A) = 6 − 3 = 3,


and since

A^2 = \begin{pmatrix} 0 & 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},

r2 = rank(A) − rank(A^2) = 3 − 1 = 2.

Because there are a total of six dots in the dot diagram and r1 = 3 and r2 = 2, it follows that r3 = 1. So the dot diagram of T is

• • •
• •
•

We conclude that the Jordan canonical form of T is

J = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

We now find a Jordan canonical basis for T. Since the first column of the dot diagram of T consists of three dots, we must find a polynomial f1(x, y) such that (∂^2/∂x^2) f1(x, y) ≠ 0. Examining the basis α = {1, x, y, x^2, y^2, xy} for Kλ = V, we see that x^2 is a suitable candidate. Setting f1(x, y) = x^2, we see that

(T − λI)(f1(x, y)) = T(f1(x, y)) = (∂/∂x)(x^2) = 2x

and

(T − λI)^2(f1(x, y)) = T^2(f1(x, y)) = (∂^2/∂x^2)(x^2) = 2.

Likewise, since the second column of the dot diagram consists of two dots, we must find a polynomial f2(x, y) such that (∂/∂x)(f2(x, y)) ≠ 0, but (∂^2/∂x^2)(f2(x, y)) = 0.


Since our choice must be linearly independent of the polynomials already chosen for the first cycle, the only choice in α that satisfies these constraints is xy. So we set f2(x, y) = xy. Thus

(T − λI)(f2(x, y)) = T(f2(x, y)) = (∂/∂x)(xy) = y.

Finally, the third column of the dot diagram consists of a single polynomial that lies in the null space of T. The only remaining polynomial in α is y^2, and it is suitable here. So set f3(x, y) = y^2. Therefore we have identified polynomials with the dots in the dot diagram as follows.

• 2     • y      • y^2
• 2x    • xy
• x^2

Thus β = {2, 2x, x^2, y, xy, y^2} is a Jordan canonical basis for T. ♦

In the three preceding examples, we relied on our ingenuity and the context of the problem to find Jordan canonical bases. The reader can do the same in the exercises. We are successful in these cases because the dimensions of the generalized eigenspaces under consideration are small. We do not attempt, however, to develop a general algorithm for computing Jordan canonical bases, although one could be devised by following the steps in the proof of the existence of such a basis (Theorem 7.7, p. 490).

The following result may be thought of as a corollary to Theorem 7.10.

Theorem 7.11. Let A and B be n × n matrices, each having Jordan canonical forms computed according to the conventions of this section. Then A and B are similar if and only if they have (up to an ordering of their eigenvalues) the same Jordan canonical form.

Proof. If A and B have the same Jordan canonical form J, then A and B are each similar to J and hence are similar to each other.

Conversely, suppose that A and B are similar. Then A and B have the same eigenvalues. Let JA and JB denote the Jordan canonical forms of A and B, respectively, with the same ordering of their eigenvalues. Then A is similar to both JA and JB, and therefore, by the corollary to Theorem 2.23 (p. 115), JA and JB are matrix representations of LA. Hence JA and JB are Jordan canonical forms of LA. Thus JA = JB by the corollary to Theorem 7.10.

Example 5

We determine which of the matrices

A = \begin{pmatrix} -3 & 3 & -2 \\ -7 & 6 & -3 \\ 1 & -1 & 2 \end{pmatrix}, B = \begin{pmatrix} 0 & 1 & -1 \\ -4 & 4 & -2 \\ -2 & 1 & 1 \end{pmatrix},


C = \begin{pmatrix} 0 & -1 & -1 \\ -3 & -1 & -2 \\ 7 & 5 & 6 \end{pmatrix}, and D = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{pmatrix}

are similar. Observe that A, B, and C have the same characteristic polynomial −(t − 1)(t − 2)^2, whereas D has −t(t − 1)(t − 2) as its characteristic polynomial. Because similar matrices have the same characteristic polynomials, D cannot be similar to A, B, or C. Let JA, JB, and JC be the Jordan canonical forms of A, B, and C, respectively, using the ordering 1, 2 for their common eigenvalues. Then (see Exercise 4)

JA = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}, JB = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, and JC = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.

Since JA = JC, A is similar to C. Since JB is different from JA and JC, B is similar to neither A nor C. ♦
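Theorem 7.11 turns the similarity question of Example 5 into a comparison of Jordan canonical forms, which a computer algebra system can carry out. A sketch (SymPy; Matrix.jordan_form is assumed available, and it may list the blocks in a different order than the text):

```python
from sympy import Matrix

A = Matrix([[-3, 3, -2], [-7, 6, -3], [ 1, -1, 2]])
B = Matrix([[ 0, 1, -1], [-4, 4, -2], [-2, 1, 1]])
C = Matrix([[ 0, -1, -1], [-3, -1, -2], [ 7, 5, 6]])

for name, M in [("A", A), ("B", B), ("C", C)]:
    P, J = M.jordan_form()           # M = P * J * P**-1
    print(name, J.tolist())
# A and C produce the same Jordan form (so they are similar);
# B produces the diagonal form diag(1, 2, 2), so B is similar to neither.
```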

The reader should observe that any diagonal matrix is a Jordan canonical form. Thus a linear operator T on a finite-dimensional vector space V is diagonalizable if and only if its Jordan canonical form is a diagonal matrix. Hence T is diagonalizable if and only if the Jordan canonical basis for T consists of eigenvectors of T. Similar statements can be made about matrices. Thus, of the matrices A, B, and C in Example 5, A and C are not diagonalizable because their Jordan canonical forms are not diagonal matrices.

EXERCISES

1. Label the following statements as true or false. Assume that the characteristic polynomial of the matrix or linear operator splits.

(a) The Jordan canonical form of a diagonal matrix is the matrix itself.
(b) Let T be a linear operator on a finite-dimensional vector space V that has a Jordan canonical form J. If β is any basis for V, then the Jordan canonical form of [T]β is J.
(c) Linear operators having the same characteristic polynomial are similar.
(d) Matrices having the same Jordan canonical form are similar.
(e) Every matrix is similar to its Jordan canonical form.
(f) Every linear operator with the characteristic polynomial (−1)^n(t − λ)^n has the same Jordan canonical form.
(g) Every linear operator on a finite-dimensional vector space has a unique Jordan canonical basis.
(h) The dot diagrams of a linear operator on a finite-dimensional vector space are unique.


2. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. Suppose that λ1 = 2, λ2 = 4, and λ3 = −3 are the distinct eigenvalues of T and that the dot diagrams for the restriction of T to Kλi (i = 1, 2, 3) are as follows:

λ1 = 2 λ2 = 4 λ3 = −3• • •• ••

• •••

• •

Find the Jordan canonical form J of T.

3. Let T be a linear operator on a finite-dimensional vector space V with Jordan canonical form

⎛ 2 1 0 0 0 0 0 ⎞
⎜ 0 2 1 0 0 0 0 ⎟
⎜ 0 0 2 0 0 0 0 ⎟
⎜ 0 0 0 2 1 0 0 ⎟
⎜ 0 0 0 0 2 0 0 ⎟
⎜ 0 0 0 0 0 3 0 ⎟
⎝ 0 0 0 0 0 0 3 ⎠

(a) Find the characteristic polynomial of T.
(b) Find the dot diagram corresponding to each eigenvalue of T.
(c) For which eigenvalues λi, if any, does Eλi = Kλi?
(d) For each eigenvalue λi, find the smallest positive integer pi for which Kλi = N((T − λiI)^pi).
(e) Compute the following numbers for each i, where Ui denotes the restriction of T − λiI to Kλi.
    (i) rank(Ui)
    (ii) rank(Ui^2)
    (iii) nullity(Ui)
    (iv) nullity(Ui^2)

4. For each of the matrices A that follow, find a Jordan canonical form J and an invertible matrix Q such that J = Q^{−1}AQ. Notice that the matrices in (a), (b), and (c) are those used in Example 5.

(a) A = ⎛ −3  3 −2 ⎞     (b) A = ⎛  0  1 −1 ⎞
        ⎜ −7  6 −3 ⎟             ⎜ −4  4 −2 ⎟
        ⎝  1 −1  2 ⎠             ⎝ −2  1  1 ⎠

(c) A = ⎛  0 −1 −1 ⎞     (d) A = ⎛  0 −3  1  2 ⎞
        ⎜ −3 −1 −2 ⎟             ⎜ −2  1 −1  2 ⎟
        ⎝  7  5  6 ⎠             ⎜ −2  1 −1  2 ⎟
                                 ⎝ −2 −3  1  4 ⎠


5. For each linear operator T, find a Jordan canonical form J of T and a Jordan canonical basis β for T.

(a) V is the real vector space of functions spanned by the set of real-valued functions {e^t, te^t, t^2 e^t, e^{2t}}, and T is the linear operator on V defined by T(f) = f′.
(b) T is the linear operator on P3(R) defined by T(f(x)) = xf′′(x).
(c) T is the linear operator on P3(R) defined by T(f(x)) = f′′(x) + 2f(x).
(d) T is the linear operator on M2×2(R) defined by

    T(A) = ⎛ 3 1 ⎞ · A − A^t.
           ⎝ 0 3 ⎠

(e) T is the linear operator on M2×2(R) defined by

    T(A) = ⎛ 3 1 ⎞ · (A − A^t).
           ⎝ 0 3 ⎠

(f) V is the vector space of polynomial functions in two real variables x and y of degree at most 2, as defined in Example 4, and T is the linear operator on V defined by

    T(f(x, y)) = ∂f(x, y)/∂x + ∂f(x, y)/∂y.

6. Let A be an n×n matrix whose characteristic polynomial splits. Provethat A and At have the same Jordan canonical form, and conclude thatA and At are similar. Hint: For any eigenvalue λ of A and At and anypositive integer r, show that rank((A − λI)r) = rank((At − λI)r).

7. Let A be an n × n matrix whose characteristic polynomial splits, γ bea cycle of generalized eigenvectors corresponding to an eigenvalue λ,and W be the subspace spanned by γ. Define γ′ to be the ordered setobtained from γ by reversing the order of the vectors in γ.

(a) Prove that [TW]γ′ = ([TW]γ)t.(b) Let J be the Jordan canonical form of A. Use (a) to prove that J

and J t are similar.(c) Use (b) to prove that A and At are similar.

8. Let T be a linear operator on a finite-dimensional vector space, andsuppose that the characteristic polynomial of T splits. Let β be a Jordancanonical basis for T.

(a) Prove that for any nonzero scalar c, {cx : x ∈ β} is a Jordan canon-ical basis for T.


(b) Suppose that γ is one of the cycles of generalized eigenvectors thatforms β, and suppose that γ corresponds to the eigenvalue λ andhas length greater than 1. Let x be the end vector of γ, and let ybe a nonzero vector in Eλ. Let γ′ be the ordered set obtained fromγ by replacing x by x + y. Prove that γ′ is a cycle of generalizedeigenvectors corresponding to λ, and that if γ′ replaces γ in theunion that defines β, then the new union is also a Jordan canonicalbasis for T.

(c) Apply (b) to obtain a Jordan canonical basis for LA, where A is thematrix given in Example 2, that is different from the basis givenin the example.

9. Suppose that a dot diagram has k columns and m rows with pj dots incolumn j and ri dots in row i. Prove the following results.

(a) m = p1 and k = r1.(b) pj = max {i : ri ≥ j} for 1 ≤ j ≤ k and ri = max {j : pj ≥ i} for

1 ≤ i ≤ m. Hint: Use mathematical induction on m.(c) r1 ≥ r2 ≥ · · · ≥ rm.(d) Deduce that the number of dots in each column of a dot diagram

is completely determined by the number of dots in the rows.

10. Let T be a linear operator whose characteristic polynomial splits, andlet λ be an eigenvalue of T.

(a) Prove that dim(Kλ) is the sum of the lengths of all the blockscorresponding to λ in the Jordan canonical form of T.

(b) Deduce that Eλ = Kλ if and only if all the Jordan blocks corre-sponding to λ are 1 × 1 matrices.

The following definitions are used in Exercises 11–19.

Definitions. A linear operator T on a vector space V is called nilpotent if T^p = T0 for some positive integer p. An n × n matrix A is called nilpotent if A^p = O for some positive integer p.
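As a quick illustration of the definition (ours, not the text's), a strictly upper triangular matrix such as the one below is nilpotent; Exercise 12 asks for a proof of the general fact.

from sympy import Matrix, zeros

A = Matrix([[0, 1, 4],
            [0, 0, 2],
            [0, 0, 0]])

print(A**2)               # still nonzero
print(A**3 == zeros(3))   # True, so A is nilpotent with p = 3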

11. Let T be a linear operator on a finite-dimensional vector space V, andlet β be an ordered basis for V. Prove that T is nilpotent if and only if[T]β is nilpotent.

12. Prove that any square upper triangular matrix with each diagonal entryequal to zero is nilpotent.

13. Let T be a nilpotent operator on an n-dimensional vector space V, andsuppose that p is the smallest positive integer for which Tp = T0. Provethe following results.

(a) N(Ti) ⊆ N(Ti+1) for every positive integer i.


(b) There is a sequence of ordered bases β1, β2, . . . , βp such that βi isa basis for N(Ti) and βi+1 contains βi for 1 ≤ i ≤ p − 1.

(c) Let β = βp be the ordered basis for N(Tp) = V in (b). Then [T]βis an upper triangular matrix with each diagonal entry equal tozero.

(d) The characteristic polynomial of T is (−1)ntn. Hence the charac-teristic polynomial of T splits, and 0 is the only eigenvalue of T.

14. Prove the converse of Exercise 13(d): If T is a linear operator on an n-dimensional vector space V and (−1)ntn is the characteristic polynomialof T, then T is nilpotent.

15. Give an example of a linear operator T on a finite-dimensional vectorspace such that T is not nilpotent, but zero is the only eigenvalue of T.Characterize all such operators.

16. Let T be a nilpotent linear operator on a finite-dimensional vector spaceV. Recall from Exercise 13 that λ = 0 is the only eigenvalue of T, andhence V = Kλ. Let β be a Jordan canonical basis for T. Prove that forany positive integer i, if we delete from β the vectors corresponding tothe last i dots in each column of a dot diagram of β, the resulting set isa basis for R(Ti). (If a column of the dot diagram contains fewer than idots, all the vectors associated with that column are removed from β.)

17. Let T be a linear operator on a finite-dimensional vector space V suchthat the characteristic polynomial of T splits, and let λ1, λ2, . . . , λk bethe distinct eigenvalues of T. Let S : V → V be the mapping defined by

S(x) = λ1v1 + λ2v2 + · · · + λkvk,

where, for each i, vi is the unique vector in Kλisuch that x = v1 +

v2 + · · ·+vk. (This unique representation is guaranteed by Theorem 7.3(p. 486) and Exercise 8 of Section 7.1.)

(a) Prove that S is a diagonalizable linear operator on V.(b) Let U = T − S. Prove that U is nilpotent and commutes with S,

that is, SU = US.

18. Let T be a linear operator on a finite-dimensional vector space V, andlet J be the Jordan canonical form of T. Let D be the diagonal matrixwhose diagonal entries are the diagonal entries of J , and let M = J−D.Prove the following results.

(a) M is nilpotent.(b) MD = DM .


(c) If p is the smallest positive integer for which M^p = O, then, for any positive integer r < p,

    J^r = D^r + rD^{r−1}M + [r(r − 1)/2!] D^{r−2}M^2 + · · · + rDM^{r−1} + M^r,

and, for any positive integer r ≥ p,

    J^r = D^r + rD^{r−1}M + [r(r − 1)/2!] D^{r−2}M^2 + · · · + [r!/((r − p + 1)!(p − 1)!)] D^{r−p+1}M^{p−1}.

19. Let

J = ⎛ λ 1 0 · · · 0 ⎞
    ⎜ 0 λ 1 · · · 0 ⎟
    ⎜ 0 0 λ · · · 0 ⎟
    ⎜ ⋮ ⋮ ⋮       ⋮ ⎟
    ⎜ 0 0 0 · · · 1 ⎟
    ⎝ 0 0 0 · · · λ ⎠

be the m × m Jordan block corresponding to λ, and let N = J − λIm. Prove the following results:

(a) N^m = O, and for 1 ≤ r < m,

    (N^r)_ij = 1 if j = i + r, and (N^r)_ij = 0 otherwise.

(b) For any integer r ≥ m,

J^r = ⎛ λ^r  rλ^{r−1}  [r(r − 1)/2!]λ^{r−2}  · · ·  [r(r − 1)· · ·(r − m + 2)/(m − 1)!]λ^{r−m+1} ⎞
      ⎜ 0    λ^r       rλ^{r−1}             · · ·  [r(r − 1)· · ·(r − m + 3)/(m − 2)!]λ^{r−m+2} ⎟
      ⎜ ⋮    ⋮         ⋮                           ⋮                                           ⎟
      ⎝ 0    0         0                    · · ·  λ^r                                          ⎠

(c) lim_{r→∞} J^r exists if and only if one of the following holds:

    (i) |λ| < 1.
    (ii) λ = 1 and m = 1.


    (Note that lim_{r→∞} λ^r exists under these conditions. See the discussion preceding Theorem 5.13 on page 285.) Furthermore, lim_{r→∞} J^r is the zero matrix if condition (i) holds and is the 1 × 1 matrix (1) if condition (ii) holds.

(d) Prove Theorem 5.13 on page 285.

The following definition is used in Exercises 20 and 21.

Definition. For any A ∈ Mn×n(C), define the norm of A by

‖A‖ = max {|Aij | : 1 ≤ i, j ≤ n}.

20. Let A, B ∈ Mn×n(C). Prove the following results.

(a) ‖A‖ ≥ 0 and ‖A‖ = 0 if and only if A = O.

(b) ‖cA‖ = |c| · ‖A‖ for any scalar c.
(c) ‖A + B‖ ≤ ‖A‖ + ‖B‖.
(d) ‖AB‖ ≤ n‖A‖ ‖B‖.
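A numerical spot check of these properties (an illustration with an assumed helper max_norm, not a proof) can be run as follows.

from sympy import Matrix

def max_norm(M):
    # ||M|| = max |M_ij|, the norm defined above
    return max(abs(entry) for entry in M)

A = Matrix([[1, -3], [2, 0]])
B = Matrix([[4, 1], [-1, 2]])
n = 2

print(max_norm(A + B) <= max_norm(A) + max_norm(B))      # True, as in (c)
print(max_norm(A * B) <= n * max_norm(A) * max_norm(B))  # True, as in (d)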

21. Let A ∈ Mn×n(C) be a transition matrix. (See Section 5.3.) Since C is an algebraically closed field, A has a Jordan canonical form J to which A is similar. Let P be an invertible matrix such that P^{−1}AP = J. Prove the following results.

(a) ‖A^m‖ ≤ 1 for every positive integer m.
(b) There exists a positive number c such that ‖J^m‖ ≤ c for every positive integer m.
(c) Each Jordan block of J corresponding to the eigenvalue λ = 1 is a 1 × 1 matrix.
(d) lim_{m→∞} A^m exists if and only if 1 is the only eigenvalue of A with absolute value 1.
(e) Theorem 5.20(a) using (c) and Theorem 5.19.

The next exercise requires knowledge of absolutely convergent series as wellas the definition of eA for a matrix A. (See page 312.)

22. Use Exercise 20(d) to prove that eA exists for every A ∈ Mn×n(C).

23. Let x′ = Ax be a system of n linear differential equations, where x isan n-tuple of differentiable functions x1(t), x2(t), . . . , xn(t) of the realvariable t, and A is an n × n coefficient matrix as in Exercise 15 ofSection 5.2. In contrast to that exercise, however, do not assume thatA is diagonalizable, but assume that the characteristic polynomial of Asplits. Let λ1, λ2, . . . , λk be the distinct eigenvalues of A.


(a) Prove that if u is the end vector of a cycle of generalized eigenvectors of LA of length p and u corresponds to the eigenvalue λi, then for any polynomial f(t) of degree less than p, the function

    e^{λi t}[f(t)(A − λiI)^{p−1} + f′(t)(A − λiI)^{p−2} + · · · + f^{(p−1)}(t)]u

is a solution to the system x′ = Ax.
(b) Prove that the general solution to x′ = Ax is a sum of the functions of the form given in (a), where the vectors u are the end vectors of the distinct cycles that constitute a fixed Jordan canonical basis for LA.

24. Use Exercise 23 to find the general solution to each of the following systems of linear equations, where x, y, and z are real-valued differentiable functions of the real variable t.

(a) x′ = 2x + y          (b) x′ = 2x + y
    y′ = 2y − z              y′ = 2y + z
    z′ = 3z                  z′ = 2z
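For system (b) the coefficient matrix is a single 3 × 3 Jordan block with eigenvalue 2, so the polynomial factors t and t^2/2 predicted by Exercise 23 appear in e^{tA}. The following sympy sketch is our illustration (it uses the decomposition A = 2I + N with N nilpotent rather than the exercise's notation).

from sympy import Matrix, eye, exp, symbols, simplify

t = symbols('t')
A = Matrix([[2, 1, 0],
            [0, 2, 1],
            [0, 0, 2]])
N = A - 2 * eye(3)   # nilpotent part: N**3 == 0

# Since A = 2I + N and N**3 = 0, e^{tA} = e^{2t}(I + tN + (t**2/2)N**2).
etA = exp(2 * t) * (eye(3) + t * N + (t**2 / 2) * N**2)

print(etA)                              # entries e^(2t), t*e^(2t), (t**2/2)*e^(2t)
print(simplify(etA.diff(t) - A * etA))  # zero matrix: each column of e^{tA} solves x' = Ax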

7.3 THE MINIMAL POLYNOMIAL

The Cayley-Hamilton theorem (Theorem 5.23 p. 317) tells us that for anylinear operator T on an n-dimensional vector space, there is a polynomialf(t) of degree n such that f(T) = T0, namely, the characteristic polynomialof T. Hence there is a polynomial of least degree with this property, and thisdegree is at most n. If g(t) is such a polynomial, we can divide g(t) by itsleading coefficient to obtain another polynomial p(t) of the same degree withleading coefficient 1, that is, p(t) is a monic polynomial. (See Appendix E.)

Definition. Let T be a linear operator on a finite-dimensional vectorspace. A polynomial p(t) is called a minimal polynomial of T if p(t) is amonic polynomial of least positive degree for which p(T) = T0.

The preceding discussion shows that every linear operator on a finite-dimensional vector space has a minimal polynomial. The next result showsthat it is unique.

Theorem 7.12. Let p(t) be a minimal polynomial of a linear operator Ton a finite-dimensional vector space V.

(a) For any polynomial g(t), if g(T) = T0, then p(t) divides g(t). In partic-ular, p(t) divides the characteristic polynomial of T.

(b) The minimal polynomial of T is unique.

Proof. (a) Let g(t) be a polynomial for which g(T) = T0. By the divisionalgorithm for polynomials (Theorem E.1 of Appendix E, p. 562), there existpolynomials q(t) and r(t) such that

g(t) = q(t)p(t) + r(t), (1)


where r(t) has degree less than the degree of p(t). Substituting T into (1)and using that g(T) = p(T) = T0, we have r(T) = T0. Since r(t) has degreeless than p(t) and p(t) is the minimal polynomial of T, r(t) must be the zeropolynomial. Thus (1) simplifies to g(t) = q(t)p(t), proving (a).

(b) Suppose that p1(t) and p2(t) are each minimal polynomials of T. Thenp1(t) divides p2(t) by (a). Since p1(t) and p2(t) have the same degree, we havethat p2(t) = cp1(t) for some nonzero scalar c. Because p1(t) and p2(t) aremonic, c = 1; hence p1(t) = p2(t).

The minimal polynomial of a linear operator has an obvious analog for amatrix.

Definition. Let A ∈ Mn×n(F ). The minimal polynomial p(t) of A isthe monic polynomial of least positive degree for which p(A) = O.

The following results are now immediate.

Theorem 7.13. Let T be a linear operator on a finite-dimensional vectorspace V, and let β be an ordered basis for V. Then the minimal polynomialof T is the same as the minimal polynomial of [T]β .

Proof. Exercise.

Corollary. For any A ∈ Mn×n(F ), the minimal polynomial of A is thesame as the minimal polynomial of LA.

Proof. Exercise.

In view of the preceding theorem and corollary, Theorem 7.12 and allsubsequent theorems in this section that are stated for operators are alsovalid for matrices.

For the remainder of this section, we study primarily minimal polynomialsof operators (and hence matrices) whose characteristic polynomials split. Amore general treatment of minimal polynomials is given in Section 7.4.

Theorem 7.14. Let T be a linear operator on a finite-dimensional vectorspace V, and let p(t) be the minimal polynomial of T. A scalar λ is aneigenvalue of T if and only if p(λ) = 0. Hence the characteristic polynomialand the minimal polynomial of T have the same zeros.

Proof. Let f(t) be the characteristic polynomial of T. Since p(t) dividesf(t), there exists a polynomial q(t) such that f(t) = q(t)p(t). If λ is a zero ofp(t), then

f(λ) = q(λ)p(λ) = q(λ) ·0 = 0.

So λ is a zero of f(t); that is, λ is an eigenvalue of T.


Conversely, suppose that λ is an eigenvalue of T, and let x ∈ V be an eigenvector corresponding to λ. By Exercise 22 of Section 5.1, we have

0 = T0(x) = p(T)(x) = p(λ)x.

Since x ≠ 0, it follows that p(λ) = 0, and so λ is a zero of p(t).

The following corollary is immediate.

Corollary. Let T be a linear operator on a finite-dimensional vector space V with minimal polynomial p(t) and characteristic polynomial f(t). Suppose that f(t) factors as

f(t) = (λ1 − t)^{n1}(λ2 − t)^{n2} · · · (λk − t)^{nk},

where λ1, λ2, . . . , λk are the distinct eigenvalues of T. Then there exist integers m1, m2, . . . , mk such that 1 ≤ mi ≤ ni for all i and

p(t) = (t − λ1)^{m1}(t − λ2)^{m2} · · · (t − λk)^{mk}.

Example 1

We compute the minimal polynomial of the matrix

A = ⎛ 3 −1 0 ⎞
    ⎜ 0  2 0 ⎟
    ⎝ 1 −1 2 ⎠ .

Since A has the characteristic polynomial

f(t) = det ⎛ 3 − t  −1     0     ⎞ = −(t − 2)^2(t − 3),
           ⎜ 0      2 − t  0     ⎟
           ⎝ 1      −1     2 − t ⎠

the minimal polynomial of A must be either (t − 2)(t − 3) or (t − 2)^2(t − 3) by the corollary to Theorem 7.14. Substituting A into p(t) = (t − 2)(t − 3), we find that p(A) = O; hence p(t) is the minimal polynomial of A. ♦
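This computation is easy to confirm directly; the following sympy sketch is ours, not part of the example, and simply substitutes A into (t − 2)(t − 3).

from sympy import Matrix, eye, symbols, factor

A = Matrix([[3, -1, 0],
            [0,  2, 0],
            [1, -1, 2]])
t = symbols('t')

print(factor(A.charpoly(t).as_expr()))   # (t - 2)**2*(t - 3); the text's det(A - tI) differs by the sign (-1)**3
print((A - 2*eye(3)) * (A - 3*eye(3)))   # the zero matrix, so p(t) = (t - 2)(t - 3)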

Example 2

Let T be the linear operator on R^2 defined by

T(a, b) = (2a + 5b, 6a + b)

and β be the standard ordered basis for R^2. Then

[T]β = ⎛ 2 5 ⎞
       ⎝ 6 1 ⎠ ,

and hence the characteristic polynomial of T is

f(t) = det ⎛ 2 − t  5     ⎞ = (t − 7)(t + 4).
           ⎝ 6      1 − t ⎠

Thus the minimal polynomial of T is also (t − 7)(t + 4). ♦


Example 3

Let D be the linear operator on P2(R) defined by D(g(x)) = g′(x), the derivative of g(x). We compute the minimal polynomial of D. Let β be the standard ordered basis for P2(R). Then

[D]β = ⎛ 0 1 0 ⎞
       ⎜ 0 0 2 ⎟
       ⎝ 0 0 0 ⎠ ,

and it follows that the characteristic polynomial of D is −t^3. So by the corollary to Theorem 7.14, the minimal polynomial of D is t, t^2, or t^3. Since D^2(x^2) = 2 ≠ 0, it follows that D^2 ≠ T0; hence the minimal polynomial of D must be t^3. ♦
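A direct check with the matrix [D]β (ours, for illustration) shows the same thing: D^2 ≠ O but D^3 = O.

from sympy import Matrix, zeros

D = Matrix([[0, 1, 0],
            [0, 0, 2],
            [0, 0, 0]])

print(D**2)               # nonzero: D**2 sends x**2 to the constant 2
print(D**3 == zeros(3))   # True, so the minimal polynomial of D is t**3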

In Example 3, it is easily verified that P2(R) is a D-cyclic subspace (ofitself). Here the minimal and characteristic polynomials are of the samedegree. This is no coincidence.

Theorem 7.15. Let T be a linear operator on an n-dimensional vectorspace V such that V is a T-cyclic subspace of itself. Then the characteristicpolynomial f(t) and the minimal polynomial p(t) have the same degree, andhence f(t) = (−1)np(t).

Proof. Since V is a T-cyclic space, there exists an x ∈ V such that

β = {x, T(x), . . . , T^{n−1}(x)}

is a basis for V (Theorem 5.22, p. 315). Let

g(t) = a0 + a1t + · · · + ak t^k

be a polynomial of degree k < n. Then ak ≠ 0 and

g(T)(x) = a0x + a1T(x) + · · · + ak T^k(x),

and so g(T)(x) is a linear combination of the vectors of β having at least one nonzero coefficient, namely, ak. Since β is linearly independent, it follows that g(T)(x) ≠ 0; hence g(T) ≠ T0. Therefore the minimal polynomial of T has degree n, which is also the degree of the characteristic polynomial of T.

Theorem 7.15 gives a condition under which the degree of the minimalpolynomial of an operator is as large as possible. We now investigate theother extreme. By Theorem 7.14, the degree of the minimal polynomial of anoperator must be greater than or equal to the number of distinct eigenvaluesof the operator. The next result shows that the operators for which thedegree of the minimal polynomial is as small as possible are precisely thediagonalizable operators.


Theorem 7.16. Let T be a linear operator on a finite-dimensional vectorspace V. Then T is diagonalizable if and only if the minimal polynomial of Tis of the form

p(t) = (t − λ1)(t − λ2) · · · (t − λk),

where λ1, λ2, . . . , λk are the distinct eigenvalues of T.

Proof. Suppose that T is diagonalizable. Let λ1, λ2, . . . , λk be the distincteigenvalues of T, and define

p(t) = (t − λ1)(t − λ2) · · · (t − λk).

By Theorem 7.14, p(t) divides the minimal polynomial of T. Let β ={v1, v2, . . . , vn} be a basis for V consisting of eigenvectors of T, and con-sider any vi ∈ β. Then (T−λj I)(vi) = 0 for some eigenvalue λj . Since t−λj

divides p(t), there is a polynomial qj(t) such that p(t) = qj(t)(t− λj). Hence

p(T)(vi) = qj(T)(T − λj I)(vi) = 0 .

It follows that p(T) = T0, since p(T) takes each vector in a basis for V into0 . Therefore p(t) is the minimal polynomial of T.

Conversely, suppose that there are distinct scalars λ1, λ2, . . . , λk such thatthe minimal polynomial p(t) of T factors as

p(t) = (t − λ1)(t − λ2) · · · (t − λk).

By Theorem 7.14, the λi’s are eigenvalues of T. We apply mathematicalinduction on n = dim(V). Clearly T is diagonalizable for n = 1. Nowassume that T is diagonalizable whenever dim(V) < n for some n > 1, andlet dim(V) = n and W = R(T − λkI). Obviously W �= V, because λk is aneigenvalue of T. If W = {0}, then T = λkI, which is clearly diagonalizable.So suppose that 0 < dim(W) < n. Then W is T-invariant, and for any x ∈ W,

(T − λ1I)(T − λ2I) · · · (T − λk−1I)(x) = 0 .

It follows that the minimal polynomial of TW divides the polynomial(t − λ1)(t − λ2) · · · (t − λk−1). Hence by the induction hypothesis, TW isdiagonalizable. Furthermore, λk is not an eigenvalue of TW by Theorem 7.14.Therefore W ∩ N(T − λkI) = {0}. Now let β1 = {v1, v2, . . . , vm} be a ba-sis for W consisting of eigenvectors of TW (and hence of T), and let β2 ={w1, w2, . . . , wp} be a basis for N(T−λkI), the eigenspace of T correspondingto λk. Then β1 and β2 are disjoint by the previous comment. Moreover,m + p = n by the dimension theorem applied to T − λkI. We show thatβ = β1 ∪ β2 is linearly independent. Consider scalars a1, a2, . . . , am andb1, b2, . . . , bp such that

a1v1 + a2v2 + · · · + amvm + b1w1 + b2w2 + · · · + bpwp = 0 .


Let

x = ∑_{i=1}^{m} ai vi   and   y = ∑_{i=1}^{p} bi wi.

Then x ∈ W, y ∈ N(T − λkI), and x + y = 0. It follows that x = −y ∈ W ∩ N(T − λkI), and therefore x = 0. Since β1 is linearly independent, we have that a1 = a2 = · · · = am = 0. Similarly, b1 = b2 = · · · = bp = 0, and we conclude that β is a linearly independent subset of V consisting of n eigenvectors. It follows that β is a basis for V consisting of eigenvectors of T, and consequently T is diagonalizable.

In addition to diagonalizable operators, there are methods for determin-ing the minimal polynomial of any linear operator on a finite-dimensionalvector space. In the case that the characteristic polynomial of the operatorsplits, the minimal polynomial can be described using the Jordan canonicalform of the operator. (See Exercise 13.) In the case that the characteristicpolynomial does not split, the minimal polynomial can be described using therational canonical form, which we study in the next section. (See Exercise 7of Section 7.4.)

Example 4

We determine all matrices A ∈ M2×2(R) for which A^2 − 3A + 2I = O. Let g(t) = t^2 − 3t + 2 = (t − 1)(t − 2). Since g(A) = O, the minimal polynomial p(t) of A divides g(t). Hence the only possible candidates for p(t) are t − 1, t − 2, and (t − 1)(t − 2). If p(t) = t − 1 or p(t) = t − 2, then A = I or A = 2I, respectively. If p(t) = (t − 1)(t − 2), then A is diagonalizable with eigenvalues 1 and 2, and hence A is similar to

⎛ 1 0 ⎞
⎝ 0 2 ⎠ . ♦

Example 5

Let A ∈ Mn×n(R) satisfy A^3 = A. We show that A is diagonalizable. Let g(t) = t^3 − t = t(t + 1)(t − 1). Then g(A) = O, and hence the minimal polynomial p(t) of A divides g(t). Since g(t) has no repeated factors, neither does p(t). Thus A is diagonalizable by Theorem 7.16. ♦
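For a concrete instance (the matrix below is our own example, not the book's), sympy confirms the conclusion of Example 5.

from sympy import Matrix

A = Matrix([[0, 1, 0],
            [1, 0, 0],
            [0, 0, 1]])   # A**2 = I, so A**3 = A

print(A**3 == A)              # True
print(A.is_diagonalizable())  # True, as Theorem 7.16 predicts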

Example 6

In Example 3, we saw that the minimal polynomial of the differential operatorD on P2(R) is t3. Hence, by Theorem 7.16, D is not diagonalizable. ♦


EXERCISES

1. Label the following statements as true or false. Assume that all vector spaces are finite-dimensional.

(a) Every linear operator T has a polynomial p(t) of largest degree for which p(T) = T0.
(b) Every linear operator has a unique minimal polynomial.
(c) The characteristic polynomial of a linear operator divides the minimal polynomial of that operator.
(d) The minimal and the characteristic polynomials of any diagonalizable operator are equal.
(e) Let T be a linear operator on an n-dimensional vector space V, p(t) be the minimal polynomial of T, and f(t) be the characteristic polynomial of T. Suppose that f(t) splits. Then f(t) divides [p(t)]^n.
(f) The minimal polynomial of a linear operator always has the same degree as the characteristic polynomial of the operator.
(g) A linear operator is diagonalizable if its minimal polynomial splits.
(h) Let T be a linear operator on a vector space V such that V is a T-cyclic subspace of itself. Then the degree of the minimal polynomial of T equals dim(V).
(i) Let T be a linear operator on a vector space V such that T has n distinct eigenvalues, where n = dim(V). Then the degree of the minimal polynomial of T equals n.

2. Find the minimal polynomial of each of the following matrices.

(a) ⎛ 2 1 ⎞     (b) ⎛ 1 1 ⎞
    ⎝ 1 2 ⎠         ⎝ 0 1 ⎠

(c) ⎛ 4 −14 5 ⎞     (d) ⎛  3 0 1 ⎞
    ⎜ 1  −4 2 ⎟         ⎜  2 2 2 ⎟
    ⎝ 1  −6 4 ⎠         ⎝ −1 0 1 ⎠

3. For each linear operator T on V, find the minimal polynomial of T.

(a) V = R^2 and T(a, b) = (a + b, a − b)
(b) V = P2(R) and T(g(x)) = g′(x) + 2g(x)
(c) V = P2(R) and T(f(x)) = −xf′′(x) + f′(x) + 2f(x)
(d) V = Mn×n(R) and T(A) = A^t. Hint: Note that T^2 = I.

4. Determine which of the matrices and operators in Exercises 2 and 3 are diagonalizable.

5. Describe all linear operators T on R^2 such that T is diagonalizable and T^3 − 2T^2 + T = T0.


6. Prove Theorem 7.13 and its corollary.

7. Prove the corollary to Theorem 7.14.

8. Let T be a linear operator on a finite-dimensional vector space, and let p(t) be the minimal polynomial of T. Prove the following results.

(a) T is invertible if and only if p(0) ≠ 0.
(b) If T is invertible and p(t) = t^n + a_{n−1}t^{n−1} + · · · + a1t + a0, then

    T^{−1} = −(1/a0)(T^{n−1} + a_{n−1}T^{n−2} + · · · + a2T + a1I).

9. Let T be a diagonalizable linear operator on a finite-dimensional vectorspace V. Prove that V is a T-cyclic subspace if and only if each of theeigenspaces of T is one-dimensional.

10. Let T be a linear operator on a finite-dimensional vector space V, andsuppose that W is a T-invariant subspace of V. Prove that the minimalpolynomial of TW divides the minimal polynomial of T.

11. Let g(t) be the auxiliary polynomial associated with a homogeneous lin-ear differential equation with constant coefficients (as defined in Section2.7), and let V denote the solution space of this differential equation.Prove the following results.

(a) V is a D-invariant subspace, where D is the differentiation operatoron C∞.

(b) The minimal polynomial of DV (the restriction of D to V) is g(t).(c) If the degree of g(t) is n, then the characteristic polynomial of DV

is (−1)ng(t).

Hint: Use Theorem 2.32 (p. 135) for (b) and (c).

12. Let D be the differentiation operator on P(R), the space of polynomialsover R. Prove that there exists no polynomial g(t) for which g(D) = T0.Hence D has no minimal polynomial.

13. Let T be a linear operator on a finite-dimensional vector space, andsuppose that the characteristic polynomial of T splits. Let λ1, λ2, . . . , λk

be the distinct eigenvalues of T, and for each i let pi be the order of thelargest Jordan block corresponding to λi in a Jordan canonical form ofT. Prove that the minimal polynomial of T is

(t − λ1)p1(t − λ2)p2 · · · (t − λk)pk .

The following exercise requires knowledge of direct sums (see Section 5.2).


14. Let T be a linear operator on a finite-dimensional vector space V, and let W1 and W2 be T-invariant subspaces of V such that V = W1 ⊕ W2. Suppose that p1(t) and p2(t) are the minimal polynomials of TW1 and TW2, respectively. Prove or disprove that p1(t)p2(t) is the minimal polynomial of T.

Exercise 15 uses the following definition.

Definition. Let T be a linear operator on a finite-dimensional vectorspace V, and let x be a nonzero vector in V. The polynomial p(t) is calleda T-annihilator of x if p(t) is a monic polynomial of least degree for whichp(T)(x) = 0 .

15.† Let T be a linear operator on a finite-dimensional vector space V, andlet x be a nonzero vector in V. Prove the following results.

(a) The vector x has a unique T-annihilator.(b) The T-annihilator of x divides any polynomial g(t) for which

g(T) = T0.(c) If p(t) is the T-annihilator of x and W is the T-cyclic subspace

generated by x, then p(t) is the minimal polynomial of TW, anddim(W) equals the degree of p(t).

(d) The degree of the T-annihilator of x is 1 if and only if x is aneigenvector of T.

16. Let T be a linear operator on a finite-dimensional vector space V, and let W1 be a T-invariant subspace of V. Let x ∈ V such that x ∉ W1. Prove the following results.

(a) There exists a unique monic polynomial g1(t) of least positive de-gree such that g1(T)(x) ∈ W1.

(b) If h(t) is a polynomial for which h(T)(x) ∈ W1, then g1(t) dividesh(t).

(c) g1(t) divides the minimal and the characteristic polynomials of T.(d) Let W2 be a T-invariant subspace of V such that W2 ⊆ W1, and

let g2(t) be the unique monic polynomial of least degree such thatg2(T)(x) ∈ W2. Then g1(t) divides g2(t).

7.4∗ THE RATIONAL CANONICAL FORM

Until now we have used eigenvalues, eigenvectors, and generalized eigenvectors in our analysis of linear operators with characteristic polynomials that split. In general, characteristic polynomials need not split, and indeed, operators need not have eigenvalues! However, the unique factorization theorem for polynomials (see Appendix E) guarantees that the characteristic polynomial f(t) of any linear operator T on an n-dimensional vector space factors uniquely as

f(t) = (−1)^n (φ1(t))^{n1}(φ2(t))^{n2} · · · (φk(t))^{nk},

where the φi(t)'s (1 ≤ i ≤ k) are distinct irreducible monic polynomials and the ni's are positive integers. In the case that f(t) splits, each irreducible monic polynomial factor is of the form φi(t) = t − λi, where λi is an eigenvalue of T, and there is a one-to-one correspondence between eigenvalues of T and the irreducible monic factors of the characteristic polynomial. In general, eigenvalues need not exist, but the irreducible monic factors always exist. In this section, we establish structure theorems based on the irreducible monic factors of the characteristic polynomial instead of eigenvalues.

In this context, the following definition is the appropriate replacement for eigenspace and generalized eigenspace.

Definition. Let T be a linear operator on a finite-dimensional vector space V with characteristic polynomial

f(t) = (−1)^n (φ1(t))^{n1}(φ2(t))^{n2} · · · (φk(t))^{nk},

where the φi(t)'s (1 ≤ i ≤ k) are distinct irreducible monic polynomials and the ni's are positive integers. For 1 ≤ i ≤ k, we define the subset Kφi of V by

Kφi = {x ∈ V : (φi(T))^p(x) = 0 for some positive integer p}.

We show that each Kφi is a nonzero T-invariant subspace of V. Note that if φi(t) = t − λ is of degree one, then Kφi is the generalized eigenspace of T corresponding to the eigenvalue λ.
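Computationally, Kφ can be found as a null space. The sketch below (our example matrix and names, not the text's) computes Kφ for the irreducible quadratic φ(t) = t^2 + 1 dividing the characteristic polynomial of a real 3 × 3 matrix; here the exponent p = 1 already suffices.

from sympy import Matrix, eye

A = Matrix([[0, -1, 0],
            [1,  0, 0],
            [0,  0, 2]])     # characteristic polynomial (t**2 + 1)(t - 2), up to sign

phi_A = A**2 + eye(3)        # phi(A) for phi(t) = t**2 + 1
print(phi_A.nullspace())     # two basis vectors, so dim K_phi = 2 = deg(phi) * 1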

Having obtained suitable generalizations of the related concepts of eigen-value and eigenspace, our next task is to describe a canonical form of a linearoperator suitable to this context. The one that we study is called the rationalcanonical form. Since a canonical form is a description of a matrix represen-tation of a linear operator, it can be defined by specifying the form of theordered bases allowed for these representations.

Here the bases of interest naturally arise from the generators of certaincyclic subspaces. For this reason, the reader should recall the definition ofa T-cyclic subspace generated by a vector and Theorem 5.22 (p. 315). Webriefly review this concept and introduce some new notation and terminology.

Let T be a linear operator on a finite-dimensional vector space V, and letx be a nonzero vector in V. We use the notation Cx for the T-cyclic subspacegenerated by x. Recall (Theorem 5.22) that if dim(Cx) = k, then the set

{x,T(x), T2(x), . . . ,Tk−1(x)}is an ordered basis for Cx. To distinguish this basis from all other orderedbases for Cx, we call it the T-cyclic basis generated by x and denote it by


βx. Let A be the matrix representation of the restriction of T to Cx relative to the ordered basis βx. Recall from the proof of Theorem 5.22 that

A = ⎛ 0 0 · · · 0 −a0      ⎞
    ⎜ 1 0 · · · 0 −a1      ⎟
    ⎜ 0 1 · · · 0 −a2      ⎟
    ⎜ ⋮ ⋮       ⋮  ⋮       ⎟
    ⎝ 0 0 · · · 1 −a_{k−1} ⎠ ,

where

a0x + a1T(x) + · · · + a_{k−1}T^{k−1}(x) + T^k(x) = 0.

Furthermore, the characteristic polynomial of A is given by

det(A − tI) = (−1)^k(a0 + a1t + · · · + a_{k−1}t^{k−1} + t^k).

The matrix A is called the companion matrix of the monic polynomial h(t) = a0 + a1t + · · · + a_{k−1}t^{k−1} + t^k. Every monic polynomial has a companion matrix, and the characteristic polynomial of the companion matrix of a monic polynomial g(t) of degree k is equal to (−1)^k g(t). (See Exercise 19 of Section 5.4.) By Theorem 7.15 (p. 519), the monic polynomial h(t) is also the minimal polynomial of A. Since A is the matrix representation of the restriction of T to Cx, h(t) is also the T-annihilator of x.
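The companion matrix is simple to build by machine. The helper below (an assumed name, written for illustration) constructs it from the coefficients a0, a1, . . . , a_{k−1} of h(t) and checks the characteristic polynomial; note that sympy's charpoly uses det(tI − C), which equals (−1)^k det(C − tI).

from sympy import Matrix, zeros, symbols, factor

def companion(coeffs):
    # coeffs = [a0, a1, ..., a_{k-1}] of the monic polynomial h(t)
    k = len(coeffs)
    C = zeros(k, k)
    for i in range(1, k):
        C[i, i - 1] = 1           # subdiagonal of 1's
    for i in range(k):
        C[i, k - 1] = -coeffs[i]  # last column holds -a0, ..., -a_{k-1}
    return C

t = symbols('t')
C = companion([3, -1])                   # companion matrix of h(t) = t**2 - t + 3
print(C)                                 # Matrix([[0, -3], [1, 1]])
print(factor(C.charpoly(t).as_expr()))   # t**2 - t + 3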

It is the object of this section to prove that for every linear operator T on a finite-dimensional vector space V, there exists an ordered basis β for V such that the matrix representation [T]β is of the form

⎛ C1 O  · · · O  ⎞
⎜ O  C2 · · · O  ⎟
⎜ ⋮  ⋮        ⋮  ⎟
⎝ O  O  · · · Cr ⎠ ,

where each Ci is the companion matrix of a polynomial (φ(t))^m such that φ(t) is a monic irreducible divisor of the characteristic polynomial of T and m is a positive integer. A matrix representation of this kind is called a rational canonical form of T. We call the accompanying basis a rational canonical basis for T.

The next theorem is a simple consequence of the following lemma, whichrelies on the concept of T-annihilator, introduced in the Exercises of Sec-tion 7.3.

Lemma. Let T be a linear operator on a finite-dimensional vector spaceV, let x be a nonzero vector in V, and suppose that the T-annihilator of xis of the form (φ(t))p for some irreducible monic polynomial φ(t). Then φ(t)divides the minimal polynomial of T, and x ∈ Kφ.


Proof. By Exercise 15(b) of Section 7.3, (φ(t))p divides the minimal poly-nomial of T. Therefore φ(t) divides the minimal polynomial of T. Further-more, x ∈ Kφ by the definition of Kφ.

Theorem 7.17. Let T be a linear operator on a finite-dimensional vectorspace V, and let β be an ordered basis for V. Then β is a rational canonicalbasis for T if and only if β is the disjoint union of T-cyclic bases βvi

, whereeach vi lies in Kφ for some irreducible monic divisor φ(t) of the characteristicpolynomial of T.

Proof. Exercise.

Example 1

Suppose that T is a linear operator on R8 and

β = {v1, v2, v3, v4, v5, v6, v7, v8}

is a rational canonical basis for T such that

C = [T]β = ⎛ 0 −3 0 0 0  0 0  0 ⎞
           ⎜ 1  1 0 0 0  0 0  0 ⎟
           ⎜ 0  0 0 0 0 −1 0  0 ⎟
           ⎜ 0  0 1 0 0  0 0  0 ⎟
           ⎜ 0  0 0 1 0 −2 0  0 ⎟
           ⎜ 0  0 0 0 1  0 0  0 ⎟
           ⎜ 0  0 0 0 0  0 0 −1 ⎟
           ⎝ 0  0 0 0 0  0 1  0 ⎠

is a rational canonical form of T. In this case, the submatrices C1, C2, and C3 are the companion matrices of the polynomials φ1(t), (φ2(t))^2, and φ2(t), respectively, where

φ1(t) = t2 − t + 3 and φ2(t) = t2 + 1.

In the context of Theorem 7.17, β is the disjoint union of the T-cyclic bases;that is,

β = βv1 ∪ βv3 ∪ βv7

= {v1, v2} ∪ {v3, v4, v5, v6} ∪ {v7, v8}.

By Exercise 40 of Section 5.4, the characteristic polynomial f(t) of T is theproduct of the characteristic polynomials of the companion matrices:

f(t) = φ1(t)(φ2(t))^2 φ2(t) = φ1(t)(φ2(t))^3. ♦
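This factorization can be verified directly from the matrix C; the sympy check below is ours, not part of the example.

from sympy import Matrix, symbols, factor

t = symbols('t')
C = Matrix([
    [0, -3, 0, 0, 0,  0, 0,  0],
    [1,  1, 0, 0, 0,  0, 0,  0],
    [0,  0, 0, 0, 0, -1, 0,  0],
    [0,  0, 1, 0, 0,  0, 0,  0],
    [0,  0, 0, 1, 0, -2, 0,  0],
    [0,  0, 0, 0, 1,  0, 0,  0],
    [0,  0, 0, 0, 0,  0, 0, -1],
    [0,  0, 0, 0, 0,  0, 1,  0],
])

print(factor(C.charpoly(t).as_expr()))   # (t**2 - t + 3)*(t**2 + 1)**3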


The rational canonical form C of the operator T in Example 1 is con-structed from matrices of the form Ci, each of which is the companion matrixof some power of a monic irreducible divisor of the characteristic polynomialof T. Furthermore, each such divisor is used in this way at least once.

In the course of showing that every linear operator T on a finite dimen-sional vector space has a rational canonical form C, we show that the com-panion matrices Ci that constitute C are always constructed from powers ofthe monic irreducible divisors of the characteristic polynomial of T. A keyrole in our analysis is played by the subspaces Kφ, where φ(t) is an irreduciblemonic divisor of the minimal polynomial of T. Since the minimal polynomialof an operator divides the characteristic polynomial of the operator, every ir-reducible divisor of the former is also an irreducible divisor of the latter. Weeventually show that the converse is also true; that is, the minimal polynomialand the characteristic polynomial have the same irreducible divisors.

We begin with a result that lists several properties of irreducible divisorsof the minimal polynomial. The reader is advised to review the definition ofT-annihilator and the accompanying Exercise 15 of Section 7.3.

Theorem 7.18. Let T be a linear operator on a finite-dimensional vector space V, and suppose that

p(t) = (φ1(t))^{m1}(φ2(t))^{m2} · · · (φk(t))^{mk}

is the minimal polynomial of T, where the φi(t)'s (1 ≤ i ≤ k) are the distinct irreducible monic factors of p(t) and the mi's are positive integers. Then the following statements are true.

(a) Kφi is a nonzero T-invariant subspace of V for each i.
(b) If x is a nonzero vector in some Kφi, then the T-annihilator of x is of the form (φi(t))^p for some integer p.
(c) Kφi ∩ Kφj = {0} for i ≠ j.
(d) Kφi is invariant under φj(T) for i ≠ j, and the restriction of φj(T) to Kφi is one-to-one and onto.
(e) Kφi = N((φi(T))^{mi}) for each i.

Proof. If k = 1, then (a), (b), and (e) are obvious, while (c) and (d) are vacuously true. Now suppose that k > 1.

(a) The proof that Kφi is a T-invariant subspace of V is left as an exercise. Let fi(t) be the polynomial obtained from p(t) by omitting the factor (φi(t))^{mi}. To prove that Kφi is nonzero, first observe that fi(t) is a proper divisor of p(t); therefore there exists a vector z ∈ V such that x = fi(T)(z) ≠ 0. Then x ∈ Kφi because

(φi(T))^{mi}(x) = (φi(T))^{mi} fi(T)(z) = p(T)(z) = 0.

(b) Assume the hypothesis. Then (φi(T))^q(x) = 0 for some positive integer q. Hence the T-annihilator of x divides (φi(t))^q by Exercise 15(b) of Section 7.3, and the result follows.


(c) Assume i ≠ j. Let x ∈ Kφi ∩ Kφj, and suppose that x ≠ 0. By (b), the T-annihilator of x is a power of both φi(t) and φj(t). But this is impossible because φi(t) and φj(t) are relatively prime (see Appendix E). We conclude that x = 0.

(d) Assume i ≠ j. Since Kφi is T-invariant, it is also φj(T)-invariant. Suppose that φj(T)(x) = 0 for some x ∈ Kφi. Then x ∈ Kφi ∩ Kφj = {0} by (c). Therefore the restriction of φj(T) to Kφi is one-to-one. Since V is finite-dimensional, this restriction is also onto.

(e) Suppose that 1 ≤ i ≤ k. Clearly, N((φi(T))^{mi}) ⊆ Kφi. Let fi(t) be the polynomial defined in (a). Since fi(t) is a product of polynomials of the form φj(t) for j ≠ i, we have by (d) that the restriction of fi(T) to Kφi is onto. Let x ∈ Kφi. Then there exists y ∈ Kφi such that fi(T)(y) = x. Therefore

((φi(T))^{mi})(x) = ((φi(T))^{mi}) fi(T)(y) = p(T)(y) = 0,

and hence x ∈ N((φi(T))^{mi}). Thus Kφi = N((φi(T))^{mi}).

Since a rational canonical basis for an operator T is obtained from a unionof T-cyclic bases, we need to know when such a union is linearly independent.The next major result, Theorem 7.19, reduces this problem to the study ofT-cyclic bases within Kφ, where φ(t) is an irreducible monic divisor of theminimal polynomial of T. We begin with the following lemma.

Lemma. Let T be a linear operator on a finite-dimensional vector spaceV, and suppose that

p(t) = (φ1(t))m1(φ2(t))m2 · · · (φk(t))mk

is the minimal polynomial of T, where the φi’s (1 ≤ i ≤ k) are the dis-tinct irreducible monic factors of p(t) and the mi’s are positive integers. For1 ≤ i ≤ k, let vi ∈ Kφi

be such that

v1 + v2 + · · · + vk = 0 . (2)

Then vi = 0 for all i.

Proof. The result is trivial if k = 1, so suppose that k > 1. Considerany i. Let fi(t) be the polynomial obtained from p(t) by omitting the factor(φi(t))mi . As a consequence of Theorem 7.18, fi(T) is one-to-one on Kφi

, andfi(T)(vj) = 0 for i �= j. Thus, applying fi(T) to (2), we obtain fi(T)(vi) = 0 ,from which it follows that vi = 0 .

Theorem 7.19. Let T be a linear operator on a finite-dimensional vectorspace V, and suppose that

p(t) = (φ1(t))m1(φ2(t))m2 · · · (φk(t))mk


is the minimal polynomial of T, where the φi’s (1 ≤ i ≤ k) are the dis-tinct irreducible monic factors of p(t) and the mi’s are positive integers. For1 ≤ i ≤ k, let Si be a linearly independent subset of Kφi

. Then(a) Si ∩ Sj = ∅ for i �= j(b) S1 ∪ S2 ∪ · · · ∪ Sk is linearly independent.

Proof. If k = 1, then (a) is vacuously true and (b) is obvious. Nowsuppose that k > 1. Then (a) follows immediately from Theorem 7.18(c).Furthermore, the proof of (b) is identical to the proof of Theorem 5.8 (p. 267)with the eigenspaces replaced by the subspaces Kφi

.

In view of Theorem 7.19, we can focus on bases of individual spaces ofthe form Kφ(t), where φ(t) is an irreducible monic divisor of the minimalpolynomial of T. The next several results give us ways to construct bases forthese spaces that are unions of T-cyclic bases. These results serve the dualpurposes of leading to the existence theorem for the rational canonical formand of providing methods for constructing rational canonical bases.

For Theorems 7.20 and 7.21 and the latter’s corollary, we fix a linearoperator T on a finite-dimensional vector space V and an irreducible monicdivisor φ(t) of the minimal polynomial of T.

Theorem 7.20. Let v1, v2, . . . , vk be distinct vectors in Kφ such that

S1 = βv1 ∪ βv2 ∪ · · · ∪ βvk

is linearly independent. For each i, choose wi ∈ V such that φ(T)(wi) = vi.Then

S2 = βw1 ∪ βw2 ∪ · · · ∪ βwk

is also linearly independent.

Proof. Consider any linear combination of vectors in S2 that sums to zero, say,

∑_{i=1}^{k} ∑_{j=0}^{ni} aij T^j(wi) = 0.     (3)

For each i, let fi(t) be the polynomial defined by

fi(t) = ∑_{j=0}^{ni} aij t^j.

Then (3) can be rewritten as

∑_{i=1}^{k} fi(T)(wi) = 0.     (4)


Apply φ(T) to both sides of (4) to obtain

∑_{i=1}^{k} φ(T)fi(T)(wi) = ∑_{i=1}^{k} fi(T)φ(T)(wi) = ∑_{i=1}^{k} fi(T)(vi) = 0.

This last sum can be rewritten as a linear combination of the vectors in S1 so that each fi(T)(vi) is a linear combination of the vectors in βvi. Since S1 is linearly independent, it follows that

fi(T)(vi) = 0 for all i.

Therefore the T-annihilator of vi divides fi(t) for all i. (See Exercise 15 of Section 7.3.) By Theorem 7.18(b), φ(t) divides the T-annihilator of vi, and hence φ(t) divides fi(t) for all i. Thus, for each i, there exists a polynomial gi(t) such that fi(t) = gi(t)φ(t). So (4) becomes

∑_{i=1}^{k} gi(T)φ(T)(wi) = ∑_{i=1}^{k} gi(T)(vi) = 0.

Again, linear independence of S1 requires that

fi(T)(wi) = gi(T)(vi) = 0 for all i.

But fi(T)(wi) is the result of grouping the terms of the linear combination in (3) that arise from the linearly independent set βwi. We conclude that for each i, aij = 0 for all j. Therefore S2 is linearly independent.

We now show that Kφ has a basis consisting of a union of T-cycles.

Lemma. Let W be a T-invariant subspace of Kφ, and let β be a basis forW. Then the following statements are true.

(a) Suppose that x ∈ N(φ(T)), but x /∈ W. Then β ∪ βx is linearly inde-pendent.

(b) For some w1, w2, . . . , ws in N(φ(T)), β can be extended to the linearlyindependent set

β′ = β ∪ βw1 ∪ βw2 ∪ · · · ∪ βws ,

whose span contains N(φ(T)).

Proof. (a) Let β = {v1, v2, . . . , vk}, and suppose that

∑_{i=1}^{k} ai vi + z = 0   and   z = ∑_{j=0}^{d−1} bj T^j(x),


where d is the degree of φ(t). Then z ∈ Cx ∩ W, and hence Cz ⊆ Cx ∩ W. Suppose that z ≠ 0. Then z has φ(t) as its T-annihilator, and therefore

d = dim(Cz) ≤ dim(Cx ∩ W) ≤ dim(Cx) = d.

It follows that Cx ∩ W = Cx, and consequently x ∈ W, contrary to hypothesis. Therefore z = 0, from which it follows that bj = 0 for all j. Since β is linearly independent, it follows that ai = 0 for all i. Thus β ∪ βx is linearly independent.

(b) Suppose that W does not contain N(φ(T)). Choose a vector w1 ∈N(φ(t)) that is not in W. By (a), β1 = β ∪ βw1 is linearly independent.Let W1 = span(β1). If W1 does not contain N(φ(t)), choose a vector w2 inN(φ(t)), but not in W1, so that β2 = β1∪βw2 = β∪βw1 ∪βw2 is linearly inde-pendent. Continuing this process, we eventually obtain vectors w1, w2, . . . , ws

in N(φ(T)) such that the union

β′ = β ∪ βw1 ∪ βw2 ∪ · · · ∪ βws

is a linearly independent set whose span contains N(φ(T)).

Theorem 7.21. If the minimal polynomial of T is of the form p(t) =(φ(t))m, then there exists a rational canonical basis for T.

Proof. The proof is by mathematical induction on m. Suppose that m = 1.Apply (b) of the lemma to W = {0} to obtain a linearly independent subsetof V of the form βv1 ∪ βv2 ∪ · · · ∪ βvk

, whose span contains N(φ(T)). SinceV = N(φ(T)), this set is a rational canonical basis for V.

Now suppose that, for some integer m > 1, the result is valid whenever theminimal polynomial of T is of the form (φ(T))k, where k < m, and assumethat the minimal polynomial of T is p(t) = (φ(t))m. Let r = rank(φ(T)).Then R(φ(T)) is a T-invariant subspace of V, and the restriction of T to thissubspace has (φ(t))m−1 as its minimal polynomial. Therefore we may applythe induction hypothesis to obtain a rational canonical basis for the restrictionof T to R(T). Suppose that v1, v2, . . . , vk are the generating vectors of theT-cyclic bases that constitute this rational canonical basis. For each i, choosewi in V such that vi = φ(T)(wi). By Theorem 7.20, the union β of the sets βwi

is linearly independent. Let W = span(β). Then W contains R(φ(T)). Apply(b) of the lemma and adjoin additional T-cyclic bases βwk+1 , βwk+2 , . . . , βws

to β, if necessary, where wi is in N(φ(T)) for i ≥ k, to obtain a linearlyindependent set

β′ = βw1 ∪ βw2 ∪ · · · ∪ βwk∪ · · · ∪ βws

whose span W′ contains both W and N(φ(T)).


We show that W′ = V. Let U denote the restriction of φ(T) to W′, whichis φ(T)-invariant. By the way in which W′ was obtained from R(φ(T)), itfollows that R(U) = R(φ(T)) and N(U) = N(φ(T)). Therefore

dim(W′) = rank(U) + nullity(U) = rank(φ(T)) + nullity(φ(T)) = dim(V).

Thus W′ = V, and β′ is a rational canonical basis for T.

Corollary. Kφ has a basis consisting of the union of T-cyclic bases.

Proof. Apply Theorem 7.21 to the restriction of T to Kφ.

We are now ready to study the general case.

Theorem 7.22. Every linear operator on a finite-dimensional vector spacehas a rational canonical basis and, hence, a rational canonical form.

Proof. Let T be a linear operator on a finite-dimensional vector space V,and let p(t) = (φ1(t))m1(φ2(t))m2 · · · (φk(t))mk be the minimal polynomialof T, where the φi(t)’s are the distinct irreducible monic factors of p(t) andmi > 0 for all i. The proof is by mathematical induction on k. The casek = 1 is proved in Theorem 7.21.

Suppose that the result is valid whenever the minimal polynomial containsfewer than k distinct irreducible factors for some k > 1, and suppose that p(t)contains k distinct factors. Let U be the restriction of T to the T-invariantsubspace W = R((φk(T)mk), and let q(t) be the minimal polynomial of U.Then q(t) divides p(t) by Exercise 10 of Section 7.3. Furthermore, φk(t) doesnot divide q(t). For otherwise, there would exist a nonzero vector x ∈ W suchthat φk(U)(x) = 0 and a vector y ∈ V such that x = (φk(T))mk(y). It followsthat (φk(T))mk+1(y) = 0 , and hence y ∈ Kφk

and x = (φk(T))mk(y) =0 by Theorem 7.18(e), a contradiction. Thus q(t) contains fewer than kdistinct irreducible divisors. So by the induction hypothesis, U has a rationalcanonical basis β1 consisting of a union of U-cyclic bases (and hence T-cyclicbases) of vectors from some of the subspaces Kφi

, 1 ≤ i ≤ k − 1. By thecorollary to Theorem 7.21, Kφk

has a basis β2 consisting of a union of T-cyclic bases. By Theorem 7.19, β1 and β2 are disjoint, and β = β1 ∪ β2 islinearly independent. Let s denote the number of vectors in β. Then

s = dim(R((φk(T))^{mk})) + dim(Kφk)
  = rank((φk(T))^{mk}) + nullity((φk(T))^{mk})
  = n.

We conclude that β is a basis for V. Therefore β is a rational canonical basis,and T has a rational canonical form.


In our study of the rational canonical form, we relied on the minimalpolynomial. We are now able to relate the rational canonical form to thecharacteristic polynomial.

Theorem 7.23. Let T be a linear operator on an n-dimensional vectorspace V with characteristic polynomial

f(t) = (−1)n(φ1(t))n1(φ2(t))n2 · · · (φk(t))nk ,

where the φi(t)’s (1 ≤ i ≤ k) are distinct irreducible monic polynomials andthe ni’s are positive integers. Then the following statements are true.

(a) φ1(t), φ2(t), . . . , φk(t) are the irreducible monic factors of the minimalpolynomial.

(b) For each i, dim(Kφi) = dini, where di is the degree of φi(t).

(c) If β is a rational canonical basis for T, then βi = β ∩ Kφiis a basis for

Kφifor each i.

(d) If γi is a basis for Kφi for each i, then γ = γ1 ∪ γ2 ∪ · · · ∪ γk is a basisfor V. In particular, if each γi is a disjoint union of T-cyclic bases, thenγ is a rational canonical basis for T.

Proof. (a) By Theorem 7.22, T has a rational canonical form C. ByExercise 40 of Section 5.4, the characteristic polynomial of C, and hence ofT, is the product of the characteristic polynomials of the companion matricesthat compose C. Therefore each irreducible monic divisor φi(t) of f(t) dividesthe characteristic polynomial of at least one of the companion matrices, andhence for some integer p, (φi(t))p is the T-annihilator of a nonzero vector ofV. We conclude that (φi(t))p, and so φi(t), divides the minimal polynomialof T. Conversely, if φ(t) is an irreducible monic polynomial that divides theminimal polynomial of T, then φ(t) divides the characteristic polynomial ofT because the minimal polynomial divides the characteristic polynomial.

(b), (c), and (d) Let C = [T]β , which is a rational canonical form of T.Consider any i, (1 ≤ i ≤ k). Since f(t) is the product of the characteristicpolynomials of the companion matrices that compose C, we may multiplythose characteristic polynomials that arise from the T-cyclic bases in βi toobtain the factor (φi(t))ni of f(t). Since this polynomial has degree nidi, andthe union of these bases is a linearly independent subset βi of Kφi

, we have

nidi ≤ dim(Kφi).

Furthermore, n = ∑_{i=1}^{k} di ni, because this sum is equal to the degree of f(t). Now let s denote the number of vectors in γ. By Theorem 7.19, γ is linearly independent, and therefore

n = ∑_{i=1}^{k} di ni ≤ ∑_{i=1}^{k} dim(Kφi) = s ≤ n.


Hence n = s, and dini = dim(Kφi) for all i. It follows that γ is a basis for V

and βi is a basis for Kφifor each i.

Uniqueness of the Rational Canonical Form

Having shown that a rational canonical form exists, we are now in a po-sition to ask about the extent to which it is unique. Certainly, the rationalcanonical form of a linear operator T can be modified by permuting the T-cyclic bases that constitute the corresponding rational canonical basis. Thishas the effect of permuting the companion matrices that make up the rationalcanonical form. As in the case of the Jordan canonical form, we show thatexcept for these permutations, the rational canonical form is unique, althoughthe rational canonical bases are not.

To simplify this task, we adopt the convention of ordering every rationalcanonical basis so that all the T-cyclic bases associated with the same irre-ducible monic divisor of the characteristic polynomial are grouped together.Furthermore, within each such grouping, we arrange the T-cyclic bases indecreasing order of size. Our task is to show that, subject to this order, therational canonical form of a linear operator is unique up to the arrangementof the irreducible monic divisors.

As in the case of the Jordan canonical form, we introduce arrays of dotsfrom which we can reconstruct the rational canonical form. For the Jordancanonical form, we devised a dot diagram for each eigenvalue of the givenoperator. In the case of the rational canonical form, we define a dot diagramfor each irreducible monic divisor of the characteristic polynomial of the givenoperator. A proof that the resulting dot diagrams are completely determinedby the operator is also a proof that the rational canonical form is unique.

In what follows, T is a linear operator on a finite-dimensional vector spacewith rational canonical basis β; φ(t) is an irreducible monic divisor of the char-acteristic polynomial of T; βv1 , βv2 , . . . , βvk

are the T-cyclic bases of β thatare contained in Kφ; and d is the degree of φ(t). For each j, let (φ(t))pj be theannihilator of vj . This polynomial has degree dpj ; therefore, by Exercise 15of Section 7.3, βvj contains dpj vectors. Furthermore, p1 ≥ p2 ≥ · · · ≥ pk

since the T-cyclic bases are arranged in decreasing order of size. We define the dot diagram of φ(t) to be the array consisting of k columns of dots with pj dots in the jth column, arranged so that the jth column begins at the top and terminates after pj dots. For example, if k = 3, p1 = 4, p2 = 2, and p3 = 2, then the dot diagram is

• • •
• • •
•
•

Although each column of a dot diagram corresponds to a T-cyclic basis βvi in Kφ, there are fewer dots in the column than there are vectors in the basis.
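A dot diagram is determined by its column lengths alone, so it is easy to print one by machine; the small helper below (ours, for illustration) reproduces the diagram with p1 = 4, p2 = 2, p3 = 2 shown above.

def print_dot_diagram(p):
    # p = [p1, p2, ..., pk] with p1 >= p2 >= ... >= pk
    for i in range(max(p)):
        print(' '.join('•' if pj > i else ' ' for pj in p))

print_dot_diagram([4, 2, 2])
# • • •
# • • •
# •
# •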

Example 2

Recall the linear operator T of Example 1 with the rational canonical basis β and the rational canonical form C = [T]β. Since there are two irreducible monic divisors of the characteristic polynomial of T, φ1(t) = t^2 − t + 3 and φ2(t) = t^2 + 1, there are two dot diagrams to consider. Because φ1(t) is the T-annihilator of v1 and βv1 is a basis for Kφ1, the dot diagram for φ1(t) consists of a single dot. The other two T-cyclic bases, βv3 and βv7, lie in Kφ2. Since v3 has T-annihilator (φ2(t))^2 and v7 has T-annihilator φ2(t), in the dot diagram of φ2(t) we have p1 = 2 and p2 = 1. These diagrams are as follows:

•                          • •
                           •

Dot diagram for φ1(t)      Dot diagram for φ2(t) ♦

In practice, we obtain the rational canonical form of a linear operatorfrom the information provided by dot diagrams. This is illustrated in thenext example.

Example 3

Let T be a linear operator on a finite-dimensional vector space over R, andsuppose that the irreducible monic divisors of the characteristic polynomialof T are

φ1(t) = t − 1, φ2(t) = t2 + 2, and φ3(t) = t2 + t + 1.

Suppose, furthermore, that the dot diagrams associated with these divisors are as follows:

• •                  • •                  •
•

Diagram for φ1(t)    Diagram for φ2(t)    Diagram for φ3(t)

Since the dot diagram for φ1(t) has two columns, it contributes two companion matrices to the rational canonical form. The first column has two dots, and therefore corresponds to the 2 × 2 companion matrix of (φ1(t))^2 = (t − 1)^2. The second column, with only one dot, corresponds to the 1 × 1 companion matrix of φ1(t) = t − 1. These two companion matrices are given by

C1 = ⎛ 0 −1 ⎞   and   C2 = (1).
     ⎝ 1  2 ⎠

The dot diagram for φ2(t) = t^2 + 2 consists of two columns, each containing a single dot; hence this diagram contributes two copies of the 2 × 2 companion


matrix for φ2(t), namely,

C3 = C4 = ⎛ 0 −2 ⎞ .
          ⎝ 1  0 ⎠

The dot diagram for φ3(t) = t^2 + t + 1 consists of a single column with a single dot, contributing the single 2 × 2 companion matrix

C5 = ⎛ 0 −1 ⎞ .
     ⎝ 1 −1 ⎠

Therefore the rational canonical form of T is the 9 × 9 matrix

C = ⎛ C1 O  O  O  O  ⎞
    ⎜ O  C2 O  O  O  ⎟
    ⎜ O  O  C3 O  O  ⎟
    ⎜ O  O  O  C4 O  ⎟
    ⎝ O  O  O  O  C5 ⎠

  = ⎛ 0 −1 0 0  0 0  0 0  0 ⎞
    ⎜ 1  2 0 0  0 0  0 0  0 ⎟
    ⎜ 0  0 1 0  0 0  0 0  0 ⎟
    ⎜ 0  0 0 0 −2 0  0 0  0 ⎟
    ⎜ 0  0 0 1  0 0  0 0  0 ⎟
    ⎜ 0  0 0 0  0 0 −2 0  0 ⎟
    ⎜ 0  0 0 0  0 1  0 0  0 ⎟
    ⎜ 0  0 0 0  0 0  0 0 −1 ⎟
    ⎝ 0  0 0 0  0 0  0 1 −1 ⎠ . ♦
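Assembling C by machine is a one-liner once the companion matrices are known; the sympy sketch below is ours and confirms that the characteristic polynomial of C factors as expected (sympy's charpoly uses det(tI − C), which drops the sign (−1)^9 of the book's convention).

from sympy import Matrix, diag, symbols, factor

t = symbols('t')
C1 = Matrix([[0, -1], [1, 2]])    # companion matrix of (t - 1)**2
C2 = Matrix([[1]])                # companion matrix of t - 1
C3 = Matrix([[0, -2], [1, 0]])    # companion matrix of t**2 + 2
C4 = C3
C5 = Matrix([[0, -1], [1, -1]])   # companion matrix of t**2 + t + 1

C = diag(C1, C2, C3, C4, C5)              # 9 x 9 block-diagonal matrix
print(factor(C.charpoly(t).as_expr()))    # (t - 1)**3*(t**2 + 2)**2*(t**2 + t + 1)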

We return to the general problem of finding dot diagrams. As we didbefore, we fix a linear operator T on a finite-dimensional vector space and anirreducible monic divisor φ(t) of the characteristic polynomial of T. Let Udenote the restriction of the linear operator φ(T) to Kφ. By Theorem 7.18(d),Uq = T0 for some positive integer q. Consequently, by Exercise 12 of Sec-tion 7.2, the characteristic polynomial of U is (−1)mtm, where m = dim(Kφ).Therefore Kφ is the generalized eigenspace of U corresponding to λ = 0, andU has a Jordan canonical form. The dot diagram associated with the Jordancanonical form of U gives us a key to understanding the dot diagram of Tthat is associated with φ(t). We now relate the two diagrams.

Let β be a rational canonical basis for T, and βv1, βv2, . . . , βvk be the T-cyclic bases of β that are contained in Kφ. Consider one of these T-cyclic bases βvj, and suppose again that the T-annihilator of vj is (φ(t))^{pj}. Then βvj consists of dpj vectors in β. For 0 ≤ i < d, let γi be the cycle of generalized eigenvectors of U corresponding to λ = 0 with end vector T^i(vj),

where T^0(vj) = vj. Then

γi = {(φ(T))^{pj−1}T^i(vj), (φ(T))^{pj−2}T^i(vj), . . . , (φ(T))T^i(vj), T^i(vj)}.

By Theorem 7.1 (p. 485), γi is a linearly independent subset of Cvj. Now let

αj = γ0 ∪ γ1 ∪ · · · ∪ γd−1.

Notice that αj contains pjd vectors.

Lemma 1. αj is an ordered basis for Cvj .

Proof. The key to this proof is Theorem 7.4 (p. 487). Since αj is the union of cycles of generalized eigenvectors of U corresponding to λ = 0, it suffices to show that the set of initial vectors of these cycles

{(φ(T))^{pj−1}(vj), (φ(T))^{pj−1}T(vj), . . . , (φ(T))^{pj−1}T^{d−1}(vj)}

is linearly independent. Consider any linear combination of these vectors

a0(φ(T))^{pj−1}(vj) + a1(φ(T))^{pj−1}T(vj) + · · · + a_{d−1}(φ(T))^{pj−1}T^{d−1}(vj),

where not all of the coefficients are zero. Let g(t) be the polynomial defined by g(t) = a0 + a1t + · · · + a_{d−1}t^{d−1}. Then g(t) is a nonzero polynomial of degree less than d, and hence (φ(t))^{pj−1}g(t) is a nonzero polynomial with degree less than pjd. Since (φ(t))^{pj} is the T-annihilator of vj, it follows that (φ(T))^{pj−1}g(T)(vj) ≠ 0. Therefore the set of initial vectors is linearly independent. So by Theorem 7.4, αj is linearly independent, and the γi's are disjoint. Consequently, αj consists of pjd linearly independent vectors in Cvj, which has dimension pjd. We conclude that αj is a basis for Cvj.

Thus we may replace βvj by αj as a basis for Cvj. We do this for each j to obtain a subset α = α1 ∪ α2 ∪ · · · ∪ αk of Kφ.

Lemma 2. α is a Jordan canonical basis for Kφ.

Proof. Since βv1 ∪ βv2 ∪ · · · ∪ βvk is a basis for Kφ, and since span(αi) = span(βvi) = Cvi, Exercise 9 implies that α is a basis for Kφ. Because α is a union of cycles of generalized eigenvectors of U, we conclude that α is a Jordan canonical basis.

We are now in a position to relate the dot diagram of T corresponding to φ(t) to the dot diagram of U, bearing in mind that in the first case we are considering a rational canonical form and in the second case we are considering a Jordan canonical form. For convenience, we designate the first diagram D1, and the second diagram D2. For each j, the presence of the T-cyclic basis βvj results in a column of pj dots in D1. By Lemma 1, this basis is replaced by the union αj of d cycles of generalized eigenvectors of U, each of length pj, which becomes part of the Jordan canonical basis for U. In effect, αj determines d columns each containing pj dots in D2. So each column in D1 determines d columns in D2 of the same length, and all columns in D2 are obtained in this way. Alternatively, each row in D2 has d times as many dots as the corresponding row in D1. Since Theorem 7.10 (p. 500) gives us the number of dots in any row of D2, we may divide the appropriate expression in this theorem by d to obtain the number of dots in the corresponding row of D1. Thus we have the following result.

Theorem 7.24. Let T be a linear operator on a finite-dimensional vector space V, let φ(t) be an irreducible monic divisor of the characteristic polynomial of T of degree d, and let ri denote the number of dots in the ith row of the dot diagram for φ(t) with respect to a rational canonical basis for T. Then

(a) r1 = (1/d)[dim(V) − rank(φ(T))]
(b) ri = (1/d)[rank((φ(T))^{i−1}) − rank((φ(T))^i)] for i > 1.
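When T is given by a matrix A, Theorem 7.24 reduces the dot diagram to rank computations. A minimal computational sketch, assuming sympy is available for exact ranks (the function name is ours):

```python
import sympy as sp

def dot_diagram_rows(phiA, d):
    """Row counts r_1, r_2, ... of the dot diagram of an irreducible monic divisor
    of degree d, given phiA = phi(A) for a square matrix A (Theorem 7.24)."""
    ranks = [phiA.rows]                      # rank((phi(A))^0) = rank(I)
    power = sp.eye(phiA.rows)
    while ranks[-1] > 0:
        power = power * phiA                 # (phi(A))^i
        r = power.rank()
        if r == ranks[-1]:                   # rank stops dropping: the diagram is complete
            break
        ranks.append(r)
    return [(ranks[i - 1] - ranks[i]) // d for i in range(1, len(ranks))]

# For instance, for the matrix A of Example 6 below and phi(t) = t - 2 (so d = 1):
A = sp.Matrix([[2, 1, 0, 0], [0, 2, 1, 0], [0, 0, 2, 0], [0, 0, 0, 2]])
print(dot_diagram_rows(A - 2*sp.eye(4), d=1))    # [2, 1, 1]
```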

Thus the dot diagrams associated with a rational canonical form of an operator are completely determined by the operator. Since the rational canonical form is completely determined by its dot diagrams, we have the following uniqueness condition.

Corollary. Under the conventions described earlier, the rational canonical form of a linear operator is unique up to the arrangement of the irreducible monic divisors of the characteristic polynomial.

Since the rational canonical form of a linear operator is unique, the polynomials corresponding to the companion matrices that determine this form are also unique. These polynomials, which are powers of the irreducible monic divisors, are called the elementary divisors of the linear operator. Since a companion matrix may occur more than once in a rational canonical form, the same is true for the elementary divisors. We call the number of such occurrences the multiplicity of the elementary divisor.

Conversely, the elementary divisors and their multiplicities determine the companion matrices and, therefore, the rational canonical form of a linear operator.

Example 4

Let

β = {e^x cos 2x, e^x sin 2x, xe^x cos 2x, xe^x sin 2x}

be viewed as a subset of F(R, R), the space of all real-valued functions defined on R, and let V = span(β). Then V is a four-dimensional subspace of F(R, R), and β is an ordered basis for V. Let D be the linear operator on V defined by D(y) = y′, the derivative of y, and let A = [D]β. Then

A = (  1  2  1  0 )
    ( −2  1  0  1 )
    (  0  0  1  2 )
    (  0  0 −2  1 ),

and the characteristic polynomial of D, and hence of A, is

f(t) = (t^2 − 2t + 5)^2.

Thus φ(t) = t^2 − 2t + 5 is the only irreducible monic divisor of f(t). Since φ(t) has degree 2 and V is four-dimensional, the dot diagram for φ(t) contains only two dots. Therefore the dot diagram is determined by r1, the number of dots in the first row. Because ranks are preserved under matrix representations, we can use A in place of D in the formula given in Theorem 7.24. Now

φ(A) = ( 0 0  0 4 )
       ( 0 0 −4 0 )
       ( 0 0  0 0 )
       ( 0 0  0 0 ),

and so

r1 = (1/2)[4 − rank(φ(A))] = (1/2)[4 − 2] = 1.

It follows that the second dot lies in the second row, and the dot diagram is as follows:

•
•

Hence V is a D-cyclic space generated by a single function with D-annihilator (φ(t))^2. Furthermore, its rational canonical form is given by the companion matrix of (φ(t))^2 = t^4 − 4t^3 + 14t^2 − 20t + 25, which is

( 0 0 0 −25 )
( 1 0 0  20 )
( 0 1 0 −14 )
( 0 0 1   4 ).

Thus (φ(t))^2 is the only elementary divisor of D, and it has multiplicity 1. For the cyclic generator, it suffices to find a function g in V for which φ(D)(g) ≠ 0. Since φ(A)(e3) ≠ 0, it follows that φ(D)(xe^x cos 2x) ≠ 0; therefore g(x) = xe^x cos 2x can be chosen as the cyclic generator. Hence

βg = {xe^x cos 2x, D(xe^x cos 2x), D^2(xe^x cos 2x), D^3(xe^x cos 2x)}

is a rational canonical basis for D. Notice that the function h defined by h(x) = xe^x sin 2x can be chosen in place of g. This shows that the rational canonical basis is not unique. ♦
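The computations of Example 4 can be checked numerically; a sketch assuming numpy is available:

```python
import numpy as np

A = np.array([[1, 2, 1, 0],
              [-2, 1, 0, 1],
              [0, 0, 1, 2],
              [0, 0, -2, 1]], dtype=float)       # [D]_beta from Example 4

phiA = A @ A - 2*A + 5*np.eye(4)                  # phi(A) for phi(t) = t^2 - 2t + 5
print(np.linalg.matrix_rank(phiA))                # 2, so r_1 = (4 - 2)/2 = 1
print(phiA @ np.array([0.0, 0, 1, 0]))            # phi(A)e_3 is nonzero, so xe^x cos 2x generates
```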

It is convenient to refer to the rational canonical form and elementary divisors of a matrix, which are defined in the obvious way.

Definitions. Let A ∈ Mn×n(F). The rational canonical form of A is defined to be the rational canonical form of LA. Likewise, for A, the elementary divisors and their multiplicities are the same as those of LA.

Let A be an n × n matrix, let C be a rational canonical form of A, and let β be the appropriate rational canonical basis for LA. Then C = [LA]β, and therefore A is similar to C. In fact, if Q is the matrix whose columns are the vectors of β in the same order, then Q^{−1}AQ = C.

Example 5

For the following real matrix A, we find the rational canonical form C of A and a matrix Q such that Q^{−1}AQ = C.

A = ( 0  2 0 −6 2 )
    ( 1 −2 0  0 2 )
    ( 1  0 1 −3 2 )
    ( 1 −2 1 −1 2 )
    ( 1 −4 3 −3 4 ).

The characteristic polynomial of A is f(t) = −(t^2 + 2)^2(t − 2); therefore φ1(t) = t^2 + 2 and φ2(t) = t − 2 are the distinct irreducible monic divisors of f(t). By Theorem 7.23, dim(Kφ1) = 4 and dim(Kφ2) = 1. Since the degree of φ1(t) is 2, the total number of dots in the dot diagram of φ1(t) is 4/2 = 2, and the number of dots r1 in the first row is given by

r1 = (1/2)[dim(R^5) − rank(φ1(A))]
   = (1/2)[5 − rank(A^2 + 2I)]
   = (1/2)[5 − 1] = 2.

Thus the dot diagram of φ1(t) is

• •

and each column contributes the companion matrix

( 0 −2 )
( 1  0 )

for φ1(t) = t^2 + 2 to the rational canonical form C. Consequently φ1(t) is an elementary divisor with multiplicity 2. Since dim(Kφ2) = 1, the dot diagram of φ2(t) = t − 2 consists of a single dot, which contributes the 1 × 1 matrix (2). Hence φ2(t) is an elementary divisor with multiplicity 1. Therefore the

rational canonical form C is

C = ( 0 −2 0  0 0 )
    ( 1  0 0  0 0 )
    ( 0  0 0 −2 0 )
    ( 0  0 1  0 0 )
    ( 0  0 0  0 2 ).

We can infer from the dot diagram of φ1(t) that if β is a rational canonical basis for LA, then β ∩ Kφ1 is the union of two cyclic bases βv1 and βv2, where v1 and v2 each have annihilator φ1(t). It follows that both v1 and v2 lie in N(φ1(LA)). It can be shown that

{ ( 1 )   ( 0 )   ( 0 )   (  0 )
  ( 0 )   ( 1 )   ( 0 )   (  0 )
  ( 0 ) , ( 0 ) , ( 2 ) , ( −1 )
  ( 0 )   ( 0 )   ( 1 )   (  0 )
  ( 0 )   ( 0 )   ( 0 )   (  1 ) }

is a basis for N(φ1(LA)). Setting v1 = e1, we see that

Av1 = ( 0 )
      ( 1 )
      ( 1 )
      ( 1 )
      ( 1 ).

Next choose v2 in Kφ1 = N(φ1(LA)), but not in the span of βv1 = {v1, Av1}. For example, v2 = e2. Then it can be seen that

Av2 = (  2 )
      ( −2 )
      (  0 )
      ( −2 )
      ( −4 ),

and βv1 ∪ βv2 is a basis for Kφ1 .


Since the dot diagram of φ2(t) = t − 2 consists of a single dot, any nonzero vector in Kφ2 is an eigenvector of A corresponding to the eigenvalue λ = 2. For example, choose

v3 = ( 0 )
     ( 1 )
     ( 1 )
     ( 1 )
     ( 2 ).

By Theorem 7.23, β = {v1, Av1, v2, Av2, v3} is a rational canonical basis for LA. So setting

Q = ( 1 0 0  2 0 )
    ( 0 1 1 −2 1 )
    ( 0 1 0  0 1 )
    ( 0 1 0 −2 1 )
    ( 0 1 0 −4 2 ),

we have Q^{−1}AQ = C. ♦

Example 6

For the following matrix A, we find the rational canonical form C and a matrix Q such that Q^{−1}AQ = C:

A = ( 2 1 0 0 )
    ( 0 2 1 0 )
    ( 0 0 2 0 )
    ( 0 0 0 2 ).

Since the characteristic polynomial of A is f(t) = (t − 2)^4, the only irreducible monic divisor of f(t) is φ(t) = t − 2, and so Kφ = R^4. In this case, φ(t) has degree 1; hence in applying Theorem 7.24 to compute the dot diagram for φ(t), we obtain

r1 = 4 − rank(φ(A)) = 4 − 2 = 2,

r2 = rank(φ(A)) − rank((φ(A))^2) = 2 − 1 = 1,

and

r3 = rank((φ(A))^2) − rank((φ(A))^3) = 1 − 0 = 1,

where ri is the number of dots in the ith row of the dot diagram. Since there are dim(R^4) = 4 dots in the diagram, we may terminate these computations with r3. Thus the dot diagram for A is

• •
•
•

Since (t − 2)^3 has the companion matrix

( 0 0   8 )
( 1 0 −12 )
( 0 1   6 )

and (t − 2) has the companion matrix (2), the rational canonical form of A is given by

C = ( 0 0   8 0 )
    ( 1 0 −12 0 )
    ( 0 1   6 0 )
    ( 0 0   0 2 ).

Next we find a rational canonical basis for LA. The preceding dot diagram indicates that there are two vectors v1 and v2 in R^4 with annihilators (φ(t))^3 and φ(t), respectively, and such that

β = βv1 ∪ βv2 = {v1, Av1, A^2v1, v2}

is a rational canonical basis for LA. Furthermore, v1 ∉ N((LA − 2I)^2), and v2 ∈ N(LA − 2I). It can easily be shown that

N(LA − 2I) = span({e1, e4})

and

N((LA − 2I)^2) = span({e1, e2, e4}).

The standard vector e3 meets the criteria for v1; so we set v1 = e3. It follows that

Av1 = ( 0 )
      ( 1 )
      ( 2 )
      ( 0 )

and

A^2v1 = ( 1 )
        ( 4 )
        ( 4 )
        ( 0 ).

Next we choose a vector v2 ∈ N(LA − 2I) that is not in the span of βv1. Clearly, v2 = e4 satisfies this condition. Thus

{ ( 0 )   ( 0 )   ( 1 )   ( 0 )
  ( 0 )   ( 1 )   ( 4 )   ( 0 )
  ( 1 ) , ( 2 ) , ( 4 ) , ( 0 )
  ( 0 )   ( 0 )   ( 0 )   ( 1 ) }


is a rational canonical basis for LA.

Finally, let Q be the matrix whose columns are the vectors of β in the same order:

Q = ( 0 0 1 0 )
    ( 0 1 4 0 )
    ( 1 2 4 0 )
    ( 0 0 0 1 ).

Then C = Q^{−1}AQ. ♦
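Both Example 5 and Example 6 can be verified directly by computing Q^{−1}AQ; a sketch for Example 6, assuming numpy is available:

```python
import numpy as np

A = np.array([[2, 1, 0, 0], [0, 2, 1, 0], [0, 0, 2, 0], [0, 0, 0, 2]], dtype=float)
Q = np.array([[0, 0, 1, 0], [0, 1, 4, 0], [1, 2, 4, 0], [0, 0, 0, 1]], dtype=float)

print(np.round(np.linalg.inv(Q) @ A @ Q))   # reproduces the matrix C displayed above
```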

Direct Sums*

The next theorem is a simple consequence of Theorem 7.23.

Theorem 7.25 (Primary Decomposition Theorem). Let T be a linear operator on an n-dimensional vector space V with characteristic polynomial

f(t) = (−1)^n (φ1(t))^{n1}(φ2(t))^{n2} · · · (φk(t))^{nk},

where the φi(t)'s (1 ≤ i ≤ k) are distinct irreducible monic polynomials and the ni's are positive integers. Then the following statements are true.

(a) V = Kφ1 ⊕ Kφ2 ⊕ · · · ⊕ Kφk.
(b) If Ti (1 ≤ i ≤ k) is the restriction of T to Kφi and Ci is the rational canonical form of Ti, then C1 ⊕ C2 ⊕ · · · ⊕ Ck is the rational canonical form of T.

Proof. Exercise.

The next theorem is a simple consequence of Theorem 7.17.

Theorem 7.26. Let T be a linear operator on a finite-dimensional vector space V. Then V is a direct sum of T-cyclic subspaces Cvi, where each vi lies in Kφ for some irreducible monic divisor φ(t) of the characteristic polynomial of T.

Proof. Exercise.

EXERCISES

1. Label the following statements as true or false.

(a) Every rational canonical basis for a linear operator T is the unionof T-cyclic bases.


(b) If a basis is the union of T-cyclic bases for a linear operator T,then it is a rational canonical basis for T.

(c) There exist square matrices having no rational canonical form.
(d) A square matrix is similar to its rational canonical form.
(e) For any linear operator T on a finite-dimensional vector space, any irreducible factor of the characteristic polynomial of T divides the minimal polynomial of T.

(f) Let φ(t) be an irreducible monic divisor of the characteristic polynomial of a linear operator T. The dots in the diagram used to compute the rational canonical form of the restriction of T to Kφ are in one-to-one correspondence with the vectors in a basis for Kφ.

(g) If a matrix has a Jordan canonical form, then its Jordan canonicalform and rational canonical form are similar.

2. For each of the following matrices A ∈ Mn×n(F ), find the rationalcanonical form C of A and a matrix Q ∈ Mn×n(F ) such that Q−1AQ =C.

(a) A = ( 3 1 0 )
        ( 0 3 1 )    F = R
        ( 0 0 3 )

(b) A = ( 0 −1 )
        ( 1 −1 )    F = R

(c) A = ( 0 −1 )
        ( 1 −1 )    F = C

(d) A = ( 0 −7 14 −6 )
        ( 1 −4  6 −3 )    F = R
        ( 0 −4  9 −4 )
        ( 0 −4 11 −5 )

(e) A = ( 0 −4 12 −7 )
        ( 1 −1  3 −3 )    F = R
        ( 0 −1  6 −4 )
        ( 0 −1  8 −5 )

3. For each of the following linear operators T, find the elementary divisors,the rational canonical form C, and a rational canonical basis β.

(a) T is the linear operator on P3(R) defined by

T(f(x)) = f(0)x − f ′(1).

(b) Let S = {sin x, cos x, x sin x, x cos x}, a subset of F(R, R), and letV = span(S). Define T to be the linear operator on V such that

T(f) = f ′.

(c) T is the linear operator on M2×2(R) defined by

T(A) = (  0 1 ) · A.
       ( −1 1 )

(d) Let S = {sin x sin y, sin x cos y, cos x sin y, cos x cos y}, a subset ofF(R × R, R), and let V = span(S). Define T to be the linearoperator on V such that

T(f)(x, y) = ∂f(x, y)/∂x + ∂f(x, y)/∂y.

4. Let T be a linear operator on a finite-dimensional vector space V withminimal polynomial (φ(t))m for some positive integer m.

(a) Prove that R(φ(T)) ⊆ N((φ(T))^{m−1}).
(b) Give an example to show that the subspaces in (a) need not be equal.
(c) Prove that the minimal polynomial of the restriction of T to R(φ(T)) equals (φ(t))^{m−1}.

5. Let T be a linear operator on a finite-dimensional vector space. Provethat the rational canonical form of T is a diagonal matrix if and only ifT is diagonalizable.

6. Let T be a linear operator on a finite-dimensional vector space V withcharacteristic polynomial f(t) = (−1)nφ1(t)φ2(t), where φ1(t) and φ2(t)are distinct irreducible monic polynomials and n = dim(V).

(a) Prove that there exist v1, v2 ∈ V such that v1 has T-annihilator φ1(t), v2 has T-annihilator φ2(t), and βv1 ∪ βv2 is a basis for V.
(b) Prove that there is a vector v3 ∈ V with T-annihilator φ1(t)φ2(t) such that βv3 is a basis for V.
(c) Describe the difference between the matrix representation of T with respect to βv1 ∪ βv2 and the matrix representation of T with respect to βv3.

Thus, to assure the uniqueness of the rational canonical form, we re-quire that the generators of the T-cyclic bases that constitute a rationalcanonical basis have T-annihilators equal to powers of irreducible monicfactors of the characteristic polynomial of T.

7. Let T be a linear operator on a finite-dimensional vector space withminimal polynomial

f(t) = (φ1(t))^{m1}(φ2(t))^{m2} · · · (φk(t))^{mk},

where the φi(t)’s are distinct irreducible monic factors of f(t). Provethat for each i, mi is the number of entries in the first column of thedot diagram for φi(t).


8. Let T be a linear operator on a finite-dimensional vector space V. Provethat for any irreducible polynomial φ(t), if φ(T) is not one-to-one, thenφ(t) divides the characteristic polynomial of T. Hint: Apply Exercise 15of Section 7.3.

9. Let V be a vector space and β1, β2, . . . , βk be disjoint subsets of V whoseunion is a basis for V. Now suppose that γ1, γ2, . . . , γk are linearlyindependent subsets of V such that span(γi) = span(βi) for all i. Provethat γ1 ∪ γ2 ∪ · · · ∪ γk is also a basis for V.

10. Let T be a linear operator on a finite-dimensional vector space, andsuppose that φ(t) is an irreducible monic factor of the characteristicpolynomial of T. Prove that if φ(t) is the T-annihilator of vectors x andy, then x ∈ Cy if and only if Cx = Cy.

Exercises 11 and 12 are concerned with direct sums.

11. Prove Theorem 7.25.

12. Prove Theorem 7.26.

INDEX OF DEFINITIONS FOR CHAPTER 7

Companion matrix 526
Cycle of generalized eigenvectors 488
Cyclic basis 525
Dot diagram for Jordan canonical form 498
Dot diagram for rational canonical form 535
Elementary divisor of a linear operator 539
Elementary divisor of a matrix 541
End vector of a cycle 488
Generalized eigenspace 484
Generalized eigenvector 484
Generator of a cyclic basis 525
Initial vector of a cycle 488
Jordan block 483
Jordan canonical basis 483
Jordan canonical form of a linear operator 483
Jordan canonical form of a matrix 491
Length of a cycle 488
Minimal polynomial of a linear operator 516
Minimal polynomial of a matrix 517
Multiplicity of an elementary divisor 539
Rational canonical basis of a linear operator 526
Rational canonical form for a linear operator 526
Rational canonical form of a matrix 541


Appendices

APPENDIX A SETS

A set is a collection of objects, called elements of the set. If x is an element of the set A, then we write x ∈ A; otherwise, we write x ∉ A. For example, if Z is the set of integers, then 3 ∈ Z and 1/2 ∉ Z.

One set that appears frequently is the set of real numbers, which we denote by R throughout this text.

Two sets A and B are called equal, written A = B, if they contain exactly the same elements. Sets may be described in one of two ways:

1. By listing the elements of the set between set braces { }.
2. By describing the elements of the set in terms of some characteristic property.

For example, the set consisting of the elements 1, 2, 3, and 4 can be written as {1, 2, 3, 4} or as

{x : x is a positive integer less than 5}.

Note that the order in which the elements of a set are listed is immaterial; hence

{1, 2, 3, 4} = {3, 1, 2, 4} = {1, 3, 1, 4, 2}.

Example 1

Let A denote the set of real numbers between 1 and 2. Then A may be written as

A = {x ∈ R : 1 < x < 2}. ♦

A set B is called a subset of a set A, written B ⊆ A or A ⊇ B, if every element of B is an element of A. For example, {1, 2, 6} ⊆ {2, 8, 7, 6, 1}. If B ⊆ A, and B ≠ A, then B is called a proper subset of A. Observe that A = B if and only if A ⊆ B and B ⊆ A, a fact that is often used to prove that two sets are equal.

The empty set, denoted by ∅, is the set containing no elements. Theempty set is a subset of every set.

Sets may be combined to form other sets in two basic ways. The unionof two sets A and B, denoted A ∪ B, is the set of elements that are in A, orB, or both; that is,

A ∪ B = {x : x ∈ A or x ∈ B}.


The intersection of two sets A and B, denoted A∩B, is the set of elementsthat are in both A and B; that is,

A ∩ B = {x : x ∈ A and x ∈ B}.Two sets are called disjoint if their intersection equals the empty set.

Example 2

Let A = {1, 3, 5} and B = {1, 5, 7, 8}. Then

A ∪ B = {1, 3, 5, 7, 8} and A ∩ B = {1, 5}.Likewise, if X = {1, 2, 8} and Y = {3, 4, 5}, then

X ∪ Y = {1, 2, 3, 4, 5, 8} and X ∩ Y = ∅.

Thus X and Y are disjoint sets. ♦

The union and intersection of more than two sets can be defined analogously. Specifically, if A1, A2, . . . , An are sets, then the union and intersection of these sets are defined, respectively, by

⋃_{i=1}^{n} Ai = {x : x ∈ Ai for some i = 1, 2, . . . , n}

and

⋂_{i=1}^{n} Ai = {x : x ∈ Ai for all i = 1, 2, . . . , n}.

Similarly, if Λ is an index set and {Aα : α ∈ Λ} is a collection of sets, the union and intersection of these sets are defined, respectively, by

⋃_{α∈Λ} Aα = {x : x ∈ Aα for some α ∈ Λ}

and

⋂_{α∈Λ} Aα = {x : x ∈ Aα for all α ∈ Λ}.

Example 3

Let Λ = {α ∈ R : α > 1}, and let

Aα = {x ∈ R : −1/α ≤ x ≤ 1 + α}

for each α ∈ Λ. Then

⋃_{α∈Λ} Aα = {x ∈ R : x > −1}   and   ⋂_{α∈Λ} Aα = {x ∈ R : 0 ≤ x ≤ 2}. ♦


By a relation on a set A, we mean a rule for determining whether or not,for any elements x and y in A, x stands in a given relationship to y. Moreprecisely, a relation on A is a set S of ordered pairs of elements of A suchthat (x, y) ∈ S if and only if x stands in the given relationship to y. On theset of real numbers, for instance, “is equal to,” “is less than,” and “is greaterthan or equal to” are familiar relations. If S is a relation on a set A, we oftenwrite x ∼ y in place of (x, y) ∈ S.

A relation S on a set A is called an equivalence relation on A if thefollowing three conditions hold:

1. For each x ∈ A, x ∼ x (reflexivity).
2. If x ∼ y, then y ∼ x (symmetry).
3. If x ∼ y and y ∼ z, then x ∼ z (transitivity).

For example, if we define x ∼ y to mean that x − y is divisible by a fixedinteger n, then ∼ is an equivalence relation on the set of integers.

APPENDIX B FUNCTIONS

If A and B are sets, then a function f from A to B, written f : A → B, isa rule that associates to each element x in A a unique element denoted f(x)in B. The element f(x) is called the image of x (under f), and x is calleda preimage of f(x) (under f). If f : A → B, then A is called the domainof f , B is called the codomain of f , and the set {f(x) : x ∈ A} is called therange of f . Note that the range of f is a subset of B. If S ⊆ A, we denoteby f(S) the set {f(x) : x ∈ S} of all images of elements of S. Likewise, ifT ⊆ B, we denote by f−1(T ) the set {x ∈ A : f(x) ∈ T} of all preimages ofelements in T . Finally, two functions f : A → B and g : A → B are equal,written f = g, if f(x) = g(x) for all x ∈ A.

Example 1

Suppose that A = [−10, 10]. Let f : A → R be the function that assignsto each element x in A the element x2 + 1 in R; that is, f is defined byf(x) = x2+1. Then A is the domain of f , R is the codomain of f , and [1, 101]is the range of f . Since f(2) = 5, the image of 2 is 5, and 2 is a preimageof 5. Notice that −2 is another preimage of 5. Moreover, if S = [1, 2] andT = [82, 101], then f(S) = [2, 5] and f−1(T ) = [−10,−9] ∪ [9, 10]. ♦

As Example 1 shows, the preimage of an element in the range need not be unique. Functions such that each element of the range has a unique preimage are called one-to-one; that is, f : A → B is one-to-one if f(x) = f(y) implies x = y or, equivalently, if x ≠ y implies f(x) ≠ f(y).

If f : A → B is a function with range B, that is, if f(A) = B, then f iscalled onto. So f is onto if and only if the range of f equals the codomainof f .


Let f : A → B be a function and S ⊆ A. Then a function fS : S → B,called the restriction of f to S, can be formed by defining fS(x) = f(x) foreach x ∈ S.

The next example illustrates these concepts.

Example 2

Let f : [−1, 1] → [0, 1] be defined by f(x) = x2. This function is onto, butnot one-to-one since f(−1) = f(1) = 1. Note that if S = [0, 1], then fS isboth onto and one-to-one. Finally, if T = [12 , 1], then fT is one-to-one, butnot onto. ♦

Let A, B, and C be sets and f : A → B and g : B → C be functions. By following f with g, we obtain a function g ◦ f : A → C called the composite of g and f. Thus (g ◦ f)(x) = g(f(x)) for all x ∈ A. For example, let A = B = C = R, f(x) = sin x, and g(x) = x^2 + 3. Then (g ◦ f)(x) = g(f(x)) = sin^2 x + 3, whereas (f ◦ g)(x) = f(g(x)) = sin(x^2 + 3). Hence, g ◦ f ≠ f ◦ g. Functional composition is associative, however; that is, if h : C → D is another function, then h ◦ (g ◦ f) = (h ◦ g) ◦ f.

A function f : A → B is said to be invertible if there exists a functiong : B → A such that (f ◦ g)(y) = y for all y ∈ B and (g ◦ f)(x) = x for allx ∈ A. If such a function g exists, then it is unique and is called the inverseof f . We denote the inverse of f (when it exists) by f−1. It can be shownthat f is invertible if and only if f is both one-to-one and onto.

Example 3

The function f : R → R defined by f(x) = 3x + 1 is one-to-one and onto;hence f is invertible. The inverse of f is the function f−1 : R → R definedby f−1(x) = (x − 1)/3. ♦

The following facts about invertible functions are easily proved.

1. If f : A → B is invertible, then f^{−1} is invertible, and (f^{−1})^{−1} = f.
2. If f : A → B and g : B → C are invertible, then g ◦ f is invertible, and (g ◦ f)^{−1} = f^{−1} ◦ g^{−1}.

APPENDIX C FIELDS

The set of real numbers is an example of an algebraic structure called afield. Basically, a field is a set in which four operations (called addition,multiplication, subtraction, and division) can be defined so that, with theexception of division by zero, the sum, product, difference, and quotient ofany two elements in the set is an element of the set. More precisely, a field isdefined as follows.


Definitions. A field F is a set on which two operations + and · (calledaddition and multiplication, respectively) are defined so that, for each pairof elements x, y in F , there are unique elements x+ y and x ·y in F for whichthe following conditions hold for all elements a, b, c in F .

(F 1) a + b = b + a and a · b = b · a
(commutativity of addition and multiplication)

(F 2) (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c)
(associativity of addition and multiplication)

(F 3) There exist distinct elements 0 and 1 in F such that

0 + a = a and 1·a = a

(existence of identity elements for addition and multiplication)

(F 4) For each element a in F and each nonzero element b in F , there existelements c and d in F such that

a + c = 0 and b·d = 1

(existence of inverses for addition and multiplication)

(F 5) a · (b + c) = a · b + a · c
(distributivity of multiplication over addition)

The elements x + y and x ·y are called the sum and product, respectively,of x and y. The elements 0 (read “zero”) and 1 (read “one”) mentioned in(F 3) are called identity elements for addition and multiplication, respec-tively, and the elements c and d referred to in (F 4) are called an additiveinverse for a and a multiplicative inverse for b, respectively.

Example 1

The set of real numbers R with the usual definitions of addition and multiplication is a field. ♦

Example 2

The set of rational numbers with the usual definitions of addition and multiplication is a field. ♦

Example 3

The set of all real numbers of the form a + b√2, where a and b are rational numbers, with addition and multiplication as in R is a field. ♦

Example 4

The field Z2 consists of two elements 0 and 1 with the operations of additionand multiplication defined by the equations

0 + 0 = 0, 0 + 1 = 1 + 0 = 1, 1 + 1 = 0,

0 ·0 = 0, 0 ·1 = 1 ·0 = 0, and 1 ·1 = 1. ♦
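Because the operations in Z2 are just addition and multiplication of integers modulo 2, the tables above are easy to reproduce; a small Python illustration (ours, not part of the text):

```python
# Addition and multiplication tables for Z2 = {0, 1}, computed modulo 2.
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} + {b} = {(a + b) % 2}    {a} . {b} = {(a * b) % 2}")
```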


Example 5

Neither the set of positive integers nor the set of integers with the usualdefinitions of addition and multiplication is a field, for in either case (F 4)does not hold. ♦

The identity and inverse elements guaranteed by (F 3) and (F 4) areunique; this is a consequence of the following theorem.

Theorem C.1 (Cancellation Laws). For arbitrary elements a, b, andc in a field, the following statements are true.

(a) If a + b = c + b, then a = c.
(b) If a · b = c · b and b ≠ 0, then a = c.

Proof. (a) The proof of (a) is left as an exercise.

(b) If b ≠ 0, then (F 4) guarantees the existence of an element d in the field such that b · d = 1. Multiply both sides of the equality a · b = c · b by d to obtain (a · b) · d = (c · b) · d. Consider the left side of this equality: By (F 2) and (F 3), we have

(a ·b) ·d = a ·(b ·d) = a ·1 = a.

Similarly, the right side of the equality reduces to c. Thus a = c.

Corollary. The elements 0 and 1 mentioned in (F 3), and the elements cand d mentioned in (F 4), are unique.

Proof. Suppose that 0′ ∈ F satisfies 0′ + a = a for each a ∈ F . Since0 + a = a for each a ∈ F , we have 0′ + a = 0 + a for each a ∈ F . Thus 0′ = 0by Theorem C.1.

The proofs of the remaining parts are similar.

Thus each element b in a field has a unique additive inverse and, if b ≠ 0, a unique multiplicative inverse. (It is shown in the corollary to Theorem C.2 that 0 has no multiplicative inverse.) The additive inverse and the multiplicative inverse of b are denoted by −b and b^{−1}, respectively. Note that −(−b) = b and (b^{−1})^{−1} = b.

Subtraction and division can be defined in terms of addition and multiplication by using the additive and multiplicative inverses. Specifically, subtraction of b is defined to be addition of −b and division by b ≠ 0 is defined to be multiplication by b^{−1}; that is,

a − b = a + (−b)   and   a/b = a · b^{−1}.

In particular, the symbol 1/b denotes b^{−1}. Division by zero is undefined, but, with this exception, the sum, product, difference, and quotient of any two elements of a field are defined.


Many of the familiar properties of multiplication of real numbers are truein any field, as the next theorem shows.

Theorem C.2. Let a and b be arbitrary elements of a field. Then eachof the following statements are true.

(a) a · 0 = 0.
(b) (−a) · b = a · (−b) = −(a · b).
(c) (−a) · (−b) = a · b.

Proof. (a) Since 0 + 0 = 0, (F 5) shows that

0 + a ·0 = a ·0 = a ·(0 + 0) = a ·0 + a ·0.

Thus 0 = a ·0 by Theorem C.1.(b) By definition, −(a ·b) is the unique element of F with the property

a ·b + [−(a ·b)] = 0. So in order to prove that (−a) ·b = −(a ·b), it sufficesto show that a ·b + (−a) ·b = 0. But −a is the element of F such thata + (−a) = 0; so

a ·b + (−a) ·b = [a + (−a)] ·b = 0 ·b = b ·0 = 0

by (F 5) and (a). Thus (−a) ·b = −(a ·b). The proof that a ·(−b) = −(a ·b)is similar.

(c) By applying (b) twice, we find that

(−a) ·(−b) = −[a ·(−b)] = −[−(a ·b)] = a ·b.

Corollary. The additive identity of a field has no multiplicative inverse.

In an arbitrary field F, it may happen that a sum 1 + 1 + · · · + 1 (p summands) equals 0 for some positive integer p. For example, in the field Z2 (defined in Example 4), 1 + 1 = 0. In this case, the smallest positive integer p for which a sum of p 1's equals 0 is called the characteristic of F; if no such positive integer exists, then F is said to have characteristic zero. Thus Z2 has characteristic two, and R has characteristic zero. Observe that if F is a field of characteristic p ≠ 0, then x + x + · · · + x (p summands) equals 0 for all x ∈ F. In a field having nonzero characteristic (especially characteristic two), many unnatural problems arise. For this reason, some of the results about vector spaces stated in this book require that the field over which the vector space is defined be of characteristic zero (or, at least, of some characteristic other than two).

Finally, note that in other sections of this book, the product of two ele-ments a and b in a field is usually denoted ab rather than a ·b.


APPENDIX D COMPLEX NUMBERS

For the purposes of algebra, the field of real numbers is not sufficient, forthere are polynomials of nonzero degree with real coefficients that have nozeros in the field of real numbers (for example, x2 + 1). It is often desirableto have a field in which any polynomial of nonzero degree with coefficientsfrom that field has a zero in that field. It is possible to “enlarge” the field ofreal numbers to obtain such a field.

Definitions. A complex number is an expression of the form z = a+bi,where a and b are real numbers called the real part and the imaginary partof z, respectively.

The sum and product of two complex numbers z = a+bi and w = c+di(where a, b, c, and d are real numbers) are defined, respectively, as follows:

z + w = (a + bi) + (c + di) = (a + c) + (b + d)i

and

zw = (a + bi)(c + di) = (ac − bd) + (bc + ad)i.

Example 1

The sum and product of z = 3 − 5i and w = 9 + 7i are, respectively,

z + w = (3 − 5i) + (9 + 7i) = (3 + 9) + [(−5) + 7]i = 12 + 2i

and

zw = (3 − 5i)(9 + 7i) = [3 ·9 − (−5) ·7] + [(−5) ·9 + 3 ·7]i = 62 − 24i. ♦

Any real number c may be regarded as a complex number by identifying cwith the complex number c + 0i. Observe that this correspondence preservessums and products; that is,

(c + 0i) + (d + 0i) = (c + d) + 0i and (c + 0i)(d + 0i) = cd + 0i.

Any complex number of the form bi = 0 + bi, where b is a nonzero realnumber, is called imaginary. The product of two imaginary numbers is realsince

(bi)(di) = (0 + bi)(0 + di) = (0 − bd) + (b ·0 + 0 ·d)i = −bd.

In particular, for i = 0 + 1i, we have i · i = −1.

The observation that i^2 = i · i = −1 provides an easy way to remember the definition of multiplication of complex numbers: simply multiply two complex numbers as you would any two algebraic expressions, and replace i^2 by −1. Example 2 illustrates this technique.


Example 2

The product of −5 + 2i and 1 − 3i is

(−5 + 2i)(1 − 3i) = −5(1 − 3i) + 2i(1 − 3i)
                  = −5 + 15i + 2i − 6i^2
                  = −5 + 15i + 2i − 6(−1)
                  = 1 + 17i. ♦

The real number 0, regarded as a complex number, is an additive identityelement for the complex numbers since

(a + bi) + 0 = (a + bi) + (0 + 0i) = (a + 0) + (b + 0)i = a + bi.

Likewise the real number 1, regarded as a complex number, is a multiplicativeidentity element for the set of complex numbers since

(a + bi) ·1 = (a + bi)(1 + 0i) = (a ·1 − b ·0) + (b ·1 + a ·0)i = a + bi.

Every complex number a + bi has an additive inverse, namely (−a) + (−b)i.But also each complex number except 0 has a multiplicative inverse. In fact,

(a + bi)^{−1} = a/(a^2 + b^2) − [b/(a^2 + b^2)]i.

In view of the preceding statements, the following result is not surprising.

Theorem D.1. The set of complex numbers with the operations of addi-tion and multiplication previously defined is a field.

Proof. Exercise.

Definition. The (complex) conjugate of a complex number a + bi is the complex number a − bi. We denote the conjugate of the complex number z by z̄.

Example 3

The conjugates of −3 + 2i, 4 − 7i, and 6 are, respectively,

−3 − 2i,   4 + 7i,   and   6 (since 6 = 6 + 0i has conjugate 6 − 0i = 6). ♦

The next theorem contains some important properties of the conjugate ofa complex number.

Theorem D.2. Let z and w be complex numbers. Then the followingstatements are true.


(a) The conjugate of z̄ is z.
(b) The conjugate of z + w is z̄ + w̄.
(c) The conjugate of zw is z̄ · w̄.
(d) The conjugate of z/w is z̄/w̄ if w ≠ 0.
(e) z is a real number if and only if z̄ = z.

Proof. We leave the proofs of (a), (d), and (e) to the reader.

(b) Let z = a + bi and w = c + di, where a, b, c, d ∈ R. Then

the conjugate of z + w is the conjugate of (a + c) + (b + d)i, which equals (a + c) − (b + d)i = (a − bi) + (c − di) = z̄ + w̄.

(c) For z and w, we have zw = (a + bi)(c + di) = (ac − bd) + (ad + bc)i, so the conjugate of zw is (ac − bd) − (ad + bc)i = (a − bi)(c − di) = z̄ · w̄.

For any complex number z = a + bi, the product z z̄ is real and nonnegative, for

z z̄ = (a + bi)(a − bi) = a^2 + b^2.

This fact can be used to define the absolute value of a complex number.

Definition. Let z = a + bi, where a, b ∈ R. The absolute value (or modulus) of z is the real number √(a^2 + b^2). We denote the absolute value of z by |z|.

Observe that z z̄ = |z|^2. The fact that the product of a complex number and its conjugate is real provides an easy method for determining the quotient of two complex numbers; for if c + di ≠ 0, then

and its conjugate is real provides an easy method for determining the quotientof two complex numbers; for if c + di �= 0, then

a + bi

c + di=

a + bi

c + di· c − di

c − di=

(ac + bd) + (bc − ad)ic2 + d2

=ac + bd

c2 + d2+

bc − ad

c2 + d2i.

Example 4

To illustrate this procedure, we compute the quotient (1 + 4i)/(3 − 2i):

(1 + 4i)/(3 − 2i) = [(1 + 4i)/(3 − 2i)] · [(3 + 2i)/(3 + 2i)] = (−5 + 14i)/(9 + 4) = −5/13 + (14/13)i. ♦
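Python's built-in complex type performs this computation directly; a quick check of Example 4 (floating-point rounding aside):

```python
z, w = 1 + 4j, 3 - 2j
print(z / w)                            # (-0.3846...+1.0769...j), i.e. -5/13 + (14/13)i
print(z * w.conjugate() / abs(w)**2)    # the same value, computed as in the text
```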

The absolute value of a complex number has the familiar properties of theabsolute value of a real number, as the following result shows.

Theorem D.3. Let z and w denote any two complex numbers. Then thefollowing statements are true.


(a) |zw| = |z| · |w|.
(b) |z/w| = |z|/|w| if w ≠ 0.
(c) |z + w| ≤ |z| + |w|.
(d) |z| − |w| ≤ |z + w|.

Proof. (a) By Theorem D.2, we have

|zw|^2 = (zw)(z̄ · w̄) = (z z̄)(w w̄) = |z|^2|w|^2,

proving (a).

(b) For the proof of (b), apply (a) to the product (z/w) · w.

(c) For any complex number x = a + bi, where a, b ∈ R, observe that

x + x̄ = (a + bi) + (a − bi) = 2a ≤ 2√(a^2 + b^2) = 2|x|.

Thus x + x̄ is real and satisfies the inequality x + x̄ ≤ 2|x|. Taking x = w̄z, we have, by Theorem D.2 and (a),

w̄z + w z̄ ≤ 2|w̄z| = 2|w̄||z| = 2|z||w|.

Using Theorem D.2 again gives

|z + w|^2 = (z + w)(z̄ + w̄) = z z̄ + w̄z + w z̄ + w w̄
          ≤ |z|^2 + 2|z||w| + |w|^2 = (|z| + |w|)^2.

By taking square roots, we obtain (c).

(d) From (a) and (c), it follows that

|z| = |(z + w) − w| ≤ |z + w| + | −w| = |z + w| + |w|.

So

|z| − |w| ≤ |z + w|,

proving (d).

It is interesting as well as useful that complex numbers have both a geometric and an algebraic representation. Suppose that z = a + bi, where a and b are real numbers. We may represent z as a vector in the complex plane (see Figure D.1(a)). Notice that, as in R^2, there are two axes, the real axis and the imaginary axis. The real and imaginary parts of z are the first and second coordinates, and the absolute value of z gives the length of the vector z. It is clear that addition of complex numbers may be represented as in R^2 using the parallelogram law.


[Figure D.1: (a) the complex number z = a + bi represented as a vector in the complex plane, with real and imaginary axes; (b) the unit vector e^{iθ} and a nonzero complex number z = |z|e^{iφ}.]

In Section 2.7 (p. 132), we introduce Euler's formula. The special case e^{iθ} = cos θ + i sin θ is of particular interest. Because of the geometry we have introduced, we may represent the vector e^{iθ} as in Figure D.1(b); that is, e^{iθ} is the unit vector that makes an angle θ with the positive real axis. From this figure, we see that any nonzero complex number z may be depicted as a multiple of a unit vector, namely, z = |z|e^{iφ}, where φ is the angle that the vector z makes with the positive real axis. Thus multiplication, as well as addition, has a simple geometric interpretation: If z = |z|e^{iθ} and w = |w|e^{iω} are two nonzero complex numbers, then from the properties established in Section 2.7 and Theorem D.3, we have

zw = |z|e^{iθ} · |w|e^{iω} = |zw|e^{i(θ+ω)}.

So zw is the vector whose length is the product of the lengths of z and w, and makes the angle θ + ω with the positive real axis.

Our motivation for enlarging the set of real numbers to the set of complex numbers is to obtain a field such that every polynomial with nonzero degree having coefficients in that field has a zero. Our next result guarantees that the field of complex numbers has this property.

Theorem D.4 (The Fundamental Theorem of Algebra). Suppose that p(z) = a_n z^n + a_{n−1}z^{n−1} + · · · + a_1 z + a_0 is a polynomial in P(C) of degree n ≥ 1. Then p(z) has a zero.

The following proof is based on one in the book Principles of Mathematical Analysis, 3d ed., by Walter Rudin (McGraw-Hill Higher Education, New York, 1976).

Proof. We want to find z0 in C such that p(z0) = 0. Let m be the greatest lower bound of {|p(z)| : z ∈ C}. For |z| = s > 0, we have

|p(z)| = |a_n z^n + a_{n−1}z^{n−1} + · · · + a_0|
       ≥ |a_n||z^n| − |a_{n−1}||z|^{n−1} − · · · − |a_0|
       = |a_n|s^n − |a_{n−1}|s^{n−1} − · · · − |a_0|
       = s^n[|a_n| − |a_{n−1}|s^{−1} − · · · − |a_0|s^{−n}].

Because the last expression approaches infinity as s approaches infinity, we may choose a closed disk D about the origin such that |p(z)| > m + 1 if z is not in D. It follows that m is the greatest lower bound of {|p(z)| : z ∈ D}. Because D is closed and bounded and p(z) is continuous, there exists z0 in D such that |p(z0)| = m. We want to show that m = 0. We argue by contradiction.

Assume that m ≠ 0. Let q(z) = p(z + z0)/p(z0). Then q(z) is a polynomial of degree n, q(0) = 1, and |q(z)| ≥ 1 for all z in C. So we may write

q(z) = 1 + b_k z^k + b_{k+1}z^{k+1} + · · · + b_n z^n,

where b_k ≠ 0. Because −|b_k|/b_k has modulus one, we may pick a real number θ such that e^{ikθ} = −|b_k|/b_k, or e^{ikθ}b_k = −|b_k|. For any r > 0, we have

q(re^{iθ}) = 1 + b_k r^k e^{ikθ} + b_{k+1}r^{k+1}e^{i(k+1)θ} + · · · + b_n r^n e^{inθ}
           = 1 − |b_k|r^k + b_{k+1}r^{k+1}e^{i(k+1)θ} + · · · + b_n r^n e^{inθ}.

Choose r small enough so that 1 − |b_k|r^k > 0. Then

|q(re^{iθ})| ≤ 1 − |b_k|r^k + |b_{k+1}|r^{k+1} + · · · + |b_n|r^n
             = 1 − r^k[|b_k| − |b_{k+1}|r − · · · − |b_n|r^{n−k}].

Now choose r even smaller, if necessary, so that the expression within the brackets is positive. We obtain that |q(re^{iθ})| < 1. But this is a contradiction.

The following important corollary is a consequence of Theorem D.4 andthe division algorithm for polynomials (Theorem E.1).

Corollary. If p(z) = a_n z^n + a_{n−1}z^{n−1} + · · · + a_1 z + a_0 is a polynomial of degree n ≥ 1 with complex coefficients, then there exist complex numbers c1, c2, . . . , cn (not necessarily distinct) such that

p(z) = an(z − c1)(z − c2) · · · (z − cn).

Proof. Exercise.

A field is called algebraically closed if it has the property that everypolynomial of positive degree with coefficients from that field factors as aproduct of polynomials of degree 1. Thus the preceding corollary asserts thatthe field of complex numbers is algebraically closed.


APPENDIX E POLYNOMIALS

In this appendix, we discuss some useful properties of the polynomials withcoefficients from a field. For the definition of a polynomial, refer to Sec-tion 1.2. Throughout this appendix, we assume that all polynomials havecoefficients from a fixed field F .

Definition. A polynomial f(x) divides a polynomial g(x) if there existsa polynomial q(x) such that g(x) = f(x)q(x).

Our first result shows that the familiar long division process for polyno-mials with real coefficients is valid for polynomials with coefficients from anarbitrary field.

Theorem E.1 (The Division Algorithm for Polynomials). Letf(x) be a polynomial of degree n, and let g(x) be a polynomial of degreem ≥ 0. Then there exist unique polynomials q(x) and r(x) such that

f(x) = q(x)g(x) + r(x), (1)

where the degree of r(x) is less than m.

Proof. We begin by establishing the existence of q(x) and r(x) that sat-isfy (1).

Case 1. If n < m, take q(x) = 0 and r(x) = f(x) to satisfy (1).Case 2. When 0 ≤ m ≤ n, we apply mathematical induction on n.

First suppose that n = 0. Then m = 0, and it follows that f(x) and g(x)are nonzero constants. Hence we may take q(x) = f(x)/g(x) and r(x) = 0 tosatisfy (1).

Now suppose that the result is valid for all polynomials with degree lessthan n for some fixed n > 0, and assume that f(x) has degree n. Supposethat

f(x) = anxn + an−1xn−1 + · · · + a1x + a0

and

g(x) = bmxm + bm−1xm−1 + · · · + b1x + b0,

and let h(x) be the polynomial defined by

h(x) = f(x) − a_n b_m^{−1} x^{n−m} g(x). (2)

Then h(x) is a polynomial of degree less than n, and therefore we may ap-ply the induction hypothesis or Case 1 (whichever is relevant) to obtainpolynomials q1(x) and r(x) such that r(x) has degree less than m and

h(x) = q1(x)g(x) + r(x). (3)


Combining (2) and (3) and solving for f(x) gives us f(x) = q(x)g(x) + r(x) with q(x) = a_n b_m^{−1} x^{n−m} + q1(x), which establishes the existence of q(x) and r(x) for any n ≥ 0 by mathematical induction.

We now show the uniqueness of q(x) and r(x). Suppose that q1(x), q2(x),r1(x), and r2(x) exist such that r1(x) and r2(x) each has degree less than mand

f(x) = q1(x)g(x) + r1(x) = q2(x)g(x) + r2(x).

Then

[q1(x) − q2(x)] g(x) = r2(x) − r1(x). (4)

The right side of (4) is a polynomial of degree less than m. Since g(x) hasdegree m, it must follow that q1(x) − q2(x) is the zero polynomial. Henceq1(x) = q2(x); thus r1(x) = r2(x) by (4).

In the context of Theorem E.1, we call q(x) and r(x) the quotient andremainder, respectively, for the division of f(x) by g(x). For example,suppose that F is the field of complex numbers. Then the quotient andremainder for the division of

f(x) = (3 + i)x5 − (1 − i)x4 + 6x3 + (−6 + 2i)x2 + (2 + i)x + 1

by

g(x) = (3 + i)x2 − 2ix + 4

are, respectively,

q(x) = x3 + ix2 − 2 and r(x) = (2 − 3i)x + 9.
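Computer algebra systems carry out the division algorithm over C as well; a sketch of the computation above, assuming a recent version of sympy is available:

```python
import sympy as sp

x = sp.symbols('x')
f = (3 + sp.I)*x**5 - (1 - sp.I)*x**4 + 6*x**3 + (-6 + 2*sp.I)*x**2 + (2 + sp.I)*x + 1
g = (3 + sp.I)*x**2 - 2*sp.I*x + 4
q, r = sp.div(f, g, x)         # quotient and remainder for the division of f by g
print(sp.expand(q))            # x**3 + I*x**2 - 2
print(sp.expand(r))            # (2 - 3*I)*x + 9
```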

Corollary 1. Let f(x) be a polynomial of positive degree, and let a ∈ F .Then f(a) = 0 if and only if x − a divides f(x).

Proof. Suppose that x − a divides f(x). Then there exists a polynomialq(x) such that f(x) = (x − a)q(x). Thus f(a) = (a − a)q(a) = 0 ·q(a) = 0.

Conversely, suppose that f(a) = 0. By the division algorithm, there existpolynomials q(x) and r(x) such that r(x) has degree less than one and

f(x) = q(x)(x − a) + r(x).

Substituting a for x in the equation above, we obtain r(a) = 0. Since r(x)has degree less than 1, it must be the constant polynomial r(x) = 0. Thusf(x) = q(x)(x − a).


For any polynomial f(x) with coefficients from a field F , an element a ∈ Fis called a zero of f(x) if f(a) = 0. With this terminology, the precedingcorollary states that a is a zero of f(x) if and only if x − a divides f(x).

Corollary 2. Any polynomial of degree n ≥ 1 has at most n distinctzeros.

Proof. The proof is by mathematical induction on n. The result is obviousif n = 1. Now suppose that the result is true for some positive integer n, andlet f(x) be a polynomial of degree n + 1. If f(x) has no zeros, then there isnothing to prove. Otherwise, if a is a zero of f(x), then by Corollary 1 wemay write f(x) = (x−a)q(x) for some polynomial q(x). Note that q(x) mustbe of degree n; therefore, by the induction hypothesis, q(x) can have at mostn distinct zeros. Since any zero of f(x) distinct from a is also a zero of q(x),it follows that f(x) can have at most n + 1 distinct zeros.

Polynomials having no common divisors arise naturally in the study ofcanonical forms. (See Chapter 7.)

Definition. Two nonzero polynomials are called relatively prime if nopolynomial of positive degree divides each of them.

For example, the polynomials with real coefficients f(x) = x2(x − 1) andh(x) = (x − 1)(x − 2) are not relatively prime because x − 1 divides each ofthem. On the other hand, consider f(x) and g(x) = (x− 2)(x− 3), which donot appear to have common factors. Could other factorizations of f(x) andg(x) reveal a hidden common factor? We will soon see (Theorem E.9) thatthe preceding factors are the only ones. Thus f(x) and g(x) are relativelyprime because they have no common factors of positive degree.

Theorem E.2. If f1(x) and f2(x) are relatively prime polynomials, thereexist polynomials q1(x) and q2(x) such that

q1(x)f1(x) + q2(x)f2(x) = 1,

where 1 denotes the constant polynomial with value 1.

Proof. Without loss of generality, assume that the degree of f1(x) is greaterthan or equal to the degree of f2(x). The proof is by mathematical inductionon the degree of f2(x). If f2(x) has degree 0, then f2(x) is a nonzero constantc. In this case, we can take q1(x) = 0 and q2(x) = 1/c.

Now suppose that the theorem holds whenever the polynomial of lesserdegree has degree less than n for some positive integer n, and suppose thatf2(x) has degree n. By the division algorithm, there exist polynomials q(x)and r(x) such that r(x) has degree less than n and

f1(x) = q(x)f2(x) + r(x). (5)


Since f1(x) and f2(x) are relatively prime, r(x) is not the zero polynomial. Weclaim that f2(x) and r(x) are relatively prime. Suppose otherwise; then thereexists a polynomial g(x) of positive degree that divides both f2(x) and r(x).Hence, by (5), g(x) also divides f1(x), contradicting the fact that f1(x) andf2(x) are relatively prime. Since r(x) has degree less than n, we may applythe induction hypothesis to f2(x) and r(x). Thus there exist polynomialsg1(x) and g2(x) such that

g1(x)f2(x) + g2(x)r(x) = 1. (6)

Combining (5) and (6), we have

1 = g1(x)f2(x) + g2(x)[f1(x) − q(x)f2(x)]
  = g2(x)f1(x) + [g1(x) − g2(x)q(x)]f2(x).

Thus, setting q1(x) = g2(x) and q2(x) = g1(x) − g2(x)q(x), we obtain the desired result.
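For concrete polynomials, the q1(x) and q2(x) of Theorem E.2 can be produced by the extended Euclidean algorithm; sympy exposes it as gcdex. A sketch (assuming sympy is available; up to normalization it reproduces the polynomials of Example 1 below):

```python
import sympy as sp

x = sp.symbols('x')
f1 = x**3 - x**2 + 1
f2 = (x - 1)**2
q1, q2, g = sp.gcdex(f1, f2, x)        # q1*f1 + q2*f2 = g = gcd(f1, f2)
print(g)                               # 1, so f1 and f2 are relatively prime
print(sp.expand(q1*f1 + q2*f2))        # 1
```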

Example 1

Let f1(x) = x^3 − x^2 + 1 and f2(x) = (x − 1)^2. As polynomials with real coefficients, f1(x) and f2(x) are relatively prime. It is easily verified that the polynomials q1(x) = −x + 2 and q2(x) = x^2 − x − 1 satisfy

q1(x)f1(x) + q2(x)f2(x) = 1,

and hence these polynomials satisfy the conclusion of Theorem E.2. ♦

Throughout Chapters 5, 6, and 7, we consider linear operators that are polynomials in a particular operator T and matrices that are polynomials in a particular matrix A. For these operators and matrices, the following notation is convenient.

Definitions. Let

f(x) = a0 + a1x + · · · + a_n x^n

be a polynomial with coefficients from a field F . If T is a linear operator ona vector space V over F , we define

f(T) = a0 I + a1 T + · · · + a_n T^n.

Similarly, if A is an n × n matrix with entries from F, we define

f(A) = a0 I + a1 A + · · · + a_n A^n.


Example 2

Let T be the linear operator on R^2 defined by T(a, b) = (2a + b, a − b), and let f(x) = x^2 + 2x − 3. It is easily checked that T^2(a, b) = (5a + b, a + 2b); so

f(T)(a, b) = (T^2 + 2T − 3I)(a, b)
           = (5a + b, a + 2b) + (4a + 2b, 2a − 2b) − 3(a, b)
           = (6a + 3b, 3a − 3b).

Similarly, if

A = ( 2  1 )
    ( 1 −1 ),

then

f(A) = A^2 + 2A − 3I = ( 5 1 ) + 2( 2  1 ) − 3( 1 0 ) = ( 6  3 )
                       ( 1 2 )    ( 1 −1 )     ( 0 1 )   ( 3 −3 ). ♦
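The matrix computation in Example 2 is easy to reproduce; a sketch assuming numpy is available:

```python
import numpy as np

A = np.array([[2, 1], [1, -1]], dtype=float)
fA = A @ A + 2*A - 3*np.eye(2)     # f(A) for f(x) = x^2 + 2x - 3
print(fA)                          # [[ 6.  3.] [ 3. -3.]]
```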

The next three results use this notation.

Theorem E.3. Let f(x) be a polynomial with coefficients from a field F ,and let T be a linear operator on a vector space V over F . Then the followingstatements are true.

(a) f(T) is a linear operator on V.
(b) If β is a finite ordered basis for V and A = [T]β, then [f(T)]β = f(A).

Proof. Exercise.

Theorem E.4. Let T be a linear operator on a vector space V over afield F , and let A be a square matrix with entries from F . Then, for anypolynomials f1(x) and f2(x) with coefficients from F ,

(a) f1(T)f2(T) = f2(T)f1(T).
(b) f1(A)f2(A) = f2(A)f1(A).

Proof. Exercise.

Theorem E.5. Let T be a linear operator on a vector space V over a field F, and let A be an n × n matrix with entries from F. If f1(x) and f2(x) are relatively prime polynomials with coefficients from F, then there exist polynomials q1(x) and q2(x) with coefficients from F such that

(a) q1(T)f1(T) + q2(T)f2(T) = I

(b) q1(A)f1(A) + q2(A)f2(A) = I.

Proof. Exercise.


In Chapters 5 and 7, we are concerned with determining when a linearoperator T on a finite-dimensional vector space can be diagonalized and withfinding a simple (canonical) representation of T. Both of these problems areaffected by the factorization of a certain polynomial determined by T (thecharacteristic polynomial of T). In this setting, particular types of polynomi-als play an important role.

Definitions. A polynomial f(x) with coefficients from a field F is calledmonic if its leading coefficient is 1. If f(x) has positive degree and cannot beexpressed as a product of polynomials with coefficients from F each havingpositive degree, then f(x) is called irreducible.

Observe that whether a polynomial is irreducible depends on the field Ffrom which its coefficients come. For example, f(x) = x2 + 1 is irreducibleover the field of real numbers, but it is not irreducible over the field of complexnumbers since x2 + 1 = (x + i)(x − i).

Clearly any polynomial of degree 1 is irreducible. Moreover, for polyno-mials with coefficients from an algebraically closed field, the polynomials ofdegree 1 are the only irreducible polynomials.

The following facts are easily established.

Theorem E.6. Let φ(x) and f(x) be polynomials. If φ(x) is irreducibleand φ(x) does not divide f(x), then φ(x) and f(x) are relatively prime.

Proof. Exercise.

Theorem E.7. Any two distinct irreducible monic polynomials are rela-tively prime.

Proof. Exercise.

Theorem E.8. Let f(x), g(x), and φ(x) be polynomials. If φ(x) is ir-reducible and divides the product f(x)g(x), then φ(x) divides f(x) or φ(x)divides g(x).

Proof. Suppose that φ(x) does not divide f(x). Then φ(x) and f(x) arerelatively prime by Theorem E.6, and so there exist polynomials q1(x) andq2(x) such that

1 = q1(x)φ(x) + q2(x)f(x).

Multiplying both sides of this equation by g(x) yields

g(x) = q1(x)φ(x)g(x) + q2(x)f(x)g(x). (7)

Since φ(x) divides f(x)g(x), there is a polynomial h(x) such that f(x)g(x) =φ(x)h(x). Thus (7) becomes

g(x) = q1(x)φ(x)g(x) + q2(x)φ(x)h(x) = φ(x) [q1(x)g(x) + q2(x)h(x)] .

So φ(x) divides g(x).


Corollary. Let φ(x), φ1(x), φ2(x), . . . , φn(x) be irreducible monic polyno-mials. If φ(x) divides the product φ1(x)φ2(x) · · ·φn(x), then φ(x) = φi(x)for some i (i = 1, 2, . . . , n).

Proof. We prove the corollary by mathematical induction on n. For n = 1,the result is an immediate consequence of Theorem E.7. Suppose then that forsome n > 1, the corollary is true for any n−1 irreducible monic polynomials,and let φ1(x), φ2(x), . . . , φn(x) be n irreducible polynomials. If φ(x) divides

φ1(x)φ2(x) · · ·φn(x) = [φ1(x)φ2(x) · · ·φn−1(x)] φn(x),

then φ(x) divides the product φ1(x)φ2(x) · · ·φn−1(x) or φ(x) divides φn(x) byTheorem E.8. In the first case, φ(x) = φi(x) for some i (i = 1, 2, . . . , n−1) bythe induction hypothesis; in the second case, φ(x) = φn(x) by Theorem E.7.

We are now able to establish the unique factorization theorem, which isused throughout Chapters 5 and 7. This result states that every polynomialof positive degree is uniquely expressible as a constant times a product ofirreducible monic polynomials.

Theorem E.9 (Unique Factorization Theorem for Polynomials).For any polynomial f(x) of positive degree, there exist a unique constantc; unique distinct irreducible monic polynomials φ1(x), φ2(x), . . . , φk(x); andunique positive integers n1, n2, . . . , nk such that

f(x) = c[φ1(x)]n1 [φ2(x)]n2 · · · [φk(x)]nk .

Proof. We begin by showing the existence of such a factorization using mathematical induction on the degree of f(x). If f(x) is of degree 1, then f(x) = ax + b for some constants a and b with a ≠ 0. Setting φ(x) = x + b/a, we have f(x) = aφ(x). Since φ(x) is an irreducible monic polynomial, the result is proved in this case. Now suppose that the conclusion is true for any polynomial with positive degree less than some integer n > 1, and let f(x) be a polynomial of degree n. Then

f(x) = anxn + · · · + a1x + a0

for some constants ai with a_n ≠ 0. If f(x) is irreducible, then

f(x) = a_n (x^n + (a_{n−1}/a_n)x^{n−1} + · · · + (a_1/a_n)x + a_0/a_n)

is a representation of f(x) as a product of a_n and an irreducible monic polynomial. If f(x) is not irreducible, then f(x) = g(x)h(x) for some polynomials g(x) and h(x), each of positive degree less than n. The induction hypothesis


guarantees that both g(x) and h(x) factor as products of a constant and powers of distinct irreducible monic polynomials. Consequently f(x) = g(x)h(x) also factors in this way. Thus, in either case, f(x) can be factored as a product of a constant and powers of distinct irreducible monic polynomials.

It remains to establish the uniqueness of such a factorization. Suppose that

f(x) = c[φ1(x)]^n1 [φ2(x)]^n2 · · · [φk(x)]^nk
     = d[ψ1(x)]^m1 [ψ2(x)]^m2 · · · [ψr(x)]^mr,    (8)

where c and d are constants, φi(x) and ψj(x) are irreducible monic polynomials, and ni and mj are positive integers for i = 1, 2, . . . , k and j = 1, 2, . . . , r. Clearly both c and d must be the leading coefficient of f(x); hence c = d. Dividing by c, we find that (8) becomes

[φ1(x)]^n1 [φ2(x)]^n2 · · · [φk(x)]^nk = [ψ1(x)]^m1 [ψ2(x)]^m2 · · · [ψr(x)]^mr.    (9)

So φi(x) divides the right side of (9) for i = 1, 2, . . . , k. Consequently, by the corollary to Theorem E.8, each φi(x) equals some ψj(x), and similarly, each ψj(x) equals some φi(x). We conclude that r = k and that, by renumbering if necessary, φi(x) = ψi(x) for i = 1, 2, . . . , k. Suppose that ni ≠ mi for some i. Without loss of generality, we may suppose that i = 1 and n1 > m1. Then by canceling [φ1(x)]^m1 from both sides of (9), we obtain

[φ1(x)]^(n1−m1) [φ2(x)]^n2 · · · [φk(x)]^nk = [φ2(x)]^m2 · · · [φk(x)]^mk.    (10)

Since n1 − m1 > 0, φ1(x) divides the left side of (10) and hence divides the right side also. So φ1(x) = φi(x) for some i = 2, . . . , k by the corollary to Theorem E.8. But this contradicts that φ1(x), φ2(x), . . . , φk(x) are distinct. Hence the factorizations of f(x) in (8) are the same.
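
For polynomials with rational coefficients, the factorization guaranteed by Theorem E.9 can be computed. The short sketch below is only an illustration (it assumes the SymPy library is available, and the polynomial is an arbitrary example): factor_list returns exactly the data of the theorem, a constant c together with the distinct monic irreducible factors and their multiplicities.

    from sympy import symbols, factor_list

    x = symbols('x')
    f = 6*x**4 - 6*x**2              # = 6 [x]^2 [x − 1] [x + 1]

    c, factors = factor_list(f)
    print(c)         # 6, the leading coefficient of f
    print(factors)   # [(x, 2), (x - 1, 1), (x + 1, 1)], in some order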

It is often useful to regard a polynomial f(x) = a_n x^n + · · · + a_1 x + a_0 with coefficients from a field F as a function f : F → F. In this case, the value of f at c ∈ F is f(c) = a_n c^n + · · · + a_1 c + a_0. Unfortunately, for arbitrary fields there is not a one-to-one correspondence between polynomials and polynomial functions. For example, if f(x) = x^2 and g(x) = x are two polynomials over the field Z2 (defined in Example 4 of Appendix C), then f(x) and g(x) have different degrees and hence are not equal as polynomials. But f(a) = g(a) for all a ∈ Z2, so that f and g are equal polynomial functions. Our final result shows that this anomaly cannot occur over an infinite field.
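
Because Z2 has only the two elements 0 and 1, the claim that f(x) = x^2 and g(x) = x agree as functions can be verified exhaustively. The lines below are a plain illustration in ordinary Python (not part of the text), with arithmetic taken modulo 2.

    def f(a):            # the polynomial function a -> a^2 in Z2
        return (a * a) % 2

    def g(a):            # the polynomial function a -> a in Z2
        return a % 2

    print(all(f(a) == g(a) for a in (0, 1)))   # True: f and g agree at every element of Z2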

Theorem E.10. Let f(x) and g(x) be polynomials with coefficients from an infinite field F. If f(a) = g(a) for all a ∈ F, then f(x) and g(x) are equal.

Proof. Suppose that f(a) = g(a) for all a ∈ F. Define h(x) = f(x) − g(x), and suppose that h(x) is of degree n ≥ 1. It follows from Corollary 2 to


Theorem E.1 that h(x) can have at most n zeroes. But

h(a) = f(a) − g(a) = 0

for every a ∈ F, contradicting the assumption that h(x) has positive degree. Thus h(x) is a constant polynomial, and since h(a) = 0 for each a ∈ F, it follows that h(x) is the zero polynomial. Hence f(x) = g(x).


Answers to Selected Exercises

CHAPTER 1

SECTION 1.1

1. Only the pairs in (b) and (c) are parallel.

2. (a) x = (3,−2, 4) + t(−8, 9,−3) (c) x = (3, 7, 2) + t(0, 0,−10)

3. (a) x = (2,−5,−1) + s(−2, 9, 7) + t(−5, 12, 2)

(c) x = (−8, 2, 0) + s(9, 1, 0) + t(14,−7, 0)

SECTION 1.2

1. (a) T (b) F (c) F (d) F (e) T (f) F (g) F (h) F (i) T (j) T (k) T

3. M13 = 3, M21 = 4, and M22 = 5

4. (a) ( 6 3 2 ; −4 3 9 )   (c) ( 8 20 −12 ; 4 0 28 )
   (e) 2x^4 + x^3 + 2x^2 − 2x + 10   (g) 10x^7 − 30x^4 + 40x^2 − 15x

13. No, (VS 4) fails.

14. Yes

15. No

17. No, (VS 5) fails.

22. 2mn

SECTION 1.3

1. (a) F (b) F (c) T (d) F (e) T (f) F (g) F

2. (a) ( −4 5 ; 2 −1 ); the trace is −5   (c) ( −3 0 6 ; 9 −2 1 )
   (e) ( 1 ; −1 ; 3 ; 5 )   (g) ( 5 6 7 )

8. (a) Yes (c) Yes (e) No

11. No, the set is not closed under addition.

15. Yes


SECTION 1.4

1. (a) T (b) F (c) T (d) F (e) T (f) F

2. (a) {r(1, 1, 0, 0) + s(−3, 0, −2, 1) + (5, 0, 4, 0) : r, s ∈ R}
   (c) There are no solutions.
   (e) {r(10, −3, 1, 0, 0) + s(−3, 2, 0, 1, 0) + (−4, 3, 0, 0, 5) : r, s ∈ R}

3. (a) Yes (c) No (e) No

4. (a) Yes (c) Yes (e) No

5. (a) Yes (c) No (e) Yes (g) Yes

SECTION 1.5

1. (a) F (b) T (c) F (d) F (e) T (f) T

2. (a) linearly dependent (c) linearly independent (e) linearly dependent(g) linearly dependent (i) linearly independent

7.

{(1 00 0

),

(0 00 1

)}11. 2n

SECTION 1.6

1. (a) F (b) T (c) F (d) F (e) T (f) F (g) F (h) T (i) F (j) T (k) T (l) T

2. (a) Yes (c) Yes (e) No

3. (a) No (c) Yes (e) No

4. No

5. No

7. {u1, u2, u5}

9. (a1, a2, a3, a4) = a1u1 + (a2 − a1)u2 + (a3 − a2)u3 + (a4 − a3)u4

10. (a) −4x^2 − x + 8 (c) −x^3 + 2x^2 + 4x − 5

13. {(1, 1, 1)}

15. n^2 − 1

17. n(n − 1)/2

26. n

30. dim(W1) = 3, dim(W2) = 2, dim(W1 + W2) = 4, and dim(W1 ∩ W2) = 1

SECTION 1.7

1. (a) F (b) F (c) F (d) T (e) T (f) T

CHAPTER 2

SECTION 2.1

1. (a) T (b) F (c) F (d) T (e) F (f) F (g) T (h) F


2. The nullity is 1, and the rank is 2. T is not one-to-one but is onto.

4. The nullity is 4, and the rank is 2. T is neither one-to-one nor onto.

5. The nullity is 0, and the rank is 3. T is one-to-one but not onto.

10. T(2, 3) = (5, 11). T is one-to-one. 12. No.

SECTION 2.2

1. (a) T (b) T (c) F (d) T (e) T (f) F

2. (a)

⎛⎝2 −13 41 0

⎞⎠ (c)(2 1 −3

)(d)

⎛⎝ 0 2 1−1 4 5

1 0 1

⎞⎠

(f)

⎛⎜⎜⎜⎜⎜⎝0 0 · · · 0 10 0 · · · 1 0...

......

...0 1 · · · 0 01 0 · · · 0 0

⎞⎟⎟⎟⎟⎟⎠ (g)(1 0 · · · 0 1

)

3. [T]γβ =

⎛⎜⎝− 13

−1

0 123

0

⎞⎟⎠ and [T]γα =

⎛⎜⎝− 73

− 113

2 323

43

⎞⎟⎠

5. (a)

⎛⎜⎜⎝1 0 0 00 0 1 00 1 0 00 0 0 1

⎞⎟⎟⎠ (b)

⎛⎜⎜⎝0 1 02 2 20 0 00 0 2

⎞⎟⎟⎠ (e)

⎛⎜⎜⎝1

−204

⎞⎟⎟⎠

10.

⎛⎜⎜⎜⎜⎜⎜⎜⎝

1 1 0 · · · 00 1 1 · · · 00 0 1 · · · 0...

......

...0 0 0 · · · 10 0 0 · · · 1

⎞⎟⎟⎟⎟⎟⎟⎟⎠SECTION 2.3

1. (a) F (b) T (c) F (d) T (e) F (f) F (g) F (h) F (i) T (j) T

2. (a) A(2B + 3C) =

(20 −9 185 10 8

)and A(BD) =

(29

−26

)(b) AtB =

(23 19 026 −1 10

)and CB =

(27 7 9

)3. (a) [T]β =

⎛⎝2 3 00 3 60 0 4

⎞⎠, [U]γβ =

⎛⎝1 1 00 0 11 −1 0

⎞⎠, and [UT]γβ =

⎛⎝2 6 60 0 42 0 −6

⎞⎠

4. (a)

⎛⎜⎜⎝1

−146

⎞⎟⎟⎠ (c) (5)


12. (a) No. (b) No.

SECTION 2.4

1. (a) F (b) T (c) F (d) F (e) T (f) F (g) T (h) T (i) T

2. (a) No (b) No (c) Yes (d) No (e) No (f) Yes

3. (a) No (b) Yes (c) Yes (d) No

19. (b) [T]β =

⎛⎜⎜⎝1 0 0 00 0 1 00 1 0 00 0 0 1

⎞⎟⎟⎠SECTION 2.5

1. (a) F (b) T (c) T (d) F (e) T

2. (a)

(a1 b1

a2 b2

)(c)

(3 −15 −2

)

3. (a)

⎛⎝a2 b2 c2

a1 b1 c1

a0 b0 c0

⎞⎠ (c)

⎛⎝ 0 −1 01 0 0

−3 2 1

⎞⎠ (e)

⎛⎝5 −6 30 4 −13 −1 2

⎞⎠4. [T]β′ =

(2 −1

−1 1

)(2 11 −3

)(1 11 2

)=

(8 13

−5 −9

)

5. [T]β′ =

⎛⎝ 12

12

12

− 12

⎞⎠(0 10 0

)(1 11 −1

)=

⎛⎝ 12

− 12

12

− 12

⎞⎠6. (a) Q =

(1 11 2

), [LA]β =

(6 11

−2 −4

)

(c) Q =

⎛⎝1 1 11 0 11 1 2

⎞⎠, [LA]β =

⎛⎝ 2 2 2−2 −3 −4

1 1 2

⎞⎠7. (a) T(x, y) =

1

1 + m2((1 − m2)x + 2my, 2mx + (m2 − 1)y)

SECTION 2.6

1. (a) F (b) T (c) T (d) T (e) F (f) T (g) T (h) F

2. The functions in (a), (c), (e), and (f) are linear functionals.

3. (a) f1(x, y, z) = x − 12y, f2(x, y, z) = 1

2y, and f3(x, y, z) = −x + z

5. The basis for V is {p1(x), p2(x)}, where p1(x) = 2 − 2x and p2(x) = − 12

+ x.

7. (a) Tt(f) = g, where g(a + bx) = −3a − 4b

(b) [Tt]β∗

γ∗ =

(−1 1−2 1

)(c) [T]γβ =

(−1 −21 1

)


SECTION 2.7

1. (a) T (b) T (c) F (d) F (e) T (f) F (g) T

2. (a) F (b) F (c) T (d) T (e) F

3. (a) {e−t, te−t} (c) {e−t, te−t, et, tet} (e) {e−t, et cos 2t, et sin 2t}4. (a) {e(1+

√5)t/2, e(1−√

5)t/2} (c) {1, e−4t, e−2t}

CHAPTER 3

SECTION 3.1

1. (a) T (b) F (c) T (d) F (e) T (f) F (g) T (h) F (i) T

2. Adding −2 times column 1 to column 2 transforms A into B.

3. (a)

⎛⎝0 0 10 1 01 0 0

⎞⎠ (c)

⎛⎝1 0 00 1 02 0 1

⎞⎠SECTION 3.2

1. (a) F (b) F (c) T (d) T (e) F (f) T (g) T (h) T (i) T

2. (a) 2 (c) 2 (e) 3 (g) 1

4. (a)

⎛⎝1 0 0 00 1 0 00 0 0 0

⎞⎠; the rank is 2.

5. (a) The rank is 2, and the inverse is

(−1 21 −1

).

(c) The rank is 2, and so no inverse exists.

(e) The rank is 3, and the inverse is

⎛⎜⎝16

− 13

12

12

0 − 12

− 16

13

12

⎞⎟⎠.

(g) The rank if 4, and the inverse is

⎛⎜⎜⎝−51 15 7 12

31 −9 −4 −7−10 3 1 2−3 1 1 1

⎞⎟⎟⎠.

6. (a) T−1(ax2 + bx + c) = −ax2 − (4a + b)x − (10a + 2b + c)

(c) T−1(a, b, c) =(

16a − 1

3b + 1

2c, 1

2a − 1

2c,− 1

6+ 1

3b + 1

2c)

(e) T−1(a, b, c) =(

12a − b + 1

2c)x2 +

(− 12a + 1

2c)x + b

7.

⎛⎝1 0 00 1 01 0 1

⎞⎠⎛⎝1 0 01 1 00 0 1

⎞⎠⎛⎝1 0 00 −2 00 0 1

⎞⎠⎛⎝1 2 00 1 00 0 1

⎞⎠⎛⎝1 0 00 1 00 −1 1

⎞⎠⎛⎝1 0 10 1 00 0 1

⎞⎠


20. (a)

⎛⎜⎜⎜⎜⎝1 3 0 0 0

−2 1 0 0 01 0 0 0 00 −2 0 0 00 1 0 0 0

⎞⎟⎟⎟⎟⎠SECTION 3.3

1. (a) F (b) F (c) T (d) F (e) F (f) F (g) T (h) F

2. (a)

{(−31

)}(c)

⎧⎨⎩⎛⎝−1

11

⎞⎠⎫⎬⎭(e)

⎧⎪⎪⎨⎪⎪⎩⎛⎜⎜⎝−2

100

⎞⎟⎟⎠ ,

⎛⎜⎜⎝3010

⎞⎟⎟⎠ ,

⎛⎜⎜⎝−1

001

⎞⎟⎟⎠⎫⎪⎪⎬⎪⎪⎭ (g)

⎧⎪⎪⎨⎪⎪⎩⎛⎜⎜⎝−3

110

⎞⎟⎟⎠ ,

⎛⎜⎜⎝1

−101

⎞⎟⎟⎠⎫⎪⎪⎬⎪⎪⎭

3. (a)

{(50

)+ t

(−31

): t ∈ R

}(c)

⎧⎨⎩⎛⎝2

11

⎞⎠+ t

⎛⎝−111

⎞⎠: t ∈ R

⎫⎬⎭(e)

⎧⎪⎪⎨⎪⎪⎩⎛⎜⎜⎝

1000

⎞⎟⎟⎠+ r

⎛⎜⎜⎝−2

100

⎞⎟⎟⎠+ s

⎛⎜⎜⎝3010

⎞⎟⎟⎠+ t

⎛⎜⎜⎝−1

001

⎞⎟⎟⎠: r, s, t ∈ R

⎫⎪⎪⎬⎪⎪⎭(g)

⎧⎪⎪⎨⎪⎪⎩⎛⎜⎜⎝

0001

⎞⎟⎟⎠+ r

⎛⎜⎜⎝−3

110

⎞⎟⎟⎠+ s

⎛⎜⎜⎝1

−101

⎞⎟⎟⎠: r, s,∈ R

⎫⎪⎪⎬⎪⎪⎭4. (b) (1) A−1 =

⎛⎜⎝13

0 13

19

13

− 29

− 49

23

− 19

⎞⎟⎠ (2)

⎛⎝x1

x2

x3

⎞⎠ =

⎛⎝ 30

−2

⎞⎠

6. T−1{(1, 11)} =

⎧⎪⎪⎪⎨⎪⎪⎪⎩⎛⎜⎜⎜⎝

112

− 92

0

⎞⎟⎟⎟⎠+ t

⎛⎝ 1−1

2

⎞⎠: t ∈ R

⎫⎪⎪⎪⎬⎪⎪⎪⎭7. The systems in parts (b), (c), and (d) have solutions.

11. The farmer, tailor, and carpenter must have incomes in the proportions 4 : 3 : 4.

13. There must be 7.8 units of the first commodity and 9.5 units of the second.

SECTION 3.4

1. (a) F (b) T (c) T (d) T (e) F (f) T (g) T


2. (a)

⎛⎝ 4−3−1

⎞⎠ (c)

⎛⎜⎜⎝23

−2−1

⎞⎟⎟⎠ (e)

⎧⎪⎪⎨⎪⎪⎩⎛⎜⎜⎝

4010

⎞⎟⎟⎠+ r

⎛⎜⎜⎝4100

⎞⎟⎟⎠+ s

⎛⎜⎜⎝1021

⎞⎟⎟⎠: r, s ∈ R

⎫⎪⎪⎬⎪⎪⎭

(g)

⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩

⎛⎜⎜⎜⎜⎝−23

0790

⎞⎟⎟⎟⎟⎠+ r

⎛⎜⎜⎜⎜⎝11000

⎞⎟⎟⎟⎟⎠+ s

⎛⎜⎜⎜⎜⎝−23

0691

⎞⎟⎟⎟⎟⎠: r, s ∈ R

⎫⎪⎪⎪⎪⎬⎪⎪⎪⎪⎭

(i)

⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩

⎛⎜⎜⎜⎜⎝200

−10

⎞⎟⎟⎟⎟⎠+ r

⎛⎜⎜⎜⎜⎝02100

⎞⎟⎟⎟⎟⎠+ s

⎛⎜⎜⎜⎜⎝1

−40

−21

⎞⎟⎟⎟⎟⎠: r, s ∈ R

⎫⎪⎪⎪⎪⎬⎪⎪⎪⎪⎭

4. (a)

⎧⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎩

⎛⎜⎜⎜⎜⎜⎜⎜⎝

43

13

0

0

⎞⎟⎟⎟⎟⎟⎟⎟⎠+ t

⎛⎜⎜⎝1

−112

⎞⎟⎟⎠: t ∈ R

⎫⎪⎪⎪⎪⎪⎪⎪⎬⎪⎪⎪⎪⎪⎪⎪⎭(c) There are no solutions.

5.

⎛⎝ 1 0 2 1 4−1 −1 3 −2 −7

3 1 1 0 −9

⎞⎠7. {u1, u2, u5}

11. (b) {(1, 2, 1, 0, 0), (2, 1, 0, 0, 0), (1, 0, 0, 1, 0), (−2, 0, 0, 0, 1)}13. (b) {(1, 0, 1, 1, 1, 0), (0, 2, 1, 1, 0, 0), (1, 1, 1, 0, 0, 0), (−3,−2, 0, 0, 0, 1)}

CHAPTER 4

SECTION 4.1

1. (a) F (b) T (c) F (d) F (e) T

2. (a) 30 (c) −8

3. (a) −10 + 15i (c) −24

4. (a) 19 (c) 14

SECTION 4.2

1. (a) F (b) T (c) T (d) T (e) F (f) F (g) F (h) T

3. 42 5. −12 7. −12 9. 22 11. −3

13. −8 15. 0 17. −49 19. −28 − i 21. 95


SECTION 4.3

1. (a) F (b) T (c) F (d) T (e) F (f) T (g) F (h) F

3. (4,−3, 0) 5. (−20,−48,−8) 7. (0,−12, 16)

24. tn + an−1tn−1 + · · · + a1t + a0

26. (a)

(A22 −A12

−A21 A11

)(c)

⎛⎝10 0 00 −20 00 0 −8

⎞⎠(e)

⎛⎝ −3i 0 04 −1 + i 0

10 + 16i −5 − 3i 3 + 3i

⎞⎠ (g)

⎛⎝ 18 28 −6−20 −21 37

48 14 −16

⎞⎠SECTION 4.4

1. (a) T (b) T (c) T (d) F (e) F (f) T (g) T (h) F (i) T (j) T (k) T

2. (a) 22 (c) 2 − 4i

3. (a) −12 (c) −12 (e) 22 (g) −3

4. (a) 0 (c) −49 (e) −28 − i (g) 95

SECTION 4.5

1. (a) F (b) T (c) T (d) F (e) F (f) T

3. No 5. Yes 7. Yes 9. No

CHAPTER 5

SECTION 5.1

1. (a) F (b) T (c) T (d) F (e) F (f) F (g) F (h) T (i) T (j) F (k) F

2. (a) [T]β =

(0 2

−1 0

), no (c) [T]β =

⎛⎝−1 0 00 1 00 0 −1

⎞⎠, yes

(e) [T]β =

⎛⎜⎜⎝−1 1 0 0

0 −1 1 00 0 −1 00 0 0 −1

⎞⎟⎟⎠, no

3. (a) The eigenvalues are 4 and −1, a basis of eigenvectors is{(23

),

(1

−1

)}, Q =

(2 13 −1

), and D =

(4 00 −1

).

(c) The eigenvalues are 1 and −1, a basis of eigenvectors is{(1

1 − i

),

(1

−1 − i

)}, Q =

(1 1

1 − i −1 − i

), and D =

(1 00 −1

).


4. (a) λ = 3, 4 β = {(3, 5), (1, 2)}(b) λ = −1, 1, 2 β = {(1, 2, 0), (1,−1,−1), (2, 0,−1)}(f) λ = 1, 3 β = {−2 + x,−4 + x2,−8 + x3, x}(h) λ = −1, 1, 1, 1 β =

{(−1 00 1

),

(0 10 0

),

(1 00 1

),

(0 01 0

)}(i) λ = 1, 1,−1,−1 β =

{(1 01 0

),

(0 10 1

),

(−1 01 0

),

(0 −10 1

)}(j) λ = −1, 1, 5 β =

{(0 1

−1 0

),

(1 00 −1

),

(0 11 0

),

(1 00 1

)}26. 4

SECTION 5.2

1. (a) F (b) F (c) F (d) T (e) T (f) F (g) T (h) T (i) F

2. (a) Not diagonalizable (c) Q =

(1 41 −3

)

(e) Not diagonalizable (g) Q =

⎛⎝ 1 1 12 −1 0

−1 0 −1

⎞⎠3. (a) Not diagonalizable (c) Not diagonalizable

(d) β = {x − x2, 1 − x − x2, x + x2} (e) β = {(1, 1), (1,−1)}

7. An =1

3

(5n + 2(−1)n 2(5n) − 2(−1)n

5n − (−1)n 2(5)n + (−1)n

)14. (b) x(t) = c1e

3t

(−21

)+ c2e

−2t

(1

−1

)

(c) x(t) = et

⎡⎣c1

⎛⎝100

⎞⎠+ c2

⎛⎝010

⎞⎠⎤⎦+ c3e2t

⎛⎝111

⎞⎠SECTION 5.3

1. (a) T (b) T (c) F (d) F (e) T (f) T (g) T (h) F (i) F (j) T

2. (a)

(0 00 0

)(c)

⎛⎜⎝ 713

713

613

613

⎞⎟⎠ (e) No limit exists.

(g)

⎛⎝−1 0 −1−4 1 −2

2 0 2

⎞⎠ (i) No limit exists.

6. One month after arrival, 25% of the patients have recovered, 20% are ambulatory, 41% are bedridden, and 14% have died. Eventually 59/90 recover and 31/90 die.


7. 37.

8. Only the matrices in (a) and (b) are regular transition matrices.

9. (a)

⎛⎜⎜⎜⎜⎝13

13

13

13

13

13

13

13

13

⎞⎟⎟⎟⎟⎠ (c) No limit exists.

(e)

⎛⎜⎜⎜⎜⎝0 0 0

12

1 0

12

0 1

⎞⎟⎟⎟⎟⎠ (g)

⎛⎜⎜⎜⎜⎜⎜⎜⎝0 0 0 0

0 0 0 0

12

12

1 0

12

12

0 1

⎞⎟⎟⎟⎟⎟⎟⎟⎠10. (a)

⎛⎝0.2250.4410.334

⎞⎠ after two stages and

⎛⎝0.200.600.20

⎞⎠ eventually

(c)

⎛⎝0.3720.2250.403

⎞⎠ after two stages and

⎛⎝0.500.200.30

⎞⎠ eventually

(e)

⎛⎝0.3290.3340.337

⎞⎠ after two stages and

⎛⎜⎜⎜⎜⎝13

13

13

⎞⎟⎟⎟⎟⎠ eventually

12. 9/19 new, 6/19 once-used, and 4/19 twice-used

13. In 1995, 24% will own large cars, 34% will own intermediate-sized cars, and 42% will own small cars; the corresponding eventual percentages are 10%, 30%, and 60%.

20. eO = I and eI = eI.

SECTION 5.4

1. (a) F (b) T (c) F (d) F (e) T (f) T (g) T

2. The subspaces in (a), (c), and (d) are T-invariant.

6. (a)

⎧⎪⎪⎨⎪⎪⎩⎛⎜⎜⎝

1000

⎞⎟⎟⎠ ,

⎛⎜⎜⎝1011

⎞⎟⎟⎠ ,

⎛⎜⎜⎝1

−122

⎞⎟⎟⎠⎫⎪⎪⎬⎪⎪⎭ (c)

{(0 11 0

)}

9. (a) −t(t2 − 3t + 3) (c) 1 − t

10. (a) t(t − 1)(t2 − 3t + 3) (c) (t − 1)3(t + 1)

18. (c) A−1 =1

2

⎛⎝2 −2 −40 1 30 0 −2

⎞⎠


31. (a) t2 − 6t + 6 (c) −(t + 1)(t2 − 6t + 6)

CHAPTER 6

SECTION 6.1

1. (a) T (b) T (c) F (d) F (e) F (f) F (g) F (h) T

2. 〈x, y〉 = 8 + 5i, ‖x‖ =√

7, ‖y‖ =√

14, and ‖x + y‖ =√

37.

3. 〈f, g〉 = 1, ‖f‖ =√

33

, ‖g‖ =

√e2 − 1

2, and ‖f + g‖ =

√11 + 3e2

6.

16. (b) No

SECTION 6.2

1. (a) F (b) T (c) T (d) F (e) T (f) F (g) T

2. For each part the orthonormal basis and the Fourier coefficients are given.

(b){√

33

(1, 1, 1),√

66

(−2, 1, 1),√

22

(0,−1, 1)}

; 2√

33

, −√

66

,√

22

.

(c) {1, 2√

3(x − 12), 6

√5(x2 − x + 1

6)}; 3

2,

√3

6, 0.

(e){

15(2,−1,−2, 4), 1√

30(−4, 2,−3, 1), 1√

155(−3, 4, 9, 7)

}; 10, 3

√30,

√155

(g)

{1

6

(3 5

−1 1

),

1

6√

2

(−4 46 −2

),

1

9√

2

(9 −36 −6

)}; 24, 6

√2, −9

√2

(i){√

sin t,√

cos t,√

ππ2−8

(1 − 4π

sin t),√

12ππ4−96

(t + 4π

cos t − π2)}

;√2π(2π + 2), −4

√2π,√

π2−8π

(1 + π),√

π4−963π

(k){

1√47

(−4, 3 − 2i, i, 1 − 4i), 1√60

(3 − i,−5i,−2 + 4i, 2 + i),

1√1160

(−17 − i,−9 + 8i,−18 + 6i,−9 + 8i)}

;√

47(−1 − i),√

60(−1 + 2i),√

1160(1 + i)

(m)

{1√18

(−1 + i −i2 − i 1 + 3i

),

1√246

( −4i −11 − 9i1 + 5i 1 − i

),

1√39063

(−5 − 118i −7 − 26i−145i −58

)};

√18(2 + i),

√246(−1 − i), 0

4. S⊥ = span({(i,− 12(1 + i), 1)})

5. S⊥0 is the plane through the origin that is perpendicular to x0; S⊥ is the line

through the origin that is perpendicular to the plane containing x1 and x2.

19. (a)1

17

(26

104

)(b)

1

14

⎛⎝291740

⎞⎠20. (b)

1√14


SECTION 6.3

1. (a) T (b) F (c) F (d) T (e) F (f) T (g) T

2. (a) y = (1,−2, 4) (c) y = 210x2 − 204x + 33

3. (a) T∗(x) = (11,−12) (c) T∗(f(t)) = 12 + 6t

14. T∗(x) = 〈x, z〉 y

20. (a) The linear function is y = −2t + 5/2 with E = 1, and the quadratic function is y = t^2/3 − 4t/3 + 2 with E = 0.

(b) The linear function is y = 1.25t + 0.55 with E = 0.3, and the quadratic function is y = t^2/56 + 15t/14 + 239/280 with E = 0.22857 (approximately).

21. The spring constant is approximately 2.1.

22. (a) x = 27, y = 3

7, z = 1

7(d) x = 7

12, y = 1

12, z = 1

4, w = − 1

12

SECTION 6.4

1. (a) T (b) F (c) F (d) T (e) T (f) T (g) F (h) T

2. (a) T is self-adjoint. An orthonormal basis of eigenvectors is{1√5(1,−2),

1√5(2, 1)

}, with corresponding eigenvalues 6 and 1.

(c) T is normal, but not self-adjoint. An orthonormal basis of eigenvectorsis{

1

2(1 + i,

√2),

1

2(1 + i,−

√2)

}with corresponding eigenvalues

2 +1 + i√

2and 2 − 1 + i√

2.

(e) T is self-adjoint. An orthonormal basis of eigenvectors is{1√2

(0 11 0

),

1√2

(1 00 1

),

1√2

(0 −11 0

),

1√2

(−1 00 1

)}with corresponding eigenvalues 1, 1, −1, −1.

SECTION 6.5

1. (a) T (b) F (c) F (d) T (e) F (f) T (g) F (h) F (i) F

2. (a) P =1√2

(1 11 −1

)and D =

(3 00 −1

)

(d) P =

⎛⎜⎜⎜⎝1√2

1√6

1√3

− 1√2

1√6

1√3

0 − 2√6

1√3

⎞⎟⎟⎟⎠ and D =

⎛⎝−2 0 00 −2 00 0 4

⎞⎠4. Tz is normal for all z ∈ C, Tz is self-adjoint if and only if z ∈ R, and Tz is

unitary if and only if |z| = 1.

5. Only the pair of matrices in (d) are unitarily equivalent.


25. 2(ψ − φ)

26. (a) ψ − φ

2(b) ψ +

φ

2

27. (a) x =1√2x′ +

1√2y′ and y =

1√2x′ − 1√

2y′

The new quadratic form is 3(x′)2 − (y′)2.

(c) x =3√13

x′ +2√13

y′ and y =−2√13

x′ +2√13

y′

The new quadratic form is 5(x′)2 − 8(y′)2.

29. (c) P =

⎛⎜⎜⎜⎝1√2

1√3

− 6√6

1√2

− 1√3

√6

6

0 1√3

√6

3

⎞⎟⎟⎟⎠ and R =

⎛⎜⎜⎝√

2√

2 2√

2

0√

3√

33

0 0√

63

⎞⎟⎟⎠(e) x1 = 3, x2 = −5, x3 = 4

SECTION 6.6

1. (a) F (b) T (c) T (d) F (e) F

2. For W = span({(1, 2)}), [T]β =

(15

25

25

45

).

3. (2) (a) T1(a, b) = 12(a + b, a + b) and T2(a, b) = 1

2(a − b,−a + b)

(d) T1(a, b, c) = 13(2a − b − c,−a + 2b − c,−a − b + 2c) and

T2(a, b, c) = 13(a + b + c, a + b + c, a + b + c)

SECTION 6.7

1. (a) F (b) F (c) T (d) T (e) F (f) F (g) T

2. (a) v1 =

(10

), v2 =

(01

), u1 =

1√3

⎛⎝111

⎞⎠, u2 =1√2

⎛⎝ 01

−1

⎞⎠, u3 =1√6

⎛⎝ 2−1−1

⎞⎠σ1 =

√3, σ2 =

√2

(c) v1 =1√π

sin x, v2 =1√π

cos x, v3 =1√2π

u1 =cos x + 2 sin x√

5π, u2 =

2 cos x − sin x√5π

, u3 =1√2π

,

σ1 =√

5, σ2 =√

5, σ3 = 2

3. (a)

⎛⎜⎜⎝1√3

1√2

1√6

1√3

− 1√2

1√6

− 1√3

0 2√6

⎞⎟⎟⎠⎛⎝√

6 00 00 0

⎞⎠(1√2

1√2

1√2

− 1√2

)∗


(c)

⎛⎜⎜⎜⎜⎜⎝2√10

0 1√2

1√10

1√10

− 1√2

0 − 2√10

1√10

1√2

0 − 2√10

2√10

0 − 1√2

1√10

⎞⎟⎟⎟⎟⎟⎠⎛⎜⎜⎝√

5 00 10 00 0

⎞⎟⎟⎠(

1√2

1√2

1√2

− 1√2

)∗

(e)

(1+i2

1+i2

1−i2

−1+i2

)(√6 00 0

)( 2√6

1−i√6

1+i√6

− 2√6

)∗

4. (a) WP =

(1√2

1√2

1√2

− 1√2

)( √8+

√2

2−√

8+√

22

−√8+

√2

2

√8+

√2

2

)

5. (a) T †(x, y, z) =(x + y + z

3,y − z

2

)(c) T †(a + b sin x + c cos x) = T−1(a + b sin x + c cos x) =

a

2+

(2b + c) sin x + (−b + 2c) cos x

5

6. (a)1

6

(1 1 −11 1 −1

)(c)

1

5

(1 −2 3 11 3 −2 1

)(e)

1

6

(1 − i 1 + i

1 i

)7. (a) Z1 = N(T)⊥ = R2 and Z2 = R(T) = span{(1, 1, 1), (0, 1,−1)}

(c) Z1 = N(T)⊥ = V and Z2 = R(T) = V

8. (a) No solution1

2

(11

)SECTION 6.81. (a) F (b) F (c) T (d) F (e) T (f) F

(g) F (h) F (i) T (j) F4. (a) Yes (b) No (c) No (d) Yes (e) Yes (f) No

5. (a)

⎛⎝0 2 −22 0 −21 1 0

⎞⎠ (b)

⎛⎜⎜⎝1 0 0 10 0 0 00 0 0 01 0 0 1

⎞⎟⎟⎠ (c)

⎛⎜⎜⎝0 0 0 0

−1 0 −4 00 0 0 0

−2 0 −8 0

⎞⎟⎟⎠

17. (a) and (b)

⎧⎨⎩⎛⎝ 2√

5

− 1√5

⎞⎠ ,

⎛⎝ 1√5

2√5

⎞⎠⎫⎬⎭ (c)

⎧⎪⎪⎪⎨⎪⎪⎪⎩⎛⎜⎜⎜⎝

1√2

0

1√2

⎞⎟⎟⎟⎠ ,

⎛⎜⎜⎝0

1

0

⎞⎟⎟⎠ ,

⎛⎜⎜⎜⎝1√2

0

− 1√2

⎞⎟⎟⎟⎠⎫⎪⎪⎪⎬⎪⎪⎪⎭

18. Same as Exercise 17(c)

22. (a) Q =

(1 −30 1

)and D =

(1 00 −7

)

(b) Q =

(1 − 1

2

1 12

)and D =

(2 0

0 − 12

)

(c) Q =

⎛⎝0 0 10 1 −0.251 0 2

⎞⎠ and D =

⎛⎝−1 0 00 4 00 0 6.75

⎞⎠


SECTION 6.9

7. (Bv)−1 =

⎛⎜⎜⎜⎜⎜⎝1√

1 − v20 0

v√1 − v2

0 1 0 00 0 1 0v√

1 − v20 0

1√1 − v2

⎞⎟⎟⎟⎟⎟⎠SECTION 6.10

1. (a) F (b) T (c) T (d) F (e) F

2. (a)√

18 (c) approximately 2.34

4. (a) ‖A‖ ≈ 84.74, ‖A−1‖ ≈ 17.01, and cond(A) ≈ 1441(b) ‖x − A−1b‖ ≤ ‖A−1‖ · ‖Ax − b‖ ≈ 0.17 and

‖x − A−1b‖‖A−1b‖ ≤ cond(A)

‖b − Ax‖‖b‖ ≈ 14.41

‖b‖

5. 0.001 ≤ ‖x − x‖‖x‖ ≤ 10

6. R

⎛⎝ 1−2

3

⎞⎠ =9

7, ‖B‖ = 2, and cond(B) = 2.

SECTION 6.11

1. (a) F (b) T (c) T (d) F (e) T (f) F (g) F (h) F (i) T (j) F

3. (b)

{t

(√3

1

): t ∈ R

}4. (b)

{t

(10

): t ∈ R

}if φ = 0 and

{t

(cos φ + 1

sin φ

): t ∈ R

}if φ �= 0

7. (c) There are six possibilities:

(1) Any line through the origin if φ = ψ = 0

(2)

⎧⎨⎩t

⎛⎝001

⎞⎠ : t ∈ R

⎫⎬⎭ if φ = 0 and ψ = π

(3)

⎧⎨⎩t

⎛⎝cos ψ + 1− sin ψ

0

⎞⎠ : t ∈ R

⎫⎬⎭ if φ = π and ψ �= π

(4)

⎧⎨⎩t

⎛⎝ 0cos φ − 1

sin φ

⎞⎠ : t ∈ R

⎫⎬⎭ if ψ = π and φ �= π

(5)

⎧⎨⎩t

⎛⎝010

⎞⎠ : t ∈ R

⎫⎬⎭ if φ = ψ = π


(6)

⎧⎨⎩t

⎛⎝sin φ(cos ψ + 1)− sin φ sin ψ

sin ψ(cos φ + 1)

⎞⎠ : t ∈ R

⎫⎬⎭ otherwise

CHAPTER 7

SECTION 7.1

1. (a) T (b) F (c) F (d) T (e) F (f) F (g) T (h) T

2. (a) For λ = 2,

{(−1−1

),

(10

)}J =

(2 10 2

)

(c) For λ = −1,

⎧⎨⎩⎛⎝1

30

⎞⎠⎫⎬⎭ For λ = 2,

⎧⎨⎩⎛⎝1

11

⎞⎠ ,

⎛⎝120

⎞⎠⎫⎬⎭ J =

⎛⎝−1 0 00 2 10 0 2

⎞⎠

3. (a) For λ = 2, {2,−2x, x2} J =

⎛⎝2 1 00 2 10 0 2

⎞⎠

(c) For λ = 1,

{(1 00 0

),

(0 01 0

),

(0 10 0

),

(0 00 1

)}J =

⎛⎜⎜⎝1 1 0 00 1 0 00 0 1 10 0 0 1

⎞⎟⎟⎠SECTION 7.2

1. (a) T (b) T (c) F (d) T (e) T (f) F (g) F (h) T

2. J =

⎛⎝A1 O OO A2 OO O A3

⎞⎠ where A1 =

⎛⎜⎜⎜⎜⎜⎜⎝

2 1 0 0 0 00 2 1 0 0 00 0 2 0 0 00 0 0 2 1 00 0 0 0 2 00 0 0 0 0 2

⎞⎟⎟⎟⎟⎟⎟⎠,

A2 =

⎛⎜⎜⎝4 1 0 00 4 1 00 0 4 00 0 0 4

⎞⎟⎟⎠ and A3 =

(−3 00 −3

)

3. (a) −(t − 2)5(t − 3)2 (b)

λ1 = 2 λ2 = 3• •• ••

• •

(c) λ2 = 3 (d) p1 = 3 and p2 = 1(e) (i) rank(U1) = 3 and rank(U2) = 0

(ii) rank(U21) = 1 and rank(U2

2) = 0(iii) nullity(U1) = 2 and nullity(U2) = 2(iv) nullity(U2

1) = 4 and nullity(U22) = 2


4. (a) J =

⎛⎝1 0 00 2 10 0 2

⎞⎠ and Q =

⎛⎝1 1 12 1 21 −1 0

⎞⎠

(d) J =

⎛⎜⎜⎝0 1 0 00 0 0 00 0 2 00 0 0 2

⎞⎟⎟⎠ and Q =

⎛⎜⎜⎝1 0 1 −11 −1 0 11 −2 0 11 0 1 0

⎞⎟⎟⎠

5. (a) J =

⎛⎜⎜⎝1 1 0 00 1 1 00 0 1 00 0 0 2

⎞⎟⎟⎠ and β = {2et, 2tet, t2et, e2t}

(c) J =

⎛⎜⎜⎝2 1 0 00 2 0 00 0 2 10 0 0 2

⎞⎟⎟⎠ and β = {6x, x3, 2, x2}

(d) J =

⎛⎜⎜⎝2 1 0 00 2 1 00 0 2 00 0 0 4

⎞⎟⎟⎠ and

β =

{(1 00 0

),

(0 11 0

),

(0 −10 2

),

(1 −22 0

)}

24. (a)

⎛⎝xyz

⎞⎠ = e2t

⎡⎣(c1 + c2t)

⎛⎝100

⎞⎠+ c2

⎛⎝010

⎞⎠⎤⎦+ c3e3t

⎛⎝ 11

−1

⎞⎠(b)

⎛⎝xyz

⎞⎠ = e2t

⎡⎣(c1 + c2t + c3t2)

⎛⎝100

⎞⎠+ (c2 + 2c3t)

⎛⎝010

⎞⎠+ 2c3

⎛⎝001

⎞⎠⎤⎦SECTION 7.3

1. (a) F (b) T (c) F (d) F (e) T (f) F (g) F (h) T (i) T

2. (a) (t − 1)(t − 3) (c) (t − 1)2(t − 2) (d) (t − 2)2

3. (a) t2 − 2 (c) (t − 2)2 (d) (t − 1)(t + 1)

4. For (2), (a); for (3), (a) and (d)

5. The operators are T0, I, and all operators having both 0 and 1 as eigenvalues.

SECTION 7.4

1. (a) T (b) F (c) F (d) T (e) T (f) F (g) T

2. (a)

⎛⎝0 0 271 0 −270 1 9

⎞⎠ (b)

(0 −11 −1

)

(c)

(12(−1 + i

√3) 0

0 12(−1 − i

√3)

)(e)

⎛⎜⎜⎝0 −2 0 01 0 0 00 0 0 −30 0 1 0

⎞⎟⎟⎠


3. (a) t2 + 1 and t2 C =

⎛⎜⎜⎝0 −1 0 01 0 0 00 0 0 00 0 0 0

⎞⎟⎟⎠; β = {1, x,−2x + x2,−3x + x3}

(c) t2 − t + 1 C =

⎛⎜⎜⎝0 −1 0 01 1 0 00 0 0 −10 0 1 1

⎞⎟⎟⎠β =

{(1 00 0

),

(0 0

−1 0

),

(0 10 0

),

(0 00 −1

)}


Index

Absolute value of a complex number, 558
Absorbing Markov chain, 304
Absorbing state, 304
Addition
  of matrices, 9
Addition of vectors, 6
Additive function, 78
Additive inverse
  of an element of a field, 553
  of a vector, 12
Adjoint
  of a linear operator, 358–360
  of a linear transformation, 367
  of a matrix, 331, 359–360
  uniqueness, 358
Algebraic multiplicity of an eigenvalue, see Multiplicity of an eigenvalue

Algebraically closed field, 482, 561Alternating n-linear function, 239Angle between two vectors, 202,

335Annihilator

of a subset, 126of a vector, 524, 528

Approximation property of an or-thogonal projection, 399

Area of a parallelogram, 204Associated quadratic form, 389Augmented matrix, 161, 174Auxiliary polynomial, 131, 134, 137–

140Axioms of the special theory of

relativity, 453Axis of rotation, 473

Back substitution, 186Backward pass, 186Basis, 43–49, 60–61, 192–194

cyclic, 526dual, 120

Jordan canonical, 483ordered, 79orthonormal, 341, 346–347, 372rational canonical, 526standard basis for Fn, 43standard basis for Pn(F ), 43standard ordered basis for Fn,

79standard ordered basis for Pn(F ),

79uniqueness of size, 46

Bessel’s inequality, 355Bilinear form, 422–433

diagonalizable, 428diagonalization, 428–435index, 444invariants, 444matrix representation, 424–428product with a scalar, 423rank, 443signature, 444sum, 423symmetric, 428–430, 433–435vector space, 424

Cancellation law for vector addi-tion, 11

Cancellation laws for a field, 554Canonical form

Jordan, 483–516rational, 526–548for a symmetric matrix, 446

Cauchy–Schwarz inequality, 333Cayley–Hamilton theorem

for a linear operator, 317for a matrix, 318, 377

Chain of sets, 59Change of coordinate matrix, 112–

115Characteristic of a field, 23, 41,

42, 430, 449, 555Characteristic polynomial, 373


of a linear operator, 249

of a matrix, 248

Characteristic value, see Eigenvalue

Characteristic vector, see Eigen-vector

Classical adjoint

of an n × n matrix, 231

of a 2 × 2 matrix, 208

Clique, 94, 98

Closed model of a simple econ-omy, 176–178

Closure

under addition, 17

under scalar multiplication, 17

Codomain, 551

Coefficient matrix of a system oflinear equations, 169

Coefficients

Fourier, 119

of a differential equation, 128

of a linear combination, 24, 43

of a polynomial, 9

Cofactor, 210, 232

Cofactor expansion, 210, 215, 232

Column of a matrix, 8

Column operation, 148

Column sum of matrices, 295

Column vector, 8

Companion matrix, 526

Complex number, 556–561

absolute value, 558

conjugate, 557

fundamental theorem of alge-bra, 482, 560

imaginary part, 556

real part, 556

Composition

of functions, 552

of linear transformations, 86–89

Condition number, 469

Conditioning of a system of linearequations, 464

Congruent matrices, 426, 445, 451

Conic sections, 388–392

Conjugate linear property, 333

Conjugate of a complex number,557

Conjugate transpose of a matrix,331, 359–360

Consistent system of linear equa-tions, 169

Consumption matrix, 177Convergence of matrices, 284–288Coordinate function, 119–120Coordinate system

left-handed, 203right-handed, 202

Coordinate vector, 80, 91, 110–111

Corresponding homogeneous sys-tem of linear equations, 172

Coset, 23, 109Cramer’s rule, 224Critical point, 439Cullen, Charles G., 470Cycle of generalized eigenvectors,

488–491end vector, 488initial vector, 488length, 488

Cyclic basis, 526Cyclic subspace, 313–317

Degree of a polynomial, 10Determinant, 199–243

area of a parallelogram, 204characterization of, 242cofactor expansion, 210, 215, 232Cramer’s rule, 224of an identity matrix, 212of an invertible matrix, 223of a linear operator, 258, 474,

476–477of a matrix transpose, 224of an n × n matrix, 210, 232n-dimensional volume, 226properties of, 234–236of a square matrix, 367, 394of a 2 × 2 matrix, 200uniqueness of, 242of an upper triangular matrix,

218


volume of a parallelepiped, 226Wronskian, 232

Diagonal entries of a matrix, 8Diagonal matrix, 18, 97Diagonalizable bilinear form, 428Diagonalizable linear operator, 245Diagonalizable matrix, 246Diagonalization

of a bilinear form, 428–435problem, 245simultaneous, 282, 325, 327, 376,

405of a symmetric matrix, 431–433test, 269, 496

Diagonalize, 247Differentiable function, 129Differential equation, 128

auxiliary polynomial, 131, 134,137–140

coefficients, 128homogeneous, 128, 137–140, 523linear, 128nonhomogeneous, 142order, 129solution, 129solution space, 132, 137–140system, 273, 516

Differential operator, 131null space, 134–137order, 131, 135

Dimension, 47–48, 50–51, 103, 119,425

Dimension theorem, 70Direct sum

of matrices, 320–321, 496, 545of subspaces, 22, 58, 98, 275–

279, 318, 355, 366, 394, 398,401, 475–478, 494, 545

Disjoint sets, 550Distance, 340Division algorithm for polynomi-

als, 562Domain, 551Dominance relation, 95–96, 99Dot diagram

of a Jordan canonical form, 498–500

of a rational canonical form, 535–539

Double dual, 120, 123Dual basis, 120Dual space, 119–123

Economics, see Leontief, WassilyEigenspace

generalized, 485–491of a linear operator or matrix,

264Eigenvalue

of a generalized eigenvector, 484of a linear operator or matrix,

246, 371–374, 467–470multiplicity, 263

Eigenvectorgeneralized, 484–491of a linear operator or matrix,

246, 371–374Einstein, Albert, see Special the-

ory of relativityElement, 549Elementary column operation, 148,

153Elementary divisor

of a linear operator, 539of a matrix, 541

Elementary matrix, 149–150, 159Elementary operation, 148Elementary row operation, 148, 153,

217Ellipse, see Conic sectionsEmpty set, 549End vector of a cycle of general-

ized eigenvectors, 488Entry of a matrix, 8Equality

of functions, 9, 551of matrices, 9of n-tuples, 8of polynomials, 10of sets, 549

Equilibrium condition for a sim-ple economy, 177

Equivalence relation, 107, 551congruence, 449, 451


unitary equivalence, 394, 472Equivalent systems of linear equa-

tions, 182–183Euclidean norm of a matrix, 467–

470Euler’s formula, 132Even function, 15, 21, 355Exponential function, 133–140Exponential of a matrix, 312, 515Extremum, see Local extremum

Field, 553–555algebraically closed, 482, 561cancellation laws, 554characteristic, 23, 41, 42, 430,

449, 555of complex numbers, 556–561product of elements, 553of real numbers, 549sum of elements, 553

Field of scalars, 6–7, 47Finite-dimensional vector space, 46–

51Fixed probability vector, 301Forward pass, 186Fourier, Jean Baptiste, 348Fourier coefficients, 119, 348, 400Frobenius inner product, 332Function, 551–552

additive, 78alternating n-linear, 239codomain of, 551composite, 552coordinate function, 119–120differentiable, 129domain of, 551equality of, 9, 551even, 15, 21, 355exponential, 133–140image of, 551imaginary part of, 129inverse, 552invertible, 552linear, see Linear transforma-

tionn-linear, 238–242norm, 339

odd, 21, 355one-to-one, 551onto, 551polynomial, 10, 51–53, 569preimage of, 551range of, 551real part of, 129restriction of, 552sum of, 9vector space, 9

Fundamental theorem of algebra,482, 560

Gaussian elimination, 186–187back substitution, 186backward pass, 186forward pass, 186

General solution of a system oflinear equations, 189

Generalized eigenspace, 485–491Generalized eigenvector, 484–491Generates, 30Generator of a cyclic subspace, 313Geometry, 385, 392, 436, 472–478Gerschgorin’s disk theorem, 296Gram–Schmidt process, 344, 396Gramian matrix, 376

Hardy–Weinberg law, 307Hermitian operator or matrix, see

Self-adjoint linear operatoror matrix

Hessian matrix, 440Homogeneous linear differential equa-

tion, 128, 137–140, 523Homogeneous polynomial of de-

gree two, 433Homogeneous system of linear equa-

tions, 171Hooke’s law, 128, 368Householder operator, 397

Identity elementin C, 557in a field, 553, 554

Identity matrix, 89, 93, 212Identity transformation, 67Ill-conditioned system, 464


Image, see RangeImage of an element, 551Imaginary number, 556Imaginary part

of a complex number, 556of a function, 129

Incidence matrix, 94–96, 98Inconsistent system of linear equa-

tions, 169Index

of a bilinear form, 444of a matrix, 445

Infinite-dimensional vector space,47

Initial probability vector, 292Initial vector of a cycle of gener-

alized eigenvectors, 488Inner product, 329–336

Frobenius, 332on H, 335standard, 330

Inner product spacecomplex, 332H, 332, 343, 348–349, 380, 399real, 332

Input–output matrix, 177Intersection of sets, 550Invariant subspace, 77–78, 313–

315Invariants

of a bilinear form, 444of a matrix, 445

Inverseof a function, 552of a linear transformation, 99–

102, 164–165of a matrix, 100–102, 107, 161–

164Invertible function, 552Invertible linear transformation, 99–

102Invertible matrix, 100–102, 111,

223, 469Irreducible polynomial, 525, 567–

569Isometry, 379Isomorphic vector spaces, 102–105

Isomorphism, 102–105, 123, 425

Jordan block, 483Jordan canonical basis, 483Jordan canonical form

dot diagram, 498–500of a linear operator, 483–516of a matrix, 491uniqueness, 500

Kernel, see Null spaceKronecker delta, 89, 335

Lagrange interpolation formula, 51–53, 125, 402

Lagrange polynomials, 51, 109, 125Least squares approximation, 360–

364Least squares line, 361Left shift operator, 76Left-handed coordinate system, 203Left-multiplication transformation,

92–94Legendre polynomials, 346Length of a cycle of generalized

eigenvectors, 488Length of a vector, see NormLeontief

closed model, 176–178open model, 178–179

Leontief, Wassily, 176Light second, 452Limit of a sequence of matrices,

284–288Linear combination, 24–26, 28–30,

39uniqueness of coefficients, 43

Linear dependence, 36–40Linear differential equation, 128Linear equations, see System of

linear equationsLinear functional, 119Linear independence, 37–40, 59–

61, 342Linear operator, (see also Linear

transformation), 112adjoint, 358–360characteristic polynomial, 249


determinant, 258, 474, 476–477diagonalizable, 245diagonalize, 247differential, 131differentiation, 131, 134–137eigenspace, 264, 401eigenvalue, 246, 371–374eigenvector, 246, 371–374elementary divisor, 539Householder operator, 397invariant subspace, 77–78, 313–

315isometry, 379Jordan canonical form, 483–516left shift, 76Lorentz transformation, 454–461minimal polynomial, 516–521nilpotent, 512normal, 370, 401–403orthogonal, 379–385, 472–478partial isometry, 394, 405positive definite, 377–378positive semidefinite, 377–378projection, 398–403projection on a subspace, 86,

117projection on the x-axis, 66quotient space, 325–326rational canonical form, 526–548reflection, 66, 113, 117, 387, 472–

478right shift, 76rotation, 66, 382, 387, 472–478self-adjoint, 373, 401–403simultaneous diagonalization, 282,

405spectral decomposition, 402spectrum, 402unitary, 379–385, 403

Linear space, see Vector spaceLinear transformation, (see also

Linear operator), 65adjoint, 367composition, 86–89identity, 67image, see Rangeinverse, 99–102, 164–165

invertible, 99–102isomorphism, 102–105, 123, 425kernel, see Null spaceleft-multiplication, 92–94linear functional, 119matrix representation, 80, 88–

92, 347, 359null space, 67–69, 134–137nullity, 69–71one-to-one, 71onto, 71product with a scalar, 82pseudoinverse, 413range, 67–69rank, 69–71, 159restriction, 77–78singular value, 407singular value theorem, 406sum, 82transpose, 121, 126, 127vector space of, 82, 103zero, 67

Local extremum, 439, 450Local maximum, 439, 450Local minimum, 439, 450Lorentz transformation, 454–461Lower triangular matrix, 229

Markov chain, 291, 304Markov process, 291Matrix, 8

addition, 9adjoint, 331, 359–360augmented, 161, 174change of coordinate, 112–115characteristic polynomial, 248classical adjoint, 208, 231coefficient, 169cofactor, 210, 232column of, 8column sum, 295companion, 526condition number, 469congruent, 426, 445, 451conjugate transpose, 331, 359–

360consumption, 177


convergence, 284–288determinant of, 200, 210, 232,

367, 394diagonal, 18, 97diagonal entries of, 8diagonalizable, 246diagonalize, 247direct sum, 320–321, 496, 545eigenspace, 264eigenvalue, 246, 467–470eigenvector, 246elementary, 149–150, 159elementary divisor, 541elementary operations, 148entry, 8equality of, 9Euclidean norm, 467–470exponential of, 312, 515Gramian, 376Hessian, 440identity, 89incidence, 94–96, 98index, 445input–output, 177invariants, 445inverse, 100–102, 107, 161–164invertible, 100–102, 111, 223,

469Jordan block, 483Jordan canonical form, 491limit of, 284–288lower triangular, 229minimal polynomial, 517–521multiplication with a scalar, 9nilpotent, 229, 512norm, 339, 467–470, 515normal, 370orthogonal, 229, 382–385orthogonally equivalent, 384–385permanent of a 2 × 2, 448polar decomposition, 411–413positive definite, 377positive semidefinite, 377product, 87–94product with a scalar, 9pseudoinverse, 414rank, 152–159

rational canonical form, 541reduced row echelon form, 185,

190–191regular, 294representation of a bilinear form,

424–428representation of a linear trans-

formation, 80, 88–92, 347,359

row of, 8row sum, 295scalar, 258self-adjoint, 373, 467signature, 445similarity, 115, 118, 259, 508simultaneous diagonalization, 282singular value, 410singular value decomposition, 410skew-symmetric, 23, 229, 371square, 9stochastic, see Transition ma-

trixsubmatrix, 230sum, 9symmetric, 17, 373, 384, 389,

446trace, 18, 20, 97, 118, 259, 281,

331, 393transition, 288–291, 515transpose, 17, 20, 67, 88, 127,

224, 259transpose of a matrix inverse,

107transpose of a product, 88unitary, 229, 382–385unitary equivalence, 384–385, 394,

472upper triangular, 21, 218, 258,

370, 385, 397Vandermonde, 230vector space, 9, 331, 425zero, 8

Maximal element of a family ofsets, 58

Maximal linearly independent sub-set, 59–61

Maximal principle, 59


Member, see ElementMichelson–Morley experiment, 451Minimal polynomial

of a linear operator, 516–521of a matrix, 517–521uniqueness, 516

Minimal solution to a system oflinear equations, 364–365

Monic polynomial, 567–569Multiplicative inverse of an ele-

ment of a field, 553Multiplicity of an eigenvalue, 263Multiplicity of an elementary di-

visor, 539, 541

n-dimensional volume, 226n-linear function, 238–242n-tuple, 7

equality, 8scalar multiplication, 8sum, 8vector space, 8

Nilpotent linear operator, 512Nilpotent matrix, 229, 512Nonhomogeneous linear differen-

tial equation, 142Nonhomogeneous system of linear

equations, 171Nonnegative vector, 177Norm

Euclidean, 467–470of a function, 339of a matrix, 339, 467–470, 515of a vector, 333–336, 339

Normal equations, 368Normal linear operator or matrix,

370, 401–403Normalizing a vector, 335Null space, 67–69, 134–137Nullity, 69–71Numerical methods

conditioning, 464QR factorization, 396–397

Odd function, 21, 355One-to-one function, 551One-to-one linear transformation,

71

Onto function, 551Onto linear transformation, 71Open model of a simple economy,

178–179Order

of a differential equation, 129of a differential operator, 131,

135Ordered basis, 79Orientation of an ordered basis,

202Orthogonal complement, 349, 352,

398–401Orthogonal equivalence of matrices,

384–385Orthogonal matrix, 229, 382–385Orthogonal operator, 379–385, 472–

478on R2, 387–388

Orthogonal projection, 398–403Orthogonal projection of a vector,

351Orthogonal subset, 335, 342Orthogonal vectors, 335Orthonormal basis, 341, 346–347,

372Orthonormal subset, 335

Parallel vectors, 3Parallelogram

area of, 204law, 2, 337

Parseval’s identity, 355Partial isometry, 394, 405Pendular motion, 143Penrose conditions, 421Periodic motion of a spring, 127,

144Permanent of a 2 × 2 matrix, 448Perpendicular vectors, see Orthog-

onal vectorsPhysics

Hooke’s law, 128, 368pendular motion, 143periodic motion of a spring, 144special theory of relativity, 451–

461


spring constant, 368Polar decomposition of a matrix,

411–413Polar identities, 338Polynomial, 9

annihilator of a vector, 524, 528auxiliary, 131, 134, 137–140characteristic, 373coefficients of, 9degree of a, 10division algorithm, 562equality, 10function, 10, 51–53, 569fundamental theorem of alge-

bra, 482, 560homogeneous of degree two, 433irreducible, 525, 567–569Lagrange, 51, 109, 125Legendre, 346minimal, 516–521monic, 567–569product with a scalar, 10quotient, 563relatively prime, 564remainder, 563splits, 262, 370, 373sum, 10trigonometric, 399unique factorization theorem, 568vector space, 10zero, 9zero of a, 62, 134, 560, 564

Positive definite matrix, 377Positive definite operator, 377–378Positive semidefinite matrix, 377Positive semidefinite operator, 377–

378Positive vector, 177Power set, 59Preimage of an element, 551Primary decomposition theorem,

545Principal axis theorem, 390Probability, see Markov chainProbability vector, 289

fixed, 301initial, 292

Productof a bilinear form and a scalar,

423of complex numbers, 556of elements of a field, 553of a linear transformation and

scalar, 82of matrices, 87–94of a matrix and a scalar, 9of a vector and a scalar, 7

Projectionon a subspace, 76, 86, 98, 117,

398–403on the x-axis, 66orthogonal, 398–403

Proper subset, 549Proper value, see EigenvalueProper vector, see EigenvectorPseudoinverse

of a linear transformation, 413of a matrix, 414

Pythagorean theorem, 337

QR factorization, 396–397Quadratic form, 389, 433–439Quotient of polynomials, 563Quotient space, 23, 58, 79, 109,

325–326

Range, 67–69, 551Rank

of a bilinear form, 443of a linear transformation, 69–

71, 159of a matrix, 152–159

Rational canonical basis, 526Rational canonical form

dot diagram, 535–539elementary divisor, 539, 541of a linear operator, 526–548of a matrix, 541uniqueness, 539

Rayleigh quotient, 467Real part

of a complex number, 556of a function, 129

Reduced row echelon form of amatrix, 185, 190–191


Reflection, 66, 117, 472–478of R2, 113, 382–383, 387, 388

Regular transition matrix, 294Relation on a set, 551Relative change in a vector, 465Relatively prime polynomials, 564Remainder, 563Replacement theorem, 45–46Representation of a linear trans-

formation by a matrix, 80Resolution of the identity opera-

tor, 402Restriction

of a function, 552of a linear operator on a sub-

space, 77–78Right shift operator, 76Right-handed coordinate system,

202Rigid motion, 385–387

in the plane, 388Rotation, 66, 382, 387, 472–478Row of a matrix, 8Row operation, 148Row sum of matrices, 295Row vector, 8Rudin, Walter, 560

Saddle point, 440Scalar, 7Scalar matrix, 258Scalar multiplication, 6Schur’s theorem

for a linear operator, 370for a matrix, 385

Second derivative test, 439–443,450

Self-adjoint linear operator or ma-trix, 373, 401–403, 467

Sequence, 11Set, 549–551

chain, 59disjoint, 550element of a, 549empty, 549equality of, 549equivalence relation, 107, 394,

449, 451

equivalence relation on a, 551intersection, 550linearly dependent, 36–40linearly independent, 37–40orthogonal, 335, 342orthonormal, 335power, 59proper subset, 549relation on a, 551subset, 549union, 549

Signatureof a bilinear form, 444of a matrix, 445

Similar matrices, 115, 118, 259,508

Simpson’s rule, 126Simultaneous diagonalization, 282,

325, 327, 376, 405Singular value

of a linear transformation, 407of a matrix, 410

Singular value decomposition of amatrix, 410

Singular value decomposition the-orem for matrices, 410

Singular value theorem for lineartransformations, 406

Skew-symmetric matrix, 23, 229,371

Solutionof a differential equation, 129minimal, 364–365to a system of linear equations,

169Solution set of a system of linear

equations, 169, 182Solution space of a homogeneous

differential equation, 132, 137–140

Space–time coordinates, 453Span, 30, 34, 343Special theory of relativity, 451–

461axioms, 453Lorentz transformation, 454–461space–time coordinates, 453


time contraction, 459–461Spectral decomposition, 402Spectral theorem, 401Spectrum, 402Splits, 262, 370, 373Spring, periodic motion of, 127,

144Spring constant, 368Square matrix, 9Square root of a unitary operator,

393Standard basis

for Fn, 43for Pn(F ), 43

Standard inner product on Fn, 330Standard ordered basis

for Fn, 79for Pn(F ), 79

Standard representation of a vec-tor space, 104–105

Statesabsorbing, 304of a transition matrix, 288

Stationary vector, see Fixed prob-ability vector

Statistics, see Least squares ap-proximation

Stochastic matrix, see Transitionmatrix

Stochastic process, 291Submatrix, 230Subset, 549

linearly dependent, 36–40linearly independent, 59–61maximal linearly independent,

59–61orthogonal, 335, 342orthogonal complement of a, 349,

352, 398–401orthonormal, 335span of a, 30, 34, 343sum, 22

Subspace, 16–19, 50–51cyclic, 313–317dimension of a, 50–51direct sum, 22, 58, 98, 275–279,

318, 355, 366, 394, 398, 401,

475–478, 494, 545generated by a set, 30invariant, 77–78sum, 275zero, 16

Sumof bilinear forms, 423of complex numbers, 556of elements of a field, 553of functions, 9of linear transformations, 82of matrices, 9of n-tuples, 8of polynomials, 10of subsets, 22of vectors, 7

Sum of subspaces, (see also Directsum, of subspaces), 275

Sylvester’s law of inertiafor a bilinear form, 443for a matrix, 445

Symmetric bilinear form, 428–430,433–435

Symmetric matrix, 17, 373, 384,389, 446

System of differential equations,273, 516

System of linear equations, 25–30,169

augmented matrix, 174coefficient matrix, 169consistent, 169corresponding homogeneous sys-

tem, 172equivalent, 182–183Gaussian elimination, 186–187general solution, 189homogeneous, 171ill-conditioned, 464inconsistent, 169minimal solution, 364–365nonhomogeneous, 171solution to, 169well-conditioned, 464

T-annihilator, 524, 528T-cyclic basis, 526


T-cyclic subspace, 313–317T-invariant subspace, 77–78, 313–

315Taylor’s theorem, 441Test for diagonalizability, 496Time contraction, 459–461Trace of a matrix, 18, 20, 97, 118,

259, 281, 331, 393Transition matrix, 288–291, 515

regular, 294states, 288

Translation, 386Transpose

of an invertible matrix, 107of a linear transformation, 121,

126, 127of a matrix, 17, 20, 67, 88, 127,

224, 259Trapezoidal rule, 126Triangle inequality, 333Trigonometric polynomial, 399Trivial representation of zero vec-

tor, 36–38

Union of sets, 549Unique factorization theorem for

polynomials, 568Uniqueness

of adjoint, 358of coefficients of a linear com-

bination, 43of Jordan canonical form, 500of minimal polynomial, 516of rational canonical form, 539of size of a basis, 46

Unit vector, 335Unitary equivalence of matrices,

384–385, 394, 472Unitary matrix, 229, 382–385Unitary operator, 379–385, 403Upper triangular matrix, 21, 218,

258, 370, 385, 397

Vandermonde matrix, 230Vector, 7

additive inverse of a, 12annihilator of a, 524, 528column, 8

coordinate, 80, 91, 110–111fixed probability, 301Fourier coefficients, 119, 348, 400initial probability, 292linear combination, 24nonnegative, 177norm, 333–336, 339normalizing, 335orthogonal, 335orthogonal projection of a, 351parallel, 3perpendicular, see Orthogonal

vectorspositive, 177probability, 289product with a scalar, 8Rayleigh quotient, 467row, 8sum, 7unit, 335zero, 12, 36–38

Vector space, 6addition, 6basis, 43–49, 192–194of bilinear forms, 424of continuous functions, 18, 67,

119, 331, 345, 356of cosets, 23dimension, 47–48, 103, 119, 425dual, 119–123finite-dimensional, 46–51of functions from a set into a

field, 9, 109, 127infinite-dimensional, 47of infinitely differentiable func-

tions, 130–137, 247, 523isomorphism, 102–105, 123, 425of linear transformations, 82, 103of matrices, 9, 103, 331, 425of n-tuples, 8of polynomials, 10, 86, 109quotient, 23, 58, 79, 109scalar multiplication, 6of sequences, 11, 109, 356, 369subspace, 16–19, 50–51zero, 15zero vector of a, 12


Volume of a parallelepiped, 226

Wade, William R., 439Well-conditioned system, 464Wilkinson, J. H., 397Wronskian, 232

Z2, 16, 42, 429, 553Zero matrix, 8

Zero of a polynomial, 62, 134, 560,564

Zero polynomial, 9Zero subspace, 16Zero transformation, 67Zero vector, 12, 36–38

trivial representation, 36–38Zero vector space, 15