
Chapter 5: The Vector Space ℝⁿ

SECTION 5.1 Subspaces and Spanning

In Section 2.5 we introduced the set ℝⁿ of all n × 1 columns, investigated the linear transformations ℝⁿ → ℝᵐ, and showed that they are all given by left multiplication by an m × n matrix. Particular attention was paid to the euclidean plane ℝ² and to euclidean space ℝ³, where geometric transformations like rotations and reflections were shown to be matrix transformations. We returned to this in Section 4.4 where projections in ℝ² or ℝ³ were also shown to be matrix transformations, and where determinants were related to areas and volumes.

In this chapter we investigate ℝⁿ in full generality, and introduce some of the most important concepts and methods in linear algebra. While the n-tuples in ℝⁿ can be written as rows or as columns, we will primarily denote them as column matrices X, Y, etc. The main exception is that the geometric vectors in ℝ² and ℝ³ will be written as v⃗, w⃗, etc.

Subspaces of ℝⁿ

A set¹ U of vectors in ℝⁿ is called a subspace of ℝⁿ if it satisfies the following properties:

S1. The zero vector 0 is in U.
S2. If X and Y are in U, then X + Y is also in U.
S3. If X is in U, then aX is in U for every real number a.

We say that the subset U is closed under addition if S2 holds, and that U is closed under scalar multiplication if S3 holds.

Clearly ℝⁿ is a subspace of itself. The set U = {0}, consisting of only the zero vector, is also a subspace because 0 + 0 = 0 and a0 = 0 for each a in ℝ; it is called the zero subspace. Any subspace of ℝⁿ other than {0} or ℝⁿ is called a proper subspace.

¹ We use the language of sets. Informally, a set X is a collection of objects, called the elements of the set. Two sets X and Y are called equal (written X = Y) if they have the same elements. If every element of X is in the set Y, we say that X is a subset of Y, and write X ⊆ Y. Hence both X ⊆ Y and Y ⊆ X if and only if X = Y.
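Conditions S1–S3 are easy to spot-check by machine. Here is a minimal NumPy sketch (an illustration only; the normal vector and test vectors are arbitrary choices, not from the text) for a set of the form U = {X in ℝ³ | n⃗ · X = 0}:

    import numpy as np

    n = np.array([1.0, -2.0, 3.0])            # normal vector, arbitrary choice
    in_U = lambda X: np.isclose(n @ X, 0.0)   # U = {X in R^3 | n . X = 0}

    X = np.array([2.0, 1.0, 0.0])             # n . X = 0, so X is in U
    Y = np.array([3.0, 3.0, 1.0])             # n . Y = 0, so Y is in U

    print(in_U(np.zeros(3)))                  # S1: True
    print(in_U(X + Y))                        # S2: True
    print(in_U(5.0 * X))                      # S3: True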



We saw in Section 4.2 that every plane M through the origin in ℝ³ has equation ax + by + cz = 0 where a, b, and c are not all zero. Here n⃗ = [a b c]ᵀ is a normal to the plane and

M = {v⃗ in ℝ³ | n⃗ · v⃗ = 0}

where v⃗ = [x y z]ᵀ and n⃗ · v⃗ denotes the dot product introduced in Section 4.2.²

Then M is a subspace of ℝ³. Indeed we show that M satisfies S1, S2, and S3 as follows:

S1. 0⃗ is in M because n⃗ · 0⃗ = 0;
S2. If v⃗ and v⃗₁ are in M, then n⃗ · (v⃗ + v⃗₁) = n⃗ · v⃗ + n⃗ · v⃗₁ = 0 + 0 = 0, so v⃗ + v⃗₁ is in M;
S3. If v⃗ is in M, then n⃗ · (av⃗) = a(n⃗ · v⃗) = a(0) = 0, so av⃗ is in M.

This proves the first part of

Example 1
Planes and lines through the origin in ℝ³ are all subspaces of ℝ³.

Solution
We dealt with planes above. If L is a line through the origin with direction vector d⃗, then L = {td⃗ | t in ℝ}. We leave it as an exercise to verify that L satisfies S1, S2, and S3.

Example 1 shows that lines through the origin in ℝ² are subspaces; in fact, they are the only proper subspaces of ℝ² (Exercise 24). Indeed, we shall see in Example 11 §5.2 that lines and planes through the origin in ℝ³ are the only proper subspaces of ℝ³. Thus the geometry of lines and planes through the origin is captured by the subspace concept. (Note that every line or plane is just a translation of one of these.)

Subspaces can also be used to describe important features of an m × n matrix A. The null space of A, denoted null A, and the image space of A, denoted im A, are defined by

null A = {X in ℝⁿ | AX = 0}  and  im A = {AX | X in ℝⁿ}.

In the language of Chapter 2, null A consists of all solutions X in ℝⁿ of the homogeneous system AX = 0, and im A is the set of all vectors Y in ℝᵐ such that AX = Y has a solution for some X.

Note that X is in null A if it satisfies the condition AX = 0, while im A consists of vectors of the form AX for some X in ℝⁿ. These two ways to describe subsets occur frequently.

Example 2
If A is an m × n matrix, then:
1. null A is a subspace of ℝⁿ.
2. im A is a subspace of ℝᵐ.

Solution
1. The zero vector 0 in ℝⁿ lies in null A because A0 = 0.³ If X and X₁ are in null A, then X + X₁ and aX are in null A because they satisfy the required condition:

A(X + X₁) = AX + AX₁ = 0 + 0 = 0  and  A(aX) = a(AX) = a0 = 0.

² We are using set notation here. In general {q | p} means the set of all objects q with property p.
³ We are using 0 to represent the zero vector in both ℝᵐ and ℝⁿ. This abuse of notation is common, and causes no confusion once everyone knows that it is going on.


Hence null A satisfies S1, S2, and S3, and so is a subspace of ℝⁿ.

2. The zero vector 0 in ℝᵐ lies in im A because 0 = A0. Suppose that Y and Y₁ are in im A, say Y = AX and Y₁ = AX₁ where X and X₁ are in ℝⁿ. Then

Y + Y₁ = AX + AX₁ = A(X + X₁)  and  aY = a(AX) = A(aX)

show that Y + Y₁ and aY are both in im A (they have the required form). Hence im A is a subspace of ℝᵐ.

There are other important subspaces associated with a matrix A that clarify basic properties of A. If A is an n × n matrix and λ is any number, let

Eλ(A) = {X in ℝⁿ | AX = λX}.

A vector X is in Eλ(A) if and only if (λI − A)X = 0, so Example 2 gives:

Example 3
Eλ(A) = null(λI − A) is a subspace of ℝⁿ for each n × n matrix A and number λ.

Eλ(A) is called the eigenspace of A corresponding to λ. The reason for the name is that, in the terminology of Section 3.3, λ is an eigenvalue of A if Eλ(A) ≠ {0}. In this case the nonzero vectors in Eλ(A) are called the eigenvectors of A corresponding to λ.

The reader should not get the impression that every subset of ℝⁿ is a subspace. For example:

U₁ = {[x y]ᵀ | x ≥ 0} satisfies S1 and S2, but not S3;
U₂ = {[x y]ᵀ | x² = y²} satisfies S1 and S3, but not S2.

Hence neither U₁ nor U₂ is a subspace of ℝ². (However, see Exercise 20.)
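Example 3 is also easy to check by machine. The sketch below (illustrative only; the 2 × 2 matrix and the eigenvalue λ = 2 are arbitrary choices, not from the text) computes Eλ(A) as null(λI − A) with SymPy and confirms AX = λX:

    import sympy as sp

    A = sp.Matrix([[3, -1],
                   [1,  1]])          # arbitrary example; 2 is an eigenvalue
    lam = 2

    # E_lam(A) = null(lam*I - A): basic solutions of (lam*I - A)X = 0
    E = (lam * sp.eye(2) - A).nullspace()
    print(E)                          # one basic solution, here [1, 1]^T

    for X in E:
        print(A * X == lam * X)       # AX = lam X, so X is an eigenvector: True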

Spanning Sets

Let v⃗ and w⃗ be two nonzero, nonparallel vectors in ℝ³ with their tails at the origin. The plane M through the origin containing these vectors is described in Section 4.2 by saying that n⃗ = v⃗ × w⃗ is a normal for M, and that M consists of all vectors p⃗ such that n⃗ · p⃗ = 0.⁴ While this is a very useful way to look at planes, there is another approach that is at least as useful in ℝ³ and, more importantly, works for all subspaces of ℝⁿ.

The idea is as follows: Observe that, by the diagram, a vector p⃗ is in M if and only if it has the form

p⃗ = av⃗ + bw⃗

for certain real numbers a and b (we say that p⃗ is a linear combination of v⃗ and w⃗).

[Figure: the plane M through 0⃗ containing v⃗ and w⃗, with p⃗ = av⃗ + bw⃗.]

⁴ The vector n⃗ = v⃗ × w⃗ is nonzero because v⃗ and w⃗ are not parallel.


Hence we can describe M as

M = {av⃗ + bw⃗ | a and b in ℝ}⁵

and we say that {v⃗, w⃗} is a spanning set for M. It is this notion of a spanning set that provides a way to describe all subspaces of ℝⁿ.

Given vectors X₁, X₂, …, Xₖ in ℝⁿ, a vector of the form

t₁X₁ + t₂X₂ + … + tₖXₖ  where the tᵢ are scalars

is called a linear combination of the Xᵢ, and tᵢ is called the coefficient of Xᵢ in the linear combination. The set of all such linear combinations is called the span of the Xᵢ and is denoted

span{X₁, X₂, …, Xₖ} = {t₁X₁ + t₂X₂ + … + tₖXₖ | tᵢ in ℝ}.

Thus span{X, Y} = {sX + tY | s, t in ℝ}, and span{X} = {tX | t in ℝ}.

In particular, the above discussion shows that, if v⃗ and w⃗ are two nonzero, nonparallel vectors in ℝ³, then

M = span{v⃗, w⃗}

is the plane in ℝ³ containing v⃗ and w⃗. Moreover, if d⃗ is any nonzero vector in ℝ³ (or ℝ²), then

L = span{d⃗} = {td⃗ | t in ℝ}

is the line with direction vector d⃗. Hence lines and planes can both be described in terms of spanning sets.

Example 4
Let X = [2 −1 2 1]ᵀ and Y = [3 4 −1 1]ᵀ in ℝ⁴. Determine whether P = [0 −11 8 1]ᵀ or Q = [2 3 1 2]ᵀ is in U = span{X, Y}.

Solution
The vector P is in U if and only if P = sX + tY for scalars s and t. Equating components gives equations

2s + 3t = 0,  −s + 4t = −11,  2s − t = 8,  and  s + t = 1.

This linear system has solution s = 3 and t = −2, so P is in U. On the other hand, Q = sX + tY leads to equations

2s + 3t = 2,  −s + 4t = 3,  2s − t = 1,  and  s + t = 2,

and this system has no solution. So Q does not lie in U.

⁵ In particular, this implies that any vector p⃗ orthogonal to v⃗ × w⃗ must be a linear combination p⃗ = av⃗ + bw⃗ of v⃗ and w⃗ for some a and b. Can you prove this directly?
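Computationally, the membership test of Example 4 asks whether the system with coefficient matrix [X Y] is consistent. A small NumPy sketch (illustrative only; it reuses the vectors of Example 4, and the helper in_span is a name introduced here) checks this with a least-squares solve:

    import numpy as np

    X = np.array([2, -1, 2, 1], dtype=float)
    Y = np.array([3, 4, -1, 1], dtype=float)
    A = np.column_stack([X, Y])            # 4 x 2 coefficient matrix

    def in_span(P):
        # Least-squares solution of A c = P; P is in span{X, Y}
        # exactly when the residual A c - P is zero.
        c, *_ = np.linalg.lstsq(A, P, rcond=None)
        return np.allclose(A @ c, P), c

    print(in_span(np.array([0, -11, 8, 1], dtype=float)))  # (True, [ 3., -2.])
    print(in_span(np.array([2, 3, 1, 2], dtype=float)))    # (False, ...)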

Theorem 1

Let U = span{X₁, X₂, …, Xₖ} in ℝⁿ. Then:
1. U is a subspace of ℝⁿ containing each Xᵢ.
2. If W is a subspace of ℝⁿ and each Xᵢ is in W, then U ⊆ W.



PROOF

Write U = span{X₁, X₂, …, Xₖ} for convenience.

1. The zero vector 0 is in U because 0 = 0X₁ + 0X₂ + … + 0Xₖ is a linear combination of the Xᵢ. If X = t₁X₁ + t₂X₂ + … + tₖXₖ and Y = s₁X₁ + s₂X₂ + … + sₖXₖ are in U, then X + Y and aX are in U because

X + Y = (s₁ + t₁)X₁ + (s₂ + t₂)X₂ + … + (sₖ + tₖ)Xₖ  and
aX = (at₁)X₁ + (at₂)X₂ + … + (atₖ)Xₖ.

Hence S1, S2, and S3 are satisfied for U, proving (1).

2. Let X = t₁X₁ + t₂X₂ + … + tₖXₖ where the tᵢ are scalars and each Xᵢ is in W. Then each tᵢXᵢ is in W because W satisfies S3. But then X is in W because W satisfies S2 (verify). This proves (2).

Condition (2) in Theorem 1 can be expressed by saying that span{X₁, X₂, …, Xₖ} is the smallest subspace of ℝⁿ that contains each Xᵢ. Here is an example of how it is used.

If U = span{X₁, X₂, …, Xₖ}, we say that the vectors X₁, X₂, …, Xₖ span the subspace U.

Example 5
If X and Y are in ℝⁿ, show that span{X, Y} = span{X + Y, X − Y}.

Solution
Since both X + Y and X − Y are in span{X, Y}, Theorem 1 gives

span{X + Y, X − Y} ⊆ span{X, Y}.

But X = ½(X + Y) + ½(X − Y) and Y = ½(X + Y) − ½(X − Y) are both in span{X + Y, X − Y}, so

span{X, Y} ⊆ span{X + Y, X − Y}

again by Theorem 1. Thus span{X, Y} = span{X + Y, X − Y}.

It turns out that many important subspaces are best described by giving a spanning set. Here are three examples, beginning with an important spanning set for ℝⁿ itself. Column j of the n × n identity matrix Iₙ is denoted Eⱼ and called the jth coordinate vector in ℝⁿ, and the set {E₁, E₂, …, Eₙ} is called the standard basis of ℝⁿ. If X = [x₁ x₂ … xₙ]ᵀ is any vector in ℝⁿ, then

X = x₁E₁ + x₂E₂ + … + xₙEₙ

as the reader can verify. This proves:

Example 6
ℝⁿ = span{E₁, E₂, …, Eₙ}.

If A is an m × n matrix, the next two examples show that it is a routine matter to find spanning sets for null A and im A.

Example 7
Given an m × n matrix A, let X₁, X₂, …, Xₖ denote the basic solutions to the system AX = 0 given by the gaussian algorithm. Then

null A = span{X₁, X₂, …, Xₖ}.


Solution
If X is in null A, then AX = 0 so Theorem 3 §2.2 shows that X is a linear combination of the basic solutions; that is, null A ⊆ span{X₁, X₂, …, Xₖ}. On the other hand, if X is in span{X₁, X₂, …, Xₖ}, then X = t₁X₁ + t₂X₂ + … + tₖXₖ for scalars tᵢ, so

AX = t₁(AX₁) + t₂(AX₂) + … + tₖ(AXₖ) = t₁0 + t₂0 + … + tₖ0 = 0.

This shows that X is in null A, and hence that span{X₁, X₂, …, Xₖ} ⊆ null A. Thus we have equality.
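In practice the basic solutions can be read off from the reduced row-echelon form; SymPy's nullspace does exactly this (an illustrative sketch with an arbitrary matrix):

    import sympy as sp

    A = sp.Matrix([[1, 2, 2, -1],
                   [3, 6, 5,  0]])    # arbitrary 2 x 4 example

    basic = A.nullspace()             # basic solutions of AX = 0
    for X in basic:
        print(X.T, (A * X).T)         # each AX is the zero vector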

Example 8
Let C₁, C₂, …, Cₙ denote the columns of the m × n matrix A. Then

im A = span{C₁, C₂, …, Cₙ}.

Solution
Observe first that AEⱼ = Cⱼ for each j where Eⱼ is the jth coordinate vector in ℝⁿ. Hence each Cⱼ is in im A, and so span{C₁, C₂, …, Cₙ} ⊆ im A by Theorem 1.

Conversely, let Y be a vector in im A, say Y = AX for some X in ℝⁿ. If X = [x₁ x₂ … xₙ]ᵀ, then Theorem 4 §2.2 gives

Y = AX = x₁C₁ + x₂C₂ + … + xₙCₙ,

so Y is in span{C₁, C₂, …, Cₙ}. Hence im A ⊆ span{C₁, C₂, …, Cₙ}, and the result is proved.
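The image space can be explored the same way (an illustrative sketch, again with an arbitrary matrix, using SymPy's columnspace to extract independent columns that span im A):

    import sympy as sp

    A = sp.Matrix([[1, 2, 2, -1],
                   [3, 6, 5,  0]])

    cols = A.columnspace()            # independent columns spanning im A
    print(cols)                       # here columns 1 and 3 of A

    # any AX lies in the span of the columns, as in Example 8
    X = sp.Matrix([1, 1, 1, 1])
    print((A * X).T)                  # [4, 14]: a combination of the columns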

Exercises 5.1


1. In each case determine whether U is a subspace of ℝ³. Support your answer.
(a) U = {[1 s t]ᵀ | s and t in ℝ}.
♦(b) U = {[0 s t]ᵀ | s and t in ℝ}.
(c) U = {[r s t]ᵀ | r, s, and t in ℝ, −r + 3s + 2t = 0}.
♦(d) U = {[r 3s r − 2]ᵀ | r and s in ℝ}.
(e) U = {[r 0 s]ᵀ | r² + s² = 0, r and s in ℝ}.
♦(f) U = {[2r −s² t]ᵀ | r, s, and t in ℝ}.

2. In each case determine if X lies in U = span{Y, Z}. If X is in U, write it as a linear combination of Y and Z; if X is not in U, show why not.
(a) X = [2 −1 0 1]ᵀ, Y = [1 0 0 1]ᵀ, and Z = [0 1 0 1]ᵀ.
♦(b) X = [1 2 15 11]ᵀ, Y = [2 −1 0 2]ᵀ, and Z = [1 −1 −3 1]ᵀ.
(c) X = [8 3 −13 20]ᵀ, Y = [2 1 −3 5]ᵀ, and Z = [−1 0 2 −3]ᵀ.
♦(d) X = [2 5 8 3]ᵀ, Y = [2 −1 0 5]ᵀ, and Z = [−1 2 2 −3]ᵀ.

3. In each case determine if the given vectors span ℝ⁴. Support your answer.
(a) {[1 1 1 1]ᵀ, [0 1 1 1]ᵀ, [0 0 1 1]ᵀ, [0 0 0 1]ᵀ}.
♦(b) {[1 3 −5 0]ᵀ, [−2 1 0 0]ᵀ, [0 2 1 −1]ᵀ, [1 −4 5 0]ᵀ}.

4. Is it possible that {[1 2 0]ᵀ, [2 0 3]ᵀ} can span the subspace U = {[r s 0]ᵀ | r and s in ℝ}? Defend your answer.

5. Give a spanning set for the zero subspace {0} of ℝⁿ.

6. Is ℝ² a subspace of ℝ³? Defend your answer.

7. If U = span{X, Y, Z} in ℝⁿ, show that U = span{X + tZ, Y, Z} for every t in ℝ.

8. If U = span{X, Y, Z} in ℝⁿ, show that U = span{X + Y, Y + Z, Z + X}.

9. If a ≠ 0 is a scalar, show that span{aX} = span{X} for every vector X in ℝⁿ.


♦10. If a₁, a₂, …, aₖ are nonzero scalars, show that span{a₁X₁, a₂X₂, …, aₖXₖ} = span{X₁, X₂, …, Xₖ} for any vectors Xᵢ in ℝⁿ.

11. If X ≠ 0 in ℝⁿ, determine all subspaces of span{X}.

12. Suppose that U = span{X₁, X₂, …, Xₖ} where each Xᵢ is in ℝⁿ. If A is an m × n matrix and AXᵢ = 0 for each i, show that AY = 0 for every vector Y in U.

13. If A is an m × n matrix, show that, for each invertible m × m matrix U, null(A) = null(UA).

14. If A is an m × n matrix, show that, for each invertible n × n matrix V, im(A) = im(AV).

15. Let U be a subspace of ℝⁿ, and let X be a vector in ℝⁿ.
(a) If aX is in U where a ≠ 0 is a number, show that X is in U.
♦(b) If Y and X + Y are in U where Y is a vector in ℝⁿ, show that X is in U.

16. In each case either show that the statement is true or give an example showing that it is false.
(a) If U ≠ ℝⁿ is a subspace of ℝⁿ and X + Y is in U, then X and Y are both in U.
♦(b) If U is a subspace of ℝⁿ and rX is in U for all r in ℝ, then X is in U.
(c) If U is a subspace of ℝⁿ and X is in U, then −X is also in U.
♦(d) If X is in U and U = span{Y, Z}, then U = span{X, Y, Z}.
(e) The empty set of vectors in ℝⁿ is a subspace of ℝⁿ.

17. (a) If A and B are m × n matrices, show that U = {X in ℝⁿ | AX = BX} is a subspace of ℝⁿ.
(b) What if A is m × n, B is k × n, and m ≠ k?

18. Suppose that X₁, X₂, …, Xₖ are vectors in ℝⁿ. If Y = a₁X₁ + a₂X₂ + … + aₖXₖ where a₁ ≠ 0, show that span{X₁, X₂, …, Xₖ} = span{Y, X₂, …, Xₖ}.

19. If U ≠ {0} is a subspace of ℝ, show that U = ℝ.

♦20. Let U be a nonempty subset of ℝⁿ. Show that U is a subspace if and only if S2 and S3 hold.

21. If S and T are nonempty sets of vectors in ℝⁿ, and if S ⊆ T, show that span{S} ⊆ span{T}.

22. Let U and W be subspaces of ℝⁿ. Define their intersection U ∩ W and their sum U + W as follows:
U ∩ W = {X in ℝⁿ | X belongs to both U and W}.
U + W = {X in ℝⁿ | X is a sum of a vector in U and a vector in W}.
(a) Show that U ∩ W is a subspace of ℝⁿ.
♦(b) Show that U + W is a subspace of ℝⁿ.

23. Let P denote an invertible n × n matrix. If λ is a number, show that Eλ(PAP⁻¹) = {PX | X is in Eλ(A)} for each n × n matrix A. [Here Eλ(A) is the set of eigenvectors of A.]

24. Show that every proper subspace U of ℝ² is a line through the origin. [Hint: If d⃗ is a nonzero vector in U, let L = ℝd⃗ = {rd⃗ | r in ℝ} denote the line with direction vector d⃗. If u⃗ is in U but not in L, argue geometrically that every vector v⃗ in ℝ² is a linear combination of u⃗ and d⃗.]


SECTION 5.2 Independence and Dimension

Some spanning sets are better than others. If U = span{X₁, X₂, …, Xₖ} is a subspace of ℝⁿ, then every vector in U can be written as a linear combination of the Xᵢ in at least one way. Our interest here is in spanning sets where each vector in U has exactly one representation as a linear combination of these vectors.

Linear Independence

Suppose that two linear combinations are equal in ℝⁿ:

r₁X₁ + r₂X₂ + … + rₖXₖ = s₁X₁ + s₂X₂ + … + sₖXₖ.


We are looking for a condition on the set {X₁, X₂, …, Xₖ} of vectors that guarantees that this representation is unique; that is, rᵢ = sᵢ for each i. Taking all terms to the left side gives

(r₁ − s₁)X₁ + (r₂ − s₂)X₂ + … + (rₖ − sₖ)Xₖ = 0,

so the required condition is that this equation forces all the coefficients rᵢ − sᵢ to be zero. With this in mind, we call a set {X₁, X₂, …, Xₖ} of vectors linearly independent (or simply independent) if it satisfies the following condition:

If t₁X₁ + t₂X₂ + … + tₖXₖ = 0  then  t₁ = t₂ = … = tₖ = 0.

We record the result of the above discussion for reference.

Theorem 1

If {X₁, X₂, …, Xₖ} is an independent set of vectors in ℝⁿ, then every vector in span{X₁, X₂, …, Xₖ} has a unique representation as a linear combination of the Xᵢ.

It is useful to state the definition of independence in different language. Let us say that a linear combination vanishes if it equals the zero vector, and call a linear combination trivial if every coefficient is zero. Then the definition of independence can be compactly stated as follows:

A set of vectors is independent if and only if the only linear combination that vanishes is the trivial one.

Hence the procedure for checking that a set of vectors is independent is:

Independence Test

To verify that a set {X₁, X₂, …, Xₖ} of vectors in ℝⁿ is independent, proceed as follows:
1. Set a linear combination equal to zero: t₁X₁ + t₂X₂ + … + tₖXₖ = 0.
2. Show that tᵢ = 0 for each i (that is, the linear combination is trivial).

Of course, if some nontrivial linear combination vanishes, the vectors are not independent.

Example 1
Determine whether {[1 0 2 5]ᵀ, [2 1 0 −1]ᵀ, [1 1 2 1]ᵀ} is independent in ℝ⁴.

Solution
Suppose a linear combination vanishes:

r[1 0 2 5]ᵀ + s[2 1 0 −1]ᵀ + t[1 1 2 1]ᵀ = [0 0 0 0]ᵀ.

Equating corresponding entries gives a system of four equations:

r + 2s + t = 0,  s + t = 0,  2r + 2t = 0,  and  5r − s + t = 0.

The only solution is the trivial one r = s = t = 0 (verify), so these vectors are independent by the independence test.
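A quick machine check of independence (an illustrative NumPy sketch using the vectors of Example 1): k vectors are independent exactly when the matrix having them as columns has rank k.

    import numpy as np

    V = np.column_stack([[1, 0, 2, 5],
                         [2, 1, 0, -1],
                         [1, 1, 2, 1]])          # vectors as columns, 4 x 3

    print(np.linalg.matrix_rank(V) == V.shape[1])   # True: independent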


Example 2
Show that the standard basis {E₁, E₂, …, Eₙ} of ℝⁿ is independent.

Solution
We have t₁E₁ + t₂E₂ + … + tₙEₙ = [t₁ t₂ … tₙ]ᵀ for all scalars tᵢ, so the linear combination vanishes if and only if each tᵢ = 0. Hence the independence test applies.

Example 3
If {X, Y} is independent, show that {2X + 3Y, X − 5Y} is also independent.

Solution
If s(2X + 3Y) + t(X − 5Y) = 0, collect terms to get (2s + t)X + (3s − 5t)Y = 0. Since {X, Y} is independent this combination must be trivial; that is, 2s + t = 0 and 3s − 5t = 0. These equations have only the trivial solution s = t = 0, as required.

Example 4
Show that the zero vector in ℝⁿ does not belong to any independent set.

Solution
Given a set {0, X₁, X₂, …, Xₖ} of vectors containing 0, we have a vanishing, nontrivial linear combination 1·0 + 0X₁ + 0X₂ + … + 0Xₖ = 0. Hence the set is not independent.

Example 5
Given X in ℝⁿ, show that {X} is independent if and only if X ≠ 0.

Solution
A vanishing linear combination from {X} takes the form tX = 0, t in ℝ. If X ≠ 0 this implies that t = 0, so {X} is independent. Conversely, if X = 0 then {X} is not independent by Example 4.

A set of vectors in ℝⁿ is called linearly dependent (or simply dependent) if it is not linearly independent, equivalently if some nontrivial linear combination vanishes.

Example 6
If v⃗ and w⃗ are nonzero vectors in ℝ³, show that {v⃗, w⃗} is dependent if and only if v⃗ and w⃗ are parallel.

Solution
If v⃗ and w⃗ are parallel, then one is a scalar multiple of the other (Theorem 4 §4.1), say v⃗ = aw⃗ for some scalar a. Then the nontrivial linear combination v⃗ − aw⃗ = 0⃗ vanishes, so {v⃗, w⃗} is dependent. Conversely, if {v⃗, w⃗} is dependent, let sv⃗ + tw⃗ = 0⃗ be nontrivial, say s ≠ 0. Then v⃗ = −(t/s)w⃗, so v⃗ and w⃗ are parallel (by Theorem 4 §4.1). A similar argument works if t ≠ 0.

By Theorem 5 §2.3, the following conditions are equivalent for an n × n matrix A:
1. A is invertible.
2. If AX = 0 where X is in ℝⁿ, then X = 0.
3. AX = B has a solution X for every vector B in ℝⁿ.


While (1) makes no sense if A is not square, conditions (2) and (3) are meaningful for any matrix A and, in fact, are related to independence and spanning. To see how, let C₁, C₂, …, Cₙ denote the columns of an m × n matrix A. If X = [x₁ x₂ … xₙ]ᵀ is a column in ℝⁿ, then

AX = x₁C₁ + x₂C₂ + … + xₙCₙ    (∗)

by Theorem 4 §2.2. With this, we get the following theorem:

Theorem 2

If A is an m × n matrix, let {C₁, C₂, …, Cₙ} denote the columns of A.
1. {C₁, C₂, …, Cₙ} is independent in ℝᵐ if and only if AX = 0, X in ℝⁿ, implies X = 0.
2. ℝᵐ = span{C₁, C₂, …, Cₙ} if and only if AX = B has a solution X for every vector B in ℝᵐ.

PROOF

Write X = [x₁ x₂ … xₙ]ᵀ. Then AX = 0 means x₁C₁ + x₂C₂ + … + xₙCₙ = 0 by (∗), so (1) follows from the definition of independence. Similarly, (∗) shows that a vector B in ℝᵐ satisfies AX = B if and only if B is a linear combination of the columns Cⱼ, so (2) follows from the definition of a spanning set.

For a square matrix A, Theorem 2 characterizes the invertibility of A in terms of the spanning and independence of its columns (see the discussion preceding Theorem 2). It is important to be able to discuss these notions for rows. If X₁, X₂, …, Xₖ are 1 × n rows, we define span{X₁, X₂, …, Xₖ} to be the set of all linear combinations of the Xᵢ (as matrices), and we say that {X₁, X₂, …, Xₖ} is linearly independent if the only vanishing linear combination is the trivial one (that is, if {X₁ᵀ, X₂ᵀ, …, Xₖᵀ} is independent in ℝⁿ, as the reader can verify).⁶

Theorem 3

The following are equivalent for an n × n matrix A:
1. A is invertible.
2. The columns of A are linearly independent.
3. The columns of A span ℝⁿ.
4. The rows of A are linearly independent.
5. The rows of A span the set of all 1 × n rows.

PROOF

Let C₁, C₂, …, Cₙ denote the columns of A.

(1) ⇔ (2). By Theorem 5 §2.3, A is invertible if and only if AX = 0 implies X = 0; this holds if and only if {C₁, C₂, …, Cₙ} is independent by Theorem 2.

(1) ⇔ (3). Again by Theorem 5 §2.3, A is invertible if and only if AX = B has a solution for every column B in ℝⁿ; this holds if and only if span{C₁, C₂, …, Cₙ} = ℝⁿ by Theorem 2.

⁶ It is best to view columns and rows as just two different notations for ordered n-tuples. This discussion will become redundant in Chapter 6 where we define the general notion of a vector space.


(1) ⇔ (4). The matrix A is invertible if and only if Aᵀ is invertible (by the Corollary to Theorem 4 §2.3); this in turn holds if and only if Aᵀ has independent columns (by (1) ⇔ (2)); finally, this last statement holds if and only if A has independent rows (because the rows of A are the transposes of the columns of Aᵀ).

(1) ⇔ (5). The proof is similar to (1) ⇔ (4).
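Numerically, the equivalences of Theorem 3 can be observed by comparing the rank of A with n (an illustrative NumPy sketch; the two matrices are arbitrary choices):

    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0]])   # invertible: det = -2
    B = np.array([[1.0, 2.0], [2.0, 4.0]])   # singular: rows are parallel

    for M in (A, B):
        rank = np.linalg.matrix_rank(M)
        # rank n <=> independent columns <=> columns span R^n <=> invertible
        print(rank, rank == M.shape[0], abs(np.linalg.det(M)) > 1e-12)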

Dimension

It is common geometrical language to say that ℝ³ is 3-dimensional, that planes are 2-dimensional, and that lines are 1-dimensional. The next theorem is a basic tool for clarifying this idea of "dimension". Its importance is difficult to exaggerate.

Theorem 4 Fundamental Theorem

Let U be a subspace of ℝⁿ. If U is spanned by m vectors, and if U contains k linearly independent vectors, then k ≤ m.

We give a proof at the end of this section.

The main use of the fundamental theorem depends on the following concept. If U is a subspace of ℝⁿ, a set {X₁, X₂, …, Xₘ} of vectors in U is called a basis of U if it satisfies the following two conditions:
1. {X₁, X₂, …, Xₘ} is linearly independent.
2. U = span{X₁, X₂, …, Xₘ}.

The most remarkable result about bases⁷ is:

Theorem 5 Invariance Theorem

If {X₁, X₂, …, Xₘ} and {Y₁, Y₂, …, Yₖ} are bases of a subspace U of ℝⁿ, then m = k.

PROOF

We have k ≤ m by the fundamental theorem because {X₁, X₂, …, Xₘ} spans U, and {Y₁, Y₂, …, Yₖ} is independent. Similarly m ≤ k, so m = k.

The invariance theorem guarantees that there is no ambiguity in the following definition: If U is a subspace of ℝⁿ and {X₁, X₂, …, Xₘ} is any basis of U, the number m of vectors in the basis is called the dimension of U, and is denoted

dim U = m.

The importance of the invariance theorem is that the dimension of U can be determined by counting the number of vectors in any basis.⁸ This is very useful as we shall see.

⁷ The plural of "basis" is "bases".
⁸ We will show in Theorem 6 that every subspace of ℝⁿ does indeed have a basis.


Let {E₁, E₂, …, Eₙ} denote the standard basis of ℝⁿ, that is the set of columns of the identity matrix. Then ℝⁿ = span{E₁, E₂, …, Eₙ} by Example 6 §5.1, and {E₁, E₂, …, Eₙ} is independent by Example 2. Hence it is indeed a basis of ℝⁿ in the present terminology, and we have

Example 7
dim(ℝⁿ) = n and {E₁, E₂, …, Eₙ} is a basis.

This agrees with our sense that ℝ² is two-dimensional and ℝ³ is three-dimensional. It also says that ℝ¹ = ℝ is one-dimensional, and {1} is a basis. Returning to subspaces of ℝⁿ, we define

dim{0} = 0.

This amounts to saying {0} has a basis containing no vectors. This makes sense because 0 cannot belong to any independent set (Example 4).

Example 8
Let U = {[r s t s]ᵀ | r, s, and t in ℝ}. Show that U is a subspace of ℝ⁴, find a basis of U, and calculate dim U.

Solution
Clearly, [r s t s]ᵀ = rX₁ + sX₂ + tX₃ where X₁ = [1 0 0 0]ᵀ, X₂ = [0 1 0 1]ᵀ, and X₃ = [0 0 1 0]ᵀ. It follows that U = span{X₁, X₂, X₃}, and hence that U is a subspace of ℝ⁴. Moreover, if a linear combination rX₁ + sX₂ + tX₃ = [r s t s]ᵀ vanishes, it is clear that r = s = t = 0, so {X₁, X₂, X₃} is independent. Hence {X₁, X₂, X₃} is a basis of U and so dim U = 3.

Example 9
Let B = {X₁, X₂, …, Xₙ} be a basis of ℝⁿ. If A is an invertible n × n matrix, then D = {AX₁, AX₂, …, AXₙ} is also a basis of ℝⁿ.

Solution
Let X be a vector in ℝⁿ. Then A⁻¹X is in ℝⁿ so, since B is a basis, we have A⁻¹X = t₁X₁ + t₂X₂ + … + tₙXₙ for tᵢ in ℝ. Left multiplication by A gives X = t₁(AX₁) + t₂(AX₂) + … + tₙ(AXₙ), and it follows that D spans ℝⁿ. To show independence, let s₁(AX₁) + s₂(AX₂) + … + sₙ(AXₙ) = 0, where the sᵢ are in ℝ. Then A(s₁X₁ + s₂X₂ + … + sₙXₙ) = 0, so left multiplication by A⁻¹ gives s₁X₁ + s₂X₂ + … + sₙXₙ = 0. Now the independence of B shows that each sᵢ = 0, and so proves the independence of D. Hence D is a basis of ℝⁿ.
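Example 9 is easy to test numerically (an illustrative sketch; A is a randomly chosen invertible matrix and the basis is the standard one):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))        # almost surely invertible

    B = np.eye(n)                          # standard basis as columns
    D = A @ B                              # columns are A*E_1, ..., A*E_n

    # D is a basis exactly when its n columns are independent (rank n)
    print(np.linalg.matrix_rank(D) == n)   # True whenever A is invertible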

While we have found bases in many subspaces of ℝⁿ, we have not yet shown that every subspace has a basis. This is part of the next theorem, the proof of which is deferred to Section 6.4 where it will be proved in more generality.

Theorem 6

Let U ≠ {0} be a subspace of ℝⁿ. Then:
1. U has a basis and dim U ≤ n.
2. Any independent set in U can be enlarged (by adding vectors) to a basis of U.
3. If B spans U, then B can be cut down (by deleting vectors) to a basis of U.


Theorem 6 has a number of useful consequences. Here is the first.

Theorem 7

Let U be a subspace of ℝⁿ, and let B = {X₁, X₂, …, Xₘ} be a set of m vectors in U where m = dim U. Then

B is independent if and only if B spans U.

PROOF

Suppose B is independent. If B does not span U then, by Theorem 6, B can be enlarged to a basis of U containing more than m vectors. This contradicts the invariance theorem because dim U = m, so B spans U. Conversely, if B spans U but is not independent, then B can be cut down to a basis of U containing fewer than m vectors, again a contradiction. So B is independent, as required.

Theorem 7 is a "labour-saving" result. It asserts that, given a subspace U of dimension m and a set B of exactly m vectors in U, to prove that B is a basis of U it suffices to show either that B spans U or that B is independent. It is not necessary to verify both properties.

Example 10
Find a basis of ℝ⁴ containing B = {X₁, X₂, X₃} where X₁ = [1 2 −1 0]ᵀ, X₂ = [0 1 1 3]ᵀ, and X₃ = [1 1 1 1]ᵀ.

Solution
If E₁ = [1 0 0 0]ᵀ, then it is routine to verify that {E₁, X₁, X₂, X₃} is linearly independent. Since ℝ⁴ has dimension 4 it follows by Theorem 7 that {E₁, X₁, X₂, X₃} is a basis.⁹

Theorem 8

Let U ⊆ W be subspaces of ℝⁿ. Then:
1. dim U ≤ dim W.
2. If dim U = dim W, then U = W.

PROOF

Write dim W = k, and let B be a basis of U.
1. If dim U > k, then B is an independent set in W containing more than k vectors, contradicting the fundamental theorem. So dim U ≤ dim W, proving (1).
2. If dim U = k, then B is an independent set in W containing k = dim W vectors, so B spans W by Theorem 7. Hence W = span B = U, proving (2).


⁹ In fact, an independent subset of ℝⁿ can always be enlarged to a basis by adding vectors from the standard basis of ℝⁿ. (See Example 7 §6.4.)


It follows from Theorem 8 that if U is a subspace of ℝⁿ, then dim U = 0, 1, 2, …, n, and that

dim U = 0 if and only if U = {0},  and  dim U = n if and only if U = ℝⁿ.

The other subspaces are called proper. The following example uses Theorem 8 to show that the proper subspaces of ℝ² are the lines through the origin, while the proper subspaces of ℝ³ are the lines and planes through the origin.

Example 11
1. If U is a subspace of ℝ² or ℝ³, then dim U = 1 if and only if U is a line through the origin.
2. If U is a subspace of ℝ³, then dim U = 2 if and only if U is a plane through the origin.

PROOF

1. Since dim U = 1, let {u⃗} be a basis of U. Then U = {tu⃗ | t in ℝ}, so U is the line through the origin with direction vector u⃗. Conversely each line L with direction vector d⃗ has the form L = {td⃗ | t in ℝ}. Hence {d⃗} is a basis of L, so L has dimension 1.

2. If U ⊆ ℝ³ has dimension 2, let {v⃗, w⃗} be a basis of U. Then v⃗ and w⃗ are not parallel (by Example 6) so n⃗ = v⃗ × w⃗ ≠ 0⃗. Let P = {x⃗ in ℝ³ | n⃗ · x⃗ = 0} denote the plane through the origin with normal n⃗. Then P is a subspace of ℝ³ (Example 1 §5.1) and both v⃗ and w⃗ lie in P (they are orthogonal to n⃗), so U = span{v⃗, w⃗} ⊆ P by Theorem 1 §5.1. Hence

U ⊆ P ⊆ ℝ³.

Since dim U = 2 and dim(ℝ³) = 3, it follows from Theorem 8 that dim P = 2 or 3, whence P = U or ℝ³. But P ≠ ℝ³ (for example, n⃗ is not in P) and so U = P is a plane through the origin.

Conversely, if U is a plane through the origin, then dim U = 0, 1, 2, or 3 by Theorem 8. But dim U ≠ 0 or 3 because U ≠ {0⃗} and U ≠ ℝ³, and dim U ≠ 1 by (1). So dim U = 2.

Note that this proof shows that if v⃗ and w⃗ are nonzero, nonparallel vectors in ℝ³, then span{v⃗, w⃗} is the plane with normal n⃗ = v⃗ × w⃗. We gave a geometrical verification of this fact in Section 5.1.

Proof of the Fundamental Theorem

Fundamental Theorem (Theorem 4)

Let U be a subspace of ℝⁿ. If U = span{X₁, X₂, …, Xₘ} and if {Y₁, Y₂, …, Yₖ} is an independent set in U, then k ≤ m.


PROOF

We assume that k > m and show that this leads to a contradiction. Each Yⱼ is in U = span{X₁, X₂, …, Xₘ}, so write

Yⱼ = a₁ⱼX₁ + a₂ⱼX₂ + … + aₘⱼXₘ,  j = 1, 2, …, k,

where the coefficients aᵢⱼ are real numbers. These coefficients form column j of a matrix A = [aᵢⱼ] of size m × k. Since k > m by assumption, the homogeneous system AX = 0 has a nontrivial solution X = [x₁ x₂ … xₖ]ᵀ ≠ 0. Consider the linear combination of the Yⱼ with these xⱼ as coefficients:

x₁Y₁ + x₂Y₂ + … + xₖYₖ = Σⱼ xⱼYⱼ = Σⱼ xⱼ(Σᵢ aᵢⱼXᵢ) = Σᵢ (Σⱼ aᵢⱼxⱼ)Xᵢ = Σᵢ 0·Xᵢ = 0,

where Σⱼ aᵢⱼxⱼ = 0 for each i because it is the ith entry of AX = 0. Since the Yⱼ are independent, the equation x₁Y₁ + x₂Y₂ + … + xₖYₖ = 0 implies that each xⱼ = 0, a contradiction because X ≠ 0.

Exercises 5.2


1. Which of the following subsets are independent? Support your answer.
(a) {[1 −1 0]ᵀ, [3 2 −1]ᵀ, [3 5 −2]ᵀ} in ℝ³.
♦(b) {[1 1 1]ᵀ, [1 −1 1]ᵀ, [0 0 1]ᵀ} in ℝ³.
(c) {[1 −1 1 −1]ᵀ, [2 0 1 0]ᵀ, [0 −2 1 −2]ᵀ} in ℝ⁴.
♦(d) {[1 1 0 0]ᵀ, [1 0 1 0]ᵀ, [0 0 1 1]ᵀ, [0 1 0 1]ᵀ} in ℝ⁴.

2. Let {X, Y, Z, W} be an independent set in ℝⁿ. Which of the following sets is independent? Support your answer.
(a) {X − Y, Y − Z, Z − X}
♦(b) {X + Y, Y + Z, Z + X}
(c) {X − Y, Y − Z, Z − W, W − X}
♦(d) {X + Y, Y + Z, Z + W, W + X}

3. Find a basis and calculate the dimension of the following subspaces of ℝ⁴.
(a) span{[1 −1 2 0]ᵀ, [2 3 0 3]ᵀ, [1 9 −6 6]ᵀ}.
♦(b) span{[2 1 0 −1]ᵀ, [−1 1 1 2]ᵀ, [2 7 4 5]ᵀ}.
(c) span{[−1 2 1 0]ᵀ, [2 0 3 −1]ᵀ, [4 4 11 −3]ᵀ, [3 −2 2 −1]ᵀ}.
♦(d) span{[−2 0 3 1]ᵀ, [1 2 −1 0]ᵀ, [−2 8 5 3]ᵀ, [−1 2 2 1]ᵀ}.

4. Find a basis and calculate the dimension of the following subspaces of ℝ⁴.
(a) U = {[a a + b a − b b]ᵀ | a and b in ℝ}.
♦(b) U = {[a + b a − b b a]ᵀ | a and b in ℝ}.
(c) U = {[a b c + a c]ᵀ | a, b, and c in ℝ}.
♦(d) U = {[a − b b + c a b + c]ᵀ | a, b, and c in ℝ}.
(e) U = {[a b c d]ᵀ | a + b − c + d = 0 in ℝ}.
♦(f) U = {[a b c d]ᵀ | a + b = c + d in ℝ}.

5. Suppose that {X, Y, Z, W} is a basis of ℝ⁴. Show that:
(a) {X + aW, Y, Z, W} is also a basis of ℝ⁴ for any choice of the scalar a.
♦(b) {X + W, Y + W, Z + W, W} is also a basis of ℝ⁴.
(c) {X, X + Y, X + Y + Z, X + Y + Z + W} is also a basis of ℝ⁴.

6. Use Theorem 3 to determine if the following sets of vectors are a basis of the indicated space.
(a) {[3 −1]ᵀ, [2 2]ᵀ} in ℝ².
♦(b) {[1 1 −1]ᵀ, [1 −1 1]ᵀ, [0 0 1]ᵀ} in ℝ³.
(c) {[−1 1 −1]ᵀ, [1 −1 2]ᵀ, [0 0 1]ᵀ} in ℝ³.
♦(d) {[5 2 −1]ᵀ, [1 0 1]ᵀ, [3 −1 0]ᵀ} in ℝ³.
(e) {[2 1 −1 3]ᵀ, [1 1 0 2]ᵀ, [0 1 0 −3]ᵀ, [1 2 3 1]ᵀ} in ℝ⁴.
♦(f) {[1 0 −2 5]ᵀ, [4 4 −3 2]ᵀ, [0 1 0 −3]ᵀ, [1 3 3 −10]ᵀ} in ℝ⁴.


7. In each case show that the statement is true or give an example showing that it is false.
(a) If {X, Y} is independent, then {X, Y, X + Y} is independent.
♦(b) If {X, Y, Z} is independent, then {Y, Z} is independent.
(c) If {Y, Z} is dependent, then {X, Y, Z} is dependent.
♦(d) If all of X₁, X₂, …, Xₖ are nonzero, then {X₁, X₂, …, Xₖ} is independent.
(e) If one of X₁, X₂, …, Xₖ is zero, then {X₁, X₂, …, Xₖ} is dependent.
♦(f) If aX + bY + cZ = 0, then {X, Y, Z} is independent.
(g) If {X, Y, Z} is independent, then aX + bY + cZ = 0 for some a, b, and c in ℝ.
♦(h) If {X₁, X₂, …, Xₖ} is dependent, then t₁X₁ + t₂X₂ + … + tₖXₖ = 0 for some numbers tᵢ in ℝ not all zero.
(i) If {X₁, X₂, …, Xₖ} is independent, then t₁X₁ + t₂X₂ + … + tₖXₖ = 0 for some tᵢ in ℝ.

8. If A is an n × n matrix, show that det A = 0 if and only if some column of A is a linear combination of the other columns.

9. Let {X, Y, Z} be a linearly independent set in ℝ⁴. Show that {X, Y, Z, Eₖ} is a basis of ℝ⁴ for some Eₖ in the standard basis {E₁, E₂, E₃, E₄}.

♦10. If {X₁, X₂, X₃, X₄, X₅, X₆} is an independent set of vectors, show that the subset {X₂, X₃, X₅} is also independent.

11. Let A be any m × n matrix, and let B₁, B₂, B₃, …, Bₖ be columns in ℝᵐ such that the system AX = Bᵢ has a solution Xᵢ for each i. If {B₁, B₂, B₃, …, Bₖ} is independent in ℝᵐ, show that {X₁, X₂, X₃, …, Xₖ} is independent in ℝⁿ.

♦12. If {X₁, X₂, X₃, …, Xₖ} is independent, show that {X₁, X₁ + X₂, X₁ + X₂ + X₃, …, X₁ + X₂ + … + Xₖ} is also independent.

13. If {Y, X₁, X₂, X₃, …, Xₖ} is independent, show that {Y + X₁, Y + X₂, Y + X₃, …, Y + Xₖ} is also independent.

14. Suppose that {X, Y} is a basis of ℝ², and let

A = [ a b ]
    [ c d ].

(a) If A is invertible, show that {aX + bY, cX + dY} is a basis of ℝ².
♦(b) If {aX + bY, cX + dY} is a basis of ℝ², show that A is invertible.

15. Let A denote an m × n matrix.
(a) Show that null A = null(UA) for every invertible m × m matrix U.
♦(b) Show that dim(null A) = dim(null(AV)) for every invertible n × n matrix V. [Hint: If {X₁, X₂, …, Xₖ} is a basis of null A, show that {V⁻¹X₁, V⁻¹X₂, …, V⁻¹Xₖ} is a basis of null(AV).]

16. Let A denote an m × n matrix.
(a) Show that im A = im(AV) for every invertible n × n matrix V.
(b) Show that dim(im A) = dim(im(UA)) for every invertible m × m matrix U. [Hint: If {Y₁, Y₂, …, Yₖ} is a basis of im(UA), show that {U⁻¹Y₁, U⁻¹Y₂, …, U⁻¹Yₖ} is a basis of im A.]

17. Let U and W denote subspaces of ℝⁿ, and assume that U ⊆ W. If dim U = n − 1, show that either W = U or W = ℝⁿ.

♦18. Let U and W denote subspaces of ℝⁿ, and assume that U ⊆ W. If dim W = 1, show that either U = {0} or U = W.


SECTION 5.3 Orthogonality

Length and orthogonality are basic concepts in geometry and, in ℝ² and ℝ³, they can both be defined using the dot product. In this section we extend these concepts to ℝⁿ, introduce the idea of an orthogonal basis (one of the most useful concepts in linear algebra), and begin exploring some of its applications.


Dot Product, Length, and Distance

Let X = [x₁ x₂ … xₙ]ᵀ and Y = [y₁ y₂ … yₙ]ᵀ be two vectors in ℝⁿ. The dot product X · Y of X and Y is defined by

X · Y = XᵀY = x₁y₁ + x₂y₂ + … + xₙyₙ.

Note that technically XᵀY is a 1 × 1 matrix, which we take to be a number. The length ‖X‖ of the vector X is defined by

‖X‖ = √(X · X) = √(x₁² + x₂² + … + xₙ²)

where √ indicates the positive square root. A vector of length 1 is called a unit vector.

Example 1
If X = [1 0 −3 2 1]ᵀ and Y = [2 −1 5 1 −1]ᵀ in ℝ⁵, then X · Y = 2 + 0 − 15 + 2 − 1 = −12, and ‖X‖ = √(1 + 0 + 9 + 4 + 1) = √15.

These definitions agree with those in ℝ² and ℝ³, and many properties carry over to ℝⁿ:

Theorem 1

Let X, Y, and Z denote vectors in ℝⁿ. Then:
1. X · Y = Y · X.
2. X · (Y + Z) = X · Y + X · Z.
3. (aX) · Y = a(X · Y) = X · (aY) for all scalars a.
4. ‖X‖² = X · X.
5. ‖X‖ ≥ 0, and ‖X‖ = 0 if and only if X = 0.
6. ‖aX‖ = |a| ‖X‖ for all scalars a.

PROOF

(1), (2), and (3) follow from matrix arithmetic because X · Y = XᵀY; (4) is clear from the definition; and (6) is a routine verification since |a| = √(a²). If X = [x₁ x₂ … xₙ]ᵀ, then ‖X‖ = √(x₁² + x₂² + … + xₙ²), so ‖X‖ = 0 if and only if x₁² + x₂² + … + xₙ² = 0. Since each xᵢ is a real number this happens if and only if xᵢ = 0 for each i; that is, if and only if X = 0. This proves (5).

Because of Theorem 1, computations with dot products in ℝⁿ are similar to those in ℝ³. In particular, the dot product

(X₁ + X₂ + … + Xₘ) · (Y₁ + Y₂ + … + Yₙ)

equals the sum of mn terms, Xᵢ · Yⱼ, one for each choice of i and j. For example:

(3X − 4Y) · (7X + 2Y) = 21(X · X) + 6(X · Y) − 28(Y · X) − 8(Y · Y)
                      = 21‖X‖² − 22(X · Y) − 8‖Y‖²

holds for all vectors X and Y.


Example 2
Show that ‖X + Y‖² = ‖X‖² + 2(X · Y) + ‖Y‖² for any X and Y in ℝⁿ.

Solution
Using Theorem 1 several times:

‖X + Y‖² = (X + Y) · (X + Y) = X · X + X · Y + Y · X + Y · Y = ‖X‖² + 2(X · Y) + ‖Y‖².

Example 3
Suppose that ℝⁿ = span{F₁, F₂, …, Fₖ} for some vectors Fᵢ. If X · Fᵢ = 0 for each i where X is in ℝⁿ, show that X = 0.

Solution
We show X = 0 by showing that ‖X‖ = 0 and using (5) of Theorem 1. Since the Fᵢ span ℝⁿ, write X = t₁F₁ + t₂F₂ + … + tₖFₖ where the tᵢ are in ℝ. Then

‖X‖² = X · X = X · (t₁F₁ + t₂F₂ + … + tₖFₖ)
     = t₁(X · F₁) + t₂(X · F₂) + … + tₖ(X · Fₖ)
     = t₁(0) + t₂(0) + … + tₖ(0) = 0.

We saw in Section 4.2 that if u⃗ and v⃗ are nonzero vectors in ℝ³, then

u⃗ · v⃗ = ‖u⃗‖ ‖v⃗‖ cos θ

where θ is the angle between u⃗ and v⃗. Since |cos θ| ≤ 1 for any angle θ, this shows that |u⃗ · v⃗| ≤ ‖u⃗‖ ‖v⃗‖. In this form the result holds in ℝⁿ.

Theorem 2 Cauchy Inequality¹⁰

If X and Y are vectors in ℝⁿ, then

|X · Y| ≤ ‖X‖ ‖Y‖.

Moreover |X · Y| = ‖X‖ ‖Y‖ if and only if one of X and Y is a multiple of the other.

PROOF

The inequality holds if X = 0 or Y = 0 (in fact it is equality). Otherwise, write ‖X‖ = a > 0 and ‖Y‖ = b > 0 for convenience. A computation like that in Example 2 gives

‖bX − aY‖² = 2ab(ab − X · Y)  and  ‖bX + aY‖² = 2ab(ab + X · Y). (∗)

It follows that ab − X · Y ≥ 0 and ab + X · Y ≥ 0, and so that −ab ≤ X · Y ≤ ab. Hence |X · Y| ≤ ab = ‖X‖ ‖Y‖, proving the Cauchy inequality.

¹⁰ Augustin Louis Cauchy (1789–1857) was born in Paris and became a professor at the École Polytechnique at the age of 26. He was one of the great mathematicians, producing more than 700 papers, and is best remembered for his work in analysis in which he established new standards of rigour and founded the theory of functions of a complex variable. He was a devout Catholic with a long-term interest in charitable work, and he was a royalist, following King Charles X into exile in Prague after he was deposed in 1830.


If equality holds, then |X · Y| = ab, so X · Y = ab or X · Y = −ab. Hence (∗) shows that bX − aY = 0 or bX + aY = 0, so one of X and Y is a multiple of the other (even if a = 0 or b = 0).

The Cauchy inequality is equivalent to (X · Y)² ≤ ‖X‖² ‖Y‖²; for example, in ℝ⁵ this becomes

(x₁y₁ + x₂y₂ + x₃y₃ + x₄y₄ + x₅y₅)² ≤ (x₁² + x₂² + x₃² + x₄² + x₅²)(y₁² + y₂² + y₃² + y₄² + y₅²)

for all xᵢ and yᵢ in ℝ.

There is an important consequence of the Cauchy inequality. Given X and Y in ℝⁿ, use Example 2 and the fact that X · Y ≤ ‖X‖ ‖Y‖ to compute

‖X + Y‖² = ‖X‖² + 2(X · Y) + ‖Y‖² ≤ ‖X‖² + 2‖X‖ ‖Y‖ + ‖Y‖² = (‖X‖ + ‖Y‖)².

Taking positive square roots gives:

Corollary Triangle Inequality

If X and Y are vectors in ℝⁿ, then ‖X + Y‖ ≤ ‖X‖ + ‖Y‖.

The reason for the name comes from the observation that in ℝ² the inequality asserts that the sum of the lengths of two sides of a triangle is not less than the length of the third side. This is illustrated in the first diagram.

If X and Y are two vectors in ℝⁿ, we define the distance d(X, Y) between X and Y by

d(X, Y) = ‖X − Y‖.

The motivation again comes from ℝ² as is clear in the second diagram. This distance function has all the intuitive properties of distance in ℝ², including another version of the triangle inequality.

[Diagrams: a triangle with sides v⃗, w⃗, and v⃗ + w⃗ illustrating the triangle inequality, and the vector v⃗ − w⃗ joining the tips of v⃗ and w⃗ illustrating distance.]

Theorem 3

If X, Y, and Z are three vectors in ℝⁿ we have:
1. d(X, Y) ≥ 0 for all X and Y.
2. d(X, Y) = 0 if and only if X = Y.
3. d(X, Y) = d(Y, X).
4. Triangle inequality. d(X, Z) ≤ d(X, Y) + d(Y, Z).

PROOF

(1) and (2) restate part (5) of Theorem 1 because d(X, Y) = ‖X − Y‖, and (3) follows because ‖U‖ = ‖−U‖ for every vector U in ℝⁿ. To prove (4) use the Corollary to Theorem 2:

d(X, Z) = ‖X − Z‖ = ‖(X − Y) + (Y − Z)‖ ≤ ‖X − Y‖ + ‖Y − Z‖ = d(X, Y) + d(Y, Z).
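Both inequalities are easy to spot-check numerically (an illustrative NumPy sketch with randomly chosen vectors):

    import numpy as np

    rng = np.random.default_rng(1)
    X, Y = rng.standard_normal(5), rng.standard_normal(5)
    nX, nY = np.linalg.norm(X), np.linalg.norm(Y)

    print(abs(X @ Y) <= nX * nY)                # Cauchy inequality: True
    print(np.linalg.norm(X + Y) <= nX + nY)     # triangle inequality: True

    # d(X, Z) <= d(X, Y) + d(Y, Z) with d(X, Y) = ||X - Y||
    Z = rng.standard_normal(5)
    d = lambda U, V: np.linalg.norm(U - V)
    print(d(X, Z) <= d(X, Y) + d(Y, Z))         # True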


Orthogonal Sets and the Expansion Theorem

Two nonzero vectors v⃗ and w⃗ in ℝ³ are orthogonal if and only if v⃗ · w⃗ = 0 (Theorem 3 §4.2). More generally, a set {X₁, X₂, …, Xₖ} of vectors in ℝⁿ is called an orthogonal set if

Xᵢ · Xⱼ = 0 for all i ≠ j  and  Xᵢ ≠ 0 for all i.¹¹

Note that {X} is an orthogonal set if X ≠ 0. A set {X₁, X₂, …, Xₖ} of vectors in ℝⁿ is called orthonormal if it is orthogonal and, in addition, each Xᵢ is a unit vector:

‖Xᵢ‖ = 1 for each i.

Example 4
The standard basis {E₁, E₂, …, Eₙ} is an orthonormal set in ℝⁿ.

The routine verification is left to the reader, as is the proof of:

Example 5
If {X₁, X₂, …, Xₖ} is orthogonal, so also is {a₁X₁, a₂X₂, …, aₖXₖ} for any nonzero scalars aᵢ.

If X ≠ 0, it follows from item (6) of Theorem 1 that (1/‖X‖)X is a unit vector, that is it has length 1. Hence if {X₁, X₂, …, Xₖ} is an orthogonal set, then

{(1/‖X₁‖)X₁, (1/‖X₂‖)X₂, …, (1/‖Xₖ‖)Xₖ}

is an orthonormal set, and we say that it is the result of normalizing the orthogonal set {X₁, X₂, …, Xₖ}.

Example 6
If F₁ = [1 1 1 −1]ᵀ, F₂ = [1 0 1 2]ᵀ, F₃ = [−1 0 1 0]ᵀ, and F₄ = [−1 3 −1 1]ᵀ, then {F₁, F₂, F₃, F₄} is an orthogonal set in ℝ⁴ as is easily verified. After normalizing, the corresponding orthonormal set is {½F₁, (1/√6)F₂, (1/√2)F₃, (1/(2√3))F₄}.

The most important result about orthogonality is Pythagoras' theorem. Given orthogonal vectors v⃗ and w⃗ in ℝ³, it asserts that ‖v⃗ + w⃗‖² = ‖v⃗‖² + ‖w⃗‖² as in the diagram. In this form the result holds for any orthogonal set in ℝⁿ.

[Diagram: a right triangle with legs v⃗ and w⃗ and hypotenuse v⃗ + w⃗.]

Theorem 4 Pythagoras' Theorem

If {X₁, X₂, …, Xₖ} is an orthogonal set in ℝⁿ, then

‖X₁ + X₂ + … + Xₖ‖² = ‖X₁‖² + ‖X₂‖² + … + ‖Xₖ‖².

PROOF

The fact that Xᵢ · Xⱼ = 0 whenever i ≠ j gives

‖X₁ + X₂ + … + Xₖ‖² = (X₁ + X₂ + … + Xₖ) · (X₁ + X₂ + … + Xₖ)
= (X₁ · X₁ + X₂ · X₂ + … + Xₖ · Xₖ) + (X₁ · X₂ + X₁ · X₃ + X₂ · X₃ + …)
= (‖X₁‖² + ‖X₂‖² + … + ‖Xₖ‖²) + (0 + 0 + … + 0).

This is what we wanted.

¹¹ The reason for insisting that orthogonal sets consist of nonzero vectors is that we will be primarily concerned with orthogonal bases.
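The claims of Example 6 and Theorem 4 are quick to verify by machine (an illustrative NumPy sketch using the vectors of Example 6):

    import numpy as np

    F = np.array([[ 1, 1,  1, -1],
                  [ 1, 0,  1,  2],
                  [-1, 0,  1,  0],
                  [-1, 3, -1,  1]], dtype=float)   # rows are F1..F4

    G = F @ F.T                      # Gram matrix: entries Fi . Fj
    print(np.allclose(G, np.diag(np.diag(G))))     # True: orthogonal set

    # Pythagoras: ||F1 + F2 + F3 + F4||^2 = sum of ||Fi||^2
    lhs = np.linalg.norm(F.sum(axis=0))**2
    rhs = sum(np.linalg.norm(Fi)**2 for Fi in F)
    print(np.isclose(lhs, rhs))                    # True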


If v⃗ and w⃗ are orthogonal, nonzero vectors in ℝ³, then they are certainly not parallel, and so are linearly independent by Example 6 §5.2. The next theorem gives a far-reaching extension of this observation.

Theorem 5

Every orthogonal set in ℝⁿ is linearly independent.

PROOF

Let {X₁, X₂, …, Xₖ} be an orthogonal set in ℝⁿ and suppose a linear combination vanishes: t₁X₁ + t₂X₂ + … + tₖXₖ = 0. Then

0 = X₁ · 0 = X₁ · (t₁X₁ + t₂X₂ + … + tₖXₖ)
  = t₁(X₁ · X₁) + t₂(X₁ · X₂) + … + tₖ(X₁ · Xₖ)
  = t₁‖X₁‖² + t₂(0) + … + tₖ(0)
  = t₁‖X₁‖².

Since ‖X₁‖² ≠ 0, this implies that t₁ = 0. Similarly tᵢ = 0 for each i.

Theorem 5 suggests considering orthogonal bases for ℝⁿ, that is orthogonal sets that span ℝⁿ. These turn out to be the best bases. One reason is that, when expanding a vector as a linear combination of the basis vectors, there are explicit formulas for the coefficients.

Theorem 6 Expansion Theorem

Let {F₁, F₂, …, Fₘ} be an orthogonal basis of a subspace U of ℝⁿ. If X is any vector in U, we have

X = ((X · F₁)/‖F₁‖²)F₁ + ((X · F₂)/‖F₂‖²)F₂ + … + ((X · Fₘ)/‖Fₘ‖²)Fₘ.

PROOF

Since {F₁, F₂, …, Fₘ} spans U, we have X = t₁F₁ + t₂F₂ + … + tₘFₘ where the tᵢ are scalars. To find t₁ we take the dot product of both sides with F₁:

X · F₁ = (t₁F₁ + t₂F₂ + … + tₘFₘ) · F₁
       = t₁(F₁ · F₁) + t₂(F₂ · F₁) + … + tₘ(Fₘ · F₁)
       = t₁‖F₁‖² + t₂(0) + … + tₘ(0)
       = t₁‖F₁‖².

Since F₁ ≠ 0, this gives t₁ = (X · F₁)/‖F₁‖². Similarly, tᵢ = (X · Fᵢ)/‖Fᵢ‖² for each i.

The expansion of X as a linear combination of the orthogonal basis {F₁, F₂, …, Fₘ} is called the Fourier expansion of X, and the coefficients tᵢ = (X · Fᵢ)/‖Fᵢ‖² are called the Fourier coefficients. Note that if {F₁, F₂, …, Fₘ} is actually orthonormal, then tᵢ = X · Fᵢ for each i. We will have a great deal more to say about this in Section 10.5.


Example 7
Expand X = [a b c d]ᵀ as a linear combination of the orthogonal basis {F₁, F₂, F₃, F₄} of ℝ⁴ given in Example 6.

Solution
We have F₁ = [1 1 1 −1]ᵀ, F₂ = [1 0 1 2]ᵀ, F₃ = [−1 0 1 0]ᵀ, and F₄ = [−1 3 −1 1]ᵀ, so the Fourier coefficients are

t₁ = (X · F₁)/‖F₁‖² = ¼(a + b + c − d)     t₂ = (X · F₂)/‖F₂‖² = (1/6)(a + c + 2d)
t₃ = (X · F₃)/‖F₃‖² = ½(−a + c)            t₄ = (X · F₄)/‖F₄‖² = (1/12)(−a + 3b − c + d)

The reader can verify that indeed X = t₁F₁ + t₂F₂ + t₃F₃ + t₄F₄.
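The expansion theorem translates directly into code (an illustrative NumPy sketch; it recomputes the coefficients of Example 7 for one sample choice of X):

    import numpy as np

    F = [np.array([ 1, 1,  1, -1], dtype=float),
         np.array([ 1, 0,  1,  2], dtype=float),
         np.array([-1, 0,  1,  0], dtype=float),
         np.array([-1, 3, -1,  1], dtype=float)]

    X = np.array([1.0, 2.0, 3.0, 4.0])     # sample vector [a b c d]^T

    # Fourier coefficients t_i = (X . F_i) / ||F_i||^2
    t = [(X @ Fi) / (Fi @ Fi) for Fi in F]
    print(np.allclose(sum(ti * Fi for ti, Fi in zip(t, F)), X))   # True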

A natural question arises here: Does every subspace U of ℝⁿ have an orthogonal basis? The answer is "yes"; in fact, there is a systematic procedure, called the Gram–Schmidt algorithm, for turning any basis of U into an orthogonal one. This leads to a definition of the projection onto a subspace U that generalizes the projection along a vector used in ℝ² and ℝ³. All this is discussed in Section 8.1.

Exercises 5.3


1. Obtain an orthonormal basis of ℝ³ by normalizing the following.
(a) {[1 −1 2]ᵀ, [0 2 1]ᵀ, [5 1 −2]ᵀ}
♦(b) {[1 1 1]ᵀ, [4 1 −5]ᵀ, [2 −3 1]ᵀ}

2. In each case, show that the set of vectors is orthogonal in ℝ⁴.
(a) {[1 −1 2 5]ᵀ, [4 1 1 −1]ᵀ, [−7 28 5 5]ᵀ}
(b) {[2 −1 4 5]ᵀ, [0 −1 1 −1]ᵀ, [0 3 2 −1]ᵀ}

3. In each case, show that B is an orthogonal basis of ℝ³ and use Theorem 6 to expand X = [a b c]ᵀ as a linear combination of the basis vectors.
(a) B = {[1 −1 3]ᵀ, [−2 1 1]ᵀ, [4 7 1]ᵀ}
♦(b) B = {[1 0 −1]ᵀ, [1 4 1]ᵀ, [2 −1 2]ᵀ}
(c) B = {[1 2 3]ᵀ, [−1 −1 1]ᵀ, [5 −4 1]ᵀ}
♦(d) B = {[1 1 1]ᵀ, [1 −1 0]ᵀ, [1 1 −2]ᵀ}

4. In each case, write X as a linear combination of the orthogonal basis of the subspace U.
(a) X = [13 −20 15]ᵀ; U = span{[1 −2 3]ᵀ, [−1 1 1]ᵀ}
♦(b) X = [14 1 −8 5]ᵀ; U = span{[2 −1 0 3]ᵀ, [2 1 −2 −1]ᵀ}

5. In each case, find all [a b c d]ᵀ in ℝ⁴ such that the given set is orthogonal.
(a) {[1 2 1 0]ᵀ, [1 −1 1 3]ᵀ, [2 −1 0 −1]ᵀ, [a b c d]ᵀ}
♦(b) {[1 0 −1 1]ᵀ, [2 1 1 −1]ᵀ, [1 −3 1 0]ᵀ, [a b c d]ᵀ}

6. If ‖X‖ = 3, ‖Y‖ = 1, and X · Y = −2, compute:
(a) ‖3X − 5Y‖
(b) ‖2X + 7Y‖
(c) (3X − Y) · (2Y − X)
(d) (X − 2Y) · (3X + 5Y)

7. In each case either show that the statement is true or give an example showing that it is false.
(a) Every independent set in ℝⁿ is orthogonal.
♦(b) If {X, Y} is an orthogonal set in ℝⁿ, then {X, X + Y} is also orthogonal.
(c) If {X, Y} and {Z, W} are both orthogonal in ℝⁿ, then {X, Y, Z, W} is also orthogonal.
♦(d) If {X₁, X₂} and {Y₁, Y₂, Y₃} are both orthogonal and Xᵢ · Yⱼ = 0 for all i and j, then {X₁, X₂, Y₁, Y₂, Y₃} is orthogonal.
(e) If {X₁, X₂, …, Xₙ} is orthogonal in ℝⁿ, then ℝⁿ = span{X₁, X₂, …, Xₙ}.
♦(f) If X ≠ 0 in ℝⁿ, then {X} is an orthogonal set.



8. If A is an m × n matrix with orthonormal columns, show that AᵀA = Iₙ. [Hint: If C₁, C₂, …, Cₙ are the columns of A, show that column j of AᵀA is [C₁ · Cⱼ C₂ · Cⱼ … Cₙ · Cⱼ]ᵀ.]

9. Use the Cauchy inequality to show that √(xy) ≤ ½(x + y) for all x ≥ 0 and y ≥ 0. Here √(xy) and ½(x + y) are called, respectively, the geometric mean and arithmetic mean of x and y. [Hint: Use X = [√x √y]ᵀ and Y = [√y √x]ᵀ.]

10. Use the Cauchy inequality to prove that:
(a) (r₁ + r₂ + … + rₙ)² ≤ n(r₁² + r₂² + … + rₙ²) for all rᵢ in ℝ and all n ≥ 1.
♦(b) r₁r₂ + r₁r₃ + r₂r₃ ≤ r₁² + r₂² + r₃² for all r₁, r₂, and r₃ in ℝ. [Hint: See part (a).]

11. (a) Show that X and Y are orthogonal in ℝⁿ if and only if ‖X + Y‖ = ‖X − Y‖.
♦(b) Show that X + Y and X − Y are orthogonal in ℝⁿ if and only if ‖X‖ = ‖Y‖.

12. Show that ‖X + Y‖² = ‖X‖² + ‖Y‖² if and only if X is orthogonal to Y.

13. (a) Show that X · Y = ¼[‖X + Y‖² − ‖X − Y‖²] for all X, Y in ℝⁿ.
(b) Show that ‖X‖² + ‖Y‖² = ½[‖X + Y‖² + ‖X − Y‖²] for all X, Y in ℝⁿ.

14. If A is n × n, show that every eigenvalue of AᵀA is nonnegative. [Hint: Compute ‖AX‖² where X is an eigenvector.]

15. If ℝⁿ = span{X₁, …, Xₘ} and X · Xᵢ = 0 for all i, show that X = 0. [Hint: Show ‖X‖ = 0.]

16. If ℝⁿ = span{X₁, …, Xₘ} and X · Xᵢ = Y · Xᵢ for all i, show that X = Y.

17. Let {E₁, …, Eₙ} be an orthogonal basis of ℝⁿ. Given X and Y in ℝⁿ, show that

X · Y = ((X · E₁)(Y · E₁))/‖E₁‖² + … + ((X · Eₙ)(Y · Eₙ))/‖Eₙ‖².


SECTION 5.4 Rank of a Matrix

In this section we use independence and spanning to properly define the rank of a matrix and to study its properties. This requires that we deal with rows and columns in the same way. While it has been our custom to write the n-tuples in ℝⁿ as columns, in this section we will frequently write them as rows. Subspaces, independence, spanning, and dimension are defined for rows using matrix operations, just as for columns. If A is an m × n matrix, we define:

The column space, col A, of A is the subspace of ℝᵐ spanned by the columns of A.
The row space, row A, of A is the subspace of ℝⁿ spanned by the rows of A.

Much of what we do in this section involves these subspaces. Recall from Theorem 4 §2.2 that if C₁, C₂, …, Cₙ are the columns of an m × n matrix A, and if X = [x₁ x₂ … xₙ]ᵀ is any column in ℝⁿ, then

AX = x₁C₁ + x₂C₂ + … + xₙCₙ. (∗)

With this we can prove:

Theorem 1

Let A, U, and V be matrices of sizes m × n, p × m, and n × q respectively. Then:
1. col(AV) ⊆ col A, with equality if V is (square and) invertible.
2. row(UA) ⊆ row A, with equality if U is (square and) invertible.


PROOF

Let C₁, C₂, …, Cₙ denote the columns of A, and let Xⱼ denote column j of V. Then column j of AV is AXⱼ, and if Xⱼ = [x₁ x₂ … xₙ]ᵀ then by (∗) this is AXⱼ = x₁C₁ + x₂C₂ + … + xₙCₙ. Since this is in col A for each j, it follows that col(AV) ⊆ col A. If V is invertible, we obtain col A = col[(AV)V⁻¹] ⊆ col(AV) in the same way, and (1) follows.

As to (2), we have col[(UA)ᵀ] = col(AᵀUᵀ) ⊆ col(Aᵀ) by (1), from which it follows that row(UA) ⊆ row A. If U is invertible, we obtain row(UA) = row A as in the proof of (1).

Now suppose that a matrix A is carried to some row-echelon matrix R by row operations. Then R = UA for some invertible matrix U by Theorem 1 §2.4, so Theorem 1 shows that row R = row A. Moreover, the next lemma shows that dim(row R) is the rank of A defined in Section 1.2, and hence shows that rank A is independent of the particular row-echelon matrix to which A can be carried. This fact was not proved in Section 1.2.

Lemma 1

Let R denote an m × n row-echelon matrix.
1. The nonzero rows of R are a basis of row R.
2. The columns of R containing the leading ones are a basis of col R.

PROOF

1. If R₁, R₂, …, Rᵣ denote the nonzero rows of R, we have row R = span{R₁, R₂, …, Rᵣ} by definition. Suppose a₁R₁ + a₂R₂ + … + aᵣRᵣ = 0 where each aᵢ is in ℝ. Then a₁ = 0 because the leading 1 in R₁ is to the left of any nonzero entry in any other Rᵢ. But then a₂R₂ + … + aᵣRᵣ = 0 and so a₂ = 0 in the same way (because the matrix R with R₁ deleted is also row-echelon). This continues to show that each aᵢ = 0.

2. The r columns containing leading ones are independent because the leading ones are in different rows (and have zeros below them). It is clear that col R is contained in the subspace of all columns in ℝᵐ with the last m − r entries zero. This space has dimension r, so the r independent columns containing leading ones are a basis by Theorem 7 §5.2.

Somewhat surprisingly, Lemma 1 is instrumental in showing that dim(col A) = dim(row A) for any matrix A. This is the main result in the following fundamental theorem.

Theorem 2 Rank Theorem

Let A denote any m × n matrix. Then

dim(row A) = dim(col A).

Moreover, suppose A can be carried to a matrix R in row-echelon form by a series of elementary row operations. If r denotes the number of nonzero rows in R, then

220 Chapter 5 The Vector Space �n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 220

Page 25: Linear Algebra Chapter 5

1. The r nonzero rows of R are a basis of row A.2. If the leading 1’s lie in columns j1, j2, … , jr of R, then the corresponding

columns j1, j2, … , jr of A are a basis of col A.

PROOF

We have R = UA for some invertible matrix U. Hence row A = row R by Theorem 1,and (1) follows from Lemma 1.

To prove (2), let C1, C2, … , Cn denote the columns of A. Then A = [C1 C2 … Cn] in block form, and

.

Hence, in the notation of (2), the set consists of the columns of R that contain a leading 1, so B is a basis of col R by Lemma 1. But then the fact that U is invertible implies that is linearlyindependent. Furthermore, if Cj is any column of A, then UCj is a linearcombination of the columns in the set B. Again, the invertibility of U implies that Cj is a linear combination of This proves (2).

Finally, dim(row A) = r = dim(col A) by (1) and (2).

The common dimension of the row and column spaces of an m × n matrix A iscalled the rank of A and is denoted rank A. By (1) of Theorem 2, this agrees withthe definition in Section 1.2 and we record the result for reference.

Corollary 1

Suppose a matrix A can be carried to a matrix R in row-echelon form by a seriesof elementary row operations. Then the rank of A is equal to the number ofnonzero rows of R.

Example 1

Compute the rank of matrix and find bases for the row

space and the column space of A.

Solution The reduction of A to row-echelon form is as follows:

Hence rank A = 2, and {[ 1 2 2 −1 ], [ 0 0 1 −3 ]} is a basis of the row space of A. Moreover, the leading 1’s are in columns 1 and 3 of the row-echelon matrix, so Theorem 2 shows that columns 1 and 3 of A are

a basis of col A.131

251

,

1 2 2 13 6 5 01 2 1 2

1 2 2 10 0 1 30 0 1 3

1 2 2 10 0 1 30 0

→−

−−

→−−

00 0

A =−

1 2 2 13 6 5 01 2 1 2

C C Cj j jr1 2, , , . …

{ , , , }C C Cj j jr1 2 …

B UC UC UCj j jr= { , , }1 2 , …

R UA U C C C UC UC UCn n= = =[ ] [ ]1 2 1 2

221Section 5.4 Rank of a Matrix

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 221

Page 26: Linear Algebra Chapter 5

The rank theorem has several other important consequences. Corollary 2 followsbecause the rows of A are independent (respectively, span row A) if and only if theirtransposes are independent (respectively, span col(AT )).

Corollary 2

If A is any matrix, then rank A = rank(AT).

Corollary 3

If A is an m × n matrix, then rank A ≤ m and rank A ≤ n.

PROOF

If A is carried to the row-echelon matrix R by row operations, then Corollary 1shows that rank A = r where r is the number of nonzero rows of R. Since R is m × ntoo, it follows that r ≤ m. Applying this to AT gives rank(AT ) ≤ n because AT isn × m. Hence we are done by Corollary 2.

Theorem 1 immediately yields

Corollary 4

rank A = rank(UA) = rank(AV ) whenever U and V are invertible.

Corollary 5

An n × n matrix A is invertible if and only if rank A = n.

PROOF

If A is invertible, then A → In by row operations (by Theorem 5 §2.3), so rank A = nby Corollary 1. Conversely, let A → R by row operations where R is an n × nreduced row-echelon matrix. If rank A = n, then R has n leading ones by Corollary 1,and so R = In . Hence A → In so A is invertible, again by Theorem 5 §2.3.

The rank theorem can be used to find bases of subspaces of the space of all n × 1 rows. Here is an example where n = 4.

Example 2Find a basis for the following subspace of �4, (written as rows).

U = − −span{[ ], [ ], [ ]}1 1 2 3 2 4 1 0 1 5 4 9

222 Chapter 5 The Vector Space �n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 222

Page 27: Linear Algebra Chapter 5

Solution U is just the row space of , so we reduce this to row-echelon

form:

The required basis is Thus isalso a basis and avoids fractions.

In Section 5.1 we discussed two other subspaces associated with an m × n matrix A,the null space null A = {X | X in �n and AX = 0} and the image im A = {AX | X in �n}.Using the rank, there are simple ways to find bases for these spaces.

We already know (Theorem 3 §2.2) that null A is spanned by the basic solutionsto the system AX = 0. The following example is instructive in showing that thesebasic solutions are, in fact, independent (and so are a basis of null A).

Example 3

If find a basis of null A and so find its dimension.

Solution If X is in null A, then AX = 0, so X is given by solving the system AX = 0.The reduction of the augmented matrix to reduced form is

Hence, writing X = [x1 x2 x3 x4]T, the leading variables are x1 and x3, and thenonleading variables x2 and x4 become parameters: x2 = s and x4 = t. Then theequations corresponding to the reduced matrix determine the leading variablesin terms of the parameters:

This means that the general solution is

(∗)

Hence X is in span{X1, X2} where X1 = [2 1 0 0]T and X2 = [1 0 −2 1]T arethe basic solutions, and we have shown that null(A) ⊆ span{X1, X2}. But X1 andX2 are in null A (they are solutions of AX = 0), so

by Theorem 1 §5.1. We claim further that {X1, X2} is linearly independent.To see this, let sX1 + tX2 = 0 be a linear combination that vanishes. Then (∗)shows that [2s + t s −2t t ]T = 0, whence s = t = 0. Thus {X1, X2} is a basis ofnull(A), and so dim(null A) = 2.

null { , }A X X= span 1 2

X s t s t t s tT T T= + − = + −[ ] [ ] [ ] .2 2 2 1 0 0 1 0 2 1

x s t x t1 32 2= + = −and .

1 2 1 1 01 2 0 1 02 4 1 0 0

1 2 0 1 00 0 1 2 00 0 0 0 0

−−

→− −

A =−

−−

1 2 1 11 2 0 12 4 1 0

{[1 1 2 3 0 2 3 6], [ ]}− −{[ ], [ ]}.1 1 2 3 0 1 332− −

1 1 2 32 4 1 01 5 4 9

1 1 2 30 2 3 60 4 6 12

1 1 2 3

0 1 32

− −

→ − −− −

→ − −−

30 0 0 0

1 1 2 32 4 1 01 5 4 9− −

223Section 5.4 Rank of a Matrix

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 223

Page 28: Linear Algebra Chapter 5

224 Chapter 5 The Vector Space �n

The calculation in Example 3 is typical of what happens in general. If A is anm × n matrix with rank A = r, there are exactly r leading variables, and hence exactlyn − r nonleading variables. These lead to exactly n − r basic solutions X1, X2, … , Xn − r ,and the reduced equations give the leading variables as linear combinations of theXi . Hence

(This is Theorem 3 §2.2.). We now claim that these basic solutions Xi are indepen-dent. The general solution is a linear combination X = t1X1 + t2X2 + … + tn − r Xn − rwhere each coefficient ti is a parameter equal to a nonleading variable. Thus, if thislinear combination vanishes, then each ti = 0 (as for s and t in Example 3, each ti is acoefficient when X is expressed as a linear combination of the standard basis of �n ).This proves that {X1, X2, … , Xn − r} is linearly independent, and so is a basis of null A.This proves the first part of the following theorem.

Theorem 3

Let A denote an m × n matrix of rank r.1. If X1, X2, … , Xn − r are the basic solutions of the homogeneous system AX = 0

that are produced by the gaussian algorithm, then {X1, X2, … , Xn − r } is a basis of null A. In particular

2. We have im A = col A so the rank theorem provides a basis of im A.In particular,

PROOF

It remains to prove (2). But im A = col A by Example 8 §5.1, so dim(im A ) = dim(col A ) = r. The rest follows from Theorem 2.

Let A be an m × n matrix. Corollary 3 of the rank theorem asserts that rank A ≤ m and rank A ≤ n, and it is natural to ask when these extreme cases arise. If are the columns of A, Theorem 2 §5.2 shows that spans �m if and only if the system AX = B is consistent for every B in �m, and that is independent if and only if AX = 0, X in �n, implies X = 0. The next two theorems improve on both these results, and relate them to when therank of A is n or m.

Theorem 4

The following are equivalent for an m × n matrix A:1. rank A = n.2. The rows of A span �n.3. The columns of A are linearly independent in �m.

{ }C C Cn1 2, , ,…

{ }C C Cn1 2, , ,…C C Cn1 2, , ,…

dim(im ) .A r=

dim(null ) .A n r= −

null { , , , }.A X X Xn r= −span 1 2 …

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 224

Page 29: Linear Algebra Chapter 5

4. The n × n matrix ATA is invertible.5. CA = In for some n × m matrix C. 6. If AX = 0, X in �n, then X = 0.

PROOF

(1) ⇒ (2). We have row A ⊆ �n, and dim(row A) = n by (1), so row A = �n byTheorem 8 §5.2. This is (2).

(2) ⇒ (3). By (2), row A = �n, so rank A = n. This means dim(col A) = n. Since then columns of A span col A, they are independent by Theorem 7 §5.2.

(3) ⇒ (4). If (ATA)X = 0, X in �n, we show that X = 0 (Theorem 5 §2.3). We have

.

Hence AX = 0, so X = 0 by (3) and Theorem 2 §5.2.(4) ⇒ (5). Given (4), take C = (ATA)−1 AT. (5) ⇒ (6). If AX = 0, then left multiplication by C (from (5)) gives X = 0.(6) ⇒ (1). Given (6), the columns of A are independent by Theorem 2 §5.2.

Hence dim(col A) = n, and (1) follows.

Theorem 5

The following are equivalent for an m × n matrix A: 1. rank A = m. 2. The columns of A span �m.3. The rows of A are independent in �n. 4. The m × m matrix AAT is invertible. 5. AC = Im for some n × m matrix C.6. The system AX = B is consistent for every B in �m.

PROOF

(1) ⇒ (2). By (1), dim(col A) = m, so col A = �m by Theorem 8 §5.2. (2) ⇒ (3). By (2), col A = �m, so rank A = m. This means dim(row A) = m. Since

the m rows of A span row A, they are independent by Theorem 7 §5.2. (3) ⇒ (4). We have rank A = m by (3), so the n × m matrix AT has rank m. Hence

applying Theorem 4 to AT in place of A shows that (AT)TAT is invertible, proving (4).(4) ⇒ (5). Given (4), take C = AT(AAT)−1 in (5).(5) ⇒ (6). Comparing columns in AC = Im gives ACj = Ej for each j, where Cj and

Ej denote column j of C and Im respectively. Given B in �m, write , rj in �. Then (6) holds with as the reader can verify.

(6) ⇒ (1). Given (6), the columns of A span �m by Theorem 2 §5.2. Thus col A = �m and (1) follows.

Example 4Show that is invertible if x, y, and z are all distinct.

32 2 2

x y z

x y z x y z

+ +

+ + + +

X r Cj jjm=

=∑ 1

B r Ej jjm= =∑ 1

AX AX AX X A AX XT T T T2 0 0= = = =( )

225Section 5.4 Rank of a Matrix

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 225

Page 30: Linear Algebra Chapter 5

Solution The given matrix has the form ATA where has independent

columns (verify). Hence Theorem 4 applies.

Theorems 4 and 5 relate several important properties of an m × n matrix A to theinvertibility of the square, symmetric matrices ATA and AAT. In fact, even if thecolumns of A are not independent or do not span �m, the matrices ATA and AAT areboth symmetric and, as such, have real eigenvalues as we shall see. We return to thisin Chapter 7.

Exercises 5.4

Axyz

=

111

226 Chapter 5 The Vector Space �n

1. In each case find bases for the row and columnspaces of A and determine the rank of A.

2. In each case find a basis of the subspace U.(a) U = span{[1 −1 0 3], [2 1 5 1], [4 −2 5 7]}

♦(b) U = span{[1 −1 2 5 1], [3 1 4 2 7], [1 1 0 0 0], [5 1 6 7 8]}

(c)

♦(d)

3. (a) Can a 3 × 4 matrix have independentcolumns? Independent rows? Explain.

♦(b) If A is 4 × 3 and rank A = 2, can A haveindependent columns? Independent rows?Explain.

(c) If A is an m × n matrix and rank A = m,show that m ≤ n.

♦(d) Can a nonsquare matrix have its rowsindependent and its columns independent?Explain.

(e) Can the null space of a 3 × 6 matrix havedimension 2? Explain.

♦(f ) If A is not square, show that either the rowsof A or the columns of A are not linearlyindependent.

4. (a) Show that rank UA ≤ rank A, with equality ifU is invertible.

(b) Show that rank AV ≤ rank A, with equality ifV is invertible.

5. Show that rank (AB) ≤ rank A and that rank (AB) ≤ rank B.

6. Show that the rank does not change when anelementary row or column operation isperformed on a matrix.

7. In each case find a basis of the null space of A.Then compute rank A and verify (1) of Theorem 3.

♦(b)

8. Let A = CR where C ≠ 0 is a column in �m andR ≠ 0 is a row in �n.(a) Show that col A = span{C } and

row A = span{R}.♦(b) Find dim(null A).

(c) Show that null A = null R.

A =

3 5 5 2 01 0 2 2 11 1 1 2 22 0 4 4 2

− −− − − −

(a) A =

3 1 12 0 14 2 11 1 1−

U =−

span156

268

37

10

48

12

U =

span

1100

0011

1010

0101

♦(d) A =−

− − −

1 2 1 33 6 3 2

( )c A =

− −− −

− −− −

1 1 5 2 22 2 2 5 10 0 12 9 31 1 7 7 1

(a) (b)A A=

−−−−

=

−−

−−

2 4 6 82 1 3 24 5 9 100 1 1 2

2 1 12 1 14 2 36

33 0

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 226

Page 31: Linear Algebra Chapter 5

SECTION 5.5 Similarity and DiagonalizationIn Section 3.3 we studied diagonalization of a square matrix A, and found importantapplications (for example to linear dynamical systems). We can now utilize the con-cepts of subspace, basis, and dimension to clarify the diagonalization process, revealsome new results, and prove some theorems which could not be demonstrated inSection 3.3.

Before proceeding, we introduce a notion that simplifies the discussion of diagonalization, and is used throughout the book.

Similar MatricesIf A and B are n × n matrices, we say that A and B are similar, and write A ∼ B, ifB = P −1AP for some invertible matrix P, equivalently (writing Q = P −1) if B = QAQ−1

for an invertible matrix Q. The language of similarity is used throughout linear algebra. For example, a matrix A is diagonalizable if and only if it is similar to adiagonal matrix.

If A ∼ B, then necessarily B ∼ A. To see why, suppose that B = P −1AP. ThenA = PBP −1 = Q−1BQ where Q = P −1 is invertible. This proves the second of thefollowing properties of similarity (the others are left as an exercise):

1. A ∼ A for all square matrices A.2. If A ∼ B, then B ∼ A. (∗)3. If A ∼ B and B ∼ C, then A ∼ C.

227Section 5.5 Similarity and Diagonalization

9. Show that null A = 0 if and only if the columnsof A are independent.

10. Let A be an n × n matrix.

(a) Show that A2 = 0 if and only if col A ⊆ null A.

♦(b) Conclude that if A2 = 0, then rank

(c) Find a matrix A for which col A = null A.

11. If A is m × n and B is n × m, show that AB = 0 ifand only if col B ⊆ null A.

♦12. If A is an m × n matrix, show that col A = {AX | X in �n}.

13. Let A be an m × n matrix with columns C1, C2, … , Cn. If rank A = n, show that {ATC1, A

TC2, … , ATCn} is a basis of �n.

14. If A is m × n and B is m × 1, show that B liesin the column space of A if and only if rank[A B] = rank A.

15. (a) Show that AX = B has a solution if and onlyif rank A = rank[A B]. [Hint: Exercises 12and 14.]

♦(b) If AX = B has no solution, show that rank[A B] = 1 + rank A.

16. Let X be a k × m matrix. If I is the m × midentity matrix, show that I + XTX is invertible.

[Hint: I + XTX = ATA where A = in

block form.]

17. If A is m × n of rank r, show that A can befactored as A = PQ where P is m × r with rindependent columns, and Q is r × n with r

independent rows. [Hint: Let by

Theorem 3, §2.4, and write and

in block form, where U1 and V1

are r × r.]

18. (a) Show that if A and B have independentcolumns, so does AB.

(b) Show that if A and B have independentrows, so does AB.

19. A matrix obtained from A by deleting rows andcolumns is called a submatrix of A. If A has aninvertible k × k submatrix, show that rank A ≥ k.[Hint: Show that row and column operations

carry in block form.] Remark: It

can be shown that rank A is the largest integer rsuch that A has an invertible r × r submatrix.

AI P

Qk→

0

VV VV V

− =

1 1 2

3 4

UU UU U

− =

1 1 2

3 4

UAVIr=

00 0

IX

A n≤ 2 .

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 227

Page 32: Linear Algebra Chapter 5

These properties are often expressed by saying that the similarity relation ∼ is anequivalence relation on the set of n × n matrices. Here is an example showing howthese properties are used.

Example 1If A is similar to B and either A or B is diagonalizable, show that the other isalso diagonalizable.

Solution We have A ∼ B. Suppose that A is diagonalizable, say A ∼ D where D isdiagonal. Since B ∼ A by (2) of (∗), we have B ∼ A and A ∼ D. Hence B ∼ D by(3) of (∗), so B is diagonalizable too. An analogous argument works if weassume instead that B is diagonalizable.

Similarity is compatible with inverses, transposes, and powers in the following sense: If A ∼ B, then A−1 ∼ B−1, AT ∼ BT, and Ak ∼ Bk for all integers k ≥ 1 (the proofs are routine matrix computations using Theorem 1 §3.3). Thus, for example, if A is diagonalizable, so also is AT, A−1 (if it exists), and Ak (for each k ≥ 1).Indeed, if A ∼ D where D is a diagonal matrix, we obtain AT ∼ DT, A−1 ∼ D−1, and Ak ∼ Dk, and each of the matrices DT, D−1, and Dk is diagonal.

We pause to introduce a simple matrix function that will be referred to later. The trace tr A of an n × n matrix A is defined to be the sum of the main diagonalelements of A. In other words:

It is evident that tr(A + B) = tr A + tr B and that tr(cA) = c tr A holds for all n × nmatrices A and B and all scalars c. The following fact is more surprising.

Lemma 1

Let A and B be n × n matrices. Then tr(AB) = tr(BA).

PROOF

Write A = [ai j] and B = [bi j]. For each i, the (i, i )-entry of the matrix AB is Hence

.

Similarly we have . Since these two double sums are the same, Lemma 1 is proved.

As the name indicates, similar matrices share many properties, some of which arecollected in the next theorem for reference.

Theorem 1

If A and B are similar n × n matrices, then A and B have the same determinant,rank, trace, characteristic polynomial, and eigenvalues.

tr( ) ( )BA b aij jiji= ∑∑

tr( ) ( )AB d d d d a bn i i i j ij ji= + + + = =∑ ∑ ∑1 2

d a b a b a b a bi i i i i in ni ij jij= + + + = ∑1 1 2 2 .

If then trA a A a a aij nn= = + + +[ ], .11 22

228 Chapter 5 The Vector Space �n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 228

Page 33: Linear Algebra Chapter 5

PROOF

Let B = P −1AP for some invertible matrix P. Then we have det B = det(P −1) det A det P = det A because det(P −1) = 1/det P. Similarly, rank B = rank(P −1AP ) = rank A by Corollary 4 of Theorem 2 §5.4. Next Lemma 1gives

As to the characteristic polynomial,

Finally, this shows that A and B have the same eigenvalues because the eigenvaluesof a matrix are the roots of its characteristic polynomial.

Example 2

The matrices have the same determinant, rank,

trace, characteristic polynomial, and eigenvalues, but they are not similar because P −1IP = I for any invertible matrix P. Hence sharing the five properties in Theorem 1 does not guarantee that two matrices are similar.

Diagonalization Revisited Recall that a square matrix A is diagonalizable if there exists an invertible matrix Psuch that P−1AP = D is a diagonal matrix, that is if A is similar to a diagonal matrix D.

Unfortunately, not all matrices are diagonalizable, for example the matrix

(see Example 8 §3.3). Determining whether A is diagonalizable is closely related to the eigenvalues and eigenvectors of A. Recall that a number � is called aneigenvalue of A if AX = �X for some nonzero column X in �n, and any suchnonzero vector X is called an eigenvector of A corresponding to � (or simplya �-eigenvector of A). The eigenvalues and eigenvectors of A are closely related tothe characteristic polynomial cA(x) of A, defined by

.

If A is n × n this is a polynomial of degree n, and its relationship to the eigenvaluesis given in the following theorem (a repeat of Theorem 2 §3.3).

Theorem 2

Let A be an n × n matrix. 1. The eigenvalues � of A are the roots of the characteristic polynomial cA(x) of A.2. The �-eigenvectors X are the nonzero solutions to the homogeneous system

of linear equations with �I − A as coefficient matrix.

( )�I A X− = 0

c x xI AA ( ) det( )= −

1 10 1

A I= =1 10 1

and 1 00 1

c x xI B x P IP P AP

P xI A PxI

B ( ) det{ } det{ ( ) }

det{ ( ) }det(

= − = −

= −=

− −

1 1

1

−−=

Ac xA

)( ).

tr tr tr tr( ) [ ( )] [( ) ] .P AP P AP AP P A− − −= = =1 1 1

229Section 5.5 Similarity and Diagonalization

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 229

Page 34: Linear Algebra Chapter 5

Example 3Show that the eigenvalues of a triangular matrix are the main diagonal entries.

Solution Assume that A is triangular. Then the matrix xI − A is also triangular and hasdiagonal entries (x − a11), (x − a22), … , (x − ann) where A = [ai j]. HenceTheorem 4 §3.1 gives

and the result follows because the eigenvalues are the roots of cA(x).

Theorem 3 §3.3 asserts (in part) that an n × n matrix A is diagonalizable if andonly if it has n eigenvectors X1, … , Xn such that the matrix P = [X1 … Xn ] with theXi as columns is invertible. This is equivalent to requiring that {X1, … , Xn} is a basisof �n consisting of eigenvectors of A. Hence we can restate Theorem 3 §3.3 asfollows:

Theorem 3

Let A be an n × n matrix.1. A is diagonalizable if and only if �n has a basis {X1, X2, … , Xn} consisting of

eigenvectors of A.2. When this is the case, the matrix P = [X1 … Xn] is invertible and

P −1AP = diag(�1, �2, … , �n) where, for each i, �i is the eigenvalue of Acorresponding to Xi .

The next result is a basic tool for determining when a matrix is diagonalizable.It reveals an important connection between eigenvalues and linear independence:Eigenvectors corresponding to distinct eigenvalues are necessarily linearlyindependent.

Theorem 4

Let X1, X2, … , Xk be eigenvectors corresponding to distinct eigenvalues �1, �2, … , �k of an n × n matrix A. Then {X1, X2, … , Xk} is a linearly independent set.

PROOF

We use induction on k. If k = 1, then {X1} is independent because X1 ≠ 0. In general,suppose the theorem is true for some k ≥ 1. Given eigenvectors {X1, X2, … , Xk + 1},suppose a linear combination vanishes:

(∗)

We must show that each ti = 0. Left multiply (∗) by A and use the fact thatAXi = �iXi to get

(∗∗)

If we multiply (∗) by �1 and subtract the result from (∗∗), the first terms cancel andwe obtain

t X t X t Xk k k2 2 1 2 3 3 1 3 1 1 1 1 0( ) ( ) ( ) .� � � � � �− + − + + − =+ + +

t X t X t Xk k k1 1 1 2 2 2 1 1 1 0� � �+ + + =+ + + .

t X t X t Xk k1 1 2 2 1 1 0+ + + =+ + .

c x x a x a x aA nn( ) ( )( ) ( )= − − −11 22

230 Chapter 5 The Vector Space �n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 230

Page 35: Linear Algebra Chapter 5

Since X2, X3, … , Xk + 1 correspond to distinct eigenvalues �2, �3, … , �k + 1, the set{X2, X3, … , Xk + 1} is independent by the induction hypothesis. Hence,

and so t2 = t3 = … = tk + 1 = 0 because the �i are distinct. Hence (∗) becomest1X1 = 0, which implies that t1 = 0 because X1 ≠ 0. This is what we wanted.

Theorem 4 will be applied several times; we begin by using it to give a useful testfor when a matrix is diagonalizable.

Theorem 5

If A is an n × n matrix with n distinct eigenvalues, then A is diagonalizable.

PROOF

Choose one eigenvector for each of the n distinct eigenvalues. Then theseeigenvectors are independent by Theorem 4, and so are a basis of �n byTheorem 7 §5.2. Now use Theorem 3.

Example 4

Show that is diagonalizable.

Solution A routine computation shows that cA(x) = (x − 1)(x − 3)(x + 1) and so hasdistinct eigenvalues 1, 3, and −1. Hence Theorem 5 applies.

However, a matrix can have multiple eigenvalues as we saw in Section 3.3. Todeal with this situation, we prove an important lemma which formalizes a techniquethat is basic to diagonalization, and which will be used three times below.

Lemma 2

Let {X1, X2, … , Xk} be a linearly independent set of eigenvectors of an n × nmatrix A, extend it to a basis {X1, X2, … , Xk, … , Xn} of �n (by Theorem 6 §5.2),and let

be the (invertible) matrix with the Xi as its columns. If �1, �2, … , �k are the (not necessarily distinct) eigenvalues of A corresponding to X1, X2, … , Xkrespectively, then P−1AP has block form

where B and A1 are matrices of size k × (n − k) and (n − k ) × (n − k ) respectively.

P APBA

k− =

1 1 2

10diag ( , , ... , )� � �

P X X Xn= [ ]1 2

A =−

1 0 01 2 31 1 0

t t tk k2 2 1 3 3 1 1 1 10 0 0( ) , ( ) , , ( )� � � � � �− − … −= = =+ +

231Section 5.5 Similarity and Diagonalization

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 231

Page 36: Linear Algebra Chapter 5

PROOF

If {E1, E2, … , En} is the standard basis of �n, then

Comparing columns, we have P −1Xi = Ei for each 1 ≤ i ≤ n. On the other hand,observe that

Hence, if 1 ≤ i ≤ k, column i of P −1AP is

This describes the first k columns of P −1AP, and Lemma 2 follows.

Note that Lemma 2 (with k = n) shows that an n × n matrix A is diagonalizable if �n has a basis of eigenvectors of A, as in (1) of Theorem 3.

If � is an eigenvalue of an n × n matrix A, write

This is a subspace of �n called the eigenspace of A corresponding to � (see Example 3§5.1) and the eigenvectors corresponding to � are just the nonzero vectors in E�(A).In fact E�(A) is the null space of the matrix (�I − A):

Hence, by Example 7 §5.1, the basic solutions of the homogeneous system (�I − A)X = 0 given by the gaussian algorithm form a basis for E�(A). In particular

. (∗∗∗)

Now recall that the multiplicity of an eigenvalue � of A is the number of times �occurs as a root of the characteristic polynomial. In other words, the multiplicity of� is the largest integer m ≥ 1 such that

for some polynomial g(x). Because of (∗∗∗), the assertion (without proof ) inTheorem 4 §3.3 can be stated as follows: A square matrix is diagonalizable if andonly if the multiplicity of each eigenvalue � equals dim[E�(A)]. We are going toprove this, and the proof requires the following result which is valid for any squarematrix, diagonalizable or not.

Lemma 3

Let � be an eigenvalue of multiplicity m of a square matrix A. Then dim[E�(A)] ≤ m.

c x x g xAm( ) ( ) ( )= − �

dim[ ( )]E A� is the number of basic solutions to the system (( )�I A X− = 0

E A X I A X I A� � �( ) { ( ) } ( ).= − = = −| 0 null

E A X AX Xn� �( ) { }.= = in � |

( ) ( ) ( ) .P A X P X P X Ei i i i i i i− − −= = =1 1 1

� � �

P AP P A X X X P AX P AX P AXn n− − − − −= =1 1 [ ] [ ].1 2

11

12

1

[ ] [ ]

1 2

1 11 2E E E I P P P X X Xn n n= = =− −

[ ].11

12

1= − − −P X P X P Xn

232 Chapter 5 The Vector Space �n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 232

Page 37: Linear Algebra Chapter 5

PROOF

Write dim[E�(A)] = d. It suffices to show that cA(x) = (x − �)dg(x) for some poly-nomial g(x ), because m is the highest power of (x − �) that divides cA(x). To thisend, let {X1, X2, … , Xd } be a basis of E�(A). Then Lemma 2 shows that aninvertible n × n matrix P exists such that

in block form, where Id denotes the d × d identity matrix. Now write A′ = P −1APand observe that cA′(x) = cA(x) by Theorem 1. But Theorem 5 §3.1 gives

Hence cA(x) = cA′(x) = (x − �)dg(x) where g(x) = cA1(x). This is what we wanted.

It is impossible to ignore the question when equality holds in Lemma 3 for eacheigenvalue �. It turns out that this characterizes the diagonalizable matrices. Thiswas stated without proof in Theorem 4 §3.3.

Theorem 6

The following are equivalent for a square matrix A: 1. A is diagonalizable.2. dim[E�(A)] equals the multiplicity of � for every eigenvalue � of the matrix A.

PROOF

Let A be n × n and let �1, �2, … , �k be the distinct eigenvalues of A. For each i, let mi denote the multiplicity of �i and write di = dim[E�i(A)]. Then

so m1 + … + mk = n because cA(x) has degreen. Moreover, di ≤ mi for each i by Lemma 3.

(1) ⇒ (2). By (1), �n has a basis of n eigenvectors of A, so let ti of them lie in E�i(A)for each i. Since the subspace spanned by these ti eigenvectors has dimension ti , we have ti ≤ di for each i by Theorem 4 §5.2. Hence

It follows that d1 + … + dk = m1 + … + mk, so, since di ≤ mi for each i, we must have di = mi. This is (2).

(2) ⇒ (1). Let Bi denote a basis of E�i(A) for each i, and let B = B1 � … � Bk.

Since each Bi contains mi vectors by (2), and since the Bi are pairwise disjoint (the �iare distinct), it follows that B contains n vectors. So it suffices to show that B islinearly independent (then B is a basis of �n ). Suppose a linear combination of thevectors in B vanishes, and let Yi denote the sum of all terms that come from Bi .

n t t d d m m nk k k= + + ≤ + + ≤ + + =1 1 1 .

c x x x xAm m

kmk( ) ( ( )= − − −� � �1 2

1 2) ( )

c x xI Ax I B

xI A

x I

A nd

n d

d

′−

= − ′ =− −

= −

( ) det( ) det( )

det [( )

0 1

]] det [ ]

( ) ( ).

xI A

x c x

n d

dA

− −

= −

1

1 �

P API B

Ad− =

1

10�

233Section 5.5 Similarity and Diagonalization

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 233

Page 38: Linear Algebra Chapter 5

Then Yi lies in E�i(A) for each i, so the nonzero Yi are independent by Theorem 4(as the �i are distinct). Since the sum of the Yi is zero, it follows that Yi = 0 for eachi. Hence all coefficients of terms in Yi are zero (because Bi is independent). Thisshows that B is independent.

Example 5If show that A is diagonalizable

but B is not.

Solution We have cA(x) = (x + 3)2(x − 1) so the eigenvalues are �1 = −3 and �2 = 1. Thecorresponding eigenspaces are E�1

(A) = span{X1, X2} and E�2(A) = span{X3}

where

as the reader can verify. Since {X1, X2} is independent, we have dim(E�1(A)) = 2

which is the multiplicity of �1. Similarly, dim(E�2(A)) = 1 equals the multiplicity

of �2. Hence A is diagonalizable by Theorem 6, and a diagonalizing matrix is P = [X1 X2 X3].

Turning to B, cB(x) = (x + 1)2(x − 3) so the eigenvalues are �1 = −1 and �2 = 3.The corresponding eigenspaces are E�1

(B) = span{Y1} and E�2(B) = span{Y2} where

Here dim(E�1(B)) = 1 is smaller than the multiplicity of �1, so the matrix B is

not diagonalizable, again by Theorem 6. The fact that dim(E�1(B)) = 1 means

that there is no possibility of finding three linearly independent eigenvectors.

Complex EigenvaluesAll the matrices we have considered have had real eigenvalues. But this need not be

the case: The matrix has characteristic polynomial cA(x) = x2 + 1 which

has no real roots. Nonetheless, this matrix is diagonalizable; the only difference is that we must use a larger set of scalars, the complex numbers. The basic propertiesof these numbers are outlined in Appendix A.

Indeed, nearly everything we have done for real matrices can be done for com-plex matrices. The methods are the same; the only difference is that the arithmeticis carried out with complex numbers rather than real ones. For example, thegaussian algorithm works in exactly the same way to solve systems of linear equa-tions with complex coefficients, matrix multiplication is defined the same way, andthe matrix inversion algorithm works in the same way.

But the complex numbers are better than the real numbers in one respect: Whilethere are polynomials like x2 + 1 with real coefficients that have no real root, thisproblem does not arise with the complex numbers: Every nonconstant polynomialwith complex coefficients has a complex root, and hence factors completely asa product of linear factors. This fact is known as the Fundamental Theorem ofAlgebra, and was first proved by Gauss.12

A =−

0 11 0

Y YT T1 21 2 1 5 6 1= − = −[ ] , [ ] .

X X XT T T1 2 31 1 0 2 0 1 2 1 1= − = − = −[ ] , [ ] , [ ]

A B=− − −

= −− −

5 8 164 1 84 4 11

2 1 12 1 21 0 2

and ,

234 Chapter 5 The Vector Space �n

12 This was a famous open problem in 1799 when Gauss solved it at the age of 22 in his Ph.D. dissertation.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 234

Page 39: Linear Algebra Chapter 5

Example 6Diagonalize the matrix

Solution The characteristic polynomial of A is

where i 2 = −1. Hence the eigenvalues are �1 = i and �2 = − i, with correspond-

ing eigenvectors . Hence A is diagonalizable by the

complex version of Theorem 5, and the complex version of Theorem 3 shows

that is invertible and

Of course, this can be checked directly.

We shall return to complex linear algebra in Section 8.6.

Symmetric Matrices13

On the other hand, many of the applications of linear algebra involve a real matrix Aand, while A will have complex eigenvalues by the Fundamental Theorem of Algebra,it is always of interest to know when the eigenvalues are, in fact, real. While this canhappen in a variety of ways, it turns out to hold whenever A is symmetric. Thisimportant theorem will be used extensively later. Surprisingly, the theory of complexeigenvalues can be used to prove this useful result about real eigenvalues.

If Z is a complex matrix, the conjugate matrix is defined to be the matrixobtained from Z by conjugating every entry. Thus, if Z = [zi j], then Forexample,

Recall that holds for all complex numbers z and w. Itfollows that if Z and W are two complex matrices, then

hold for all complex scalars �. These facts are used in the proof of the following theorem.

Theorem 7

Let A be a symmetric real matrix. If � is any eigenvalue of A, then � is real.14

PROOF

Observe that A– = A because A is real. If � is an eigenvalue of A, we show that � isreal by showing that �

– = �. Let X be a (possibly complex) eigenvector correspondingto �, so that X ≠ 0 and AX = �X. Define c = X TX–.

Z W Z W ZW Z W Z+ = + = =, )and (�� �

z w z w zw z w+ = + = and

If thenZi

i iZ

ii i

=− +

+

=+− −

2 53 4

2 53 4

Z zij= [ ].Z

P APi

i− =

=−

1 1

2

00

00

�.P X X

i i= =

[ ]1 21 1

Xi

Xi1 2

1 1=

=

and

c x xI A x x i x iA ( ) det ( ) ( )( )= − = + = − +2 1

A =−

0 11 0

.

235Section 5.5 Similarity and Diagonalization

13 This discussion uses complex conjugation and absolute value. These topics are discussed in Appendix A.

14 This theorem was first proved in 1829 by the great French mathematician Augustin Louis Cauchy (1789–1857) who is mostremembered for his work in analysis.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 235

Page 40: Linear Algebra Chapter 5

If we write X = [z1 z2 … zn ]T where the zi are complex numbers, we have

Thus c is a real number, and c > 0 because at least one of the zi ≠ 0 (as X ≠ 0). We show that �– = � by verifying that �c = �

–c. We have

At this point we use the hypothesis that A is symmetric and real. This means AT = A = A–, so we continue

as required.

The technique in the proof of Theorem 7 will be used again when we return tocomplex linear algebra in Section 8.6.

Example 7Verify Theorem 7 for every real, symmetric 2 × 2 matrix A.

Solution If we have cA(x) = x2 – (a + c)x + (ac − b2), so the eigenvalues are

given by . But the discriminant

for any choice of a, b, and c. Hence, the eigenvalues are real numbers.

Exercises 5.5

( ) ( ) ( )a c ac b a c b+ − − = − + ≥2 2 2 24 4 0

� = + ± + − −

12

2 24( ) ( ) ( )a c a c ac b

Aa bb c

=

� �

c X A X X A X X AX X X

X X

X Xc

T T T T T

T

T

= = = =

=

==

( ) ( ) ( )

( )

� � �c X X X X AX X X A XT T T T T= = = =( ) ( ) ( ) .

c X X z z z

z

z

z

z z z z z zTn

n

n n= =

= + + +[ ]1 2

1

21 1 2 2

= + + +z z zn12

22 2 .

236 Chapter 5 The Vector Space �n

1. By computing the trace, determinant, and rank,show that A and B are not similar in each case. ♦ =

=−

=

=

(d)

(e)

A B

A B

3 11 2

2 13 2

2 1 11 0 11 1 0

,

,11 2 12 4 23 6 3

1 2 31 1 20 3 5

2 1 36

−− −− −

=−

−−

=−

♦(f ) , A B −− −

3 90 0 0

(a)

(b)

A B

A B

=

=−

=−

=

1 22 1

1 11 1

3 12 1

1 12 1

,

,

=−

=−

(c) A B2 11 1

3 01 1

,

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 236

Page 41: Linear Algebra Chapter 5

2.

are not similar.

3. If A ∼ B, show that:

4. In each case, decide whether the matrix A isdiagonalizable. If so, find P such that P −1AP isdiagonal.

5. If A is invertible, show that AB is similar to BAfor all B.

6. Show that the only matrix similar to a scalarmatrix A = rI, r in �, is A itself.

7. Let � be an eigenvalue of A with correspondingeigenvector X. If B = P −1AP is similar to A, showthat P −1X is an eigenvector of B correspondingto �.

8. If A ∼ B and A has any of the following proper-ties, show that B has the same property.

(a) Idempotent, that is A2 = A. ♦(b) Nilpotent, that is Ak = 0 for some k ≥ 1.

(c) Invertible.

9. Let A denote an n × n upper triangular matrix. (a) If all the main diagonal entries of A are

distinct, show that A is diagonalizable. ♦(b) If all the main diagonal entries of A are equal,

show that A is diagonalizable only if it isalready diagonal.

(c) Show that is diagonalizable but

that is not.

10. Let A be a diagonalizable n × n matrix with eigenvalues �1, �2, … , �n (including multiplicities). Show that:

(a) det A = �1�2 … �n

♦(b) tr A = �1 + �2 + … + �n

11. Given a polynomial p(x) = r0 + r1x + … + rnxn

and a square matrix A, the matrix p(A ) = r0I + r1A + … + rnAn is called theevaluation of p(x) at A. Let B = P −1AP. Showthat p(B) = P −1p(A )P for all polynomials p(x).

12. Let P be an invertible n × n matrix. If A is any n × n matrix, write TP(A ) = P −1AP. Verify that:

(a) TP(I ) = I

♦(b) TP(AB) = TP(A )TP(B)

(c) TP(A + B) = TP(A ) + TP(B)

(d) TP(rA) = rTP(A )

(e) TP(Ak) = [TP(A )]k for k ≥ 1

(f ) If A is invertible, TP(A−1) = [TP(A)]−1.

(g) If Q is invertible, TQ[TP(A)] = TPQ(A).

13. (a) Show that two diagonalizable matrices aresimilar if and only if they have the same eigen-values with the same multiplicities.

♦(b) If A is diagonalizable, show that A ∼ AT.

14. If A is 2 × 2 and diagonalizable, show that C(A ) = {X | XA = AX } has dimension 2 or 4. [Hint: If P −1AP = D, show that X is in C(A ) ifand only if P −1XP is in C (D).]

15. If A is diagonalizable and p(x) is a polynomialsuch that p(�) = 0 for all eigenvalues � of A,show that p(A ) = 0 (see Example 9 §3.3). Inparticular, show cA(A ) = 0. [Remark: cA(A ) = 0for all square matrices A—this is the Cayley–Hamilton theorem (see Theorem 2 §9.4).]

16. Let A be n × n with n distinct real eigenvalues.If AC = CA, show that C is diagonalizable.

1 1 00 1 00 0 2

1 0 10 1 00 0 2

(c) 3 1 62 1 01 0 3− −

♦(d) 4 0 00 2 22 3 1

(a) (b)1 0 01 2 10 0 1

3 0 60 3 05 0 2

(a) (b)

(c) for in (d) for

A B A B

rA rB r A B n

T T

n n

∼ ∼

∼ ∼

♦ − −

1 1

1�

Show that and

1 2 1 02 0 1 11 1 0 14 3 0 0

1 1 3 01 0 1 10 1 4

−−

− 115 1 1 4− − −

237Section 5.5 Similarity and Diagonalization

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 237

Page 42: Linear Algebra Chapter 5

17. Let

(a) Show that x3 – (a2 + b2 + c2)x – 2abc has realroots by considering A.

♦(b) Show that a2 + b2 + c2 ≥ ab + ac + bc byconsidering B.

18. Assume the 2 × 2 matrix A is similar to an uppertriangular matrix. If tr A = 0 = tr A2, show thatA2 = 0.

19. Show that A is similar to AT for all 2 × 2

matrices A. [Hint: Let . If c = 0, treat

the cases b = 0 and b ≠ 0 separately. If c ≠ 0, reduce to the case c = 1 using Exercise 12(d).]

20. Refer to Section 3.4 on linear recurrences.Assume that the sequence x0, x1, x2, … satisfies

xn + k = r0xn + r1xn + 1 + … + rk − 1xn + k − 1

for all n ≥ 0. Define

Then show that:(a) Vn = AnV0 for all n.

(b) cA(x) = xk − rk − 1xk − 1 − … − r1x − r0.

(c) If � is an eigenvalue of A, the eigenspace E�

has dimension 1, and X = (1, �, �2, … , �k − 1)T

is an eigenvector. [Hint: Use cA(�) = 0 toshow that E� = �X.]

(d) A is diagonalizable if and only if the eigen-values of A are distinct. [Hint: See part (c)and Theorem 4.]

(e) If �1, �2, … , �k are distinct real eigenvalues,there exist constants t1, t2, … , tk such that

holds for all n. [Hint: IfD is diagonal with �1, �2, … , �k as the maindiagonal entries, show that An = PDnP −1 hasentries that are linear combinations of

.]� � �1 2n n

kn, , ,…

x t tnn

k kn= + +1 1� �

A

r r r r

V

xx

xk

n

n

n=

=

+

0 1 0 00 0 1 0

0 0 0 1

0 1 2 1

1,

nn k+ −

1

.

Aa bc d

=

Aa b

a cb c

Bc a ba b cb c a

=

=

00

0and .

238 Chapter 5 The Vector Space �n

SECTION 5.6 An Application to Correlation and VarianceSuppose the heights of n men are measured. Such a data set is called asample of the heights of all the men in the population under study, and variousquestions are often asked about such a sample: What is the average height in thesample? How much variation is there in the sample heights, and how can it bemeasured? What can be inferred from the sample about the heights of all men inthe population? How do these heights compare to heights of men in neighbouringcountries? Does the prevalence of smoking affect the height of a man?

The analysis of samples, and of inferences that can be drawn from them, is asubject called mathematical statistics, and an extensive body of information has beendeveloped to answer many such questions. In this section we will describe a fewways that linear algebra can be used.

It is convenient to represent a sample as a sample vectorin �n. This being done, the dot product in �n provides a

convenient tool to study the sample and describe some of the statistical conceptsrelated to it. The most widely known statistic for describing a data set is the samplemean defined by15

The mean is “typical” of the sample values xi, but may not itself be one of them.The number xi − is called the deviation of xi from the mean . The deviation isxx

x

x x x x xn n n i

i

n

= + + + ==∑1

1 21

1( ) .

x

X x x xnT= [ ]1 2

{ }x x xn1 2, , ,…

h h hn1 2, , ,…

15 The mean is often called the “average” of the sample values xi, but statisticians use the term “mean”.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 238

Page 43: Linear Algebra Chapter 5

positive if and it is negative if . Moreover, the sum of these deviations iszero:

(∗)

This is described by saying that the sample mean is central to the sample values xi.If the mean is subtracted from each data value xi, the resulting data are

said to be centred. The corresponding data vector is

and (∗) shows that the mean . For example, the sample is plotted in the diagram. The mean is , and the centred sample

is also plotted. Thus, the effect of centring is to shift thedata by an amount (to the left if is positive) so that the mean moves to 0.

Another question that arises about samples is how much variability there is in thesample ; that is, how widely are the data “spread out” aroundthe sample mean . A natural measure of variability would be the sum of thedeviations of the xi about the mean, but this sum is zero by (∗); these deviationscancel out. To avoid this cancellation, statisticians use the squares of thedeviations as a measure of variability. More precisely, they compute a statistic calledthe sample variance , defined16 as follows:

The sample variance will be large if there are many xi at a large distance from themean , and it will be small if all the xi are tightly clustered about the mean. Thevariance is clearly nonnegative (hence the notation ), and the square root sx of thevariance is called the sample standard deviation.

The sample mean and variance can be conveniently described using the dotproduct. Let

denote the column with every entry equal to 1. If , then, so the sample mean is given by the formula

.

Moreover, remembering that is a scalar, we have , so thecentred sample vector Xc is given by

Thus we obtain a formula for the sample variance:

Linear algebra is also useful for comparing two different samples. To illustratehow, consider two examples.

s X X xx n c n2 1

12 1

12= = −− − 1 .

X X x x x x x x xc nT= − = − − −1 [ ] .1 2

x x x x T1= [ ]x

x n X=1

( )i1

X x x xni1 = + + +1 2

X x x xnT= [ ]1 2

1 = [ ]1 1 1 T

sx2

x

s x x x x x x x xx n n n ii

n2 1

1 12

22 2 1

12

1= − + − + + − = −− −

=∑[ ]( ) ( ) ( ) ( ) .

sx2

( )x xi − 2

xX x x xn

T= [ ]1 2

xx[ ]− − −3 2 1 2 4 TXc =

x = 2X T= −[ ]1 0 1 4 6cx = 0

X x x x x x xc nT= − − −[ ]1 2

x xi −xx

( )x x x nx nx nxii

n

ii

n

− =

− = − = .= =∑ ∑

1 10

x xi <x xi >

239Section 5.6 An Application to Correlation and Variance

16 Since there are n sample values, it seems more natural to divide by n here, rather than by n − 1. The reason for using n − 1is that then the sample variance s2

x provides a better estimate of the variance of the entire population from which the samplewas drawn.

–1 0 1

Sample X

Centred Sample Xc

2 4 6

420-1-2-3

x

xc

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 239

Page 44: Linear Algebra Chapter 5

The following table represents the number of sick days at work per year and theyearly number of visits to a physician for 10 individuals.

The data are plotted in the scatter diagram where it is evident that, roughlyspeaking, the more visits to the doctor the more sick days. This is an example of apositive correlation between sick days and doctor visits.

Now consider the following table representing the daily doses of vitamin C and the number of sick days.

The scatter diagram is plotted as shown and it appears that the more vitamin Ctaken, the fewer sick days. In this case there is a negative correlation between dailyvitamin C and sick days.

In both these situations, we have paired samples, that is observations of twovariables are made for ten individuals: doctor visits and sick days in the first case;daily vitamin C and sick days in the second case. The scatter diagrams point to arelationship between these variables, and there is a way to use the sample tocompute a number, called the correlation coefficient, that measures the degree towhich the variables are associated.

To motivate the definition of the correlation coefficient, suppose two pairedsamples and are given and consider thecentred samples

If xk is large among the xi’s, then the deviation will be positive; and will be negative if xk is small among the xi’s. The situation is similar for Y, and thefollowing table displays the sign of the quantity in all four cases:

Intuitively, if X and Y are positively correlated, then two things happen:

1. Large values of the xi tend to be associated with large values of the yi, and 2. Small values of the xi tend to be associated with small values of the yi.

It follows from the table that, if X and Y are positively correlated, then the dotproduct

is positive. Similarly Xc i Yc is negative if X and Y are negatively correlated. Withthis in mind, the sample correlation coefficient17 r is defined by

X Y x x y yc c i ii

n

i = − −=∑ ( )( )

1

Sign oflarge small

large positive negative( )( ):x x y yx x

yy

i i

i i

i

i

− −ssmall negative positive

( )( )x x y yi i− −

x xk −x xk −

X x x x x x x Y y y y y y yc nT

c nT= − − − = − − −[ ] [ ] .1 2 1 2and

Y y y ynT= [ ]1 2X x x xn

T= [ ]1 2

Individual Vitamin CSick days

1 5 7 0 4 9 2 8 6 35 2 2 6 2 1 4 3

1 2 3 4 5 6 7 8 9 10

22 5

Individual 1 2 3 4 5 6 7 8 9 10Doctor visits 2 6 8 1 5 10 3 9 7 4Sick days 2 4 8 3 55 9 4 7 7 2

240 Chapter 5 The Vector Space �n

17 The idea of using a single number to measure the degree of relationship between different variables was pioneered by FrancisGalton (1822–1911). He was studying the degree to which characteristics of an offspring relate to those of its parents. Theidea was refined by Karl Pearson (1857–1936) and r is often referred to as the Pearson correlation coefficient.

SickDays

SickDays

Doctor Visits

Vitamin C Doses

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 240

Page 45: Linear Algebra Chapter 5

Bearing the situation in �3 in mind, r is the cosine of the “angle” between thevectors Xc and Yc, and so we would expect it to lie between −1 and 1. Moreover, we would expect r to be near 1 (or −1) if these vectors were pointing in the same(opposite) direction, that is the “angle” is near zero (or �).

This is confirmed by Theorem 1 below, and it is also borne out in the examplesabove. If we compute the correlation between sick days and visits to the physician(in the first scatter diagram above) the result is r = 0.90 as expected. On the otherhand, the correlation between daily vitamin C doses and sick days (second scatterdiagram) is r = − 0.84.

However, a word of caution is in order here. We cannot conclude from the secondexample that taking more vitamin C will reduce the number of sick days at work.The (negative) correlation may arise because of some third factor that is related toboth variables. In this case it may be that less healthy people are inclined to takemore vitamin C. Correlation does not imply causation. Similarly, the correlationbetween sick days and visits to the doctor does not mean that having many sick dayscauses more visits to the doctor. A correlation between two variables may point tothe existence of other underlying factors, but it does not necessarily mean that thereis a causality relationship between the variables.

Our discussion of the dot product in �n provides the basic properties of thecorrelation coefficient:

Theorem 1

Let and be (nonzero) paired samples,and let r = r(X, Y ) denote the correlation coefficient. Then:

1. −1 ≤ r ≤ 1.2. r = 1 if and only if there exist a and b > 0 such that yi = a + bxi for each i.3. r = −1 if and only if there exist a and b < 0 such that yi = a + bxi for each i.

PROOF

The Cauchy inequality (Theorem 2 §5.3) proves (1), and also shows that r = ±1 if andonly if one of Xc and Yc is a scalar multiple of the other. This in turn holds if andonly if Yc = bXc for some b ≠ 0, and it is easy to verify that r = 1 when b > 0 and r = −1 when b < 0.

Finally, Yc = bXc means for each i; that is, yi = a + bxi where. Conversely, if yi = a + bxi, then (verify), so

for each i. In other words, Yc = bXc. Thiscompletes the proof.

Properties (2) and (3) in Theorem 1 show that r(X, Y ) = 1 means that there is a linear relation with positive slope between the paired data (so large x values arepaired with large y values). Similarly, r(X, Y ) = −1 means that there is a linearrelation with negative slope between the paired data (so small x values are pairedwith small y values). This is borne out in the two scatter diagrams above.

We conclude by using the dot product to derive some useful formulas for com-puting variances and correlation coefficients. Given samples and , the key observation is the following formula:

(∗∗)X Y X Y n x yc ci i= − .

Y y y ynT= [ ]1 2

X x x xnT= [ ]1 2

( ) ( ) ( )a bx a bx b x xi i+ − + = −y yi − =y a bx= +a y bx= −

y y b x xi i− = −( )

Y y y ynT= [ ]1 2X x x xn

T= [ ]1 2

r r X YX YX Y

c c

c c= , =( ) .

i

241Section 5.6 An Application to Correlation and Variance

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 241

Page 46: Linear Algebra Chapter 5

1. The following table gives IQ scores for 10fathers and their eldest sons. Calculate themeans, the variances, and the correlation coeffi-cient r. (The data scaling formula is useful.)

1 2 3 4 5 6 7 8 9 10

Father’s IQ 140 131 120 115 110 106 100 95 91 86

Son’s IQ 130 138 110 99 109 120 105 99 100 94

Indeed, remembering that and are scalars:

Taking Y = X in (∗∗) gives a formula for the variance of X.

Variance Formula

If x is a sample vector, then .

We also get a convenient formula for the correlation coefficient,

Moreover, (∗∗) and the fact that give:

Correlation Formula

If X and Y are sample vectors, then

Finally, we give a method that simplifies the computations of variances andcorrelations.

Data Scaling

Let and be sample vectors. Givenconstants a, b, c, and d, consider new samples and

where zi = a + bxi for each i and wi = c + dyi for each i.Then:

(a) .(b) , so sz = �b� sx.(c) If b and d have the same sign, then r(X, Y ) = r(Z, W ).

The verification is left as an exercise. For example, if , subtracting 100 yields

. A routine calculation shows that and , so, and .

Exercises 5.6

sx2 14

3 4 67= = .x = − = .100 99 6713

sz2 14

3=z = − 13Z T= − − −[ ]1 2 3 1 0 3

X T= [ ]101 98 103 99 100 97

s b sz x2 2 2=

z a bx= +

W w w wnT= [ ]1 2

Z z z znT= [ ]1 2

Y y y ynT= [ ]1 2X x x xn

T= [ ]1 2

r r X YX Y nx y

n s sx y= , =

−−

( )( )

.i

1

s Xx n c2 1

12= −

r r X YX YX Y

c c

c c= , =( ) .

i

s X nxx n2 1

12 2= −− � �

s Xx n c2 1

12= −

X Y X x Y yX Y y X x Y x yX Y y nx x

c ci ii i i ii

= − −= − − += − −

( ) ( )( ) ( ) ( )( ) (

1 11 1 1 1

nny x y nX Y n x y

) ( ).

+= −i

yx

242 Chapter 5 The Vector Space �n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 242

Page 47: Linear Algebra Chapter 5

SECTION 5.7 An Application to Least Squares ApproximationIn many scientific investigations, data are collected that relate two variables. Forexample, if x is the number of dollars spent on advertising by a manufacturer and yis the value of sales in the region in question, the manufacturer could generate databy spending x1, x2, … , xn dollars at different times and measuring the correspondingsales values y1, y2, … , yn.

Suppose it is known that a linear relationship exists between the variables x and y—in other words, that y = a + bx for some constants a and b. If the data are plotted,the points (x1, y1), (x2, y2), … , (xn, yn) may appear to lie on a straight line andestimating a and b requires finding the “best-fitting” line through these data points.For example, if five data points occur as shown in Figure 5.1, line 1 is clearly abetter fit than line 2. In general, the problem is to find the values of the constantsa and b such that the line y = a + bx best approximates the data in question. Notethat an exact fit would be obtained if a and b were such that yi = a + bxi were truefor each data point (xi, yi). But this is too much to expect. Experimental errors inmeasurement are bound to occur, so the choice of a and b should be made in sucha way that the errors between the observed values yi and the corresponding fittedvalues a + bxi are in some sense minimized.

The first thing we must do is explain exactly what we mean by the best fit of a liney = a + bx to an observed set of data points (x1, y1), (x2, y2), … , (xn, yn). Forconvenience, write the linear function r + sx as

so that the fitted points (on the line) have coordinates (x1, f (x1)), … , (xn, f (xn)).Figure 5.2 is a sketch of what the line y = f (x) might look like. For each i theobserved data point (xi, yi) and the fitted point (xi, f (xi)) need not be the same, andthe distance di between them measures how far the line misses the observed point.For this reason di is often called the error at xi, and a natural measure of how closethe line y = f (x) is to the observed data points is the sum d1 + d2 + … + dn of all theseerrors. However, it turns out to be better to use the sum of squares

as the measure of error, and the line y = f (x) is to be chosen so as to make this sumas small as possible. This line is said to be the least squares approximating line forthe data points (x1, y1), (x2, y2), … , (xn, yn).

The square of the error di is given by for each i, so the quantityS to be minimized is the sum:

S y f x y f x y f xn n= − + − + + −[ ( )] [ ( )] [ ( )] .1 12

2 22 2

d y f xi i i2 2= −[ ( )]

S d d dn= + + +12

22 2

f x r sx( ) = +

243Section 5.7 An Application to Least Squares Approximation

Figure 5.1

Y

XO

(x5, y5)(x4, y4)

(x3, y3)

(x2, y2)(x1, y1)

Line2Line1

Figure 5.2

Y

XO x1

y f x= ( )di

d1

dn

xi xn

(xn, f(xn))(xn, yn)

(xi, yi)

(xi, f(xi))

(x1, y1)(x1, f(x1))

♦2. The following table gives the number of years ofeducation and the annual income (in thousands) of10 individuals. Find the means, the variances, andthe correlation coefficient. (Again the data scalingformula is useful.)

Individual 1 2 3 4 5 6 7 8 9 10

Years of education 12 16 13 18 19 12 18 19 12 14

Yearly income (1000’s) 31 48 35 28 55 40 39 60 32 35

3. If X is a sample vector, and Xc is the centredsample, show that and the standarddeviation of Xc is sx.

4. Prove the data scaling formulas:(a) ♦(b) (c)found on page 242.

cx = 0

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 243

Page 48: Linear Algebra Chapter 5

Note that all the numbers xi and yi are given here: what is required is that the function f be chosen in such a way as to minimize S. Because f(x) = r + sx, this amounts to choosing r and s so as to minimize S, and the problem can be solved using vector techniques. The following notations simplify the discussion:

    Y = [ y1 ]      M = [ 1  x1 ]      Z = [ r ]
        [ y2 ]          [ 1  x2 ]          [ s ]
        [ ⋮  ]          [ ⋮   ⋮ ]
        [ yn ]          [ 1  xn ]

Observe that

    Y − MZ = [ y1 − (r + sx1) ]   [ y1 − f(x1) ]
             [ y2 − (r + sx2) ] = [ y2 − f(x2) ]
             [        ⋮       ]   [      ⋮     ]
             [ yn − (r + sxn) ]   [ yn − f(xn) ]

so the quantity S to be minimized is

    S = ‖Y − MZ‖².

Here Y and M are given and we are asked to find Z such that the length of the vector Y − MZ is as small as possible. To this end, consider the set U of all vectors MZ where Z varies:

    U = { MZ | r and s arbitrary },  where  MZ = [ r + sx1  r + sx2  …  r + sxn ]ᵀ.

This is a subspace of ℝⁿ, and the task is to choose MZ in U as close as possible to Y. If n = 3 and x1, x2, and x3 are distinct, U is the plane containing x⃗ = [x1 x2 x3]ᵀ and v⃗ = [1 1 1]ᵀ, and with normal x⃗ × v⃗ = [x2 − x3  x3 − x1  x1 − x2]ᵀ. In this case, we look for a⃗ = [a0 a1]ᵀ so that y⃗ − Ma⃗ is orthogonal to every vector in the plane U, as in Figure 5.3. This condition¹⁸ makes sense in ℝⁿ, so we look for A = [a0 a1]ᵀ such that (MZ) · (Y − MA) = 0 for all Z in ℝ². This dot product is in ℝⁿ, and it can be written as a dot product in ℝ²:

    0 = (MZ) · (Y − MA) = Z · [Mᵀ(Y − MA)] = Z · (MᵀY − MᵀMA)

for all Z in ℝ². This means that MᵀY − MᵀMA is orthogonal to every vector in ℝ². In particular, it is orthogonal to itself, and so must be zero, and we obtain

    (MᵀM)A = MᵀY.

These are called the normal equations for A, and can be solved using gaussian elimination. Moreover, if at least two of the xi are distinct, the matrix MᵀM can be shown to be invertible, so the solution A is unique. If the solution is A = [a0 a1]ᵀ, the best fitting line is y = a0 + a1x. This proves the following useful theorem.


18 We will revisit this in Chapter 8 where a more rigorous argument will be given.

Figure 5.3 (the subspace U, with Ma⃗ the vector in U closest to y⃗; the difference y⃗ − Ma⃗ is orthogonal to U)
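The derivation above reduces the fitting problem to the 2 × 2 linear system (MᵀM)A = MᵀY. As an illustration only, the following Python sketch builds M and solves the normal equations with NumPy; np.linalg.solve stands in for the gaussian elimination mentioned above, and the data are hypothetical.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 1.9, 3.2, 3.9])

    M = np.column_stack([np.ones_like(x), x])  # i-th row is [1, x_i]
    A = np.linalg.solve(M.T @ M, M.T @ y)      # solve (M^T M)A = M^T Y
    print(f"least squares line: y = {A[0]:.2f} + {A[1]:.2f}x")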


Theorem 1

Suppose that n data points (x1, y1), (x2, y2), …, (xn, yn) are given, where at least two of x1, x2, …, xn are distinct. Put

    Y = [ y1 ]      M = [ 1  x1 ]
        [ y2 ]          [ 1  x2 ]
        [ ⋮  ]          [ ⋮   ⋮ ]
        [ yn ]          [ 1  xn ]

Then the least squares approximating line for these data points has the equation

    y = a0 + a1x

where A = [a0 a1]ᵀ is found by gaussian elimination from the normal equations

    (MᵀM)A = MᵀY.

The condition that at least two of x1, x2, …, xn are distinct ensures that MᵀM is an invertible matrix, so A is unique:

    A = (MᵀM)⁻¹MᵀY.

Example 1

Let data points (x1, y1), (x2, y2), …, (x5, y5) be given as in the accompanying table. Find the least squares approximating line for these data.

    x   1  3  4  6  7
    y   1  2  3  4  5

Solution
In this case we have

    MᵀY = [ y1 + y2 + … + y5       ] = [ 15 ]
          [ x1y1 + x2y2 + … + x5y5 ]   [ 78 ]

    MᵀM = [ 5              x1 + … + x5   ] = [ 5   21  ]
          [ x1 + … + x5    x1² + … + x5² ]   [ 21  111 ]

so the normal equations (MᵀM)A = MᵀY for A = [a0 a1]ᵀ become

    [ 5   21  ] [ a0 ] = [ 15 ]
    [ 21  111 ] [ a1 ]   [ 78 ]

The solution (using gaussian elimination) is a0 = 0.24 and a1 = 0.66 to two decimal places, so the least squares approximating line for these data is y = 0.24 + 0.66x. Note that MᵀM is indeed invertible here (the determinant is 114), and the exact solution is

    A = (MᵀM)⁻¹MᵀY = (1/114) [ 111  −21 ] [ 15 ] = (1/114) [ 27 ] = (1/38) [ 9  ]
                             [ −21    5 ] [ 78 ]           [ 75 ]          [ 25 ]
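As a check, Example 1 can be redone by machine. This sketch (again assuming NumPy is available) should reproduce MᵀM, MᵀY, and A ≈ [0.24, 0.66]ᵀ:

    import numpy as np

    x = np.array([1.0, 3.0, 4.0, 6.0, 7.0])
    y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

    M = np.column_stack([np.ones_like(x), x])
    print(M.T @ M)                            # [[5, 21], [21, 111]]
    print(M.T @ y)                            # [15, 78]
    print(np.linalg.solve(M.T @ M, M.T @ y))  # [9/38, 25/38] ~ [0.24, 0.66]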


Suppose now that, rather than a straight line, we want to find the parabola y = a0 + a1x + a2x² that is the least squares approximation to the data points (x1, y1), …, (xn, yn). In the function f(x) = a0 + a1x + a2x², the three constants a0, a1, and a2 must be chosen to minimize the sum of squares of the errors:

    S = [y1 − f(x1)]² + [y2 − f(x2)]² + … + [yn − f(xn)]².

Choosing a0, a1, and a2 amounts to choosing the (parabolic) function f that minimizes S.

In general, there is a relationship y = f(x) between the variables, and the range of candidate functions is limited: say, to all lines or to all parabolas. The task is to find, among the suitable candidates, the function that makes the quantity S as small as possible. The function that does so is called the least squares approximating function (of that type) for the data points.

As might be imagined, this is not always an easy task. However, if the functions f(x) are restricted to polynomials of degree m,

    f(x) = a0 + a1x + … + amxᵐ,

the analysis proceeds much as before (where m = 1). The problem is to choose the numbers a0, a1, …, am so as to minimize the sum

    S = [y1 − f(x1)]² + [y2 − f(x2)]² + … + [yn − f(xn)]².

The resulting function y = f(x) = a0 + a1x + … + amxᵐ is called the least squares approximating polynomial of degree m for the data (x1, y1), …, (xn, yn). By analogy with the preceding analysis, define

    Y = [ y1 ]      M = [ 1  x1  x1²  …  x1ᵐ ]      A = [ a0 ]
        [ y2 ]          [ 1  x2  x2²  …  x2ᵐ ]          [ a1 ]
        [ ⋮  ]          [ ⋮   ⋮    ⋮       ⋮  ]          [ ⋮  ]
        [ yn ]          [ 1  xn  xn²  …  xnᵐ ]          [ am ]

Then

    Y − MA = [ y1 − (a0 + a1x1 + … + amx1ᵐ) ]   [ y1 − f(x1) ]
             [ y2 − (a0 + a1x2 + … + amx2ᵐ) ] = [ y2 − f(x2) ]
             [               ⋮              ]   [      ⋮     ]
             [ yn − (a0 + a1xn + … + amxnᵐ) ]   [ yn − f(xn) ]

so S is the sum of the squares of the entries of Y − MA. An analysis similar to that for Theorem 1 can be used to prove Theorem 2.
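In computational terms, the only change from the linear case is that M acquires the extra columns x², …, xᵐ. A possible sketch follows; the helper name fit_polynomial and the sample data are ours, introduced only for illustration.

    import numpy as np

    def fit_polynomial(x, y, m):
        # Rows of M are [1, x_i, x_i^2, ..., x_i^m]
        M = np.vander(x, m + 1, increasing=True)
        return np.linalg.solve(M.T @ M, M.T @ y)  # A = [a0, a1, ..., am]

    # Hypothetical data, fitted by a parabola (m = 2).
    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    y = np.array([4.1, 0.9, 0.2, 1.1, 3.8])
    print(fit_polynomial(x, y, 2))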

Theorem 2

Suppose n data points (x1, y1), (x2, y2), …, (xn, yn) are given, where at least m + 1 of x1, x2, …, xn are distinct (in particular n ≥ m + 1). Put


    Y = [ y1 ]      M = [ 1  x1  x1²  …  x1ᵐ ]
        [ y2 ]          [ 1  x2  x2²  …  x2ᵐ ]
        [ ⋮  ]          [ ⋮   ⋮    ⋮       ⋮  ]
        [ yn ]          [ 1  xn  xn²  …  xnᵐ ]

Then the least squares approximating polynomial of degree m for the data points has the equation

    y = a0 + a1x + … + amxᵐ

where A = [a0 a1 … am]ᵀ is found by gaussian elimination from the normal equations

    (MᵀM)A = MᵀY.

The condition that at least m + 1 of x1, x2, …, xn be distinct ensures that the matrix MᵀM is invertible, so A is unique:

    A = (MᵀM)⁻¹MᵀY.

A proof of this theorem is given in Section 8.7 (Theorem 2).

Example 2

Find the least squares approximating quadratic y = a0 + a1x + a2x² for the following data points.

    (−3, 3), (−1, 1), (0, 1), (1, 2), (3, 4)

Solution
This is an instance of Theorem 2 with m = 2. Here

    Y = [ 3 ]      M = [ 1  −3  9 ]
        [ 1 ]          [ 1  −1  1 ]
        [ 1 ]          [ 1   0  0 ]
        [ 2 ]          [ 1   1  1 ]
        [ 4 ]          [ 1   3  9 ]

Hence, multiplying out the products gives

    MᵀM = [ 5   0   20  ]      MᵀY = [ 11 ]
          [ 0   20  0   ]            [ 4  ]
          [ 20  0   164 ]            [ 66 ]


The normal equations for A are

    [ 5   0   20  ] [ a0 ]   [ 11 ]
    [ 0   20  0   ] [ a1 ] = [ 4  ]
    [ 20  0   164 ] [ a2 ]   [ 66 ]

whence, to two decimal places,

    A = [ 1.15 ]
        [ 0.20 ]
        [ 0.26 ]

This means that the least squares approximating quadratic for these data is y = 1.15 + 0.20x + 0.26x².

Least squares approximation can be used to estimate physical constants, as is illustrated by the next example.
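The same sketch style verifies Example 2 (assuming NumPy; the printed coefficients should round to 1.15, 0.20, and 0.26):

    import numpy as np

    x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
    y = np.array([3.0, 1.0, 1.0, 2.0, 4.0])

    M = np.vander(x, 3, increasing=True)      # rows [1, x_i, x_i^2]
    print(np.linalg.solve(M.T @ M, M.T @ y))  # ~ [1.15, 0.20, 0.26]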

Example 3

Hooke’s law in mechanics asserts that the magnitude of the force f required to hold a spring is a linear function of the extension e of the spring (see the accompanying diagram). That is,

    f = ke + e0

where k and e0 are constants depending only on the spring. The following data were collected for a particular spring.

    e   9   11  12  16  19
    f   33  38  43  54  61

(Diagram: a spring stretched by an extension e under a force of magnitude f.)

Find the least squares approximating line f = a0 + a1e for these data, and use it to estimate k.

Solution
Here f and e play the role of y and x in the general theory. We have

    Y = [ 33 ]      M = [ 1   9 ]
        [ 38 ]          [ 1  11 ]
        [ 43 ]          [ 1  12 ]
        [ 54 ]          [ 1  16 ]
        [ 61 ]          [ 1  19 ]

as in Theorem 1, so

    MᵀM = [ 5   67  ]      MᵀY = [ 229  ]
          [ 67  963 ]            [ 3254 ]

Hence the normal equations for A are

    [ 5   67  ] A = [ 229  ]
    [ 67  963 ]     [ 3254 ]

whence, to two decimal places,

    A = [ 7.70 ]
        [ 2.84 ]

The least squares approximating line is f = 7.70 + 2.84e, so the estimate for k is k = 2.84.
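Again for illustration only, the spring data of Example 3 can be fitted the same way; the second entry of A estimates k:

    import numpy as np

    e = np.array([9.0, 11.0, 12.0, 16.0, 19.0])
    f = np.array([33.0, 38.0, 43.0, 54.0, 61.0])

    M = np.column_stack([np.ones_like(e), e])
    A = np.linalg.solve(M.T @ M, M.T @ f)
    print(A)  # ~ [7.70, 2.84], so k is estimated as 2.84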

Exercises 5.7



1. Find the least squares approximating line y = a0 + a1x for each of the following sets of data points.

(a) (1, 1), (3, 2), (4, 3), (6, 4)

♦(b) (2, 4), (4, 3), (7, 2), (8, 1)

(c) (−1, −1), (0, 1), (1, 2), (2, 4), (3, 6)

♦(d) (−2, 3), (−1, 1), (0, 0), (1, −2), (2, −4)


2. Find the least squares approximating quadratic y = a0 + a1x + a2x² for each of the following sets of data points.

(a) (0, 1), (2, 2), (3, 3), (4, 5)

♦(b) (−2, 1), (0, 0), (3, 2), (4, 3)

3. If M is a square invertible matrix, show that A = M⁻¹Y (in the notation of Theorem 2).

♦4. Newton’s laws of motion imply that an object dropped from rest at a height of 100 metres will be at a height

    s = 100 − (1/2)gt²

metres t seconds later, where g is a constant called the acceleration due to gravity. The values of s and t given in the table are observed:

    t   1   2   3
    s   95  80  56

Write x = t², find the least squares approximating line s = a + bx for these data, and use b to estimate g. Then find the least squares approximating quadratic s = a0 + a1t + a2t² and use the value of a2 to estimate g.

5. A naturalist measured the heights yi (in metres) of several spruce trees with trunk diameters xi (in centimetres). The data are as given in the table:

    xi  5  7    8  12   13   16
    yi  2  3.3  4  7.3  7.9  10.1

Find the least squares approximating line for these data and use it to estimate the height of a spruce tree with a trunk of diameter 10 cm.

6. (a) Use m = 0 in Theorem 2 to show that the best-fitting horizontal line y = a0 through the data points (x1, y1), …, (xn, yn) is

    y = (1/n)(y1 + y2 + … + yn),

the average of the y coordinates.

♦(b) Deduce the conclusion in (a) without using Theorem 2.

7. Assume n = m + 1 in Theorem 2 (so M is square). If the xi are distinct, use Theorem 6 §3.2 to show that M is invertible. Deduce that A = M⁻¹Y and that the least squares polynomial is the interpolating polynomial (Theorem 6 §3.2) and actually passes through all the data points.


Supplementary Exercises for Chapter 5

1. In each case either show that the statement is true or give an example showing that it is false. Throughout, X, Y, Z, X1, X2, …, Xn denote vectors in ℝⁿ.

(a) If U is a subspace of ℝⁿ and X + Y is in U, then X and Y are both in U.

♦(b) If U is a subspace of ℝⁿ and rX is in U, then X is in U.

(c) If U is a nonempty set and sX + tY is in U for any s and t whenever X and Y are in U, then U is a subspace.

♦(d) If U is a subspace of ℝⁿ and X is in U, then −X is in U.

(e) If {X, Y} is independent, then {X, Y, X + Y} is independent.

♦(f) If {X, Y, Z} is independent, then {X, Y} is independent.

(g) If {X, Y} is not independent, then {X, Y, Z} is not independent.

♦(h) If all of X1, X2, …, Xn are nonzero, then {X1, X2, …, Xn} is independent.

(i) If one of X1, X2, …, Xn is zero, then {X1, X2, …, Xn} is not independent.

♦(j) If aX + bY + cZ = 0 where a, b, and c are in ℝ, then {X, Y, Z} is independent.

(k) If {X, Y, Z} is independent, then aX + bY + cZ = 0 for some a, b, and c in ℝ.

♦(l) If {X1, X2, …, Xn} is not independent, then t1X1 + t2X2 + … + tnXn = 0 for ti in ℝ not all zero.

(m) If {X1, X2, …, Xn} is independent, then t1X1 + t2X2 + … + tnXn = 0 for some ti in ℝ.

♦(n) Every set of four nonzero vectors in ℝ⁴ is a basis.

(o) No basis of ℝ³ can contain a vector with a component 0.

♦(p) ℝ³ has a basis of the form {X, X + Y, Y} where X and Y are vectors.

(q) Every basis of ℝ⁵ contains one column of I5.

♦(r) Every nonempty subset of a basis of ℝ³ is again a basis of ℝ³.

(s) If {X1, X2, X3, X4} and {Y1, Y2, Y3, Y4} are bases of ℝ⁴, then {X1 + Y1, X2 + Y2, X3 + Y3, X4 + Y4} is also a basis of ℝ⁴.
