geometry in rn - texas a&m universitystecher/linearalgebrapdffiles/chaptersix.pdf · the...


Upload: phamlien

Post on 15-Mar-2018


Chapter 6

Geometry in Rn

In Chapter 2, where we first started our study of vector spaces, a vector was intuitively described as something possessing both magnitude and direction. This led us to the idea of expressing vectors, at least in $\mathbb{R}^2$, as pairs of real numbers. One of the topics we discuss in this chapter is the relation between these pairs of numbers and the length of a vector. We also show how the angle between two vectors is related to their ordered pair representation.

6.1 Length and Dot Product

We first define the length of a vector in $\mathbb{R}^2$ and then show how to compute the angle between two vectors. Let $\mathbf{a} = (a_1, a_2)$ be any vector in $\mathbb{R}^2$. By the length of $\mathbf{a}$ we mean the distance from the point with coordinates $(a_1, a_2)$ to the origin (see Figure 6.1). The Pythagorean theorem tells us that this distance equals $(a_1^2 + a_2^2)^{1/2}$. We therefore define the length of any vector in $\mathbb{R}^2$ or $\mathbb{R}^n$ as follows:

[Figure 6.1: the point $(a_1, a_2)$, with horizontal leg $a_1$ and vertical leg $a_2$.]


Definition 6.1. Let $\mathbf{x} = (x_1, x_2, \dots, x_n)$ be any vector in $\mathbb{R}^n$. The length or norm of $\mathbf{x}$, denoted by $\|\mathbf{x}\|$, is

$$\|\mathbf{x}\| = (x_1^2 + \cdots + x_n^2)^{1/2} = \Bigl(\sum_{j=1}^{n} x_j^2\Bigr)^{1/2} \tag{6.1}$$

Example 1. Compute the lengths of the following vectors:

a. $\|(2,-3)\| = (4 + 9)^{1/2} = 13^{1/2}$

b. $\|(2,-1,3)\| = (4 + 1 + 9)^{1/2} = 14^{1/2}$

c. $\|(7,-8,1,3,6)\| = (49 + 64 + 1 + 9 + 36)^{1/2} = 159^{1/2}$ □
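Formula (6.1) translates directly into a few lines of code. The following is a minimal sketch in Python; the function name `norm` is our own choice for illustration and does not come from the text.

```python
import math

def norm(x):
    """Length of a vector given as a sequence of numbers, following (6.1)."""
    return math.sqrt(sum(xj * xj for xj in x))

# The three vectors of Example 1
print(norm((2, -3)))           # sqrt(13)
print(norm((2, -1, 3)))        # sqrt(14)
print(norm((7, -8, 1, 3, 6)))  # sqrt(159)
```

The same function works in any dimension, since it only sums the squares of whatever components it is given.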

In part b of Example 1, we computed the length of a vector in $\mathbb{R}^3$. Figure 6.2 shows that in this case we may also interpret the length of the vector $(x_1, x_2, x_3)$ as the distance from the point with coordinates $(x_1, x_2, x_3)$ to the origin. The following theorem lists a few useful properties of the norm or length of a vector.

[Figure 6.2: the point $(2,-1,3)$ in $x_1 x_2 x_3$-space, together with the point $(2,-1,0)$ in the $x_1 x_2$-plane.]

Theorem 6.1.

1. Let $\mathbf{x}$ be any vector in $\mathbb{R}^n$. Then $\|\mathbf{x}\| \ge 0$, and $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = \mathbf{0}$.

2. $\|c\mathbf{x}\| = |c|\,\|\mathbf{x}\|$ for any constant $c$ and any vector $\mathbf{x}$.

3. For any two vectors $\mathbf{x}$ and $\mathbf{y}$, we have $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$.

The first two results are almost obvious, both geometrically and analytically, as we will see in a few lines. The third, which is called the triangle inequality, is not quite so obvious, but it does have a nice geometrical interpretation for $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^3$.


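Figure 6.3 shows the result of adding $\mathbf{y}$ to $\mathbf{x}$. The three points $A$, $B$, and $C$ determine a triangle the lengths of whose sides are $\|\mathbf{x}\|$, $\|\mathbf{y}\|$, and $\|\mathbf{x} + \mathbf{y}\|$. Since the shortest path between any two points is the straight line segment connecting them, we see that inequality 3 must indeed be true.

[Figure 6.3: triangle with vertices $A$, $B$, and $C$ whose sides are $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{x} + \mathbf{y}$.]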

Proof of Theorem 6.1.

1. Let $\mathbf{x} = (x_1, \dots, x_n)$. Then $\|\mathbf{x}\| = (x_1^2 + \cdots + x_n^2)^{1/2} \ge 0$. Moreover, $\|\mathbf{x}\| = 0$ if and only if $x_j = 0$ for each $j$, i.e., $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = \mathbf{0}$.

2. Let $c$ be any scalar and $\mathbf{x}$ any vector in $\mathbb{R}^n$. Then

$$\|c\mathbf{x}\| = \Bigl[\sum_{j=1}^{n} (c x_j)^2\Bigr]^{1/2} = (c^2)^{1/2}\Bigl[\sum_{j=1}^{n} x_j^2\Bigr]^{1/2} = |c|\,\|\mathbf{x}\|.$$

3. The reader is asked to prove this property in one of the problems at the end of this section.

Property 2 is used when we wish to construct a vector that has a given direction and length equal to 1. This is accomplished by taking any nonzero vector $\mathbf{x}$ that has the desired direction and then dividing it by its length, for if $\mathbf{u} = \mathbf{x}/\|\mathbf{x}\|$, then $\|\mathbf{u}\| = 1$.

Example 2.

a. $\|-2(1,3)\| = \|(-2,-6)\| = (4 + 36)^{1/2} = 2 \cdot 10^{1/2} = 2\|(1,3)\|$

b. Construct a unit vector that is parallel to the line going from the point $P = (-1, 2)$ to the point $Q = (3, 4)$. See Figure 6.4. By a unit vector we mean one whose length equals 1. A vector that has the desired direction can be found by subtracting the coordinates of the point $P$ from those of $Q$. Thus $\mathbf{x} = (3, 4) - (-1, 2) = (4, 2)$ points in the desired direction, but it may not have length 1. In fact $\|\mathbf{x}\| = (16 + 4)^{1/2} = 20^{1/2}$. Thus, the desired unit vector $\mathbf{u}$ equals $(4, 2)/\sqrt{20}$. □
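The normalization step used in Example 2b can be sketched as follows. This is an illustrative Python fragment, not part of the original text; the helper names `norm` and `unit_vector` are assumptions made for the example.

```python
import math

def norm(x):
    """Length of a vector, as in (6.1)."""
    return math.sqrt(sum(xj * xj for xj in x))

def unit_vector(x):
    """Return x / ||x||. The zero vector has no direction, so it is rejected."""
    length = norm(x)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(xj / length for xj in x)

P, Q = (-1, 2), (3, 4)
x = tuple(q - p for p, q in zip(P, Q))  # Q - P = (4, 2)
u = unit_vector(x)                      # (4, 2) / sqrt(20)
print(x, u, norm(u))
```

Up to floating-point rounding, `norm(u)` comes out as 1, which is exactly what property 2 of Theorem 6.1 predicts.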

[Figure 6.4: the points $P(-1, 2)$ and $Q(3, 4)$.]

We next wish to take two vectors $\mathbf{A} = (a_1, a_2)$ and $\mathbf{B} = (b_1, b_2)$ in $\mathbb{R}^2$ and derive a formula relating their coordinates and the cosine of the angle between them. See Figure 6.5. If we picture the triangle formed by these two vectors, the length of the side opposite the angle $\theta$ equals $\|\mathbf{A} - \mathbf{B}\| = \|(a_1 - b_1, a_2 - b_2)\|$. The law of cosines then implies

$$\|\mathbf{A} - \mathbf{B}\|^2 = \|\mathbf{A}\|^2 + \|\mathbf{B}\|^2 - 2\|\mathbf{A}\|\|\mathbf{B}\|\cos\theta \tag{6.2}$$

Computing these lengths in terms of the coordinates, we have

$$(a_1 - b_1)^2 + (a_2 - b_2)^2 = a_1^2 + a_2^2 + b_1^2 + b_2^2 - 2\|\mathbf{A}\|\|\mathbf{B}\|\cos\theta$$

Squaring the terms in parentheses and then canceling gives us

$$-2a_1 b_1 - 2a_2 b_2 = -2\|\mathbf{A}\|\|\mathbf{B}\|\cos\theta$$

We finally arrive at the formula

$$\cos\theta = \frac{a_1 b_1 + a_2 b_2}{\|\mathbf{A}\|\|\mathbf{B}\|} \tag{6.3}$$

where $\mathbf{A} = (a_1, a_2)$, $\mathbf{B} = (b_1, b_2)$, and $\theta$ is the smaller of the two angles determined by $\mathbf{A}$ and $\mathbf{B}$. We may draw a similar diagram for two vectors $\mathbf{A} = (a_1, a_2, a_3)$ and $\mathbf{B} = (b_1, b_2, b_3)$ in $\mathbb{R}^3$. If the same calculations are performed, we have

$$\cos\theta = \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{\|\mathbf{A}\|\|\mathbf{B}\|} \tag{6.4}$$

[Figure 6.5: the vectors $\mathbf{A} = (a_1, a_2)$ and $\mathbf{B} = (b_1, b_2)$ with the angle $\theta$ between them.]

where again, $\theta$ is the smaller of the two angles between $\mathbf{A}$ and $\mathbf{B}$. The numerator in (6.3) and (6.4) appears so often in various formulas that it has been given a special name.


Definition 6.2. Let $\mathbf{x} = (x_1, \dots, x_n)$ and $\mathbf{y} = (y_1, \dots, y_n)$ be any two vectors in $\mathbb{R}^n$. The dot, or scalar, or inner product, of these two vectors is defined to be

$$\langle \mathbf{x}, \mathbf{y} \rangle = x_1 y_1 + \cdots + x_n y_n = \sum_{j=1}^{n} x_j y_j \tag{6.5}$$

The phrase dot product arises because this operation is commonly denoted by a dot, i.e., $\mathbf{x} \cdot \mathbf{y}$. The term scalar is used because the operation produces a number and not a vector, while the term inner distinguishes this product from the outer product, $\mathbf{x}^T \mathbf{y}$, an $n \times n$ matrix whose $j, k$ entry is $x_j y_k$.

Example 3.

a. $\langle (1, 2), (6, 4) \rangle = 6 + 8 = 14$

b. $\langle (1, 2), (-4, 2) \rangle = -4 + 4 = 0$

c. $\langle (2, -3, 4), (1, 6, -2) \rangle = 2 - 18 - 8 = -24$

d. $\langle (1, 0, 4, 6), (-2, 3, 2, 8) \rangle = -2 + 0 + 8 + 48 = 54$ □
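Definition (6.5) is a componentwise multiply-and-sum, which can be sketched in Python as below. The function name `dot` is our own; the length check is an added safeguard that the text leaves implicit.

```python
def dot(x, y):
    """Inner product of two vectors with the same number of components, as in (6.5)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    return sum(xj * yj for xj, yj in zip(x, y))

# The four pairs of Example 3
print(dot((1, 2), (6, 4)))               # 14
print(dot((1, 2), (-4, 2)))              # 0
print(dot((2, -3, 4), (1, 6, -2)))       # -24
print(dot((1, 0, 4, 6), (-2, 3, 2, 8)))  # 54
```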

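We now rewrite formulas (6.3) and (6.4) as

$$\langle \mathbf{A}, \mathbf{B} \rangle = \|\mathbf{A}\|\|\mathbf{B}\|\cos\theta \tag{6.6}$$

Example 4. Compute the cosine of the angle between the following pairs of vectors:

a. $(1, -2)$ and $(4, 3)$. From (6.6) we have

$$\cos\theta = \frac{\langle (1,-2), (4,3) \rangle}{\sqrt{5}\sqrt{25}} = \frac{4 - 6}{5\sqrt{5}} = \frac{-2}{5\sqrt{5}}$$

b. $(-2, 3, 4)$ and $(3, -1, 8)$

$$\cos\theta = \frac{\langle (-2,3,4), (3,-1,8) \rangle}{\sqrt{29}\sqrt{74}} = \frac{23}{\sqrt{29}\sqrt{74}}$$

c. $(4, 2, -1)$ and $(-3, 4, -4)$

$$\cos\theta = \frac{\langle (4,2,-1), (-3,4,-4) \rangle}{\sqrt{21}\sqrt{41}} = \frac{-12 + 8 + 4}{\sqrt{21}\sqrt{41}} = 0$$

Since $\cos\theta = 0$ if and only if $\theta$ equals 90 degrees, we see that the two vectors $(4, 2, -1)$ and $(-3, 4, -4)$ are perpendicular. □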
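Solving (6.6) for $\cos\theta$ gives a formula that is easy to evaluate numerically. The sketch below is illustrative only; `dot`, `norm`, and `cos_angle` are names chosen here, and both vectors are assumed to be nonzero.

```python
import math

def dot(x, y):
    return sum(xj * yj for xj, yj in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def cos_angle(x, y):
    """cos(theta) = <x, y> / (||x|| ||y||), from (6.6); x and y must be nonzero."""
    return dot(x, y) / (norm(x) * norm(y))

# The three pairs of Example 4
print(cos_angle((1, -2), (4, 3)))         # -2 / (5 sqrt(5))
print(cos_angle((-2, 3, 4), (3, -1, 8)))  # 23 / (sqrt(29) sqrt(74))
print(cos_angle((4, 2, -1), (-3, 4, -4))) # 0.0, so the vectors are perpendicular
```

If the angle itself is wanted, `math.acos` applied to the result returns it in radians, in the range from 0 to $\pi$.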


Two comments are in order. First, formula (6.6) is only valid (at this point) in $\mathbb{R}^2$ or $\mathbb{R}^3$. We later extend (6.6) to vectors in $\mathbb{R}^n$ by defining the cosine of the angle between two vectors to be that number which satisfies (6.6). The second comment, which we state as a theorem, has to do with when two vectors are perpendicular.

Theorem 6.2. Let $\mathbf{x}$ and $\mathbf{y}$ be any two nonzero vectors in $\mathbb{R}^2$ or $\mathbb{R}^3$. Then $\mathbf{x}$ and $\mathbf{y}$ are perpendicular if and only if $\langle \mathbf{x}, \mathbf{y} \rangle = 0$.

Proof. Two nonzero vectors are perpendicular if and only if the angle between them equals 90 degrees; equivalently, the cosine of that angle must equal zero. Using formula (6.6), we see that this happens only when the inner product of the two vectors equals zero.

The inner product $\langle \mathbf{x}, \mathbf{y} \rangle$ of two vectors satisfies many properties, some of which are listed in the next theorem.

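Theorem 6.3. Let $\mathbf{x} = (x_1, \dots, x_n)$ and $\mathbf{y} = (y_1, \dots, y_n)$ be any two vectors in $\mathbb{R}^n$. Then

a. $\langle \mathbf{x}, \mathbf{x} \rangle = \|\mathbf{x}\|^2$

b. $\langle \mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{y}, \mathbf{x} \rangle$

c. Let $a$ and $b$ be any two scalars and let $\mathbf{z}$ be any vector in $\mathbb{R}^n$; then

$$\langle a\mathbf{x} + b\mathbf{y}, \mathbf{z} \rangle = a\langle \mathbf{x}, \mathbf{z} \rangle + b\langle \mathbf{y}, \mathbf{z} \rangle$$

Proof.

a. $\langle \mathbf{x}, \mathbf{x} \rangle = x_1 x_1 + x_2 x_2 + \cdots + x_n x_n = \|\mathbf{x}\|^2$

b. $\langle \mathbf{x}, \mathbf{y} \rangle = x_1 y_1 + \cdots + x_n y_n = y_1 x_1 + \cdots + y_n x_n = \langle \mathbf{y}, \mathbf{x} \rangle$

c. $$\langle a\mathbf{x} + b\mathbf{y}, \mathbf{z} \rangle = \sum_{j=1}^{n} (a x_j + b y_j) z_j = \sum_{j=1}^{n} a x_j z_j + \sum_{j=1}^{n} b y_j z_j = a\langle \mathbf{x}, \mathbf{z} \rangle + b\langle \mathbf{y}, \mathbf{z} \rangle$$

Property a is referred to by saying that the inner product is positive definite, i.e., $\langle \mathbf{x}, \mathbf{x} \rangle \ge 0$, and it equals zero only if $\mathbf{x}$ is the zero vector. The second property is summarized by saying that the inner product is symmetric. The third property is the statement that the dot product is linear in its first argument. Symmetry immediately implies that

$$\langle \mathbf{z}, a\mathbf{x} + b\mathbf{y} \rangle = a\langle \mathbf{z}, \mathbf{x} \rangle + b\langle \mathbf{z}, \mathbf{y} \rangle$$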


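We want to extend formula (6.6) to dimensions higher than three. Before we can do so, however, we need to know that the expression $\langle \mathbf{x}, \mathbf{y} \rangle / (\|\mathbf{x}\|\|\mathbf{y}\|)$ can be interpreted as the cosine of some angle between 0 and 180 degrees. Figure 6.6 shows the graph of $\cos\theta$ for $0 \le \theta \le \pi$ radians. Notice that $\cos\theta$ is a number that always lies between $-1$ and $1$. Thus, we would like the absolute value of $\langle \mathbf{x}, \mathbf{y} \rangle / (\|\mathbf{x}\|\|\mathbf{y}\|)$ to be no greater than 1. This is the content of the next theorem.

[Figure 6.6: graph of $\cos\theta$ for $0 \le \theta \le \pi$, with $\theta = 0$, $\pi/2$, $\pi$ marked on the horizontal axis and $1$, $-1$ on the vertical axis.]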

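Theorem 6.4 (Cauchy–Schwarz inequality). Let $\mathbf{x}$ and $\mathbf{y}$ be any two vectors in $\mathbb{R}^n$. Then

$$|\langle \mathbf{x}, \mathbf{y} \rangle| \le \|\mathbf{x}\|\|\mathbf{y}\| \tag{6.7}$$

Proof. Define the following function of $t$ by

$$f(t) = \|\mathbf{x} + t\mathbf{y}\|^2 = \langle \mathbf{x} + t\mathbf{y}, \mathbf{x} + t\mathbf{y} \rangle = t^2\|\mathbf{y}\|^2 + 2t\langle \mathbf{x}, \mathbf{y} \rangle + \|\mathbf{x}\|^2 \tag{6.8}$$

$f(t)$ is a quadratic function of $t$ and is never negative. If $\mathbf{y} = \mathbf{0}$, then (6.7) is certainly true, since both sides equal zero. Hence, we may assume that $\mathbf{y}$ is not the zero vector. Completing the square we rewrite (6.8) as

$$f(t) = \|\mathbf{y}\|^2\Bigl[t + \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\|\mathbf{y}\|^2}\Bigr]^2 + \|\mathbf{x}\|^2 - \frac{\langle \mathbf{x}, \mathbf{y} \rangle^2}{\|\mathbf{y}\|^2} \tag{6.9}$$

Regardless of the value of $t$ we must have $f(t) \ge 0$. We now pick $t_0$ in order to obtain the minimum value of $f(t)$. Setting $t_0 = -\langle \mathbf{x}, \mathbf{y} \rangle / \|\mathbf{y}\|^2$, we have

$$0 \le f(t_0) = \|\mathbf{x}\|^2 - \Bigl[\frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\|\mathbf{y}\|}\Bigr]^2$$

Thus, we see that $\langle \mathbf{x}, \mathbf{y} \rangle^2 \le [\|\mathbf{x}\|\|\mathbf{y}\|]^2$, from which (6.7) immediately follows.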
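The proof can be checked numerically on a concrete pair of vectors. The sketch below is only a sanity check, not a proof; the sample vectors and the names `dot`, `f`, `t0`, and `gap` are chosen here for illustration.

```python
import math

def dot(x, y):
    return sum(xj * yj for xj, yj in zip(x, y))

def f(t, x, y):
    """f(t) = ||x + t y||^2, the quadratic defined in (6.8)."""
    z = [xj + t * yj for xj, yj in zip(x, y)]
    return dot(z, z)

# Arbitrary sample vectors (an assumption for this illustration); y is nonzero.
x, y = (1.0, 2.0, 3.0), (4.0, -1.0, 2.0)

t0 = -dot(x, y) / dot(y, y)                   # minimizer picked in the proof
gap = dot(x, x) - dot(x, y) ** 2 / dot(y, y)  # value of f(t0) predicted by (6.9)

print(f(t0, x, y), gap)  # the two numbers agree up to rounding
print(abs(dot(x, y)) <= math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))  # True
```

Because `gap` is the minimum of a nonnegative function, it cannot be negative, and that is precisely the inequality (6.7) in squared form.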

Example 5. Write the Cauchy–Schwarz inequality for any two vectors in $\mathbb{R}^2$ in terms of their coordinates.

Solution. If $\mathbf{x} = (x_1, x_2)$ and $\mathbf{y} = (y_1, y_2)$, we have

$$|\langle \mathbf{x}, \mathbf{y} \rangle| = |x_1 y_1 + x_2 y_2| \le (x_1^2 + x_2^2)^{1/2}(y_1^2 + y_2^2)^{1/2}$$ □


Definition 6.3. Let $\mathbf{x}$ and $\mathbf{y}$ be two nonzero vectors in $\mathbb{R}^n$. We define the angle $\theta$ between $\mathbf{x}$ and $\mathbf{y}$ to be that angle which lies between 0 and 180 degrees and satisfies

$$\cos\theta = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\|\mathbf{x}\|\|\mathbf{y}\|}$$

Definition 6.4. We say that two nonzero vectors $\mathbf{x}$ and $\mathbf{y}$ are perpendicular if $\langle \mathbf{x}, \mathbf{y} \rangle$ equals zero.

Definitions 6.3 and 6.4 extend our usual notions of the angle between two vectors, and the concept of perpendicularity, to $\mathbb{R}^n$, for $n$ greater than 3, in a consistent manner.

Example 6. Determine which of the following pairs of vectors are perpendicular.

a. $(1, 0)$ and $(0, 1)$ are perpendicular since $\langle (1, 0), (0, 1) \rangle = 0 + 0 = 0$.

b. $(a, b)$ and $(-b, a)$ are perpendicular since $\langle (a, b), (-b, a) \rangle = -ab + ba = 0$.

c. $(1, 6, 3)$ and $(0, 1, 2)$ are not perpendicular since their inner product, which equals 12, is not zero.

d. $(1, 2, -6, 1)$ and $(0, 1, 4, 3)$ are not perpendicular since their inner product equals $-19$.

e. $(-2, 8, 3, 4)$ and $(6, 4, 4, -8)$ are perpendicular since their dot product equals zero. □
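Definition 6.4 reduces the perpendicularity test to a single inner product. A minimal Python sketch follows; the name `is_perpendicular` is hypothetical, and the exact comparison with zero is only appropriate for integer (or otherwise exact) entries, so a tolerance would be needed for floating-point data.

```python
def dot(x, y):
    return sum(xj * yj for xj, yj in zip(x, y))

def is_perpendicular(x, y):
    """Definition 6.4 for nonzero vectors with exact (e.g. integer) entries."""
    return dot(x, y) == 0

# Parts a, c, d, and e of Example 6
pairs = [
    ((1, 0), (0, 1)),
    ((1, 6, 3), (0, 1, 2)),
    ((1, 2, -6, 1), (0, 1, 4, 3)),
    ((-2, 8, 3, 4), (6, 4, 4, -8)),
]
for x, y in pairs:
    print(x, y, dot(x, y), is_perpendicular(x, y))
```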

Example 7. Show that the diagonals of a rhombus must be perpendicular. A rhombus is a four-sided polygon with all four sides having the same length.

[Figure: rhombus with vertices $\mathbf{O} = (0, 0)$, $\mathbf{x}_1 = (x_1, y_1)$, $\mathbf{x}_2 = (x_2, y_2)$, and $\mathbf{x}_3 = (x_3, y_3)$.]

Solution. The accompanying figure has all four sides the same length. Thus,

$$\|\mathbf{x}_2\| = \|\mathbf{x}_1\| = \|\mathbf{x}_3 - \mathbf{x}_2\| = \|\mathbf{x}_3 - \mathbf{x}_1\|$$

Hence, we have by Theorem 6.3,

$$\begin{aligned}
0 &= \|\mathbf{x}_3 - \mathbf{x}_1\|^2 - \|\mathbf{x}_3 - \mathbf{x}_2\|^2 \\
&= \langle \mathbf{x}_3 - \mathbf{x}_1, \mathbf{x}_3 - \mathbf{x}_1 \rangle - \langle \mathbf{x}_3 - \mathbf{x}_2, \mathbf{x}_3 - \mathbf{x}_2 \rangle \\
&= \langle \mathbf{x}_3, \mathbf{x}_3 \rangle - 2\langle \mathbf{x}_3, \mathbf{x}_1 \rangle + \langle \mathbf{x}_1, \mathbf{x}_1 \rangle - \{\langle \mathbf{x}_3, \mathbf{x}_3 \rangle - 2\langle \mathbf{x}_3, \mathbf{x}_2 \rangle + \langle \mathbf{x}_2, \mathbf{x}_2 \rangle\} \\
&= 2\langle \mathbf{x}_3, \mathbf{x}_2 - \mathbf{x}_1 \rangle + \|\mathbf{x}_1\|^2 - \|\mathbf{x}_2\|^2 \\
&= 2\langle \mathbf{x}_3, \mathbf{x}_2 - \mathbf{x}_1 \rangle
\end{aligned}$$

Thus, the vectors $\mathbf{x}_3$ and $\mathbf{x}_2 - \mathbf{x}_1$ are perpendicular. Since these vectors are parallel to the diagonals of the rhombus, the diagonals must be perpendicular to each other. □

Problem Set 6.1

1. Calculate the lengths of the following vectors:

a. $(1, 2)$  b. $(-1, 3, 6)$  c. $(1, 1, 2, 8)$

2. Find all unit vectors that are parallel to the vector $(1, 2, -4)$.

3. Compute the dot product of each of the following pairs of vectors:

a. $(1, 0)$, $(0, 1)$  b. $(a, b)$, $(b, a)$  c. $(1, 2, 1)$, $(3, -6, 2)$

4. Sketch each of the following pairs of vectors. Compute their inner product and determine the cosine of the angle between them.

a. $(1, 0)$, $(1, 0)$  b. $(1, 0)$, $(1, 1)$  c. $(1, 0)$, $(0, 1)$  d. $(1, 0)$, $(-1, 1)$  e. $(1, 0)$, $(-1, 0)$  f. $(1, 0)$, $(-1, -1)$

5. Find the cosine of the angle between each of the following pairs of vectors:

a. $(1, 2)$, $(3, -1)$  b. $(1, 0, -4)$, $(6, 1, 2)$  c. $(-2, 3, 0, 1)$, $(1, 2, 8, -2)$

6. Show that if $\mathbf{x}$ and $\mathbf{y}$ are any two vectors in $\mathbb{R}^n$ for which $|\langle \mathbf{x}, \mathbf{y} \rangle| = \|\mathbf{x}\|\|\mathbf{y}\|$, then one of them must be a scalar multiple of the other. [Hint: Let $f(t)$ be the function defined in (6.8). Show that the above equality holds if and only if $f(t) = 0$ for some value of $t$.]

7. Use the Cauchy–Schwarz inequality to prove the triangle inequality for any two vectors. (Hint: Compute $\|\mathbf{x} + \mathbf{y}\|^2$.)

8. Let $V$ be the vector space $C[0, 1]$, which consists of all real-valued continuous functions defined on the interval $[0, 1]$. For any two functions $\mathbf{f}(t)$ and $\mathbf{g}(t)$ in $V$ define

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^1 \mathbf{f}(t)\mathbf{g}(t)\,dt$$


a. Let $\mathbf{f}(t) = t^2$ and $\mathbf{g}(t) = 1 - t$. Compute $\langle \mathbf{f}, \mathbf{g} \rangle$.

b. Define the length of a vector $\mathbf{f}$ in $V$ by

$$\|\mathbf{f}\|^2 = \langle \mathbf{f}, \mathbf{f} \rangle = \int_0^1 \mathbf{f}(t)\mathbf{f}(t)\,dt$$

If $\mathbf{f}(t) = \sin n\pi t$, where $n$ is an integer, compute its length.

c. Show that properties 1 and 2 of Theorem 6.1 are valid.

d. Show that $\langle\ ,\ \rangle$ is an inner product in the sense that Theorem 6.3 is valid.

e. Prove the Cauchy–Schwarz inequality for this inner product. (Hint: Repeat the proof of Theorem 6.4.)

9. Let $A = [a_{jk}]$ be any real $m \times n$ matrix. Define $\|A\|^2 = \sum_{j=1}^{m}\sum_{k=1}^{n} a_{jk}^2$.

a. For any vector $\mathbf{x}$ in $\mathbb{R}^n$ show that $\|A\mathbf{x}\| \le \|A\|\|\mathbf{x}\|$.

b. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & -1 \end{bmatrix}$. Compute $\|A\|$ and verify directly that $\|A\mathbf{x}\| \le \|A\|\|\mathbf{x}\|$ for any $\mathbf{x}$ in $\mathbb{R}^2$.

10. The distance between a vector $\mathbf{x}$ and a subspace $W$ is defined to be the smallest value of $\|\mathbf{x} - \mathbf{w}\|$ as $\mathbf{w}$ varies throughout $W$. Let $V$ equal $\mathbb{R}^2$ and $W = S[(1, 1)]$, the span of the vector $(1, 1)$. For each of the following vectors $\mathbf{x}$ compute the distance between $\mathbf{x}$ and $W$.

a. $(2, 2)$  b. $(-1, 1)$  c. $(1, 2)$

In each case determine that vector $\mathbf{w}_0$ in $W$ such that $\|\mathbf{x} - \mathbf{w}_0\|$ equals the distance between $\mathbf{x}$ and $W$. Then show that $\mathbf{x} - \mathbf{w}_0$ is perpendicular to $(1, 1)$ and hence to every vector in $S[(1, 1)] = W$.

11. Let $\mathbf{x} = (x_1, \dots, x_n)$ be any vector in $\mathbb{R}^n$. Show that $|x_j| \le \|\mathbf{x}\|$ for any $j$, and $\|\mathbf{x}\| \le |x_1| + |x_2| + \cdots + |x_n|$.

12. Let $\mathbf{x}_1, \dots, \mathbf{x}_n, \dots$ be a sequence of vectors in $\mathbb{R}^m$. We say that the sequence $\mathbf{x}_n$ converges to $\mathbf{x}$, $\lim_{n\to\infty} \mathbf{x}_n = \mathbf{x}$, if and only if $\lim_{n\to\infty} \|\mathbf{x}_n - \mathbf{x}\| = 0$.

a. Let $\mathbf{x}_n = (1 + (1/n), 0)$. Show that $\mathbf{x}_n$ converges to $(1, 0)$.

b. Let $\mathbf{x}_n$ be a sequence of vectors in $\mathbb{R}^2$. Show that $\lim_{n\to\infty} \mathbf{x}_n = \mathbf{x}$ if and only if the components of $\mathbf{x}_n$ converge to the corresponding components of $\mathbf{x}$; cf. problem 11.

13. Let $\mathbf{x}_n = (1 + (1/n), 2 - (2/n), 3)$. Show that $\mathbf{x}_n$ converges to $(1, 2, 3)$.

14. Show that the result of problem 12b is also true in $\mathbb{R}^n$.


15. Let $A = [a_{jk}]$, $A_p = [a^p_{jk}]$, $p = 1, 2, \dots$. Define the norm of $m \times n$ matrices as in problem 9. Show that $\lim_{p\to\infty} \|A - A_p\| = 0$ if and only if $\lim_{p\to\infty} |a_{jk} - a^p_{jk}| = 0$ for each $j$ and $k$. Note, this is problem 14 where we identify $M_{mn}$ with $\mathbb{R}^{mn}$.

16. Suppose that $\mathbf{x}$ is perpendicular to every vector in some set $A$. Show that $\mathbf{x}$ must then be perpendicular to every vector in $S[A]$.

17. Let $A$ be any $m \times n$ matrix. Show that $A\mathbf{x} = \mathbf{0}$ if and only if $\mathbf{x}$ is perpendicular to every row of $A$. Thus, $\mathbf{x}$ is in $\ker(A)$ if and only if $\mathbf{x}$ is perpendicular to every vector in the row space of $A$.

18. Let $V = \mathbb{R}^2$. For any $\mathbf{y}$ in $V$ the mapping $L[\mathbf{x}] = \langle \mathbf{x}, \mathbf{y} \rangle$ maps $V$ to $\mathbb{R}$.

a. Show that $L$ is a linear transformation.

b. Using the standard bases in $V$ and $\mathbb{R}$, what is the matrix representation of $L$?

c. Repeat parts a and b where $V$ is now $\mathbb{R}^n$.

19. Let the triangle with vertices $A$, $O$, and $B$ be an isosceles triangle with equal angles at $O$ and $B$. Show that the line drawn from the vertex $A$ to the midpoint of $OB$ is perpendicular to $OB$.

[Figure: triangle with apex $A$ and base $OB$.]

20. Show that the diagonals of a square bisect not only each other but also each vertex angle.

6.2 Projections and Bases

For various reasons we sometimes wish to compute the component or projection of a vector in some particular direction. Geometrically it's easy to see how to do this. Suppose $\mathbf{x}$ is some vector in $\mathbb{R}^2$ and we wish to compute its perpendicular projection onto the direction indicated by the dashed line in Figure 6.7a.

[Figure 6.7: panels (a), (b), and (c), each showing a vector $\mathbf{x}$, a dashed line, the point $C$ on that line, and Proj $\mathbf{x}$.]

We locate a point $C$ on this line so that the line joining the tip of $\mathbf{x}$ to $C$ is perpendicular to the original line. The projection of $\mathbf{x}$ in this direction is then given by the vector starting where $\mathbf{x}$ starts and ending at the point $C$. Figure 6.7b and c show two other configurations. Notice that in Figure 6.7c, $\mathbf{x}$ is perpendicular to the dashed line. Since this forces the point $C$ to coincide with the origin of $\mathbf{x}$, the projection of $\mathbf{x}$ in this case is the zero vector.

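In deriving a formula for the projection, we first start by assuming that the direction is given by a unit vector $\mathbf{u}$, that is, $\|\mathbf{u}\| = 1$. Let $\theta$ denote the angle between the vector $\mathbf{x}$ and $\mathbf{u}$. Let $d$ denote the length of the projection; cf. Figure 6.8. Then $\cos\theta = d/\|\mathbf{x}\|$ and

$$d = \|\mathbf{x}\|\cos\theta = \|\mathbf{x}\|\frac{\langle \mathbf{x}, \mathbf{u} \rangle}{\|\mathbf{x}\|\|\mathbf{u}\|} = \langle \mathbf{x}, \mathbf{u} \rangle \tag{6.10}$$

Thus, $d$ is merely the inner product of $\mathbf{x}$ with $\mathbf{u}$. To get the projection of $\mathbf{x}$ onto $\mathbf{u}$, $\operatorname{Proj}_{\mathbf{u}}\mathbf{x}$, we just multiply $\mathbf{u}$ by $d$. We note that if $\mathbf{u}$ were not a unit vector we would have $d = \langle \mathbf{x}, \mathbf{u} \rangle / \|\mathbf{u}\|$. In order to avoid having to carry along the factor $\|\mathbf{u}\|^{-1}$ we insist that $\mathbf{u}$ be a vector of length 1.

Definition 6.5. Let $\mathbf{x}$ be any vector in $\mathbb{R}^n$, and let $\mathbf{u}$ be any unit vector in $\mathbb{R}^n$. The projection of $\mathbf{x}$ onto $\mathbf{u}$, $\operatorname{Proj}_{\mathbf{u}}\mathbf{x}$, is defined to be

$$\operatorname{Proj}_{\mathbf{u}}\mathbf{x} = \langle \mathbf{x}, \mathbf{u} \rangle \mathbf{u} \tag{6.11}$$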

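Example 1. Let $\mathbf{x} = (2, -3)$. Compute $\operatorname{Proj}_{\mathbf{u}}\mathbf{x}$ for each of the following unit vectors:

a. $\mathbf{u} = (1, 0)$: $\operatorname{Proj}_{\mathbf{u}}(2, -3) = \langle (2, -3), (1, 0) \rangle (1, 0) = 2(1, 0) = (2, 0)$

b. $\mathbf{u} = (1/\sqrt{2}, 1/\sqrt{2})$:

$$\operatorname{Proj}_{\mathbf{u}}(2, -3) = \Bigl\langle (2, -3), \Bigl(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\Bigr) \Bigr\rangle \Bigl(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\Bigr) = \frac{2 - 3}{\sqrt{2}}\Bigl(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\Bigr) = -\Bigl(\tfrac{1}{2}, \tfrac{1}{2}\Bigr)$$

c. $\mathbf{u} = (0, 1)$: $\operatorname{Proj}_{\mathbf{u}}(2, -3) = \langle (2, -3), (0, 1) \rangle (0, 1) = -3(0, 1) = (0, -3)$ □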
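Formula (6.11) can be sketched in Python as below. The function name `proj` is our own, and, as in Definition 6.5, the code assumes without checking that `u` has length 1.

```python
import math

def dot(x, y):
    return sum(xj * yj for xj, yj in zip(x, y))

def proj(x, u):
    """Proj_u x = <x, u> u, as in (6.11); u is assumed to be a unit vector."""
    d = dot(x, u)
    return tuple(d * uj for uj in u)

x = (2, -3)
s = 1 / math.sqrt(2)
print(proj(x, (1, 0)))  # (2, 0)
print(proj(x, (s, s)))  # approximately (-0.5, -0.5)
print(proj(x, (0, 1)))  # (0, -3)
```

For a direction vector that is not of unit length, one would first divide it by its norm, which is the reason the text insists on unit vectors.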

[Figure 6.8: the vector $\mathbf{x}$, the angle $\theta$, and the length $d$ of the projection.]

Figure 6.9 illustrates Example 1. It is clear, geometrically, that our construction gives us the "perpendicular component" of $\mathbf{x}$ in the direction specified by $\mathbf{u}$. The following lemma shows that this is indeed true.

[Figure 6.9: panels (a), (b), and (c) showing the projection of $(2, -3)$ onto $\mathbf{u} = (1, 0)$, $\mathbf{u} = (1/\sqrt{2}, 1/\sqrt{2})$, and $\mathbf{u} = (0, 1)$.]

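Lemma 6.1. Let $\mathbf{x}$ be any vector in $\mathbb{R}^n$, and let $\mathbf{u}$ be a unit vector. Then $\mathbf{x} - \operatorname{Proj}_{\mathbf{u}}\mathbf{x}$ is either the zero vector, or it is perpendicular to $\mathbf{u}$.

Proof.

$$\begin{aligned}
\langle \mathbf{x} - \operatorname{Proj}_{\mathbf{u}}\mathbf{x}, \mathbf{u} \rangle &= \langle \mathbf{x}, \mathbf{u} \rangle - \langle \operatorname{Proj}_{\mathbf{u}}\mathbf{x}, \mathbf{u} \rangle \\
&= \langle \mathbf{x}, \mathbf{u} \rangle - \langle \langle \mathbf{x}, \mathbf{u} \rangle \mathbf{u}, \mathbf{u} \rangle \\
&= \langle \mathbf{x}, \mathbf{u} \rangle - \langle \mathbf{x}, \mathbf{u} \rangle \langle \mathbf{u}, \mathbf{u} \rangle \\
&= \langle \mathbf{x}, \mathbf{u} \rangle - \langle \mathbf{x}, \mathbf{u} \rangle = 0
\end{aligned}$$

Remember that $\mathbf{u}$ is assumed to be a unit vector and therefore $\langle \mathbf{u}, \mathbf{u} \rangle = 1$. Thus, if $\mathbf{x} - \operatorname{Proj}_{\mathbf{u}}\mathbf{x}$ is not the zero vector it must be perpendicular to $\mathbf{u}$. See Figure 6.10.

[Figure 6.10: the vectors $\mathbf{x}$, $\mathbf{u}$, $\operatorname{Proj}_{\mathbf{u}}\mathbf{x}$, and $\mathbf{x} - \operatorname{Proj}_{\mathbf{u}}\mathbf{x}$.]

Lemma 6.1 tells us that, given any vector $\mathbf{x}$ and any unit vector $\mathbf{u}$, we can write $\mathbf{x}$ as the sum of two vectors, one parallel to $\mathbf{u}$ and the other perpendicular to $\mathbf{u}$.

$$\mathbf{x} = \operatorname{Proj}_{\mathbf{u}}\mathbf{x} + [\mathbf{x} - \operatorname{Proj}_{\mathbf{u}}\mathbf{x}] \tag{6.12}$$

Why is $\operatorname{Proj}_{\mathbf{u}}\mathbf{x}$ parallel to $\mathbf{u}$?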

Example 2. Write $\mathbf{x} = (4, 2)$ as the sum of two vectors, one of which is parallel to the line joining the two points $P = (-6, 4)$ and $Q = (3, -5)$, while the second vector is perpendicular to this line. See Figure 6.11. Let $\mathbf{y} = (-6 - 3, 4 - (-5)) = (-9, 9)$. Since we found $\mathbf{y}$ by subtracting the coordinates of $Q$ from those of $P$, $\mathbf{y}$ is a vector parallel to the line through the two points $P$ and $Q$. Thus a unit vector parallel to this line is $\mathbf{y}/\|\mathbf{y}\| = (1/\sqrt{2})(-1, 1) = \mathbf{u}$.

A vector parallel to $\mathbf{u}$ is $\operatorname{Proj}_{\mathbf{u}}(\mathbf{x}) = \langle (4, 2), (-1, 1)/\sqrt{2} \rangle \mathbf{u} = (-2/\sqrt{2})\mathbf{u} = (1, -1)$. A vector perpendicular to the line is $\mathbf{x} - \operatorname{Proj}_{\mathbf{u}}\mathbf{x} = (4, 2) - (1, -1) = (3, 3)$. Thus, $\mathbf{x} = (4, 2) = (1, -1) + (3, 3)$, where $(1, -1)$ is parallel to the line and $(3, 3)$ is perpendicular to the line. □
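The decomposition (6.12) used in this example can be sketched as a small Python routine. The helper name `split` is hypothetical; it returns the parallel and perpendicular parts of `x` relative to a unit vector `u`.

```python
import math

def dot(x, y):
    return sum(xj * yj for xj, yj in zip(x, y))

def split(x, u):
    """Return (Proj_u x, x - Proj_u x) as in (6.12); u is assumed to be a unit vector."""
    d = dot(x, u)
    parallel = tuple(d * uj for uj in u)
    perpendicular = tuple(xj - pj for xj, pj in zip(x, parallel))
    return parallel, perpendicular

P, Q = (-6, 4), (3, -5)
y = tuple(p - q for p, q in zip(P, Q))  # P - Q = (-9, 9)
length = math.sqrt(dot(y, y))
u = tuple(yj / length for yj in y)      # (-1, 1) / sqrt(2)

parallel, perpendicular = split((4, 2), u)
print(parallel)       # approximately (1, -1)
print(perpendicular)  # approximately (3, 3)
```

The inner product of `perpendicular` with `u` is zero up to rounding, in agreement with Lemma 6.1.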

We next want to relate these ideas to those involving the coordinates of a vector. Referring to Example 1a and c, notice that the vectors $\mathbf{u}_1 = (1, 0)$ and $\mathbf{u}_2 = (0, 1)$ are our standard basis. For $\mathbf{x} = (2, -3)$ we had $\operatorname{Proj}_{\mathbf{u}_1}\mathbf{x} = 2\mathbf{u}_1$ and $\operatorname{Proj}_{\mathbf{u}_2}\mathbf{x} = -3\mathbf{u}_2$. In other words, the coordinates of $\mathbf{x}$ with respect to the standard basis can be found by taking the dot product of $\mathbf{x}$ with each of the basis vectors. That may not happen for an arbitrary basis, as Example 3 shows.

Example 3. The pair $\mathbf{u}_1 = (1, 1)/\sqrt{2}$ and $\mathbf{u}_2 = (1, 2)/\sqrt{5}$ form a basis of $\mathbb{R}^2$. Moreover each of them has length 1. Find the coordinates of $\mathbf{x} = (2, -3)$ with respect to this basis and also compute the inner product of $\mathbf{x}$ with $\mathbf{u}_1$ and $\mathbf{u}_2$. An easy calculation shows that $\mathbf{x} = 7\sqrt{2}\,\mathbf{u}_1 - 5\sqrt{5}\,\mathbf{u}_2$. Notice though that $\langle \mathbf{x}, \mathbf{u}_1 \rangle = -1/\sqrt{2}$, and $\langle \mathbf{x}, \mathbf{u}_2 \rangle = -4/\sqrt{5}$. Thus, the inner products do not equal the coordinates of $\mathbf{x}$ with respect to this basis. Actually, we shouldn't have expected any such relationship because the two vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ are not perpendicular and the inner product of $\mathbf{x}$ with $\mathbf{u}_1$ gives us the size of the perpendicular projection of $\mathbf{x}$ onto $\mathbf{u}_1$. □
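The "easy calculation" amounts to solving a 2 by 2 linear system. The sketch below does this with Cramer's rule, which is our choice for illustration and not necessarily the method the author had in mind; the function name `coords_2d` is hypothetical.

```python
import math

def dot(x, y):
    return sum(xj * yj for xj, yj in zip(x, y))

def coords_2d(x, b1, b2):
    """Coordinates (c1, c2) with x = c1*b1 + c2*b2, found by Cramer's rule."""
    det = b1[0] * b2[1] - b1[1] * b2[0]
    if det == 0:
        raise ValueError("b1 and b2 do not form a basis")
    c1 = (x[0] * b2[1] - x[1] * b2[0]) / det
    c2 = (b1[0] * x[1] - b1[1] * x[0]) / det
    return c1, c2

u1 = (1 / math.sqrt(2), 1 / math.sqrt(2))
u2 = (1 / math.sqrt(5), 2 / math.sqrt(5))
x = (2, -3)

print(coords_2d(x, u1, u2))    # approximately (7*sqrt(2), -5*sqrt(5))
print(dot(x, u1), dot(x, u2))  # approximately -1/sqrt(2) and -4/sqrt(5)
```

The two printed pairs differ, which is the point of the example: inner products give coordinates only when the basis vectors are mutually perpendicular.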

[Figure 6.11: the points $P(-6, 4)$ and $Q(3, -5)$, the vector $(4, 2)$, the unit vector $(-1/\sqrt{2}, 1/\sqrt{2})$, and the projection of $(4, 2)$ onto it.]

With the above example in mind we might expect that if $U = \{\mathbf{u}_1, \mathbf{u}_2\}$ consists of two perpendicular unit vectors in $\mathbb{R}^2$, then $[\mathbf{x}]_U = [\langle \mathbf{x}, \mathbf{u}_1 \rangle, \langle \mathbf{x}, \mathbf{u}_2 \rangle]$. Indeed, we will prove that

$$\mathbf{x} = \langle \mathbf{x}, \mathbf{u}_1 \rangle \mathbf{u}_1 + \langle \mathbf{x}, \mathbf{u}_2 \rangle \mathbf{u}_2 \tag{6.13}$$

Thus, suppose that $U = \{\mathbf{u}_1, \mathbf{u}_2\}$ is a basis of $\mathbb{R}^2$ that consists of two perpendicular unit vectors. Then $\mathbf{x} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2$ for some constants $c_1$ and $c_2$. Let's now take the inner product of $\mathbf{x}$ with $\mathbf{u}_1$ and then with $\mathbf{u}_2$.

$$\langle \mathbf{x}, \mathbf{u}_1 \rangle = \langle c_1\mathbf{u}_1 + c_2\mathbf{u}_2, \mathbf{u}_1 \rangle = c_1\langle \mathbf{u}_1, \mathbf{u}_1 \rangle + c_2\langle \mathbf{u}_2, \mathbf{u}_1 \rangle = c_1$$

since $\langle \mathbf{u}_1, \mathbf{u}_1 \rangle = 1$ and $\langle \mathbf{u}_1, \mathbf{u}_2 \rangle = 0$. Similarly we have $\langle \mathbf{x}, \mathbf{u}_2 \rangle = c_2$. Thus, (6.13) is valid, and as we shall see in a short while, it is also valid in $\mathbb{R}^n$.

Definition 6.6. A set of nonzero vectors $\{\mathbf{u}_j : j = 1, \dots, p\}$ in $\mathbb{R}^n$ is said to be orthogonal if they are mutually perpendicular, i.e., $\langle \mathbf{u}_j, \mathbf{u}_k \rangle = 0$ if $j \ne k$.

Example 4. The set $\{(1, 1, 0, 0), (0, 0, 1, 1), (1, -1, 1, -1)\}$ is orthogonal since $\langle (1, 1, 0, 0), (0, 0, 1, 1) \rangle = 0$, $\langle (1, 1, 0, 0), (1, -1, 1, -1) \rangle = 0$, and $\langle (0, 0, 1, 1), (1, -1, 1, -1) \rangle = 0$. However, the set $\{(1, 1, 1), (1, -1, 0), (1, 1, 2)\}$ is not orthogonal since $\langle (1, 1, 1), (1, 1, 2) \rangle$ equals 4, not zero. □
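Checking Definition 6.6 means testing every distinct pair of vectors in the set. A minimal Python sketch follows, assuming exact (integer) entries so that comparison with zero is safe; the function name `is_orthogonal_set` is our own.

```python
from itertools import combinations

def dot(x, y):
    return sum(xj * yj for xj, yj in zip(x, y))

def is_orthogonal_set(vectors):
    """Definition 6.6: every vector is nonzero and every distinct pair has inner product 0."""
    if any(dot(v, v) == 0 for v in vectors):
        return False
    return all(dot(u, v) == 0 for u, v in combinations(vectors, 2))

print(is_orthogonal_set([(1, 1, 0, 0), (0, 0, 1, 1), (1, -1, 1, -1)]))  # True
print(is_orthogonal_set([(1, 1, 1), (1, -1, 0), (1, 1, 2)]))            # False
```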

Lemma 6.2. Any set of orthogonal vectors must be linearly independent.

Proof. Let $\{\mathbf{u}_k : k = 1, \dots, p\}$ be an orthogonal set of vectors and suppose that we have constants $c_j$ such that

$$\mathbf{0} = c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p \tag{6.14}$$

Taking the inner product of (6.14) with $\mathbf{u}_1$ we have

$$0 = \langle \mathbf{0}, \mathbf{u}_1 \rangle = \Bigl\langle \sum_{j=1}^{p} c_j\mathbf{u}_j, \mathbf{u}_1 \Bigr\rangle = \sum_{j=1}^{p} c_j\langle \mathbf{u}_j, \mathbf{u}_1 \rangle = c_1\langle \mathbf{u}_1, \mathbf{u}_1 \rangle$$

Since $\mathbf{u}_1$ is not the zero vector, we know that $\langle \mathbf{u}_1, \mathbf{u}_1 \rangle \ne 0$. Thus $c_1 = 0$. By taking the inner product of (6.14) with any one of the $\mathbf{u}_k$'s, we similarly see that $c_k = 0$ for each $k$. Thus, our orthogonal set of vectors is linearly independent.

Definition 6.7. A set of vectors $U = \{\mathbf{u}_1, \dots, \mathbf{u}_n\}$ is said to be an orthonormal basis of $\mathbb{R}^n$ if it is a basis consisting of orthogonal unit vectors. That is, $\langle \mathbf{u}_j, \mathbf{u}_k \rangle = \delta_{jk}$.

The following is probably the most useful idea in this section.

Theorem 6.5. Let $U = \{\mathbf{u}_1, \dots, \mathbf{u}_n\}$ be an orthonormal basis of $\mathbb{R}^n$. Then the coordinates of any vector $\mathbf{x}$ can be found by taking the inner product of $\mathbf{x}$ with each of the basis vectors $\mathbf{u}_k$. That is,

$$\mathbf{x} = \langle \mathbf{x}, \mathbf{u}_1 \rangle \mathbf{u}_1 + \cdots + \langle \mathbf{x}, \mathbf{u}_n \rangle \mathbf{u}_n \tag{6.15}$$


Proof. Since $U$ is a basis we know there are unique constants $c_k$, $1 \le k \le n$, such that

$$\mathbf{x} = c_1\mathbf{u}_1 + \cdots + c_n\mathbf{u}_n \tag{6.16}$$

Taking the dot product of both sides of (6.16) with the $k$th basis vector $\mathbf{u}_k$, we have

$$\langle \mathbf{x}, \mathbf{u}_k \rangle = \Bigl\langle \sum_{j=1}^{n} c_j\mathbf{u}_j, \mathbf{u}_k \Bigr\rangle = \sum_{j=1}^{n} c_j\langle \mathbf{u}_j, \mathbf{u}_k \rangle = \sum_{j=1}^{n} c_j\delta_{jk} = c_k$$

Thus, each coordinate of $\mathbf{x}$ with respect to $U$ is the inner product of $\mathbf{x}$ with the corresponding basis vector in $U$. Another interpretation of (6.15) is that a vector equals the sum of its projections onto the vectors of an orthonormal basis.

Example 5. Verify that each of the following is an orthonormal basis of $\mathbb{R}^3$, and then compute the coordinates of the vector $\mathbf{x} = (6, -2, 1)$ with respect to these bases.

a. $U = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$. Clearly each of these vectors has length 1, and they are mutually perpendicular. Thus, (6.15) is applicable. Computing the inner product of $\mathbf{x}$ with each basis vector we have

$$\langle (6, -2, 1), (1, 0, 0) \rangle = 6$$
$$\langle (6, -2, 1), (0, 1, 0) \rangle = -2$$
$$\langle (6, -2, 1), (0, 0, 1) \rangle = 1$$

Thus, $\mathbf{x}$ is equal to

$$\mathbf{x} = (6, -2, 1) = 6\mathbf{u}_1 + (-2)\mathbf{u}_2 + 1\mathbf{u}_3 = 6(1, 0, 0) - 2(0, 1, 0) + (0, 0, 1)$$

b. $U = \{(1, 1, 1)/\sqrt{3}, (2, -1, -1)/\sqrt{6}, (0, 1, -1)/\sqrt{2}\}$. We first verify that $U$ is an orthonormal set.

$$\|\mathbf{u}_1\|^2 = \tfrac{1}{3} + \tfrac{1}{3} + \tfrac{1}{3} = 1 \qquad \langle \mathbf{u}_1, \mathbf{u}_2 \rangle = (2 - 1 - 1)/\sqrt{18} = 0$$
$$\|\mathbf{u}_2\|^2 = \tfrac{4}{6} + \tfrac{1}{6} + \tfrac{1}{6} = 1 \qquad \langle \mathbf{u}_1, \mathbf{u}_3 \rangle = (1 - 1)/\sqrt{6} = 0$$
$$\|\mathbf{u}_3\|^2 = \tfrac{1}{2} + \tfrac{1}{2} = 1 \qquad \langle \mathbf{u}_2, \mathbf{u}_3 \rangle = (-1 + 1)/\sqrt{12} = 0$$

Thus, $U$ consists of mutually orthogonal unit vectors. Lemma 6.2 tells us that $U$ is linearly independent. Since $\dim(\mathbb{R}^3)$ equals 3, $U$ must be a basis. Computing the inner products of $\mathbf{x}$ with each of the basis vectors we have

$$\langle (6, -2, 1), (1, 1, 1)/\sqrt{3} \rangle = \frac{6 - 2 + 1}{\sqrt{3}} = \frac{5}{\sqrt{3}}$$
$$\langle (6, -2, 1), (2, -1, -1)/\sqrt{6} \rangle = \frac{12 + 2 - 1}{\sqrt{6}} = \frac{13}{\sqrt{6}}$$
$$\langle (6, -2, 1), (0, 1, -1)/\sqrt{2} \rangle = \frac{-2 - 1}{\sqrt{2}} = \frac{-3}{\sqrt{2}}$$

Thus, $\mathbf{x} = (5/\sqrt{3})\mathbf{u}_1 + (13/\sqrt{6})\mathbf{u}_2 - (3/\sqrt{2})\mathbf{u}_3$. □
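Theorem 6.5 turns the coordinate computation into a list of inner products. The sketch below reproduces part b numerically and then rebuilds the vector from its coordinates; the names `coordinates` and `combine` are assumptions made for this illustration.

```python
import math

def dot(x, y):
    return sum(xj * yj for xj, yj in zip(x, y))

def coordinates(x, basis):
    """Coordinates of x with respect to an orthonormal basis, by (6.15)."""
    return [dot(x, u) for u in basis]

def combine(coeffs, basis):
    """Form the linear combination sum_k coeffs[k] * basis[k]."""
    n = len(basis[0])
    return [sum(c * u[i] for c, u in zip(coeffs, basis)) for i in range(n)]

U = [
    tuple(c / math.sqrt(3) for c in (1, 1, 1)),
    tuple(c / math.sqrt(6) for c in (2, -1, -1)),
    tuple(c / math.sqrt(2) for c in (0, 1, -1)),
]
x = (6, -2, 1)

c = coordinates(x, U)
print(c)              # approximately [5/sqrt(3), 13/sqrt(6), -3/sqrt(2)]
print(combine(c, U))  # approximately [6, -2, 1]
```

Note that `coordinates` is only valid for an orthonormal basis; for a general basis a linear system would have to be solved instead.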

In order to appreciate the convenience of an orthonormal basis, the reader should compute the coordinates of $\mathbf{x}$ with respect to $U$ without using Theorem 6.5.

Another fact, which is sometimes useful, is the relationship between the inner product of two vectors and their coordinates with respect to an orthonormal basis.

Theorem 6.6. Let $U = \{\mathbf{u}_j : j = 1, \dots, n\}$ be any orthonormal basis of $\mathbb{R}^n$. Let $\mathbf{x}$ and $\mathbf{y}$ be any two vectors. Their coordinates with respect to the orthonormal basis $U$ are $[x_1, \dots, x_n]_U$ and $[y_1, \dots, y_n]_U$, respectively. Then

$$\langle \mathbf{x}, \mathbf{y} \rangle = x_1 y_1 + \cdots + x_n y_n = \sum_{j=1}^{n} x_j y_j \tag{6.17a}$$

$$\|\mathbf{x}\|^2 = \sum_{j=1}^{n} x_j^2 \tag{6.17b}$$

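Proof. By hypothesis $\mathbf{x} = \sum_{j=1}^{n} x_j\mathbf{u}_j$ and $\mathbf{y} = \sum_{j=1}^{n} y_j\mathbf{u}_j$. Thus

$$\langle \mathbf{x}, \mathbf{y} \rangle = \Bigl\langle \sum_{j=1}^{n} x_j\mathbf{u}_j, \sum_{k=1}^{n} y_k\mathbf{u}_k \Bigr\rangle = \sum_{j=1}^{n}\sum_{k=1}^{n} x_j y_k\langle \mathbf{u}_j, \mathbf{u}_k \rangle = \sum_{j=1}^{n}\sum_{k=1}^{n} x_j y_k\delta_{jk} = \sum_{j=1}^{n} x_j y_j$$

To verify (6.17b) we only need to remember that

$$\|\mathbf{x}\|^2 = \langle \mathbf{x}, \mathbf{x} \rangle = \sum_{j=1}^{n} x_j^2$$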

Example 6. $U = \{(1, 1)/\sqrt{2}, (-1, 1)/\sqrt{2}\}$ is an orthonormal basis of $\mathbb{R}^2$. Verify (6.17) for the vectors $(1, 1)$ and $(2, 6)$.

Solution. The inner product of these two vectors is

$$\langle (1, 1), (2, 6) \rangle = 2 + 6 = 8$$

Using Theorem 6.5 to compute the coordinates of these vectors with respect to $U$, we have

$$[(1, 1)]_U = [\langle (1, 1), (1, 1)/\sqrt{2} \rangle, \langle (1, 1), (-1, 1)/\sqrt{2} \rangle] = [\sqrt{2}, 0]$$

$$[(2, 6)]_U = [\langle (2, 6), (1, 1)/\sqrt{2} \rangle, \langle (2, 6), (-1, 1)/\sqrt{2} \rangle] = [8/\sqrt{2}, 4/\sqrt{2}]$$

Thus, $x_1 = \sqrt{2}$, $x_2 = 0$, $y_1 = 8/\sqrt{2}$, and $y_2 = 4/\sqrt{2}$, and we have

$$x_1 y_1 + x_2 y_2 = 8 + 0 = \langle \mathbf{x}, \mathbf{y} \rangle$$

Using the coordinates of $\mathbf{x}$ and $\mathbf{y}$ with respect to $U$ to compute their lengths, we have

$$\|\mathbf{x}\|^2 = (\sqrt{2})^2 + 0 = 2 \qquad \|\mathbf{y}\|^2 = \frac{64}{2} + \frac{16}{2} = 40$$ □
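The verification in Example 6 can be repeated numerically. In the sketch below (illustrative only, with variable names of our choosing), the inner product and squared lengths are computed twice: once from the standard coordinates and once from the coordinates with respect to $U$.

```python
import math

def dot(x, y):
    return sum(xj * yj for xj, yj in zip(x, y))

s = 1 / math.sqrt(2)
U = [(s, s), (-s, s)]  # the orthonormal basis of Example 6
x, y = (1, 1), (2, 6)

cx = [dot(x, u) for u in U]  # coordinates of x with respect to U
cy = [dot(y, u) for u in U]  # coordinates of y with respect to U

print(dot(x, y), dot(cx, cy))  # both equal 8 up to rounding, as in (6.17a)
print(dot(x, x), dot(cx, cx))  # both equal 2 up to rounding, as in (6.17b)
print(dot(y, y), dot(cy, cy))  # both equal 40 up to rounding
```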

Problem Set 6.2

1. Compute $\operatorname{Proj}_{\mathbf{u}}\mathbf{x}$, where $\mathbf{x} = (7, -8)$, for each of the following unit vectors:

a. $(1, -2)/5^{1/2}$  b. $(2, 3)/13^{1/2}$  c. $(1, 0)$

2. Compute the projection of $\mathbf{x} = (-2, 3)$ in a direction parallel to the straight line joining the points $(1, 7)$ and $(-3, 8)$. There are two possible choices for a direction; take the one that points from $(1, 7)$ to $(-3, 8)$.

3. Let $\mathbf{x} = (7, -5)$. Let $U = \{(1, 5)/\sqrt{26}, (-5, 1)/\sqrt{26}\}$.

a. Show that $U$ is an orthonormal basis of $\mathbb{R}^2$.

b. Find $\operatorname{Proj}_{\mathbf{u}_j}\mathbf{x}$, where $\mathbf{u}_j$ is the $j$th unit vector in $U$.

c. Compute the coordinates of $\mathbf{x}$ with respect to $U$.

4. Let $\mathbf{x} = (1, -2)$. Show that any vector orthogonal to $\mathbf{x}$ is a scalar multiple of $(2, 1)$.

5. Let $A = \{(1, 2, -1), (-2, 1, 0)\}$. Show that $A$ is an orthogonal set of vectors, and if $\mathbf{x}$ is any vector orthogonal to both vectors in $A$, then $\mathbf{x}$ must be a scalar multiple of $(1, 2, 5)$.

6. Let $V = \{(2, -3, 1), (2, 3, 5), (-9, -4, 6)\}$.

a. Show that $V$ is an orthogonal set of vectors.

b. Let $\mathbf{x} = (7, -3, 4)$. Compute the projection of $\mathbf{x}$ onto the direction given by $\mathbf{v}_j$, where $\mathbf{v}_j$ is the $j$th vector in the set $V$.


c. Compute the coordinates of $(7,-3,4) = \mathbf{x}$ with respect to the basis $V$. (Hint: It's easy to construct an orthonormal basis from $V$.)

7. Find the angle between the following pairs of vectors:

a. $(1,1)$, $(0,1)$ b. $(1,1,1)$, $(0,1,0)$ c. $(1,1,1,1)$, $(0,1,0,0)$ d. $(6,7,-2,3)$, $(-1,-2,1,1)$

8. Let $U = \{(1,-1)/\sqrt{2},\ (1,1)/\sqrt{2}\}$. Use the fact that $U$ is an orthonormal basis to compute the coordinates of the following vectors with respect to $U$:

a. $(9,-2)$ b. $(6,4)$ c. $(1,-1)$ d. $(1,0)$

9. Let $U = \{(2,-3,1)/\sqrt{14},\ (2,3,5)/\sqrt{38},\ (-9,-4,6)/\sqrt{133}\}$. Show that $U$ is an orthonormal basis (cf. problem 6) and then compute $[\mathbf{x}]_U$ for the following vectors:

a. $(1,0,0)$ b. $(-1,6,4)$ c. $(18,2,-4)$

10. Let $\mathbf{u}$ be an arbitrary unit vector in $\mathbb{R}^n$.

a. If $\mathbf{x}$ is the zero vector, show that $\mathrm{Proj}_{\mathbf{u}}\mathbf{x} = \mathbf{0}$.

b. If $\mathbf{x}$ and $\mathbf{u}$ are perpendicular, show that $\mathrm{Proj}_{\mathbf{u}}\mathbf{x} = \mathbf{0}$.

c. Show that $\mathrm{Proj}_{\mathbf{u}}\mathbf{x}$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$.

d. What is the dimension of the kernel of this linear transformation?

11. Let $V = P_3$. Let $\mathbf{f}$ and $\mathbf{g}$ be any two polynomials in $V$. Define $\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^1 \mathbf{f}(t)\mathbf{g}(t)\,dt$; cf. problem 8 in Section 6.1.

a. Find a unit vector $\mathbf{u}$ that points in the same direction as $\mathbf{f}(t) = t$.

b. Find the projection of $t^2$ onto the vector $\mathbf{u}$ of part a.

c. Find the cosine of the angle between the vectors $t^2$ and $t$.

12. Let $V = P_1$. Define the inner product of two vectors as we did in problem 11. Show that $\{1,\ t - \tfrac{1}{2}\}$ is an orthogonal set of vectors. Find an orthonormal basis for $V$.

13. Let $V = P_2$. Define the inner product as we did in problems 11 and 12. Let $\mathbf{f}'$ denote the derivative of $\mathbf{f}$.

a. Find all polynomials in $P_2$ that are perpendicular to their derivatives.

b. For any two polynomials $\mathbf{f}$ and $\mathbf{g}$ in $V$, compute $\langle \mathbf{f}, \mathbf{g}' \rangle + \langle \mathbf{f}', \mathbf{g} \rangle$.

14. Let $V = M_{22}$, the vector space of $2 \times 2$ matrices. For $A = [a_{jk}]$ and $B = [b_{jk}]$ in $M_{22}$ define $\langle A, B \rangle = \sum_{j=1}^{2} \sum_{k=1}^{2} a_{jk} b_{jk}$.

a. Compute the norms of the matrices $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$.


b. Let $E_{jk}$, $1 \le j, k \le 2$, be the standard basis of $V$. Do these four matrices form an orthonormal basis?

15. In problem 10 we saw that for a fixed unit vector $\mathbf{u}$ in $\mathbb{R}^n$, $L[\mathbf{x}] = \mathrm{Proj}_{\mathbf{u}}\mathbf{x}$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$. Let

$$\mathbf{u} = \frac{1}{\sqrt{2}}(\mathbf{e}_1 - \mathbf{e}_3) \qquad n \ge 3$$

a. What is the matrix representation of $L$ with respect to the standard basis?

b. Find an orthonormal basis for $\mathrm{Rg}(L)$.

c. Find an orthonormal basis for $\ker(L)$.

d. Show that the union of the two orthonormal sets in parts b and c is an orthonormal basis of $\mathbb{R}^n$.

e. What is the matrix representation of $L$ with respect to the basis of part d? (List the vectors from b and then the vectors from c.)

6.3 Construction of Orthonormal Bases

We indicated in Chapter 2 that every vector space has a basis. A natural question now is whether or not every vector space has an orthonormal basis. This of course makes sense only if the vector space has an inner product. Clearly, the answer is yes for $\mathbb{R}^n$ since the standard basis $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ is orthonormal. What about subspaces of $\mathbb{R}^n$? The answer is again yes. In fact there is a technique for constructing an orthonormal basis from any given basis. This technique goes by the name of Gram–Schmidt. We illustrate it with an example before going into the details of the algorithm.

Example 1. Let $\mathbf{f}_1 = (0,1,1)$ and $\mathbf{f}_2 = (0,2,0)$. Let $W = S[\mathbf{f}_1, \mathbf{f}_2]$. Construct an orthonormal basis for $W$.

Solution. Geometrically $W$ is a plane (two-dimensional subspace of $\mathbb{R}^3$). In fact $W$ is the plane $x_1 = 0$. Clearly $\mathbf{e}_2$ and $\mathbf{e}_3$ form an orthonormal basis for $W$. What we wish to do, though, is to use the given basis for $W$ in constructing our orthonormal basis. We first set $\mathbf{u}_1 = \mathbf{f}_1/\|\mathbf{f}_1\| = (0,1,1)/\sqrt{2}$. The unit vector $\mathbf{u}_1$ will be the first vector in our basis. We now want a unit vector $\mathbf{u}_2$ that is perpendicular to $\mathbf{u}_1$ and also lies in $W$. Such a vector is easy to construct by using the fact that $\mathbf{f}_2 - \mathrm{Proj}_{\mathbf{u}_1}\mathbf{f}_2$ must be perpendicular to $\mathbf{u}_1$. Moreover, since this vector is a linear combination of $\mathbf{f}_1$ and $\mathbf{f}_2$ ($\mathbf{u}_1$ is a scalar multiple of $\mathbf{f}_1$),


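it will lie in $W$. Thus, set

$$\begin{aligned}
\mathbf{v}_2 &= \mathbf{f}_2 - \mathrm{Proj}_{\mathbf{u}_1}\mathbf{f}_2 \\
&= (0,2,0) - \big\langle (0,2,0),\ (0,1,1)/\sqrt{2} \big\rangle \frac{(0,1,1)}{\sqrt{2}} \\
&= (0,2,0) - (0,1,1) = (0,1,-1) \\
\mathbf{u}_2 &= \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \frac{(0,1,-1)}{\sqrt{2}}
\end{aligned}$$

A quick computation shows that $\langle \mathbf{u}_1, \mathbf{u}_2 \rangle = 0$.

The only difficulty here is that the vector $\mathbf{v}_2$ might equal zero. But this cannot happen, since the vectors $\mathbf{f}_1$ and $\mathbf{f}_2$ are linearly independent. $\square$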

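Let's assume now that $\{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ is a set of linearly independent vectors. The Gram–Schmidt procedure given below provides us with an orthonormal set of vectors $\{\mathbf{u}_1, \dots, \mathbf{u}_n\}$, such that $S[\mathbf{u}_1, \dots, \mathbf{u}_p] = S[\mathbf{f}_1, \dots, \mathbf{f}_p]$, for $p = 1, 2, \dots, n$. Define the unit vectors $\mathbf{u}_k$ inductively by

$$\begin{aligned}
\mathbf{u}_1 &= \frac{\mathbf{f}_1}{\|\mathbf{f}_1\|} \\
\mathbf{v}_2 &= \mathbf{f}_2 - \langle \mathbf{f}_2, \mathbf{u}_1 \rangle \mathbf{u}_1 \qquad \mathbf{u}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} \\
\mathbf{v}_k &= \mathbf{f}_k - \big[\langle \mathbf{f}_k, \mathbf{u}_1 \rangle \mathbf{u}_1 + \langle \mathbf{f}_k, \mathbf{u}_2 \rangle \mathbf{u}_2 + \cdots + \langle \mathbf{f}_k, \mathbf{u}_{k-1} \rangle \mathbf{u}_{k-1}\big] \\
\mathbf{u}_k &= \frac{\mathbf{v}_k}{\|\mathbf{v}_k\|} \qquad k = 2, \dots, n
\end{aligned} \tag{6.18}$$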

Theorem 6.7. Let $\{\mathbf{f}_k : k = 1, \dots, n\}$ be a linearly independent set of vectors. Define $\mathbf{u}_k$ by (6.18). Then $\{\mathbf{u}_k : k = 1, \dots, n\}$ is an orthonormal set of vectors and $S[\mathbf{u}_1, \dots, \mathbf{u}_p] = S[\mathbf{f}_1, \dots, \mathbf{f}_p]$ for $p = 1, \dots, n$.

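Proof. We prove this theorem by induction. For $p = 1$, we have $\mathbf{u}_1 = \mathbf{f}_1/\|\mathbf{f}_1\|$. Clearly $\{\mathbf{u}_1\}$ is an orthonormal set and $S[\mathbf{u}_1] = S[\mathbf{f}_1]$. Again we note that $\mathbf{f}_1 \ne \mathbf{0}$ since the $\mathbf{f}_j$'s are linearly independent. We now assume that the theorem is true for $p$ and deduce its truth for $p + 1$. From (6.18) we have

$$\mathbf{v}_{p+1} = \mathbf{f}_{p+1} - \big[\mathrm{Proj}_{\mathbf{u}_1}(\mathbf{f}_{p+1}) + \cdots + \mathrm{Proj}_{\mathbf{u}_p}(\mathbf{f}_{p+1})\big] \tag{6.19a}$$

$$\mathbf{u}_{p+1} = \frac{\mathbf{v}_{p+1}}{\|\mathbf{v}_{p+1}\|} \tag{6.19b}$$

We have by assumption that $S[\mathbf{u}_1, \dots, \mathbf{u}_p] = S[\mathbf{f}_1, \dots, \mathbf{f}_p]$. Thus, the vector $\mathbf{v}_{p+1}$ cannot be the zero vector since $\mathbf{f}_{p+1}$ is not in $S[\mathbf{f}_1, \dots, \mathbf{f}_p] = S[\mathbf{u}_1, \dots, \mathbf{u}_p]$. Hence we may divide $\mathbf{v}_{p+1}$ by its length to get $\mathbf{u}_{p+1}$, a unit vector. Since (6.19a) can be solved for $\mathbf{f}_{p+1}$, the reader can easily show that $S[\mathbf{u}_1, \dots, \mathbf{u}_{p+1}] =$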


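$S[\mathbf{f}_1, \dots, \mathbf{f}_{p+1}]$. It remains to show that $\mathbf{u}_{p+1}$, equivalently $\mathbf{v}_{p+1}$, is orthogonal to each of the preceding $\mathbf{u}_k$.

$$\begin{aligned}
\langle \mathbf{v}_{p+1}, \mathbf{u}_k \rangle
&= \big\langle \mathbf{f}_{p+1} - \mathrm{Proj}_{\mathbf{u}_1}\mathbf{f}_{p+1} - \mathrm{Proj}_{\mathbf{u}_2}\mathbf{f}_{p+1} - \cdots - \mathrm{Proj}_{\mathbf{u}_p}\mathbf{f}_{p+1},\ \mathbf{u}_k \big\rangle \\
&= \Big\langle \mathbf{f}_{p+1} - \sum_{j=1}^{p} \langle \mathbf{f}_{p+1}, \mathbf{u}_j \rangle \mathbf{u}_j,\ \mathbf{u}_k \Big\rangle \\
&= \langle \mathbf{f}_{p+1}, \mathbf{u}_k \rangle - \sum_{j=1}^{p} \langle \mathbf{f}_{p+1}, \mathbf{u}_j \rangle \langle \mathbf{u}_j, \mathbf{u}_k \rangle
\end{aligned}$$

By assumption the set $\{\mathbf{u}_k : k = 1, \dots, p\}$ is orthonormal. Thus $\langle \mathbf{u}_j, \mathbf{u}_k \rangle = \delta_{jk}$ and we have

$$\langle \mathbf{v}_{p+1}, \mathbf{u}_k \rangle = \langle \mathbf{f}_{p+1}, \mathbf{u}_k \rangle - \langle \mathbf{f}_{p+1}, \mathbf{u}_k \rangle = 0$$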

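Example 2. Construct an orthonormal basis for $\mathbb{R}^3$ from the vectors $\{(1,0,1),\ (2,1,0),\ (1,1,1)\}$ by using the Gram–Schmidt algorithm.

$$\begin{aligned}
\mathbf{u}_1 &= \frac{(1,0,1)}{\sqrt{2}} \\
\mathbf{v}_2 &= (2,1,0) - \big\langle (2,1,0),\ (1,0,1)/\sqrt{2} \big\rangle \big((1,0,1)/\sqrt{2}\big) \\
&= (2,1,0) - (1,0,1) = (1,1,-1) \\
\mathbf{u}_2 &= \frac{(1,1,-1)}{\sqrt{3}} \\
\mathbf{v}_3 &= (1,1,1) - \big\langle (1,1,1),\ (1,0,1)/\sqrt{2} \big\rangle \big((1,0,1)/\sqrt{2}\big) - \big\langle (1,1,1),\ (1,1,-1)/\sqrt{3} \big\rangle \big((1,1,-1)/\sqrt{3}\big) \\
&= (1,1,1) - (1,0,1) - \frac{(1,1,-1)}{3} = \frac{(-1,2,1)}{3} \\
\mathbf{u}_3 &= \frac{(-1,2,1)}{\sqrt{6}} \qquad \square
\end{aligned}$$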
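The recursion (6.18) translates almost line for line into code. Below is a minimal sketch in Python (standard library only); the function name `gram_schmidt` and the tolerance `1e-12` used to detect a zero vector are assumptions made for illustration, not part of the text. The sketch applies the procedure to the three vectors of Example 2.

```python
import math

def dot(a, b):
    """Standard inner product on R^n."""
    return sum(ai * bi for ai, bi in zip(a, b))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors following (6.18)."""
    ortho = []
    for f in vectors:
        v = list(f)
        # Subtract the projection of f onto each previously built unit vector.
        for u in ortho:
            c = dot(f, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        length = math.sqrt(dot(v, v))
        if length < 1e-12:  # assumed tolerance: v is (numerically) zero
            raise ValueError("input vectors are linearly dependent")
        ortho.append([vi / length for vi in v])
    return ortho

basis = gram_schmidt([(1, 0, 1), (2, 1, 0), (1, 1, 1)])
for u in basis:
    print(u)
```

In floating-point arithmetic the classical form shown here can lose orthogonality for nearly dependent inputs; for the small examples of this section that is not a concern.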

In the preceding section we defined and showed how to calculate the projection of a vector onto a unit vector. We now wish to define the projection of a vector onto a subspace, and we do so in terms of the distance between the vector and the subspace.

Definition 6.8. Let $\mathbf{x}$ be any vector in $\mathbb{R}^n$ and let $W$ be any subspace of $\mathbb{R}^n$. The projection of $\mathbf{x}$ onto $W$, $\mathrm{Proj}_W(\mathbf{x})$, is defined to be that vector $\mathbf{y}$ in $W$ which minimizes $\|\mathbf{x} - \mathbf{w}\|$ for $\mathbf{w}$ any vector in $W$.

It is not at all clear that there is such a vector; or perhaps there might be more than one, and which should we pick?

Theorem 6.8. Let $W$ be any $m$-dimensional subspace of $\mathbb{R}^n$. Let $U = \{\mathbf{u}_1, \dots, \mathbf{u}_m\}$ be any orthonormal basis of $W$. Then for any vector $\mathbf{x}$ in $\mathbb{R}^n$ there is a


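unique $\mathbf{y}$ that minimizes $\|\mathbf{x} - \mathbf{w}\|$ for $\mathbf{w}$ in $W$. Moreover,

$$\mathbf{y} = \mathrm{Proj}_W(\mathbf{x}) = \sum_{k=1}^{m} \langle \mathbf{x}, \mathbf{u}_k \rangle \mathbf{u}_k \tag{6.20}$$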

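Proof. Let $\{\mathbf{u}_1, \dots, \mathbf{u}_m, \mathbf{v}_1, \dots, \mathbf{v}_{n-m}\}$ be any orthonormal basis of $\mathbb{R}^n$ whose first $m$ vectors are the given orthonormal basis of $W$. Theorem 6.7 tells us how to construct such a basis, and we also have

$$\mathbf{x} = \sum_{k=1}^{m} \langle \mathbf{x}, \mathbf{u}_k \rangle \mathbf{u}_k + \sum_{k=1}^{n-m} \langle \mathbf{x}, \mathbf{v}_k \rangle \mathbf{v}_k$$

Let $\mathbf{w}$ be any vector in $W$. Then $\mathbf{w} = \sum_{k=1}^{m} \langle \mathbf{w}, \mathbf{u}_k \rangle \mathbf{u}_k$, so that $\langle \mathbf{w}, \mathbf{v}_k \rangle = 0$ for each $k$. From (6.17) we have

$$\|\mathbf{x} - \mathbf{w}\|^2 = \sum_{k=1}^{m} \big[\langle \mathbf{x} - \mathbf{w}, \mathbf{u}_k \rangle\big]^2 + \sum_{k=1}^{n-m} \big[\langle \mathbf{x}, \mathbf{v}_k \rangle\big]^2$$

Clearly, the second sum is constant regardless of the choice of $\mathbf{w}$. However, the first sum equals zero if and only if $\langle \mathbf{w}, \mathbf{u}_k \rangle = \langle \mathbf{x}, \mathbf{u}_k \rangle$ for each $k$. In other words, the minimum value of $\|\mathbf{x} - \mathbf{w}\|$ occurs only when $\mathbf{w} = \sum_{k=1}^{m} \langle \mathbf{x}, \mathbf{u}_k \rangle \mathbf{u}_k$. Moreover, it is clear from the above equation that this minimum length equals

$$\|\mathbf{x} - \mathrm{Proj}_W \mathbf{x}\| = \Big[\sum_{k=1}^{n-m} \langle \mathbf{x}, \mathbf{v}_k \rangle^2\Big]^{1/2}$$

and that $\mathbf{x} - \mathrm{Proj}_W(\mathbf{x})$ is perpendicular to every vector in $W$.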

Formula (6.20) also says that the projection of any vector $\mathbf{x}$ onto a subspace $W$ equals the sum of its projections onto the vectors of an orthonormal basis of $W$. If $\mathbf{x}$ is an arbitrary vector in $\mathbb{R}^n$, what is the projection of $\mathbf{x}$ onto $\mathbb{R}^n$?

Definition 6.9. We define the distance between a vector $\mathbf{x}$ and a subspace $W$ to equal $\|\mathbf{x} - \mathrm{Proj}_W \mathbf{x}\|$.

Example 3. Let $\mathbf{x} = (1,2,3)$. Compute the projection of $\mathbf{x}$ onto each of the following subspaces.

a. $W$ is the $x_1, x_2$ plane in $\mathbb{R}^3$. An orthonormal basis for $W$ is the set $\{(1,0,0),\ (0,1,0)\}$. Thus,

$$\begin{aligned}
\mathrm{Proj}_W(1,2,3) &= \langle (1,2,3), (1,0,0) \rangle (1,0,0) + \langle (1,2,3), (0,1,0) \rangle (0,1,0) \\
&= (1,0,0) + 2(0,1,0) = (1,2,0)
\end{aligned}$$


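[Figure: the vector $\mathbf{x} = (1,2,3)$ and its projection $\mathrm{Proj}_W \mathbf{x} = (1,2,0)$ onto the $x_1, x_2$ plane, drawn with axes $x_1$, $x_2$, $x_3$.]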

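b. $W = S[(1,1,0),\ (0,1,-1)]$. Since the vectors $(1,1,0)$ and $(0,1,-1)$ are linearly independent, we may use them to construct an orthonormal basis for $W$. This basis is $\{(1,1,0)/\sqrt{2},\ (-1,1,-2)/\sqrt{6}\}$ and we have

$$\begin{aligned}
\mathrm{Proj}_W(1,2,3) &= \langle (1,2,3), \mathbf{u}_1 \rangle \mathbf{u}_1 + \langle (1,2,3), \mathbf{u}_2 \rangle \mathbf{u}_2 \\
&= \Big(\frac{3}{2}\Big)(1,1,0) - \Big(\frac{5}{6}\Big)(-1,1,-2) \\
&= \Big(\frac{7}{3},\ \frac{2}{3},\ \frac{5}{3}\Big)
\end{aligned}$$

As a check, $(1,2,3) - \big(\tfrac{7}{3}, \tfrac{2}{3}, \tfrac{5}{3}\big) = \tfrac{4}{3}(-1,1,1)$ is perpendicular to both $(1,1,0)$ and $(0,1,-1)$.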
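The projection in part b can be checked in exact arithmetic. The sketch below is one way to do so in Python; it works with an orthogonal (not normalized) basis of $W$ and divides each coefficient by $\langle \mathbf{v}, \mathbf{v} \rangle$, which is equivalent to (6.20) and avoids square roots. The function name `project` is illustrative only.

```python
from fractions import Fraction

def dot(a, b):
    """Standard inner product on R^n."""
    return sum(ai * bi for ai, bi in zip(a, b))

def project(x, orthogonal_basis):
    """Project x onto the span of mutually orthogonal, nonzero vectors.

    Equivalent to (6.20): each term <x, u>u with u = v/||v|| is
    written as (<x, v>/<v, v>) v, so no square roots are needed.
    """
    result = [Fraction(0)] * len(x)
    for v in orthogonal_basis:
        c = Fraction(dot(x, v), dot(v, v))
        result = [r + c * vi for r, vi in zip(result, v)]
    return result

x = (1, 2, 3)
W_basis = [(1, 1, 0), (-1, 1, -2)]   # orthogonal basis of W from part b
p = project(x, W_basis)
residual = [xi - pi for xi, pi in zip(x, p)]

print(p)
# The residual should be perpendicular to both original spanning vectors.
print(dot(residual, (1, 1, 0)), dot(residual, (0, 1, -1)))
```

The perpendicularity test on the residual is the property noted at the end of the proof of Theorem 6.8, and it gives a quick way to catch arithmetic slips in a hand computation.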

We conclude this section with a discussion of the properties of a change of basis matrix $P$ relating two orthonormal bases. Let $U = \{\mathbf{u}_k : k = 1, \dots, n\}$ and $V = \{\mathbf{v}_k : k = 1, \dots, n\}$ be two orthonormal bases of $\mathbb{R}^n$. Let $P = [p_{jk}]$ be the matrix that gives the vectors $\mathbf{u}_k$ as linear combinations of the $\mathbf{v}_k$. That is,

$$\mathbf{u}_k = \sum_{j=1}^{n} p_{jk} \mathbf{v}_j \tag{6.21a}$$

and if $P^{-1} = Q = [q_{jk}]$, then

$$\mathbf{v}_k = \sum_{j=1}^{n} q_{jk} \mathbf{u}_j \tag{6.21b}$$

We remind the reader that the $k$th column of $P$ consists of the coordinates of the vector $\mathbf{u}_k$ with respect to the basis $V$. A similar comment of course applies to the columns of $P^{-1} = Q$. However, these two bases are orthonormal. Thus, Theorem 6.5 may be used to compute the coordinates

$$p_{jk} = \langle \mathbf{u}_k, \mathbf{v}_j \rangle \qquad q_{jk} = \langle \mathbf{v}_k, \mathbf{u}_j \rangle \tag{6.22}$$

Since our inner product is symmetric, we have

$$p_{jk} = \langle \mathbf{u}_k, \mathbf{v}_j \rangle = \langle \mathbf{v}_j, \mathbf{u}_k \rangle = q_{kj}$$


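In other words the matrix $Q = P^{-1}$ is the transpose of the matrix $P$, or $P^{-1} = P^T$, an extremely useful fact. We also note that the formulas

$$P P^T = P^T P = I$$

imply that both the rows and columns of $P$ form orthonormal sets of vectors.

Definition 6.10. A matrix $P$ is said to be orthogonal if $P^T = P^{-1}$.

Example 4. Let $U = \{(1,0,1)/\sqrt{2},\ (1,1,-1)/\sqrt{3},\ (-1,2,1)/\sqrt{6}\}$. Find the change of basis matrices $P$ and $P^{-1}$ relating this basis to the standard basis.

Solution. We know from Example 2 that $U$ is an orthonormal basis of $\mathbb{R}^3$. Thus $P^{-1} = P^T$.

$$P^{-1} = \begin{bmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} \\
0 & \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}}
\end{bmatrix}
\qquad
P = \begin{bmatrix}
\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \\
-\frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}}
\end{bmatrix}$$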

Clearly, when we deal with orthonormal bases, the amount of computational work is considerably lessened. There is of course the initial labor involved in constructing such a basis, but it is usually well worth the effort.

Problem Set 6.3

1. Use the Gram–Schmidt procedure to construct an orthonormal basis for each of the following subspaces of $\mathbb{R}^3$:

a. $W = \{(x_1, x_2, x_3) : x_1 - x_2 = 0\}$

b. $W = S[(1,-1,2),\ (6,1,1)]$

2. Construct an orthonormal basis for $\mathbb{R}^3$ from the following basis, $\{(0,5,1),\ (0,1,-5),\ (1,-2,3)\}$.

3. Let $W$ be the subspace of $\mathbb{R}^4$ spanned by the vectors $\mathbf{f}_1 = (1,1,0,1)$ and $\mathbf{f}_2 = (3,1,4,1)$. Compute the projection of $\mathbf{x} = (3,0,3,3)$ onto $W$.

4. Find the distance from the point $(1,-2,3)$ to the plane $2x_1 - 3x_2 + 6x_3 = 0$.

5. Find the distance from the point $(1,-2,3)$ to the plane $2x_1 - 3x_2 + 6x_3 = 2$.

6. Show that $U = \big\{\big(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\big),\ \big(\tfrac{-1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\big)\big\}$ and $V = \{(1,0),\ (0,1)\}$ are both orthonormal bases of $\mathbb{R}^2$. Find a change of basis matrix $P$ relating $U$ and $V$ and verify that it is orthogonal.


7. Construct an orthonormal basis for $\mathbb{R}^4$ from the basis $\{(0,1,1,1),\ (1,0,1,1),\ (1,1,0,1),\ (1,1,1,0)\}$.

8. Show that $\{(x_1, x_2),\ (y_1, y_2)\}$ is an orthonormal basis for $\mathbb{R}^2$ if and only if the matrix $\begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \end{bmatrix}$ is orthogonal.

9. Find an orthonormal basis for the kernels of each of the following matrices:

a. $\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$ b. $\begin{bmatrix} 1 & -1 & 2 \\ 4 & 6 & 3 \end{bmatrix}$ c. $\begin{bmatrix} 1 & 0 & -1 & 3 \\ -3 & 1 & 0 & 1 \end{bmatrix}$

10. Find an orthonormal basis for the ranges of each of the matrices in problem 9.

11. We've seen that if $U = \{\mathbf{u}_j : j = 1, \dots, n\}$ and $V = \{\mathbf{v}_j : j = 1, \dots, n\}$ are two orthonormal bases of $\mathbb{R}^n$, then the matrix $P = [p_{jk}]$ relating them is orthogonal. Conversely, show that if $P$ is orthogonal and $U$ is an orthonormal basis then $V = \{\mathbf{v}_j\}$, where the vectors in $V$ are defined by

$$\mathbf{v}_j = \sum_{k=1}^{n} p_{kj} \mathbf{u}_k$$

is also an orthonormal basis.

12. Show that if $W$ is any subspace of $\mathbb{R}^n$ and $\mathbf{x}$ is any vector in $\mathbb{R}^n$, then there is a unique unit vector $\mathbf{w}_0$ in $W$ such that the angle between $\mathbf{x}$ and $\mathbf{w}_0$ is minimized. That is, the angle between $\mathbf{x}$ and $\mathbf{w}$ for any vector $\mathbf{w}$ in $W$ is no smaller than that between $\mathbf{x}$ and $\mathbf{w}_0$.

13. Let $V = P_2$. Define $\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^1 \mathbf{f}(t)\mathbf{g}(t)\,dt$. The set $B = \{1, t, t^2\}$ is a basis for $V$. Construct an orthonormal basis for $V$ from $B$ by using the Gram–Schmidt procedure.

14. Let $V = P_2$. Define $\langle \mathbf{f}, \mathbf{g} \rangle = f_0 g_0 + f_1 g_1 + f_2 g_2$, where $\mathbf{f}(t) = f_0 + f_1 t + f_2 t^2$ and $\mathbf{g}(t) = g_0 + g_1 t + g_2 t^2$. Show that $\{1, t, t^2\}$ is an orthonormal basis if we use this inner product, but not if we use the inner product of problem 13.

15. Let $V = C[0,1]$. Define $\langle \mathbf{f}, \mathbf{g} \rangle$ as we did in problem 13.

a. Compute the length of the vector $\sin \pi t$.

b. Show that the set $\{1, \sin \pi t, \cos \pi t, \dots, \sin n\pi t, \cos n\pi t, \dots\}$ is orthogonal.

16. Let $V = S[(1,0,1),\ (1,1,1)]$. Show that $(-1,0,1)$ is perpendicular to every vector in $V$.

17. Let $V$ and $W$ be subspaces of $\mathbb{R}^n$. We say that $V$ and $W$ are perpendicular if $\langle \mathbf{x}, \mathbf{y} \rangle = 0$ for every $\mathbf{x}$ in $V$ and $\mathbf{y}$ in $W$.


a. Suppose $\mathbf{f}_1$ and $\mathbf{f}_2$ are two perpendicular vectors. Show $S[\mathbf{f}_1]$ and $S[\mathbf{f}_2]$ are perpendicular.

b. Let $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ be an orthogonal set of vectors. Show that $S[\mathbf{v}_1, \dots, \mathbf{v}_k]$ and $S[\mathbf{v}_{k+1}, \dots, \mathbf{v}_p]$ are perpendicular.

18. Let $V$ be any subspace of $\mathbb{R}^3$. Show that $\dim(V) + \dim(V^\perp) = 3$. Generalize this to $\mathbb{R}^n$.

19. Let $V$ be any subspace of $\mathbb{R}^2$. Show that any vector $\mathbf{x}$ in $\mathbb{R}^2$ can be written uniquely in the form $\mathbf{x} = \mathbf{v} + \mathbf{w}$, for some $\mathbf{v}$ in $V$ and $\mathbf{w}$ in $V^\perp$. Consider the two special cases $V = \{\mathbf{0}\}$ and $V = \mathbb{R}^2$ first. Then consider the case $V = S[\mathbf{v}]$ for some fixed vector $\mathbf{v}$.

20. Let $V$ be any subspace of $\mathbb{R}^n$. Define $V^\perp$ ($V$ perp) by

$$V^\perp = \{\mathbf{x} : \langle \mathbf{x}, \mathbf{y} \rangle = 0 \text{ for every } \mathbf{y} \text{ in } V\}.$$

a. Show that $V^\perp$ is a subspace of $\mathbb{R}^n$.

b. Show that $V$ and $V^\perp$ are perpendicular in the sense of problem 17.

c. Show that $\{\mathbf{0}\}^\perp = \mathbb{R}^n$ and $(\mathbb{R}^n)^\perp = \{\mathbf{0}\}$.

d. Show that $(V^\perp)^\perp = V$. Hint: What are the dimensions of the two spaces?

21. Use the result of problem 18 to show that for any subspace $V$ of $\mathbb{R}^n$ the following is true. Given any $\mathbf{x}$ in $\mathbb{R}^n$, we can write $\mathbf{x}$ uniquely in the form $\mathbf{x} = \mathbf{v} + \mathbf{w}$ for some $\mathbf{v}$ in $V$ and $\mathbf{w}$ in $V^\perp$.

22. Given any unit vector $\mathbf{u}$, $\mathrm{Proj}_{\mathbf{u}}\mathbf{x}$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$; cf. problem 10 in Section 6.2.

a. Find a “nice” matrix representation for $\mathrm{Proj}_{\mathbf{u}}\mathbf{x}$.

b. Describe geometrically the two subspaces $\ker(\mathrm{Proj}_{\mathbf{u}})$ and $\mathrm{Rg}(\mathrm{Proj}_{\mathbf{u}})$.

c. What are the dimensions of the kernel and range of $\mathrm{Proj}_{\mathbf{u}}$?

23. Let $A$ be a matrix representation of a linear transformation $L : \mathbb{R}^2 \to \mathbb{R}^2$, where $L$ rotates the plane through some angle $\theta$. Show that $A$ is an orthogonal matrix.

24. Let $W$ be any subspace of $\mathbb{R}^n$. Define $L(\mathbf{x}) = \mathrm{Proj}_W \mathbf{x}$.

a. Show $L$ is a linear transformation.

b. Find the range and kernel of $L$.

c. Find a “nice” matrix representation for $L$.


6.4 Symmetric Matrices

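Given a matrix $A$ we defined the transpose of $A$ seemingly for no special reason. There is, however, an important relationship between $A$ and $A^T$ that is not apparent until we have an inner product. To demonstrate this relationship we take the inner product of $A\mathbf{x}$ with $\mathbf{y}$. Thus suppose $A = [a_{jk}]$ is an $m \times n$ matrix, $\mathbf{x}$ a vector in $\mathbb{R}^n$, and $\mathbf{y}$ a vector in $\mathbb{R}^m$. Then $A^T = [a^T_{jk}]$ is an $n \times m$ matrix, $A\mathbf{x}$ is in $\mathbb{R}^m$ and $A^T\mathbf{y}$ is in $\mathbb{R}^n$.

$$\begin{aligned}
\langle A\mathbf{x}, \mathbf{y} \rangle
&= \Big\langle \Big(\sum_{k=1}^{n} a_{1k} x_k, \dots, \sum_{k=1}^{n} a_{mk} x_k\Big),\ \mathbf{y} \Big\rangle \\
&= \sum_{j=1}^{m} \Big(\sum_{k=1}^{n} a_{jk} x_k\Big) y_j
 = \sum_{k=1}^{n} x_k \sum_{j=1}^{m} a_{jk} y_j \\
&= \sum_{k=1}^{n} x_k \sum_{j=1}^{m} a^T_{kj} y_j \\
&= \langle \mathbf{x}, A^T\mathbf{y} \rangle
\end{aligned}$$

This is such a useful formula that we write it again

$$\langle A\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, A^T\mathbf{y} \rangle \tag{6.23}$$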

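Notice that if $A = A^T$, then (6.23) becomes $\langle A\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, A\mathbf{y} \rangle$.

In Chapter 5 we stated that every symmetric matrix was similar to a diagonal matrix. Put another way, we know that given any symmetric matrix $A$ there is a basis of $\mathbb{R}^n$ which consists of eigenvectors of $A$. It turns out that it is also possible to construct this basis in such a manner that it is an orthonormal basis. This useful feature of symmetric matrices is a consequence of the next lemma.

Lemma 6.3. Let $A$ be a symmetric matrix. Let $\lambda_1$ and $\lambda_2$ be two distinct eigenvalues of $A$. Then any pair of eigenvectors $\mathbf{f}_1$ and $\mathbf{f}_2$ corresponding to $\lambda_1$ and $\lambda_2$, respectively, must be perpendicular.

Proof.

$$\begin{aligned}
\lambda_1 \langle \mathbf{f}_1, \mathbf{f}_2 \rangle &= \langle \lambda_1 \mathbf{f}_1, \mathbf{f}_2 \rangle \\
&= \langle A\mathbf{f}_1, \mathbf{f}_2 \rangle = \langle \mathbf{f}_1, A\mathbf{f}_2 \rangle \\
&= \langle \mathbf{f}_1, \lambda_2 \mathbf{f}_2 \rangle = \lambda_2 \langle \mathbf{f}_1, \mathbf{f}_2 \rangle
\end{aligned}$$

Thus, we have $(\lambda_1 - \lambda_2)\langle \mathbf{f}_1, \mathbf{f}_2 \rangle = 0$. Since $\lambda_1 - \lambda_2 \ne 0$, we must have $\langle \mathbf{f}_1, \mathbf{f}_2 \rangle = 0$, i.e., the eigenvectors are perpendicular.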

Example 1. Let $A = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}$. Find the eigenvectors of $A$ and verify Lemma 6.3 for this symmetric matrix.


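Solution. A quick calculation shows that the characteristic polynomial of $A$ is $p(\lambda) = \lambda^2 - 6\lambda - 1$. The eigenvalues are $3 \pm \sqrt{10}$ and their corresponding eigenvectors are

$$\begin{aligned}
\lambda_1 &= 3 + \sqrt{10} & \mathbf{f}_1 &= (-1 + \sqrt{10},\ 3) \\
\lambda_2 &= 3 - \sqrt{10} & \mathbf{f}_2 &= (-1 - \sqrt{10},\ 3)
\end{aligned}$$

Computing the inner product of the eigenvectors we have

$$\begin{aligned}
\langle \mathbf{f}_1, \mathbf{f}_2 \rangle &= \langle (-1 + \sqrt{10},\ 3),\ (-1 - \sqrt{10},\ 3) \rangle \\
&= (-1 + \sqrt{10})(-1 - \sqrt{10}) + 9 = 0 \qquad \square
\end{aligned}$$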
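A short numerical check of this example is sketched below in Python (standard library only). It pairs each eigenvalue $3 \pm \sqrt{10}$ with the eigenvector whose first entry is $-1 \pm \sqrt{10}$, confirms $A\mathbf{f} = \lambda\mathbf{f}$ for each pair, and then evaluates the inner product of the two eigenvectors. The helper names `dot` and `apply` are illustrative.

```python
import math

def dot(a, b):
    """Standard inner product on R^n."""
    return sum(ai * bi for ai, bi in zip(a, b))

def apply(M, v):
    """Matrix-vector product M v, with M stored as a list of rows."""
    return [dot(row, v) for row in M]

A = [[2, 3], [3, 4]]
r = math.sqrt(10)
eigenpairs = [(3 + r, (-1 + r, 3)), (3 - r, (-1 - r, 3))]

for lam, f in eigenpairs:
    # A f - lam f should vanish (up to rounding) for a true eigenpair.
    residual = [av - lam * fv for av, fv in zip(apply(A, f), f)]
    print(lam, residual)

f1, f2 = eigenpairs[0][1], eigenpairs[1][1]
inner = dot(f1, f2)
print(inner)  # should be zero up to rounding, as Lemma 6.3 predicts
```

Checking $A\mathbf{f} = \lambda\mathbf{f}$ explicitly is worthwhile here because the two eigenvectors differ only in the sign of $\sqrt{10}$ and are easy to attach to the wrong eigenvalue.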

The procedure for constructing an orthonormal basis from the eigenvectors of a symmetric matrix is now relatively easy. We first find the eigenvalues of $A$ from its characteristic polynomial

$$p(\lambda) = (\lambda - \lambda_1)^{m_1} (\lambda - \lambda_2)^{m_2} \cdots (\lambda - \lambda_p)^{m_p}$$

We next find a basis for each of the eigenspaces $\ker(A - \lambda_j I)$. An orthonormal basis for each of these eigenspaces is constructed by using the Gram–Schmidt procedure. These orthonormal bases are then adjoined to form a basis of $\mathbb{R}^n$. It is Lemma 6.3 which guarantees that combining these individually orthogonal sets will produce an orthogonal set. Since each vector has length 1 to begin with, they will remain unit vectors. In other words, suppose that $\{\mathbf{f}_1, \dots, \mathbf{f}_{m_1}\}$ and $\{\mathbf{g}_1, \dots, \mathbf{g}_{m_2}\}$ are orthonormal bases of $\ker(A - \lambda_1 I)$ and $\ker(A - \lambda_2 I)$, respectively. Then since $\langle \mathbf{f}_j, \mathbf{g}_k \rangle = 0$ for every $j$ and $k$, we conclude that $\{\mathbf{f}_1, \dots, \mathbf{f}_{m_1}, \mathbf{g}_1, \dots, \mathbf{g}_{m_2}\}$ is also an orthonormal set.

Example 2. Find an orthogonal matrix $P$ such that $P^T A P$ is a diagonal matrix, where $A$ is the matrix

$$\begin{bmatrix} 7 & -2 & -1 \\ -2 & 10 & 2 \\ -1 & 2 & 7 \end{bmatrix}$$

Solution. Since $A$ is symmetric, we know that there is an orthonormal basis of $\mathbb{R}^3$ consisting of eigenvectors. If $P$ is the matrix whose columns are these eigenvectors then $P^{-1} = P^T$ and we have $P^{-1} A P = P^T A P$, a diagonal matrix. Computing the characteristic polynomial of $A$ we have

$$p(\lambda) = \det(A - \lambda I) = (6 - \lambda)^2 (12 - \lambda)$$

The eigenvalues of $A$ are 6 with multiplicity 2 and 12 with multiplicity 1. We list the corresponding eigenvectors.

$$\begin{aligned}
\lambda_1 &= 6 & \mathbf{f}_1 &= (1,0,1) \quad \mathbf{f}_2 = (2,1,0) \\
\lambda_2 &= 12 & \mathbf{f}_3 &= (-1,2,1)
\end{aligned}$$


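Since the eigenvectors $\mathbf{f}_1$ and $\mathbf{f}_2$ correspond to the eigenvalue 6, they are automatically perpendicular to the eigenvector $\mathbf{f}_3$. To construct our orthonormal basis we use the Gram–Schmidt procedure for the first two eigenvectors and then divide $\mathbf{f}_3$ by its length.

$$\begin{aligned}
\mathbf{u}_1 &= \frac{(1,0,1)}{\sqrt{2}} \\
\mathbf{v}_2 &= \mathbf{f}_2 - \langle \mathbf{f}_2, \mathbf{u}_1 \rangle \mathbf{u}_1 = (1,1,-1) \\
\mathbf{u}_2 &= \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \frac{(1,1,-1)}{\sqrt{3}} \\
\mathbf{u}_3 &= \frac{\mathbf{f}_3}{\|\mathbf{f}_3\|} = \frac{(-1,2,1)}{\sqrt{6}}
\end{aligned}$$

Thus,

$$P = \begin{bmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} \\
0 & \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}}
\end{bmatrix}$$

and

$$P^T A P = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 12 \end{bmatrix}$$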
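The diagonalization can be confirmed by direct multiplication. The following is a small sketch in Python using plain lists of rows rather than a matrix library; the helpers `matmul` and `transpose` are written out here for illustration and are not part of the text.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]

A = [[7, -2, -1], [-2, 10, 2], [-1, 2, 7]]

r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
# The orthonormal eigenvectors u1, u2, u3 found above.
u = [(1 / r2, 0, 1 / r2), (1 / r3, 1 / r3, -1 / r3), (-1 / r6, 2 / r6, 1 / r6)]
P = transpose(u)          # the columns of P are u1, u2, u3

D = matmul(transpose(P), matmul(A, P))   # should be diag(6, 6, 12)
I = matmul(transpose(P), P)              # should be the identity

for row in D:
    print([round(entry, 10) for entry in row])
```

The second product checks that $P$ is orthogonal in the sense of Definition 6.10; the rounding in the final loop only tidies the display of floating-point results.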

We summarize this discussion in the following theorem.

Theorem 6.9. Let $A$ be an $n \times n$ symmetric matrix ($A = A^T$). Then $\mathbb{R}^n$ has an orthonormal basis of eigenvectors of $A$, and there is an orthogonal matrix $P$ such that $P^T A P = D$ is a diagonal matrix. The diagonal elements of $D$ are the eigenvalues of $A$ and the columns of $P$ are eigenvectors of $A$.

Another consequence of formula (6.23), which we use in the next section, is the fact that if $A$ is any $m \times n$ matrix then $A^T A \mathbf{x} = \mathbf{0}$ if and only if $A\mathbf{x} = \mathbf{0}$.

Lemma 6.4. Let $A$ be an $m \times n$ matrix. Then the following statements are true:

a. $A^T A \mathbf{x} = \mathbf{0}$ if and only if $A\mathbf{x} = \mathbf{0}$

b. $\mathrm{rank}(A^T A) = \mathrm{rank}(A) = \mathrm{rank}(A^T)$

Proof. Clearly, if $A\mathbf{x} = \mathbf{0}$, then $A^T A \mathbf{x} = \mathbf{0}$. Thus, to verify a it suffices to show that if $A^T A \mathbf{x} = \mathbf{0}$, then $A\mathbf{x} = \mathbf{0}$. Assuming $A^T A \mathbf{x} = \mathbf{0}$ we have

$$0 = \langle A^T A \mathbf{x}, \mathbf{x} \rangle = \langle A\mathbf{x}, A\mathbf{x} \rangle = \|A\mathbf{x}\|^2$$


Thus we also have $A\mathbf{x} = \mathbf{0}$. To see that $\mathrm{rank}(A^T A)$ equals $\mathrm{rank}(A)$ we note that part a has shown that $\ker(A^T A)$ equals $\ker(A)$. Thus

$$\mathrm{rank}(A^T A) = n - \dim(\ker A^T A) = n - \dim(\ker A) = \mathrm{rank}(A)$$

That $\mathrm{rank}(A) = \mathrm{rank}(A^T)$ was proved in Chapter 3; cf. Theorem 3.5.

Problem Set 6.4

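1. For each of the following matrices verify formula (6.23):

a. $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ b. $\begin{bmatrix} 3 & 6 & 8 \\ 1 & 2 & 4 \end{bmatrix}$ c. $\begin{bmatrix} 2 & 4 \\ 1 & -1 \\ 5 & 6 \end{bmatrix}$

2. Verify (6.23) for each of the following matrices:

a. $\begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}$ b. $\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & 0 \end{bmatrix}$ c. $\begin{bmatrix} 6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}$

3. An $n \times n$ symmetric matrix $A$ is said to be positive definite if $\langle A\mathbf{x}, \mathbf{x} \rangle$ is positive for each nonzero vector $\mathbf{x}$ in $\mathbb{R}^n$.

a. Show that the matrix $\begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}$ is positive definite.

b. Show that the matrix $\begin{bmatrix} a & b \\ b & d \end{bmatrix}$ is positive definite if and only if $a > 0$ and $ad - b^2 > 0$.

c. Find a similar criterion for $3 \times 3$ symmetric matrices.

4. Show that a symmetric matrix is positive definite ($\langle A\mathbf{x}, \mathbf{x} \rangle > 0$ if $\mathbf{x} \ne \mathbf{0}$) if and only if each of its eigenvalues is positive. (Hint: If $A$ is positive definite so is $P^T A P$ when $P$ is a nonsingular matrix.)

5. Find an orthonormal basis of eigenvectors for each of the following matrices:

a. $\begin{bmatrix} 1 & 2 \\ 2 & -3 \end{bmatrix}$ b. $\begin{bmatrix} 6 & 0 \\ 0 & 4 \end{bmatrix}$ c. $\begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix}$

6. Find an orthonormal basis of eigenvectors for each of the following matrices:

a. $\begin{bmatrix} 8 & -1 & 1 \\ -1 & 8 & 1 \\ 1 & 1 & 8 \end{bmatrix}$ b. $\begin{bmatrix} -2 & 3 & 0 \\ 3 & 4 & 0 \\ 0 & 0 & 2 \end{bmatrix}$

7. Find an orthonormal basis of eigenvectors for each matrix in problem 2.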


8. For each of the matrices $A$ of problem 5 find $P$ and $D$ such that $P^T A P = D$, where $D$ is a diagonal matrix.

9. For each of the matrices $A$ of problem 6 find $P$ and $D$ such that $P^T A P = D$, where $D$ is a diagonal matrix.

10. Let $A$ be any symmetric matrix. Show that $P^T A P$ is also a symmetric matrix.

11. Let $A$ be any $n \times n$ matrix. Show that $A$ is symmetric if and only if there is an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A$.

12. Let $A$ be an $m \times n$ matrix. Show that if $A\mathbf{x} = \mathbf{b}$ has a solution, then $\mathbf{b}$ must be perpendicular to $\ker(A^T)$. [Hint: If $A\mathbf{x} = \mathbf{b}$ and $\mathbf{y}$ is in $\ker(A^T)$, we have $\langle \mathbf{b}, \mathbf{y} \rangle = \langle A\mathbf{x}, \mathbf{y} \rangle = {?}$]

13. Show that the converse of problem 12 is also true. That is, show that if $\mathbf{b}$ is perpendicular to $\ker(A^T)$, then the equation $A\mathbf{x} = \mathbf{b}$ has a solution.

14. Let $V$ be a vector space with an inner product $\langle\ ,\ \rangle$. A linear transformation $L : V \to V$ is said to be symmetric if $\langle L\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, L\mathbf{y} \rangle$ for every pair of vectors $\mathbf{x}$ and $\mathbf{y}$ in $V$. Let $V$ be $P_1$ and for $\mathbf{f}, \mathbf{g}$ in $V$ define $\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^1 \mathbf{f}(t)\mathbf{g}(t)\,dt$. Decide which, if any, of the following linear transformations is symmetric.

a. $L[\mathbf{f}] = \mathbf{f}'$ b. $L[\mathbf{f}] = \mathbf{f}''$ c. $L[\mathbf{f}] = t\,\mathbf{f}'$

15. Let $L$ be a linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$.

a. Show that for each $\mathbf{x}$ in $\mathbb{R}^2$ there is a unique $\mathbf{y}$ in $\mathbb{R}^2$ such that $\langle \mathbf{x}, L[\mathbf{z}] \rangle = \langle \mathbf{y}, \mathbf{z} \rangle$ for every $\mathbf{z}$ in $\mathbb{R}^2$. This vector, $\mathbf{y}$, will be denoted as $L^T[\mathbf{x}]$. Thus, we have the formula $\langle \mathbf{x}, L[\mathbf{z}] \rangle = \langle L^T[\mathbf{x}], \mathbf{z} \rangle$.

b. Show that if $A$ is the matrix representation of $L$ with respect to the standard basis of $\mathbb{R}^2$, then $A^T$ is the matrix representation of $L^T$.

Note, a linear transformation is said to be symmetric if $L = L^T$. Thus, we have shown that $L$ is symmetric if and only if its matrix representation, with respect to the standard basis, is a symmetric matrix.

16. Generalize problem 15 to $\mathbb{R}^n$.

17. For each of the linear transformations in problem 14, find $L^T$.

18. Let $V = C[0,1]$. Define $\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^1 \mathbf{f}(t)\mathbf{g}(t)\,dt$. Let $L : V \to V$ be defined as $L[\mathbf{f}] = \int_0^t \mathbf{f}(s)\,ds$. Define the transpose of $L$ as in problem 15, and find a formula for it.


6.5 Least Squares

The first problem we considered at the start of this text was that of solving a system of linear equations. At that time, we noticed that there are systems which have no solution. In the language of matrices and linear transformations, this translates to the statement that $A\mathbf{x} = \mathbf{b}$ does not have a solution unless $\mathbf{b}$ is in the range of $A$. What we can do at this time is to find a “best” possible solution. That is, we find that vector $\mathbf{y}$ in $\mathrm{Rg}(A)$ which is closest to $\mathbf{b}$. Thus, if we cannot solve $A\mathbf{x} = \mathbf{b}$, we solve the equation

$$A\mathbf{x} = \mathrm{Proj}_{\mathrm{Rg}(A)}(\mathbf{b}) \tag{6.24}$$

Since (6.24) will in general have many solutions, we restrict our discussion to the case where $A$ is one-to-one or equivalently $\ker(A)$ consists of just the zero vector.

With the preceding in mind let $A = [a_{jk}]$ be an $m \times n$ matrix with $\mathrm{rank}(A) = n \le m$. Note that if $m < n$, then $A$ cannot be one-to-one. In terms of a system of linear equations the restriction $n \le m$ says that there are at least as many equations as unknowns. (The number of unknowns is never more than the number of equations.)

Theorem 6.10. Let $A$ be an $m \times n$ matrix with $n \le m$ and $\mathrm{rank}(A) = n$. Let $\mathbf{b}$ be any vector in $\mathbb{R}^m$. Then the solution to $A\mathbf{x} = \mathrm{Proj}_{\mathrm{Rg}(A)}(\mathbf{b})$ is given by

$$\mathbf{x} = (A^T A)^{-1}(A^T \mathbf{b}) \tag{6.25}$$

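Proof. The trick in the proof is to pick a nice basis for $\mathbb{R}^n$. We first note that $A^T A$ is a symmetric $n \times n$ matrix. Moreover, by Lemma 6.4 we know that $\mathrm{rank}(A^T A) = \mathrm{rank}(A) = n$. Theorem 6.9 guarantees an orthonormal basis $U = \{\mathbf{u}_1, \dots, \mathbf{u}_n\}$ of $\mathbb{R}^n$ such that $A^T A \mathbf{u}_k = d_k \mathbf{u}_k$. Moreover,

$$\langle A\mathbf{u}_j, A\mathbf{u}_k \rangle = \langle A^T A \mathbf{u}_j, \mathbf{u}_k \rangle = d_j \langle \mathbf{u}_j, \mathbf{u}_k \rangle = d_j \delta_{jk}$$

Thus, the vectors $A\mathbf{u}_j$ are mutually perpendicular, and since $\ker(A)$ is just the zero vector, we have $\|A\mathbf{u}_j\|^2 = d_j > 0$. We therefore conclude that $\{A\mathbf{u}_j/\sqrt{d_j} : j = 1, \dots, n\}$ is an orthonormal basis of $\mathrm{Rg}(A)$. By (6.20) we have

$$\begin{aligned}
\mathrm{Proj}_{\mathrm{Rg}(A)}(\mathbf{b})
&= \sum_{j=1}^{n} \Big\langle \mathbf{b},\ \frac{A\mathbf{u}_j}{\sqrt{d_j}} \Big\rangle \frac{A\mathbf{u}_j}{\sqrt{d_j}} \\
&= \sum_{j=1}^{n} \frac{1}{d_j} \langle A^T \mathbf{b}, \mathbf{u}_j \rangle A\mathbf{u}_j \\
&= A \sum_{j=1}^{n} \frac{1}{d_j} \langle A^T \mathbf{b}, \mathbf{u}_j \rangle \mathbf{u}_j
\end{aligned}$$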


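Setting

$$\mathbf{x} = \sum_{j=1}^{n} \frac{1}{d_j} \langle A^T \mathbf{b}, \mathbf{u}_j \rangle \mathbf{u}_j$$

we see, since $A$ is one-to-one, that $\mathbf{x}$ is the unique solution to $A\mathbf{x} = \mathrm{Proj}_{\mathrm{Rg}(A)}(\mathbf{b})$. Moreover since $(A^T A)\mathbf{u}_j = d_j \mathbf{u}_j$ we also have $(A^T A)^{-1}\mathbf{u}_j = (1/d_j)\mathbf{u}_j$. Thus,

$$\begin{aligned}
\mathbf{x} &= \sum_{j=1}^{n} \langle A^T \mathbf{b}, \mathbf{u}_j \rangle (A^T A)^{-1} \mathbf{u}_j \\
&= (A^T A)^{-1} \Big[\sum_{j=1}^{n} \langle A^T \mathbf{b}, \mathbf{u}_j \rangle \mathbf{u}_j\Big]
\end{aligned}$$

But the set $\{\mathbf{u}_j\}$ is an orthonormal basis of $\mathbb{R}^n$; hence the term in brackets is equal to $A^T \mathbf{b}$. Thus, $\mathbf{x} = (A^T A)^{-1} A^T \mathbf{b}$ is that unique vector in $\mathbb{R}^n$ such that $A\mathbf{x}$ is closest to $\mathbf{b}$.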
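Formula (6.25) can also be reached without diagonalizing $A^T A$. The following is a short alternative sketch, not the argument used in the proof; it assumes only (6.23), Lemma 6.4, and the fact noted at the end of the proof of Theorem 6.8 that $\mathbf{b} - \mathrm{Proj}_W \mathbf{b}$ is perpendicular to every vector in $W$, applied with $W = \mathrm{Rg}(A)$.

$$\begin{aligned}
A\mathbf{x} = \mathrm{Proj}_{\mathrm{Rg}(A)}(\mathbf{b})
&\implies \langle \mathbf{b} - A\mathbf{x},\ A\mathbf{z} \rangle = 0 \quad \text{for every } \mathbf{z} \text{ in } \mathbb{R}^n \\
&\implies \langle A^T(\mathbf{b} - A\mathbf{x}),\ \mathbf{z} \rangle = 0 \quad \text{for every } \mathbf{z} \text{ in } \mathbb{R}^n \\
&\implies A^T A \mathbf{x} = A^T \mathbf{b}
\end{aligned}$$

Since $\mathrm{rank}(A^T A) = n$ by Lemma 6.4, the $n \times n$ matrix $A^T A$ is invertible, and solving the last equation gives $\mathbf{x} = (A^T A)^{-1} A^T \mathbf{b}$, in agreement with (6.25).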

This formula has an immediate application to curve fitting. Suppose we have a set of data points $(x_j, y_j)$, $1 \le j \le n$, and we wish to find a straight line passing through these points. If there are more than two data points, such a line is usually nonexistent. See Figure 6.12. What is normally done in this situation is to find that straight line $y = mx + b$, such that the sum $\sum_{j=1}^{n} e_j^2 = \sum_{j=1}^{n} [y_j - (m x_j + b)]^2$ is minimized. The numbers $e_j$ equal the error in approximating $y_j$ by $m x_j + b$. Thus, in a certain sense, picking $m$ and $b$ in order to minimize the above sum gives us the best straight-line approximation to our data. This line is often referred to as the least squares fit.

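[Figure 6.12: four data points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, $(x_4, y_4)$ and a straight line, with the errors $e_1$, $e_2$, $e_3$, $e_4$ marked between each point and the line.]

Figure 6.12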

At this time the reader might find it advisable to review the discussionimmediately preceding Example 4 in Section 3.4.

Let $A$ be the $n \times 2$ matrix

\[
A = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\]


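Think of $\mathbb{R}^2$ as pairs of numbers of the form $(b, m)^T$, and $A : \mathbb{R}^2 \to \mathbb{R}^n$. We wish to find a solution to the equation

\[
A \begin{bmatrix} b \\ m \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
\]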

The pair $(b, m)$ is a solution to this equation if and only if the line $y = mx + b$ passes through each of the data points $(x_j, y_j)$. Realizing that this is unlikely, we look for that pair $(b, m)$ such that $A(b, m)^T$ is closest to $(y_1, \dots, y_n)^T$. Since $\operatorname{rank}(A)$ is two (assuming at least two different $x_j$'s), we may apply Theorem 6.10. Thus, our approximate solution is

\[
\begin{bmatrix} b \\ m \end{bmatrix} = (A^TA)^{-1}A^T \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\]

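One easily calculates that

\[
A^TA = \begin{bmatrix} n & \sum_{j=1}^{n} x_j \\ \sum_{j=1}^{n} x_j & \sum_{j=1}^{n} x_j^2 \end{bmatrix}
\quad\text{and that}\quad
A^T\mathbf{y}^T = \begin{bmatrix} \sum_{j=1}^{n} y_j \\ \sum_{j=1}^{n} x_jy_j \end{bmatrix}
\]

where $\mathbf{y} = (y_1, \dots, y_n)$. Using Cramer's rule we have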

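\[
b = \frac{\det\begin{bmatrix} \sum_{j=1}^{n} y_j & \sum_{j=1}^{n} x_j \\ \sum_{j=1}^{n} x_jy_j & \sum_{j=1}^{n} x_j^2 \end{bmatrix}}
         {\det\begin{bmatrix} n & \sum_{j=1}^{n} x_j \\ \sum_{j=1}^{n} x_j & \sum_{j=1}^{n} x_j^2 \end{bmatrix}}
\qquad
m = \frac{\det\begin{bmatrix} n & \sum_{j=1}^{n} y_j \\ \sum_{j=1}^{n} x_j & \sum_{j=1}^{n} x_jy_j \end{bmatrix}}
         {\det\begin{bmatrix} n & \sum_{j=1}^{n} x_j \\ \sum_{j=1}^{n} x_j & \sum_{j=1}^{n} x_j^2 \end{bmatrix}}
\tag{6.26}
\]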
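
The two quotients in (6.26) translate almost directly into code. The sketch below is one possible Python rendering, not part of the original text: the function name `line_fit` and the sample data are hypothetical, and exact fractions are used so that no rounding obscures the result.

```python
from fractions import Fraction


def line_fit(points):
    """Return (b, m) for the least squares line y = m*x + b using (6.26).

    `points` is a list of (x, y) pairs with at least two different x values.
    """
    pts = [(Fraction(x), Fraction(y)) for x, y in points]
    n = len(pts)
    sx = sum(x for x, _ in pts)        # sum of x_j
    sy = sum(y for _, y in pts)        # sum of y_j
    sxx = sum(x * x for x, _ in pts)   # sum of x_j^2
    sxy = sum(x * y for x, y in pts)   # sum of x_j * y_j

    det = n * sxx - sx * sx            # det(A^T A), the common denominator
    if det == 0:
        raise ValueError("all x values coincide, so rank(A) < 2")

    b = (sy * sxx - sx * sxy) / det    # numerator: first determinant in (6.26)
    m = (n * sxy - sx * sy) / det      # numerator: second determinant in (6.26)
    return b, m


# Hypothetical data, chosen only for illustration.
b, m = line_fit([(0, 1), (1, 3), (2, 4)])
print(b, m)
```

For the sample data this gives $b = 7/6$ and $m = 3/2$. Data lying exactly on a line, such as $(0,1), (1,3), (2,5)$, returns that line's own intercept and slope.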


Example 1. Find the least squares fit to the following data:

(1,−1), (2, 3), (3, 4), (7, 5)

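Solution. We first construct the following table:

\[
\begin{array}{c|rrrr|l}
x_j     & 1  & 2 & 3  & 7  & \sum_{j=1}^{4} x_j = 13 \\
x_j^2   & 1  & 4 & 9  & 49 & \sum_{j=1}^{4} x_j^2 = 63 \\
y_j     & -1 & 3 & 4  & 5  & \sum_{j=1}^{4} y_j = 11 \\
x_jy_j  & -1 & 6 & 12 & 35 & \sum_{j=1}^{4} x_jy_j = 52
\end{array}
\]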

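$A$ is a $4 \times 2$ matrix and we have

\[
A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 7 \end{bmatrix}
\qquad
A^TA = \begin{bmatrix} 4 & 13 \\ 13 & 63 \end{bmatrix}
\qquad
A^T\mathbf{y}^T = \begin{bmatrix} 11 \\ 52 \end{bmatrix}
\]

\[
b = \frac{\det\begin{bmatrix} 11 & 13 \\ 52 & 63 \end{bmatrix}}{\det\begin{bmatrix} 4 & 13 \\ 13 & 63 \end{bmatrix}} = \frac{17}{83}
\qquad
m = \frac{\det\begin{bmatrix} 4 & 11 \\ 13 & 52 \end{bmatrix}}{\det\begin{bmatrix} 4 & 13 \\ 13 & 63 \end{bmatrix}} = \frac{65}{83}
\]

Thus, $y = \frac{65}{83}x + \frac{17}{83}$ is the least squares straight-line approximation to our data. □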
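
The result of Example 1 can be checked numerically. The short Python sketch below is an added illustration rather than part of the original solution: it evaluates the sum of squared errors at $m = 65/83$, $b = 17/83$ and compares it with the error of a few nearby lines. The function name `sum_squared_errors` and the step size $1/100$ are arbitrary choices made for this check.

```python
from fractions import Fraction

# Data from Example 1.
points = [(1, -1), (2, 3), (3, 4), (7, 5)]


def sum_squared_errors(m, b, pts):
    """Sum over the data of e_j^2 = [y_j - (m*x_j + b)]^2."""
    return sum((y - (m * x + b)) ** 2 for x, y in pts)


m = Fraction(65, 83)
b = Fraction(17, 83)

# The pair (b, m) satisfies the normal equations A^T A (b, m)^T = A^T y^T.
assert 4 * b + 13 * m == 11
assert 13 * b + 63 * m == 52

best = sum_squared_errors(m, b, points)

# Nudging either coefficient slightly should only increase the error.
step = Fraction(1, 100)
nearby = [
    sum_squared_errors(m + dm, b + db, points)
    for dm in (-step, 0, step)
    for db in (-step, 0, step)
    if (dm, db) != (0, 0)
]
print(best, min(nearby) > best)
```

Every perturbed line has a strictly larger sum of squared errors, which is consistent with the pair $(b, m)$ found above being the minimizer.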

There is no a priori reason why one should always insist upon fitting a straight line to data. For example, we might wish to fit a parabola to the data. That is, find $a_0$, $a_1$, and $a_2$ such that $y = a_0 + a_1x + a_2x^2$ is the quadratic least squares fit, cf. problem 5.
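
The same normal-equations approach carries over to a quadratic fit: the matrix $A$ simply gains a third column containing the values $x_j^2$. The Python sketch below illustrates this under stated assumptions; the names `solve` and `poly_fit` and the sample data are hypothetical, and the data are deliberately taken from the parabola $y = 1 + 2x + 3x^2$ so that the expected coefficients are known in advance.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve the square system M x = rhs by Gauss-Jordan elimination.

    Assumes M is invertible.
    """
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def poly_fit(points, degree):
    """Least squares coefficients [a_0, ..., a_degree] from A^T A a = A^T y.

    Row j of A is (1, x_j, x_j^2, ..., x_j^degree). The data must contain
    more than `degree` different x values so that A has full column rank.
    """
    A = [[Fraction(x) ** k for k in range(degree + 1)] for x, _ in points]
    y = [Fraction(v) for _, v in points]
    cols = list(zip(*A))
    AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    Aty = [sum(p * v for p, v in zip(ci, y)) for ci in cols]
    return solve(AtA, Aty)


# Hypothetical data lying exactly on y = 1 + 2x + 3x^2.
pts = [(-1, 2), (0, 1), (1, 6), (2, 17)]
coeffs = poly_fit(pts, 2)
print(coeffs)
```

Because the sample points lie exactly on a parabola, the fit recovers $a_0 = 1$, $a_1 = 2$, $a_2 = 3$. Calling `poly_fit` with `degree=1` reduces to the straight-line fit discussed above.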

Problem Set 6.5

1. Let $A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \\ 3 & 4 \end{bmatrix}$.

a. Determine the range of $A$, and show that $(1, 1, 0)$ is not in the range, i.e., the equation $A\mathbf{x} = (1, 1, 0)^T$ does not have a solution.

b. Compute $A^TA$ and show that it is one-to-one.

c. Solve the equation $A^TA\mathbf{x} = A^T\mathbf{b}$, where $\mathbf{b} = (1, 1, 0)$.

d. If $\mathbf{x}$ is your solution from part c, show that $\|A\mathbf{x} - \mathbf{b}\|$ is smaller than $\|\mathbf{w} - \mathbf{b}\|$ for any other vector $\mathbf{w}$ in the range of $A$.


2. Determine the straight-line least squares fit for the following data: (1,1),(2,−3), (4,0), (5,1), (10,3).

3. Determine the straight-line least squares fit for the following data:

a. (0,6), (3,0), (4,−2)

b. (−2, 4), (3,9), (4,7)

4. Consider the system of equations:

\[
\begin{aligned}
3x_1 + 4x_2 + 8x_3 &= 0 \\
x_1 - x_3 &= 1 \\
2x_1 + x_2 + 4x_3 &= 0 \\
x_1 + x_2 + x_3 &= 0
\end{aligned}
\]

a. This system is overdetermined (more equations than unknowns) and may not have a solution. Show that if there is a solution, it is unique.

b. Show that this system does not have a solution, and then find $\mathbf{x}$ in $\mathbb{R}^3$ such that $A\mathbf{x}$ is that vector in the range of $A$ closest to $(0, 1, 0, 0)$.

5. Determine the least squares quadratic fit for the following data; i.e., find $p(x) = a_0 + a_1x + a_2x^2$ such that $\sum_{j=1}^{n} [p(x_j) - y_j]^2$ is minimized, where $(x_j, y_j)$ are the given data. Remember, you will need to solve an equation of the form $A^TA\mathbf{x} = A^T\mathbf{b}$.

a. (−2, 4), (3, 9), (4, 7)

b. (1, 1), (2, −3), (4, 0), (5, 1), (10, 3)

Supplementary Problems

1. Define each of the following and give an example of each:

a. Length of a vector

b. Angle between two vectors

c. Orthonormal basis

d. Projection onto a subspace

e. Orthogonal matrix

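2. Let $\mathbf{x}$ be a vector in $\mathbb{R}^n$.

a. Suppose $\langle \mathbf{x}, \mathbf{y}\rangle = 0$ for every $\mathbf{y}$ in $\mathbb{R}^n$. Show that $\mathbf{x}$ must be the zero vector.

b. Suppose $\langle \mathbf{x}, \mathbf{y}\rangle = 0$ for every vector $\mathbf{y}$ in some spanning set $F$ of $\mathbb{R}^n$. Show that $\mathbf{x}$ must be the zero vector.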


3. Compute the inner product and the cosine of the angle between each of the following pairs of vectors:

a. (−4, 5), (1, 2)

b. (−2, 3, 7), (2, −4, 5)

c. (−1, −2, 3, 5), (1, 1, 0, 8)

4. Let $\mathbf{x}_0 = (1, -2, 6)$. Show that the vector $(1, -2, 0)$ is that vector in the subspace $x_3 = 0$ which is closest to $\mathbf{x}_0$, by showing that

\[
f(x, y) = \|(1, -2, 6) - x(1, 0, 0) - y(0, 1, 0)\|^2
\]

attains its minimum when $x = 1$ and $y = -2$. Repeat this for $\mathbf{x}_0 = (a, b, c)$, an arbitrary vector in $\mathbb{R}^3$.

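5. Let $V = M_{22}$. Define the inner product of two $2 \times 2$ matrices $A = [a_{jk}]$ and $B = [b_{jk}]$ by

\[
\langle A, B\rangle = \sum_{j=1}^{2} \sum_{k=1}^{2} a_{jk}b_{jk}
\]

a. Show that this inner product satisfies properties a through c of Theorem 6.3.

b. Show that Theorem 6.4 is valid.

c. Let $A = \begin{bmatrix} c & 6 \\ -3 & 2 \end{bmatrix}$, where $c$ is a fixed constant. Let $E_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$. Let $f(t) = \|A - tE_1\|^2$. Find that value of $t$ which minimizes $f(t)$.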

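6. Given two vectors $\mathbf{x} = (x_1, x_2, x_3)$ and $\mathbf{y} = (y_1, y_2, y_3)$ in $\mathbb{R}^3$, define their cross product as follows (cf. problem 8 in Supplementary Problems to Chapter 4):

\[
\mathbf{x} \times \mathbf{y} = (x_2y_3 - x_3y_2,\; x_3y_1 - x_1y_3,\; x_1y_2 - x_2y_1)
\]

a. Show that the cross product of $\mathbf{x}$ and $\mathbf{y}$ is perpendicular to both $\mathbf{x}$ and $\mathbf{y}$.

b. Show that $\mathbf{x} \times \mathbf{y} = \mathbf{0}$ if and only if $\mathbf{x}$ and $\mathbf{y}$ are linearly dependent.

c. Show that $\mathbf{i} \times \mathbf{j} = \mathbf{k}$, $\mathbf{j} \times \mathbf{k} = \mathbf{i}$, and $\mathbf{k} \times \mathbf{i} = \mathbf{j}$.

d. Verify that $\mathbf{x} \times \mathbf{y} = -\mathbf{y} \times \mathbf{x}$.

e. Show that $\mathbf{x} \times (\mathbf{y} + \mathbf{z}) = (\mathbf{x} \times \mathbf{y}) + (\mathbf{x} \times \mathbf{z})$.

f. Find three vectors $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ for which $\mathbf{x} \times (\mathbf{y} \times \mathbf{z}) \neq (\mathbf{x} \times \mathbf{y}) \times \mathbf{z}$. Hint: Use parts b and c.

Thus, the cross product of two vectors is a noncommutative, nonassociative operation that produces a vector perpendicular to both of the original vectors.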

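7. Define $\mathbf{x} \times \mathbf{y}$ as in problem 6. Show that

\[
\|\mathbf{x} \times \mathbf{y}\|^2 + \langle \mathbf{x}, \mathbf{y}\rangle^2 = \|\mathbf{x}\|^2\|\mathbf{y}\|^2
\]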


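a. Deduce from the above formula that $\|\mathbf{x} \times \mathbf{y}\| = \|\mathbf{x}\|\|\mathbf{y}\| \sin\theta$, where $\theta$ is the angle between the vectors $\mathbf{x}$ and $\mathbf{y}$.

b. If $P$ is the parallelogram determined by $\mathbf{x}$ and $\mathbf{y}$, show that $\operatorname{area}(P) = \|\mathbf{x} \times \mathbf{y}\|$.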

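8. If $\mathbf{x} \times \mathbf{y}$ is defined as in problem 6, show that

\[
(\mathbf{x} \times \mathbf{y})' = (\mathbf{x}' \times \mathbf{y}) + (\mathbf{x} \times \mathbf{y}')
\]

where we assume that both $\mathbf{x}$ and $\mathbf{y}$ are vector-valued functions of a real variable and the $'$ denotes differentiation.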

9. Let $P$ be an orthogonal $n \times n$ matrix. Let $\mathbf{x}$ and $\mathbf{y}$ be any two vectors in $\mathbb{R}^n$.

a. Show that $\langle \mathbf{x}, \mathbf{y}\rangle = \langle P\mathbf{x}, P\mathbf{y}\rangle$. Deduce from this that the linear transformation $L$ given by $L\mathbf{x} = P\mathbf{x}$ preserves the lengths of vectors and the angles between them.

b. Conversely, show that if $P$ is an $n \times n$ matrix for which $\langle \mathbf{x}, \mathbf{y}\rangle = \langle P\mathbf{x}, P\mathbf{y}\rangle$ for every pair of vectors in $\mathbb{R}^n$, then $P$ is an orthogonal matrix.

10. A mapping $T$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ is said to be affine if $T\mathbf{x} = A\mathbf{x} + \mathbf{a}$, where $A$ is an $n \times n$ matrix and $\mathbf{a}$ is a fixed vector in $\mathbb{R}^n$. Clearly, if $A$ is an orthogonal matrix, then $T$ is distance preserving, i.e., $\|T\mathbf{x} - T\mathbf{y}\| = \|\mathbf{x} - \mathbf{y}\|$. Show that the converse of this is also true. That is, if $T$ is any mapping that preserves distance, then $T$ is affine and the matrix $A$ is orthogonal.

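11. Let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ be two vector-valued functions from $\mathbb{R}$ to $\mathbb{R}^2$. If $\mathbf{x}(t) = (x_1(t), x_2(t))$, define $\mathbf{x}'(t) = (x_1', x_2')$.

a. Let $c(t)$ be a real-valued differentiable function. Show that $[c\mathbf{x}]' = c'\mathbf{x} + c\mathbf{x}'$.

b. Show that $\langle \mathbf{x}, \mathbf{y}\rangle' = \langle \mathbf{x}', \mathbf{y}\rangle + \langle \mathbf{x}, \mathbf{y}'\rangle$.

c. If $\mathbf{x}(t)$ is a vector-valued function with constant nonzero length, show that $\mathbf{x}$ and $\mathbf{x}'$ are perpendicular if $\mathbf{x}'$ is not the zero vector.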

12. A linear transformation $L$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ is positive definite if $\langle L\mathbf{x}, \mathbf{x}\rangle \ge 0$ for all vectors $\mathbf{x}$, and whenever the inner product of $\mathbf{x}$ and $L\mathbf{x}$ equals zero, then $\mathbf{x}$ equals $\mathbf{0}$. Let $A = [a_{jk}]$ be the matrix representation of $L$ with respect to the standard basis. Assume that $A$ is a symmetric matrix.

a. If $n = 2$, show that $L$ is positive definite if and only if $a_{11} > 0$ and $\det(A) > 0$.

b. If $n = 3$, show that $L$ is positive definite if and only if $a_{11} > 0$, $\det(M_{33}) > 0$, and $\det(A) > 0$, where $M_{33}$ is the $2 \times 2$ matrix in the upper left-hand corner of $A$.


13. Let $P$ be a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$. Suppose $P^2 = P$. That is, $P(P(\mathbf{x})) = P\mathbf{x}$ for all vectors in $\mathbb{R}^n$. Such mappings are called projections.

a. Show that $(I - P)^2 = I - P$. Thus, if $P$ is a projection, so too is $I - P$.

b. Show that the following are equivalent:

(1) $\mathbf{x}$ is in the range of $P$.

(2) $P\mathbf{x} = \mathbf{x}$.

(3) $(I - P)\mathbf{x} = \mathbf{0}$.

c. Show that $\ker(P) = \operatorname{Rg}(I - P)$.

d. A projection is said to be orthogonal if $\ker(P)$ is orthogonal to $\operatorname{Rg}(P)$. Show that $P$ is an orthogonal projection if and only if $P = P^T$.

e. Show that the projections defined in Section 6.2 are orthogonal projections in the sense of part d.

14. Let $T : P_2 \to P_3$ be a linear transformation given by

\[
T[\mathbf{p}](t) = \int_0^t \mathbf{p}(s)\,ds
\]

Define an inner product on $P_2$ by $\langle \mathbf{p}, \mathbf{q}\rangle = p_0q_0 + p_1q_1 + p_2q_2$, where $\mathbf{p}(t) = p_0 + p_1t + p_2t^2$. Define an inner product on $P_3$ in a similar manner.

a. Show that $\mathbf{p}(t) \equiv 1$ is not in the range of $T$.

b. Show that $T$ is one-to-one.

c. Find the least squares solution to $T(\mathbf{p}) = 1$. That is, find $\mathbf{p}$ in $P_2$ such that $T(\mathbf{p})$ is that vector in the range of $T$ closest to the polynomial identically 1.

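15. Let $V = M_{22}$. Define the inner product of two matrices $A = [a_{jk}]$ and $B = [b_{jk}]$ by

\[
\langle A, B\rangle = \sum_{j=1}^{2} \sum_{k=1}^{2} a_{jk}b_{jk}
\]

Let $F$ be the set consisting of the three matrices below:

\[
\left\{
\begin{bmatrix} 1 & 2 \\ -3 & 1 \end{bmatrix},\;
\begin{bmatrix} 0 & 1 \\ -2 & 0 \end{bmatrix},\;
\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
\right\}
\]

a. Show that $F$ is a linearly independent set.

b. Construct orthonormal bases for $V$ and for $S[F]$.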

16. Define $T : P_2 \to P_3$ by $T[\mathbf{p}](t) = \int_0^t \mathbf{p}(s)\,ds - (t/2)\mathbf{p}(t)$.

a. Show that $T$ is a linear transformation and determine its range and kernel.

b. If inner products on $P_2$ and $P_3$ are defined as in problem 14, construct orthonormal bases for the range and kernel of $T$.