
Lecture Notes for MATH 245, Linear Algebra 2

by Stephen New


1. Affine Spaces, Convex Sets and Simplices

1.1 Note: In this section, we let F denote a fixed field (for example F = Q, R, C or Z_p with p prime) and let W be a fixed vector space over F.

1.2 Definition: An affine space in W is a set of the form

P = p + U = {p + u | u ∈ U}

for some element p ∈ W and some subspace U ⊆ W. An element of an affine space is called a point.

1.3 Example: Every subspace U ⊆ W is also an affine space in W (since U = 0 + U). For an element a ∈ U, we can call the element a a vector if we are considering U as a vector space, and we can call the element a a point if we are considering U as an affine space.

1.4 Example: For a subspace U ⊆ W, the quotient space W/U is the set

W/U = {p + U | p ∈ W}.

The operations in W/U are given by (p + U) + (q + U) = (p + q) + U and t(p + U) = tp + U.

1.5 Theorem: Let p, q ∈ W be points and let U, V ⊆ W be subspaces. Then

(1) p + U ⊆ q + V if and only if U ⊆ V and p − q ∈ V, and
(2) p + U = q + V if and only if U = V and p − q ∈ U.

Proof: Suppose that p + U ⊆ q + V. Since p = p + 0 ∈ p + U, we also have p ∈ q + V, say p = q + v where v ∈ V. Then p − q = v ∈ V. Let u ∈ U. Then p + u ∈ p + U and so p + u ∈ q + V, say p + u = q + w where w ∈ V. Then u = w − (p − q) = w − v ∈ V. Conversely, suppose that U ⊆ V and p − q ∈ V, say p − q = v ∈ V. Let a ∈ p + U, say a = p + u where u ∈ U. Then a = p + u = (q + v) + u = q + (u + v) ∈ q + V since u + v ∈ V. This proves part (1), and part (2) follows immediately from part (1).

1.6 Definition: Let P be an affine space in W, say P = p + U where p ∈ W is a point and U ⊆ W is a subspace. The vector space U, which by the above theorem is uniquely determined, is called the associated vector space of P, and we say that P is the affine space through p in the direction of U. We define the dimension of P to be

dim(P) = dim(U).

Similarly, the codimension of P in W is codim_W(P) = codim_W(U) = dim(W/U).

1.7 Definition: A line in W is a 1-dimensional affine space in W. A plane in W is a 2-dimensional affine space in W. We often call a 0-dimensional affine space in W a point (although, strictly speaking, a 0-dimensional affine space in W is a one-element set which contains a point). A hyperplane in W is an affine space in W of codimension 1 (so when dim(W) = n, a hyperplane in W is an (n − 1)-dimensional affine space in W).


1.8 Example: Let u1, u2, · · · , uk ∈ F^n, let A = {u1, u2, · · · , uk}, let U = Span A, let p ∈ F^n, let P = p + U, and let A = (u1, u2, · · · , uk) ∈ M_{n×k}(F). Note that

U = Span A = { ∑_{i=1}^k t_i u_i | each t_i ∈ F } = { At | t ∈ F^k } = Col(A).

We can calculate dim(P) = dim(U) in several ways. For example, we can row reduce the matrix A to obtain a reduced row-echelon matrix R. If the pivots in R occur in columns 1 ≤ j_1 < j_2 < · · · < j_r ≤ k, then {u_{j_1}, u_{j_2}, · · · , u_{j_r}} (the set of corresponding columns in A) is a basis for U = Col(A) and we have dim(P) = dim(U) = r = rank(A). Alternatively, we can row reduce the matrix A^T to obtain a reduced row-echelon matrix S. The nonzero rows of S then form a basis for Row(S) = Row(A^T) = U.
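For a quick numerical check of this computation (a sketch in Python with NumPy, working over F = R; the vectors are illustrative assumptions, not from the notes):

```python
import numpy as np

# dim(P) = dim(U) = rank(A), where the columns of A are u1, u2, u3.
u1, u2, u3 = [1, 0, 1], [2, 1, 0], [3, 1, 1]   # u3 = u1 + u2, so the rank is 2
A = np.column_stack([u1, u2, u3])
print(np.linalg.matrix_rank(A))                # prints 2, so dim(P) = 2
```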

1.9 Example: Let A ∈ M_{k×n}(F) and let b ∈ F^k. If P is the solution set

P = {x ∈ F^n | Ax = b}

then either P = ∅ (the empty set) or P is an affine space in F^n. Indeed, if p ∈ F^n is in the solution set, so that Ap = b, then for x ∈ F^n we have

Ax = b ⇐⇒ Ax = Ap ⇐⇒ A(x − p) = 0 ⇐⇒ (x − p) ∈ Null(A) ⇐⇒ x ∈ p + Null(A),

and so the solution set is the affine space P = p + U where U = Null(A). We can determine whether Ax = b has a solution, and if so we can determine a solution and find a basis for U = Null(A), using Gauss-Jordan elimination. We row reduce the augmented matrix (A|b) to obtain a row-reduced augmented matrix, say

(A|b) ∼ ( R | c ; 0 | d )

where R is in reduced row-echelon form with nonzero rows. If d ≠ 0 then there is no solution, and if d = 0 then the solution is obtained from R and c as follows. Let 1 ≤ j_1 < j_2 < · · · < j_r ≤ n be the pivot column indices and let 1 ≤ l_1 < l_2 < · · · < l_s ≤ n be the non-pivot column indices in R (so that r + s = n). Let u1, u2, · · · , un be the columns of R, so R = (u1, u2, · · · , un) ∈ M_{r×n}(F). Write R_J = (u_{j_1}, u_{j_2}, · · · , u_{j_r}) ∈ M_{r×r}(F) and R_L = (u_{l_1}, u_{l_2}, · · · , u_{l_s}) ∈ M_{r×s}(F), and note that R_J = I. For B ∈ M_{n×s}(F) with row vectors v1, v2, · · · , vn ∈ F^s, so that B = (v1, · · · , vn)^T ∈ M_{n×s}(F), write B_J = (v_{j_1}, v_{j_2}, · · · , v_{j_r})^T ∈ M_{r×s}(F) and B_L = (v_{l_1}, v_{l_2}, · · · , v_{l_s})^T ∈ M_{s×s}(F), and for p ∈ F^n write p_J = (p_{j_1}, p_{j_2}, · · · , p_{j_r})^T ∈ F^r and p_L = (p_{l_1}, p_{l_2}, · · · , p_{l_s})^T ∈ F^s. Then the solution to Ax = b is given by

x = p + Bt with t ∈ F^s, where p_J = c, p_L = 0, B_J = −R_L and B_L = I.

Because the matrix B ∈ M_{n×s}(F) includes s linearly independent rows, namely the rows in B_L = I, the columns of B are linearly independent and form a basis for U = Null(A) = Col(B).
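The following sketch carries out exactly this computation (using SymPy, an assumption of this illustration; the matrix is made up): gauss_jordan_solve returns a particular solution together with the parametric part of p + Bt, and nullspace returns the columns of B.

```python
from sympy import Matrix

A = Matrix([[1, 2, 0, 1],
            [0, 0, 1, 3]])
b = Matrix([5, 2])
sol, params = A.gauss_jordan_solve(b)   # general solution p + B t, free parameters in params
print(sol)                              # entries involve the free symbols from params
print(A.nullspace())                    # basis for U = Null(A), i.e. the columns of B
```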

1.10 Example: Let L : W → V be a linear map and let a ∈ W and b ∈ V. If b ∉ Range(L) then L⁻¹(b) = ∅. If b ∈ Range(L) with L(a) = b, then L⁻¹(b) = a + Null(L), because for x ∈ W we have L(x) = b ⇐⇒ L(x − a) = L(x) − L(a) = b − b = 0, that is, x − a ∈ Null(L).


1.11 Theorem: Let A be a non-empty index set and for each α ∈ A, let Pα be an affine space in W. Let S = ⋂_{α∈A} Pα. Then either S = ∅ or S is an affine space in W.

Proof: Suppose that S ≠ ∅. Choose p ∈ S. For each α ∈ A, let Uα be the associated vector space of Pα, and note that, since p ∈ Pα, we have Pα = p + Uα. Let U = ⋂_{α∈A} Uα. Note that U is a subspace of W. Indeed, 0 ∈ U, and if u, v ∈ U and t ∈ F, then for every α ∈ A we have u, v ∈ Uα, so that u + v ∈ Uα and tu ∈ Uα, and hence u + v ∈ U and tu ∈ U. We claim that S = p + U. Let x ∈ S = ⋂_{α∈A} Pα = ⋂_{α∈A} (p + Uα). For each α ∈ A, choose uα ∈ Uα so that x = p + uα. Fix β ∈ A and let u = uβ. Note that for all α ∈ A we have uα = x − p = uβ = u. Thus u ∈ ⋂_{α∈A} Uα = U and we have x = p + u ∈ p + U. Conversely, let y ∈ p + U, say y = p + u with u ∈ U. Then for every α ∈ A we have u ∈ Uα, so y = p + u ∈ p + Uα = Pα. Since y ∈ Pα for all α, we have y ∈ S.

1.12 Definition: Let ∅ ≠ S ⊆ W. We define the affine span of S, denoted by ⟨S⟩, to be the smallest affine space in W which contains S, or equivalently, the intersection of all affine spaces in W which contain S. Sometimes we omit set brackets from our notation, so for example when a0, a1, · · · , al ∈ F^n we usually write ⟨a0, a1, · · · , al⟩ = ⟨{a0, a1, · · · , al}⟩.

1.13 Theorem: Let ∅ ≠ S ⊆ W and let p ∈ S. Let U = Span{a − p | a ∈ S}. Then

⟨S⟩ = p + U = { ∑_{i=0}^n s_i a_i | n ∈ N, a_i ∈ S, s_i ∈ F, ∑_{i=0}^n s_i = 1 }.

Proof: Let a ∈ S. Then a − p ∈ U and so a = p + (a − p) ∈ p + U. Thus p + U is an affine space in W which contains S, and so ⟨S⟩ ⊆ p + U. Let Q be any affine space in W which contains S. Note that since p ∈ S, we also have p ∈ Q. Let V be the associated vector space of Q, so that Q = p + V. For every a ∈ S we have a ∈ Q = p + V and so a − p ∈ V. It follows that U = Span{a − p | a ∈ S} ⊆ V. Since U ⊆ V we have p + U ⊆ p + V = Q. Since p + U ⊆ Q for every affine space Q which contains S, it follows that p + U ⊆ ⟨S⟩. Thus we have shown that

⟨S⟩ = p + U = p + Span{a − p | a ∈ S} = { p + ∑_{i=1}^n t_i(a_i − p) | n ∈ N, a_i ∈ S, t_i ∈ F }.

Finally, note that

p + ∑_{i=1}^n t_i(a_i − p) = (1 − ∑_{i=1}^n t_i) p + ∑_{i=1}^n t_i a_i = ∑_{i=0}^n s_i a_i

where a_0 = p, s_0 = 1 − ∑_{i=1}^n t_i, and s_i = t_i for i ≥ 1.

1.14 Definition: Let ∅ ≠ S ⊆ W. An affine combination on S is a point in ⟨S⟩, that is, a point in W of the form

p = ∑_{i=0}^n s_i a_i where n ∈ N, a_i ∈ S, s_i ∈ F, and ∑_{i=0}^n s_i = 1.


1.15 Definition: Let ∅ ≠ S ⊆ W. We say that S is affinely independent when for all n ∈ N, for all distinct a0, a1, · · · , an ∈ S and for all s0, s1, · · · , sn ∈ F,

if ∑_{i=0}^n s_i a_i = 0 and ∑_{i=0}^n s_i = 0 then every s_i = 0.

Otherwise we say that S is affinely dependent.

1.16 Note: Let ∅ ≠ S ⊆ W. Then S is affinely independent if and only if every element in ⟨S⟩ can be expressed uniquely as an affine combination on S.

1.17 Theorem: Let ∅ ≠ S ⊆ W, let p ∈ S, and let A = {a − p | a ∈ S \ {p}}. Then S is affinely independent if and only if A is linearly independent.

Proof: We prove one direction of the if and only if statement and leave the proof of the other direction as an exercise. Suppose that S is affinely independent. Let n ∈ Z+, let u1, u2, · · · , un be distinct elements in A, let t1, · · · , tn ∈ F and suppose that ∑_{i=1}^n t_i u_i = 0. Note that A is a set of non-zero vectors, so each u_i ≠ 0. Let a_i = u_i + p and note that a1, · · · , an are distinct elements in S with each a_i ≠ p. Let a_0 = p, let s_i = t_i for 1 ≤ i ≤ n and let s_0 = −∑_{i=1}^n s_i. Note that

∑_{i=1}^n t_i u_i = 0 ⇐⇒ ∑_{i=1}^n t_i(a_i − p) = 0 ⇐⇒ ∑_{i=0}^n s_i a_i = 0.

Since ∑_{i=0}^n s_i = 0 and S is affinely independent, we have s_i = 0 for 0 ≤ i ≤ n, and hence t_i = 0 for 1 ≤ i ≤ n.

1.18 Corollary: Let a0, a1, · · · , al be distinct points in W. Let P = ⟨a0, a1, · · · , al⟩. Then {a0, a1, · · · , al} is affinely independent if and only if dim(P) = l.

1.19 Note: For the rest of this section, we let W denote a fixed vector space over R.

1.20 Definition: For a, b ∈ W, the line segment between a and b in W is the set

[a, b] = {a + t(b − a) | t ∈ R, 0 ≤ t ≤ 1} = {sa + tb | s, t ∈ R with s, t ≥ 0 and s + t = 1}.

1.21 Definition: A non-empty set S ⊆ W is called convex when it has the property that for all a, b ∈ S, we have [a, b] ⊆ S.

1.22 Theorem: The intersection of a set of convex sets in W is either empty or convex.

Proof: Let A be an index set, and for each α ∈ A let Sα be a convex set in W. Let S = ⋂_{α∈A} Sα. Suppose that S ≠ ∅. Let a, b ∈ S. Then a, b ∈ Sα for all α ∈ A. Since each Sα is convex, it follows that [a, b] ⊆ Sα for every α ∈ A, and so [a, b] ⊆ S.

1.23 Definition: Let ∅ ≠ S ⊆ W. The convex hull of S in W, denoted by [S], is the smallest convex set in W which contains S. Equivalently, [S] is the intersection of all convex sets in W which contain S.


1.24 Theorem: Let ∅ ≠ S ⊆ W. Then

[S] = { ∑_{i=0}^m s_i a_i | m ∈ N, a_i ∈ S, s_i ∈ R with s_i ≥ 0, ∑_{i=0}^m s_i = 1 }.

Proof: Let T denote the set on the right. We claim that T is convex. Let x, y ∈ T, say x = ∑_{i=0}^m s_i a_i and y = ∑_{i=0}^m t_i a_i where m ∈ N, s_i, t_i ≥ 0 and ∑ s_i = ∑ t_i = 1 (we can use the same upper limit and the same points a_i in the sums for x and y because some of the coefficients s_i, t_i can be zero). Let z ∈ [x, y], say z = x + r(y − x) where 0 ≤ r ≤ 1. Then z = ∑ s_i a_i + r(∑ t_i a_i − ∑ s_i a_i) = ∑ r_i a_i, where r_i = s_i + r(t_i − s_i). Since s_i ≥ 0 and t_i ≥ 0 and r_i lies between s_i and t_i, we have r_i ≥ 0. Also, ∑ r_i = ∑ s_i + r(∑ t_i − ∑ s_i) = 1 + r(1 − 1) = 1, and so z = ∑ r_i a_i ∈ T. Thus T is convex, as claimed. Since T is convex, and clearly S ⊆ T, we have [S] ⊆ T.

Let C be a convex set with S ⊆ C. For each k ∈ N, let

T_k = { ∑_{i=0}^k s_i a_i | a_i ∈ S, s_i ∈ R with s_i ≥ 0, ∑_{i=0}^k s_i = 1 }.

We claim that each T_k ⊆ C. Note that T_0 = S ⊆ C. Fix k ≥ 1 and suppose, inductively, that T_{k−1} ⊆ C. Let x ∈ T_k, say x = ∑_{i=0}^k s_i a_i with s_i ≥ 0 and ∑ s_i = 1. If s_k = 1 then x = a_k, and so x ∈ S ⊆ C. Suppose that s_k ≠ 1. Let y = ∑_{i=0}^{k−1} (s_i/(1 − s_k)) a_i. Note that each s_i/(1 − s_k) ≥ 0 and that ∑_{i=0}^{k−1} s_i/(1 − s_k) = (1/(1 − s_k)) ∑_{i=0}^{k−1} s_i = (1 − s_k)/(1 − s_k) = 1, and so y ∈ T_{k−1} ⊆ C. Also, we have (1 − s_k) y = ∑_{i=0}^{k−1} s_i a_i = x − s_k a_k, and so x = (1 − s_k) y + s_k a_k = y + s_k(a_k − y) ∈ [y, a_k]. Since y ∈ T_{k−1} ⊆ C, a_k ∈ C, x ∈ [y, a_k] and C is convex, we have x ∈ C. Thus T_k ⊆ C. By induction, we have T_k ⊆ C for all k ∈ N, and hence T = ⋃_{k=0}^∞ T_k ⊆ C. Since T is contained in every convex set C with S ⊆ C, it follows that T ⊆ [S].

1.25 Definition: Let ∅ ≠ S ⊆ W. A convex combination on S is a point in [S], that is, a point of the form

p = ∑_{i=0}^m s_i a_i where m ∈ N, a_i ∈ S, s_i ∈ R with s_i ≥ 0, and ∑_{i=0}^m s_i = 1.

1.26 Definition: Let l ∈ N. An (ordered, non-degenerate) l-simplex in W is a convex set of the form [a0, a1, · · · , al], where (a0, a1, · · · , al) is an ordered (l + 1)-tuple of distinct points a_i ∈ W such that {a0, a1, · · · , al} is affinely independent. A 0-simplex is sometimes called a point (although it is actually a one-element set containing a point), a 1-simplex is called a line segment, a 2-simplex is called a triangle, and a 3-simplex is called a tetrahedron.

1.27 Definition: Let S = [a0, a1, · · · , al] be an l-simplex in W. For each pair (j, k) with 0 ≤ j < k ≤ l, the medial hyperplane M_{j,k} of S is given by

M_{j,k} = ⟨ ½(a_j + a_k), a_i | i ≠ j, k ⟩.


1.28 Note: Given an l-simplex [a0, a1, · · · , al] and a pair (j, k) with 0 ≤ j < k ≤ l, note that the set

{ ½(a_j + a_k), a_i | i ≠ j, k }

is affinely independent. Indeed, if s · ½(a_j + a_k) + ∑_{i≠j,k} s_i a_i = 0 with s + ∑ s_i = 0, then letting s_j = s_k = ½s we have ∑_{i=0}^l s_i a_i = 0 with ∑_{i=0}^l s_i = 0, and so each s_i = 0 (including s_j and s_k, hence also s = 0) because {a0, a1, · · · , al} is affinely independent. It follows that dim(M_{j,k}) = l − 1. We remark that when l ≠ dim(W), the affine space M_{j,k} is not a hyperplane in W but rather a hyperplane in the affine span ⟨a0, a1, · · · , al⟩.

1.29 Theorem: Let [a0, a1, · · · , al] be an l-simplex in W. Then the medial hyperplanes M_{j,k} have a unique point of intersection g, called the centroid of the simplex, which is given by

g = (1/(l+1)) ∑_{i=0}^l a_i.

Proof: First we show that the point g = (1/(l+1)) ∑_{i=0}^l a_i lies on each medial hyperplane M_{j,k}. For 0 ≤ j < k ≤ l we have

g = (1/(l+1)) ∑_{i=0}^l a_i = (1/(l+1))(a_j + a_k) + (1/(l+1)) ∑_{i≠j,k} a_i = (2/(l+1)) · ½(a_j + a_k) + ∑_{i≠j,k} (1/(l+1)) a_i.

The sum of the coefficients is 2/(l+1) + (l − 1)·(1/(l+1)) = (l+1)/(l+1) = 1, and so g ∈ M_{j,k}.

To show that g is the unique point which lies in every medial hyperplane M_{j,k}, we shall show that there can be at most one point which lies in each medial hyperplane M_{0,k}. To do this, we first show that a_k ∉ M_{0,k}. Suppose, for a contradiction, that a_k ∈ M_{0,k}, say a_k = s · ½(a_0 + a_k) + ∑_{i≠0,k} s_i a_i with s + ∑_{i≠0,k} s_i = 1. Then by letting s_0 = ½s and s_k = ½s − 1 we obtain ∑_{i=0}^l s_i a_i = 0 with ∑_{i=0}^l s_i = 0. Since {a0, a1, · · · , al} is affinely independent, it follows that each s_i = 0. But it is not possible to have both 0 = s_0 = ½s and 0 = s_k = ½s − 1, so we obtain the desired contradiction. Thus a_k ∉ M_{0,k}.

To complete the proof, we shall show that there can be at most one point which lies in each of the medial hyperplanes M_{0,k}. We do this by a dimension count. Let P_k = ⋂_{i=1}^k M_{0,i}. Note that P_k ≠ ∅ since we know that g ∈ P_k, and so P_k is an affine space. For k ≥ 2 we have P_k = P_{k−1} ∩ M_{0,k}, so that P_k ⊆ P_{k−1}, and we have a_k ∈ P_{k−1} but a_k ∉ P_k, and so P_k ⊊ P_{k−1}. Thus we have

M_{0,1} = P_1 ⊋ P_2 ⊋ · · · ⊋ P_l.

Since dim(P_1) = dim(M_{0,1}) = l − 1, and since P_k ⊊ P_{k−1}, so that dim(P_k) < dim(P_{k−1}) for all k ≥ 2, we must have dim(P_k) ≤ l − k for all k. In particular dim(P_l) ≤ 0, and hence dim(P_l) = 0, so P_l is a one-element set containing a point; indeed P_l = {g}.
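As a small numerical illustration of the theorem (a Python/NumPy sketch with made-up vertices), the centroid of a triangle is the average of its vertices, and one can verify directly that it is an affine combination of the points generating a medial hyperplane:

```python
import numpy as np

a = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])   # a 2-simplex (triangle) in R^2
g = a.mean(axis=0)                                    # g = (a0 + a1 + a2) / 3
m01 = (a[0] + a[1]) / 2                               # generator of M_{0,1}, with a2
print(np.allclose(g, (2/3) * m01 + (1/3) * a[2]))     # True: g lies on M_{0,1}
```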


2. The Dot Product, Norm, Angle and Orthogonal Projections in R^n

2.1 Definition: For vectors x, y ∈ R^n we define the dot product of x and y to be

x · y = y^T x = ∑_{i=1}^n x_i y_i.

2.2 Theorem: (Properties of the Dot Product) For all x, y, z ∈ R^n and all t ∈ R we have

(1) (Bilinearity) (x + y) · z = x · z + y · z, (tx) · y = t(x · y), x · (y + z) = x · y + x · z, and x · (ty) = t(x · y),
(2) (Symmetry) x · y = y · x, and
(3) (Positive Definiteness) x · x ≥ 0 with x · x = 0 if and only if x = 0.

Proof: The proof is left as an exercise.

2.3 Definition: For a vector x ∈ R^n, we define the length (or norm) of x to be

|x| = √(x · x) = √(∑_{i=1}^n x_i²).

We say that x is a unit vector when |x| = 1.

2.4 Theorem: (Properties of Length) Let x, y ∈ R^n and let t ∈ R. Then

(1) (Positive Definiteness) |x| ≥ 0 with |x| = 0 if and only if x = 0,
(2) (Scaling) |tx| = |t| |x|,
(3) |x ± y|² = |x|² ± 2(x · y) + |y|²,
(4) (The Polarization Identities) x · y = ½(|x + y|² − |x|² − |y|²) = ¼(|x + y|² − |x − y|²),
(5) (The Cauchy-Schwarz Inequality) |x · y| ≤ |x| |y|, with |x · y| = |x| |y| if and only if the set {x, y} is linearly dependent, and
(6) (The Triangle Inequality) |x + y| ≤ |x| + |y|.

Proof: We leave the proofs of parts (1), (2) and (3) as an exercise, and we note that (4) follows immediately from (3). To prove part (5), suppose first that {x, y} is linearly dependent. Then one of x and y is a multiple of the other, say y = tx with t ∈ R. Then

|x · y| = |x · (tx)| = |t(x · x)| = |t| |x|² = |x| |tx| = |x| |y|.

Suppose next that {x, y} is linearly independent. Then for all t ∈ R we have x + ty ≠ 0, and so

0 ≠ |x + ty|² = (x + ty) · (x + ty) = |x|² + 2t(x · y) + t²|y|².

Since the quadratic on the right is non-zero for all t ∈ R, the discriminant of the quadratic must be negative, that is,

4(x · y)² − 4|x|²|y|² < 0.

Thus (x · y)² < |x|²|y|² and hence |x · y| < |x| |y|. This proves part (5).

Using part (5), note that

|x + y|² = |x|² + 2(x · y) + |y|² ≤ |x|² + 2|x · y| + |y|² ≤ |x|² + 2|x| |y| + |y|² = (|x| + |y|)²,

and so |x + y| ≤ |x| + |y|, which proves part (6).


2.5 Definition: For points a, b ∈ R^n, we define the distance between a and b to be

dist(a, b) = |b − a|.

2.6 Theorem: (Properties of Distance) Let a, b, c ∈ R^n. Then

(1) (Positive Definiteness) dist(a, b) ≥ 0 with dist(a, b) = 0 if and only if a = b,
(2) (Symmetry) dist(a, b) = dist(b, a), and
(3) (The Triangle Inequality) dist(a, c) ≤ dist(a, b) + dist(b, c).

Proof: The proof is left as an exercise.

2.7 Definition: For nonzero vectors x, y ∈ R^n, we define the angle between x and y to be

θ(x, y) = cos⁻¹( (x · y) / (|x| |y|) ) ∈ [0, π].

Note that θ(x, y) = π/2 if and only if x · y = 0. For vectors x, y ∈ R^n, we say that x and y are orthogonal when x · y = 0.

2.8 Theorem: (Properties of Angle) Let 0 ≠ x, y ∈ R^n. Then

(1) θ(x, y) ∈ [0, π], with θ(x, y) = 0 if and only if y = tx for some t > 0, and θ(x, y) = π if and only if y = tx for some t < 0,
(2) (Symmetry) θ(x, y) = θ(y, x),
(3) (Scaling) θ(tx, y) = θ(x, ty) = θ(x, y) if 0 < t ∈ R, and θ(tx, y) = θ(x, ty) = π − θ(x, y) if 0 > t ∈ R,
(4) (The Law of Cosines) |y − x|² = |x|² + |y|² − 2|x| |y| cos θ(x, y),
(5) (Pythagoras' Theorem) θ(x, y) = π/2 if and only if |y − x|² = |x|² + |y|², and
(6) (Trigonometric Ratios) if (y − x) · x = 0 then cos θ(x, y) = |x|/|y| and sin θ(x, y) = |y − x|/|y|.

Proof: The Law of Cosines follows from the identity |y − x|² = |y|² − 2(y · x) + |x|² and the definition of θ(x, y). Pythagoras' Theorem is a special case of the Law of Cosines. We prove part (6). Let 0 ≠ x, y ∈ R^n and write θ = θ(x, y). Suppose that (y − x) · x = 0. Then y · x − x · x = 0, so that x · y = |x|², and so we have

cos θ = (x · y)/(|x| |y|) = |x|²/(|x| |y|) = |x|/|y|.

Also, by Pythagoras' Theorem we have |x|² + |y − x|² = |y|², so that |y|² − |x|² = |y − x|², and so

sin²θ = 1 − cos²θ = 1 − |x|²/|y|² = (|y|² − |x|²)/|y|² = |y − x|²/|y|².

Since θ ∈ [0, π] we have sin θ ≥ 0, and so taking the square root on both sides gives

sin θ = |y − x|/|y|.

2.9 Definition: For points a, b, c ∈ R^n with a ≠ b and b ≠ c we define

∠abc = θ(a − b, c − b).


2.10 Definition: Let U ⊆ R^n be a subspace. We define the orthogonal complement of U in R^n to be

U⊥ = {x ∈ R^n | x · u = 0 for all u ∈ U}.

2.11 Theorem: (Properties of the Orthogonal Complement) Let U ⊆ R^n be a subspace, let S ⊆ U and let A ∈ M_{k×n}(R). Then

(1) U⊥ is a subspace of R^n,
(2) if U = Span(S) then U⊥ = {x ∈ R^n | x · u = 0 for all u ∈ S},
(3) (Row A)⊥ = Null A,
(4) dim(U) + dim(U⊥) = n,
(5) U ⊕ U⊥ = R^n,
(6) (U⊥)⊥ = U, and
(7) (Null A)⊥ = Row A.

Proof: Note that 0 ∈ U⊥ since 0 · u = 0 for all u ∈ U. If x, y ∈ U⊥, so that x · u = 0 and y · u = 0 for all u ∈ U, then (x + y) · u = x · u + y · u = 0 for all u ∈ U, and so x + y ∈ U⊥. If x ∈ U⊥, so that x · u = 0 for all u ∈ U, and t ∈ R, then (tx) · u = t(x · u) = 0 for all u ∈ U, and so tx ∈ U⊥. This shows that U⊥ is a subspace of R^n, proving part (1).

To prove part (2), let T = {x ∈ R^n | x · u = 0 for all u ∈ S}. It is clear that U⊥ ⊆ T. Let x ∈ T. Let u ∈ U = Span(S), say u = ∑_{i=1}^m t_i u_i with each t_i ∈ R and each u_i ∈ S. Then x · u = x · ∑_{i=1}^m t_i u_i = ∑_{i=1}^m t_i (x · u_i) = 0. Thus x ∈ U⊥ and so T ⊆ U⊥.

To prove part (3), let v1, v2, · · · , vk be the rows of A. Note that Ax = (x · v1, · · · , x · vk)^T, so we have x ∈ Null A ⇐⇒ x · v_i = 0 for all i ⇐⇒ x ∈ Span{v1, v2, · · · , vk}⊥ = (Row A)⊥ by part (2).

Part (4) follows from part (3), since if we choose A so that Row A = U then we have dim(U) + dim(U⊥) = dim Row A + dim (Row A)⊥ = dim Row A + dim Null A = n.

To prove part (5), in light of part (4), it suffices to show that U ∩ U⊥ = {0}. Let x ∈ U ∩ U⊥. Since x ∈ U⊥ we have x · u = 0 for all u ∈ U. In particular, since x ∈ U we have x · x = 0, and hence x = 0. Thus U ∩ U⊥ = {0} and so U ⊕ U⊥ = R^n.

To prove part (6), let x ∈ U. By the definition of U⊥ we have x · v = 0 for all v ∈ U⊥. By the definition of (U⊥)⊥ we see that x ∈ (U⊥)⊥. Thus U ⊆ (U⊥)⊥. By part (4) we know that dim U + dim U⊥ = n and also that dim U⊥ + dim (U⊥)⊥ = n. It follows that dim U = n − dim U⊥ = dim (U⊥)⊥. Since U ⊆ (U⊥)⊥ and dim U = dim (U⊥)⊥, we have U = (U⊥)⊥, as required.

By parts (3) and (6) we have (Null A)⊥ = ((Row A)⊥)⊥ = Row A, proving part (7).


2.12 Definition: For a subspace U ⊆ R^n and a vector x ∈ R^n, we define the orthogonal projection of x onto U, denoted by Proj_U(x), as follows. Since R^n = U ⊕ U⊥, we can choose unique vectors u, v ∈ R^n with u ∈ U, v ∈ U⊥ and x = u + v. We then define

Proj_U(x) = u.

Note that since U = (U⊥)⊥, for u and v as above we have Proj_{U⊥}(x) = v. When y ∈ R^n and U = Span{y}, we also write Proj_y(x) = Proj_U(x) and Proj_{y⊥}(x) = Proj_{U⊥}(x).

2.13 Theorem: Let U ⊆ R^n be a subspace and let x ∈ R^n. Then Proj_U(x) is the unique point in U which is nearest to x.

Proof: Let u, v ∈ R^n with u ∈ U, v ∈ U⊥ and u + v = x, so that Proj_U(x) = u. Let w ∈ U with w ≠ u. Since v ∈ U⊥ and u, w ∈ U, we have v · u = v · w = 0, and so v · (w − u) = v · w − v · u = 0. Thus we have

|x − w|² = |u + v − w|² = |v − (w − u)|² = (v − (w − u)) · (v − (w − u)) = |v|² − 2 v · (w − u) + |w − u|² = |v|² + |w − u|² = |x − u|² + |w − u|².

Since w ≠ u we have |w − u| > 0 and so |x − w|² > |x − u|². Thus |x − w| > |x − u|, that is, dist(x, w) > dist(x, u), so u is the unique point in U nearest to x, as required.

2.14 Theorem: For any matrix A ∈ M_{n×l}(R) we have Null(A^T A) = Null(A) and Col(A^T A) = Col(A^T), so that nullity(A^T A) = nullity(A) and rank(A^T A) = rank(A).

Proof: If x ∈ Null(A) then Ax = 0, so A^T Ax = 0, and hence x ∈ Null(A^T A). This shows that Null(A) ⊆ Null(A^T A). If x ∈ Null(A^T A) then A^T Ax = 0, which implies that |Ax|² = (Ax)^T(Ax) = x^T A^T Ax = 0, and so Ax = 0. This shows that Null(A^T A) ⊆ Null(A). Thus Null(A^T A) = Null(A). It then follows that

Col(A^T) = Row(A) = Null(A)⊥ = Null(A^T A)⊥ = Row(A^T A) = Col((A^T A)^T) = Col(A^T A).

2.15 Theorem: Let A ∈ M_{n×l}(R), let U = Col(A) and let x ∈ R^n. Then

(1) the matrix equation A^T A t = A^T x has a solution t ∈ R^l, and for any solution t we have Proj_U(x) = At, and
(2) if rank(A) = l then A^T A is invertible and Proj_U(x) = A(A^T A)⁻¹A^T x.

Proof: Note that U⊥ = (Col A)⊥ = Row(A^T)⊥ = Null(A^T). Let u, v ∈ R^n with u ∈ U, v ∈ U⊥ and u + v = x, so that Proj_U(x) = u. Since u ∈ U = Col A we can choose t ∈ R^l so that u = At. Then x = u + v = At + v. Multiply by A^T to get A^T x = A^T A t + A^T v. Since v ∈ U⊥ = Null(A^T) we have A^T v = 0, so A^T A t = A^T x. Thus the matrix equation A^T A t = A^T x does have a solution t ∈ R^l.

Now let t ∈ R^l be any solution to A^T A t = A^T x. Let u = At and v = x − u. Note that x = u + v, u = At ∈ Col(A) = U, and A^T v = A^T(x − u) = A^T(x − At) = A^T x − A^T A t = 0, so that v ∈ Null(A^T) = U⊥. Thus Proj_U(x) = u = At, proving part (1).

Now suppose that rank(A) = l. Since A^T A ∈ M_{l×l}(R) with rank(A^T A) = rank(A) = l, the matrix A^T A is invertible. Since A^T A is invertible, the unique solution t ∈ R^l to the matrix equation A^T A t = A^T x is the vector t = (A^T A)⁻¹A^T x, and so from part (1) we have Proj_U(x) = At = A(A^T A)⁻¹A^T x, proving part (2).
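Here is a minimal NumPy sketch of part (1) (the matrix and vector are illustrative assumptions): we solve the normal equations A^T A t = A^T x and check that the residual lies in U⊥.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])             # rank 2, so A^T A is invertible
x = np.array([1.0, 2.0, 3.0])
t = np.linalg.solve(A.T @ A, A.T @ x)  # normal equations A^T A t = A^T x
u = A @ t                              # Proj_U(x) with U = Col(A)
print(u, A.T @ (x - u))                # x - u is in Null(A^T) = U^perp
```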


2.16 Definition: For a subset A ⊆ R^n, we say that A is orthogonal when x · y = 0 for all x, y ∈ A with x ≠ y. We say that A is orthonormal when A is orthogonal and |x| = 1 for every x ∈ A.

2.17 Note: Let u1, · · · , ul ∈ R^n, let A = {u1, · · · , ul} and let A = (u1, · · · , ul) ∈ M_{n×l}(R). Then A^T A = (u1, · · · , ul)^T (u1, · · · , ul) is the l × l matrix whose (i, j) entry is u_i · u_j. It follows that A is orthogonal if and only if A^T A is diagonal, in which case we have A^T A = diag(|u1|², |u2|², · · · , |ul|²), and A is orthonormal if and only if A^T A = I.

2.18 Note: Recall that when A = {u1, u2, · · · , ul} is a basis for a vector space U over a field F, a vector x ∈ U can be written uniquely as a linear combination x = ∑_{i=1}^l t_i u_i with each t_i ∈ F, and then we define the coordinate vector of x with respect to A to be

[x]_A = t = (t1, t2, · · · , tl)^T ∈ F^l.

2.19 Theorem: Let u1, u2, · · · , ul ∈ R^n, let A = {u1, u2, · · · , ul}, let U = Span A, and let x ∈ U. Then

(1) if A is orthogonal with each u_i ≠ 0, then A is a basis for U and

[x]_A = ( (x · u1)/|u1|², (x · u2)/|u2|², · · · , (x · ul)/|ul|² )^T, and

(2) if A is orthonormal, then A is a basis for U and

[x]_A = ( x · u1, x · u2, · · · , x · ul )^T.

Proof: Suppose A is orthogonal with each u_i ≠ 0. Let A = (u1, u2, · · · , ul) ∈ M_{n×l}(R), so that U = Col(A). Since A is orthogonal we have A^T A = diag(|u1|², · · · , |ul|²). Since each u_i ≠ 0, we see that A^T A is invertible. Since rank(A) = rank(A^T A) = l, the columns of A are linearly independent, so A is a basis for U. Write x as a linear combination x = ∑_{i=1}^l t_i u_i = At with t ∈ R^l. Then A^T x = A^T A t, and so

[x]_A = t = (A^T A)⁻¹A^T x = diag(1/|u1|², · · · , 1/|ul|²) ( x · u1, · · · , x · ul )^T = ( (x · u1)/|u1|², · · · , (x · ul)/|ul|² )^T.

This proves part (1), and part (2) follows immediately from part (1).


2.20 Theorem: Let u1, u2, · · · , ul ∈ R^n, let A = {u1, u2, · · · , ul}, let U = Span A, and let x ∈ R^n. Then

(1) if A is orthogonal with each u_i ≠ 0, then Proj_U(x) = ∑_{i=1}^l ((x · u_i)/|u_i|²) u_i, and
(2) if A is orthonormal, then Proj_U(x) = ∑_{i=1}^l (x · u_i) u_i.

Proof: Suppose that A is orthogonal with each u_i ≠ 0. Let A = (u1, u2, · · · , ul) ∈ M_{n×l}(R), so that U = Col(A) and A^T A = diag(|u1|², · · · , |ul|²), which is invertible. Then

Proj_U(x) = A(A^T A)⁻¹A^T x = (u1, · · · , ul) diag(1/|u1|², · · · , 1/|ul|²) ( x · u1, · · · , x · ul )^T = ((x · u1)/|u1|²) u1 + · · · + ((x · ul)/|ul|²) ul.

This proves part (1), and part (2) follows immediately from part (1).
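For instance (a short NumPy sketch with made-up vectors), for an orthogonal pair u1, u2 the projection formula of part (1) needs no matrix inversion:

```python
import numpy as np

u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, -1.0, 2.0])    # u1 . u2 = 0, so {u1, u2} is orthogonal
x = np.array([3.0, 1.0, 4.0])
proj = (x @ u1)/(u1 @ u1) * u1 + (x @ u2)/(u2 @ u2) * u2
print(proj)                        # Proj_U(x) for U = Span{u1, u2}
```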


3. Applications of Orthogonal Complements and Orthogonal Projection

3.1 Definition: For two affine spaces P and Q in R^n, we define the distance between P and Q to be

dist(P, Q) = min{ dist(a, b) | a ∈ P, b ∈ Q }.

3.2 Theorem: Let p and q be points in R^n, let U and V be subspaces of R^n, and let P = p + U and Q = q + V. Then

dist(P, Q) = |Proj_{(U+V)⊥}(p − q)|.

Proof: We have

dist(P, Q) = min{ dist(x, y) | x ∈ P, y ∈ Q }
= min{ dist(p + u, q + v) | u ∈ U, v ∈ V }
= min{ |(q + v) − (p + u)| | u ∈ U, v ∈ V }
= min{ |(q − p) − (u − v)| | u ∈ U, v ∈ V }
= min{ |(q − p) − w| | w ∈ U + V }
= |(q − p) − Proj_{U+V}(q − p)|
= |Proj_{(U+V)⊥}(q − p)|,

where, on the second last line, we used the fact that Proj_{U+V}(q − p) is the (unique) point in U + V which is nearest to q − p. Finally, since the projection is linear, |Proj_{(U+V)⊥}(q − p)| = |Proj_{(U+V)⊥}(p − q)|.
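As an illustration (a NumPy sketch; the two skew lines below are made-up data), the theorem computes the distance between the lines P = p + Span{u} and Q = q + Span{v} in R³:

```python
import numpy as np

p, u = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
q, v = np.array([0.0, 1.0, 1.0]), np.array([0.0, 0.0, 1.0])
A = np.column_stack([u, v])                  # columns span U + V
d = q - p
w = A @ np.linalg.solve(A.T @ A, A.T @ d)    # Proj_{U+V}(q - p)
print(np.linalg.norm(d - w))                 # 1.0, the distance between the lines
```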

3.3 Definition: For two subspaces U, V ⊆ R^n, we define the angle between U and V, which we write as θ(U, V), as follows.

(1) If U ⊆ V or V ⊆ U then we define θ(U, V) = 0.
(2) Otherwise, if U ∩ V = {0} then we define θ(U, V) = min{ θ(u, v) | 0 ≠ u ∈ U, 0 ≠ v ∈ V }.
(3) Otherwise, if U ∩ V = W ≠ {0} then we define θ(U, V) = θ(U ∩ W⊥, V ∩ W⊥), noting that (U ∩ W⊥) ∩ (V ∩ W⊥) = (U ∩ V) ∩ W⊥ = W ∩ W⊥ = {0}.

We define the angle between two affine spaces in R^n to be the angle between their associated vector spaces.


3.4 Theorem: Let {0} ≠ U, V ⊆ R^n be non-trivial subspaces with U ∩ V = {0}. Then

(1) in the case that dim(U) = 1 with U = Span{u}, where u ∈ R^n with |u| = 1, we have cos θ(U, V) = |Proj_V(u)|, and
(2) in general, we have cos θ(U, V) = max_{u∈U, |u|=1} |Proj_V(u)|.

Proof: To prove part (1), suppose that U = Span{u} where u ∈ R^n with |u| = 1. Since every nonzero vector in U is of the form tu for some 0 ≠ t ∈ R, by the definition of θ(U, V) we have

θ(U, V) = min{ θ(tu, w) | 0 ≠ t ∈ R, 0 ≠ w ∈ V }.

Since θ(tu, w) = θ(u, ±w) (indeed, when t > 0 we have θ(tu, w) = θ(u, w), and when t < 0 we have θ(tu, w) = θ(u, −w)), it follows that

θ(U, V) = min{ θ(u, w) | 0 ≠ w ∈ V }.

If u ∈ V⊥ then Proj_V(u) = 0, and for all 0 ≠ w ∈ V we have u · w = 0 so that θ(u, w) = π/2; thus θ(U, V) = π/2 and hence cos θ(U, V) = 0 = |Proj_V(u)|.

Suppose that u ∉ V⊥ and let

v = Proj_V(u).

Note that v ≠ 0 since u ∉ V⊥. By Trigonometric Ratios (since (u − v) · v = 0), we have

cos θ(u, v) = |v|/|u| = |v|.

Since cos θ(u, v) ≥ 0 we have θ(u, v) ∈ [0, π/2]. Let 0 ≠ w ∈ V and let

y = Proj_w(u) = ((u · w)/|w|²) w.

If y = 0 then u · w = 0, and so θ(u, w) = π/2 ≥ θ(u, v). Suppose that y ≠ 0. By Trigonometric Ratios, we have cos θ(u, y) = |y|/|u| = |y|. Since cos θ(u, y) ≥ 0 we have θ(u, y) ∈ [0, π/2]. If u · w < 0, so that θ(u, w) = π − θ(u, y) ∈ [π/2, π], then θ(u, w) ≥ θ(u, v). If u · w > 0, so that θ(u, w) = θ(u, y), then by Trigonometric Ratios, and since v is the point in V nearest to u, we have

sin θ(u, w) = sin θ(u, y) = |u − y|/|u| = |u − y| ≥ |u − v| = |u − v|/|u| = sin θ(u, v),

and hence θ(u, w) ≥ θ(u, v). Thus for all 0 ≠ w ∈ V we have θ(u, w) ≥ θ(u, v), where v = Proj_V(u). It follows that θ(U, V) = min{ θ(u, w) | 0 ≠ w ∈ V } = θ(u, v) and hence cos θ(U, V) = cos θ(u, v) = |v| = |Proj_V(u)|. This completes the proof of part (1).

To prove part (2), we no longer assume that U is 1-dimensional. Note that

θ(U, V) = min_{0≠u∈U} min_{0≠v∈V} θ(u, v) = min_{u∈U, |u|=1} min_{0≠w∈Span{u}} min_{0≠v∈V} θ(w, v) = min_{u∈U, |u|=1} θ(Span{u}, V),

and so, by part (1), we have

cos θ(U, V) = max_{u∈U, |u|=1} cos θ(Span{u}, V) = max_{u∈U, |u|=1} |Proj_V(u)|.


3.5 Definition: Let a, b ∈ R^n with a ≠ b. The perpendicular bisector of [a, b] is the hyperplane in R^n through the midpoint (a + b)/2 which is perpendicular to the vector b − a; in other words, it is the hyperplane in R^n given by the equation

( x − (a + b)/2 ) · (b − a) = 0.

3.6 Theorem: A point x ∈ R^n lies on the perpendicular bisector of [a, b] if and only if x is equidistant from a and b.

Proof: Let P be the perpendicular bisector of [a, b]. Then

x ∈ P ⇐⇒ ( x − (a + b)/2 ) · (b − a) = 0 ⇐⇒ ( 2x − (a + b) ) · (b − a) = 0
⇐⇒ 2x · (b − a) = (a + b) · (b − a)
⇐⇒ 2x · b − 2x · a = a · b − a · a + b · b − b · a = |b|² − |a|²
⇐⇒ −2x · a + |a|² = −2x · b + |b|²
⇐⇒ |x|² − 2x · a + |a|² = |x|² − 2x · b + |b|²
⇐⇒ |x − a|² = |x − b|² ⇐⇒ |x − a| = |x − b|.

3.7 Theorem: Let [a0, a1, · · · , al] be an l-simplex in R^n. For 0 ≤ j < k ≤ l, let B_{jk} be the perpendicular bisector of [a_j, a_k]. Then there is a unique point o in the affine span ⟨a0, a1, · · · , al⟩ which lies in the intersection of all of the perpendicular bisectors B_{jk}. This point o is called the circumcentre of the simplex.

Proof: For 1 ≤ i ≤ l let u_i = a_i − a_0. For x ∈ ⟨a0, a1, · · · , al⟩ = a_0 + Span{u1, u2, · · · , ul}, we can write x uniquely as x = a_0 + ∑_{i=1}^l t_i u_i = a_0 + At, where A = (u1, u2, · · · , ul) ∈ M_{n×l}(R). For x ∈ ⟨a0, a1, · · · , al⟩ with x = a_0 + At we have

x ∈ ⋂_{k=1}^l B_{0k} ⇐⇒ ( x − (a_0 + a_k)/2 ) · (a_k − a_0) = 0 for all 1 ≤ k ≤ l
⇐⇒ ( (a_0 + At) − (a_0 + ½(a_k − a_0)) ) · (a_k − a_0) = 0 for all 1 ≤ k ≤ l
⇐⇒ ( At − ½u_k ) · u_k = 0 for all 1 ≤ k ≤ l
⇐⇒ (At) · u_k = ½|u_k|² for all 1 ≤ k ≤ l
⇐⇒ A^T A t = ½u, where u = (|u1|², |u2|², · · · , |ul|²)^T.

Since {a0, a1, · · · , al} is affinely independent, the set {u1, u2, · · · , ul} is linearly independent, so we have rank(A^T A) = rank(A) = l, and hence A^T A is invertible. Thus there is a unique point o ∈ ⟨a0, a1, · · · , al⟩ which lies in each bisector B_{0k} for 1 ≤ k ≤ l, namely the point

o = a_0 + At = a_0 + ½A(A^T A)⁻¹u.

Finally, note that for 1 ≤ j < k ≤ l, by the previous theorem, since o ∈ B_{0j} and o ∈ B_{0k} we have |o − a_0| = |o − a_j| and |o − a_0| = |o − a_k|, so that |o − a_j| = |o − a_k|; then by another application of the previous theorem, since |o − a_j| = |o − a_k| it follows that o ∈ B_{jk}.
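The closed form o = a_0 + ½A(A^T A)⁻¹u is easy to evaluate numerically (a NumPy sketch with an illustrative triangle); the printed distances to the three vertices agree:

```python
import numpy as np

a = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 6.0]])    # a 2-simplex in R^2
A = (a[1:] - a[0]).T                                   # columns u_i = a_i - a_0
u = np.array([A[:, i] @ A[:, i] for i in range(A.shape[1])])
o = a[0] + 0.5 * A @ np.linalg.solve(A.T @ A, u)
print(o, [np.linalg.norm(o - v) for v in a])           # o = (2, 3), all radii sqrt(13)
```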


3.8 Theorem: Let F be a field. Let A be the (n+1) × (n+1) matrix

A = [ 1  a_0  a_0²  · · ·  a_0^n
      1  a_1  a_1²  · · ·  a_1^n
      ⋮
      1  a_n  a_n²  · · ·  a_n^n ]

with each a_i ∈ F. Then det A = ∏_{0≤i<j≤n} (a_j − a_i).

Proof: Write A = A_n to indicate the dependence on n. Note that det A_1 = det [ 1 a_0 ; 1 a_1 ] = a_1 − a_0. Suppose, inductively, that det A_{n−1} = ∏_{0≤i<j<n} (a_j − a_i). Note that if a_i = a_j for some i ≠ j, then A_n has two equal rows, and so in this case det A_n = 0 = ∏_{0≤i<j≤n} (a_j − a_i).

Suppose that a_0, a_1, · · · , a_n are all distinct. Replace a_n by a variable x and let

f(x) = det [ 1  a_0      a_0²      · · ·  a_0^n
             1  a_1      a_1²      · · ·  a_1^n
             ⋮
             1  a_{n−1}  a_{n−1}²  · · ·  a_{n−1}^n
             1  x        x²        · · ·  x^n ].

By expanding along the last row we see that f(x) is a polynomial of degree n with leading coefficient equal to C = det A_{n−1} = ∏_{0≤i<j<n} (a_j − a_i). On the other hand, for each value of i with 0 ≤ i < n, by subtracting the i-th row from the last row we see that f(x) equals the determinant of the same matrix with last row (0, x − a_i, x² − a_i², · · · , x^n − a_i^n), which equals

(x − a_i) · det of the matrix with last row (0, 1, x + a_i, · · · , x^{n−1} + x^{n−2}a_i + · · · + x a_i^{n−2} + a_i^{n−1}),

and so (x − a_i) divides f(x). Thus we must have

f(x) = C(x − a_0)(x − a_1) · · · (x − a_{n−1}) = ∏_{0≤i<j<n} (a_j − a_i) ∏_{0≤i<n} (x − a_i).

Replacing x by a_n gives det A_n = f(a_n) = ∏_{0≤i<j≤n} (a_j − a_i), as required.
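A quick numerical check of the theorem over R (a NumPy sketch; the points are arbitrary choices):

```python
import numpy as np
from itertools import combinations

a = np.array([2.0, 3.0, 5.0, 7.0])
V = np.vander(a, increasing=True)     # rows (1, a_i, a_i^2, a_i^3)
lhs = np.linalg.det(V)
rhs = np.prod([a[j] - a[i] for i, j in combinations(range(len(a)), 2)])
print(lhs, rhs)                       # both 240 (up to rounding)
```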


3.9 Definition: The matrix A in the above theorem is called the Vandermonde matrix on a_0, a_1, · · · , a_n.

3.10 Corollary: Let F be any field. Let (a_0, b_0), (a_1, b_1), · · · , (a_n, b_n) be ordered pairs of elements in F with the a_i all distinct. Then there exists a unique polynomial f ∈ P_n(F) with f(a_i) = b_i for all i.

Proof: Suppose that a_0, a_1, · · · , a_n are all distinct, and let b_0, b_1, · · · , b_n be arbitrary. Let f ∈ P_n(F), say f(x) = c_0 + c_1 x + · · · + c_n x^n. Then we have

f(a_i) = b_i for all i ⇐⇒ c_0 + c_1 a_i + c_2 a_i² + · · · + c_n a_i^n = b_i for all i ⇐⇒ Ac = b,

where b = (b_0, b_1, · · · , b_n)^T, c = (c_0, c_1, · · · , c_n)^T, and A is the Vandermonde matrix on a_0, · · · , a_n. By the above theorem, det A = ∏(a_j − a_i). Since a_0, a_1, · · · , a_n are all distinct, det A ≠ 0, so A is invertible and the equation Ac = b has a unique solution c.

3.11 Theorem: Let n, l ∈ Z+. Given n ordered pairs (a_1, b_1), (a_2, b_2), · · · , (a_n, b_n) ∈ R², such that at least l + 1 of the a_i are distinct, there exists a unique polynomial f ∈ P_l(R) which minimizes the sum ∑_{i=1}^n (f(a_i) − b_i)². This polynomial f is called the least-squares best fit polynomial for the data points (a_i, b_i).

Proof: For f(x) = c_0 + c_1 x + · · · + c_l x^l, we have

( f(a_1), · · · , f(a_n) )^T = ( c_0 + c_1 a_1 + · · · + c_l a_1^l, · · · , c_0 + c_1 a_n + · · · + c_l a_n^l )^T = Ac,

where

A = [ 1  a_1  a_1²  · · ·  a_1^l
      1  a_2  a_2²  · · ·  a_2^l
      ⋮
      1  a_n  a_n²  · · ·  a_n^l ] ∈ M_{n×(l+1)}(R) and c = (c_0, c_1, · · · , c_l)^T.

Note that the sum ∑_{i=1}^n (f(a_i) − b_i)² is the square of the distance between b = (b_1, b_2, · · · , b_n)^T and f(a) = (f(a_1), f(a_2), · · · , f(a_n))^T = Ac, so to minimize the sum we need to choose c to minimize the distance |b − Ac|. To do this, Ac must be the (unique) point in Col A which is nearest to b, that is,

Ac = Proj_{Col A}(b).

Since l + 1 of the a_i are distinct, the corresponding rows of A form a Vandermonde matrix on l + 1 distinct points. This (l+1) × (l+1) Vandermonde matrix is invertible by Theorem 3.8, so these l + 1 rows are linearly independent. It follows that rank A = l + 1 and that the l + 1 columns of A are linearly independent. Thus A (as a linear map) is injective, and so there is a unique vector c with Ac = Proj_{Col A}(b). Indeed, from our formula for the orthogonal projection given in Theorem 2.15, we have

c = (A^T A)⁻¹A^T b.
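For example (a NumPy sketch with made-up data points), fitting a best fit line (l = 1) amounts to solving the normal equations for c:

```python
import numpy as np

a = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 3.0, 4.0, 8.0])
A = np.vander(a, 2, increasing=True)   # columns (1, a_i), so f(x) = c0 + c1 x
c = np.linalg.solve(A.T @ A, A.T @ b)  # c = (A^T A)^{-1} A^T b
print(c)                               # (c0, c1) = (0.7, 2.2)
```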


4. The Generalized Cross Product

4.1 Definition: Given vectors u1, u2, · · · , uk ∈ R^n, we define the parallelotope on u1, · · · , uk to be the set

P(u1, · · · , uk) = { ∑_{i=1}^k t_i u_i | 0 ≤ t_i ≤ 1 for all i }.

We define the volume of this parallelotope, denoted by V(u1, · · · , uk), recursively by V(u1) = |u1| and

V(u1, · · · , uk) = V(u1, · · · , u_{k−1}) |Proj_{U⊥}(u_k)|

where U = Span{u1, · · · , u_{k−1}}.

4.2 Theorem: Let u1, · · · , uk ∈ R^n and let A = (u1, · · · , uk) ∈ M_{n×k}(R). Then

V(u1, · · · , uk) = √(det(A^T A)).

Proof: We prove the theorem by induction on k. When k = 1, so that u1 ∈ R^n and A = u1 ∈ M_{n×1}(R), we have V(u1) = |u1| = √(u1 · u1) = √(u1^T u1) = √(A^T A), as required. Let k ≥ 2 and suppose, inductively, that when A = (u1, · · · , u_{k−1}) ∈ M_{n×(k−1)}(R) we have det(A^T A) ≥ 0 and V(u1, · · · , u_{k−1}) = √(det(A^T A)). Let B = (u1, · · · , uk) = (A, uk). Let U = Span{u1, · · · , u_{k−1}} = Col(A). Let v = Proj_U(uk) and w = Proj_{U⊥}(uk). Note that v ∈ U = Col(A) and w ∈ U⊥ = Null(A^T). Then uk = v + w, so that B = (A, v + w). Since v ∈ Col(A), the matrix B can be obtained from the matrix (A, w) by performing elementary column operations of the type C_k ↦ C_k + tC_i. Let E be the product of the elementary matrices corresponding to these column operations, and note that B = (A, v + w) = (A, w)E. Since the column operations C_k ↦ C_k + tC_i do not alter the determinant, E is a product of elementary matrices of determinant 1, so det(E) = 1. Since det(E) = 1 and w ∈ Null(A^T) we have

det(B^T B) = det( E^T (A, w)^T (A, w) E ) = det( ( A^T ; w^T )( A  w ) ) = det( A^T A  A^T w ; w^T A  w^T w ) = det( A^T A  0 ; 0  |w|² ) = det(A^T A) |w|².

By the induction hypothesis, we can take the square root on both sides to get

√(det(B^T B)) = √(det(A^T A)) |w| = V(u1, · · · , u_{k−1}) |w| = V(u1, · · · , uk).

4.3 Note: In the special case that A = (u1, u2, · · · , un) ∈ M_n(R), we have

V(u1, · · · , un) = √(det(A^T A)) = √(det(A)²) = |det(A)|.

4.4 Remark: There is a similar formula for the volume of an l-simplex in R^n. For the l-simplex S = [a0, a1, · · · , al], if we let A = (u1, u2, · · · , ul) ∈ M_{n×l}(R) where u_k = a_k − a_0, then the volume of S is given by

V[a0, a1, · · · , al] = (1/l!) V(u1, · · · , ul) = (1/l!) √(det(A^T A)).
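Numerically (a NumPy sketch with made-up vectors), the volume formula of Theorem 4.2 reads:

```python
import numpy as np

u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([1.0, 2.0, 2.0])
A = np.column_stack([u1, u2])
print(np.sqrt(np.linalg.det(A.T @ A)))   # sqrt(8) = |u1| * |Proj_{U^perp}(u2)|
```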


4.5 Definition: Let F be a field. For n ≥ 2 we define the cross product

X : M_{n×(n−1)}(F) = ∏_{k=1}^{n−1} F^n → F^n

as follows. Given A = (u1, u2, · · · , u_{n−1}) ∈ M_{n×(n−1)}(F), we define X(A), also written as X(u1, u2, · · · , u_{n−1}), to be the vector in F^n with entries

X(A)_j = X(u1, u2, · · · , u_{n−1})_j = (−1)^{n+j} |A_{(j)}|,

where A_{(j)} ∈ M_{n−1}(F) is the matrix obtained from A by removing the j-th row. For u ∈ F² we write X(u) as u×, and for u, v ∈ F³ we write X(u, v) as u × v.

4.6 Example: Given u ∈ F² we have

u× = (u1, u2)× = (−u2, u1)^T.

Given u, v ∈ F³ we have

u × v = ( |u2 v2 ; u3 v3|, −|u1 v1 ; u3 v3|, |u1 v1 ; u2 v2| )^T = (u2v3 − u3v2, u3v1 − u1v3, u1v2 − u2v1)^T.
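The cofactor definition can be checked directly against the familiar n = 3 cross product (a NumPy sketch; the vectors are arbitrary):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
A = np.column_stack([u, v])
X = np.array([(-1) ** (3 + j) * np.linalg.det(np.delete(A, j - 1, axis=0))
              for j in (1, 2, 3)])   # X(A)_j = (-1)^(n+j) |A_(j)|
print(X, np.cross(u, v))             # both [-3. 6. -3.]
```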

4.7 Note: Because the determinant is n-linear and alternating, it follows that the cross product is (n−1)-linear and alternating. Thus for u_i, v, w ∈ F^n and t ∈ F we have

(1) X(u1, · · · , v + w, · · · , u_{n−1}) = X(u1, · · · , v, · · · , u_{n−1}) + X(u1, · · · , w, · · · , u_{n−1}),
(2) X(u1, · · · , t u_k, · · · , u_{n−1}) = t X(u1, · · · , u_k, · · · , u_{n−1}), and
(3) X(u1, · · · , u_k, · · · , u_l, · · · , u_{n−1}) = −X(u1, · · · , u_l, · · · , u_k, · · · , u_{n−1}).

4.8 Definition: Recall that for u1, · · · , un ∈ R^n, the set {u1, · · · , un} is a basis for R^n if and only if det(u1, · · · , un) ≠ 0. For an ordered basis A = (u1, · · · , un), we say that A is positively oriented when det(u1, · · · , un) > 0 and we say that A is negatively oriented when det(u1, · · · , un) < 0.

4.9 Theorem: (Properties of the Cross Product) For u1, · · · , u_{n−1}, v1, · · · , v_{n−1}, w ∈ R^n,

(1) X(u1, · · · , u_{n−1}) · w = det(u1, · · · , u_{n−1}, w),
(2) X(u1, · · · , u_{n−1}) · u_k = 0 for 1 ≤ k < n,
(3) X(u1, · · · , u_{n−1}) = 0 if and only if {u1, · · · , u_{n−1}} is linearly dependent,
(4) when w = X(u1, · · · , u_{n−1}) ≠ 0 we have det(u1, · · · , u_{n−1}, w) > 0, so that the n-tuple (u1, · · · , u_{n−1}, w) is a positively oriented basis for R^n,
(5) |X(u1, · · · , u_{n−1})| is equal to the volume of the parallelotope on u1, · · · , u_{n−1},
(6) X(u1, · · · , u_{n−1}) · X(v1, · · · , v_{n−1}) = det(B^T A), where A = (u1, · · · , u_{n−1}) ∈ M_{n×(n−1)}(R) and B = (v1, · · · , v_{n−1}) ∈ M_{n×(n−1)}(R), and
(7) X(u1, · · · , u_{n−2}, X(v1, · · · , v_{n−1})) = ∑_{i=1}^{n−1} (−1)^{n+i} det((B^T A)_{(i)}) v_i, where A = (u1, · · · , u_{n−2}) and B = (v1, · · · , v_{n−1}), and (B^T A)_{(i)} is obtained from B^T A by removing the i-th row.


Proof: Since X(u1, · · · , u_{n−1}) = ∑_{i=1}^n (−1)^{n+i} |A_{(i)}| e_i, we have

X(u1, u2, · · · , u_{n−1}) · w = ∑_{i=1}^n (−1)^{n+i} |A_{(i)}| w_i = det(u1, · · · , u_{n−1}, w),

where the last equality follows by expanding the determinant along the last column. This proves part (1), and part (2) follows from part (1) since det(u1, · · · , u_k, · · · , u_{n−1}, u_k) = 0.

To prove part (3), let A = (u1, · · · , u_{n−1}). Then {u1, · · · , u_{n−1}} is linearly independent if and only if rank(A) = n − 1, if and only if some set of n − 1 rows of A is linearly independent, if and only if A_{(i)} is invertible for some index i, if and only if X(u1, · · · , u_{n−1}) ≠ 0.

Part (4) holds because when w = X(u1, · · · , u_{n−1}) ≠ 0 we have |w|² > 0, so that

0 < |w|² = w · w = X(u1, · · · , u_{n−1}) · w = det(u1, · · · , u_{n−1}, w).

To prove part (6), let x = X(u1, · · · , u_{n−1}), y = X(v1, · · · , v_{n−1}), A = (u1, · · · , u_{n−1}) and B = (v1, · · · , v_{n−1}). Using part (1) we see that x · y = det(u1, · · · , u_{n−1}, y) = det(A, y) and also x · y = det(v1, · · · , v_{n−1}, x) = det(B, x), and so

(x · y)² = det(A, y) det(B, x) = det( (B, x)^T (A, y) ) = det( B^T A  B^T y ; x^T A  x^T y ).

By part (2), x is perpendicular to the columns of A and y is perpendicular to the columns of B, so we have A^T x = 0 = B^T y, and so

(x · y)² = det( B^T A  0 ; x^T A  x · y ) = (x · y) det(B^T A).

When x · y ≠ 0, we can divide both sides by x · y to get x · y = det(B^T A), as required. We shall now provide two proofs to deal with the case in which x · y = 0. For the first proof, we consider both sides of the above equality, namely (x · y)² and (x · y) det(B^T A), to be polynomials in the entries of the vectors u_i and v_j. By unique factorization of polynomials (in many variables), we obtain x · y = det(B^T A), as required.

Here is an alternate proof. Suppose that x · y = 0. First we consider the case that x = 0 or y = 0. In this case, either rank(A) < n − 1 or rank(B) < n − 1, and in either case we have rank(B^T A) < n − 1, so that B^T A is not invertible, hence det(B^T A) = 0 = x · y. Finally, we consider the case that x · y = 0 with x ≠ 0 and y ≠ 0. In this case, since x · y = 0 we have y ∈ Span{x}⊥. Since x ≠ 0, the set {u1, · · · , u_{n−1}} is linearly independent by part (3), and so we have y ∈ Span{x}⊥ = Span{u1, · · · , u_{n−1}} = Col(A). But also, by part (2), we have y ∈ Span{v1, · · · , v_{n−1}}⊥ = Col(B)⊥ = Null(B^T). Since 0 ≠ y ∈ Col(A) we can write y = At for some 0 ≠ t ∈ R^{n−1}, and since y ∈ Null(B^T) we have 0 = B^T y = B^T A t. Since t ≠ 0 and B^T A t = 0, it follows that B^T A is not invertible, so again we find that det(B^T A) = 0 = x · y. This completes the proof of part (6).

Note that part (5) follows from part (6). Indeed, when A = (u1, · · · , u_{n−1}) we have

|X(u1, · · · , u_{n−1})|² = X(u1, · · · , u_{n−1}) · X(u1, · · · , u_{n−1}) = det(A^T A),

and so

|X(u1, · · · , u_{n−1})| = √(det(A^T A)) = V(u1, · · · , u_{n−1}).


In order to prove part (7), we shall first obtain a change of variables formula for the cross product. Let A = (u1, · · · , u_{n−1}) ∈ M_{n×(n−1)}(R) and let P = (v1, · · · , vn) ∈ M_n(R). Note that the i-th entry of P^T X(PA) is

( P^T X(PA) )_i = v_i^T X(PA) = X(PA) · v_i = det(PA, v_i).

Recall that Cof(P) P = P Cof(P) = det(P) I, where Cof(P) is the cofactor matrix of P, so we have

(det P)^n ( P^T X(PA) )_i = det( P Cof(P) ) det(PA, v_i) = det(P) det( Cof(P) PA, Cof(P) v_i ) = det(P) det( (det P) A, (det P) e_i ) = (det P)^{n+1} det(A, e_i) = (det P)^{n+1} (−1)^{n+i} det A_{(i)} = (det P)^{n+1} X(A)_i.

Thus (det P)^n P^T X(PA) = (det P)^{n+1} X(A). When P is invertible, we can divide both sides by (det P)^n to get P^T X(PA) = (det P) X(A). Even when P is not invertible, we can regard both sides of the equality (det P)^n P^T X(PA) = (det P)^{n+1} X(A) as polynomials in the entries of the vectors u_i and v_j, and then by unique factorization we obtain the change of variables formula

P^T X(PA) = (det P) X(A).

Alternatively, replacing P by P^T, we obtain

P X(P^T A) = (det P) X(A).

Finally, let us prove part (7). Let A = (u1, · · · , u_{n−2}) and B = (v1, · · · , v_{n−1}), and let y = X(B) = X(v1, · · · , v_{n−1}), so that we have

X(u1, · · · , u_{n−2}, X(v1, · · · , v_{n−1})) = X(A, y).

Let P = (B, y) = (v1, · · · , v_{n−1}, y). Note that

det P = X(v1, · · · , v_{n−1}) · y = y · y = |y|².

By the above change of variables formula, we have

|y|² X(A, y) = (det P) X(A, y) = P X( P^T (A, y) ) = P X( B^T A  B^T y ; y^T A  y^T y ) = P X( B^T A  0 ; y^T A  |y|² )
= P ( ∑_{i=1}^{n−1} (−1)^{n+i} det( (B^T A)_{(i)}  0 ; y^T A  |y|² ) e_i + 0 · e_n )
= (v1, · · · , v_{n−1}, y) ( ∑_{i=1}^{n−1} (−1)^{n+i} |y|² det((B^T A)_{(i)}) e_i + 0 · e_n )
= |y|² ∑_{i=1}^{n−1} (−1)^{n+i} det((B^T A)_{(i)}) v_i,

where the coefficient of e_n vanishes because the matrix obtained by removing the last row has zero last column. Regarding both sides of the equality |y|² X(A, y) = |y|² ∑_{i=1}^{n−1} (−1)^{n+i} det((B^T A)_{(i)}) v_i as polynomials in the entries of the vectors u_i and v_j, we can cancel the factor |y|² to obtain

X(A, y) = ∑_{i=1}^{n−1} (−1)^{n+i} det((B^T A)_{(i)}) v_i, as required.


5. Inner Products, Norms, Distance and Angle

5.1 Note: In this section we shall be primarily interested in vector spaces over R or C.

5.2 Definition: Recall that the set of complex numbers C is defined to be C = R². In C we write 1 = (1, 0)^T and i = (0, 1)^T, and for x, y ∈ R we write x + iy = (x, y)^T. For z = x + iy with x, y ∈ R, we say that x and y are the real and imaginary parts of z, and write x = Re(z) and y = Im(z). For z = x + iy and w = u + iv with x, y, u, v ∈ R, we define addition and multiplication by

z + w = (x + u) + i(y + v),  zw = (xu − yv) + i(xv + yu),

and we define the conjugate of z and the length (or norm) of z to be

z̄ = x − iy,  |z| = √(z z̄) = √(x² + y²).

These operations make C into a field. For 0 ≠ z ∈ C, the inverse of z is given by

z⁻¹ = z̄ / |z|².

For a vector z ∈ C^n, we can write z = x + iy with x, y ∈ R^n. We then define z̄ = x − iy and we define z* = z̄^T. More generally, for A ∈ M_{n×l}(C) we define the adjoint of A to be the matrix A* = Ā^T ∈ M_{l×n}(C); that is, the (i, j) entry of A* is the conjugate of the (j, i) entry of A.

5.3 Definition: There are several products on C^n analogous to the dot product on R^n. For z, w ∈ C^n we define the (complex) dot product of z and w to be

z · w = w^T z = ∑_{i=1}^n z_i w_i ∈ C.

A second product can be defined as follows. Given z = x + iy and w = u + iv with x, y, u, v ∈ R^n, we identify C^n with R^{2n} by writing z and w as

z_R = (x1, y1, x2, y2, · · · , xn, yn)^T ∈ R^{2n},
w_R = (u1, v1, u2, v2, · · · , un, vn)^T ∈ R^{2n},

and then we define the real dot product of z and w to be

z_R · w_R = x · u + y · v = ∑_{i=1}^n (x_i u_i + y_i v_i) ∈ R.

Finally, for z, w ∈ C^n we define a third product, called the inner product of z with w, by

⟨z, w⟩ = w* z = ∑_{i=1}^n z_i w̄_i ∈ C.


5.4 Remark: The latter two products can be used to define distance and angle in C^n. Both give rise to the same definition of distance, but they give rise to different notions of orthogonality. They are related by z_R · w_R = Re(⟨z, w⟩). For the moment, we shall concentrate primarily on the third of these three products, namely the inner product.

5.5 Definition: Let F = R or C. Let W be a vector space over F. An inner product on W over F is a function ⟨ , ⟩ : W × W → F (meaning that if u, v ∈ W then ⟨u, v⟩ ∈ F) such that for all u, v, w ∈ W and all t ∈ F we have

(1) (Sesquilinearity) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩, ⟨tu, v⟩ = t⟨u, v⟩, ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩, and ⟨u, tv⟩ = t̄⟨u, v⟩,
(2) (Conjugate Symmetry) ⟨u, v⟩ is the complex conjugate of ⟨v, u⟩, and
(3) (Positive Definiteness) ⟨u, u⟩ ∈ R and ⟨u, u⟩ ≥ 0, with ⟨u, u⟩ = 0 ⇐⇒ u = 0.

For u, v ∈ W, ⟨u, v⟩ is called the inner product of u with v. An inner product space over F is a vector space over F equipped with an inner product. Given two inner product spaces U and V over F, a linear map L : U → V is called a homomorphism of inner product spaces (or we say that L preserves inner product) when ⟨L(x), L(y)⟩ = ⟨x, y⟩ for all x, y ∈ U.

5.6 Definition: Let F = R or C. Let W be a vector space over F. A norm on W is a map | | : W → R such that for all u, v ∈ W and all t ∈ F we have

(1) (Scaling) |tu| = |t| |u|,
(2) (Positive Definiteness) |u| ≥ 0 with |u| = 0 ⇐⇒ u = 0, and
(3) (Triangle Inequality) |u + v| ≤ |u| + |v|.

For u ∈ W the real number |u| is called the norm (or length) of u, and we say that u is a unit vector when |u| = 1. A normed linear space over F is a vector space over F equipped with a norm. Given two normed linear spaces U and V over F, a linear map L : U → V is called a homomorphism of normed linear spaces (or we say that L preserves norm) when |L(x)| = |x| for all x ∈ U.

5.7 Definition: Let S be a set. A metric on S is a map d : S × S → R such that for all a, b, c ∈ S we have

(1) (Symmetry) d(a, b) = d(b, a),
(2) (Positive Definiteness) d(a, b) ≥ 0 with d(a, b) = 0 ⇐⇒ a = b, and
(3) (Triangle Inequality) d(a, c) ≤ d(a, b) + d(b, c).

For a, b ∈ S, d(a, b) is called the distance between a and b. A metric space is a set which is equipped with a metric. Given two metric spaces S and T, a map F : S → T is called an isometry (or we say that F is distance preserving) when d(F(a), F(b)) = d(a, b) for all a, b ∈ S.

5.8 Theorem: Let W be an inner product space over F = R or C and let u, v ∈ W. If ⟨x, u⟩ = ⟨x, v⟩ for all x ∈ W, or if ⟨u, x⟩ = ⟨v, x⟩ for all x ∈ W, then u = v.

Proof: Suppose that ⟨x, u⟩ = ⟨x, v⟩ for all x ∈ W. Then ⟨x, u − v⟩ = ⟨x, u⟩ − ⟨x, v⟩ = 0 for all x ∈ W. In particular, taking x = u − v we have |u − v|² = ⟨u − v, u − v⟩ = 0, and so u = v. Similarly, if ⟨u, x⟩ = ⟨v, x⟩ for all x ∈ W then u = v.


5.9 Theorem: Let F = R or C. Let W be an inner product space over F. For u ∈ W define |u| = √⟨u, u⟩. Then for all u, v ∈ W and all t ∈ F:

(1) (Scaling) |tu| = |t| |u|,
(2) (Positive Definiteness) |u| ≥ 0 with |u| = 0 ⇐⇒ u = 0,
(3) |u + v|² = |u|² + 2 Re⟨u, v⟩ + |v|²,
(4) (Polarization Identity) if F = R then ⟨u, v⟩ = ½(|u + v|² − |u|² − |v|²), and if F = C then ⟨u, v⟩ = ¼(|u + v|² + i|u + iv|² − |u − v|² − i|u − iv|²),
(5) (The Cauchy-Schwarz Inequality) |⟨u, v⟩| ≤ |u| |v|, with |⟨u, v⟩| = |u| |v| if and only if {u, v} is linearly dependent, and
(6) (The Triangle Inequality) ||u| − |v|| ≤ |u + v| ≤ |u| + |v|.

In particular, | | is a norm on W.

Proof: We only prove part (5) and part of part (6). To prove the Cauchy-Schwarz Inequality, suppose first that {u, v} is linearly dependent. Then one of u and v is a multiple of the other, say v = tu with t ∈ F. Then |⟨u, v⟩| = |⟨u, tu⟩| = |t̄⟨u, u⟩| = |t| |u|² = |u| |tu| = |u| |v|.

Next we suppose that {u, v} is linearly independent. Then 1·v + t·u ≠ 0 for all t ∈ F, so in particular v − (⟨v, u⟩/|u|²) u ≠ 0. Thus, expanding by sesquilinearity, we have

0 < | v − (⟨v, u⟩/|u|²) u |² = ⟨ v − (⟨v, u⟩/|u|²) u, v − (⟨v, u⟩/|u|²) u ⟩ = |v|² − |⟨u, v⟩|²/|u|²,

so that |⟨u, v⟩|²/|u|² < |v|² and hence |⟨u, v⟩| < |u| |v|. This proves part (5).

Using parts (3) and (5), and the inequality |Re(z)| ≤ |z| for z ∈ C (which follows from Pythagoras' Theorem in R²), we have

|u + v|² = |u|² + 2 Re⟨u, v⟩ + |v|² ≤ |u|² + 2|⟨u, v⟩| + |v|² ≤ |u|² + 2|u| |v| + |v|² = (|u| + |v|)².

Taking the square root on both sides gives |u + v| ≤ |u| + |v|.

5.10 Theorem: Let F = R or C. Let W be a normed linear space over F. For a, b ∈ W, define d(a, b) = |b − a|. Then d is a metric on W.

Proof: The proof is left as an exercise.

5.11 Definition: Let F = R or C. Let W be an inner product space over F. For 0 ≠ u, v ∈ W, we define the real angle between u and v to be

θ_R(u, v) = cos⁻¹( Re⟨u, v⟩ / (|u| |v|) ) ∈ [0, π]

and we define the complex angle from u to v to be

θ_C(u, v) = cos⁻¹( ⟨u, v⟩ / (|u| |v|) ) ∈ C.

Here we use the complex cosine given by cos(z) = (e^{iz} + e^{−iz})/2, inverted for z ∈ C with 0 ≤ Re(z) ≤ π. For u, v ∈ C^n, we say that u and v are orthogonal when ⟨u, v⟩ = 0.


5.12 Example: The standard inner product on R^n is the dot product ⟨x, y⟩ = x · y = y^T x.

5.13 Example: The standard inner product on C^n is given by ⟨z, w⟩ = w* z.

5.14 Example: Let ⟨ , ⟩ be an inner product on a vector space W over F = R or C. Let L : W → W be any bijective linear map. For u, v ∈ W define ⟨u, v⟩_L = ⟨L(u), L(v)⟩. Then ⟨ , ⟩_L is another inner product on W.

5.15 Example: The standard inner product on the vector space M_{n×l}(F), where F = R or C, is given by

⟨A, B⟩ = ∑_{1≤i≤n, 1≤j≤l} a_{i,j} b̄_{i,j},

where A = (a_{i,j}) and B = (b_{i,j}). As you can check, this inner product can be expressed more elegantly as

⟨A, B⟩ = trace(B* A).

5.16 Example: Let F = R or C. Let F^N be the vector space of all functions f : N → F or, equivalently, the set of all sequences a = (a_0, a_1, a_2, · · ·) with each a_i ∈ F (indeed, the sequence a = (a_0, a_1, · · ·) is equal, by definition, to the function f : N → F given by f(k) = a_k). Let F^∞ be the subspace

F^∞ = { f : N → F | f(k) = 0 for all but finitely many k ∈ N }
    = { a = (a_0, a_1, a_2, · · ·) | each a_k ∈ F, with a_k = 0 for all but finitely many k ∈ N }.

The vector space F^∞ has the standard basis {e_0, e_1, e_2, · · ·} where e_k = (e_{k0}, e_{k1}, e_{k2}, · · ·) has entries e_{ki} = δ_{ki} (the vector space F^N, by contrast, has an uncountable basis). The standard inner product on F^∞ is given by

⟨a, b⟩ = ⟨(a_0, a_1, · · ·), (b_0, b_1, · · ·)⟩ = ∑_{i=0}^∞ a_i b̄_i.

Note that the sum on the right is a finite sum, since only finitely many of the terms a_i and b_i are nonzero (the same sum would not be well-defined for a, b ∈ F^N).

5.17 Example: Let F = R or C. For a, b ∈ R with a < b, let C⁰([a, b], F) denote the vector space of all continuous functions f : [a, b] → F. The standard inner product on C⁰([a, b], F) is given by

⟨f, g⟩ = ∫_a^b f ḡ.

We recall here that for h : [a, b] → C given by h(z) = u(z) + iv(z), where u, v : [a, b] → R, the map h is continuous if and only if both u and v are continuous, and in this case we have ∫_a^b h = ∫_a^b u + i ∫_a^b v. Note that this product is positive definite, because for a continuous function f we have ∫_a^b |f|² ≥ 0 with ∫_a^b |f|² = 0 ⇐⇒ f = 0.


5.18 Example: Let F = R or C. For n ∈ N, let P_n(F) denote the vector space of all polynomials of degree at most n with coefficients in F, and let P(F) = F[x] denote the vector space of all polynomials (of any degree) with coefficients in F, that is,

P_n(F) = { ∑_{i=0}^n c_i x^i | each c_i ∈ F },
F[x] = P(F) = ⋃_{n=0}^∞ P_n(F) = { ∑_{i=0}^n c_i x^i | n ∈ N, each c_i ∈ F }.

The vector space P_n(F) has standard basis {1, x, x², · · · , x^n} and the vector space P(F) has standard basis {1, x, x², · · ·}. We define several inner products on P_n(F). For the first product, we identify P_n(F) with F^{n+1} by identifying the polynomial ∑_{i=0}^n c_i x^i with its vector of coefficients (c_0, c_1, · · · , c_n)^T, and this gives rise to the inner product

⟨f, g⟩ = ⟨ ∑_{i=0}^n a_i x^i, ∑_{i=0}^n b_i x^i ⟩ = ∑_{i=0}^n a_i b̄_i.

For the second product, we choose a, b ∈ R with a < b, and then we identify P_n(F) with a subspace of C⁰([a, b], F) by considering each polynomial f ∈ P_n(F) as a function f : [a, b] → F, and this gives rise to the inner product

⟨f, g⟩ = ∫_a^b f ḡ.

We can define a third inner product on P_n(F) as follows. We choose n + 1 distinct points a_0, a_1, · · · , a_n ∈ F and then we define

⟨f, g⟩ = ∑_{i=0}^n f(a_i) ḡ(a_i).

Note that this product is positive definite, since

∑_{i=0}^n |f(a_i)|² ≥ 0 with ∑_{i=0}^n |f(a_i)|² = 0 ⇐⇒ f(a_i) = 0 for all i ⇐⇒ f = 0,

since the n + 1 points a_i are distinct and f is a polynomial of degree at most n (so f has at most n roots unless f = 0).

Of these three inner products on P_n(F), only the second one gives rise to an inner product on the space of all polynomials P(F).


6. Orthogonal Bases, Orthogonal Complement and Orthogonal Projection

6.1 Definition: Let W be an inner product space over F = R or C. For a subset A ⊆ W, we say that A is orthogonal when 〈u, v〉 = 0 for all u, v ∈ A with u ≠ v, and we say that A is orthonormal when A is orthogonal with |u| = 1 for every u ∈ A.

6.2 Example: Let u1, u2, · · · , ul ∈ Cn and let A = (u1, · · · , ul) ∈ Mn×l(C). Since the (i, j) entry of A∗A is

(A∗A)_{i,j} = ui∗ uj = 〈uj, ui〉,

it follows that {u1, · · · , ul} is orthogonal if and only if A∗A is diagonal, and {u1, · · · , ul} is orthonormal if and only if A∗A = I.
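This criterion is convenient to test numerically. A sketch assuming numpy (QR is used here only as a convenient way to manufacture orthonormal columns):

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

    # Orthonormalize the columns of M; the columns of Q play the role of
    # u1, u2, u3 in Example 6.2.
    Q, _ = np.linalg.qr(M)

    print(np.allclose(Q.conj().T @ Q, np.eye(3)))   # orthonormal <=> Q*Q = I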

6.3 Example: Let F = R or C and let a0, a1, · · · , an be distinct points in F. Consider the vector space Pn = Pn(F) with the inner product 〈f, g〉 = ∑_{i=0}^n f(ai) ḡ(ai). For each index k, let gk ∈ Pn be given by

gk(x) = ∏_{i≠k} (x − ai) / ∏_{i≠k} (ak − ai)

so that gk(ai) = δ_{k,i}. For f ∈ Pn we have 〈f, gk〉 = ∑_{i=0}^n f(ai) ḡk(ai) = ∑_{i=0}^n f(ai) δ_{k,i} = f(ak). In particular, we have 〈gj, gk〉 = gj(ak) = δ_{j,k} and so the set {g0, g1, · · · , gn} is an orthonormal basis for Pn(F).

6.4 Theorem: Let W be an inner product space over F = R or C. Let A ⊆ W.

(1) If A is an orthogonal set of nonzero vectors then for x ∈ Span A with say x = ∑_{i=1}^n ti ui where ti ∈ F and ui ∈ A, we have

tk = 〈x, uk〉 / |uk|^2

for all indices k, and in particular, A is linearly independent.
(2) If A is orthonormal then for x ∈ Span A with say x = ∑_{i=1}^n ti ui where ti ∈ F and ui ∈ A, we have tk = 〈x, uk〉 for all k, and in particular, A is linearly independent.

Proof: To prove Part (1), suppose that A is an orthogonal set of nonzero vectors and let x = ∑_{i=1}^n ti ui with each ti ∈ F and each ui ∈ A. Then for all indices k, since 〈ui, uk〉 = 0 whenever i ≠ k, we have

〈x, uk〉 = 〈 ∑_{i=1}^n ti ui , uk 〉 = ∑_{i=1}^n ti 〈ui, uk〉 = tk 〈uk, uk〉 = tk |uk|^2

and so tk = 〈x, uk〉 / |uk|^2, as required. In particular, when x = 0 we find that tk = 0 for all k, and this shows that A is linearly independent. This proves Part (1), and Part (2) follows immediately from Part (1).


6.5 Theorem: (The Gram-Schmidt Procedure) Let W be a finite or countable dimensional inner product space over F = R or C. Let A = {u1, u2, · · ·} be an ordered basis for W. Let v1 = u1 and for k ≥ 2 let

vk = uk − ∑_{i=1}^{k−1} (〈uk, vi〉 / |vi|^2) vi.

Then the set B = {v1, v2, · · ·} is an orthogonal basis for W with the property that for every index k ≥ 1 we have Span{v1, · · · , vk} = Span{u1, · · · , uk}.

Proof: We prove, by induction on k, that {v1, v2, · · · , vk} is an orthogonal basis for Span{u1, u2, · · · , uk}. When k = 1 this is clear since v1 = u1. Let k ≥ 2 and suppose, inductively, that {v1, · · · , vk−1} is an orthogonal basis for Span{u1, · · · , uk−1}. Since vk = uk − ∑_{i=1}^{k−1} (〈uk, vi〉/|vi|^2) vi, we see that uk is equal to vk plus a linear combination of the vectors v1, · · · , vk−1, and so we have Span{v1, · · · , vk−1, vk} = Span{v1, · · · , vk−1, uk}. By the induction hypothesis, we have Span{v1, · · · , vk−1} = Span{u1, · · · , uk−1} so we have

Span{v1, · · · , vk−1, vk} = Span{v1, · · · , vk−1, uk} = Span{u1, · · · , uk−1, uk}.

It remains to show that the set {v1, v2, · · · , vk} is an orthogonal set. By the induction hypothesis, we have 〈vj, vi〉 = 0 for all 1 ≤ i < j < k, so it suffices to show that 〈vk, vj〉 = 0 for all indices 1 ≤ j < k, and indeed, for 1 ≤ j < k we have

〈vk, vj〉 = 〈 uk − ∑_{i=1}^{k−1} (〈uk, vi〉/|vi|^2) vi , vj 〉 = 〈uk, vj〉 − ∑_{i=1}^{k−1} (〈uk, vi〉/|vi|^2) 〈vi, vj〉
= 〈uk, vj〉 − (〈uk, vj〉/|vj|^2) 〈vj, vj〉 = 0.
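The recursion in Theorem 6.5 translates directly into code. Below is a minimal sketch in Python assuming the numpy library, with the convention of these notes that 〈u, v〉 = v∗u (so np.vdot(v, u) computes 〈u, v〉); the helper name gram_schmidt is ours, and the input vectors are assumed linearly independent.

    import numpy as np

    def gram_schmidt(vectors):
        # v_k = u_k - sum_{i<k} (<u_k, v_i>/|v_i|^2) v_i
        basis = []
        for u in vectors:
            v = u.astype(complex)
            for w in basis:
                v = v - (np.vdot(w, u) / np.vdot(w, w)) * w
            basis.append(v)
        return basis

    u = [np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])]
    v = gram_schmidt(u)
    for i in range(3):
        for j in range(i):
            assert abs(np.vdot(v[i], v[j])) < 1e-12   # pairwise orthogonal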

6.6 Corollary: Every finite or countable dimensional inner product space W over F = R or C has an orthonormal basis.

Proof: Let W be a finite or countable dimensional inner product space over F = R or C. Choose an ordered basis A = {u1, u2, · · ·} for W. Apply the Gram-Schmidt Procedure to the basis A to obtain an orthogonal basis B = {v1, v2, · · ·} for W. For each index k ≥ 1, let wk = vk / |vk|. Then C = {w1, w2, · · ·} is an orthonormal basis for W.

6.7 Remark: It is not the case that every uncountable dimensional inner product space has a basis which is orthonormal.

6.8 Corollary: Let W be a finite or countable dimensional inner product space over F = R or C. Let U ⊆ W be a finite dimensional subspace. Then every orthogonal (or orthonormal) basis A for U extends to an orthogonal (or orthonormal) basis for W.

Proof: Let A = {u1, u2, · · · , ul} be an ordered orthogonal (or orthonormal) basis for U. Extend A to an ordered basis {u1, · · · , ul, v1, v2, · · ·} for W. Apply the Gram-Schmidt Procedure to this basis to obtain an orthogonal basis C = {u1′, · · · , ul′, w1, w2, · · ·} for W. Verify that since {u1, · · · , ul} is already orthogonal, the vectors ui are left unchanged by the Gram-Schmidt Procedure, so that in fact ui′ = ui for all indices i, and so the new orthogonal basis C extends the original orthogonal basis A.


6.9 Remark: The above corollary does not hold in general in the case that the subspace U is countable dimensional, as we shall soon see in Example 6.16.

6.10 Corollary: Let F = R or C and let U and V be finite or countable dimensional inner product spaces over F. Then U and V are isomorphic (as inner product spaces) if and only if dim(U) = dim(V). In particular, if dim(U) = n then U is isomorphic to F^n, and if dim(U) = ℵ0 then U is isomorphic to F∞.

Proof: Suppose that U and V are isomorphic. Let L : U → V be an isomorphism. Let A = {u1, u2, · · ·} be any basis for U. Since L is a bijective linear map, it follows that B = {L(u1), L(u2), · · ·} is a basis for V, and that A and B have the same cardinality. Thus dim(U) = dim(V).

Conversely, suppose dim(U) = dim(V). Let A = {u1, u2, · · ·} and B = {v1, v2, · · ·} be orthonormal bases for U and V. Let L : U → V be the (unique) bijective linear map with L(ui) = vi for all i. Then L preserves inner product because for x, y ∈ U with say x = ∑_{i≥1} si ui and y = ∑_{j≥1} tj uj (both sums finite) we have

〈x, y〉 = 〈 ∑_{i≥1} si ui , ∑_{j≥1} tj uj 〉 = ∑_{i,j≥1} si t̄j 〈ui, uj〉 = ∑_{i,j≥1} si t̄j δ_{i,j} = ∑_{i≥1} si t̄i

and

〈L(x), L(y)〉 = 〈 L(∑_{i≥1} si ui) , L(∑_{j≥1} tj uj) 〉 = 〈 ∑_{i≥1} si L(ui) , ∑_{j≥1} tj L(uj) 〉
= 〈 ∑_{i≥1} si vi , ∑_{j≥1} tj vj 〉 = ∑_{i,j≥1} si t̄j 〈vi, vj〉 = ∑_{i,j≥1} si t̄j δ_{i,j} = ∑_{i≥1} si t̄i.

6.11 Corollary: Let F = R or C, let U be an n-dimensional inner product space over F, and let A = {u1, · · · , un} be an orthonormal basis for U. Then the map φA : U → F^n given by φA(x) = [x]A is an isomorphism. In particular, when x = ∑_{i=1}^n si ui and y = ∑_{i=1}^n ti ui, so that s = [x]A and t = [y]A, we have 〈x, y〉 = t∗s.

Proof: Taking V = F^n with its standard orthonormal basis B = {e1, · · · , en}, the map L : U → V with L(ui) = ei, used in the above proof, is precisely the map φA.


6.12 Definition: Let W be an inner product space over F = R or C. For a subspace U ⊆ W, we define the orthogonal complement of U in W to be the set

U⊥ = { x ∈ W | 〈x, u〉 = 0 for all u ∈ U }.

6.13 Theorem: Let W be an inner product space over F = R or C. Let U ⊆ W be a subspace. Then

(1) U⊥ is a subspace of W,
(2) if A is a basis for U then U⊥ = { x ∈ W | 〈x, u〉 = 0 for all u ∈ A },
(3) U ∩ U⊥ = {0}, and
(4) U ⊆ (U⊥)⊥.

If U is finite dimensional, then we also have

(5) U ⊕ U⊥ = W, and
(6) U = (U⊥)⊥.

Proof: We leave the proofs of Parts (1) to (4) as an exercise (they are identical to the proofs of the analogous parts of Theorem 2.11). To prove Parts (5) and (6), suppose that U is finite dimensional. Let A = {u1, u2, · · · , ul} be an orthonormal basis for U. To prove Part (5), we need to show that for every x ∈ W there exist unique vectors u, v ∈ W with u ∈ U, v ∈ U⊥ and u + v = x. First we prove uniqueness. Let x ∈ W, and suppose that u ∈ U, v ∈ U⊥ and u + v = x. Note that for all indices i we have

〈x, ui〉 = 〈u + v, ui〉 = 〈u, ui〉 + 〈v, ui〉 = 〈u, ui〉

and so, by Theorem 6.4, we have

u = ∑_{i=1}^l 〈u, ui〉 ui = ∑_{i=1}^l 〈x, ui〉 ui.

This proves uniqueness, since given x ∈ W, the vector u must be given by u = ∑_{i=1}^l 〈x, ui〉 ui and then the vector v must be given by v = x − u.

To prove existence, let x ∈ W and choose u and v to be the vectors u = ∑_{i=1}^l 〈x, ui〉 ui and v = x − u. Then we have u ∈ U and u + v = x, so it suffices to show that v ∈ U⊥. For all indices k we have

〈v, uk〉 = 〈x − u, uk〉 = 〈x, uk〉 − 〈u, uk〉 = 〈x, uk〉 − 〈 ∑_{i=1}^l 〈x, ui〉 ui , uk 〉
= 〈x, uk〉 − ∑_{i=1}^l 〈x, ui〉〈ui, uk〉 = 〈x, uk〉 − ∑_{i=1}^l 〈x, ui〉 δ_{i,k} = 〈x, uk〉 − 〈x, uk〉 = 0.

Since 〈v, uk〉 = 0 for all 1 ≤ k ≤ l, from Part (2) we have v ∈ U⊥. This proves Part (5).

Let us prove Part (6). From Part (4), we have U ⊆ (U⊥)⊥. Conversely, let x ∈ (U⊥)⊥. Using Part (5), we can choose u, v ∈ W with u ∈ U, v ∈ U⊥ and u + v = x. Since x ∈ (U⊥)⊥ and v ∈ U⊥, we have 〈x, v〉 = 0, and so 0 = 〈x, v〉 = 〈u + v, v〉 = 〈u, v〉 + 〈v, v〉 = 〈v, v〉. Since 〈v, v〉 = 0 we have v = 0 and so x = u + v = u ∈ U. Thus (U⊥)⊥ ⊆ U, as required.


6.14 Example: As an exercise, show that for A ∈ Mn×l(C) we have (Null A)⊥ = Col A∗ and (Col A)⊥ = Null A∗.

6.15 Remark: Parts (5) and (6) of the above theorem do not always hold when U is infinite dimensional, as the following example shows.

6.16 Example: Let F = R or C. Let W = F∞. Let

U = { a = (a0, a1, · · ·) ∈ W | ∑_{i=0}^∞ ai = 0 }.

Note that U is a proper subspace of W and it is countable dimensional with countable basis A = {u1, u2, · · ·} where uk = ek − e0 = (−1, 0, · · · , 0, 1, 0, 0, · · ·), with the 1 in position k. Although U ⊊ W we have

U⊥ = { x ∈ W | 〈x, uk〉 = 0 for all k }
   = { x ∈ W | 〈x, ek − e0〉 = 0 for all k }
   = { x ∈ W | xk = x0 for all k }
   = { (x0, x1, · · ·) ∈ W | x0 = x1 = x2 = · · · } = {0}

because for (x0, x1, · · ·) ∈ W we have xi = 0 for all but finitely many indices i. Notice that in this example we do not have U ⊕ U⊥ = W. Also notice that, although we could apply the Gram-Schmidt Procedure to the basis A to obtain an orthogonal basis B = {v1, v2, · · ·} for U, the basis B cannot be extended to an orthogonal basis for W because there is no nonzero vector x ∈ W with 〈x, vi〉 = 0 for all i.

6.17 Definition: Let W be an inner product space over F = R or C. Let U ⊆ W be a finite dimensional subspace. For x ∈ W, we define the orthogonal projection of x onto U, denoted by ProjU(x), as follows. Since W = U ⊕ U⊥, we can choose unique vectors u, v ∈ W with u ∈ U, v ∈ U⊥ and u + v = x. We then define

ProjU(x) = u.

Since U = (U⊥)⊥, for u and v as above we have ProjU⊥(x) = v. When y ∈ W and U = Span{y}, we also write Projy(x) = ProjU(x).

6.18 Note: Let W be an inner product space over F = R or C. Let U be a finite dimensional subspace of W. Let A = {u1, u2, · · · , ul} be an orthogonal basis for U. Then for x ∈ W, as in the proof of Part (5) of Theorem 6.13, we see that

ProjU(x) = ∑_{i=1}^l (〈x, ui〉 / |ui|^2) ui.

6.19 Example: As an exercise, show that for A ∈ Mn×l(C) and U = Col A, given x ∈ Cn there exists t ∈ Cl such that A∗A t = A∗x, and that for any such t we have ProjU(x) = At. In particular, when rank(A) = l, show that A∗A is invertible so that ProjU(x) = A(A∗A)^{−1}A∗x.
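Over R the recipe of Example 6.19 is the familiar normal-equations computation. A minimal sketch assuming numpy:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 3))   # columns span U = Col(A); rank 3 generically
    x = rng.standard_normal(5)

    # Solve A*A t = A*x, then Proj_U(x) = A t.
    t = np.linalg.solve(A.T @ A, A.T @ x)
    p = A @ t

    # The residual x - p is orthogonal to every column of A, i.e. to U.
    print(np.allclose(A.T @ (x - p), 0))   # True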


6.20 Theorem: Let W be an inner product space over F = R or C. Let U ⊆ W be a finite dimensional subspace. Let x ∈ W. Then ProjU(x) is the unique point in U which is nearest to x.

Proof: Let u, v ∈ W be the vectors with u ∈ U, v ∈ U⊥ and u + v = x, so that we have ProjU(x) = u. Let w ∈ U with w ≠ u. Since 〈w − u, x − u〉 = 〈w − u, v〉 = 〈w, v〉 − 〈u, v〉 = 0, Pythagoras' Theorem gives

|x − w|^2 = |(x − u) − (w − u)|^2 = |x − u|^2 + |w − u|^2 > |x − u|^2

and so |x − w| > |x − u|.

6.21 Example: Find the quadratic polynomial f ∈ P2 = P2(R) which minimizes

∫_{−1}^{1} ( f(x) − |x| )^2 dx.

Solution: Let W = C0([−1, 1], R) with inner product given by 〈f, g〉 = ∫_{−1}^{1} f(x) g(x) dx. Then we need to find the polynomial f ∈ P2 which minimizes dist(f, g) where g(t) = |t|, so we must take

f = ProjP2(g).

Let p0 = 1, p1 = x and p2 = x^2, so that {p0, p1, p2} is the standard basis for P2. Apply the Gram-Schmidt Procedure to get

q0 = p0 = 1,

q1 = p1 − (〈p1, q0〉/|q0|^2) q0 = x − (〈x, 1〉/|1|^2) · 1 = x − (0/2) · 1 = x,

q2 = p2 − (〈p2, q0〉/|q0|^2) q0 − (〈p2, q1〉/|q1|^2) q1
   = x^2 − (〈x^2, 1〉/|1|^2) · 1 − (〈x^2, x〉/|x|^2) · x
   = x^2 − ((2/3)/2) · 1 − (0/(2/3)) · x = x^2 − 1/3,

where 〈x, 1〉 = ∫_{−1}^{1} x dx = 0, |1|^2 = ∫_{−1}^{1} 1 dx = 2, 〈x^2, 1〉 = ∫_{−1}^{1} x^2 dx = 2/3 and 〈x^2, x〉 = ∫_{−1}^{1} x^3 dx = 0. Using the orthogonal basis {q0, q1, q2} = {1, x, x^2 − 1/3} for P2, we calculate

f = ProjP2(g) = (〈g, q0〉/|q0|^2) q0 + (〈g, q1〉/|q1|^2) q1 + (〈g, q2〉/|q2|^2) q2
  = (〈|x|, 1〉/|1|^2) · 1 + (〈|x|, x〉/|x|^2) · x + (〈|x|, x^2 − 1/3〉/|x^2 − 1/3|^2) · (x^2 − 1/3).

Here 〈|x|, 1〉 = ∫_{−1}^{1} |x| dx = 1, 〈|x|, x〉 = ∫_{−1}^{1} x|x| dx = 0,

〈|x|, x^2 − 1/3〉 = ∫_{−1}^{1} |x| (x^2 − 1/3) dx = 1/2 − 1/3 = 1/6, and
|x^2 − 1/3|^2 = ∫_{−1}^{1} (x^2 − 1/3)^2 dx = 2/5 − 4/9 + 2/9 = 8/45,

so

f = (1/2) · 1 + 0 · x + ((1/6)/(8/45)) (x^2 − 1/3) = 1/2 + (15/16)(x^2 − 1/3) = 3/16 + (15/16) x^2.
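As a sanity check, one can verify numerically that no nearby quadratic does better. This is a sketch assuming the numpy and scipy libraries (neither is part of these notes); the helper name err is ours.

    import numpy as np
    from scipy.integrate import quad

    # Squared L^2 distance on [-1, 1] between |x| and the quadratic
    # c[0] + c[1] x + c[2] x^2.
    def err(c):
        val, _ = quad(lambda x: (c[0] + c[1]*x + c[2]*x**2 - abs(x))**2, -1, 1)
        return val

    best = np.array([3/16, 0.0, 15/16])
    rng = np.random.default_rng(3)
    for _ in range(100):
        other = best + 0.1 * rng.standard_normal(3)   # a nearby quadratic
        assert err(best) <= err(other) + 1e-12
    print(err(best))   # 1/96, approximately 0.0104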


7. The Dual and Adjoint of a Linear Map

7.1 Definition: For two vector spaces U and V over a field F, we write Hom(U, V) for the vector space of linear maps L : U → V. For a vector space U over a field F, the dual of U is the vector space

U∗ = Hom(U, F).

7.2 Theorem: (Dual Basis) Let U be an n-dimensional vector space over a field F. Let A = {u1, · · · , un} be a basis for U. For each index k, let fk ∈ U∗ be the linear map fk : U → F such that fk(ui) = δ_{k,i}. Then the set F = {f1, · · · , fn} is a basis for U∗. Also, for x ∈ U and g ∈ U∗ we have

[x]A = (f1(x), · · · , fn(x))^T and [g]F = (g(u1), · · · , g(un))^T.

Proof: For x = ∑_{i=1}^n ti ui ∈ U we have

fk(x) = fk( ∑_{i=1}^n ti ui ) = ∑_{i=1}^n ti fk(ui) = ∑_{i=1}^n ti δ_{k,i} = tk

and so [x]A = (f1(x), · · · , fn(x))^T. For g = ∑_{i=1}^n ti fi ∈ Span F, we have

g(uk) = ( ∑_{i=1}^n ti fi )(uk) = ∑_{i=1}^n ti fi(uk) = ∑_{i=1}^n ti δ_{i,k} = tk.

It follows that F is linearly independent because if ∑_{i=1}^n ti fi = 0 then tk = ( ∑_{i=1}^n ti fi )(uk) = 0 for all k, and it follows that F spans U∗ because given any g ∈ U∗ we can let tk = g(uk) and then we have g(uk) = ( ∑_{i=1}^n g(ui) fi )(uk) for all k, and this implies that g = ∑_{i=1}^n g(ui) fi so that g ∈ Span F. It also follows that [g]F = (g(u1), · · · , g(un))^T.

7.3 Definition: The basis F in the above theorem is called the dual basis of A for U∗.

7.4 Remark: If U is a countable dimensional vector space over F and A = {u1, u2, · · ·} is a basis for U, then for each index k we can still let fk ∈ U∗ be the linear map fk : U → F given by fk(ui) = δ_{k,i}. Then the set F = {f1, f2, · · ·} is still linearly independent, but it no longer spans U∗. In this case we have

Span F ≅ F∞ and U∗ ≅ F^N.

Indeed every g ∈ U∗ is uniquely determined by the values g(ui), and we can define a vector space isomorphism φA : U∗ → F^N by φA(g) = (g(u1), g(u2), · · ·).

More generally, if U is any vector space over F and A is a basis, then for each u ∈ A we can let fu ∈ U∗ be the unique linear map fu : U → F such that fu(u) = 1 and fu(v) = 0 for v ∈ A with v ≠ u. Then the set F = { fu | u ∈ A } is linearly independent, but when U is infinite dimensional we have Span F ⊊ U∗.


7.5 Theorem: (Double Dual) Let U be a vector space over a field F. Define φ : U → (U∗)∗ by φ(u)(g) = g(u) for u ∈ U and g ∈ U∗. Then

(1) φ is an injective linear map, and
(2) if U is finite dimensional then φ is bijective.

Proof: The map φ is linear because for all u, v ∈ U we have

φ(u + v)(g) = g(u + v) = g(u) + g(v) = φ(u)(g) + φ(v)(g) = (φ(u) + φ(v))(g)

for all g ∈ U∗ so that φ(u + v) = φ(u) + φ(v), and because for all u ∈ U and all t ∈ F we have

φ(tu)(g) = g(tu) = t g(u) = t(φ(u)(g)) = (t φ(u))(g)

for all g ∈ U∗ so that φ(tu) = t φ(u). The map φ is injective because, for u ∈ U, if φ(u) = 0 then φ(u)(g) = 0 for all g ∈ U∗, and hence g(u) = 0 for all g ∈ U∗, and this implies that u = 0 (since if u ≠ 0 we can construct g ∈ U∗ such that g(u) ≠ 0 as follows: extend {u} to a basis A for U, then define g ∈ U∗ to be the linear map g : U → F given by g(u) = 1 and g(v) = 0 for v ∈ A with v ≠ u). This proves Part (1).

Suppose that U is finite dimensional. By the Dual Basis Theorem, we know that dim U = dim U∗ and dim U∗ = dim(U∗)∗. Since φ : U → (U∗)∗ is injective and dim U = dim(U∗)∗, it follows that φ is bijective. This proves Part (2).

7.6 Definition: The map φ : U → (U∗)∗ of the above theorem, given by φ(u)(g) = g(u), is called the evaluation map.

7.7 Definition: Let U and V be vector spaces over a field F. Let L : U → V be a linear map. The dual of the map L is the linear map LT : V∗ → U∗ given by LT(g) = g ∘ L, so that LT(g)(u) = g(L(u)) for all g ∈ V∗ and u ∈ U.

7.8 Theorem: Let U and V be finite dimensional vector spaces over a field F. Let A and B be ordered bases for U and V. Let F and G be the dual bases for U∗ and V∗. Let L : U → V be a linear map. Then

[LT]GF = ([L]AB)^T.

Proof: Let A = {u1, · · · , uk}, B = {v1, · · · , vl}, F = {f1, · · · , fk} and G = {g1, · · · , gl}. Using the formulas for coefficient vectors from the Dual Basis Theorem, [L]AB = ([L(u1)]B, · · · , [L(uk)]B) is the l × k matrix with entries

([L]AB)_{i,j} = gi(L uj),

and [LT]GF = ([LT(g1)]F, · · · , [LT(gl)]F) is the k × l matrix with entries

([LT]GF)_{i,j} = LT(gj)(ui) = gj(L ui) = ([L]AB)_{j,i},

so that [LT]GF = ([L]AB)^T.


7.9 Definition: Let W be a vector space over a field F. For a subspace U ⊆ W, the annihilator of U in W∗ is the space

U° = { g ∈ W∗ | g(x) = 0 for all x ∈ U }.

7.10 Theorem: Let W be a finite dimensional vector space over a field F. Let U ⊆ W be a subspace. Then

dim U + dim U° = dim W.

Proof: Let {u1, u2, · · · , uk} be an ordered basis for U. Extend this to an ordered basis {u1, · · · , uk, v1, · · · , vl} for W. Let {f1, · · · , fk, g1, · · · , gl} be the dual basis for W∗. We claim that {g1, · · · , gl} is a basis for U°. Since gj(ui) = 0 for all 1 ≤ i ≤ k, we see that each gj ∈ U° so we have Span{g1, · · · , gl} ⊆ U°. For h ∈ U°, say h = ∑_{i=1}^k si fi + ∑_{i=1}^l ti gi, we have sj = h(uj) = 0 for all indices j so that h = ∑_{i=1}^l ti gi ∈ Span{g1, · · · , gl}. Thus Span{g1, · · · , gl} = U°, and so {g1, · · · , gl} is a basis for U°, as claimed.

7.11 Theorem: Let U be a finite dimensional inner product space over F = R or C. Define φU : U → U∗ by φU(u)(x) = 〈x, u〉 for u, x ∈ U. Then

(1) if F = R then φU is a vector space isomorphism, and
(2) if F = C then φU is conjugate-linear and bijective.

Proof: Let φ = φU. The map φ is well-defined because for u ∈ U, the map φ(u) : U → F given by φ(u)(x) = 〈x, u〉 is linear in x so that φ(u) ∈ U∗. The map φ is linear when F = R and conjugate-linear when F = C because for u, v ∈ U and t ∈ F we have

φ(u + v)(x) = 〈x, u + v〉 = 〈x, u〉 + 〈x, v〉 = φ(u)(x) + φ(v)(x), and
φ(tu)(x) = 〈x, tu〉 = t̄ 〈x, u〉 = t̄ φ(u)(x)

for all x ∈ U. The map φ is injective because if φ(u1) = φ(u2) then 〈x, u1〉 = 〈x, u2〉 for all x ∈ U, so u1 = u2 by Theorem 5.8. To show that φ is surjective, let g ∈ U∗. We must find u ∈ U so that φ(u) = g, that is so that 〈x, u〉 = g(x) for all x ∈ U. Choose an orthonormal basis A = {u1, · · · , un} for U. In order to obtain g(x) = 〈x, u〉 for all x ∈ U, it suffices to have g(uk) = 〈uk, u〉 for all indices k. We choose u = ∑_{i=1}^n ḡ(ui) ui, and then we have 〈uk, u〉 = ∑_{i=1}^n g(ui) 〈uk, ui〉 = g(uk), as required.

7.12 Note: Let W be an inner product space over F = R or C, and let U ⊆ W be a subspace. Then the above map φW : W → W∗ given by φW(w)(x) = 〈x, w〉 sends U⊥ to U°. Indeed for u ∈ W we have

u ∈ U⊥ ⇐⇒ 〈u, x〉 = 0 for all x ∈ U ⇐⇒ 〈x, u〉 = 0 for all x ∈ U
⇐⇒ φW(u)(x) = 0 for all x ∈ U ⇐⇒ φW(u) ∈ U°.


7.13 Definition: Let U and V be finite dimensional inner product spaces over F = R or C. Let L : U → V be a linear map. The adjoint of L is the map L∗ : V → U given by

L∗ = φU⁻¹ ∘ LT ∘ φV.

7.14 Note: For a map M : V → U, where U and V are finite dimensional inner product spaces over F = R or C, we have

M = L∗ ⇐⇒ M = φU⁻¹ ∘ LT ∘ φV ⇐⇒ φU ∘ M = LT ∘ φV
⇐⇒ φU(M(y)) = LT(φV(y)) for all y ∈ V
⇐⇒ φU(M(y)) = φV(y) ∘ L for all y ∈ V
⇐⇒ φU(M(y))(x) = φV(y)(L(x)) for all x ∈ U, y ∈ V
⇐⇒ 〈x, M(y)〉 = 〈L(x), y〉 for all x ∈ U, y ∈ V.

Thus the adjoint of L is the unique map L∗ : V → U with the property that

〈L(x), y〉 = 〈x, L∗(y)〉 for all x ∈ U, y ∈ V.

When F = R, the adjoint L∗ is clearly a linear map because it is the composite of linear maps. When F = C, the adjoint L∗ is again linear since the map LT is linear and the maps φU⁻¹ and φV are conjugate-linear. Indeed for y ∈ V and t ∈ C we have

φU⁻¹(LT(φV(ty))) = φU⁻¹(LT(t̄ φV(y))) = φU⁻¹(t̄ LT(φV(y))) = t φU⁻¹(LT(φV(y))).

7.15 Theorem: Let U and V be finite dimensional inner product spaces over F = R or C and let L : U → V be a linear map. Let A and B be orthonormal bases for U and V. Then

[L∗]BA = ([L]AB)∗.

Proof: Let A = {u1, · · · , uk} and B = {v1, · · · , vl}. Since B is orthonormal, Theorem 6.4 gives [L(uj)]B = (〈L(uj), v1〉, · · · , 〈L(uj), vl〉)^T, so [L]AB = ([L(u1)]B, · · · , [L(uk)]B) is the l × k matrix with entries

([L]AB)_{i,j} = 〈L(uj), vi〉,

and similarly [L∗]BA = ([L∗(v1)]A, · · · , [L∗(vl)]A) is the k × l matrix with entries ([L∗]BA)_{i,j} = 〈L∗(vj), ui〉. Thus the (i, j) entry of the matrix [L∗]BA is

([L∗]BA)_{i,j} = 〈L∗(vj), ui〉 = 〈vj, L(ui)〉,

which is the complex conjugate of 〈L(ui), vj〉 = ([L]AB)_{j,i}, and this is precisely the (i, j) entry of ([L]AB)∗.


7.16 Remark: We now wish to extend our definition of the adjoint of a linear map L : U → V to include the case in which U and V are infinite dimensional.

7.17 Theorem: Let U and V be inner product spaces over F = R or C and let L : U → V be a linear map. Suppose that there exists a map M : V → U with the property that

〈L(x), y〉 = 〈x, M(y)〉 for all x ∈ U, y ∈ V.

Then M is unique and linear.

Proof: To prove that M is unique, suppose that another map N : V → U has the property that 〈L(x), y〉 = 〈x, N(y)〉 for all x ∈ U and y ∈ V. Then for all y ∈ V we have 〈x, M(y)〉 = 〈L(x), y〉 = 〈x, N(y)〉 for all x ∈ U, and so M(y) = N(y) by Theorem 5.8. Since M(y) = N(y) for all y ∈ V, we have M = N. To see that M is linear, let y, y1, y2 ∈ V and let t ∈ F. Since

〈x, M(y1 + y2)〉 = 〈L(x), y1 + y2〉 = 〈L(x), y1〉 + 〈L(x), y2〉
= 〈x, M(y1)〉 + 〈x, M(y2)〉 = 〈x, M(y1) + M(y2)〉

for all x ∈ U, we have M(y1 + y2) = M(y1) + M(y2) by Theorem 5.8. Since

〈x, M(ty)〉 = 〈L(x), ty〉 = t̄ 〈L(x), y〉 = t̄ 〈x, M(y)〉 = 〈x, t M(y)〉

for all x ∈ U, we have M(ty) = t M(y) by Theorem 5.8. Thus M is linear.

7.18 Definition: Let U and V be inner product spaces over F = R or C and let L : U → V be a linear map. We define an adjoint of L to be a map L∗ : V → U with the property that

〈L(x), y〉 = 〈x, L∗(y)〉 for all x ∈ U, y ∈ V.

By the above theorem, if L has an adjoint L∗ then L∗ is unique and linear.


8. Orthonormal Triangularization and Diagonalization

8.1 Definition: Let F = R or C. For a linear map L : U → U, where U is a finite dimensional inner product space over F, we say that L is orthonormally triangularizable when there exists an orthonormal basis A for U such that [L]A is upper triangular. For a matrix A ∈ Mn×n(F), we say that A is orthonormally triangularizable when there exists a matrix P ∈ Mn×n(F) with P∗P = I such that P∗AP is upper triangular. Most books do not use the term orthonormally triangularizable but, instead, in the case that F = R they use the term orthogonally triangularizable and when F = C they use the term unitarily triangularizable.

8.2 Theorem: Let U be a finite dimensional inner product space over F = R or C. Let A be an orthonormal basis for U. Let L : U → U be a linear map and let A = [L]A. Then L is orthonormally triangularizable if and only if A is orthonormally triangularizable.

Proof: The proof is left as an exercise.

8.3 Theorem: (Schur) Let U be a finite dimensional inner product space over F = R or C. Let L : U → U be linear. Then L is orthonormally triangularizable if and only if the characteristic polynomial fL(x) splits.

Proof: Suppose that L is orthonormally triangularizable. Choose an orthonormal basis A for U such that [L]A is upper triangular. Let T = [L]A ∈ Mn(F). Then fL(x) = fT(x), and since T is upper triangular we have fT(x) = (−1)^n ∏_{k=1}^n (x − T_{k,k}), which splits.

Conversely, suppose that fL(x) splits. Choose any orthonormal basis A for U and let A = [L]A. Since fA(x) = fL(x), we know that fA(x) splits. We shall show, by induction on n, that for any matrix A ∈ Mn(F) for which fA(x) splits, there exists a matrix P ∈ Mn(F) with P∗P = I such that P∗AP is upper triangular. When n = 1, the 1 × 1 matrix A is already upper triangular and we can take P to be the 1 × 1 identity matrix. Fix n ≥ 2, let A ∈ Mn(F), suppose that fA(x) splits, and suppose, inductively, that for every matrix B ∈ Mn−1(F) for which fB(x) splits, we can find a matrix Q ∈ Mn−1(F) with Q∗Q = I such that Q∗BQ is upper triangular. Since fA(x) splits, A has an eigenvalue. Let λ1 be an eigenvalue of A and let u1 ∈ F^n be a corresponding eigenvector with |u1| = 1. Extend {u1} to an orthonormal basis {u1, u2, · · · , un} for F^n and let R = (u1, u2, · · · , un) ∈ Mn(F). Note that since {u1, u2, · · · , un} is orthonormal we have R∗R = I. The kth entry of the first column of the matrix R∗AR is equal to

(R∗AR)_{k,1} = ek∗ R∗AR e1 = uk∗ A u1 = 〈A u1, uk〉 = 〈λ1 u1, uk〉 = λ1 δ_{k,1}

so we have

R∗AR = [ λ1 x^T ; 0 B ]

in block form, for some x ∈ F^{n−1} and some B ∈ Mn−1(F). Since fA(x) = f_{R∗AR}(x) = −(x − λ1) fB(x), we see that fB(x) splits. By the induction hypothesis, we can choose Q ∈ Mn−1(F) with Q∗Q = I such that Q∗BQ is upper triangular. Letting P = R [ 1 0 ; 0 Q ], we have

P∗AP = [ 1 0 ; 0 Q∗ ] [ λ1 x^T ; 0 B ] [ 1 0 ; 0 Q ] = [ λ1 x^T Q ; 0 Q∗BQ ]

which is upper triangular, and it is easy to check that P∗P = I.
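Numerical libraries compute exactly this factorization. A brief sketch assuming the scipy library (not part of these notes): over C every characteristic polynomial splits, so every square complex matrix is unitarily triangularizable.

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

    # Complex Schur form: A = P T P* with P*P = I and T upper triangular.
    T, P = schur(A, output='complex')

    print(np.allclose(P @ T @ P.conj().T, A))       # True
    print(np.allclose(np.tril(T, -1), 0))           # T is upper triangular
    print(np.allclose(P.conj().T @ P, np.eye(4)))   # P is unitary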


8.4 Definition: Let F = R or C. For a linear map L : U → U, where U is a finite dimensional inner product space over F, we say that L is orthonormally diagonalizable when there exists an orthonormal basis A for U such that [L]A is diagonal. For a matrix A ∈ Mn(F), we say that A is orthonormally diagonalizable when there exists a matrix P ∈ Mn(F) with P∗P = I such that P∗AP is diagonal. Most books do not use the term orthonormally diagonalizable but, instead, when F = R they use the term orthogonally diagonalizable and when F = C they use the term unitarily diagonalizable.

8.5 Theorem: Let U be a finite dimensional inner product space over F = R or C. Let A be an orthonormal basis for U. Let L : U → U be a linear map and let A = [L]A. Then L is orthonormally diagonalizable if and only if A is orthonormally diagonalizable.

Proof: The proof is left as an exercise.

8.6 Definition: Let F = R or C. For a linear map L : U → U, where U is an inner product space over F, we say that L is normal when the adjoint L∗ exists and L∗L = LL∗. For a matrix A ∈ Mn(F), we say that A is normal when A∗A = AA∗. Note that when U is finite dimensional and A is an orthonormal basis for U, the map L is normal if and only if its matrix [L]A is normal.

8.7 Theorem: (Diagonalization of Normal Matrices) Let U be a finite dimensional inner product space over F = R or C. Let L : U → U be linear. Then L is orthonormally diagonalizable if and only if L is normal and the characteristic polynomial fL(x) splits.

Proof: Suppose first that L is orthonormally diagonalizable. Choose an orthonormal basis A for U so that [L]A is diagonal, say [L]A = D = diag(λ1, λ2, · · · , λn) ∈ Mn×n(F). Then fL(x) splits because fL(x) = fD(x) = (−1)^n ∏_{i=1}^n (x − λi), and L is normal because D is normal, indeed

D∗D = diag(λ̄1, · · · , λ̄n) diag(λ1, · · · , λn) = diag(|λ1|^2, · · · , |λn|^2) = diag(λ1, · · · , λn) diag(λ̄1, · · · , λ̄n) = DD∗.

Conversely, suppose that L is normal and that fL(x) splits. Since fL(x) splits, by Schur's Theorem we can orthonormally triangularize L. Choose an orthonormal basis A for U so that [L]A is upper triangular. Let T = [L]A ∈ Mn(F). Since L is normal, it follows that T is normal. Since T is normal and upper triangular, it follows that T is in fact diagonal. Indeed, the diagonal entries of T∗T and of TT∗ are given by

(T∗T)_{k,k} = ∑_{i=1}^n (T∗)_{k,i} T_{i,k} = ∑_{i=1}^n T̄_{i,k} T_{i,k} = ∑_{i=1}^n |T_{i,k}|^2 and
(TT∗)_{k,k} = ∑_{i=1}^n T_{k,i} (T∗)_{i,k} = ∑_{i=1}^n T_{k,i} T̄_{k,i} = ∑_{i=1}^n |T_{k,i}|^2.

Since T is upper triangular, so that T_{i,j} = 0 whenever i > j, these expressions simplify to

(T∗T)_{k,k} = ∑_{i=1}^k |T_{i,k}|^2 and (TT∗)_{k,k} = ∑_{i=k}^n |T_{k,i}|^2.

Comparing these diagonal entries, we find that

(T∗T)_{1,1} = (TT∗)_{1,1} =⇒ |T_{1,1}|^2 = |T_{1,1}|^2 + |T_{1,2}|^2 + · · · + |T_{1,n}|^2 =⇒ T_{1,i} = 0 for i > 1,
(T∗T)_{2,2} = (TT∗)_{2,2} =⇒ |T_{2,2}|^2 = |T_{2,2}|^2 + |T_{2,3}|^2 + · · · + |T_{2,n}|^2 =⇒ T_{2,i} = 0 for i > 2,

and so on, so that T is diagonal.
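To see the theorem in action, here is a sketch assuming numpy: we manufacture a normal (but not Hermitian) matrix A = Q D Q∗ and observe that its computed eigenvectors are orthonormal, so A is unitarily diagonalizable. (With distinct eigenvalues, which holds generically here, the computed eigenvector matrix is unitary up to rounding.)

    import numpy as np

    rng = np.random.default_rng(5)

    # Build a normal, non-Hermitian matrix A = Q D Q* with Q unitary and
    # D diagonal with generic complex entries.
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
    D = np.diag(rng.standard_normal(4) + 1j * rng.standard_normal(4))
    A = Q @ D @ Q.conj().T

    print(np.allclose(A.conj().T @ A, A @ A.conj().T))          # A is normal

    w, V = np.linalg.eig(A)
    print(np.allclose(V.conj().T @ V, np.eye(4), atol=1e-8))    # V is unitary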


8.8 Definition: Let F = R or C. For a linear map L : U → U, where U is an inner product space over F, we say that L is unitary when the adjoint L∗ exists and we have L∗L = I = LL∗. For a matrix A ∈ Mn(F), we say that A is unitary when A∗A = I. Note that when U is finite dimensional and A is an orthonormal basis for U, the map L is unitary if and only if its matrix [L]A is unitary. When F = R the term unitary can be replaced by the term orthogonal.

8.9 Theorem: Let U be a finite dimensional inner product space over F = R or C. Let L : U → U be a linear map. Then the following are equivalent.

(1) L is unitary,
(2) L preserves inner product, that is 〈L(x), L(y)〉 = 〈x, y〉 for all x, y ∈ U,
(3) L preserves norm, that is |L(x)| = |x| for all x ∈ U.

Proof: First we show that (1) is equivalent to (2). Suppose that L is unitary. Then for x, y ∈ U we have 〈Lx, Ly〉 = 〈x, L∗Ly〉 = 〈x, Iy〉 = 〈x, y〉, and so L preserves inner product. Conversely, suppose that L preserves inner product. Let y ∈ U. Then for all x ∈ U we have 〈x, L∗Ly〉 = 〈Lx, Ly〉 = 〈x, y〉. Since 〈x, L∗Ly〉 = 〈x, y〉 for all x ∈ U, it follows (from Theorem 5.8) that L∗Ly = y. Since L∗Ly = y for all y ∈ U, it follows that L∗L = I, and so L is unitary.

Next we shall show that (2) is equivalent to (3). Suppose that L preserves inner product. Then for x ∈ U we have

|Lx|^2 = 〈Lx, Lx〉 = 〈x, x〉 = |x|^2

so that L preserves norm. Conversely, suppose that L preserves norm. Then, using the Polarization Identity and the linearity of L, for x, y ∈ U we have

〈Lx, Ly〉 = (1/4)( |Lx + Ly|^2 + i|Lx + iLy|^2 − |Lx − Ly|^2 − i|Lx − iLy|^2 )
= (1/4)( |L(x + y)|^2 + i|L(x + iy)|^2 − |L(x − y)|^2 − i|L(x − iy)|^2 )
= (1/4)( |x + y|^2 + i|x + iy|^2 − |x − y|^2 − i|x − iy|^2 ) = 〈x, y〉.

8.10 Theorem: (Diagonalization of Unitary Maps) Let U be a finite dimensional inner product space over F = R or C and let L : U → U be linear. Then L is orthonormally diagonalizable and all its eigenvalues have norm 1 if and only if L is unitary and fL(x) splits.

Proof: Suppose that L is orthonormally diagonalizable and that all of its eigenvalues have norm 1. Choose an orthonormal basis A for U so that [L]A = D = diag(λ1, · · · , λn) where |λi| = 1 for 1 ≤ i ≤ n. Then fL(x) splits since fL(x) = fD(x) = (−1)^n ∏_{i=1}^n (x − λi), and L is unitary since D is unitary, indeed

D∗D = diag(λ̄1, · · · , λ̄n) diag(λ1, · · · , λn) = diag(|λ1|^2, · · · , |λn|^2) = I.

Conversely, suppose that L is unitary and that fL(x) splits. Since L is unitary, it is also normal because L∗L = I = LL∗. Since L is normal and fL(x) splits, L is orthonormally diagonalizable. Choose an orthonormal basis A for U such that [L]A = D = diag(λ1, · · · , λn). Since L is unitary, so is D, and so we have I = D∗D = diag(|λ1|^2, · · · , |λn|^2), and hence |λi| = 1 for all indices i.


8.11 Definition: Let F = R or C. For a linear map L : U → U, where U is an inner product space over F, we say that L is Hermitian (or self-adjoint) when the adjoint L∗ exists and we have L∗ = L. For a matrix A ∈ Mn(F), we say that A is Hermitian (or self-adjoint) when A∗ = A. Note that when U is finite dimensional and A is an orthonormal basis for U, the map L is Hermitian if and only if its matrix [L]A is Hermitian. When F = R, the terms Hermitian and self-adjoint can be replaced by the term symmetric.

8.12 Theorem: (Diagonalization of Hermitian Maps) Let U be a finite dimensional inner product space over F = R or C and let L : U → U be linear. Then L is orthonormally diagonalizable and all its eigenvalues are real if and only if L is Hermitian.

Proof: Suppose that L is orthonormally diagonalizable and all of its eigenvalues are real. Choose an orthonormal basis A for U so that [L]A = D = diag(λ1, · · · , λn) with each λi ∈ R. Then L is Hermitian since D is Hermitian, indeed

D∗ = diag(λ̄1, · · · , λ̄n) = diag(λ1, · · · , λn) = D.

Conversely, suppose that L is Hermitian, that is L∗ = L. Since L is Hermitian, it is also normal, indeed we have L∗L = L^2 = LL∗. Also, because L is Hermitian it follows that its eigenvalues are all real. Indeed if λ is an eigenvalue of L and u is a corresponding unit eigenvector, so that we have Lu = λu and |u| = 1, then

λ = λ〈u, u〉 = 〈λu, u〉 = 〈Lu, u〉 = 〈u, L∗u〉 = 〈u, Lu〉 = 〈u, λu〉 = λ̄〈u, u〉 = λ̄.

Since the eigenvalues of L are all real, it follows that fL(x) splits (even when F = R). Since L is normal and fL(x) splits, L is orthonormally diagonalizable.

8.13 Example: Let U be a finite dimensional inner product space over F = R or C. Let L : U → U be a linear map. Then L is an orthogonal projection (onto some subspace U0 ⊆ U) when there exists an orthonormal basis A for U (obtained by extending an orthonormal basis A0 for U0 to all of U) such that

[L]A = [ I 0 ; 0 0 ].

Thus we see that

L is an orthogonal projection map
⇐⇒ L is orthonormally diagonalizable and all of its eigenvalues are 0 or 1
⇐⇒ L∗ = L and L^2 = L

because when [L]A = D = diag(λ1, · · · , λn) we have

L^2 = L ⇐⇒ D^2 = D ⇐⇒ λi^2 = λi for all i ⇐⇒ λi ∈ {0, 1} for all i.

Similarly, L is a reflection (in some subspace U0 ⊆ U) when there is an orthonormal basis A for U such that

[L]A = [ I 0 ; 0 −I ]

and so we see that

L is a reflection map
⇐⇒ L is orthonormally diagonalizable and all of its eigenvalues are 1 or −1
⇐⇒ L∗ = L and L^2 = I ⇐⇒ L∗ = L and L∗L = I.


8.14 Theorem: (Singular Value Decomposition) Let U and V be finite dimensional inner product spaces over F = R or C. Let L : U → V be a linear map. Then there exist orthonormal bases A and B for U and V such that [L]AB is of the block form

[L]AB = [ D 0 ; 0 0 ] with D = diag(σ1, σ2, · · · , σr),

where r = rank(L) and σ1 ≥ σ2 ≥ · · · ≥ σr > 0. The positive real numbers σi are unique.

Proof: First, let us prove that the numbers σ1, · · · , σr are uniquely determined by L. Suppose A = {u1, · · · , uk} and B = {v1, · · · , vl} are orthonormal bases for U and V such that [L]AB = [ D 0 ; 0 0 ] ∈ Ml×k(F) where D = diag(σ1, · · · , σr) with σ1 ≥ · · · ≥ σr > 0. Since [L]AB = [ D 0 ; 0 0 ] ∈ Ml×k(F), we must have L(ui) = σi vi for 1 ≤ i ≤ r and L(ui) = 0 for r < i ≤ k. Since [L∗]BA = [ D 0 ; 0 0 ] ∈ Mk×l(F), we must have L∗(vi) = σi ui for 1 ≤ i ≤ r and L∗(vi) = 0 for r < i ≤ l. It follows that

L∗L(ui) = L∗(σi vi) = σi L∗(vi) = σi^2 ui

for 1 ≤ i ≤ r and L∗L(ui) = L∗(0) = 0 for r < i ≤ k. Thus for 1 ≤ i ≤ r, the values λi = σi^2 must be the non-zero eigenvalues of L∗L, and they must be positive and real, and the vectors ui must be corresponding eigenvectors.

Next, let us prove that there do indeed exist orthonormal bases which put L into the desired form. Note that Null(L∗L) = Null(L), indeed for x ∈ U we have

Lx = 0 =⇒ L∗Lx = 0, and
L∗Lx = 0 =⇒ 〈x, L∗Lx〉 = 0 =⇒ 〈Lx, Lx〉 = 0 =⇒ Lx = 0.

In particular, we have rank(L∗L) = rank(L) = r. Also note that L∗L is Hermitian since (L∗L)∗ = L∗L∗∗ = L∗L, and so L∗L is orthonormally diagonalizable and its eigenvalues are all real. Furthermore, note that the eigenvalues of L∗L are all non-negative because if λ is an eigenvalue of L∗L and u is a corresponding unit eigenvector, so that we have L∗Lu = λu and |u| = 1, then we have

λ = λ|u|^2 = λ〈u, u〉 = 〈λu, u〉 = 〈L∗Lu, u〉 = 〈Lu, Lu〉 = |Lu|^2 ≥ 0.

Let λ1, · · · , λk be the eigenvalues of L∗L (repeated according to multiplicity) with λ1 ≥ · · · ≥ λr > 0 and λr+1 = · · · = λk = 0. Let σi = √λi for 1 ≤ i ≤ k so that σ1 ≥ · · · ≥ σr > 0 and σr+1 = · · · = σk = 0. Choose an orthonormal basis A = {u1, · · · , uk} for U so that [L∗L]A = diag(λ1, · · · , λk). Note that {ur+1, · · · , uk} is an orthonormal basis for Null(L∗L) = Null(L) and {u1, · · · , ur} is an orthonormal basis for Null(L)⊥. For 1 ≤ i ≤ r, let vi = (1/σi) L(ui). Note that {v1, · · · , vr} is orthonormal since

〈vi, vj〉 = 〈 (1/σi) L(ui) , (1/σj) L(uj) 〉 = (1/(σi σj)) 〈L(ui), L(uj)〉 = (1/(σi σj)) 〈ui, L∗L(uj)〉
= (1/(σi σj)) 〈ui, λj uj〉 = (λj/(σi σj)) 〈ui, uj〉 = (λj/(σi σj)) δ_{i,j} = δ_{i,j}

since λj/(σj σj) = 1. Extend {v1, · · · , vr} to an orthonormal basis B for V, and note that [L]AB is of the desired form.

8.15 Definition: The singular values of a linear map L : U → V are the square roots of the eigenvalues of the map L∗L. The singular values of a matrix A are the square roots of the eigenvalues of the matrix A∗A.
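In numerical form the theorem is the familiar SVD. A sketch assuming numpy, confirming that the singular values are the square roots of the eigenvalues of L∗L (here L^T L, since the matrix is real):

    import numpy as np

    rng = np.random.default_rng(6)
    L = rng.standard_normal((5, 3))

    # L = U diag(s) Vt with orthonormal columns in U, orthonormal rows in
    # Vt, and s the singular values in decreasing order.
    U, s, Vt = np.linalg.svd(L, full_matrices=False)

    eigs = np.sort(np.linalg.eigvalsh(L.T @ L))[::-1]   # eigenvalues of L*L
    print(np.allclose(s, np.sqrt(eigs)))                # True
    print(np.allclose(U @ np.diag(s) @ Vt, L))          # True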


9. Bilinear Forms

9.1 Definition: Let U, V and W be vector spaces over a field F and let L : U × V → W. We say that L is bilinear when

L(x1 + x2, y) = L(x1, y) + L(x2, y), L(tx, y) = t L(x, y),
L(x, y1 + y2) = L(x, y1) + L(x, y2) and L(x, ty) = t L(x, y)

for all x, x1, x2 ∈ U, all y, y1, y2 ∈ V and all t ∈ F. For a bilinear map L : U × U → W, we say that L is symmetric when L(y, x) = L(x, y) for all x, y ∈ U, and we say that L is alternating (or skew-symmetric) when L(y, x) = −L(x, y) for all x, y ∈ U. A bilinear map L : U × U → F is called a bilinear form on U.

9.2 Example: For any field F, the dot product · : Fn × Fn → F given by u · v = v^T u is a symmetric bilinear form on Fn, and the cross product × : F3 × F3 → F3 given by

u × v = (u2v3 − u3v2, u3v1 − u1v3, u1v2 − u2v1)

is an alternating bilinear map on F3.

9.3 Note: Let U, V and W be vector spaces over a field F. Given bases A and B for U and V, a bilinear map L : U × V → W is uniquely determined by the values L(u, v) ∈ W with u ∈ A and v ∈ B. Indeed, given x ∈ U and y ∈ V, say x = ∑_{i=1}^n si ui and y = ∑_{j=1}^m tj vj with ui ∈ A, vj ∈ B and si, tj ∈ F, we have

L(x, y) = L( ∑_{i=1}^n si ui , ∑_{j=1}^m tj vj ) = ∑_{1≤i≤n, 1≤j≤m} si tj L(ui, vj).

9.4 Theorem: (The Matrix of a Bilinear Map) Let U and V be finite dimensional vector spaces over a field F. Let A = {u1, · · · , uk} and B = {v1, · · · , vl} be bases for U and V. Let L : U × V → F be a bilinear map. There exists a unique matrix [L]AB ∈ Ml×k(F) with the property that

[y]B^T [L]AB [x]A = L(x, y)

for all x ∈ U and y ∈ V.

Proof: First we prove uniqueness. Suppose that such a matrix [L]AB exists. Let A = [L]AB. Then the entries of A are given by

A_{i,j} = ei^T A ej = [vi]B^T [L]AB [uj]A = L(uj, vi).

This shows that the matrix is unique. To prove existence, given a bilinear map L : U × V → F, we let A ∈ Ml×k(F) be the matrix with entries A_{i,j} = L(uj, vi). Define M : U × V → F by M(x, y) = [y]B^T A [x]A. Note that M is bilinear and for all indices i and j we have

M(uj, vi) = [vi]B^T A [uj]A = ei^T A ej = A_{i,j} = L(uj, vi).

It follows from the above note that M = L, so we can take [L]AB = A.

9.5 Definition: The matrix [L]AB in the above theorem is called the matrix of the bilinear map L with respect to the bases A and B. For a bilinear form L : U × U → F, we write [L]A = [L]AA.


9.6 Theorem: Let U be a finite dimensional vector space over a field F. Let L be a bilinear form on U. Then L is symmetric if and only if [L]A is symmetric for some, hence any, basis A for U.

Proof: Suppose that L is symmetric. Let A = {u1, · · · , un} be any basis for U and let A = [L]A. Then for all indices i, j we have A_{i,j} = L(uj, ui) = L(ui, uj) = A_{j,i}, and so A is symmetric. Conversely, let A = {u1, · · · , un} be any basis for U, let A = [L]A, and suppose that A is symmetric. Let x, y ∈ U, say x = ∑_{i=1}^n si ui and y = ∑_{j=1}^n tj uj with si, tj ∈ F. Then

L(x, y) = L( ∑_{i=1}^n si ui , ∑_{j=1}^n tj uj ) = ∑_{1≤i,j≤n} si tj L(ui, uj) = ∑_{1≤i,j≤n} si tj A_{j,i}
= ∑_{1≤i,j≤n} si tj A_{i,j} = ∑_{1≤i,j≤n} si tj L(uj, ui) = L( ∑_{j=1}^n tj uj , ∑_{i=1}^n si ui ) = L(y, x).

9.7 Theorem: (Change of Basis) Let U and V be finite dimensional vector spaces over a field F. Let L : U × V → F be a bilinear map. Let A1, A2 be two bases for U and let B1, B2 be two bases for V. Then

[L]A2B2 = ([I]B2B1)^T [L]A1B1 [I]A2A1.

Proof: For all x ∈ U and y ∈ V we have

[y]B2^T [L]A2B2 [x]A2 = L(x, y) = [y]B1^T [L]A1B1 [x]A1 = ([I]B2B1 [y]B2)^T [L]A1B1 ([I]A2A1 [x]A2)
= [y]B2^T ( ([I]B2B1)^T [L]A1B1 [I]A2A1 ) [x]A2

and so, by the uniqueness of the matrix [L]A2B2, we have [L]A2B2 = ([I]B2B1)^T [L]A1B1 [I]A2A1.

9.8 Definition: Let U and V be finite dimensional vector spaces over a field F, and let L : U × V → F be a bilinear map. We define the rank of L to be the rank of the matrix [L]AB where A and B are any bases for U and V. Note that, by the above theorem, this definition does not depend on the choice of A and B (because multiplying a matrix, on the right or on the left, by an invertible matrix does not alter its rank).

9.9 Note: As a particular case of the above theorem, if U is a finite dimensional vector space over a field F, L is a bilinear form on U, and A and B are two bases for U, and if we write A = [L]A, B = [L]B and P = [I]BA, then we have B = P^T A P.

9.10 Definition: For A, B ∈ Mn(F), we say that A and B are congruent, and we write A ≅ B, when there exists an invertible matrix P ∈ Mn(F) such that B = P^T A P.

9.11 Note: It is perhaps worth mentioning that congruent matrices do not, in general, share the same trace, determinant, or eigenvalues, and we do not define the trace, determinant, or eigenvalues of a bilinear map.


9.12 Theorem: (Diagonalization of Symmetric Bilinear Forms) Let U be a finite dimensional vector space over a field F with char(F) ≠ 2. Let L : U × U → F be a bilinear form on U. Then there exists a basis A for U such that [L]A is diagonal if and only if L is symmetric.

Proof: If A is a basis for U such that [L]A is diagonal, then L is symmetric since its matrix [L]A is symmetric. Conversely, suppose that L is symmetric. Choose a basis A0 for U, and let A = [L]A0 ∈ Mn(F). Note that A is symmetric. We must show that A is congruent to a diagonal matrix. We describe an algorithm which uses elementary row and column operations to put the matrix A into diagonal form. Consider the entry A_{1,1}. If A_{1,1} ≠ 0 then we use the (1, 1) entry to eliminate the other entries in the first row and column by applying the row and column operations

Rk ↦ Rk − (A_{k,1}/A_{1,1}) R1 and Ck ↦ Ck − (A_{1,k}/A_{1,1}) C1.

Note that since A is symmetric, we have A_{k,1} = A_{1,k}, and it follows that, for each k ≥ 2, the elementary matrices associated to the above row and column operations are the transposes of one another. If A_{1,1} = 0 and A_{1,i} ≠ 0 for some i ≥ 2, say A_{1,k} ≠ 0, then first we use the row and column operations

R1 ↦ R1 + Rk and C1 ↦ C1 + Ck

to replace the (1, 1) entry by A_{1,k} + A_{k,1} = 2A_{1,k} ≠ 0 (here we use char(F) ≠ 2), and then we use this new non-zero (1, 1) entry to eliminate the other entries in the first row and column, as above. Again, note that the elementary matrices associated to the above row and column operations are the transposes of one another. At this stage we have converted A to the congruent matrix

P1^T A P1 = [ d1 0 ; 0 B ] with B ∈ Mn−1(F),

where P1 is the product of all the elementary column operation matrices. Since A is symmetric, the matrix P1^T A P1 is symmetric, and so the matrix B ∈ Mn−1(F) is also symmetric. We now repeat the above procedure on the matrix B.
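The algorithm above is short to program. Below is a sketch over R assuming numpy; the helper name congruent_diagonalize is ours, and the tolerance handling is a pragmatic choice for floating point, not part of the theorem.

    import numpy as np

    def congruent_diagonalize(A, tol=1e-12):
        # Return (P, D) with D = P.T @ A @ P diagonal, for symmetric A.
        A = A.astype(float).copy()
        n = A.shape[0]
        P = np.eye(n)
        for k in range(n):
            if abs(A[k, k]) < tol:
                # Zero pivot: add a later row/column with A[k, i] != 0.
                for i in range(k + 1, n):
                    if abs(A[k, i]) > tol:
                        A[k, :] += A[i, :]; A[:, k] += A[:, i]
                        P[:, k] += P[:, i]
                        break
            if abs(A[k, k]) < tol:
                continue            # row and column k are already zero
            for i in range(k + 1, n):
                c = A[i, k] / A[k, k]
                A[i, :] -= c * A[k, :]; A[:, i] -= c * A[:, k]
                P[:, i] -= c * P[:, k]
        return P, A

    A = np.array([[0., 1., 2.], [1., 0., 3.], [2., 3., 0.]])
    P, D = congruent_diagonalize(A)
    print(np.allclose(P.T @ A @ P, D))   # True; D = diag(2, -0.5, -12)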

9.13 Corollary: Let U be a finite dimensional vector space over a field F and let L be a symmetric bilinear form on U of rank r. Then there is a basis A for U such that [L]A = diag(d1, d2, · · · , dn) for some di ∈ F with di ≠ 0 for 1 ≤ i ≤ r and di = 0 for i > r.

Proof: Choose a basis A0 so that [L]A0 is diagonal, then, if necessary, perform the row and column operations Ri ↔ Rj and Ci ↔ Cj to rearrange the diagonal entries of the matrix.

9.14 Corollary: Let U be a finite dimensional vector space over C and let L be a symmetric bilinear form on U of rank r. Then there exists a basis A for U such that

[L]A = [ Ir 0 ; 0 0 ].

Proof: Choose a basis A0 for U so that [L]A0 = D = diag(d1, d2, · · · , dn) with di ≠ 0 for 1 ≤ i ≤ r and di = 0 for r < i ≤ n. For 1 ≤ i ≤ r, choose ci ∈ C so that ci^2 = 1/di, and for r < i ≤ n, choose ci = 1, and then let C = diag(c1, c2, · · · , cn). Then D is congruent to the matrix C^T D C, which is in the required form.


9.15 Theorem: (Sylvester's Law of Inertia) Let U be a finite dimensional vector space over R and let L be a symmetric bilinear form on U of rank r. Then there exists a basis A for U such that [L]A is of the form

[L]A = diag( Ik, −I_{r−k}, 0 )

for some uniquely determined number k with 0 ≤ k ≤ r.

Proof: We can choose a basis A0 for U so that D = [L]A0 is diagonal, and we can order the diagonal entries so that D = diag(d1, d2, · · · , dn) with di > 0 for 1 ≤ i ≤ k, di < 0 for k < i ≤ r and di = 0 for r < i ≤ n. For 1 ≤ i ≤ k we choose ci = 1/√di, for k < i ≤ r we choose ci = 1/√(−di), and for r < i ≤ n we choose ci = 1, and then let C = diag(c1, c2, · · · , cn). Then the matrix D is congruent to the matrix C^T D C, which is in the desired form.

It remains to show that the number of positive entries k is uniquely determined by L. Suppose, for a contradiction, that we can find two bases A = {u1, · · · , un} and B = {v1, · · · , vn} for U such that

[L]A = diag( Ik, −I_{r−k}, 0 ) and [L]B = diag( Il, −I_{r−l}, 0 )

with k ≠ l, say k < l. Note that for x ∈ U, with say x = ∑_{i=1}^n si ui, we have

L(x, uj) = L( ∑_{i=1}^n si ui , uj ) = ∑_{i=1}^n si L(ui, uj) = sj L(uj, uj) = { sj if 1 ≤ j ≤ k; −sj if k < j ≤ r; 0 if r < j ≤ n }

and hence

L(x, x) = L( x , ∑_{j=1}^n sj uj ) = ∑_{j=1}^n sj L(x, uj) = ∑_{j=1}^k sj^2 − ∑_{j=k+1}^r sj^2.

Similar formulas hold for x ∈ U with x = ∑_{i=1}^n ti vi.

Consider the linear map φ : U → R^{k+r−l} given by

φ(x) = ( L(x, u1), L(x, u2), · · · , L(x, uk), L(x, v_{l+1}), L(x, v_{l+2}), · · · , L(x, vr) )^T.

Note that nullity(φ) = n − rank(φ) ≥ n − (k + r − l) = (n − r) + (l − k) > n − r. Since nullity(φ) > n − r = dim Span{u_{r+1}, u_{r+2}, · · · , un}, we can choose an element x ∈ Null(φ) with x ∉ Span{u_{r+1}, · · · , un}. Choose such an element x and write x = ∑_{i=1}^n si ui = ∑_{i=1}^n ti vi. Since x ∈ Null(φ) we have si = L(x, ui) = 0 for 1 ≤ i ≤ k and ti = −L(x, vi) = 0 for l < i ≤ r. Since x ∉ Span{u_{r+1}, · · · , un}, we must have si ≠ 0 for some 1 ≤ i ≤ r. Thus we have si = 0 for all 1 ≤ i ≤ k and si ≠ 0 for some 1 ≤ i ≤ r, which implies that

L(x, x) = ∑_{i=1}^k si^2 − ∑_{i=k+1}^r si^2 < 0,

but we also have ti = 0 for all l < i ≤ r, which implies that

L(x, x) = ∑_{i=1}^l ti^2 − ∑_{i=l+1}^r ti^2 ≥ 0,

giving the desired contradiction.


9.16 Definition: For a bilinear form L : U × U → R, the number k in the above theorem is called the index of L, and the pair (k, r − k) is called the signature of L.

9.17 Note: Let U be a finite dimensional inner product space over R and let L be a symmetric bilinear form on U. Let A be any basis for U and let A = [L]A. Since A is symmetric, it is orthogonally diagonalizable, so there exists an orthogonal matrix P such that P^T A P = D = diag(λ1, · · · , λn) where the diagonal entries λi are the eigenvalues of A (repeated according to multiplicity). By Sylvester's Theorem the number of indices i for which λi > 0 is equal to the index of L, and does not depend on the choice of basis A.

9.18 Definition: Let U be a vector space over R and let L : U × U → R be a symmetric bilinear form. Then

(1) L is positive definite when L(x, x) > 0 for all 0 ≠ x ∈ U,
(2) L is positive semidefinite when L(x, x) ≥ 0 for all x ∈ U,
(3) L is negative definite when L(x, x) < 0 for all 0 ≠ x ∈ U,
(4) L is negative semidefinite when L(x, x) ≤ 0 for all x ∈ U, and
(5) L is indefinite when there exist x, y ∈ U with L(x, x) > 0 and L(y, y) < 0.

9.19 Note: Let U be an n-dimensional vector space over R and let L : U × U → R be a symmetric bilinear form. Let A be a basis for U and let A = [L]A. Then

L is positive definite ⇐⇒ L(u, u) > 0 for all 0 ≠ u ∈ U
⇐⇒ [u]A^T [L]A [u]A > 0 for all 0 ≠ u ∈ U
⇐⇒ x^T A x > 0 for all 0 ≠ x ∈ R^n.

Similarly, L is positive semidefinite if and only if x^T A x ≥ 0 for all x ∈ R^n, and so on.

9.20 Definition: For a symmetric matrix A ∈ Mn(R),

(1) A is positive definite when x^T A x > 0 for all 0 ≠ x ∈ R^n,
(2) A is positive semidefinite when x^T A x ≥ 0 for all x ∈ R^n,
(3) A is negative definite when x^T A x < 0 for all 0 ≠ x ∈ R^n,
(4) A is negative semidefinite when x^T A x ≤ 0 for all x ∈ R^n, and
(5) A is indefinite when there exist x, y ∈ R^n with x^T A x > 0 and y^T A y < 0.

9.21 Theorem: (The Characterization of Definiteness by Eigenvalues) Let U be an n-dimensional vector space over R and let L : U × U → R be a symmetric bilinear form. Let A be a basis for U and let A = [L]A ∈ Mn(R). Let λ1, λ2, · · · , λn be the eigenvalues of A. Then

(1) L is positive definite ⇐⇒ λi > 0 for all i ⇐⇒ index(L) = rank(L) = dim(U),
(2) L is positive semidefinite ⇐⇒ λi ≥ 0 for all i ⇐⇒ index(L) = rank(L),
(3) L is negative definite ⇐⇒ λi < 0 for all i ⇐⇒ index(L) = 0 and rank(L) = dim(U),
(4) L is negative semidefinite ⇐⇒ λi ≤ 0 for all i ⇐⇒ index(L) = 0, and
(5) L is indefinite ⇐⇒ λi > 0 and λj < 0 for some i, j ⇐⇒ 0 < index(L) < rank(L).

Proof: We prove Part (1). Note that A is symmetric, and hence orthogonally diagonalizable. Choose an orthogonal matrix P ∈ On(R) such that P^T A P = D = diag(λ1, · · · , λn). Then L is positive definite ⇐⇒ D is positive definite ⇐⇒ x^T D x > 0 for all 0 ≠ x ∈ R^n ⇐⇒ ∑_{i=1}^n λi xi^2 > 0 for all 0 ≠ x ∈ R^n ⇐⇒ λi > 0 for all i.


9.22 Theorem: (The Characterization of Definiteness by Determinants) Let U be an n-dimensional vector space over R and let L : U × U → R be a symmetric bilinear form. Let A be a basis for U and let A = [L]A ∈ Mn(R). For 1 ≤ k ≤ n, let Ak×k be the upper-left k × k submatrix of A. Then

(1) L is positive definite ⇐⇒ det(Ak×k) > 0 for all k, and
(2) L is negative definite ⇐⇒ (−1)^k det(Ak×k) > 0 for all k.

Proof: Suppose first that L is positive definite. Then A is positive definite, so we have x^T A x > 0 for all 0 ≠ x ∈ R^n. Let 1 ≤ k ≤ n. Note that for all 0 ≠ x ∈ R^k we have

x^T Ak×k x = (x, 0)^T A (x, 0) > 0

and so Ak×k is positive definite. Since Ak×k is positive definite, its eigenvalues are all positive and hence det(Ak×k) > 0 (since the determinant of a square matrix is equal to the product of its eigenvalues).

Conversely, suppose that det(Ak×k) > 0 for 1 ≤ k ≤ n. Consider what happens when we apply the row and column operation algorithm (from Theorem 9.12) to diagonalize the symmetric matrix A. Since A_{1,1} = det(A1×1) > 0, we begin by using the row and column operations

Ri ↦ Ri − (A_{i,1}/A_{1,1}) R1 and Ci ↦ Ci − (A_{1,i}/A_{1,1}) C1

to eliminate the other entries in the first row and column. This puts the matrix A into the form

[ A_{1,1} 0 ; 0 B ]

for some symmetric matrix B. Notice that for 1 ≤ k < n, the same row and column operations convert A(k+1)×(k+1) to the matrix

[ A_{1,1} 0 ; 0 Bk×k ]

and these operations do not change the determinant, so we have

det( A(k+1)×(k+1) ) = A_{1,1} det( Bk×k )

and so we have det(Bk×k) > 0 for 1 ≤ k < n. Thus repeating this procedure eventually converts A to a diagonal matrix whose diagonal entries are all positive. It follows that index(L) = n and hence L is positive definite. This proves Part (1).

Finally, note that Part (2) follows immediately from Part (1) because

L is negative definite ⇐⇒ −L is positive definite ⇐⇒ det(−Ak×k) > 0 for all k
⇐⇒ (−1)^k det(Ak×k) > 0 for all k.
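The determinant criterion is easy to implement. A sketch assuming numpy (the helper name is ours); the eigenvalue test of Theorem 9.21 is used as an independent check.

    import numpy as np

    def is_positive_definite(A, tol=1e-12):
        # Theorem 9.22(1): all leading principal minors are positive.
        n = A.shape[0]
        return all(np.linalg.det(A[:k, :k]) > tol for k in range(1, n + 1))

    A = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 2.]])
    print(is_positive_definite(A))              # True
    print(np.all(np.linalg.eigvalsh(A) > 0))    # True (eigenvalue test agrees)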


10. Quadratic Forms

10.1 Definition: Let U be a vector space over a field F. A quadratic form on U is a map K : U → F of the form K(u) = L(u, u) for some symmetric bilinear form L on U. Note that for u, v ∈ U we have

K(u + v) = L(u + v, u + v) = L(u, u) + L(u, v) + L(v, u) + L(v, v) = K(u) + 2L(u, v) + K(v)

and so when char(F) ≠ 2 we have the Polarization Identity

L(u, v) = (1/2)( K(u + v) − K(u) − K(v) ).

This shows that L is uniquely determined by K. Given a basis A for U, we define the matrix of K with respect to A to be the matrix of its (unique) associated symmetric bilinear form L, that is

[K]A = [L]A

so that for u ∈ U we have K(u) = [u]A^T [K]A [u]A. When A = {u1, u2, · · · , un}, the matrix A = [K]A ∈ Mn(F) has entries A_{i,j} = L(uj, ui), and writing x = [u]A we have

K(u) = x^T A x = ∑_{i,j=1}^n xi A_{i,j} xj = ∑_{i=1}^n A_{i,i} xi^2 + 2 ∑_{i<j} A_{i,j} xi xj.

When we diagonalize the symmetric matrix A by choosing an invertible matrix P ∈ Mn(F) so that P^T A P = D = diag(d1, d2, · · · , dn), if we write x = Pt, or equivalently t = P^{−1}x, then we have

K(u) = x^T A x = t^T P^T A P t = t^T D t = ∑_{i=1}^n di ti^2.

10.2 Example: A polynomial or power series in the variables x1, x2, · · · , xn can be written as

f(x) = ∑_{i1,i2,···,in ≥ 0} c_{i1,···,in} x1^{i1} x2^{i2} · · · xn^{in} = ∑_{k≥0} fk(x)

where

fk(x) = ∑_{i1+i2+···+in = k} c_{i1,···,in} x1^{i1} · · · xn^{in}.

For each k ≥ 0, the polynomial fk(x) is a homogeneous polynomial of degree k, which means that fk(tx) = t^k fk(x) for all x ∈ R^n and all t ∈ R. By relabeling the coefficients, we can also write

f0(x) = a, f1(x) = ∑_{1≤i≤n} ai xi, f2(x) = ∑_{1≤i≤j≤n} a_{i,j} xi xj, f3(x) = ∑_{1≤i≤j≤k≤n} a_{i,j,k} xi xj xk

and so on. In particular, when char(F) ≠ 2, we have f2(x) = ∑_{1≤i≤j≤n} a_{i,j} xi xj = x^T A x where A ∈ Mn(F) is the matrix with entries A_{i,i} = a_{i,i} for 1 ≤ i ≤ n and A_{i,j} = A_{j,i} = (1/2) a_{i,j} for 1 ≤ i < j ≤ n. Thus a quadratic form on F^n is the same thing as a homogeneous polynomial in x1, · · · , xn of degree 2.


10.3 Example: Sketch or describe the curve 3x^2 − 4xy + 6y^2 = 10.

Solution: Let K be the quadratic form on R^2 given by K(x, y) = 3x^2 − 4xy + 6y^2. Note that

K(x, y) = (x y) A (x, y)^T where A = [ 3 −2 ; −2 6 ].

The characteristic polynomial of A is

fA(x) = det(A − xI) = det [ 3−x −2 ; −2 6−x ] = x^2 − 9x + 14 = (x − 7)(x − 2)

so the eigenvalues of A are λ1 = 7 and λ2 = 2. We have

A − λ1 I = [ −4 −2 ; −2 −1 ] ∼ [ 2 1 ; 0 0 ]

and so u1 = (1/√5)(1, −2)^T is a unit eigenvector for λ1. Since A is symmetric, the eigenspace of λ2 is orthogonal to the eigenspace of λ1, so u2 = (1/√5)(2, 1)^T is a unit eigenvector for λ2. Thus we have

P^T A P = D where P = (u1, u2) = (1/√5) [ 1 2 ; −2 1 ] and D = [ λ1 0 ; 0 λ2 ] = [ 7 0 ; 0 2 ].

We make a change of coordinates by writing (x, y)^T = P (s, t)^T, and then we have

K(x, y) = (x y) A (x, y)^T = (s t) P^T A P (s, t)^T = (s t) D (s, t)^T = 7s^2 + 2t^2

and so

K(x, y) = 10 ⇐⇒ 7s^2 + 2t^2 = 10 ⇐⇒ s^2/(10/7) + t^2/5 = 1.

Thus, in the st-plane, the curve is the ellipse with vertices at ±(√(10/7), 0) and ±(0, √5). Since our change of coordinate matrix P is an orthogonal matrix, it preserves inner product, norm and angle (indeed P is a rotation matrix), so in the xy-plane the curve is an ellipse of the same shape with vertices at

P (±√(10/7), 0)^T = ±√(10/7) · (1/√5) (1, −2)^T = ±√(2/7) (1, −2)^T, and
P (0, ±√5)^T = ±√5 · (1/√5) (2, 1)^T = ±(2, 1)^T.
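The computation in this example can be reproduced numerically. A sketch assuming numpy: eigh orthogonally diagonalizes A, and rotating a parametrized ellipse from (s, t) coordinates back to (x, y) coordinates yields points on the original curve.

    import numpy as np

    A = np.array([[3., -2.], [-2., 6.]])

    # Orthogonal diagonalization: eigh returns ascending eigenvalues and
    # an orthogonal matrix P whose columns are unit eigenvectors.
    lam, P = np.linalg.eigh(A)
    print(lam)                                      # [2. 7.]
    print(np.allclose(P.T @ A @ P, np.diag(lam)))   # True

    # Parametrize lam[0] s^2 + lam[1] t^2 = 10 and rotate back by P.
    theta = np.linspace(0, 2 * np.pi, 200)
    st = np.stack([np.sqrt(10 / lam[0]) * np.cos(theta),
                   np.sqrt(10 / lam[1]) * np.sin(theta)])
    xy = P @ st
    print(np.allclose(3*xy[0]**2 - 4*xy[0]*xy[1] + 6*xy[1]**2, 10))   # True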


10.4 Theorem: Let U be an n-dimensional inner product space over R and let K : U → R be a quadratic form on U. Let A = {u1, u2, · · · , un} be an orthonormal basis for U such that [K]A = D = diag(λ1, λ2, · · · , λn) with λ1 ≥ λ2 ≥ · · · ≥ λn. Then

max_{u∈U, |u|=1} K(u) = K(u1) = λ1 and min_{u∈U, |u|=1} K(u) = K(un) = λn.

Proof: Let u ∈ U and write x = [u]A. Note that |x| = |u| since A is orthonormal. When |u| = |x| = 1 we have

K(u) = x^T D x = ∑_{i=1}^n λi xi^2 ≤ ∑_{i=1}^n λ1 xi^2 = λ1 ∑_{i=1}^n xi^2 = λ1 |x|^2 = λ1

and when u = u1 we have x = [u1]A = e1 so that K(u) = K(u1) = e1^T D e1 = λ1. This shows that max_{u∈U, |u|=1} K(u) = K(u1) = λ1, and the proof that min_{u∈U, |u|=1} K(u) = K(un) = λn is similar.

10.5 Theorem: Let U and V be finite dimensional inner product spaces over R and letL : U → V be a linear map. Then

maxu∈U,|u|=1

∣∣L(u)∣∣ =

∣∣L(u1)∣∣ = σ1 and min

u∈U,|u|=1

∣∣L(u)∣∣ =

∣∣L(un)∣∣ = σn

where σ1 and σn are the largest and smallest singular values of L and u1 and un are uniteigenvectors of the map L∗L for the eigenvalues λ1 = σ1

2 and λn = σn2.

Proof: Choose an orthonormal basis A = {u_1, u_2, · · · , u_n} for U such that

    [L*L]_A = D = diag(λ_1, λ_2, · · · , λ_n)

where λ_1, · · · , λ_n are the eigenvalues of L*L with λ_1 ≥ λ_2 ≥ · · · ≥ λ_n, and let σ_i = √λ_i. Choose any orthonormal basis B for V and let A = [L]_A^B. Note that

    A^T A = A*A = [L*]_B^A [L]_A^B = [L*L]_A = D.

Let u ∈ U and write x = [u]_A. Note that |x| = |u| and

    |L(u)|^2 = |[L(u)]_B|^2 = |[L]_A^B [u]_A|^2 = |Ax|^2 = (Ax)^T(Ax) = x^T A^T A x = x^T D x.

As in the proof of the previous theorem, we see that

    max_{u∈U, |u|=1} |L(u)|^2 = |L(u_1)|^2 = λ_1   and   min_{u∈U, |u|=1} |L(u)|^2 = |L(u_n)|^2 = λ_n

and so

    max_{u∈U, |u|=1} |L(u)| = |L(u_1)| = σ_1   and   min_{u∈U, |u|=1} |L(u)| = |L(u_n)| = σ_n.
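In matrix terms the theorem says that the maximum and minimum of |Ax| over unit vectors x are the extreme singular values of A. A quick NumPy check (our own sketch):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 3))            # matrix of a map L : R^3 -> R^4
    sigma = np.linalg.svd(A, compute_uv=False) # singular values, descending

    # Eigenvalues of A^T A (= L*L) are the squared singular values.
    lams = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
    assert np.allclose(lams, sigma**2)

    # |A u| for random unit u stays between sigma_min and sigma_max.
    u = rng.standard_normal((1000, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    norms = np.linalg.norm(u @ A.T, axis=1)
    assert norms.max() <= sigma[0] + 1e-9 and norms.min() >= sigma[-1] - 1e-9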

10.6 Theorem: Let U and V be non-trivial subspaces of R^n with U ∩ V = {0}. Then θ(U, V) = cos^(−1)(σ_1) where σ_1 is the largest singular value of the map P : U → V given by P(x) = Proj_V(x).

Proof: The proof is left as an exercise.
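Although the proof is left as an exercise, the angle is computable: if Q_U and Q_V have orthonormal columns spanning U and V, the singular values of Q_V^T Q_U are the cosines of the principal angles, and θ(U, V) = cos^(−1)(σ_1). A sketch (the helper subspace_angle and the example bases are our own choices):

    import numpy as np

    def subspace_angle(BU, BV):
        # Columns of BU, BV span U and V; orthonormalize via QR, then the
        # largest singular value of QV^T QU is cos(theta(U, V)).
        QU, _ = np.linalg.qr(BU)
        QV, _ = np.linalg.qr(BV)
        s = np.linalg.svd(QV.T @ QU, compute_uv=False)
        return np.arccos(np.clip(s[0], -1.0, 1.0))

    # Two lines in R^2 meeting at 45 degrees.
    U = np.array([[1.0], [0.0]])
    V = np.array([[1.0], [1.0]])
    assert np.isclose(subspace_angle(U, V), np.pi / 4)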


10.7 Example: Let U ⊆ R^n be an open set with a ∈ U. Let f : U → R be smooth (meaning that the partial derivatives of all orders all exist in U). The Taylor polynomial of degree 2 centred at x = a for a smooth function of the variables x_1, x_2, · · · , x_n can be written as

    T(x) = f(a) + Df(a) (x − a) + (1/2) (x − a)^T Hf(a) (x − a)

where Df(a) ∈ M_{1×n}(R) and Hf(a) ∈ M_{n×n}(R) are the matrices with entries

    Df(a)_i = ∂f/∂x_i (a)   and   Hf(a)_{i,j} = ∂²f/(∂x_i ∂x_j) (a).

10.8 Theorem: (The Second Derivative Test) Let U ⊆ R^n be an open set with a ∈ U. Let f : U → R be smooth. Suppose that Df(a) = 0. Then

(1) if Hf(a) is positive definite then f(x) has a local minimum at x = a,
(2) if Hf(a) is negative definite then f(x) has a local maximum at x = a, and
(3) if Hf(a) is indefinite then f(x) has a saddle point at x = a.

Proof: We omit the proof. This theorem is often proven in a calculus course.
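In practice the test reduces to checking the signs of the eigenvalues of the Hessian. A minimal sketch (the helper classify_critical_point is our own naming, and the tolerance is an arbitrary choice):

    import numpy as np

    def classify_critical_point(H, tol=1e-10):
        # Classify a critical point from the eigenvalues of the symmetric Hessian H.
        evals = np.linalg.eigvalsh(H)
        if np.all(evals > tol):
            return "local minimum"
        if np.all(evals < -tol):
            return "local maximum"
        if evals.min() < -tol and evals.max() > tol:
            return "saddle point"
        return "test inconclusive"  # some eigenvalue is (numerically) zero

    # f(x, y) = x^2 - y^2 has Hessian diag(2, -2) at its critical point (0, 0).
    assert classify_critical_point(np.diag([2.0, -2.0])) == "saddle point"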

10.9 Remark: Let U, V and W be vector spaces over C. A map L : U × V → W is called sesquilinear when

    L(x_1 + x_2, y) = L(x_1, y) + L(x_2, y) ,   L(tx, y) = t L(x, y) ,
    L(x, y_1 + y_2) = L(x, y_1) + L(x, y_2)   and   L(x, ty) = t̄ L(x, y)

for all x, x_1, x_2 ∈ U, all y, y_1, y_2 ∈ V and all t ∈ C (so L is linear in the first variable and conjugate-linear in the second). For a sesquilinear map L : U × U → W we say that L is Hermitian when L(y, x) = conj(L(x, y)) for all x, y ∈ U, and we say that L is skew-Hermitian when L(y, x) = −conj(L(x, y)) for all x, y ∈ U. A sesquilinear map L : U × U → C is called a sesquilinear form on U. A sesquilinear form which is Hermitian is called a Hermitian form.

10.10 Example: As an exercise, think about how the theory of bilinear and quadratic forms, from this and the previous chapter, carries over to Hermitian forms.


11. Tensor Algebras

11.1 Definition: Recall that for a vector space U over a field F, we define the dual space of U to be the vector space

    U* = { linear maps L : U → F }.

Recall also that when U is finite dimensional and U = {u_1, u_2, · · · , u_n} is a basis for U, we can define linear maps f_i : U → F for i = 1, 2, · · · , n by requiring that f_i(u_j) = δ_{ij}, and then F = {f_1, f_2, · · · , f_n} is a basis for U* which is called the dual basis to U. We shall sometimes identify the double dual U** with U by identifying the element u ∈ U with the linear map u : U* → F given by u(f) = f(u).

11.2 Definition: Let U_1, U_2, · · · , U_k and V be vector spaces over a field F. A map

    L : U_1 × U_2 × · · · × U_k → V

is called k-linear when

    L(u_1, · · · , t u_i, · · · , u_k) = t L(u_1, · · · , u_i, · · · , u_k) , and
    L(u_1, · · · , v + w, · · · , u_k) = L(u_1, · · · , v, · · · , u_k) + L(u_1, · · · , w, · · · , u_k)

for all indices i, all vectors u_1, · · · , u_k, v, w in the appropriate vector spaces, and all t ∈ F. When U_1, U_2, · · · , U_k are finite dimensional, the tensor product of U_1, U_2, · · · , U_k is the vector space

    U_1 ⊗ U_2 ⊗ · · · ⊗ U_k = { k-linear maps L : U_1* × U_2* × · · · × U_k* → F }.

For u_1, u_2, · · · , u_k with each u_i ∈ U_i, we define (u_1 ⊗ u_2 ⊗ · · · ⊗ u_k) ∈ U_1 ⊗ U_2 ⊗ · · · ⊗ U_k by

    (u_1 ⊗ u_2 ⊗ · · · ⊗ u_k)(g_1, g_2, · · · , g_k) = g_1(u_1) g_2(u_2) · · · g_k(u_k),

where each g_i ∈ U_i*.
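Identifying F^n with column vectors and functionals on F^n with row vectors, an elementary tensor u ⊗ v evaluates as a product of pairings, and its coefficient array is the outer product u v^T. A tiny NumPy sketch of this (our own illustration):

    import numpy as np

    u = np.array([1.0, 2.0])        # u in R^2
    v = np.array([3.0, 0.0, -1.0])  # v in R^3
    g = np.array([0.5, 0.5])        # functional g on R^2 (a row vector)
    h = np.array([1.0, 1.0, 1.0])   # functional h on R^3

    # (u ⊗ v)(g, h) = g(u) h(v)
    lhs = (g @ u) * (h @ v)
    # In coordinates, u ⊗ v is the outer product matrix, and evaluation
    # on (g, h) is the bilinear pairing g (u v^T) h.
    rhs = g @ np.outer(u, v) @ h
    assert np.isclose(lhs, rhs)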

11.3 Example: The dot product · : (F^n)^2 → F given by u · v = v^T u is a 2-linear map.

11.4 Example: An inner product 〈 , 〉 : U^2 → R on a vector space U over R is 2-linear.

11.5 Example: The determinant D : (F^n)^n → F given by D(u_1, u_2, · · · , u_n) = det(A), where A = (u_1, u_2, · · · , u_n) ∈ M_{n×n}(F), is an n-linear map.

11.6 Example: The generalized cross product X : (F^n)^{n−1} → F^n is (n − 1)-linear.


11.7 Theorem: Let U_1, U_2, · · · , U_k be finite dimensional vector spaces. For each index i, let U_i be a basis for U_i. Then the set

    { u_1 ⊗ u_2 ⊗ · · · ⊗ u_k | each u_i ∈ U_i }

is a basis for U_1 ⊗ U_2 ⊗ · · · ⊗ U_k. In particular, dim(U_1 ⊗ U_2 ⊗ · · · ⊗ U_k) = ∏_{i=1}^k dim(U_i).

Proof: Let U_i = {u_{i1}, u_{i2}, · · · , u_{i,n_i}} be a basis for U_i and let F_i = {f_{i1}, f_{i2}, · · · , f_{i,n_i}} be the dual basis for U_i*. Then for appropriate indices i_1, i_2, · · · , i_k and j_1, j_2, · · · , j_k (that is, for 1 ≤ i_1 ≤ n_1, 1 ≤ i_2 ≤ n_2, · · · , 1 ≤ i_k ≤ n_k, and similarly for j_1, j_2, · · · , j_k) we have

    (u_{1,i_1} ⊗ u_{2,i_2} ⊗ · · · ⊗ u_{k,i_k})(f_{1,j_1}, f_{2,j_2}, · · · , f_{k,j_k}) = f_{1,j_1}(u_{1,i_1}) f_{2,j_2}(u_{2,i_2}) · · · f_{k,j_k}(u_{k,i_k})
        = δ_{i_1,j_1} δ_{i_2,j_2} · · · δ_{i_k,j_k} = { 1 if i_1 = j_1, i_2 = j_2, · · · , i_k = j_k ; 0 otherwise }.

It follows that the set of elements of the form (u_{1,i_1} ⊗ u_{2,i_2} ⊗ · · · ⊗ u_{k,i_k}) is linearly independent because if 0 = α = Σ_{i_1,i_2,···,i_k} a_{i_1 i_2 ··· i_k} (u_{1,i_1} ⊗ u_{2,i_2} ⊗ · · · ⊗ u_{k,i_k}) then for all appropriate choices of indices j_1, j_2, · · · , j_k we have

    0 = Σ_{i_1,i_2,···,i_k} a_{i_1 i_2 ··· i_k} (u_{1,i_1} ⊗ · · · ⊗ u_{k,i_k})(f_{1,j_1}, · · · , f_{k,j_k}) = a_{j_1 j_2 ··· j_k}.

More generally, for g_i ∈ U_i* with say g_i = Σ_j c_{i,j} f_{i,j}, we have

    (u_{1,i_1} ⊗ u_{2,i_2} ⊗ · · · ⊗ u_{k,i_k})(g_1, g_2, · · · , g_k)
      = (u_{1,i_1} ⊗ u_{2,i_2} ⊗ · · · ⊗ u_{k,i_k})( Σ_{j_1} c_{1,j_1} f_{1,j_1}, Σ_{j_2} c_{2,j_2} f_{2,j_2}, · · · , Σ_{j_k} c_{k,j_k} f_{k,j_k} )
      = Σ_{j_1,j_2,···,j_k} c_{1,j_1} c_{2,j_2} · · · c_{k,j_k} (u_{1,i_1} ⊗ u_{2,i_2} ⊗ · · · ⊗ u_{k,i_k})(f_{1,j_1}, f_{2,j_2}, · · · , f_{k,j_k})
      = Σ_{j_1,j_2,···,j_k} c_{1,j_1} c_{2,j_2} · · · c_{k,j_k} δ_{i_1,j_1} δ_{i_2,j_2} · · · δ_{i_k,j_k} = c_{1,i_1} c_{2,i_2} · · · c_{k,i_k}.

It follows that the set of elements of the form (u_{1,i_1} ⊗ u_{2,i_2} ⊗ · · · ⊗ u_{k,i_k}) spans U_1 ⊗ U_2 ⊗ · · · ⊗ U_k because given L ∈ U_1 ⊗ U_2 ⊗ · · · ⊗ U_k, for g_1, g_2, · · · , g_k with each g_i ∈ U_i*, with say g_i = Σ_j c_{i,j} f_{i,j}, we have

    L(g_1, g_2, · · · , g_k) = L( Σ_{i_1} c_{1,i_1} f_{1,i_1}, Σ_{i_2} c_{2,i_2} f_{2,i_2}, · · · , Σ_{i_k} c_{k,i_k} f_{k,i_k} )
      = Σ_{i_1,i_2,···,i_k} c_{1,i_1} c_{2,i_2} · · · c_{k,i_k} L(f_{1,i_1}, f_{2,i_2}, · · · , f_{k,i_k})
      = Σ_{i_1,i_2,···,i_k} L(f_{1,i_1}, f_{2,i_2}, · · · , f_{k,i_k}) (u_{1,i_1} ⊗ u_{2,i_2} ⊗ · · · ⊗ u_{k,i_k})(g_1, g_2, · · · , g_k)

so L = Σ_{i_1,i_2,···,i_k} a_{i_1 i_2 ··· i_k} (u_{1,i_1} ⊗ u_{2,i_2} ⊗ · · · ⊗ u_{k,i_k}) with a_{i_1 i_2 ··· i_k} = L(f_{1,i_1}, f_{2,i_2}, · · · , f_{k,i_k}).


11.8 Example: For finite dimensional vector spaces U and V, there is a natural isomorphism U* ⊗ V ≅ Lin(U, V) obtained by identifying the element f ⊗ v ∈ U* ⊗ V with the linear map f ⊗ v : U → V given by (f ⊗ v)(u) = f(u)v.

11.9 Remark: For finite dimensional vector spaces U_1, U_2, · · · , U_k and V, there is a natural isomorphism between the space of k-linear maps L : U_1 × U_2 × · · · × U_k → V and the space of linear maps M : U_1 ⊗ U_2 ⊗ · · · ⊗ U_k → V. This isomorphism sends the k-linear map L : U_1 × U_2 × · · · × U_k → V to the linear map M : U_1 ⊗ U_2 ⊗ · · · ⊗ U_k → V given by M(u_1 ⊗ u_2 ⊗ · · · ⊗ u_k) = L(u_1, u_2, · · · , u_k) for all u_i ∈ U_i.

11.10 Remark: When some of the vector spaces U_1, U_2, · · · , U_k are infinite dimensional, for vectors u_1, u_2, · · · , u_k with each u_i ∈ U_i, we can still define the k-linear map

    u_1 ⊗ u_2 ⊗ · · · ⊗ u_k : U_1* × U_2* × · · · × U_k* → F

by

    (u_1 ⊗ u_2 ⊗ · · · ⊗ u_k)(g_1, g_2, · · · , g_k) = g_1(u_1) g_2(u_2) · · · g_k(u_k).

When U_i is a basis for U_i for each i, the set of k-linear maps

    S = { (u_1 ⊗ u_2 ⊗ · · · ⊗ u_k) | each u_i ∈ U_i }

is linearly independent (but does not span the vector space of all k-linear maps). In this case we define the tensor product U_1 ⊗ U_2 ⊗ · · · ⊗ U_k to be the span of S.

11.11 Example: We have natural isomorphisms F[x] ⊗ F[x] ≅ F[x] ⊗ F[y] ≅ F[x, y]. The element f(x) ⊗ g(x) ∈ F[x] ⊗ F[x] corresponds to the element f(x) ⊗ g(y) ∈ F[x] ⊗ F[y], which corresponds to the element f(x)g(y) ∈ F[x, y].

11.12 Definition: For k ∈ Z^+ we let S_k denote the set of all permutations of {1, 2, · · · , k}, that is, the set of all bijective maps σ : {1, 2, · · · , k} → {1, 2, · · · , k}. For a permutation σ ∈ S_k we denote the parity of σ by (−1)^σ; in other words, (−1)^σ = det(P_σ) where P_σ is the k × k permutation matrix P_σ = (e_{σ(1)}, e_{σ(2)}, · · · , e_{σ(k)}).

11.13 Definition: Let U and V be vector spaces over a field F. Let L : U^k → V be k-linear. We say that L is symmetric when

    L(u_1, · · · , u_i, · · · , u_j, · · · , u_k) = L(u_1, · · · , u_j, · · · , u_i, · · · , u_k)

for all indices i, j and all vectors u_1, u_2, · · · , u_k ∈ U. Equivalently, L is symmetric when

    L(u_1, u_2, · · · , u_k) = L(u_{σ(1)}, u_{σ(2)}, · · · , u_{σ(k)})

for all vectors u_1, u_2, · · · , u_k ∈ U and for every permutation σ ∈ S_k. We say that L is alternating (or skew-symmetric) when

    L(u_1, · · · , u_i, · · · , u_j, · · · , u_k) = −L(u_1, · · · , u_j, · · · , u_i, · · · , u_k)

for all indices i, j and all vectors u_1, u_2, · · · , u_k ∈ U. Equivalently, L is skew-symmetric when

    L(u_1, u_2, · · · , u_k) = (−1)^σ L(u_{σ(1)}, u_{σ(2)}, · · · , u_{σ(k)})

for all vectors u_1, u_2, · · · , u_k ∈ U and all permutations σ ∈ S_k.


11.14 Definition: Let U be a finite dimensional vector space. We define the space of k-tensors on U, the space of symmetric k-tensors on U, and the space of alternating k-tensors on U to be

    T^k U = ⊗_{i=1}^k U = U ⊗ U ⊗ · · · ⊗ U = { k-linear maps L : (U*)^k → F },
    S^k U = { S ∈ T^k U | S is symmetric },
    Λ^k U = { A ∈ T^k U | A is alternating }.

11.15 Example: We have T^1 U = S^1 U = Λ^1 U = { linear maps L : U* → F } = U**, which we identify with U.

11.16 Definition: Let U be a finite dimensional vector space. For u_1, u_2, · · · , u_k ∈ U, we have already defined the tensor product (u_1 ⊗ u_2 ⊗ · · · ⊗ u_k) ∈ T^k U by

    (u_1 ⊗ u_2 ⊗ · · · ⊗ u_k)(g_1, g_2, · · · , g_k) = g_1(u_1) g_2(u_2) · · · g_k(u_k)

where each g_i ∈ U*. We also define the symmetric product u_1 ⊙ u_2 ⊙ · · · ⊙ u_k ∈ S^k U by

    (u_1 ⊙ u_2 ⊙ · · · ⊙ u_k)(g_1, g_2, · · · , g_k) = Σ_{σ∈S_k} (u_1 ⊗ u_2 ⊗ · · · ⊗ u_k)(g_{σ(1)}, g_{σ(2)}, · · · , g_{σ(k)})
        = Σ_{σ∈S_k} g_{σ(1)}(u_1) g_{σ(2)}(u_2) · · · g_{σ(k)}(u_k),

and we define the wedge product u_1 ∧ u_2 ∧ · · · ∧ u_k ∈ Λ^k U by

    (u_1 ∧ u_2 ∧ · · · ∧ u_k)(g_1, g_2, · · · , g_k) = Σ_{σ∈S_k} (−1)^σ (u_1 ⊗ u_2 ⊗ · · · ⊗ u_k)(g_{σ(1)}, g_{σ(2)}, · · · , g_{σ(k)})
        = Σ_{σ∈S_k} (−1)^σ g_{σ(1)}(u_1) g_{σ(2)}(u_2) · · · g_{σ(k)}(u_k)

        = det [ g_1(u_1)  g_1(u_2)  · · ·  g_1(u_k) ]
              [ g_2(u_1)  g_2(u_2)  · · ·  g_2(u_k) ]
              [    ...       ...    · · ·     ...   ]
              [ g_k(u_1)  g_k(u_2)  · · ·  g_k(u_k) ].
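Since the wedge product evaluates as a determinant, it can be computed directly. A sketch for U = R^n, identifying functionals with row vectors (the helper wedge_eval is our own naming):

    import numpy as np

    def wedge_eval(us, gs):
        # (u_1 ^ ... ^ u_k)(g_1, ..., g_k) = det of the matrix with (i, j) entry g_i(u_j).
        M = np.array([[g @ u for u in us] for g in gs])
        return np.linalg.det(M)

    u1, u2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
    g1, g2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
    assert np.isclose(wedge_eval([u1, u2], [g1, g2]), 1.0)   # (e1 ^ e2)(f1, f2) = 1
    # Swapping two arguments flips the sign, as an alternating map must.
    assert np.isclose(wedge_eval([u2, u1], [g1, g2]), -1.0)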

11.17 Theorem: Let U be a finite dimensional vector space. Let U = {u_1, u_2, · · · , u_n} be a basis for U. Then

(1) { (u_{i_1} ⊗ u_{i_2} ⊗ · · · ⊗ u_{i_k}) | 1 ≤ i_1, i_2, · · · , i_k ≤ n } is a basis for T^k U,
(2) { (u_{i_1} ⊙ u_{i_2} ⊙ · · · ⊙ u_{i_k}) | 1 ≤ i_1 ≤ i_2 ≤ · · · ≤ i_k ≤ n } is a basis for S^k U, and
(3) { (u_{i_1} ∧ u_{i_2} ∧ · · · ∧ u_{i_k}) | 1 ≤ i_1 < i_2 < · · · < i_k ≤ n } is a basis for Λ^k U.

In particular we have dim(T^k U) = n^k, dim(S^k U) = (n+k−1 choose k) and dim(Λ^k U) = (n choose k).
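These dimension formulas are easy to tabulate; a short sketch using Python's math.comb (our own check):

    from math import comb

    def dims(n, k):
        # (dim T^k U, dim S^k U, dim Λ^k U) for dim U = n
        return n**k, comb(n + k - 1, k), comb(n, k)

    assert dims(3, 2) == (9, 6, 3)    # n = 3, k = 2
    assert dims(4, 3) == (64, 20, 4)  # n = 4, k = 3
    assert dims(3, 4) == (81, 15, 0)  # k > n forces Λ^k U = {0}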


Proof: Part (1) follows immediately from Theorem 11.7. We shall prove Part (3) and leave the proof of Part (2) as an exercise. Let F = {f_1, f_2, · · · , f_n} be the dual basis for U*. Note that

    (u_{i_1} ∧ u_{i_2} ∧ · · · ∧ u_{i_k})(f_{j_1}, f_{j_2}, · · · , f_{j_k}) = det [ f_{j_1}(u_{i_1})  f_{j_1}(u_{i_2})  · · ·  f_{j_1}(u_{i_k}) ]
                                                                                  [       ...               ...        · · ·        ...        ]
                                                                                  [ f_{j_k}(u_{i_1})  f_{j_k}(u_{i_2})  · · ·  f_{j_k}(u_{i_k}) ]

        = det [ δ_{i_1,j_1}  δ_{i_2,j_1}  · · ·  δ_{i_k,j_1} ]
              [     ...          ...      · · ·      ...     ]
              [ δ_{i_1,j_k}  δ_{i_2,j_k}  · · ·  δ_{i_k,j_k} ]

        = { 0 if for some l we have i_l ≠ j_m for all m,
            0 if i_l = i_m for some l ≠ m,
            (−1)^σ if i_l = j_{σ(l)} for all l and some σ ∈ S_k. }

In particular, when I = (i_1, i_2, · · · , i_k) and J = (j_1, j_2, · · · , j_k) are increasing (that is, when i_1 < i_2 < · · · < i_k and j_1 < j_2 < · · · < j_k) we have

    (u_{i_1} ∧ u_{i_2} ∧ · · · ∧ u_{i_k})(f_{j_1}, f_{j_2}, · · · , f_{j_k}) = { 1 if I = J ; 0 if I ≠ J }.

It follows that the set

    S = { u_I = (u_{i_1} ∧ u_{i_2} ∧ · · · ∧ u_{i_k}) | I = (i_1, i_2, · · · , i_k) is increasing }

is linearly independent because if Σ_{I incr} a_I u_I = 0 then for all increasing J = (j_1, j_2, · · · , j_k) we have

    0 = ( Σ_{I incr} a_I u_I )(f_{j_1}, f_{j_2}, · · · , f_{j_k}) = a_J.

Given L ∈ Λ^k U, for each increasing I = (i_1, i_2, · · · , i_k), let a_I = L(f_{i_1}, f_{i_2}, · · · , f_{i_k}). Then for g_1, g_2, · · · , g_k ∈ U* with say g_j = Σ_i c_{j,i} f_i, we have

    L(g_1, g_2, · · · , g_k) = L( Σ_{i_1} c_{1,i_1} f_{i_1}, Σ_{i_2} c_{2,i_2} f_{i_2}, · · · , Σ_{i_k} c_{k,i_k} f_{i_k} )
      = Σ_{all I} ( c_{1,i_1} c_{2,i_2} · · · c_{k,i_k} ) L(f_{i_1}, f_{i_2}, · · · , f_{i_k})
      = Σ_{I incr} Σ_{σ∈S_k} ( c_{1,i_{σ(1)}} c_{2,i_{σ(2)}} · · · c_{k,i_{σ(k)}} ) (−1)^σ L(f_{i_1}, f_{i_2}, · · · , f_{i_k})

(here terms with a repeated index vanish because L is alternating, and each remaining I is a permutation of a unique increasing multi-index, which introduces the sign (−1)^σ)

      = Σ_{I incr} a_I Σ_{σ∈S_k} (−1)^σ c_{1,i_{σ(1)}} c_{2,i_{σ(2)}} · · · c_{k,i_{σ(k)}}

      = Σ_{I incr} a_I det [ c_{1,i_1}  c_{1,i_2}  · · ·  c_{1,i_k} ]
                           [    ...        ...     · · ·     ...    ]
                           [ c_{k,i_1}  c_{k,i_2}  · · ·  c_{k,i_k} ]

      = Σ_{I incr} a_I u_I(g_1, g_2, · · · , g_k).

Thus we have L = Σ_{I incr} a_I u_I ∈ Span(S), and so S spans Λ^k U.


11.18 Example: Let U = {u_1, u_2, · · · , u_n} and V = {v_1, v_2, · · · , v_n} be two bases for U. Let α ∈ Λ^k U, say α = Σ_{I incr} a_I u_I = Σ_{J incr} b_J v_J. Determine how the coefficients a_I and b_J are related.

Solution: Let F = {f_1, f_2, · · · , f_n} and G = {g_1, g_2, · · · , g_n} be the bases for U* which are dual to U and V. Let P be the change of basis matrix P = [I]_V^U, so that we have v_j = Σ_i p_{ij} u_i. Note that

    f_i(v_j) = f_i( Σ_k p_{kj} u_k ) = Σ_k p_{kj} f_i(u_k) = Σ_k p_{kj} δ_{ik} = p_{ij}.

We have

    a_I = α(f_{i_1}, f_{i_2}, · · · , f_{i_k}) = Σ_{J incr} b_J v_J(f_{i_1}, f_{i_2}, · · · , f_{i_k})

        = Σ_{J incr} b_J det [ f_{i_1}(v_{j_1})  f_{i_1}(v_{j_2})  · · ·  f_{i_1}(v_{j_k}) ]
                             [       ...               ...         · · ·        ...        ]
                             [ f_{i_k}(v_{j_1})  f_{i_k}(v_{j_2})  · · ·  f_{i_k}(v_{j_k}) ]

        = Σ_{J incr} b_J det [ p_{i_1,j_1}  p_{i_1,j_2}  · · ·  p_{i_1,j_k} ]
                             [     ...          ...      · · ·      ...     ]
                             [ p_{i_k,j_1}  p_{i_k,j_2}  · · ·  p_{i_k,j_k} ],

so each a_I is a sum, over increasing J, of b_J times the k × k minor of P taken from rows I and columns J.

11.19 Definition: Given an n-dimensional vector space U, we define vector spaces

    TU = ⊕_{k=0}^∞ T^k U ,   SU = ⊕_{k=0}^∞ S^k U ,   ΛU = ⊕_{k=0}^n Λ^k U.

The operations ⊗, ⊙ and ∧, which are defined on basis vectors, determine products on each of the above vector spaces. A vector space with a compatible multiplication is called an algebra, so the above three vector spaces, together with their products, are called the tensor algebra, the symmetric algebra, and the exterior algebra.

11.20 Example: If α ∈ Λ^k U and β ∈ Λ^l U then we have α ∧ β ∈ Λ^{k+l} U. Indeed, if U = {u_1, u_2, · · · , u_n} is a basis for U and we have α = Σ_{I incr} a_I u_I and β = Σ_{J incr} b_J u_J, then

    α ∧ β = Σ_{I incr} Σ_{J incr} a_I b_J u_I ∧ u_J

where

    u_I ∧ u_J = (u_{i_1} ∧ · · · ∧ u_{i_k}) ∧ (u_{j_1} ∧ · · · ∧ u_{j_l}) = u_{i_1} ∧ · · · ∧ u_{i_k} ∧ u_{j_1} ∧ · · · ∧ u_{j_l}.


12. Jordan Canonical Form

12.1 Definition: Let F be a field. For m ∈ Z^+ and λ ∈ F, we define the m × m Jordan block for the eigenvalue λ to be the m × m matrix

    J^m_λ = [ λ  1              ]
            [    λ  1           ]
            [       ⋱  ⋱        ]
            [          λ  1     ]
            [             λ     ].

For n ∈ Z^+, a matrix A ∈ M_{n×n}(F) is said to be in Jordan form when it is in the block-diagonal form

    A = [ J^{m_1}_{λ_1}                                  ]
        [                J^{m_2}_{λ_2}                   ]
        [                               ⋱                ]
        [                                  J^{m_l}_{λ_l} ]

for some l, m_i ∈ Z^+ and λ_i ∈ F.
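Jordan blocks and block-diagonal Jordan matrices are easy to construct for experimentation. A small sketch using NumPy together with SciPy's block_diag (the helper jordan_block is our own naming):

    import numpy as np
    from scipy.linalg import block_diag

    def jordan_block(lam, m):
        # m x m block: lam on the diagonal, 1 on the superdiagonal
        return lam * np.eye(m) + np.diag(np.ones(m - 1), k=1)

    # A 5 x 5 matrix in Jordan form with blocks J^2_3 and J^3_0
    A = block_diag(jordan_block(3.0, 2), jordan_block(0.0, 3))
    print(A)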

12.2 Note: Our goal in this chapter is to prove that for every linear map L : U → U on a finite dimensional vector space U over a field F, if f_L(x) splits then there exists an ordered basis A for U such that the matrix [L]_A is in Jordan form, and that this Jordan form matrix is unique up to the order of the Jordan blocks.

Recall that when A = {u_1, u_2, · · · , u_n} is an ordered basis for U, the matrix [L]_A is given by the formula

    [L]_A = ( [Lu_1]_A, [Lu_2]_A, · · · , [Lu_n]_A ) ∈ M_n(F).

It follows immediately from this formula that when

    A = { u_{11}, u_{12}, · · · , u_{1m_1}, u_{21}, u_{22}, · · · , u_{2m_2}, · · · , u_{l1}, u_{l2}, · · · , u_{lm_l} }

is an ordered basis for U, the matrix [L]_A is of the required Jordan form with blocks J^{m_i}_{λ_i} precisely when for each index i with 1 ≤ i ≤ l, we have

    Lu_{i1} = λ_i u_{i1} ,  Lu_{i2} = u_{i1} + λ_i u_{i2} ,  Lu_{i3} = u_{i2} + λ_i u_{i3} ,  · · · ,  Lu_{im_i} = u_{i,m_i−1} + λ_i u_{im_i}.

We can also write the above equations as

    (L − λ_i I)u_{i1} = 0 ,  (L − λ_i I)u_{i2} = u_{i1} ,  (L − λ_i I)u_{i3} = u_{i2} ,  · · · ,  (L − λ_i I)u_{im_i} = u_{i,m_i−1}.

Notice that when these equations hold, we have

    0 = (L − λ_i I)u_{i,1} = (L − λ_i I)^2 u_{i,2} = (L − λ_i I)^3 u_{i,3} = · · · = (L − λ_i I)^{m_i} u_{i,m_i}.

These considerations lead us to make the following definitions.


12.3 Definition: Let L : U → U where U is a finite dimensional vector space over a field F, and let λ ∈ Spec(L). The generalized eigenspace of L for λ is the vector space

    K_λ = K_λ(L) = { u ∈ U | (L − λI)^k u = 0 for some k ∈ Z^+ }.

A cycle of generalized eigenvectors for λ is an ordered m-tuple (u_1, u_2, · · · , u_m) with each u_i ∈ U such that

    (L − λI)u_1 = 0 ,  (L − λI)u_2 = u_1 ,  (L − λI)u_3 = u_2 ,  · · · ,  (L − λI)u_m = u_{m−1}.

Note that for each index k we have (L − λI)^k u_k = 0, so that u_k ∈ K_λ.
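Since K_λ = Null(L − λI)^d with d = dim U (as the proof of Theorem 12.7 below makes precise), a generalized eigenspace can be computed as a single null space. A sketch in exact arithmetic with SymPy (our own illustration):

    import sympy as sp

    A = sp.Matrix([[2, 1, 0],
                   [0, 2, 0],
                   [0, 0, 3]])
    lam, d = 2, A.shape[0]

    # K_lam = Null (A - lam I)^d; here the eigenvalue 2 has a 2-dimensional
    # generalized eigenspace even though its ordinary eigenspace is 1-dimensional.
    K = ((A - lam * sp.eye(d)) ** d).nullspace()
    E = (A - lam * sp.eye(d)).nullspace()
    assert len(K) == 2 and len(E) == 1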

12.4 Note: The discussion in Note 12.2 shows that for an ordered basis A for U, the matrix [L]_A is in Jordan form if and only if A is an ordered union of cycles of generalized eigenvectors (for various eigenvalues).

12.5 Definition: Let L : U → U be a linear map on a vector space U over a field F. Let V ⊆ U be a subspace. We say that V is an invariant subspace of U under L when L(V) ⊆ V. Note that when V is invariant under L, the restriction of L to V gives a linear map L : V → V.

12.6 Theorem: Let L : U → U be a linear map on a finite dimensional vector space U over a field F. Suppose that f_L(x) splits. Let λ ∈ Spec(L) and let µ ∈ F. Then

(1) If M : U → U is linear and commutes with L, then K_λ is invariant under M. In particular, K_λ is invariant under L and under L − µI.
(2) When µ ≠ λ, the map (L − λI) : K_µ → K_µ is an isomorphism.
(3) We have dim(K_λ) ≤ m_λ, where m_λ is the algebraic multiplicity of λ.

Proof: Let M : U → U be any linear map which commutes with L and let u ∈ K_λ, say (L − λI)^k u = 0. Since M commutes with L, M also commutes with L − λI, and so we have

    (L − λI)^k (Mu) = M (L − λI)^k u = M(0) = 0

and so M(u) ∈ K_λ. This proves Part (1).

To prove Part (2), suppose that µ ≠ λ and suppose, for a contradiction, that the map (L − λI) : K_µ → K_µ is not an isomorphism. Then it is not injective, so we can choose 0 ≠ u ∈ K_µ such that (L − λI)u = 0. Choose k ∈ N so that (L − µI)^k u ≠ 0 but (L − µI)^{k+1} u = 0, and let v = (L − µI)^k u. Note that v ≠ 0 and v ∈ E_µ. Since (L − λI) commutes with (L − µI) and (L − λI)u = 0, we have

    (L − λI)v = (L − λI)(L − µI)^k u = (L − µI)^k (L − λI)u = (L − µI)^k (0) = 0

so that v ∈ E_λ. Thus v ≠ 0 and v ∈ E_λ ∩ E_µ. But this is not possible, since µ ≠ λ gives E_λ ∩ E_µ = {0}.

By Part (1), K_λ is an invariant subspace, so we can let M : K_λ → K_λ be the restriction of L. Note that λ ∈ Spec(M) and, since f_M(x) divides f_L(x) (as we can see by choosing any basis for K_λ, extending it to a basis for U, and considering the associated matrices for L and M), it follows that f_M(x) splits and Spec(M) ⊆ Spec(L). By Part (2), when µ ≠ λ the map (L − µI) : K_λ → K_λ is an isomorphism, so that µ is not an eigenvalue of M, and so we have Spec(M) = {λ}. Thus f_M(x) = ±(x − λ)^d where d = dim(K_λ), and since f_M(x) divides f_L(x) we get d ≤ m_λ. This proves Part (3).


12.7 Theorem: Let L : U → U be a linear map on a finite dimensional vector space U over a field F. Suppose that f_L(x) splits. Then

    U = ⊕_{λ∈Spec(L)} K_λ(L).

Proof: We argue by induction on the number of distinct eigenvalues of L. When L has only one eigenvalue λ, by Schur's Theorem (or the Cayley-Hamilton Theorem) we have (L − λI)^d = 0 where d = dim U, so U = Null(L − λI)^d = K_λ(L).

Suppose that L has at least 2 distinct eigenvalues and suppose the theorem holds for any linear map M with fewer distinct eigenvalues than L. Let λ ∈ Spec(L). Since U is finite dimensional, we can choose p ∈ Z with 1 ≤ p ≤ d = dim U such that

    U = Range(L − λI)^0 ⊋ Range(L − λI)^1 ⊋ · · · ⊋ Range(L − λI)^p = Range(L − λI)^{p+1}.

Note that since we have Range(L − λI)^p = Range(L − λI)^{p+1}, it follows that the map (L − λI) : Range(L − λI)^p → Range(L − λI)^{p+1} is surjective, hence an isomorphism, and so we have Range(L − λI)^p = Range(L − λI)^k for all k ≥ p. It follows (by the rank-nullity theorem) that we also have

    {0} = Null(L − λI)^0 ⊊ Null(L − λI)^1 ⊊ · · · ⊊ Null(L − λI)^p = Null(L − λI)^k

for all k ≥ p, so K_λ(L) = { u ∈ U | (L − λI)^k u = 0 for some k ∈ Z^+ } = Null(L − λI)^p. Let

    V = Range(L − λI)^p.

Since L − λI : V → V, we also have L : V → V. Let M : V → V be the restriction of L to V. Since f_M(x) divides f_L(x), it follows that f_M(x) splits and that Spec(M) ⊆ Spec(L). Since M − λI = L − λI : V → V is an isomorphism, λ is not an eigenvalue of M, so we have Spec(M) ⊆ Spec(L) \ {λ}. By the induction hypothesis, we have

    V = ⊕_{ν∈Spec(M)} K_ν(M).

Let µ ∈ Spec(L) with µ ≠ λ. We have

    u ∈ K_µ(M) ⟺ u ∈ V and (M − µI)^k u = 0 for some k ∈ Z^+
               ⟺ u ∈ V and (L − µI)^k u = 0 for some k ∈ Z^+
               ⟺ u ∈ V and u ∈ K_µ(L),

so K_µ(M) = K_µ(L) ∩ V. Since (L − λI) : K_µ(L) → K_µ(L) is an isomorphism (by Part (2) of Theorem 12.6), it follows that K_µ(L) ⊆ Range(L − λI)^p = V, and so K_µ(M) = K_µ(L) ∩ V = K_µ(L). Thus we have

    V = ⊕_{µ∈Spec(M)} K_µ(M) = ⊕_{µ∈Spec(L)\{λ}} K_µ(L).

To prove U = ⊕_{λ∈Spec(L)} K_λ(L), we show that U = Σ_{λ∈Spec(L)} K_λ(L) and Σ_{λ∈Spec(L)} dim(K_λ(L)) = d where d = dim U. Given x ∈ U, let y = (L − λI)^p x ∈ V. For each µ ∈ Spec(L) \ {λ}, choose v_µ ∈ K_µ(L) so that y = Σ v_µ. Since (L − λI), hence also (L − λI)^p, is an automorphism of K_µ(L), we can choose u_µ ∈ K_µ(L) so that (L − λI)^p u_µ = v_µ. Then we have (L − λI)^p (x − Σ u_µ) = y − Σ v_µ = 0, so we can take u_λ = x − Σ_{µ≠λ} u_µ ∈ K_λ(L) to get x = u_λ + Σ u_µ. This proves that U = Σ_{λ∈Spec(L)} K_λ(L). Finally, we note that since dim(K_λ(L)) ≤ m_λ we have Σ_{λ∈Spec(L)} dim(K_λ(L)) ≤ Σ_{λ∈Spec(L)} m_λ = dim U; combined with U = Σ_{λ∈Spec(L)} K_λ(L) this forces equality, and so the sum is direct.


12.8 Theorem: Let L : U → U be a linear map on a finite dimensional vector space U over a field F and let λ ∈ Spec(L). Then K_λ has an ordered basis which is an ordered union of cycles of generalized eigenvectors.

Proof: Consider the restriction of L − λI to K_λ. Choose m ∈ Z^+ so that

    K_λ = Range(L − λI)^0 ⊋ Range(L − λI)^1 ⊋ · · · ⊋ Range(L − λI)^m = Range(L − λI)^{m+1}

(here and below, the powers of L − λI are restricted to K_λ). Note that Range(L − λI)^m = {0} since K_λ = Null(L − λI)^m, and Range(L − λI)^{m−1} ⊆ E_λ because if u ∈ Range(L − λI)^{m−1} then (L − λI)u ∈ Range(L − λI)^m = {0}.

We describe a recursive procedure for constructing an ordered union of cycles which is an ordered basis for K_λ, in which, at the kth step, we obtain a basis for Range(L − λI)^{m−k}. We begin with the empty set, which is a basis for {0} = Range(L − λI)^m. At the 1st step we choose a basis {u_{1,1}, u_{2,1}, · · · , u_{r,1}} for Range(L − λI)^{m−1} ⊆ E_λ. Suppose, inductively, that after the kth step we have constructed a basis

    A = { u_{1,1}, u_{1,2}, · · · , u_{1,ℓ_1}, u_{2,1}, · · · , u_{2,ℓ_2}, · · · , u_{r,1}, · · · , u_{r,ℓ_r} }

for Range(L − λI)^{m−k} ⊆ K_λ, where {u_{1,1}, · · · , u_{r,1}} is a basis for Range(L − λI)^{m−k} ∩ E_λ and u_{i,j} = (L − λI)u_{i,j+1} for all 1 ≤ i ≤ r and 1 ≤ j < ℓ_i. At the (k+1)st step, for each index i with 1 ≤ i ≤ r we choose u_{i,ℓ_i+1} ∈ K_λ such that (L − λI)u_{i,ℓ_i+1} = u_{i,ℓ_i} and, in addition, we extend the basis {u_{1,1}, · · · , u_{r,1}} for Range(L − λI)^{m−k} ∩ E_λ to obtain a basis {u_{1,1}, · · · , u_{r,1}, u_{r+1,1}, · · · , u_{s,1}} for Range(L − λI)^{m−k−1} ∩ E_λ. We then extend the ordered basis A to the ordered set

    B = { u_{1,1}, · · · , u_{1,ℓ_1}, u_{1,ℓ_1+1}, u_{2,1}, · · · , u_{2,ℓ_2+1}, · · · , u_{r,1}, · · · , u_{r,ℓ_r+1}, u_{r+1,1}, · · · , u_{s,1} }.

It remains to prove that B is a basis for Range(L − λI)^{m−k−1}.

Consider the map M = (L − λI) : Range(L − λI)^{m−k−1} → Range(L − λI)^{m−k}. This map is surjective, so that rank(M) = dim(Range(L − λI)^{m−k}) = |A|, and we have Null(M) = Range(L − λI)^{m−k−1} ∩ E_λ, so that nullity(M) = s, and it follows that

    dim(Range(L − λI)^{m−k−1}) = rank(M) + nullity(M) = |A| + s.

Now consider the map N = (L − λI) : Span(B) ⊆ Range(L − λI)^{m−k−1} → Range(L − λI)^{m−k}. Since (L − λI)(u_{i,j+1}) = u_{i,j}, it follows that N is surjective, so that rank(N) = |A|, and since (L − λI)(u_{i,1}) = 0 it follows that Range(L − λI)^{m−k−1} ∩ E_λ ⊆ Null(N), so that nullity(N) ≥ s. Since B is obtained from A by adding exactly s elements, it follows that

    |A| + s = |B| ≥ dim(Span(B)) = rank(N) + nullity(N) ≥ |A| + s

so that |B| = dim(Span(B)) = dim(Range(L − λI)^{m−k−1}). Thus B is a basis for Range(L − λI)^{m−k−1}, as required.

12.9 Theorem: (Jordan Form) Let L : U → U be a linear map on a finite dimensional vector space U over a field F. Suppose that f_L(x) splits over F. Then there exists an ordered basis A for U such that

    [L]_A = [ J^{m_1}_{λ_1}                                  ]
            [                J^{m_2}_{λ_2}                   ]
            [                               ⋱                ]
            [                                  J^{m_l}_{λ_l} ]

for some l, m_i ∈ Z^+ and some λ_i ∈ F, with the Jordan blocks J^{m_i}_{λ_i} uniquely determined (up to order).


Proof: The existence of the ordered basis which puts the map L into Jordan form follows from the previous two theorems. It remains to show that the Jordan blocks are uniquely determined (up to order). Note that for m ∈ Z^+ and λ ∈ F, the matrix

    J^m_λ − λI = [ 0  1              ]
                 [    0  1           ]
                 [       ⋱  ⋱        ]
                 [          0  1     ]
                 [             0     ]

has 1s on the first superdiagonal and 0s elsewhere, and more generally (J^m_λ − λI)^k has 1s on the kth superdiagonal and 0s elsewhere; in particular (J^m_λ − λI)^{m−1} has a single 1 in its top right corner, and (J^m_λ − λI)^m = 0. It follows that for 0 ≤ k ≤ m we have

    rank (J^m_λ − λI)^k = m − k.

Also notice that for µ ∈ F with µ ≠ λ, the matrix

    J^m_λ − µI = [ λ−µ  1                  ]
                 [      λ−µ  1             ]
                 [           ⋱   ⋱         ]
                 [               λ−µ  1    ]
                 [                    λ−µ  ]

is invertible (it is upper triangular with nonzero diagonal entries), and so

    rank (J^m_λ − µI)^k = m for all k ≥ 0.

Now suppose that there exists a basis A as stated in the theorem, so that A = [L]_A is in Jordan form with Jordan blocks J^{m_i}_{λ_i}. The eigenvalues of L are the same as the eigenvalues of A, namely λ_1, · · · , λ_l. Fix λ ∈ Spec(L). For indices i such that λ_i ≠ λ, we have rank (J^{m_i}_{λ_i} − λI)^k = m_i. For indices i with λ_i = λ and m_i ≥ k, we have rank (J^{m_i}_{λ_i} − λI)^k = m_i − k. For indices i such that λ_i = λ and m_i < k, we have rank (J^{m_i}_{λ_i} − λI)^k = 0. Let a_k be the number of indices i such that λ_i = λ and m_i = k, and let b_k be the number of indices i such that λ_i = λ and m_i ≥ k. Then since [(L − λI)^k]_A = (A − λI)^k, which is the block diagonal matrix with blocks (J^{m_i}_{λ_i} − λI)^k, we see that

    rank (L − λI)^k = n − ( 1·a_1 + 2·a_2 + 3·a_3 + · · · + (k−1)·a_{k−1} + k·b_k ) = n − ( b_1 + b_2 + b_3 + · · · + b_k )

and so we have

    b_k = rank (L − λI)^{k−1} − rank (L − λI)^k

and hence

    a_k = b_k − b_{k+1} = rank (L − λI)^{k−1} − 2 rank (L − λI)^k + rank (L − λI)^{k+1}.

This formula shows that the blocks J^{m_i}_{λ_i} are uniquely determined (up to order).
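The last formula is effectively an algorithm: from the ranks of the powers of A − λI one can read off how many Jordan blocks of each size occur for the eigenvalue λ. A sketch in exact arithmetic with SymPy (the helper jordan_block_sizes is our own naming):

    import sympy as sp

    def jordan_block_sizes(A, lam):
        # a_k = rank(A - lam I)^(k-1) - 2 rank(A - lam I)^k + rank(A - lam I)^(k+1)
        # counts the k x k Jordan blocks for the eigenvalue lam.
        n = A.shape[0]
        B = A - lam * sp.eye(n)
        rank = [(B**k).rank() for k in range(n + 2)]  # rank[0] = n
        return {k: rank[k-1] - 2*rank[k] + rank[k+1]
                for k in range(1, n + 1)
                if rank[k-1] - 2*rank[k] + rank[k+1] > 0}

    # One 2 x 2 block and one 1 x 1 block for the eigenvalue 5:
    A = sp.Matrix([[5, 1, 0], [0, 5, 0], [0, 0, 5]])
    assert jordan_block_sizes(A, 5) == {2: 1, 1: 1}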
