
Linear Algebra Short Course, Lecture 4
Matthew J. Holland (matthew-h@is.naist.jp)
Mathematical Informatics Lab, Graduate School of Information Science, NAIST



Some useful references

- Finite-dimensional inner-product spaces, normal operators: Axler (1997, Ch. 6-7)
- Projection theorem on infinite-dimensional Hilbert spaces: Luenberger (1968, Ch. 3)
- Unitary matrices: Horn and Johnson (1985, Ch. 2)


Lecture contents

1. Inner products: motivations, terms, and basic properties
2. Projections, orthogonal complements, and related problems
3. Linear functionals and the adjoint
4. Normal operators and the spectral theorem
5. Positive operators and isometries
6. Some famous decompositions


Key idea: generalizing geometric notions

Our progress thus far:

- Built a framework for sets with a linearity property
- Built a framework for functions with a linearity property
- Looked at some deeper results based on this framework

Note our framework was very general (operations on linear spaces of functions, etc.).

Can we add length and angle to our general framework?

Yes, and the key notion is that of an "inner product" between vectors.


Geometric motivations from vector analysis on R^3 (1)

Typically we define a "projection" by its length, to start. That is, if proj(x; y) ∈ R^3 denotes the projection of x onto the direction of y, we require that proj(x; y) satisfy

‖proj(x; y)‖ = ‖x‖ |cos(∠xy)|,

which is natural considering the right triangle with hypotenuse of length ‖x‖.


Geometric motivations from vector analysis on R^3 (2)

To define the actual projection, just scale y. That is,

proj(x; y) := (‖x‖ cos(∠xy) / ‖y‖) y.

This naturally depends on "what goes where" (i.e., it is asymmetric in its arguments).

A convenient quantity for examining the direction of a vector pair x, y ∈ R^3 is

x · y := ‖x‖ ‖y‖ cos(∠xy).

Clearly x · y = y · x, and

x ⊥ y ⇐⇒ x · y = 0
∠xy acute ⇐⇒ x · y > 0
∠xy obtuse ⇐⇒ x · y < 0.


Geometric motivations from vector analysis on R^3 (3)

It is easy to geometrically motivate the scalar product being linear in both arguments, namely (x + z) · y = (x · y) + (z · y).

With this, and perpendicular unit coordinate vectors e1, e2, e3 ∈ R^3, note

x · y = (x1 e1 + x2 e2 + x3 e3) · (y1 e1 + y2 e2 + y3 e3)
      = x1 y1 + x2 y2 + x3 y3,

where xi := ‖x‖ cos(∠x ei), and similarly for the yj.


The inner product as a generalized scalar product (1)

Enough geometry; now we do algebra. The scalar product x · y captures both length and angle. Let's generalize.

For x, y ∈ R^n, it is natural to extend the length and angle quantifiers via

‖x‖ := √(x1² + · · · + xn²)
x · y := x1 y1 + · · · + xn yn.

What about the complex case? For u = a + ib ∈ C, just like R², i.e.,

‖u‖ := √(a² + b²) = (u ū)^(1/2) = √(|u|²).

Extending this length to u = (u1, . . . , un) ∈ C^n, we naturally try

‖u‖ := √(|u1|² + · · · + |un|²).

As ‖u‖² = u1 ū1 + · · · + un ūn, intuitively we'd like to consider defining

u · v := u1 v̄1 + · · · + un v̄n.
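As a quick numerical sanity check (not part of the slides), this extended scalar product on C^n is easy to verify with NumPy; the vectors below are arbitrary illustrative choices.

```python
import numpy as np

# Arbitrary illustrative vectors in C^3.
u = np.array([1 + 2j, -1j, 3 + 0j])
v = np.array([2 - 1j, 1 + 1j, 4j])

def ip(u, v):
    # u . v := u1*conj(v1) + ... + un*conj(vn)
    return np.sum(u * np.conj(v))

# Conjugate symmetry: v . u = conj(u . v)
assert np.isclose(ip(v, u), np.conj(ip(u, v)))

# The induced length matches sqrt(|u1|^2 + ... + |un|^2)
assert np.isclose(np.sqrt(ip(u, u).real), np.linalg.norm(u))
```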


The inner product as a generalized scalar product (2)

(*) Note that the following properties of our extended scalar products on both R^n and C^n hold:

- conjugate symmetry: v · u = conj(u · v)
- definiteness: u · u = 0 ⇐⇒ u = 0
- linearity in the first argument: (αu + βu′) · v = α(u · v) + β(u′ · v)

Recall that these properties are shared by the original scalar product.

In linear algebra, we start with the inner product properties as axioms.

We abandon the clunky "dot" for the more standard 〈·, ·〉 notation henceforth.


Inner product

Consider a vector space V on field F = R or C.

Defn. Call 〈·, ·〉 : V × V → F an inner product on V if ∀ u, v, w ∈ V, α ∈ F:

IP.1 〈u, v〉 = conj(〈v, u〉)
IP.2 〈αu, v〉 = α〈u, v〉
IP.3 〈u + w, v〉 = 〈u, v〉 + 〈w, v〉
IP.4 〈u, u〉 ≥ 0, and 〈u, u〉 = 0 ⇐⇒ u = 0.

(*) Additivity actually holds in both arguments. Also, 〈u, αv〉 = conj(α)〈u, v〉, and 〈0, v〉 = 〈v, 0〉 = 0 for any v ∈ V.

Defn. Call (V, 〈·, ·〉) an inner product space. A complete IP space is called a Hilbert space.


Inner product examples

(*) Note R^n and C^n with 〈·, ·〉 defined by our generalized dot product are IP spaces.

(*) Note Pm(R) equipped with

〈p, q〉 := ∫₀¹ p(x) q(x) dx

is a valid inner product space.

(**) Recalling the space of real sequences

ℓp := { (x1, x2, . . .) ∈ R^∞ : Σ_{i=1}^∞ |xi|^p < ∞ },

if we use the Hölder inequality to show finiteness, we can show that the natural IP

〈x, y〉 := Σ_{i=1}^∞ xi yi

makes ℓ2 an inner product space.


Inner product properties (1)

Defn. On IP space V, call ‖u‖ := √〈u, u〉 the norm on V.

Let's verify this naming is valid (considering the general norm definition).

(*) [Cauchy-Schwarz] On IP space V,

|〈u, v〉| ≤ ‖u‖ ‖v‖, ∀ u, v ∈ V.

Expand 0 ≤ 〈u − αv, u − αv〉 and cleverly pick α.

(*) We just need the triangle inequality. Expand ‖u + v‖² and use C-S to verify

‖u + v‖ ≤ ‖u‖ + ‖v‖, ∀ u, v ∈ V.

Be sure to check the other axioms to conclude that inner products induce valid norms.
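Both inequalities are easy to spot-check numerically; a minimal sketch with NumPy (the random vectors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(5)
v = rng.standard_normal(5)

# Cauchy-Schwarz: |<u, v>| <= ||u|| ||v||
assert abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v) + 1e-12

# Triangle inequality: ||u + v|| <= ||u|| + ||v||
assert np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v) + 1e-12
```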


Inner product properties (2)

(*) The generalized "Parallelogram Law" clearly follows:

‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².

Defn. If 〈u, v〉 = 0, we say u and v are orthogonal, often denoted u ⊥ v. For any W ⊂ V, say u ⊥ W iff u ⊥ w, ∀ w ∈ W.

(*) Clearly the Pythagorean theorem extends nicely:

u ⊥ v =⇒ ‖u + v‖² = ‖u‖² + ‖v‖².
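Both identities can be spot-checked numerically; in this sketch, the vectors are chosen so that x ⊥ y.

```python
import numpy as np

n = np.linalg.norm
x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 1.0, -2.0])   # <x, y> = 2 + 2 - 4 = 0, so x is orthogonal to y

# Parallelogram law (holds for any pair)
assert np.isclose(n(x + y)**2 + n(x - y)**2, 2*n(x)**2 + 2*n(y)**2)

# Pythagorean theorem (needs orthogonality)
assert np.isclose(x @ y, 0.0)
assert np.isclose(n(x + y)**2, n(x)**2 + n(y)**2)
```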

(*) If u ⊥ v, ∀ v ∈ V , then u = 0.

(*) A superb fact: the IP is continuous. That is, if sequences (un), (vn) in V converge, un → u and vn → v, then

〈un, vn〉 → 〈u, v〉.


Considering the notion of a projection again

We previously considered proj(u; v) only geometrically. Now it is quite easy. Intuitively, we seek αv ∈ [{v}] such that

u = αv + w, where w ⊥ v.

(*) Check that the scalar then must be α = 〈u, v〉/‖v‖², a nostalgic form indeed.

This "orthogonal projection" (formalized shortly) will play an important role moving ahead.
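A minimal NumPy sketch of this one-dimensional projection; the vectors are arbitrary illustrative choices.

```python
import numpy as np

def proj(u, v):
    # Orthogonal projection of u onto the line spanned by v (v != 0):
    # alpha = <u, v> / ||v||^2, and the projection is alpha * v.
    return (u @ v) / (v @ v) * v

u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])
p = proj(u, v)

assert np.allclose(p, [3.0, 0.0])
assert np.isclose((u - p) @ v, 0.0)   # the residual w = u - p satisfies w ⊥ v
```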


An optimization problem

Consider the following problem. Let V be an IP space, and X ⊂ V a subspace. Fix u0 ∈ V, and

find x̂ ∈ X which minimizes ‖x − u0‖ over x ∈ X.

Natural questions: Does a solution exist? Is it unique? What is it?

The answers to these questions are given by the "Projection Theorem", a truly classic result.

Note: no requirement that dim V < ∞ thus far.


The Projection Theorem

(*) Say x̂ ∈ X is such that ‖x̂ − u0‖ ≤ ‖x − u0‖ for all x ∈ X. Then x̂ (the minimizing vector in X) is unique.

(*) Element x̂ (uniquely) minimizes ‖x − u0‖ ⇐⇒ x̂ − u0 ⊥ X.

We have not shown that such an element need exist; to do this we need slightly stronger conditions:

(**) Let V be a Hilbert space, and X ⊂ V a closed subspace. Then, for any u0 ∈ V,

∃ x̂ ∈ X, ‖x̂ − u0‖ ≤ ‖x − u0‖, ∀ x ∈ X.

This result is typically called the classical Projection Theorem. Let's develop these ideas further.


Orthogonal complements (1)

Let V be an IP space.

Defn. Take any subset U ⊂ V, and denote by

U⊥ := {v ∈ V : u ⊥ v, ∀ u ∈ U},

called the orthogonal complement of U.

(*) Note {0}⊥ = V and V⊥ = {0}. Also, for any U, we have that U⊥ ⊂ V is a closed subspace.

(*) Some additional properties (still allowing dim V = ∞):

- U ⊂ U⊥⊥
- U ⊂ W =⇒ W⊥ ⊂ U⊥
- U⊥⊥⊥ = U⊥
- [U] ⊂ U⊥⊥ (in a Hilbert space, U⊥⊥ is the closure of [U])


Orthogonal complements (2)

The use of the term "complement" will now be justified.

(*) Let V be a Hilbert space and X ⊂ V a closed subspace. Then,

V = X ⊕ X⊥, and X⊥⊥ = X.

This result may be proved using the Projection Theorem.

Thus, orthogonal complements furnish a nice direct sum decomposition. Uniquely, we have v = x + x′ with x ∈ X, x′ ∈ X⊥.

(*) If we specialize to dim V < ∞, everything simplifies further:

- In this case, V an IP space =⇒ V is Hilbert (recall Lec 1).
- Thus, the above result and the Projection Theorem hold for any subspace.
- Similarly, for subspace U ⊂ V we have U⊥⊥ = U.
- Naturally, dim U⊥ = dim V − dim U.


Orthogonal projection (1)

With these terms down, we provide a general projection notion.

Defn. Let X ⊂ V be a closed subspace, and take u ∈ V. Uniquely, we have

u = x + x′

where x ∈ X, x′ ∈ X⊥. Define the orthogonal projection of u onto X by proj(u; X) := x = u − x′.

(*) This pops up naturally in the Projection Theorem, since

x̂ ∈ X minimizes ‖x − u0‖ ⇐⇒ x̂ − u0 ⊥ X,

and as u0 = (u0 − x̂) + x̂ with u0 − x̂ ∈ X⊥, we have x̂ = proj(u0; X).


Orthogonal projection (2)

(*) Projecting some x ∈ V in the direction of y ∈ V is tantamount to acquiring proj(x; [{y}]). It takes a familiar form. Decompose

x = αy + w, w ⊥ y. Thus, proj(x; [{y}]) = (〈x, y〉/‖y‖²) y,

as we would hope.


Properties of orthogonal projection

Let U ⊂ V be a subspace of V. Assume dim V < ∞. Denote PU(v) := proj(v; U) here.

(*) The following may readily be checked:

- PU ∈ L(V), a linear operator.
- range PU = U, null PU = U⊥.
- PU² = PU (idempotent map).
- ‖PU(v)‖ ≤ ‖v‖, ∀ v ∈ V (a contraction).

(*) Interestingly, the latter two properties characterize the orthogonal projections. That is, taking some S ∈ L(V),

S² = S and ‖S(v)‖ ≤ ‖v‖, ∀ v ∈ V =⇒ S = PU for some subspace U.
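These properties are easy to verify numerically for a concrete projection; here P = QQᵀ projects onto the column space of a matrix Q with orthonormal columns (the subspace is a random illustrative choice).

```python
import numpy as np

# Projection onto U = column space of an orthonormal Q: P_U = Q Q^T.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 2)))
P = Q @ Q.T

v = rng.standard_normal(4)
assert np.allclose(P @ P, P)                                # idempotent
assert np.linalg.norm(P @ v) <= np.linalg.norm(v) + 1e-12   # contraction
assert np.allclose(P.T, P)                                  # also self-adjoint
```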


Orthogonal sets 1

Let S ⊂ V be a subset of IP space V .

Defn. We call S an orthogonal set if for any distinct u, v ∈ S, we have u ⊥ v. We call S orthonormal if it is orthogonal and each u ∈ S has ‖u‖ = 1.

In the following useful way, orthogonality connects with the more fundamental notion of independence seen earlier:

(*) If S ⊂ V is orthogonal and 0 ∉ S, it is (linearly) independent.


Orthogonal sets (2)

Conversely, given independent sets, we can always "orthonormalize", in the following sense.

(*) Given an independent sequence v1, v2, . . ., there exists an orthonormal sequence e1, e2, . . . such that

[{v1, . . . , vn}] = [{e1, . . . , en}], for any n > 0.

Proving this is straightforward, and can be done constructively. Initialize e1 := v1/‖v1‖. The rest are induced by

wn = vn − Σ_{i=1}^{n−1} 〈vn, ei〉 ei,    en = wn/‖wn‖.

This is often called the "Gram-Schmidt procedure".

(*) Thus every finite-dimensional inner product space has an orthonormal basis.
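The constructive procedure translates directly into code; a minimal sketch of classical Gram-Schmidt, assuming the input list is independent.

```python
import numpy as np

def gram_schmidt(vs):
    """Classical Gram-Schmidt on an independent list of vectors."""
    es = []
    for v in vs:
        w = v.astype(float)
        for e in es:
            w = w - (v @ e) * e   # subtract the projection of v onto each earlier e_i
        es.append(w / np.linalg.norm(w))
    return es

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(vs)

E = np.column_stack(es)
assert np.allclose(E.T @ E, np.eye(3))   # the output is orthonormal
```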


Orthogonal bases

Orthonormal sets play an important role as convenient bases.

Let V have dim V = n, with orthonormal basis {e1, . . . , en}. Then for any v ∈ V, we have

v = 〈v, e1〉e1 + · · · + 〈v, en〉en
‖v‖² = |〈v, e1〉|² + · · · + |〈v, en〉|².

A classical result is nice to check here.

(*) (Schur's theorem, 1909). For any finite-dim V on C and T ∈ L(V), there exists a basis B such that

M(T; B) is upper-triangular and B is orthonormal.

Show this with our "portmanteau theorem" for upper-triangular representations (Lec 3), and the previous result via Gram-Schmidt.


Optimization example 1

First example: find the optimal approximation of sin(x) on [−π, π] by a 5th-degree polynomial.
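One hedged way to sketch this example numerically: approximate the L² projection by a least-squares fit on a dense grid. The grid size and the error bound asserted below are illustrative choices, not part of the lecture.

```python
import numpy as np

# Discretized stand-in for the L^2([-pi, pi]) projection: a least-squares fit
# of sin(x) on a dense grid by a degree-5 polynomial.
x = np.linspace(-np.pi, np.pi, 2001)
coeffs = np.polynomial.polynomial.polyfit(x, np.sin(x), 5)
approx = np.polynomial.polynomial.polyval(x, coeffs)

max_err = np.max(np.abs(approx - np.sin(x)))
taylor_err = np.max(np.abs((x - x**3/6 + x**5/120) - np.sin(x)))

assert max_err < 0.05          # uniformly small over the whole interval
assert max_err < taylor_err    # far better than the degree-5 Taylor polynomial
```

Unlike the Taylor polynomial (accurate only near 0), the least-squares fit spreads its error over the whole interval.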


Optimization example 2

Second example: find the closest element in the subspace generated by m vectors to an arbitrary vector.
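A sketch of this problem via the normal equations; the spanning vectors and target vector are arbitrary illustrative choices.

```python
import numpy as np

# Columns a1, a2 span a subspace X of R^3; u0 is the vector to approximate.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
u0 = np.array([1.0, 2.0, 3.0])

# Normal equations: G c = b, with Gram matrix G_ij = <a_j, a_i>.
G = A.T @ A
b = A.T @ u0
c = np.linalg.solve(G, b)
x_hat = A @ c

# Projection Theorem check: the residual is orthogonal to the subspace.
assert np.allclose(A.T @ (u0 - x_hat), 0.0)
```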


Optimization example 3

Third example: find the element of an affine set which has the smallest norm (this is of course the distance from any element in that affine set to the associated hyperplane through the origin).


Optimization example 4

Fourth example: minimum distance from an arbitrary element to a convex set.


Linear functionals 1

Linear maps which return scalars play an important role in linear algebra (and other related fields).

Defn. Let V be a vector space on F. Any f ∈ L(V, F) is called a linear functional.

(*) The following are linear functionals:

- On F^n, f(x) := Σ_{i=1}^n αi xi, for any fixed α ∈ F^n. In fact, every g ∈ L(F^n, F) takes this form.
- On real P6(R), f(x) := ∫₀¹ x(t) cos(t) dt.
- On C[0, 1], f(x) := x(0.5).
- On a Hilbert space H, f(x) := 〈x, h〉 for fixed h ∈ H.

The last example here will be of particular interest to us.


Linear functionals 2

A very neat fact:

(*) On IP space (V, 〈·, ·〉) with dim V = n, let f be a linear functional. Then, there exists a unique v ∈ V such that

f(u) = 〈u, v〉, ∀ u ∈ V.

To see this, recall we can always find an orthonormal basis {e1, . . . , en} of V. Expand arbitrary u with respect to this basis, and examine f(u) using the linearity of f.

This result is a special case of the Riesz-Fréchet theorem, which extends things to the infinite-dimensional case. See for example Luenberger (1968, Ch. 4).
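The constructive argument can be mirrored numerically in R³ with the standard basis; the functional below is an arbitrary illustrative choice.

```python
import numpy as np

# An arbitrary linear functional on R^3: f(u) = <u, a>.
a = np.array([2.0, -1.0, 0.5])
f = lambda u: a @ u

# Build the representing vector from the orthonormal (standard) basis:
# v = f(e1) e1 + f(e2) e2 + f(e3) e3  (real case; conjugate the f(e_i) over C).
e = np.eye(3)
v = sum(f(e[i]) * e[i] for i in range(3))

u = np.array([1.0, 4.0, -2.0])
assert np.isclose(f(u), u @ v)
assert np.allclose(v, a)   # here the representer is just a itself
```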


Adjoint of a linear map

A very important notion moving forward.

Defn. Let U, V be IP spaces on F, with dim U, dim V < ∞. Take any T ∈ L(U, V), fixed. For any fixed v ∈ V,

f(u) := 〈Tu, v〉

is clearly a linear functional f ∈ L(U, F). By Riesz-Fréchet, there exists a unique u* ∈ U such that

f(u) = 〈u, u*〉, ∀ u ∈ U.

The initial v was arbitrary, so we may define a map T* : V → U by

T*(v) := u* as above.

We call T* the adjoint of T. Somewhat subtle, but critical.

Critical to memorize: 〈T(u), v〉 = 〈u, T*(v)〉.


Properties of adjoints

(*) Define T : R³ → R² by

T(x1, x2, x3) := (x2 + 3x3, 2x1),

and verify with the usual inner products that T*(y) = (2y2, y1, 3y1). Simply note we must have 〈Tx, y〉 = 〈x, T*y〉 for any x ∈ R³, y ∈ R².
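The claimed adjoint can be checked numerically via matrix representations: the standard bases are orthonormal, so in the real case the adjoint's matrix is just the transpose.

```python
import numpy as np

# Matrix of T(x1, x2, x3) = (x2 + 3*x3, 2*x1) wrt the standard bases.
T = np.array([[0.0, 1.0, 3.0],
              [2.0, 0.0, 0.0]])
T_star = T.T   # real case: the adjoint's matrix is the transpose

x = np.array([1.0, -2.0, 0.5])
y = np.array([4.0, -1.0])

# Defining identity: <Tx, y> = <x, T*y>
assert np.isclose((T @ x) @ y, x @ (T_star @ y))
# And T*(y) = (2*y2, y1, 3*y1) as claimed
assert np.allclose(T_star @ y, [2*y[1], y[0], 3*y[0]])
```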

(*) For any T ∈ L(U, V) as in the previous slide, we have T* ∈ L(V, U).

(*) Verify the following properties of the map T ↦ T*. Take T, T′ ∈ L(U, V) and α ∈ F.

- (T + T′)* = T* + (T′)*
- (αT)* = conj(α) T*
- (T*)* = T.
- For T ∈ L(U, V), S ∈ L(V, W), we have (ST)* = T* S*, where W is any IP space.


More properties of adjoints

(*) Let T ∈ L(V) and take α ∈ F. Then,

α ∈ σ(T) ⇐⇒ conj(α) ∈ σ(T*).

(*) Let U ⊂ V be a subspace, and T ∈ L(V). Then,

U is T-invariant ⇐⇒ U⊥ is T*-invariant.

(*) Take T ∈ L(V, W). Prove:

- T is injective iff T* is surjective.
- T is surjective iff T* is injective.

(*) With this, take T ∈ L(V, W) and verify

dim null T* = dim null T + dim W − dim V,

as well as dim range T* = dim range T.

(*) Note the above result completes the generalization of Strang's first fundamental theorem (i.e., row/column spaces have the same dimension), mentioned in Lec 2.


Connections between a map and its adjoint

(*) Take any T ∈ L(U, V), with U, V both IP spaces. Then,

- null T* = (range T)⊥
- range T* = (null T)⊥
- null T = (range T*)⊥
- range T = (null T*)⊥

Defn. Denote the conjugate transpose of matrix A = [aij] ∈ F^{m×n} by A* := conj(A)ᵀ = [conj(aji)].

Given a proper matrix representation of a linear map, we can easily find the representation of its adjoint:

(*) Take T ∈ L(U, V) for finite-dim IP spaces U, V. Let B_U, B_V be respectively orthonormal bases of U and V. Then,

(M(T; B_U, B_V))* = M(T*; B_V, B_U).


Operators on inner product spaces

The flow through the first three lectures was:

- Linear spaces (sets with linearity)
- Linear maps (functions with linearity)
- Linear operators on general spaces

Now, considering what we've seen in this lecture, the next key point to tackle is

- Linear operators on inner product spaces

That is precisely what we look at now.


Self-adjoint operators

Let V be a finite-dim IP space, and take T ∈ L(V).

Defn. Call operator T self-adjoint (or Hermitian) when T = T*.

(*) Take T ∈ L(F²) defined to have matrix

M(T) = [ 19   γ
          7  59 ]

wrt the standard basis, of course. Note T is self-adjoint ⇐⇒ γ = 7.

(*) Similarly, we may confirm that for an arbitrary orthonormal basis B,

T = T* ⇐⇒ M(T; B) = (M(T; B))*.

A natural connection to the more familiar matrix territory.


Properties of self-adjoint operators

(*) If T, S ∈ L(V) are self-adjoint, then T + S is self-adjoint.

(*) If T ∈ L(V) is self-adjoint, then for α ∈ R, αT is self-adjoint.

(*) Let T ∈ L(V) be self-adjoint. Then every eigenvalue of T is real (recalling F may be either C or R).

(*) Of course, for T ∈ L(F^n) specified by A ∈ F^{n×n}, this is already a matrix wrt the standard basis, so just look at A.

A nice analogy: think of the self-adjoint operators among all operators like R as a subset of C (the adjoint operation T ↦ T* is analogous to complex conjugation).


Characterizing the self-adjoint operators

A characterization of the self-adjoint operators is given at the start of section 5, though technically it is used for upcoming results.


The normal operators

The next very important class of operators.

Defn. Take T ∈ L(V). When T commutes with its adjoint, that is, when

T T* = T* T,

we call T a normal operator.

(*) Every self-adjoint operator is normal.

(*) Let B be an orthonormal basis. Then, T is normal iff M(T; B) and M(T*; B) commute.

(*) Consider T ∈ L(F²) with matrix (wrt the standard basis)

M(T) = [ 2  −3
         3   2 ],

clearly normal but not self-adjoint. Thus the normal operators are a strictly larger class.
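This example is easy to confirm numerically:

```python
import numpy as np

A = np.array([[2.0, -3.0],
              [3.0,  2.0]])

assert np.allclose(A @ A.T, A.T @ A)   # normal: commutes with its adjoint
assert not np.allclose(A, A.T)         # but not self-adjoint
```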


Properties of normal operators

Normal operators need not equal their adjoints, yet they have a lot in common with them:

(**) They act with common norms:

T is normal ⇐⇒ ‖T(v)‖ = ‖T*(v)‖, ∀ v ∈ V.

(*) This implies for normal T ∈ L(V),

null T = null T*.

(*) Their eigenvectors are closely related. Let T be normal and α ∈ σ(T). We have that

if Tv = αv, then T*v = conj(α) v.

(*) This gives us a critical property. Let α1, . . . , αm be the distinct eigenvalues of normal T ∈ L(V), with corresponding eigenvectors v1, . . . , vm. Then,

{v1, . . . , vm} is orthogonal.

This clearly strengthens previous results (we only had independence).


The spectral theorem, intuitively

Recall we know that for T ∈ L(V), dim V = n,

T is "diagonalizable" ⇐⇒ ∃ basis {v1, . . . , vn} of eigenvectors of T.

While such a T is nice, in general we have no guarantee that the basis {v1, . . . , vn} is orthogonal, which is really the "nicest" setup.

The spectral theorem characterizes the very nicest operators:

C version: the nicest operators are the normal operators.

R version: the nicest operators are the self-adjoint operators.

Why is this useful? It gives us easy access to an orthonormal basis! (In general, we have only existence guarantees.)


The spectral theorem

Work on V, dim V = n. Take T ∈ L(V).

(**) Complex spectral theorem. Assume V on F = C.

T is normal ⇐⇒ ∃ orthonormal basis {v1, . . . , vn} of eigenvectors of T.

(**) Real spectral theorem. Assume V on F = R.

T is self-adjoint ⇐⇒ ∃ orthonormal basis {v1, . . . , vn} of eigenvectors of T.

Proving these results is somewhat involved (though we have the tools required), but absolutely worth doing.

Key take-aways:

For the "nicest" operators (and only the nicest operators), the eigenvectors furnish an orthonormal basis.

All self-adjoint operators (on either F) can be diagonalized via an orthonormal basis.


Illustrative examples 1

Example. (*) Define T ∈ L(C²) by

M(T) = [ 2  −3
         3   2 ]

wrt the standard basis of C². Confirm that

B = { (i, 1)/√2, (−i, 1)/√2 }

is an orthonormal basis, that both of its elements are eigenvectors of T, and indeed that M(T; B) is diagonal.
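A numerical confirmation of this example (the eigenvalues below are computed from the matrix, not assumed):

```python
import numpy as np

A = np.array([[2, -3],
              [3,  2]], dtype=complex)
b1 = np.array([1j, 1]) / np.sqrt(2)
b2 = np.array([-1j, 1]) / np.sqrt(2)

# Orthonormal in C^2 (note the conjugation built into np.vdot)
assert np.isclose(np.vdot(b1, b2), 0.0)
assert np.isclose(np.vdot(b1, b1).real, 1.0)

# Both are eigenvectors: A b = lambda b
lam1 = (A @ b1)[1] / b1[1]
lam2 = (A @ b2)[1] / b2[1]
assert np.allclose(A @ b1, lam1 * b1)
assert np.allclose(A @ b2, lam2 * b2)
```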


Illustrative examples 2

Example. (*) A similar deal, this time T ∈ L(R³), with matrix (wrt the standard basis)

M(T) = [  14  −13    8
         −13   14    8
           8    8   −7 ].

Check the same properties as in the previous slide, this time for the basis

B′ = { (1, −1, 0)/√2, (1, 1, 1)/√3, (1, 1, −2)/√6 }.
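A numerical check of this example; `numpy.linalg.eigh` is designed for symmetric/Hermitian matrices and returns an orthonormal eigenbasis directly.

```python
import numpy as np

A = np.array([[ 14.0, -13.0,   8.0],
              [-13.0,  14.0,   8.0],
              [  8.0,   8.0,  -7.0]])

vals, vecs = np.linalg.eigh(A)   # eigh handles symmetric/Hermitian matrices

# The columns of vecs form an orthonormal basis of eigenvectors,
# and they diagonalize A.
assert np.allclose(vecs.T @ vecs, np.eye(3))
assert np.allclose(vecs.T @ A @ vecs, np.diag(vals))
```

Here the eigenvalues come out as −15, 9, 27, with eigenvectors proportional to the elements of B′ above.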


Some comments on the R case

Even on R, when restricted to self-adjoint operators, things simplify.

(**) Let T ∈ L(V) on R be self-adjoint. Take a, b ∈ R s.t. a² < 4b. Then,

T² + aT + bI ∈ L(V) is invertible.

(*) This implies T has no "eigenpairs", and thus (recall Lec 3)

σ(T) ≠ ∅.

Of course, we know this last fact must hold, since we've already presented the real spectral theorem.

Note: we haven't characterized the normal operators in the real case. For this, see Axler (1997, Ch. 7).


Specialized structural results

With the extra assumptions of the "nice" operators, the structural results specialize nicely, providing us with mutually orthogonal subspaces.

(*) Let T ∈ L(V) be self-adjoint if F = R (normal if F = C), with distinct eigenvalues α1, . . . , αm. Then,

V = null(T − α1 I) ⊕ · · · ⊕ null(T − αm I),

and null(T − αi I) ⊥ null(T − αj I) for all i ≠ j.

Thus, the spectral information of any "nice" T yields an orthogonal decomposition of V.

Lecture contents

1. Inner products: motivations, terms, and basicproperties

2. Projections, orthogonal complements, andrelated problems

3. Linear functionals and the adjoint

4. Normal operators and the spectral theorem

5. Positive operators and isometries

6. Some famous decompositions

Characterizing the self-adjoint operators

Real case: The real spectral theorem characterizes self-adjoint operators.

Complex case: We haven’t discussed this yet.

(**) For any T ∈ L(V) on C, suppose

〈Tv, v〉 = 0, ∀ v ∈ V.

Then T = 0 (for self-adjoint T, this also holds in the R case).

(*) It follows that for T ∈ L(V) on C,

T self-adjoint ⇐⇒ 〈Tv, v〉 ∈ R, ∀ v ∈ V.

So, complex self-adjoint operators are precisely those for which any v and its image T(v) have a real inner product.

Important special case: when 〈Tv, v〉 ≥ 0, for all v ∈ V.

Positive operators and square roots

Let V be finite-dimensional, dim V < ∞.

Defn. Focus on self-adjoint T ∈ L(V). Call T a positive (semi-definite) operator if

〈Tv, v〉 ≥ 0, ∀ v ∈ V.

(*) Of course, for the C case, the self-adjoint requirement is superfluous.

(*) For any subspace U ⊂ V, the projection operator proj(·;U) is positive.

Defn. Take any operator T ∈ L(V). If there exists S ∈ L(V) such that

S² = T,

then call S a square root of the operator T.

(*) Find a square root of T ∈ L(F³), defined by T(z1, z2, z3) := (z3, 0, 0).
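One candidate (an illustration, not necessarily the intended answer) is S(z1, z2, z3) := (z2, z3, 0); a quick NumPy check that S² = T:

```python
import numpy as np

# T(z1, z2, z3) = (z3, 0, 0), written as a matrix acting on column vectors.
T = np.array([[0., 0., 1.],
              [0., 0., 0.],
              [0., 0., 0.]])

# Candidate square root S(z1, z2, z3) = (z2, z3, 0).
S = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])

assert np.allclose(S @ S, T)   # S^2 = T, so S is a square root of T
```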

Portmanteau theorem for positive operators

(**) Take any T ∈ L(V) on finite-dim V . The following are equivalent.

A T is positive; i.e., T = T∗ and 〈Tv, v〉 ≥ 0, all v.

B T = T∗ and eigenvalues of T are non-negative.

C Exists positive Q ∈ L(V) such that Q2 = T .

D Exists self-adjoint R ∈ L(V) such that R2 = T .

E Exists S ∈ L(V) such that S∗S = T .

(**) When T ∈ L(V) is positive, there exists a unique positive Q ∈ L(V) s.t. Q² = T. That is, T has a unique positive square root. Denote

√T := Q.
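The unique positive square root can be computed from the spectral theorem; a NumPy sketch (the example matrix is an arbitrary seeded random choice, not from the slides):

```python
import numpy as np

# Build a positive semi-definite T (of the form A*A, cf. condition E),
# then take the square root via an orthonormal eigenbasis.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
T = A.T @ A                          # positive semi-definite

w, V = np.linalg.eigh(T)             # real eigenvalues, orthonormal eigenvectors
w = np.clip(w, 0.0, None)            # guard against tiny negative round-off
Q = V @ np.diag(np.sqrt(w)) @ V.T    # Q = sqrt(T)

assert np.allclose(Q, Q.T)                        # Q self-adjoint
assert np.all(np.linalg.eigvalsh(Q) >= -1e-10)    # Q positive
assert np.allclose(Q @ Q, T)                      # Q^2 = T
```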

Great. We’ll sort out the implications in the next slide.

Key properties of positive operators

(*) From the results of the previous slide:

I Only positive operators have positive square roots

I If an operator has a positive square root, this root is unique.

I Positive operators form a subset of self-adjoint operators.

I Not only are the eigenvalues real, they’re non-negative.

I For any S ∈ L(V), S∗S is positive.

I If S is positive or self-adjoint, S² is positive.

(*) Let T ∈ L(V) be positive. Show

T invertible ⇐⇒ 〈Tv, v〉 > 0, ∀ v ≠ 0.

Isometries

Norm-preserving operators are also naturally of interest.

Defn. Call T ∈ L(V) an isometry if

‖Tv‖ = ‖v‖, ∀ v ∈ V.

This is a general term. For specific cases, other names are used:

I If F = C, call T a unitary operator.

I If F = R, call T an orthogonal operator.

(*) Let β ∈ F with |β| = 1. Note T := βI is an isometry.

(*) Let {v1, . . . , vn} be an orthonormal basis of V. Define T ∈ L(V) by

T(vi) := βivi,

with |βi| = 1 for each i = 1, . . . , n. Then T is an isometry.

(*) Counter-clockwise rotation on V = R2 is an isometry.

Many useful properties of isometries

(*) If T ∈ L(V) is an isometry, T⁻¹ exists.

(**) Let T ∈ L(V). The following are equivalent.

A T is an isometry.

B 〈Tu,Tv〉 = 〈u, v〉 for all u, v ∈ V (preserves IP)

C T∗T = I

D For any orthonormal set {e1, . . . , em}, the mapped {Te1, . . . , Tem} is also orthonormal (0 ≤ m ≤ n).

E Exists an orthonormal basis {v1, . . . , vn} such that {Tv1, . . . , Tvn} is orthonormal.

F T∗ is an isometry.

This is a nice collection of characterizations for the special case of isometries, which yields some critical implications.

Implications of isometry equivalences

Some key implications:

(*) Clearly, if T is an isometry, we have T⁻¹ = T∗.

(*) T preserves norms ⇐⇒ T preserves inner products.

(*) Every isometry is normal.

(*) Now a great equivalence. Let E := {e1, . . . , en} be any orthonormal basis of V on F. Then,

T an isometry ⇐⇒ columns of M(T;E) orthonormal

To see this, use A =⇒ D for the =⇒ direction, and E =⇒ A for the ⇐= direction.

(*) Using A ⇐⇒ F, show that the analogous condition holds using the rows of M(T;E) above.
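Both conditions are easy to check numerically; a sketch (the orthogonal matrix here is an arbitrary seeded example obtained via QR):

```python
import numpy as np

# A random orthogonal matrix: the matrix of an isometry on R^5 w.r.t. the
# standard (orthonormal) basis.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))

assert np.allclose(Q.T @ Q, np.eye(5))   # orthonormal columns (A <=> D/E)
assert np.allclose(Q @ Q.T, np.eye(5))   # orthonormal rows (via A <=> F)

v = rng.standard_normal(5)
assert np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v))   # norm preserved
```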

More concrete characterization of isometries

The previous characterizations of isometries were quite general. Let’s put forward a more concrete equivalence condition.

Complex case: (*) Let T ∈ L(V) on C. The following condition is both sufficient and necessary for T ∈ L(V) to be an isometry.

Exists {v1, . . . , vn}, an orthonormal basis of V, where the vi are eigenvectors of T, with eigenvalues satisfying |αi| = 1.

Real case: Similar to Lecture 3, a bit less elegant. See Axler (1997, Ch. 7).

Symmetric real matrices

In probability/statistics, symmetric real matrices appear frequently.

We’ve said a lot about how working on R is somewhat inconvenient. What’s so special about symmetric matrices?

That’s easy: Let A ∈ Rn×n be symmetric. Then,

I T ∈ L(Rn) defined by T(x) := Ax is self-adjoint.

I T is normal.

I T has eigenvalues, and Rn has an orthonormal basis {v1, . . . , vn} of T’s eigenvectors.

I T may be “diagonalized” by {v1, . . . , vn}.

I Specifically, A may be diagonalized by the COB matrix [v1 · · · vn].
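The bullets above can all be checked in a few lines of NumPy (a sketch with an arbitrary seeded symmetric matrix):

```python
import numpy as np

# A symmetric real matrix and its diagonalization by an orthonormal
# eigenvector basis, as described above.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                    # symmetric

w, V = np.linalg.eigh(A)             # columns of V: orthonormal eigenvectors
assert np.allclose(V.T @ V, np.eye(4))        # orthonormal basis of R^4
assert np.allclose(V.T @ A @ V, np.diag(w))   # diagonalized by COB matrix V
```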

Lecture contents

1. Inner products: motivations, terms, and basicproperties

2. Projections, orthogonal complements, andrelated problems

3. Linear functionals and the adjoint

4. Normal operators and the spectral theorem

5. Positive operators and isometries

6. Some famous decompositions

Some famous decompositions

Here we look at conditions for some well-known decompositions:

I Schur

I Polar

I Singular value

I Spectral

Here we periodically switch over to “matrix language” to illustrate the generality of our results thus far.

Schur’s decomposition

See for example Magnus and Neudecker (1999).

(*) Let A be a complex n × n matrix. Then, there exists a unitary matrix R such that

R∗AR =
⎡ α1       ∗ ⎤
⎢    ⋱       ⎥
⎣ 0       αn ⎦

where the αi are eigenvalues of A.

To see this: Easy, just let T(z) := Az, which we know can always be upper-triangularized by an orthonormal basis E = {v1, . . . , vn}. Construct the COB matrix (here R) using these vi. Done.
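Numerically, SciPy exposes this as `scipy.linalg.schur` (a sketch, assuming SciPy is available; the matrix is an arbitrary seeded complex example):

```python
import numpy as np
from scipy.linalg import schur

# Complex Schur decomposition A = Z Tmat Z*, with Z unitary and Tmat
# upper-triangular carrying the eigenvalues of A on its diagonal.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

Tmat, Z = schur(A, output='complex')
assert np.allclose(Z.conj().T @ Z, np.eye(4))    # Z unitary (plays the role of R)
assert np.allclose(Z @ Tmat @ Z.conj().T, A)     # reconstructs A
assert np.allclose(np.tril(Tmat, -1), 0)         # upper-triangular

# diagonal entries match the eigenvalues of A (up to ordering)
d = np.diag(Tmat)
for lam in np.linalg.eigvals(A):
    assert np.min(np.abs(d - lam)) < 1e-8
```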

Polar decomposition: first, an analogy

A nice analogy exists between C and L(V):

z ∈ C · · · T ∈ L(V)
z̄ ∈ C · · · T∗ ∈ L(V)
z = z̄, i.e. R ⊂ C · · · T = T∗, i.e. {self-adjoint ops.} ⊂ L(V)
x ∈ R, x ≥ 0 · · · {positive ops.} ⊂ {self-adjoint ops.}
unit circle {z : z̄z = 1} · · · isometries, {T : T∗T = I}

Note any z ∈ C, z ≠ 0, can be written

z = (z/|z|)√(z̄z), of course noting z/|z| is on the unit circle.

Following the analogy, we wonder whether for any T ∈ L(V) we have an isometry S such that T breaks down into S√(T∗T) . . .

Polar decomposition

Indeed, the analogy leads us in a fruitful direction.

(**) (Polar decomposition). Let T ∈ L(V) over F. Then, there exists an isometry S ∈ L(V) such that

T = S√(T∗T).

The naming refers to z = eθir, θ ∈ [0, 2π), where r = |z|. Here S (like eθi) only changes direction. Magnitude is determined by √(T∗T) (like r).

Why is this nice? T is totally general, but

T = isometry × positive operator,

i.e., it breaks into two classes we know very well!

Polar decomposition, in matrix language

(*) Let A ∈ Fn×n. Then, there exist a unitary matrix Q and a positive semi-definite matrix P such that

A = QP.

To see this: Take the matrix of T ∈ L(Fn) defined by A with respect to the usual basis B, so

A = M(T;B) = M(S;B)M(√(T∗T);B),

where S is an isometry and √(T∗T) is positive. Verify M(S;B) is unitary and M(√(T∗T);B) is positive semi-definite. Done.

(*) Also note that in decomposing any T into an isometry/positive operator product, the only choice for the positive operator is √(T∗T).
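The factorization A = QP is easy to compute by hand from the spectral theorem; a NumPy sketch (the shift by 4I is just an illustrative way to get an invertible seeded matrix):

```python
import numpy as np

# Polar decomposition A = Q P for an invertible real A:
# P = sqrt(A* A) is positive semi-definite, Q = A P^{-1} is orthogonal.
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # invertible for this seed

w, V = np.linalg.eigh(A.T @ A)
P = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T   # P = sqrt(A*A)
Q = A @ np.linalg.inv(P)                              # isometry factor

assert np.allclose(Q.T @ Q, np.eye(4))            # Q orthogonal (unitary on R)
assert np.allclose(Q @ P, A)                      # A = QP
assert np.all(np.linalg.eigvalsh(P) >= -1e-10)    # P positive semi-definite
```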

Singular values of operators

For T ∈ L(V), the positive operator √(T∗T) clearly plays an important role. It pops up in both theory and practice.

Take any T ∈ L(V) on general F.

T need not have real eigenvalues, nor even any at all. However, √(T∗T) always has real, non-negative eigenvalues.

Defn. Call the eigenvalues si ∈ σ(√(T∗T)) the singular values of T.

Singular value decomposition (SVD) 1

(*) Let’s see the interesting role that σ(√(T∗T)) plays.

Take T ∈ L(V), dim V = n. Let s1, . . . , sn denote the eigenvalues of √(T∗T), up to multiplicity.

By the spectral theorem, there exist {b1, . . . , bn}, eigenvectors of √(T∗T), forming an orthonormal basis of V. Taking v ∈ V, recall

v = 〈v, b1〉b1 + · · ·+ 〈v, bn〉bn.

Note that via the polar decomposition T = S√(T∗T),

Tv = S√(T∗T)v = 〈v, b1〉s1Sb1 + · · ·+ 〈v, bn〉snSbn,

and as S is an isometry, {Sb1, . . . , Sbn} is an orthonormal basis of V.

Singular value decomposition (SVD) 2

(*) With this handy decomposition, one may easily check that for B1 := {b1, . . . , bn} and B2 := {Sb1, . . . , Sbn}, we have

M(T;B1,B2) =
⎡ s1       0 ⎤
⎢    ⋱       ⎥
⎣ 0       sn ⎦

another rare appearance of matrix representations with distinct bases.

To estimate singular values: Finding √(T∗T) explicitly may be hard. For a fixed basis B, let G be the diagonalizing matrix such that

G M(√(T∗T);B) G∗ = M(T;B1,B2);

clearly M(T;B1,B2)² = G M(T∗T;B) G∗. So

σ(T∗T) = {s1², . . . , sn²}.

Estimating the eigenvalues of the positive T∗T is easier. Take roots.
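This relationship is exactly what numerical SVD routines exploit; a NumPy sketch on an arbitrary seeded matrix:

```python
import numpy as np

# The singular values returned by np.linalg.svd are the square roots of
# the eigenvalues of A* A, as derived above.
rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))

U, s, Vh = np.linalg.svd(A)                   # A = U diag(s) Vh, s descending
assert np.allclose(U @ np.diag(s) @ Vh, A)
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vh @ Vh.T, np.eye(4))

evals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # eigenvalues of A*A, desc
assert np.allclose(s, np.sqrt(np.clip(evals, 0, None)))
```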

SVD, in (square) matrix language

(**) (Matrix SVD). Consider A ∈ Fn×n. Then we may factorize A into

A = QDR∗,

where Q, R are unitary matrices, and D is diagonal, whose diagonal entries are precisely the singular values of A.

To see this: Defining T(z) := Az, via the polar decomposition T = S√(T∗T), and letting B be the usual basis,

A = M(T;B) = M(S;B)M(√(T∗T);B) = M(S;B)GDG∗,

where G = M(I;E,B), and E = {v1, . . . , vn} is an orthonormal basis diagonalizing √(T∗T), with √(T∗T)vi = sivi.

G has orthonormal columns, equivalent to G being unitary. Let R∗ := G∗ = G⁻¹. Set Q := M(S;B)G, unitary as both M(S;B) and G are.

SVD-related additional properties

(*) Take T ∈ L(V), singular values s1, . . . , sn. Then,

T invertible ⇐⇒ si ≠ 0, i = 1, . . . , n.

(*) Take T ∈ L(V). Then,

dim range T = |{i : si ≠ 0}|, counting singular values with multiplicity.

(*) Take S ∈ L(V), singular values s1, . . . , sn. Then,

S an isometry ⇐⇒ si = 1, i = 1, . . . , n.

(*) Let s_* and s^* denote the smallest and largest singular values of T ∈ L(V). Then,

s_*‖v‖ ≤ ‖Tv‖ ≤ s^*‖v‖, for any v ∈ V.
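The last pair of bounds is easy to probe numerically (a sketch with a seeded random matrix and random test vectors):

```python
import numpy as np

# Check s_min ||v|| <= ||A v|| <= s_max ||v|| on random vectors.
rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
s_max, s_min = s[0], s[-1]

for _ in range(100):
    v = rng.standard_normal(5)
    nv, nAv = np.linalg.norm(v), np.linalg.norm(A @ v)
    assert s_min * nv - 1e-10 <= nAv <= s_max * nv + 1e-10
```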

Singular values of general linear maps

In fact, the “usual” singular value decomposition extends to the more general case of A ∈ Fm×n quite easily.

(*) Note of course for finite-dim IP spaces U, V, taking T ∈ L(U,V),

T∗T ∈ L(U), (T∗T)∗ = T∗T,

thus T∗T is self-adjoint, and furthermore, for u ∈ U,

〈T∗Tu, u〉 = 〈T∗(Tu), u〉 = 〈Tu,Tu〉 ≥ 0,

and so T∗T is in fact positive, as we would hope.

Thus we may define the singular values of T ∈ L(U,V) by σ(√(T∗T)).

For more on the general SVD, see Horn and Johnson (1985, Ch. 7).
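The rectangular case is just as easy to check numerically (a sketch; the 6 × 3 shape is an arbitrary illustrative choice):

```python
import numpy as np

# For rectangular A (a map between spaces of different dimensions),
# A* A is symmetric positive semi-definite, so singular values are defined.
rng = np.random.default_rng(7)
A = rng.standard_normal((6, 3))          # T in L(U, V), dim U = 3, dim V = 6

G = A.T @ A                              # T* T in L(U), a 3 x 3 matrix
assert np.allclose(G, G.T)                         # self-adjoint
assert np.all(np.linalg.eigvalsh(G) >= -1e-10)     # positive semi-definite

singular_values = np.sqrt(np.clip(np.linalg.eigvalsh(G), 0, None))
assert np.allclose(np.sort(singular_values),
                   np.sort(np.linalg.svd(A, compute_uv=False)))
```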

Spectral (or eigen-) decomposition

Let A ∈ Fn×n be self-adjoint. We may then express A as

A = ∑_{i=1}^{n} αi vi vi∗,

where the αi are eigenvalues of A, with respective orthonormal eigenvectors vi (over R, vi∗ = viᵀ).

To see this: Letting T(z) := Az, as T is self-adjoint, we have an orthonormal basis of eigenvectors E := {v1, . . . , vn}. Let B be the usual basis. Then,

A = M(T;B) = M(I;E,B)DM(I;B,E),

where D is diagonal, populated by the eigenvalues αi, and M(I;E,B) = [v1 · · · vn]. Matrix multiplication yields our result.
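The rank-one sum above can be reproduced directly (a NumPy sketch over R, with an arbitrary seeded symmetric matrix):

```python
import numpy as np

# Rebuild a symmetric A from its eigendecomposition as
# A = sum_i alpha_i v_i v_i^T.
rng = np.random.default_rng(8)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                        # symmetric (self-adjoint over R)

w, V = np.linalg.eigh(A)                 # eigenvalues w, orthonormal columns V
A_rebuilt = sum(w[i] * np.outer(V[:, i], V[:, i]) for i in range(4))
assert np.allclose(A_rebuilt, A)
```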

The rest of the decompositions

The rest of the basic famous decompositions can be shown using typically algorithmic approaches. For example:

QR factorization: For any A ∈ Fm×n, we can get A = QR, with Q ∈ Fm×n having orthonormal columns and R ∈ Fn×n upper-triangular.

Cholesky factorization: Any positive definite A ∈ Fn×n may be factorized as A = LL∗, with L lower-triangular with non-negative diagonal elements. Note A = S∗S for some square S. Applying the QR result to S yields the result.
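Both factorizations are available directly in NumPy; a sketch on arbitrary seeded inputs (the small diagonal shift just guarantees positive definiteness):

```python
import numpy as np

rng = np.random.default_rng(9)

# QR: A = QR with orthonormal columns in Q and upper-triangular R.
A = rng.standard_normal((5, 3))
Q, R = np.linalg.qr(A)
assert np.allclose(Q.T @ Q, np.eye(3))   # orthonormal columns
assert np.allclose(np.tril(R, -1), 0)    # R upper-triangular
assert np.allclose(Q @ R, A)

# Cholesky: P = L L^T with L lower-triangular, non-negative diagonal.
S = rng.standard_normal((4, 4))
P = S.T @ S + 1e-6 * np.eye(4)           # positive definite (P = S*S form)
L = np.linalg.cholesky(P)
assert np.allclose(L @ L.T, P)
assert np.allclose(np.triu(L, 1), 0)     # L lower-triangular
assert np.all(np.diag(L) >= 0)
```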

For QR and Cholesky, see Horn and Johnson (1985, Ch. 2).

For the LU decomposition (a lot of technical details), see Horn and Johnson (1985, Ch. 3).

Lecture contents

1. Inner products: motivations, terms, and basicproperties

2. Projections, orthogonal complements, andrelated problems

3. Linear functionals and the adjoint

4. Normal operators and the spectral theorem

5. Positive operators and isometries

6. Some famous decompositions

References

Axler, S. (1997). Linear Algebra Done Right. Springer, 2nd edition.

Horn, R. A. and Johnson, C. R. (1985). Matrix Analysis. Cambridge University Press, 1st edition.

Luenberger, D. G. (1968). Optimization by Vector Space Methods. Wiley.

Magnus, J. R. and Neudecker, H. (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, 3rd edition.
