Linear Algebra, Weeks 1-4



Linear Algebra

JVN, Premaster Summer 2012

Time: 8:00am - 11:00am Tuesday, Thursday
Location: Room 2, JVN Building

    Contents

Part 1. Preliminaries
1. Sets
1.1. Basic Objects
1.2. Set Operations
1.3. Maps between Sets
2. Topological Spaces
3. Groups
4. Rings and Fields

Part 2. Linear Algebra
5. Vector Spaces
5.1. Basic Objects
5.2. Inner Product
5.3. Norm
5.4. Angle between Vectors
5.5. Exercises
6. Linear Maps and Matrices
6.1. Morphisms between Vector Spaces
6.2. Kernel, Nullity, Image, Rank
6.3. Matrix Operations
6.4. Invertible Matrices and their Inverses
6.5. Exercises

    Part 1. Preliminaries

    1. Sets

1.1. Basic Objects.

We begin with the most fundamental object in mathematics: a set.

Definition 1.1. A set $X$ is a collection of distinct, definite objects. Each object is called an element of $X$ and written $x \in X$. Each subcollection $U$ in $X$ is called a subset of $X$ and denoted $U \subseteq X$. Each subset usually collects elements sharing some property $P$. The empty set is denoted $\emptyset$. A finite set is denoted $X = \{x_1, x_2, \ldots, x_n\}$. A countably infinite set is denoted $X = \{x_1, x_2, x_3, \ldots\}$.

Example 1.2. The set of all positive natural numbers is $\mathbb{N}^+ = \{1, 2, 3, \ldots\}$, the set of all natural numbers is $\mathbb{N} = \{0, 1, 2, \ldots\}$, the set of all integers is $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$,


and the set of all real numbers is $\mathbb{R}$. Surely $\mathbb{N}^+ \subset \mathbb{N} \subset \mathbb{Z} \subset \mathbb{R}$. The defining properties are being a positive natural number, being a natural number, and being an integer.

Example 1.3. The set of all possible outcomes when we roll a die is $X = \{1, 2, 3, 4, 5, 6\}$.

Example 1.4. The set of all possible outcomes when we toss a coin is $X = \{H, T\}$.

Example 1.5. The set of all subsets of $X$ is called its power set and denoted $\mathcal{P}(X)$.

1.2. Set Operations. Below are basic set operations, all of which can be seen and verified by Venn diagrams.

complement: if $U \subseteq X$ then $U^c = \{x \in X \mid x \notin U\}$.
union: $A \cup B = \{x \mid x \in A$ or $x \in B\}$.
intersection: $A \cap B = \{x \mid x \in A$ and $x \in B\}$.
disjoint: if $A \cap B = \emptyset$ then we say $A$ and $B$ are disjoint.
partition: if $\bigcup_i A_i = X$ and the $A_i$ are pairwise disjoint then we say they partition $X$.
De Morgan's laws: $(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$.
relative complement of $B$ in $A$: $A \setminus B = \{$all $x$ in $A$ but not in $B\}$.
symmetric difference: $A \triangle B = (A \cup B) \setminus (A \cap B) = (A \setminus B) \cup (B \setminus A)$.
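These operations map directly onto Python's built-in set type; here is a minimal illustrative sketch (the sets $X$, $A$, $B$ are made up for the example).

```python
# Basic set operations from Section 1.2, using Python's built-in sets.
X = {1, 2, 3, 4, 5, 6}   # ambient set (a die's outcomes)
A = {1, 2, 3}
B = {3, 4}

complement_A = X - A     # A^c relative to X: {4, 5, 6}
union = A | B            # A union B: {1, 2, 3, 4}
intersection = A & B     # A intersect B: {3}
difference = A - B       # A \ B: {1, 2}
symmetric_diff = A ^ B   # A triangle B: {1, 2, 4}

# De Morgan's laws, checked directly:
assert X - (A | B) == (X - A) & (X - B)
assert X - (A & B) == (X - A) | (X - B)
```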

1.3. Maps between Sets. It is really important to consider the relationships, or maps, between sets besides looking within each set.

Definition 1.6. A map $X \xrightarrow{f} Y$ between $X$ and $Y$ is a law that assigns to each $x \in X$ a unique element $f(x) \in Y$.

Example 1.7. For every nonempty set $X$ there exists a map $X \xrightarrow{f} X$, $x \mapsto x$, called the identity map and often denoted $\mathrm{id}_X$.

Example 1.8. For all nonempty sets $X, Y$ there exists a map $X \xrightarrow{f} Y$, $x \mapsto y_0$ for some $y_0 \in Y$, called a constant map.

Definition 1.9. Given $X \xrightarrow{f} Y$ and $Y \xrightarrow{g} Z$ we define their composition to be $X \xrightarrow{g \circ f} Z$, $x \mapsto g(f(x))$.

Below is the picture of the composition: $x \mapsto f(x) \mapsto g(f(x))$, i.e. the triangle $X \xrightarrow{f} Y \xrightarrow{g} Z$ commutes with the diagonal arrow $X \xrightarrow{g \circ f} Z$.

Definition 1.10. A map $X \xrightarrow{f} Y$ is called injective (or one-to-one) if $f(x) \neq f(x')$ whenever $x \neq x' \in X$. It is called surjective (or onto) if $f(X) = \{f(x), \text{ all } x \in X\} = Y$. It is called bijective (or one-to-one and onto) if it is both injective and surjective.

Every bijection $X \xrightarrow{f} Y$ has an inverse $Y \xrightarrow{g} X$, $y \mapsto x$, where $x$ is the unique element in $X$ that maps to $y$. We denote this map as $f^{-1}$. Surely $g \circ f = \mathrm{id}_X$ and $f \circ g = \mathrm{id}_Y$; together they solidify the impression that $X$ and $Y$ are the same.


Example 1.11. If $m \leq n$ then there exists an injection $\{x_1, \ldots, x_m\} \xrightarrow{f} \{y_1, \ldots, y_n\}$. In general, an injection $X \xrightarrow{f} Y$ allows us to view $X$ as a subset of $Y$. Conversely, each subset $U \subseteq X$ gives an injection $U \xrightarrow{f} X$.

Example 1.12. If $m < n$ then there does not exist any surjection $\{x_1, \ldots, x_m\} \xrightarrow{f} \{y_1, \ldots, y_n\}$. The same does not hold for infinite sets, as we can have a surjection $5\mathbb{Z} \to \mathbb{Z}$, $5z \mapsto z$.

Already maps allow us to rigorously compare the cardinalities of sets. We say that $|X| \leq |Y|$ if there exists an injection $X \xrightarrow{f} Y$, that $|X| \geq |Y|$ if there exists a surjection $X \xrightarrow{f} Y$, and that $|X| = |Y|$ if there exists a bijection $X \xrightarrow{f} Y$. In the last case, we can define an inverse $Y \xrightarrow{g} X$, $y = f(x) \mapsto x$, such that $g \circ f = \mathrm{id}_X$ and $f \circ g = \mathrm{id}_Y$. This inverse is unique and denoted $f^{-1}$.

Example 1.13. $5\mathbb{Z}$, $\mathbb{Z}^n$ for all $n \geq 1$, and $\mathbb{Q}$ all have the same cardinality.

Exercise 1.14. Show that $|\mathbb{Z}| < |\mathbb{R}|$ with strict inequality.

    2. Topological Spaces

One of the first structures we can give a set $X$ is to pick out some subsets of $X$. If our choice satisfies certain properties, it can be used to formalize the notions of convergence, continuity, connectedness, etc.

Definition 2.1. A topological space is a set $X$ together with a collection $T$ of subsets of $X$ that satisfies:

(1) $\emptyset, X \in T$.
(2) (closure under union) $\bigcup_i U_i \in T$ if $U_i \in T$.
(3) (closure under finite intersection) $\bigcap_{i=1}^n U_i \in T$ if $U_i \in T$.

The subsets in $T$ are called open sets and $T$ is called a topology for $X$. Together they are called a topological space and denoted $(X, T)$, though we often drop the $T$. A subset of $X$ may be neither closed nor open, either closed or open, or both.

Example 2.2. Every set $X$ has the trivial topology $T = \{\emptyset, X\}$ and the discrete topology $\mathcal{P}(X)$.

Example 2.3. The finite set $X = \{x_1, \ldots, x_6\}$ has two topologies $T = \{\emptyset, \{x_1, x_2\}, X\}$ and $T' = \{\emptyset, \{x_1, x_2\}, \{x_1, x_2, x_3, x_4\}, X\}$. They are finer than the trivial topology and coarser than the discrete topology.

Example 2.4. If $X$ is an infinite set then the collection $T = \{\emptyset, X, \text{all finite subsets of } X\}$ does not form a topology. One can find a countable union of finite subsets that is not in $T$.

Example 2.5. The real line $\mathbb{R}$ together with the topology generated by all open intervals $(a, b)$, $a, b \in \mathbb{R}$, is a topological space.
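For a finite set one can check the axioms of Definition 2.1 by brute force. The sketch below is illustrative only, using the second topology of Example 2.3; for finite collections, closure under pairwise unions and intersections suffices.

```python
from itertools import combinations

def is_topology(X, T):
    """Check Definition 2.1 for a finite set X and a collection T of frozensets."""
    if frozenset() not in T or frozenset(X) not in T:
        return False
    # For finite T, pairwise closure implies closure under all unions/intersections.
    for U, V in combinations(T, 2):
        if U | V not in T or U & V not in T:
            return False
    return True

X = {1, 2, 3, 4, 5, 6}
T = {frozenset(), frozenset({1, 2}), frozenset({1, 2, 3, 4}), frozenset(X)}
print(is_topology(X, T))    # True

bad = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(X)}
print(is_topology(X, bad))  # False: {1,2} ∪ {3,4} = {1,2,3,4} is missing
```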

    Among topological spaces we only consider maps that respect their topologies.


Definition 2.6. A map $X \xrightarrow{f} Y$ between two topological spaces is called continuous if $f^{-1}(V)$ is open for every open set $V \subseteq Y$.

Exercise 2.7. Show that the composition of two continuous maps is continuous.

We are more familiar with the $(\epsilon, \delta)$ definition of continuity, but it only applies to spaces with metrics. This definition of continuity by open sets is more general and agrees with that over metric spaces. As with maps between sets, a continuous map $X \xrightarrow{f} Y$ is called injective if $f(x) \neq f(x')$ whenever $x \neq x'$, and $f$ is called surjective if $f(X) = Y$. When $f$ is bijective we can define a map $Y \to X$, $y \mapsto x$, where $x$ is the unique element that $f$ maps to $y$. This map is not necessarily continuous. When it is, we denote it as $f^{-1}$, call $f$ a homeomorphism, and write $X \cong Y$.

Example 2.8. Every map $(X, \mathcal{P}(X)) \xrightarrow{f} (Y, T_Y)$ is continuous.

Example 2.9. Every constant map $(X, T_X) \xrightarrow{f} (Y, T_Y)$ is continuous.

Exercise 2.10. Show that the function $(\mathbb{R}, \mathcal{B}(\mathbb{R})) \xrightarrow{f} (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, $0 \mapsto 0$ and $x \mapsto \sin(1/x^2)$ for $x \neq 0$, is continuous everywhere but $0$.

Example 2.11. Define $(-1, 1) \xrightarrow{f} (-\infty, \infty)$, $x \mapsto \frac{x}{(x+1)(x-1)}$; then $f$ is a homeomorphism between $(-1, 1)$ and $\mathbb{R}$. In topology, these spaces are the same. What is the inverse of $f$?

    Whenever we have objects with some structure, we also consider subobjects with thesame structure.

Definition 2.12. A subset $X' \subseteq (X, T)$ together with the induced topology $T' = \{X' \cap U, U \in T\}$ is called a subspace of $(X, T)$.

Each open set $U' \in T'$ comes from an open set $U \in T$. More generally, we can view a subspace $(X', T') \subseteq (X, T)$ as an inclusion $X' \xrightarrow{i} X$ such that $T'$ is the smallest topology on $X'$ making $i$ continuous.

Example 2.13. $(\mathbb{Z}, \mathcal{P}(\mathbb{Z}))$ is a subspace of $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

Exercise 2.14. Consider $\mathbb{Q} \subseteq (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with the induced topology. Show that:
(a) $\{0\}$ is not open in $\mathbb{Q}$, so this induced topology is not the discrete topology $\mathcal{P}(\mathbb{Q})$.
(b) If $a, b$ are rational then the interval $a < q < b$, $q \in \mathbb{Q}$, is open in $\mathbb{Q}$.
(c) If $a, b$ are rational then the interval $a \leq q \leq b$, $q \in \mathbb{Q}$, is closed in $\mathbb{Q}$.
(d) If $a, b$ are irrational then the interval $a < q < b$, $q \in \mathbb{Q}$, is both open and closed in $\mathbb{Q}$.

    3. Groups

While a topology gathers elements into open and closed subsets, an operation connects elements in a different way.

Definition 3.1. A group is a set $G$ together with an operation $*$ that satisfies the following:
(1) (closure) $g * h \in G$ for any $g, h \in G$.
(2) (associativity) $(g * h) * k = g * (h * k)$ for all $g, h, k \in G$.


(3) (identity element) there exists an element $e \in G$ such that $e * g = g * e = g$ for all $g \in G$.
(4) (inverse element) for each $g \in G$ there exists an element $g^{-1} \in G$ such that $g * g^{-1} = g^{-1} * g = e$.

From these four group axioms one can deduce the uniqueness of both $e$ and $g^{-1}$. If $H$ is a subset of $(G, *)$ such that $(H, *)$ is itself a group then we call $H$ a subgroup of $G$.

Example 3.2. We have the trivial group $S = \{e\}$.

Example 3.3. All of $\mathbb{Z}, n\mathbb{Z}, \mathbb{Z}/n\mathbb{Z}, \mathbb{Q}, \mathbb{R}, \mathbb{C}$ are groups under the usual addition. Surely $\mathbb{Q}^*, \mathbb{R}^*, \mathbb{C}^*$ are groups under multiplication. Some groups here are finite while others are infinite.

Taken for granted is the fact that $g + h = h + g$ for all $g, h$ in any of the above groups. This is not guaranteed in general. Nor is it guaranteed that a subset $H \subseteq (G, *)$ is a group under $*$, because $H$ may not be closed under the operation or under inverses.

Definition 3.4. A group $(G, *)$ is called commutative if $g * h = h * g$ for all $g, h \in G$.

Definition 3.5. A subset $H \subseteq (G, *)$ is called a subgroup if it is a group under $*$.

Example 3.6. Consider $S_n = \{$all bijections from $\{1, \ldots, n\}$ to $\{1, \ldots, n\}\}$ under composition. For each $n$, $S_n$ is a group with cardinality $n!$. For $n \geq 3$, $S_n$ is noncommutative, with many commutative and noncommutative subgroups.
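A quick way to see Example 3.6 concretely is to represent elements of $S_3$ as tuples and compose them; this is a small illustrative sketch, not part of the original notes.

```python
from itertools import permutations

# A permutation of {0, 1, 2} as a tuple p, where p[i] is the image of i.
def compose(p, q):
    """Composition p after q: first apply q, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

S3 = list(permutations(range(3)))
print(len(S3))            # 6 = 3! elements

p, q = (1, 0, 2), (0, 2, 1)   # two transpositions
print(compose(p, q))      # (1, 2, 0)
print(compose(q, p))      # (2, 0, 1) -- different, so S3 is noncommutative
```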

Example 3.7. The set $C(X, \mathbb{R})$ of all continuous functions from $X$ to $\mathbb{R}$ is a subgroup of the set $F(X, \mathbb{R})$ of all functions from $X$ to $\mathbb{R}$ under addition. What do we need to modify under multiplication?

Exercise 3.8. Determine which of the following operations are associative:
(1) $*$ on $\mathbb{Z}$ defined by $a * b = a - b$.
(2) $*$ on $\mathbb{R}$ defined by $a * b = a + b + ab$.
(3) $*$ on $\mathbb{Z} \times \mathbb{Z} = \{(z_1, z_2), z_i \in \mathbb{Z}\}$ defined by $(z_1, z_2) * (z_1', z_2') = (z_1 + z_1', z_2 + z_2')$.

Exercise 3.9. Determine which of the following sets are groups, and which groups are commutative:

(1) the set of all rational numbers with odd denominators under addition.
(2) the set of all rational numbers with even denominators under addition.
(3) the set of all rational numbers with denominators 1, 2, or 3 under addition.
(4) the set of all nonzero rational numbers under multiplication.
(5) the set of all integers under multiplication.
(6) the set $M(2, \mathbb{R})$ of all $2 \times 2$ matrices over $\mathbb{R}$ under addition.
(7) the set $M(2, \mathbb{R})$ of all $2 \times 2$ matrices over $\mathbb{R}$ under multiplication.
(8) the set of all $n$th roots of unity $R = \{z \in \mathbb{C}, z^n = 1\}$, $n > 0$, under multiplication.
(9) the set of all $n$th roots of unity $R = \{z \in \mathbb{C}, z^n = 1\}$, $n > 0$, under addition.

Again we consider relationships between groups, i.e. maps between groups that respect their group structures.

Definition 3.10. A map $(G, *) \xrightarrow{f} (H, \circ)$ such that $f(g * g') = f(g) \circ f(g')$ is called a group morphism.


    Exercise 3.11. Show that composition of two group morphisms is a group morphism.

Example 3.12. The inclusions $(n\mathbb{Z}, +) \hookrightarrow (\mathbb{Z}, +) \hookrightarrow (\mathbb{Q}, +) \hookrightarrow (\mathbb{R}, +) \hookrightarrow (\mathbb{C}, +)$.

Example 3.13. $(\mathbb{Z}, +) \to (\mathbb{Z}/n\mathbb{Z}, +)$, $n \mapsto \bar{n}$.

Example 3.14. $(\mathbb{Z}/n\mathbb{Z}, +) \to (\mathbb{C}^*, \cdot)$, $k \mapsto e^{i 2\pi k/n}$.

Example 3.15. $(\mathbb{R}, +) \to (\mathbb{R}^*, \cdot)$, $x \mapsto e^x$.

As with maps between sets, a group morphism $G \xrightarrow{f} H$ is called injective iff $f(g) \neq f(g')$ whenever $g \neq g'$, iff $f^{-1}(1_H) = \{1_G\}$; and $f$ is called surjective if $f(G) = H$. We call $f^{-1}(1_H)$ and $f(G)$ the kernel and image of $f$, and denote them $\ker(f)$ and $\mathrm{im}(f)$ respectively; they are subgroups of $G$ and of $H$ respectively. When $f$ is bijective we can define a map $H \to G$, $h \mapsto g$, where $g$ is the unique element that $f$ maps to $h$. This map is a group morphism. We denote it as $f^{-1}$, call $f$ a group isomorphism, and write $G \cong H$.

Exercise 3.16. Show that if a group morphism $G \xrightarrow{f} H$ is bijective then $f^{-1}$ is indeed a group morphism. Hint: show that $f^{-1}(h \circ h') = f^{-1}(h) * f^{-1}(h')$ for all $h, h' \in H$.

Exercise 3.17. Given a group morphism $G \xrightarrow{f} H$, show that $\ker(f)$ is a subgroup of $G$ and $\mathrm{im}(f)$ is a subgroup of $H$.

Exercise 3.18. Determine which of the following maps are group morphisms, and which group morphisms are injective, surjective, bijective.

(a) $(5\mathbb{Z}, +) \to (\mathbb{Z}, +)$, $5n \mapsto 5n$.
(b) $(5\mathbb{Z}, +) \to (\mathbb{Z}, +)$, $5n \mapsto n$.
(c) $(\mathbb{Z}, +) \to (M(2, \mathbb{Z}), +)$, $\lambda \mapsto \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$.
(d) $(M(2, \mathbb{Z}), +) \to (\mathbb{Z}, +)$, $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto a + d$.

4. Rings and Fields

We observe that besides addition, $\mathbb{Z}$ also has multiplication. This observation leads us to a general definition.

Definition 4.1. A ring $(R, +, \cdot)$ is a set $R$ with two binary operations $+$ and $\cdot$, called addition and multiplication, satisfying:

(1) (additive group) $(R, +)$ is a commutative group.
(2) (associativity) $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ for all $a, b, c \in R$.
(3) (distributivity) $(a + b) \cdot c = a \cdot c + b \cdot c$ and $a \cdot (b + c) = a \cdot b + a \cdot c$.
(4) (multiplicative identity) there exists an element $1 \in R$ such that $1 \cdot a = a \cdot 1 = a$ for all $a \in R$.

We often suppress $+, \cdot$ and simply write $R$ unless they are needed to clarify matters. When multiplication is commutative, we call $R$ a commutative ring. While others may not require a ring to have $1$, we always do, and the requirement that $(R, +)$ be commutative is actually redundant: $(1 + 1) \cdot (a + b) = a + b + a + b$ and $(1 + 1) \cdot (a + b) = a + a + b + b$ together imply $a + b = b + a$ for all $a, b \in R$.


Example 4.2. The set $\{a, b\}$ of two elements with addition $a + a = a$, $b + b = a$, $a + b = b$, $b + a = b$ and multiplication $a \cdot a = a$, $a \cdot b = a$, $b \cdot a = a$, $b \cdot b = b$ is a ring.

Example 4.3. The sets $\mathbb{Z}/n\mathbb{Z}, \mathbb{Z}, \mathbb{Q}, \mathbb{R}, \mathbb{C}$ with the usual addition and multiplication are all commutative rings.

Exercise 4.4. Determine which of the following are rings, and which are commutative rings:
(1) $n\mathbb{Z}$, $n > 1$.
(2) The set of all $2 \times 2$ matrices $(M(2, \mathbb{R}), +, \cdot)$.
(3) The set $C(\mathbb{R}, [0, 1])$ of all continuous functions from $\mathbb{R}$ to $[0, 1]$ under the usual addition and multiplication.
(4) The set $C(\mathbb{R}, \mathbb{R})$ of all continuous functions from $\mathbb{R}$ to $\mathbb{R}$ under the usual addition and multiplication.
(5) The set $C(\mathbb{R}, \mathbb{R})$ of all continuous functions from $\mathbb{R}$ to $\mathbb{R}$ under the usual addition and composition.

So we have seen some noncommutative rings and some noninvertible ring elements. Better than a general ring is one in which division is possible. Best is a ring in which division is possible and multiplication is commutative, something like $\mathbb{Q}$ and $\mathbb{R}$.

Definition 4.5. $(R, +, \cdot)$ is called a division ring if each element $a \neq 0$ in $R$ has a multiplicative inverse $b \in R$ such that $a \cdot b = b \cdot a = 1$. We denote such $b$ as $a^{-1}$.

Definition 4.6. A field is a commutative division ring.

Surely all fields are division rings, and it can be shown that all finite division rings are fields. It is not easy to give an example of a noncommutative division ring; the most popular one is the Hamilton quaternion algebra.

Example 4.7. $\mathbb{Z}$ is not a field while $\mathbb{Q}, \mathbb{R}, \mathbb{C}$ all are.
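As a numeric sketch of why $\mathbb{Z}/p\mathbb{Z}$ is a field for $p$ prime (compare Exercise 4.8(a) below), every nonzero class has a multiplicative inverse, which Python 3.8+ computes with pow(a, -1, p):

```python
p = 7  # a prime

# Every nonzero element of Z/pZ has a multiplicative inverse.
for a in range(1, p):
    inv = pow(a, -1, p)           # modular inverse (Python 3.8+)
    assert (a * inv) % p == 1
    print(f"{a}^-1 = {inv} (mod {p})")

# In Z/6Z this fails: 2 has no inverse since gcd(2, 6) != 1.
try:
    pow(2, -1, 6)
except ValueError as e:
    print("no inverse of 2 mod 6:", e)
```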

Exercise 4.8. Determine which of the following are fields:

(a) $(\mathbb{Z}/p\mathbb{Z}, +, \cdot)$ for $p$ prime.
(b) $(\mathbb{Z}/pq\mathbb{Z}, +, \cdot)$ for $p, q$ prime.
(c) The set $\left\{\begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \in M(2, \mathbb{Z}), 0 \neq a, d \in \mathbb{Z}\right\} \cup \left\{\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}\right\}$ under the usual matrix addition and multiplication.
(d) The set $\left\{\begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \in M(2, \mathbb{R}), 0 \neq a, d \in \mathbb{R}\right\} \cup \left\{\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}\right\}$ under the usual matrix addition and multiplication.
(e) The set $\left\{\begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \in M(2, \mathbb{R}), a, d \in \mathbb{R}\right\}$ under the usual matrix addition and multiplication.

Again we consider subsets of a ring $R$ with the same ring structure.

Definition 4.9. A subset $S \subseteq (R, +, \cdot)$ is a subring of $R$ if $(S, +, \cdot)$ is also a ring. A subring of a field $F$ is called a subfield.

Example 4.10. $n\mathbb{Z} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R} \subset \mathbb{C}$ as subrings. Both $\mathbb{Q}$ and $\mathbb{R}$ are subfields of $\mathbb{C}$, while $n\mathbb{Z}$ and $\mathbb{Z}$ are not fields, hence not subfields.


Once among rings, we consider maps between them that respect their ring structures.

Definition 4.11. A ring morphism is a map $R \xrightarrow{\phi} S$ such that $\phi(a + b) = \phi(a) + \phi(b)$ and $\phi(ab) = \phi(a)\phi(b)$.

Again, composition of two ring morphisms is a ring morphism. As with group morphisms, a ring morphism $R \xrightarrow{f} S$ is called injective iff $f(a) \neq f(a')$ whenever $a \neq a'$, iff $f^{-1}(0_S) = \{0_R\}$; and $f$ is called surjective if $f(R) = S$. When $f$ is bijective we call it a ring isomorphism and write $R \cong S$. We call $f^{-1}(0_S)$ and $f(R)$ the kernel and image of $f$, and denote them $\ker(f)$ and $\mathrm{im}(f)$ respectively.

Exercise 4.12. Given a ring morphism $R \xrightarrow{f} S$, determine if $\ker(f)$ is a subring of $R$ and if $\mathrm{im}(f)$ is a subring of $S$. Compare with Exercise 3.17.

Exercise 4.13. Determine which of the following are ring morphisms:

(a) $(\mathbb{Z}, +, \cdot) \to (M(2, \mathbb{Z}), +, \cdot)$, $\lambda \mapsto \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$.
(b) $(M(2, \mathbb{Z}), +, \cdot) \to (\mathbb{Z}, +, \cdot)$, $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto a + d$.
(c) $(\mathbb{R}, +, \cdot) \to (M(2, \mathbb{R}), +, \cdot)$, $\lambda \mapsto \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$.
(d) $(M(2, \mathbb{R}), +, \cdot) \to (\mathbb{R}, +, \cdot)$, $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto ad - bc$.

Exercise 4.14. Show that the ring in Example 4.2 is isomorphic to $\mathbb{Z}/2\mathbb{Z}$.

    Part 2. Linear Algebra

    5. Vector Spaces

5.1. Basic Objects. Engineers and physicists represent different objects in their fields as elements in one-dimensional space $\mathbb{R}$, two-dimensional space $\mathbb{R} \times \mathbb{R}$, or generally $n$-dimensional space $\mathbb{R} \times \cdots \times \mathbb{R}$. Such elements can be added, subtracted, or scaled by a real number. Together they form what are called the Euclidean vector spaces $\mathbb{R}^n$. We begin with an abstract generalization.

Definition 5.1. A vector space over a field $F$ is a set $V$ together with two binary operations $+$ and $\cdot$ that satisfy the following axioms:

(1) (vector addition) $u + v \in V$ for any $u, v \in V$.
(2) (associativity of addition) $(u + v) + w = u + (v + w)$ for all $u, v, w \in V$.
(3) (commutativity of addition) $u + v = v + u$.
(4) (identity element under addition) there exists an element $0 \in V$ such that $0 + u = u + 0 = u$ for all $u \in V$.
(5) (inverse element under addition) for each $u \in V$ there exists an element $-u$ such that $u + (-u) = -u + u = 0$.
(6) (scalar multiplication) $\lambda \cdot u \in V$ for any $\lambda \in F$ and $u \in V$.
(7) (distributivity of scalar multiplication with respect to vector addition) $\lambda \cdot (u + v) = \lambda \cdot u + \lambda \cdot v$.


(8) (distributivity of scalar multiplication with respect to field addition) $(\lambda + \mu) \cdot u = \lambda \cdot u + \mu \cdot u$.
(9) (compatibility of scalar multiplication with field multiplication) $(\lambda\mu) \cdot u = \lambda \cdot (\mu \cdot u)$.
(10) (identity element of scalar multiplication) $1 \cdot u = u$ for any $u \in V$, where $1$ is the multiplicative identity in $F$.

For those with some background in abstract algebra, the first five axioms mean $V$ is a commutative group under $+$ and the next five axioms mean $V$ is an $F$-module. The elements of $V$ are called vectors while the elements of $F$ are called scalars. We give some examples.

Example 5.2. The singleton set $V = \{0\}$ under the trivial $+$ and $\cdot$ is a vector space over any field $F$. It is called the zero vector space over $F$.

Example 5.3. The plane $\mathbb{R}^2 = \{$all 2-tuples $(x, y)$ with $x, y \in \mathbb{R}\}$ over the field $\mathbb{R}$ under the usual operations $+$ and $\cdot$ is a vector space. More generally, the space $\mathbb{R}^n = \{$all $n$-tuples $(x_1, \ldots, x_n)$ with $x_i \in \mathbb{R}\}$ over the field $\mathbb{R}$ under the usual operations $+$ and $\cdot$ is a vector space. These are called Euclidean spaces and will be our main focus in this course.
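In code, the Euclidean vectors of Example 5.3 are just arrays; numpy implements the two vector-space operations componentwise. A minimal sketch with made-up vectors:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

print(u + v)         # vector addition:       [5. 7. 9.]
print(2.5 * u)       # scalar multiplication: [2.5 5. 7.5]
print(u + (-1) * u)  # additive inverse gives the zero vector: [0. 0. 0.]

# Axioms like distributivity can be spot-checked numerically:
a, b = 2.0, -3.0
assert np.allclose((a + b) * u, a * u + b * u)
assert np.allclose(a * (u + v), a * u + a * v)
```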

Exercise 5.4. Determine if the set of all finite sums $S = \{\alpha_1 s + \alpha_2 t + \alpha_3 u + \alpha_4 v + \alpha_5 w$, $\alpha_i \in \mathbb{C}$; $s, t, u, v, w$ indeterminates$\}$ under the usual operations $+$ and $\cdot$ is a vector space over the field $\mathbb{C}$.

Example 5.5. We consider different classes of functions from $\mathbb{R}^n$ to $\mathbb{R}$.

(1) A function $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$, $(x_1, \ldots, x_n) \mapsto f(x_1, \ldots, x_n)$, is called linear if $f(\alpha u + \beta v) = \alpha f(u) + \beta f(v)$ for any $\alpha, \beta \in \mathbb{R}$, $u, v \in \mathbb{R}^n$. If we define addition $f + g$ by $(f + g)(x) = f(x) + g(x)$ and scalar multiplication by $(\alpha \cdot f)(x) = \alpha f(x)$, then the set $L = \{$all linear functions $\mathbb{R}^n \xrightarrow{f} \mathbb{R}\}$ over the field $\mathbb{R}$ under $+$ and $\cdot$ is a vector space.
(2) More generally, a function $\mathbb{R}^n \xrightarrow{g} \mathbb{R}$, $(x_1, \ldots, x_n) \mapsto g(x_1, \ldots, x_n)$, is called affine if $g(x) = f(x) + \beta$ for some linear function $f$ and some $\beta \in \mathbb{R}$. One can verify that this condition is equivalent to the condition $g(\alpha u + \beta v) = \alpha g(u) + \beta g(v)$ for any $\alpha + \beta = 1 \in \mathbb{R}$, $u, v \in \mathbb{R}^n$. The set $M = \{$all affine functions $\mathbb{R}^n \xrightarrow{g} \mathbb{R}\}$ over the field $\mathbb{R}$ under the usual operations $+$ and $\cdot$ is a vector space.
(3) Most generally, the set $N = \{$all functions $\mathbb{R}^n \xrightarrow{h} \mathbb{R}\}$ over the field $\mathbb{R}$ under the same operations is a vector space.

Next we consider subobjects in the category of vector spaces over $F$.

Definition 5.6. A subset $U \subseteq (V, +, \cdot)$ of a vector space $V$ over $F$ is called a subspace if $U$ under $+$ and $\cdot$ is also a vector space over $F$.

Example 5.7. In Example 5.5, $L \subset M \subset N$ as subspaces over $\mathbb{R}$. We will learn more about them later.

Example 5.8. We can view $\mathbb{R}^2$ as the subspace $U = \{(x, y, 0), x, y \in \mathbb{R}\} \subset \mathbb{R}^3$. Why isn't $U' = \{(x, y, 1), x, y \in \mathbb{R}\} \subset \mathbb{R}^3$ a subspace?


Exercise 5.9. Verify that the set $\mathbb{R}[x] = \{p(x) = a_n x^n + \ldots + a_1 x + a_0, a_i \in \mathbb{R}\}$ of all polynomials in one indeterminate $x$ over $\mathbb{R}$ is a vector space over $\mathbb{R}$. What are some of its nontrivial subspaces?

Together, addition and scalar multiplication allow us to form linear combinations of vectors in $\mathbb{R}^n$.

Definition 5.10. Given $v_1, \ldots, v_n \in V$ over $F$ we define their linear combination to be $\alpha_1 v_1 + \ldots + \alpha_n v_n$, for any $\alpha_1, \ldots, \alpha_n \in F$. The scalars $\alpha_i$ are called coefficients of the linear combination. If $\alpha_1 v_1 + \ldots + \alpha_n v_n = 0$ for some nonzero $\alpha_i$ then $v_i$ can be written as a linear combination of $v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_n$, and we say $v_1, \ldots, v_n$ are linearly dependent. Else we say they are linearly independent.

Definition 5.11. The set $\{\alpha_1 v_1 + \ldots + \alpha_n v_n, \alpha_i \in F\}$ of all linear combinations of $v_1, \ldots, v_n$ is called their span and denoted $\mathrm{Span}(v_1, \ldots, v_n)$.

Example 5.12. It is easy to see that $v_1, v_2$ are linearly dependent iff one is a multiple of the other, say $v_1 = \alpha v_2$. In that case $\mathrm{Span}(v_1, v_2) = \mathrm{Span}(v_1) = \mathrm{Span}(v_2)$, a line.

Example 5.13. It is harder to see that $(1, 2, 3), (4, 5, 6), (2, 1, 0)$ are linearly dependent and that $\mathrm{Span}((1, 2, 3), (4, 5, 6), (2, 1, 0)) = \mathrm{Span}((1, 2, 3), (4, 5, 6))$, a plane. Generally, if $\{v_1, v_2\}$ in $\mathbb{R}^3$ are linearly independent then $\{\alpha_1 v_1 + \alpha_2 v_2, \text{all } \alpha_i \in \mathbb{R}\}$ makes up the plane containing $v_1, v_2$.

Exercise 5.14. Show that $(1, 2, 0), (4, 0, 5), (6, 4, 3)$ are linearly independent and that they span the whole of $\mathbb{R}^3$.
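A numerical sanity check for Exercise 5.14 (not a proof): three vectors in $\mathbb{R}^3$ are linearly independent, hence span $\mathbb{R}^3$, iff the matrix with those rows has rank 3, equivalently nonzero determinant.

```python
import numpy as np

V = np.array([[1, 2, 0],
              [4, 0, 5],
              [6, 4, 3]], dtype=float)

print(np.linalg.matrix_rank(V))  # 3: the rows are linearly independent
print(np.linalg.det(V))          # 16.0, nonzero

# By contrast, the vectors of Example 5.13 are dependent:
W = np.array([[1, 2, 3], [4, 5, 6], [2, 1, 0]], dtype=float)
print(np.linalg.matrix_rank(W))  # 2: they only span a plane
```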

Example 5.15. Given the system of one linear equation $x - y + 2z = 5$, its solutions form the plane $\{(5 + s - 2t, s, t), s, t \in \mathbb{R}\}$ in $\mathbb{R}^3$. This is equivalent to the map $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}^3$, $(s, t) \mapsto (5 + s - 2t, s, t)$. We can also describe this plane as $\{(5, 0, 0) + s(1, 1, 0) + t(-2, 0, 1), s, t \in \mathbb{R}\}$, essentially spanned by $(1, 1, 0)$ and $(-2, 0, 1)$ up to translation.

Example 5.16. Any pair of the indeterminates in Exercise 5.4 is linearly independent.

Definition 5.17. A subset $B = \{v_i\}_{i \in I}$, $I$ an indexing set and $v_i \in V$, is called a basis for $V$ if $B$ is linearly independent and $\mathrm{Span}(B) = V$.

By definition, every $v \in V$ can be written as a linear combination $v = \sum_{i \in I} \alpha_i v_i$ of members of a basis $B = \{v_1, \ldots, v_n\}$, and such a representation is unique by linear independence of $B$. Sometimes we write $v = (\alpha_i)$ in its coordinate form if the $v_i$ are ordered. One can imagine that $v$ has different representations and different coordinate forms in different bases.

Example 5.18. If we choose $B = \{(1, 0), (0, 1)\}$ as a basis for $\mathbb{R}^2$ then $v = (3, 4)$ can be written as $3(1, 0) + 4(0, 1)$ with coordinate form $(3, 4)$. If $B = \{(0, 1), (1, 0)\}$ then still $v = 3(1, 0) + 4(0, 1) = 4(0, 1) + 3(1, 0)$ but its coordinate form is now $(4, 3)$. If $B = \{(1, 0), (0, 2)\}$ is chosen then $v = 3(1, 0) + 2(0, 2)$, so $v = (3, 2)$ in its coordinate form.

Exercise 5.19. Find the representation and coordinate form of $(3, 4)$ in the basis $B = \{(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}), (-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})\}$.


The previous examples show no basis is more special than the rest; only some bases are nicer for computation than others. Moreover, the order within each basis affects the coordinate form. What is true is that every vector space has a basis $B = \{v_i, i \in I\}$ and all bases of $V$ have the same size, though that size may be infinite. Now we can associate the first invariant to each vector space $V$ over a field $F$, generalizing the notion of dimension that we often speak of for $\mathbb{R}^n$.

Definition 5.20. We define the dimension $\dim_F(V)$ of $V$ as the size of any basis for $V$ over $F$.

Example 5.21. The unit vectors $e_1 = (1, 0, \ldots, 0), e_2 = (0, 1, 0, \ldots, 0), \ldots, e_n = (0, \ldots, 0, 1)$ together form a basis for $\mathbb{R}^n$, since they are clearly linearly independent and any $v = (\alpha_1, \ldots, \alpha_n)$ can be written as $\alpha_1(1, 0, \ldots, 0) + \ldots + \alpha_n(0, \ldots, 1)$. The dimension of $\mathbb{R}^n$ is $n$, as conventionally known.

Example 5.22. The indeterminates $s, t, u, v, w$ together form a basis for our vector space $S$ over $\mathbb{C}$ in Exercise 5.4. Its dimension is 5. Viewed as a vector space over $\mathbb{R}$, however, $S$ has dimension 10, since one of its bases is $s, is, t, it, u, iu, v, iv, w, iw$.

Exercise 5.23. Show that the vector space $N$ in Example 5.5 has infinite dimension. In fact, any of its bases must be uncountable.

5.2. Inner Product. This section focuses on vector spaces over $\mathbb{R}$. If they are to enjoy some sort of multiplication $V \times V \xrightarrow{\langle \cdot, \cdot \rangle} \mathbb{R}$, we must expect the following.

Definition 5.24. An inner product on a vector space $V$ over $\mathbb{R}$ is any function $V \times V \xrightarrow{\langle \cdot, \cdot \rangle} \mathbb{R}$ that satisfies the following axioms:

(1) (symmetry) $\langle u, v \rangle = \langle v, u \rangle$.
(2) (linearity) $\langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle$.
(3) (positive definiteness) $\langle u, u \rangle \geq 0$ for all $u \in V$, with equality iff $u = 0$.

A vector space $V$ over $\mathbb{R}$ equipped with an inner product is called an inner product space. Note that the product of two vectors is a scalar in $\mathbb{R}$. We can turn $\mathbb{R}^n$ into an inner product space as follows.

Definition 5.25. Given two vectors $u = (u_1, \ldots, u_n), v = (v_1, \ldots, v_n) \in \mathbb{R}^n$ we define their inner product to be $\langle u, v \rangle = \sum_{i=1}^n u_i v_i = u_1 v_1 + \ldots + u_n v_n$.

One can verify that this newly minted product satisfies the above three axioms. Later on we will also define an outer product for $\mathbb{R}^n$. Meanwhile, let us see some examples.

Example 5.26. The inner product of a vector $u = (u_1, \ldots, u_n) \in \mathbb{R}^n$ with the $i$th unit vector $e_i$ picks out its $i$th coordinate $u_i$. This actually induces a linear function $g_{e_i}: \mathbb{R}^n \to \mathbb{R}$, $(u_1, \ldots, u_n) \mapsto u_i$.

Example 5.27. If $u = (u_1, \ldots, u_n) \in \mathbb{R}^n$ then $\langle u, u \rangle = u_1^2 + \ldots + u_n^2$.

Example 5.28. If $a_i, b_i \in \{0, 1\}$ and $a = (a_1, \ldots, a_n), b = (b_1, \ldots, b_n) \in \mathbb{R}^n$ then $\langle a, b \rangle$ is the number of indices $i$ where $a_i = b_i = 1$.
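Definition 5.25 is numpy's dot product, so the examples above can be checked directly; a small sketch with made-up vectors:

```python
import numpy as np

u = np.array([1.0, -2.0, 3.0])
v = np.array([4.0, 0.0, -1.0])

print(np.dot(u, v))         # u1*v1 + ... + un*vn = 4 + 0 - 3 = 1.0
print(np.dot(u, u))         # Example 5.27: 1 + 4 + 9 = 14.0

e2 = np.array([0.0, 1.0, 0.0])
print(np.dot(u, e2))        # Example 5.26: picks out u_2 = -2.0

a = np.array([1, 0, 1, 1])  # Example 5.28 with 0/1 vectors:
b = np.array([1, 1, 0, 1])
print(np.dot(a, b))         # 2 indices where both entries are 1
```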


Following is a nice statement about inner products and linear functions as seen in Example 5.5.

Theorem 5.29. For any $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$ the function $f_a: \mathbb{R}^n \to \mathbb{R}$, $u = (u_1, \ldots, u_n) \mapsto \langle a, u \rangle$, is linear. Conversely, any linear function $f$ equals $f_a$ for some $a \in \mathbb{R}^n$.

Proof. That $f_a = \langle a, \cdot \rangle$ is linear follows from linearity of the inner product. Conversely, for any linear function $f: \mathbb{R}^n \to \mathbb{R}$ and any $u = (u_1, \ldots, u_n) \in \mathbb{R}^n$ we have $f(u) = f(u_1 e_1 + \ldots + u_n e_n) = u_1 f(e_1) + \ldots + u_n f(e_n) = \langle a, u \rangle = f_a(u)$ where $a = (f(e_1), \ldots, f(e_n))$.

What a linear function $f: \mathbb{R}^n \to \mathbb{R}$ does to the basis $\{e_1, \ldots, e_n\}$, or to any other basis, completely determines its whole behavior. We will return to this later. Here are two nice results.

Corollary 5.30. The space $L$ of all linear functions from $\mathbb{R}^n$ to $\mathbb{R}$ has dimension $n$.

Proof. It follows from the theorem that any linear function $f = f_a = \langle a, \cdot \rangle$ with $a = (f(e_1), \ldots, f(e_n))$ satisfies $f = f(e_1) g_{e_1} + \ldots + f(e_n) g_{e_n}$, where $g_{e_i}$ was defined in Example 5.26. Furthermore, these $\{g_{e_1}, \ldots, g_{e_n}\}$ are linearly independent, hence they form a basis for $L$. Summarily, every basis $B = \{b_1, \ldots, b_n\}$ for $\mathbb{R}^n$ corresponds to a basis $B' = \{g_{b_1}, \ldots, g_{b_n}\}$ for $L$. Hence $\mathbb{R}^n$ and $L$ share the same dimension $n$.

We now look at some examples.

Example 5.31. Taking the average of the coordinates of a vector $x$, $f(x) = (x_1 + \ldots + x_n)/n$, is linear. Surely $f = f_a$ where $a = (1/n, \ldots, 1/n)$.

Example 5.32. Taking the maximum of the coordinates of a vector $x$, $f(x) = \max\{x_1, \ldots, x_n\}$, is not linear. To see this, pick $n = 2$, $x = (1, -1)$, $y = (-1, 1)$. Then $f(x + y) \neq f(x) + f(y)$. Hence it cannot be represented by any inner product.
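Theorem 5.29 in code: recover $a = (f(e_1), \ldots, f(e_n))$ from a linear function and confirm $f = f_a$, using the averaging function of Example 5.31. A sketch:

```python
import numpy as np

n = 4
def f(x):
    return np.mean(x)               # Example 5.31: averaging, a linear function

# a = (f(e_1), ..., f(e_n)), as in the proof of Theorem 5.29
a = np.array([f(np.eye(n)[i]) for i in range(n)])
print(a)                            # [0.25 0.25 0.25 0.25] = (1/n, ..., 1/n)

x = np.array([3.0, -1.0, 2.0, 4.0])
print(f(x), np.dot(a, x))           # both 2.0: f = f_a

# The max of Example 5.32 is not linear, so it admits no such a:
x, y = np.array([1.0, -1.0]), np.array([-1.0, 1.0])
print(np.max(x + y), np.max(x) + np.max(y))   # 0.0 vs 2.0
```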

Another nice result from Theorem 5.29 is the following statement about the space $M$ of all affine functions from $\mathbb{R}^n$ to $\mathbb{R}$.

Corollary 5.33. The space $M$ of all affine functions from $\mathbb{R}^n$ to $\mathbb{R}$ has dimension $n + 1$.

Proof. This follows from the definition of affine functions in Example 5.5 and the previous corollary.

While $M$ is much smaller than $N$, every continuously differentiable function $f \in N$ has a good affine approximation in $M$. Recall that if $f: \mathbb{R}^n \to \mathbb{R}$, $x = (x_1, \ldots, x_n) \mapsto f(x)$, is continuously differentiable then we can take the continuous partial derivatives $\frac{\partial f(x)}{\partial x_i}$, $i = 1, \ldots, n$, and form its gradient $\nabla f(x) = (\partial f(x)/\partial x_1, \ldots, \partial f(x)/\partial x_n)$. The first-order Taylor approximation of $f$ near $x_0$ is defined as $f_{\mathrm{aff}}(x) = \sum_{i=1}^n \frac{\partial f(x_0)}{\partial x_i}(x_i - (x_0)_i) + f(x_0) = \langle \nabla f(x_0), x - x_0 \rangle + f(x_0)$. This function $f_{\mathrm{aff}}$ is certainly affine and gives a good approximation of $f(x)$ when $x$ is near $x_0$.

Example 5.34. When $n = 1$ this is none other than the usual Taylor approximation $f_{\mathrm{aff}}(x) = f'(x_0)(x - x_0) + f(x_0)$ we often see.


Example 5.35. Consider $f: \mathbb{R}^2 \to \mathbb{R}$, $(x_1, x_2) \mapsto e^{x_1 + x_2 - 1} + e^{x_1 - x_2 - 1} + e^{-x_1 - 1}$. Then $\nabla f(x) = (e^{x_1 + x_2 - 1} + e^{x_1 - x_2 - 1} - e^{-x_1 - 1}, \; e^{x_1 + x_2 - 1} - e^{x_1 - x_2 - 1})$. At $(0, 0)$, $\nabla f((0, 0)) = (1/e, 0)$. Hence the first-order Taylor approximation of $f$ near $(0, 0)$ is $f_{\mathrm{aff}}(x) = \langle \nabla f((0, 0)), x - (0, 0) \rangle + f((0, 0)) = x_1/e + 3/e$.
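Example 5.35 can be checked numerically; this sketch compares $f$ with its first-order Taylor approximation near $(0, 0)$:

```python
import numpy as np

def f(x):
    return np.exp(x[0] + x[1] - 1) + np.exp(x[0] - x[1] - 1) + np.exp(-x[0] - 1)

grad0 = np.array([1 / np.e, 0.0])   # gradient of f at (0, 0), as computed above
f0 = 3 / np.e                        # f((0, 0))

def f_aff(x):
    return np.dot(grad0, x) + f0     # = x_1/e + 3/e

for x in [np.array([0.0, 0.0]), np.array([0.1, -0.1]), np.array([0.5, 0.5])]:
    print(x, f(x), f_aff(x))         # close near (0,0), drifts apart farther away
```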

5.3. Norm. Given a vector space $V$ over $\mathbb{R}$ we want to make precise how large each vector $v \in V$ is. Of course, there are different ways to do so, but they all must meet certain expectations.

Definition 5.36. A norm on $V$ over $\mathbb{R}$ is a function $\|\cdot\|: V \to \mathbb{R}$ that satisfies:

(1) (positive definiteness) $\|v\| \geq 0$, with equality iff $v = 0$.
(2) (homogeneity) $\|\alpha v\| = |\alpha| \, \|v\|$.
(3) (triangle inequality) $\|u + v\| \leq \|u\| + \|v\|$.

Any vector space $V$ over $\mathbb{R}$ with a norm is called a normed space.

Example 5.37. One obvious way to define a norm on $\mathbb{R}^n$ is $\|(u_1, \ldots, u_n)\| = |u_1| + \ldots + |u_n|$. This is called the 1-norm and denoted $\|\cdot\|_1$.

Example 5.38. Another norm we have on $\mathbb{R}^n$ is the usual Euclidean norm $\|(u_1, \ldots, u_n)\| = \sqrt{u_1^2 + \ldots + u_n^2} = \sqrt{\langle u, u \rangle}$. It is called the 2-norm and denoted $\|\cdot\|_2$. The Euclidean norm is related to the root mean square (RMS, instead of mean or mean square) value of a vector $u$, defined as $\mathrm{RMS}(u) = \sqrt{\frac{1}{n}(u_1^2 + \ldots + u_n^2)} = \frac{1}{\sqrt{n}}\|u\|$. This quantity roughly tells us the typical magnitude of the coordinates $u_i$ with respect to $n$.

Example 5.39. More generally we define the $p$-norm on $\mathbb{R}^n$ as $\|(u_1, \ldots, u_n)\|_p = \sqrt[p]{|u_1|^p + \ldots + |u_n|^p}$ for any $1 \leq p < \infty$.

Example 5.40. We define the $\infty$-norm on $\mathbb{R}^n$ as $\|(u_1, \ldots, u_n)\|_\infty = \max\{|u_1|, \ldots, |u_n|\}$. It is denoted $\|\cdot\|_\infty$.

Exercise 5.41. Draw all the vectors of norm 1 in $\mathbb{R}^2$, where the norm is the 1-norm, the $p$-norm for $1 < p < 2$, the 2-norm, the $p$-norm for $2 < p < \infty$, and the $\infty$-norm.

Exercise 5.42. Given a measure space $(X, \mathcal{F}, \mu)$, consider

$L(X, \mu) = \{$all measurable $f: X \to \mathbb{R}$ such that $\int_X |f|^p \, d\mu < \infty\}$.

Show that we can define a norm $\|\cdot\|_p$, $f \mapsto \left(\int_X |f|^p \, d\mu\right)^{1/p}$, for $L(X, \mu)$ as follows:

(a) (positive semidefiniteness) $\|f\|_p \geq 0$, with equality iff $f$ is 0 almost everywhere.
(b) (homogeneity) $\|\alpha f\|_p = |\alpha| \, \|f\|_p$.
(c) (triangle inequality) $\|f + g\|_p \leq \|f\|_p + \|g\|_p$.
(d) (positive definiteness) Let $L' = L(X, \mu)$ modulo those functions that are 0 almost everywhere; then $\|\cdot\|_p$ is a norm on $L'$. Together with this norm, $L'$ is called the $L^p$ space in analysis.

Example 5.43. One more norm we can give $\mathbb{R}^n$ is $\|u\|_w = \sqrt{(u_1/w_1)^2 + \ldots + (u_n/w_n)^2}$ for some weights $w = (w_1, \ldots, w_n)$. What this norm does is assign different weights $w_1, \ldots, w_n$ to the coordinates $u_1, \ldots, u_n$ of $u$. The Euclidean norm is now a special case of the weighted norm, where each coordinate is given the same weight 1. Moreover, when each coordinate of $u$ is a physical quantity with a unit, the weights are often of the same units, so that $u$ has a unitless norm.
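numpy.linalg.norm implements the 1-, 2-, $p$-, and $\infty$-norms of Examples 5.37-5.40, and the weighted norm of Example 5.43 is a one-liner on top of it; an illustrative sketch:

```python
import numpy as np

u = np.array([3.0, -4.0, 0.0])

print(np.linalg.norm(u, 1))       # 1-norm:  |3| + |-4| + |0| = 7.0
print(np.linalg.norm(u, 2))       # 2-norm:  sqrt(9 + 16) = 5.0
print(np.linalg.norm(u, 3))       # p-norm with p = 3
print(np.linalg.norm(u, np.inf))  # infinity-norm: max|u_i| = 4.0
print(np.linalg.norm(u) / np.sqrt(len(u)))  # RMS value from Example 5.38

w = np.array([1.0, 2.0, 4.0])     # weights, as in Example 5.43
print(np.linalg.norm(u / w))      # weighted norm sqrt((u1/w1)^2 + ...)
```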

We just saw that $\|u\|_2^2 = \langle u, u \rangle$ on $\mathbb{R}^n$. In general, any inner product $\langle \cdot, \cdot \rangle$ on a vector space $V$ over $\mathbb{R}$ induces a norm $\|u\| = \sqrt{\langle u, u \rangle}$. Such a definition easily clears the first two axioms of being a norm due to the properties of the inner product. We establish a nice theorem that implies the triangle inequality axiom for such a norm.

Theorem 5.44 (Cauchy-Schwarz Inequality). For all $u, v$ in an inner product space $V$ with induced norm $\|\cdot\|$ we have $|\langle u, v \rangle| \leq \|u\| \, \|v\|$. Moreover, equality holds iff $u, v$ are linearly dependent.

Proof. If either $u$ or $v$ is 0 then the inequality holds. Else consider the quadratic polynomial $p(t) = \|tu + v\|^2 = \langle tu + v, tu + v \rangle = \langle u, u \rangle t^2 + 2\langle u, v \rangle t + \langle v, v \rangle$. Being a squared norm, $p(t) \geq 0$, hence its discriminant $4\langle u, v \rangle^2 - 4\langle u, u \rangle \langle v, v \rangle \leq 0$, or $\langle u, v \rangle^2 \leq \langle u, u \rangle \langle v, v \rangle$. Equality holds iff the discriminant is 0, iff $p(t) = 0$ for some $t$, iff $ut + v = 0$ for some $t$, iff $v$ is a multiple of $u$. Do you know why we considered such a polynomial $p(t)$, and how it shows us exactly what we needed to see?

Corollary 5.45 (Triangle Inequality). The induced norm $\|\cdot\|$ from an inner product on $V$ satisfies the triangle inequality $\|u + v\| \leq \|u\| + \|v\|$ for all $u, v \in V$.

Proof. Clearly $\|u + v\|^2 = \langle u + v, u + v \rangle = \langle u, u \rangle + 2\langle u, v \rangle + \langle v, v \rangle = \|u\|^2 + 2\langle u, v \rangle + \|v\|^2 \leq \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 = (\|u\| + \|v\|)^2$, from which our claim follows.

    Thus an inner product on V induces a norm. A norm in turn induces distance.

Definition 5.46. For any vectors $u, v$ in a vector space $V$ equipped with a norm $\|\cdot\|$, we define the distance between them to be $\mathrm{dist}(u, v) = \|u - v\|$.

Example 5.47. All the above norms on $\mathbb{R}^n$ induce different distances between vectors in $\mathbb{R}^n$, though some of them are rather counterintuitive.

5.4. Angle between Vectors. As an inner product $\langle \cdot, \cdot \rangle$ induces a norm for $V$, we want to relate $\langle u, v \rangle$ and $\|u\|, \|v\|$ for any pair $u, v \in V$. For example, when $u = v \in V$ then $\langle u, v \rangle = \|u\|\|v\| = \|u\|^2$, and when $u = -v$ then $\langle u, v \rangle = -\|u\|\|v\| = -\|u\|^2$. Hence we suspect the ratio $\frac{\langle u, v \rangle}{\|u\|\|v\|}$ bears some correlation between $u$ and $v$, perhaps how $u$ lines up against $v$.

Definition 5.48. For nonzero $u, v$ in a vector space $V$ with inner product $\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$ we define their correlation coefficient $\rho(u, v) = \frac{\langle u, v \rangle}{\|u\|\|v\|}$.

This correlation coefficient, viewed as a function $V \times V \to \mathbb{R}$, is surely symmetric. It ranges between $-1$ and $1$ by the Cauchy-Schwarz inequality. Two vectors $u, v$ are said to be correlated if $\rho(u, v)$ is close to 1, uncorrelated if $\rho(u, v)$ is close to 0, and anticorrelated if $\rho(u, v)$ is close to $-1$.

Example 5.49. Consider $u = (0.1, 0.3, 1.3, 0.3, 3.3)$, $v = (0.2, 0.4, 3.2, 0.8, 5.2)$, $w = (-1.8, -1.0, 0.6, 1.4, -0.2)$. Then $\|u\| = 3.57$, $\|v\| = 6.17$, $\|w\| = 2.57$, $\langle u, v \rangle = 21.7$, $\langle u, w \rangle = 0.06$, $\rho(u, v) = 0.98$, $\rho(u, w) = 0.007$. Therefore $u$ and $v$ are far more correlated than $u$ and $w$ are ($v$ is roughly twice $u$).
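Here is a numeric check of Example 5.49 (with the signs of $w$ as reconstructed above); $\rho$ is a one-liner on top of numpy:

```python
import numpy as np

def rho(u, v):
    """Correlation coefficient of Definition 5.48."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([0.1, 0.3, 1.3, 0.3, 3.3])
v = np.array([0.2, 0.4, 3.2, 0.8, 5.2])
w = np.array([-1.8, -1.0, 0.6, 1.4, -0.2])

print(np.linalg.norm(u), np.linalg.norm(v), np.linalg.norm(w))  # 3.57 6.17 2.57
print(np.dot(u, v), np.dot(u, w))  # 21.7  0.06
print(rho(u, v), rho(u, w))        # 0.98  0.007: u, v correlated; u, w uncorrelated
```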


Example 5.50. (Throwback to probability theory) The set $R = \{$all real-valued random variables $X: (\Omega, \mathcal{F}, P) \to \mathbb{R}\}$ forms a vector space over $\mathbb{R}$. If we define an inner product for this space as $\langle X, Y \rangle = E((X - \mu_X)(Y - \mu_Y))$ then $\langle X, Y \rangle$ is none other than $\mathrm{cov}(X, Y)$, $\langle X, X \rangle = \|X\|^2$ is none other than $\mathrm{var}(X)$, $\|X\|$ is none other than $\sigma_X$, and the correlation coefficient $\rho(X, Y)$ is none other than the usual correlation coefficient $\mathrm{corr}(X, Y)$ known in probability theory.

The Cauchy-Schwarz inequality is also useful as it helps us define the angle between two vectors.

Definition 5.51. In an inner product space $V$ we define the angle between two vectors $u$ and $v$ to be $\angle(u, v) = \arccos(\rho(u, v)) = \arccos\left(\frac{\langle u, v \rangle}{\|u\|\|v\|}\right)$.

    Example 5.52. We can now speak of angle between two random variables X, Y.

This definition agrees with the usual notion of angle between vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$, while generalizing to $\mathbb{R}^n$, $n > 3$. If $\angle(u, v) = 0°$ we say $u, v$ are aligned, i.e. they have correlation coefficient 1, or each vector is a positive multiple of the other. If $0° < \angle(u, v) < 90°$ we say $u, v$ make an acute angle, i.e. they have positive correlation coefficient. If $\angle(u, v) = 90°$ we say $u, v$ are orthogonal, i.e. they are uncorrelated, and we write $u \perp v$. If $90° < \angle(u, v) < 180°$ we say $u, v$ make an obtuse angle, i.e. they have negative correlation coefficient. Lastly, if $\angle(u, v) = 180°$ we say $u, v$ are antialigned, i.e. they have correlation coefficient $-1$ and each vector is a negative multiple of the other. We are most interested in orthogonal vectors because they make excellent bases.

Definition 5.53. A vector $v$ in a normed space $V$ is called a unit vector if $\|v\| = 1$. Any vector $v \in V$ can be easily normalized and replaced by the aligned unit vector $\frac{v}{\|v\|}$.

Definition 5.54. A collection of vectors $\{v_1, \ldots, v_n\}$ in an inner product space $V$ with induced norm is said to be orthonormal if each $v_i$ is a unit vector and $v_i \perp v_j$ for any $i \neq j$.

If $v = \alpha_1 v_1 + \ldots + \alpha_n v_n$ is a linear combination of an orthonormal collection $\{v_1, \ldots, v_n\}$ then surely $\langle v_i, v \rangle = \langle v_i, \alpha_1 v_1 + \ldots + \alpha_n v_n \rangle = \sum_{j=1}^n \langle v_i, \alpha_j v_j \rangle = \alpha_i \cdot 1 = \alpha_i$. So taking the inner product with $v_i$ yields the $i$th coefficient in the linear combination for $v$. This implies any orthonormal collection $\{v_1, \ldots, v_n\}$ is linearly independent, which is most useful when $n = \dim(V)$ and the $\{v_1, \ldots, v_n\}$ form a basis for $V$.

Example 5.55. Both collections $A = \{(1, 0, 0), (0, 1, 0)\}$ and $B = \{(0, 0, 1), (1/\sqrt{2}, 1/\sqrt{2}, 0), (1/\sqrt{2}, -1/\sqrt{2}, 0)\}$ are orthonormal, while $C = \{(1, -1, 0), (1, 1, 1)\}$ can be normalized and completed into an orthonormal basis. Write $(2, 7, 3)$ as a linear combination in $B$ and in $C$.
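Extracting the coefficients in the orthonormal basis $B$ of Example 5.55 is just three inner products; a sketch:

```python
import numpy as np

s = 1 / np.sqrt(2)
B = [np.array([0.0, 0.0, 1.0]),
     np.array([s, s, 0.0]),
     np.array([s, -s, 0.0])]       # orthonormal basis B of Example 5.55

v = np.array([2.0, 7.0, 3.0])

# alpha_i = <v_i, v> recovers the coefficients (see the computation above)
alphas = [np.dot(b, v) for b in B]
print(alphas)                       # [3.0, 9/sqrt(2), -5/sqrt(2)]

# Reassembling the linear combination returns v:
print(sum(a * b for a, b in zip(alphas, B)))  # [2. 7. 3.]
```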

    5.5. Exercises.

    Exercise 5.56. page 18: 1.2, 1.7, 1.9, 1.10, 1.11, 1.12, 1.15, 1.16, 1.17, 1.21

    6. Linear Maps and Matrices

6.1. Morphisms between Vector Spaces. We step back from vectors within a vector space $V/F$ and consider the relationships between finite dimensional vector spaces over


the same field $F$. Given two vector spaces $V, W$ we would expect any map between them to respect their linear structures.

Definition 6.1. A map $V \xrightarrow{f} W$ between two vector spaces $V, W$ over $F$ is called linear if $f(\alpha u + \beta v) = \alpha f(u) + \beta f(v)$ for all $u, v \in V$ and $\alpha, \beta \in F$.

Example 6.2. The trivial map $V \xrightarrow{f} W$, $v \mapsto 0$, between any two vector spaces is certainly linear.

Example 6.3. The map $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}^3$, $(x, y) \mapsto (x, y, 0)$, is linear. This is how we embed $\mathbb{R}^2$ into $\mathbb{R}^3$ as seen before. In similar fashion can $\mathbb{R}^m$ be embedded into $\mathbb{R}^n$, $m < n$.

The most notable thing about a linear map $V \xrightarrow{f} W$ is that it is completely determined by what it does to a basis $B = \{v_1, \ldots, v_n\}$ of $V$. More precisely, if we write $v = a_1 v_1 + \ldots + a_n v_n \in V$ then $f(v) = a_1 f(v_1) + \ldots + a_n f(v_n)$ by linearity of $f$. If we also choose a basis $C = \{w_1, \ldots, w_m\}$ for $W$ then we can write $f(v_i) = b_{1i} w_1 + \ldots + b_{mi} w_m = (b_{1i}, \ldots, b_{mi})^t$. Hence

$f(v) = a_1 f(v_1) + \ldots + a_n f(v_n) = a_1 (b_{11}, \ldots, b_{m1})^t + \ldots + a_n (b_{1n}, \ldots, b_{mn})^t = (a_1 b_{11} + \ldots + a_n b_{1n}, \; \ldots, \; a_1 b_{m1} + \ldots + a_n b_{mn})^t = \begin{pmatrix} b_{11} & \ldots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{m1} & \ldots & b_{mn} \end{pmatrix} (a_1, \ldots, a_n)^t$

if we define multiplication between an $m \times n$ matrix and an $n \times 1$ matrix as such (we have used the transpose notation $t$ here without having introduced it yet). We state this in a theorem.

Theorem 6.4. Any linear map $V \xrightarrow{f} W$ can be represented by a matrix $A_f$ once $V, W$ are given bases. Conversely, any $m \times n$ matrix $A$ gives a linear map between a vector space $V/F$ of dimension $n$ and a vector space $W/F$ of dimension $m$.

Proof. It remains to show that as a map, $A(\alpha v + \beta v') = \alpha A(v) + \beta A(v')$, but this follows from the definition of matrix multiplication above.

Domain $V$ and codomain $W$ for $A$ are often understood to be $F^n$ with the canonical basis $\{e_1 = (1, 0, \ldots, 0), \ldots, e_n = (0, \ldots, 0, 1)\}$ and $F^m$ with $\{e_1, \ldots, e_m\}$. Let us look at some examples.
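Theorem 6.4 in code: the matrix of a linear map is assembled column by column from the images of the basis vectors. A sketch with a made-up map $f: \mathbb{R}^3 \to \mathbb{R}^2$:

```python
import numpy as np

# A linear map R^3 -> R^2, given as a function (made up for illustration):
def f(x):
    return np.array([x[0] + 2 * x[1], 3 * x[2] - x[0]])

n = 3
# Column i of A_f is f(e_i), per the discussion above:
A = np.column_stack([f(np.eye(n)[i]) for i in range(n)])
print(A)
# [[ 1.  2.  0.]
#  [-1.  0.  3.]]

x = np.array([1.0, -2.0, 0.5])
assert np.allclose(A @ x, f(x))   # the matrix reproduces the map
```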

Example 6.5. The zero map $V \xrightarrow{0} W$, $v \mapsto 0$, is linear; it is represented by the zero matrix with respect to any bases for $V$ and $W$.

Example 6.6. The identity map $V \xrightarrow{\mathrm{id}_V} V$, $v \mapsto v$, is linear; it is represented by the identity matrix $I \in M(n, F)$ with respect to any basis for $V$. More generally, scaling by $\lambda \in F$ is linear and represented by the matrix $\lambda I \in M(n, F)$ with respect to any basis for $V$.

Example 6.7. Reflection across any line in the plane $\mathbb{R}^2$ is linear. If we choose an orthonormal basis $\{u_1, u_2\}$ such that $r(u_1) = u_2$ and $r(u_2) = u_1$ then clearly it is represented as $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Similarly, reflection across $u_1$ is linear and represented by the matrix $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, while projection onto $u_1$ is represented by $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$.


Example 6.8. For any linear map $V \xrightarrow{A} W$ with respect to a basis $\{v_1, \ldots, v_n\}$ for $V$, $A(v_i)$ equals the $i$th column of $A$, as seen below:

$\begin{pmatrix} c_{11} & \ldots & c_{1i} & \ldots & c_{1n} \\ \vdots & & \vdots & & \vdots \\ c_{n1} & \ldots & c_{ni} & \ldots & c_{nn} \end{pmatrix} \begin{pmatrix} 0 \\ \vdots \\ 1_i \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} c_{1i} \\ \vdots \\ c_{ni} \end{pmatrix}.$

6.2. Kernel, Nullity, Image, Rank. The first thing we look at in each linear map $V \xrightarrow{f} W$ is what it destroys in $V$ and what it reaches in $W$.

Definition 6.9. For any linear map $V \xrightarrow{f} W$ we define $\ker(f) = \{v \in V$ such that $f(v) = 0\}$. The number $\dim(\ker(f))$ is called the nullity of $f$.

Definition 6.10. For any linear map $V \xrightarrow{f} W$ we define $\mathrm{im}(f) = \{w \in W$ such that $w = f(v)$ for some $v \in V\}$. The number $\dim(\mathrm{im}(f))$ is called the rank of $f$.

Clearly $f$ is injective iff $\mathrm{nullity}(f) = 0$ and $f$ is surjective iff $\mathrm{rank}(f) = \dim(W)$.

Furthermore, if $\{v_1, \ldots, v_k\}$ is a basis for $\ker(f)$ then it can be completed to a basis $\{v_1, \ldots, v_k, v_{k+1}, \ldots, v_n\}$ for $V$. Write any $v = \alpha_1 v_1 + \ldots + \alpha_k v_k + \alpha_{k+1} v_{k+1} + \ldots + \alpha_n v_n \in V$; then $f(v) = \alpha_1 f(v_1) + \ldots + \alpha_k f(v_k) + \alpha_{k+1} f(v_{k+1}) + \ldots + \alpha_n f(v_n) = \alpha_{k+1} f(v_{k+1}) + \ldots + \alpha_n f(v_n)$. Therefore $\mathrm{im}(f) = \mathrm{Span}(f(v_{k+1}), \ldots, f(v_n))$. While $\dim(\ker(f)) = k$, we see $\dim(\mathrm{im}(f)) \leq n - k$. Equality follows from the following theorem.

Theorem 6.11 (Rank-Nullity). If $V \xrightarrow{f} W$ is a linear map then $\dim(V) = \mathrm{nullity}(f) + \mathrm{rank}(f)$.

Proof. It remains to show $\{f(v_{k+1}), \ldots, f(v_n)\}$ are linearly independent and thus form a basis for $\mathrm{im}(f)$. Suppose $\alpha_{k+1} f(v_{k+1}) + \ldots + \alpha_n f(v_n) = 0$ for some $\alpha_{k+1}, \ldots, \alpha_n \in F$; then $f(\alpha_{k+1} v_{k+1} + \ldots + \alpha_n v_n) = 0$. So $\alpha_{k+1} v_{k+1} + \ldots + \alpha_n v_n \in \ker(f)$ and we can write $\alpha_{k+1} v_{k+1} + \ldots + \alpha_n v_n = \alpha_1 v_1 + \ldots + \alpha_k v_k$ for some $\alpha_1, \ldots, \alpha_k \in F$. Since the $v_i$ form a basis for $V$, the $\alpha_i$ must all be 0. In particular, $\alpha_{k+1} = \ldots = \alpha_n = 0$.

Example 6.12. The zero map $V \xrightarrow{0} W$ has $\mathrm{nullity}(0) = \dim(V)$ and $\mathrm{rank}(0) = 0$, while scaling $V \xrightarrow{s} V$ by $\lambda \neq 0$ has $\mathrm{nullity}(s) = 0$ and $\mathrm{rank}(s) = \dim(V)$.

Example 6.13. The reflections in Example 6.7 have nullity 0 and rank 2. On the other hand, projection onto $u_1$ has nullity 1 and rank 1. In general, if $\{u_1, \ldots, u_k\}$ are linearly independent in $V$ then projection onto $\mathrm{Span}(u_1, \ldots, u_k)$ has rank $k$ and nullity $\dim(V) - k$.

It follows that $V \xrightarrow{f} W$ is bijective iff $\dim(V) = \mathrm{rank}(f) = \dim(W)$. In that case we can define a map $W \to V$, $w \mapsto v$, where $v$ is the unique element that $f$ maps to $w$. This map is linear. We denote it as $f^{-1}$, call $f$ a linear isomorphism, and write $V \cong W$.

Exercise 6.14. Show that if a linear map $V \xrightarrow{f} W$ is bijective then $f^{-1}$ is indeed linear. Hint: show that $f^{-1}(\alpha w + \beta w') = \alpha f^{-1}(w) + \beta f^{-1}(w')$ for all $\alpha, \beta \in F$ and $w, w' \in W$.

Exercise 6.15. Define a bijection between the space in Exercise 5.4 and $\mathbb{R}^5$. Define a non-bijection between them.


If we view each matrix $A \in M(m, n, F)$ as a linear map $F^n \xrightarrow{A} F^m$ then, from Example 6.8 and Theorem 6.11, $n - k$ of its columns will form a basis for its image while the other $k$ columns are linearly dependent upon those. Looking closely at the columns of a matrix reveals information about that matrix.

Example 6.16. The matrix $\begin{pmatrix} 1 & 2 & 5 \\ 1 & 0 & 1 \\ 2 & -1 & 0 \\ 0 & 1 & 2 \end{pmatrix}$ has rank 2 and nullity 1. It is neither injective nor surjective.
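numpy can confirm the rank and nullity claims of Example 6.16; the sketch below uses the matrix as reconstructed above.

```python
import numpy as np

A = np.array([[1, 2, 5],
              [1, 0, 1],
              [2, -1, 0],
              [0, 1, 2]], dtype=float)

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank          # Rank-Nullity: dim of domain minus rank
print(rank, nullity)                 # 2 1

# Indeed the third column is c1 + 2*c2, a linear dependence:
print(np.allclose(A[:, 2], A[:, 0] + 2 * A[:, 1]))  # True
```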

6.3. Matrix Operations. As any linear map between vector spaces over $F$ with fixed bases is represented by a matrix, we study matrices more closely. First goes a formal definition of a matrix.

Definition 6.17. A matrix $M$ is a rectangular array $(a_{ij})_{m \times n}$ of $m$ rows and $n$ columns, where each entry $a_{ij}$ is an element of $F$ in the $i$th row and $j$th column. Sometimes we also use $(A)_{ij}$ for $a_{ij}$. We denote the set of all $m \times n$ matrices over $F$ as $M(m, n, F)$.

Example 6.18. We have:

(1) $M = \begin{pmatrix} 1/2 \\ 2 \\ 2/3 \end{pmatrix}$, $M \in M(3, 1, \mathbb{Q})$, $a_{3,1} = 2/3$.
(2) $M = \begin{pmatrix} \sin(\pi/10) & \cos(\pi/10) \end{pmatrix}$, $M \in M(1, 2, \mathbb{R})$, $a_{1,2} = \cos(\pi/10)$.
(3) $M = \begin{pmatrix} i & e & 1 \\ \pi & 0 & 1/2 \\ \ln 5 & 1 & 3 \end{pmatrix}$, $M \in M(3, 3, \mathbb{C})$, $a_{2,2} = 0$.
(4) $M = \begin{pmatrix} \sin(\pi/10) & \cos(\pi/10) \end{pmatrix}$, $M \in M(1, 2, \mathbb{R})$; the entry $a_{1,3}$ does not exist.

Below are the things we can do with matrices.

6.3.1. Partition a matrix into submatrices, such as $A_{4 \times 5} = \begin{pmatrix} A_{3 \times 3} & A_{3 \times 2} \\ A_{1 \times 3} & A_{1 \times 2} \end{pmatrix}$. This is especially useful when we multiply matrices by blocks without fretting too much about entries.

Example 6.19. If $A \in M(4, 5, F)$ and $B \in M(r + s, 4, F)$ then they can be partitioned into blocks for multiplication as follows:

$\begin{pmatrix} B_{r \times 3} & B_{r \times 1} \\ B_{s \times 3} & B_{s \times 1} \end{pmatrix} \begin{pmatrix} A_{3 \times 3} & A_{3 \times 2} \\ A_{1 \times 3} & A_{1 \times 2} \end{pmatrix} = \begin{pmatrix} B_{r \times 3} A_{3 \times 3} + B_{r \times 1} A_{1 \times 3} & B_{r \times 3} A_{3 \times 2} + B_{r \times 1} A_{1 \times 2} \\ B_{s \times 3} A_{3 \times 3} + B_{s \times 1} A_{1 \times 3} & B_{s \times 3} A_{3 \times 2} + B_{s \times 1} A_{1 \times 2} \end{pmatrix}.$
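Block multiplication as in Example 6.19 can be checked against ordinary multiplication with numpy slices; a sketch with randomly chosen matrices and $r = s = 2$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 5)).astype(float)   # A in M(4, 5)
B = rng.integers(-3, 4, size=(4, 4)).astype(float)   # B in M(r+s, 4), r = s = 2

# Partition as in Example 6.19: A into 3+1 rows, 3+2 columns; B into 2+2 rows, 3+1 columns.
A33, A32 = A[:3, :3], A[:3, 3:]
A13, A12 = A[3:, :3], A[3:, 3:]
Br3, Br1 = B[:2, :3], B[:2, 3:]
Bs3, Bs1 = B[2:, :3], B[2:, 3:]

blockwise = np.block([[Br3 @ A33 + Br1 @ A13, Br3 @ A32 + Br1 @ A12],
                      [Bs3 @ A33 + Bs1 @ A13, Bs3 @ A32 + Bs1 @ A12]])
assert np.allclose(blockwise, B @ A)   # block formula agrees with the plain product
```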

6.3.2. Addition of matrices of the same size. We equip the set $M(m, n, F)$ with addition consistent with addition of the associated linear maps.

Definition 6.20. If $A_{m \times n} = (a_{ij})$ and $B_{m \times n} = (b_{ij})$ then we define $A + B = (c_{ij})$ where $c_{ij} = a_{ij} + b_{ij}$.

Example 6.21. $\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 3 & 0 \\ 0 & -1 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 3 & 2 \\ 0 & 0 & 3 \end{pmatrix}$ in $M(2, 3, \mathbb{R})$.


Proposition 6.22. Matrix addition enjoys the following properties:

(1) $A + B = B + A$.
(2) $(A + B) + C = A + (B + C)$.

Proof. Both statements follow from commutativity and associativity of addition in $F$.

Importantly, one can verify that this addition of matrices $A + B$ corresponds to addition of linear maps $V \xrightarrow{f_A + f_B} W$ once bases are chosen for $V$ and $W$.

6.3.3. Multiplication of matrices. What about composition of linear maps $U \xrightarrow{f} V \xrightarrow{g} W$? If bases are chosen for $U, V, W$ and $A, B, C$ are the corresponding matrices for $f, g, gf$, then we must define the product $BA$ so that $BA = C$ represents $gf$. The picture: the triangle $U \xrightarrow{A} V \xrightarrow{B} W$ commutes with the diagonal arrow $U \xrightarrow{C = BA} W$.

Definition 6.23. If $A = (a_{ij}) \in M(m, n, F)$ and $B = (b_{ij}) \in M(l, m, F)$ then we define $BA = (c_{ij}) \in M(l, n, F)$ where $c_{ij} = b_{i1} a_{1j} + b_{i2} a_{2j} + \ldots + b_{im} a_{mj} = \sum_{k=1}^m b_{ik} a_{kj}$ is the inner product of the $i$th row of $B$ and the $j$th column of $A$.

Proposition 6.24. Consider vector spaces $U, V, W$ over $F$ with dimensions $n, m, l$ and bases $\{u_1, \ldots, u_n\}$, $\{v_1, \ldots, v_m\}$, and $\{w_1, \ldots, w_l\}$. If $U \xrightarrow{f} V$ is represented by $A = (a_{ij}) \in M(m, n, F)$ and $V \xrightarrow{g} W$ is represented by $B = (b_{ij}) \in M(l, m, F)$ then $U \xrightarrow{gf} W$ is represented by $BA$.

Proof. We write out $A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \ldots & a_{mn} \end{pmatrix}$, $B = \begin{pmatrix} b_{11} & \ldots & b_{1m} \\ \vdots & & \vdots \\ b_{l1} & \ldots & b_{lm} \end{pmatrix}$. We examine what $f$ does to the basis vectors,

$f(u_1) = A(1, 0, \ldots, 0)^t = (a_{11}, \ldots, a_{m1})^t = a_{11} v_1 + \ldots + a_{m1} v_m$
$\ldots$
$f(u_n) = A(0, \ldots, 0, 1)^t = (a_{1n}, \ldots, a_{mn})^t = a_{1n} v_1 + \ldots + a_{mn} v_m$

and what $gf$ does to the basis vectors,

$gf(u_1) = a_{11} B(v_1) + \ldots + a_{m1} B(v_m) = a_{11}(b_{11}, \ldots, b_{l1})^t + \ldots + a_{m1}(b_{1m}, \ldots, b_{lm})^t$
$\ldots$
$gf(u_n) = a_{1n} B(v_1) + \ldots + a_{mn} B(v_m) = a_{1n}(b_{11}, \ldots, b_{l1})^t + \ldots + a_{mn}(b_{1m}, \ldots, b_{lm})^t$


Hence $gf$ is represented by $\begin{pmatrix} a_{11} b_{11} + \ldots + a_{m1} b_{1m} & \ldots & a_{1n} b_{11} + \ldots + a_{mn} b_{1m} \\ \vdots & & \vdots \\ a_{11} b_{l1} + \ldots + a_{m1} b_{lm} & \ldots & a_{1n} b_{l1} + \ldots + a_{mn} b_{lm} \end{pmatrix}$, which is precisely $BA$.

This definition agrees with our earlier one in 6.1. As a special case we define scalar multiplication as $\lambda A = \begin{pmatrix} \lambda & \ldots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \ldots & \lambda \end{pmatrix} A$. This scalar multiplication together with matrix addition turns $M(m, n, F)$ into a vector space over $F$. The space of all linear maps between two vector spaces $V$ and $W$ is itself a vector space.

Example 6.25. $\begin{pmatrix} 1 & 4 & 2 & 0 \\ 2 & 1 & 5 & 6 \end{pmatrix} \begin{pmatrix} 2 & 1 & 3 \\ 3 & 0 & 1 \\ 4 & 0 & 5 \\ 1 & 2 & 0 \end{pmatrix} = C_{2 \times 3}$ where, for instance, $c_{21} = 2 \cdot 2 + 1 \cdot 3 + 5 \cdot 4 + 6 \cdot 1 = 33$.

Example 6.26. $5 \begin{pmatrix} 2 & 1 & 3 \\ 3 & 0 & 1 \\ 4 & 0 & 5 \\ 1 & 2 & 0 \end{pmatrix} = \begin{pmatrix} 5 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 \\ 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} 2 & 1 & 3 \\ 3 & 0 & 1 \\ 4 & 0 & 5 \\ 1 & 2 & 0 \end{pmatrix} = \begin{pmatrix} 10 & 5 & 15 \\ 15 & 0 & 5 \\ 20 & 0 & 25 \\ 5 & 10 & 0 \end{pmatrix}$.

    Exercise 6.27. Show that M(m,n,F) has dimension mn as a vector space over F.

Proposition 6.28. Matrix multiplication enjoys the following properties:

(1) (associativity) $(AB)C = A(BC)$.
(2) (distributivity) $A(B + C) = AB + AC$ and $(A + B)C = AC + BC$.
(3) (identity element) $I_{m \times m} A_{m \times n} = A_{m \times n} I_{n \times n} = A$ for all $A_{m \times n}$.
(4) (commutativity of scalar multiplication) $\lambda A = A \lambda$.

    Here are a few more examples to illuminate matrix multiplication.

Example 6.29. One can see $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \neq \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ through direct multiplication or through geometry. Matrix multiplication is not commutative in general.

Example 6.30. Matrix multiplication fails to cancel: $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix}$ but clearly $\begin{pmatrix} 3 \\ 1 \end{pmatrix} \neq \begin{pmatrix} 3 \\ 2 \end{pmatrix}$.

Example 6.31. Matrix multiplication has zero divisors: $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$.

Example 6.32. Matrix multiplication has idempotents: $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$.

Example 6.33. Given $\lambda_1, \ldots, \lambda_k \in F$ and $A_1, \ldots, A_k \in M(m, n, F)$ we can form a linear combination $\lambda_1 A_1 + \ldots + \lambda_k A_k \in M(m, n, F)$.


Example 6.34. Given a polynomial such as $p(x) = x^2 + 2x + 3 \in \mathbb{R}[x]$, we can view it as $p: \mathbb{R} \to \mathbb{R}$, e.g. $4 \mapsto 4^2 + 2 \cdot 4 + 3 = 27$. Now we can also view it as $p: M(n, \mathbb{R}) \to M(n, \mathbb{R})$, $A \mapsto A^2 + 2A + 3I$.

6.3.4. Trace of a matrix. We assign to each square matrix $A \in M(n, F)$ its first invariant.

Definition 6.35. For $A = (a_{ij}) \in M(n, F)$ we define its trace as $\mathrm{tr}(A) = \sum_{i=1}^n a_{ii}$.

Example 6.36. $\mathrm{tr}(\lambda I) = n\lambda$ for any scaling matrix $\lambda I \in M(n, F)$.

Example 6.37. $\mathrm{tr}\begin{pmatrix} 2 & 1 & 4 \\ 3 & 4 & 1 \\ 5 & 3 & 1 \end{pmatrix} = 2 + 4 + 1 = 7$.

Trace has the following properties.

Proposition 6.38. For any $A, B \in M(n, F)$ and $\lambda \in F$:
(1) $\mathrm{tr}(\lambda A) = \lambda \, \mathrm{tr}(A)$.
(2) $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$.
(3) $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.

Proof. Straightforward.
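The three properties of Proposition 6.38, spot-checked numerically on randomly chosen matrices; a sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
lam = 2.5

assert np.isclose(np.trace(lam * A), lam * np.trace(A))        # (1) homogeneity
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))  # (2) additivity
assert np.isclose(np.trace(A @ B), np.trace(B @ A))            # (3) tr(AB) = tr(BA)
# Note A @ B != B @ A in general; only their traces agree.
```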

The first two properties mean trace is actually a linear map $\mathrm{tr}: M(n, F) \to F$. Together with the third property, they actually characterize trace completely: any linear map $f: M(n, F) \to F$ satisfying the above three properties must be a multiple of trace.

6.3.5. Transpose of a matrix. Next we give each matrix $A \in M(m, n, F)$ a companion; it helps us express matrix theory.

Definition 6.39. For $A = (a_{ij}) \in M(m, n, F)$ we define its transpose $A^t = (b_{ij}) \in M(n, m, F)$, where $b_{ij} = a_{ji}$.

Proposition 6.40. Transpose has the following properties:

(1) $(A^t)^t = A$.
(2) $(A + B)^t = A^t + B^t$.
(3) $(\lambda A)^t = \lambda A^t$.
(4) $(AB)^t = B^t A^t$.

Proof. Straightforward.

Example 6.41. $\begin{pmatrix} 2 & 1 & 4 \\ 3 & 4 & 1 \\ 5 & 3 & 1 \end{pmatrix}^t = \begin{pmatrix} 2 & 3 & 5 \\ 1 & 4 & 3 \\ 4 & 1 & 1 \end{pmatrix}$ and $\begin{pmatrix} 2 & 1 & 7 \end{pmatrix}^t = \begin{pmatrix} 2 \\ 1 \\ 7 \end{pmatrix}$.

Transpose can be used to describe many interesting classes of matrices. The first class is triangular matrices.

Definition 6.42. A square matrix $A \in M(n, F)$ is called lower triangular if all entries above the diagonal are zero, i.e. $a_{ij} = 0$ for all $i < j$. A matrix $A \in M(n, F)$ is called upper triangular if $a_{ij} = 0$ for all $i > j$. Furthermore, if additionally $a_{ii} = 1$ for all $i$ then we say $A$ is unit lower triangular or unit upper triangular, respectively.


Definition 6.43. A matrix $A \in M(n, F)$ is called diagonal if $a_{ij} = 0$ whenever $i \neq j$.

One sees from the definition that $A$ is lower triangular iff $A^t$ is upper triangular and vice versa, while $A$ is diagonal iff it is both lower triangular and upper triangular. Another class of matrices that transpose helps describe is symmetric matrices.

Definition 6.44. A square matrix $A \in M(n, F)$ is called symmetric if $A = A^t$, or equivalently if $a_{ij} = a_{ji}$ for all $i, j$.

One source of symmetric matrices is taking inner products. If $x, y$ are two vectors in an inner product space $V$ and $\{v_1, \ldots, v_n\}$ is a basis for $V$ then $x = \alpha_1 v_1 + \ldots + \alpha_n v_n$, $y = \beta_1 v_1 + \ldots + \beta_n v_n$ and $\langle x, y \rangle = \langle \alpha_1 v_1 + \ldots + \alpha_n v_n, \beta_1 v_1 + \ldots + \beta_n v_n \rangle = (\alpha_1, \ldots, \alpha_n) A (\beta_1, \ldots, \beta_n)^t$ where $A = (\langle v_i, v_j \rangle) \in M(n, F)$. Even if $\{v_1, \ldots, v_n\}$ is not a basis for $V$, we can still form $A = (\langle v_i, v_j \rangle)$.

Definition 6.45. A square matrix $A \in M(n, F)$ is named a Gram matrix if $A = (\langle v_i, v_j \rangle)$ for some vectors $v_1, \ldots, v_n \in V$.

This matrix is symmetric since the inner product is symmetric. Moreover, if we choose an orthonormal basis for $V$ and write $v_i = (\alpha_{1i}, \ldots, \alpha_{ni})^t$ then $\langle v_i, v_j \rangle = (\alpha_{1i}, \ldots, \alpha_{ni})(\alpha_{1j}, \ldots, \alpha_{nj})^t$, so $A = B^t B$ where $B$ has $i$th column $(\alpha_{1i}, \ldots, \alpha_{ni})^t$.

Proposition 6.46. In an inner product space $V$ over $\mathbb{R}$ with induced norm, the following statements for $v_1, \ldots, v_n$ are equivalent:

(1) they are orthonormal.
(2) their Gram matrix equals $I$.
(3) their coordinate forms $(\alpha_{1i}, \ldots, \alpha_{ni})$ with respect to any orthonormal basis are orthonormal.
(4) $B^t B = I$ where $B$ has $i$th column $(\alpha_{1i}, \ldots, \alpha_{ni})^t$.

Proof. Follows from the above discussion.

Such matrices as $B$ also have a name.

Definition 6.47. A matrix $A \in M(m, n, \mathbb{R})$ is called orthonormal (or orthogonal) if its columns form an orthonormal collection. A linear map $V \xrightarrow{f} W$ is called orthonormal if its associated matrix $A_f$ is orthonormal once bases have been chosen for $V$ and $W$.

The number $n$ of columns of $A$ may not be $m$, so the columns need not form a basis and $A$ need not be square. We will prove that it has as many or more rows than columns. Orthonormal matrices are special. For one, $A^t A = I$ by definition, so their transpose is their inverse from the left. Moreover, they preserve inner products, hence norms and angles, when viewed as maps between inner product spaces.
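A numeric sketch of Definition 6.47 and the preservation claim, using a tall matrix whose two columns are taken from the orthonormal collection of Example 5.55:

```python
import numpy as np

s = 1 / np.sqrt(2)
A = np.array([[0.0, s],
              [0.0, s],
              [1.0, 0.0]])          # 3x2: orthonormal columns, not square

print(A.T @ A)                      # the identity I_2: A^t is a left inverse

u = np.array([1.0, -2.0])
v = np.array([3.0, 0.5])
# Inner products are preserved (Proposition 6.48 below):
print(np.dot(A @ u, A @ v), np.dot(u, v))       # equal
# hence norms are preserved too:
print(np.linalg.norm(A @ u), np.linalg.norm(u)) # equal
```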

Proposition 6.48. If $V \xrightarrow{A} W$ is an orthonormal map between inner product spaces over $\mathbb{R}$ then $\langle A(u), A(v) \rangle = \langle u, v \rangle$ for all $u, v \in V$.

Proof. If $(\alpha_1, \ldots, \alpha_n)^t$ and $(\beta_1, \ldots, \beta_n)^t$ are the coordinate forms of $u$ and $v$ then $\langle A(u), A(v) \rangle = (A(\alpha_1, \ldots, \alpha_n)^t)^t A(\beta_1, \ldots, \beta_n)^t = (\alpha_1, \ldots, \alpha_n) A^t A (\beta_1, \ldots, \beta_n)^t = (\alpha_1, \ldots, \alpha_n)(\beta_1, \ldots, \beta_n)^t = \langle u, v \rangle$.


What this proposition means is that the following square commutes: applying $(A, A): V \times V \to W \times W$ and then $\langle \cdot, \cdot \rangle: W \times W \to F$ gives the same result as applying $\langle \cdot, \cdot \rangle: V \times V \to F$ followed by $\mathrm{id}: F \to F$.

One more class of matrices that transpose helps describe is positive semidefinite matrices over $\mathbb{R}$.

Definition 6.49. A square matrix $A \in M(n, \mathbb{R})$ is called positive semidefinite if it is symmetric and $x^t A x \geq 0$ for all $x \in \mathbb{R}^n$. It is called positive definite if, in addition to being semidefinite, $A$ satisfies $x^t A x = 0$ only if $x = 0$.

Example 6.50. For matrices of small size, we can verify the sign of $x^t A x$ directly to see if $A$ is positive semidefinite or positive definite. Concretely, $A = \begin{pmatrix} 9 & 6 \\ 6 & 5 \end{pmatrix}$ is positive definite, as $(x_1, x_2) A (x_1, x_2)^t = (3x_1 + 2x_2)^2 + x_2^2 \geq 0$ for all $(x_1, x_2) \in \mathbb{R}^2$ and $(x_1, x_2) A (x_1, x_2)^t = 0$ iff $(x_1, x_2) = (0, 0)$. On the other hand, $B = \begin{pmatrix} 9 & 6 \\ 6 & 4 \end{pmatrix}$ is positive semidefinite, as $(x_1, x_2) B (x_1, x_2)^t = (3x_1 + 2x_2)^2 \geq 0$ for all $(x_1, x_2) \in \mathbb{R}^2$. However it is not positive definite, since $(2, -3) B (2, -3)^t = 0$. Lastly, $C = \begin{pmatrix} 9 & 6 \\ 6 & 3 \end{pmatrix}$ is not positive semidefinite, as $(x_1, x_2) C (x_1, x_2)^t = (3x_1 + 2x_2)^2 - x_2^2$ and $(2/3, -1) C (2/3, -1)^t < 0$.

Example 6.51. Another source of positive semidefinite matrices is Gram matrices $A \in M(n, \mathbb{R})$, since $x^t A x = x^t B^t B x = (Bx)^t(Bx) = \langle Bx, Bx \rangle \geq 0$. Clearly $A$ is positive definite iff $Bx = 0$ implies $x = 0$, iff $B$ has trivial kernel, iff $B^t$ has full image.

6.3.6. Norm of a matrix. Viewed as a linear map between normed spaces, a matrix $A \in M(m, n, \mathbb{R})$ will either stretch or shrink a vector. This behavior is measured by $\|A(x)\| / \|x\|$.

Definition 6.52. If $V \xrightarrow{A} W$ is a linear map between normed spaces over $\mathbb{R}$ then we define $\|A\| = \max\{\|A(x)\| / \|x\|, \text{ all } 0 \neq x \in V\}$.

The following instances exemplify this new definition.

Example 6.53. The scalar matrix $\lambda I$ has norm $|\lambda|$.

Example 6.54. The $n \times 1$ matrix $(a_1, \ldots, a_n)^t$ has norm $(a_1^2 + \ldots + a_n^2)^{1/2}$, as it would when viewed as a vector.

It is not easy to calculate the norm of a matrix in general, although MATLAB and WolframAlpha can approximate matrix norms by numerical methods.
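In numpy, the operator norm of Definition 6.52 with Euclidean norms on both sides is np.linalg.norm(A, 2), the largest singular value of $A$; a sketch that also approximates it by brute force:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

print(np.linalg.norm(A, 2))      # operator 2-norm = largest singular value

# Compare with a crude maximization of ||Ax|| over random unit vectors:
rng = np.random.default_rng(2)
xs = rng.standard_normal((10000, 2))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)   # unit vectors
print(np.max(np.linalg.norm(xs @ A.T, axis=1)))   # approaches ||A|| from below
```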

Proposition 6.55. We list some apparent properties of the matrix norm.

(1) (homogeneity) $\|\lambda A\| = |\lambda| \, \|A\|$.
(2) (triangle inequality) $\|A + B\| \leq \|A\| + \|B\|$.
(3) (definiteness) $\|A\| \geq 0$ for all $A$, and equality holds iff $A = 0$.
(4) $\|A\| = \max\{\|A(x)\|, \|x\| = 1\}$.
(5) $\|A(x)\| \leq \|A\| \, \|x\|$ for all vectors $x$.
(6) $\|AB\| \leq \|A\| \, \|B\|$ for all $A, B$.
(7) $\|A^t\| = \|A\|$.

Proof. Do it if time permits.

    The first three properties turn M(m,n,R) into a normed space.

    6.4. Invertible Matrices and their Inverses.

6.4.1. Square matrices. When a linear map $V \xrightarrow{f} W$ between vector spaces of equal dimension $n$ over $F$ is bijective with inverse $f^{-1}$, their matrix representations $A_f, A_{f^{-1}} \in M(n, F)$ satisfy $A_f A_{f^{-1}} = A_{f^{-1}} A_f = I$ with respect to any bases for $V, W$. Or abstractly: $M(n, F)$ has been equipped with $+$ and $\cdot$, and we want to consider those matrices that are invertible under $\cdot$.

Definition 6.56. A square matrix $A \in M(n, F)$ is called invertible (or nonsingular) if there exists $B \in M(n, F)$ such that $AB = BA = I$, in which case we denote $B$ as $A^{-1}$. Else we say $A$ is noninvertible (or singular).

Example 6.57. The inverse of I ∈ M(n, F) is I itself. More generally, for λ ≠ 0 in F, the scalar matrix λI = diag(λ, . . . , λ) has inverse diag(1/λ, . . . , 1/λ) = (1/λ)I.

Example 6.58. One can verify that

    [ 1 2 3 ]       [ -5/12  3/12   4/12 ]
    [ 3 2 1 ]  and  [  7/12  3/12  -8/12 ]
    [ 2 1 3 ]       [  1/12 -3/12   4/12 ]

are inverses of each other. Or one can input {{1, 2, 3}, {3, 2, 1}, {2, 1, 3}} into wolframalpha and let it do the work.
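Besides wolframalpha, one can check a claimed inverse numerically; a minimal NumPy sketch (not part of the original notes):

    import numpy as np

    A    = np.array([[1, 2, 3], [3, 2, 1], [2, 1, 3]], dtype=float)
    Ainv = np.array([[-5,  3,  4],
                     [ 7,  3, -8],
                     [ 1, -3,  4]], dtype=float) / 12

    print(np.allclose(A @ Ainv, np.eye(3)))     # True
    print(np.allclose(Ainv @ A, np.eye(3)))     # True
    print(np.allclose(np.linalg.inv(A), Ainv))  # True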

Theorem 6.59. Any invertible matrices A, B ∈ M(n, F) satisfy the following,

(1) A has a unique inverse A^{-1} and (A^{-1})^{-1} = A.
(2) (AB)^{-1} = B^{-1}A^{-1}.
(3) (A^n)^{-1} = (A^{-1})^n.
(4) (λA)^{-1} = λ^{-1}A^{-1} for λ ≠ 0 in F.
(5) (A^t)^{-1} = (A^{-1})^t.

Proof. We prove a few of these in class. Look at these properties in terms of composition of linear maps.

Example 6.60. We reuse A and A^{-1} from example 6.58. One can verify that

    A^2 = [ 13  9 14 ]
          [ 11 11 14 ]
          [ 11  9 16 ]

and

    (A^{-1})^2 = [  25/72  -9/72 -14/72 ]
                 [ -11/72  27/72 -14/72 ]
                 [ -11/72  -9/72  22/72 ]

are inverses of each other.


Example 6.61. Or

    12A^{-1} = 12 [ -5/12  3/12   4/12 ]   [ -5  3  4 ]
                  [  7/12  3/12  -8/12 ] = [  7  3 -8 ]
                  [  1/12 -3/12   4/12 ]   [  1 -3  4 ]

is the inverse of (1/12)A.

Example 6.62. Or

    (A^{-1})^t = [ -5/12  7/12   1/12 ]
                 [  3/12  3/12  -3/12 ]
                 [  4/12 -8/12   4/12 ]

is the inverse of

    A^t = [ 1 3 2 ]
          [ 2 2 1 ]
          [ 3 1 3 ]

We relate invertibility of a matrix A with its behavior as a linear map between vector spaces of equal dimension over F.

Theorem 6.63. A matrix A ∈ M(n, F) is invertible iff V --f_A--> V is bijective as a linear map iff f_A has trivial kernel iff f_A has full image iff the columns of A are linearly independent iff the rows of A are linearly independent.

Proof. Surely A is invertible with inverse A^{-1} iff f_A is bijective with inverse f_{A^{-1}} iff f_A has trivial kernel iff f_A has full image by the Nullity-Rank theorem iff the columns A(e1), . . . , A(en) of A span V iff they are linearly independent iff A^t is invertible iff the rows of A are linearly independent.

Example 6.64. Any matrix with a whole row of zeros or a whole column of zeros is singular. Explicitly, if A = [c1 c2 0] has last column zero, then BA = B[c1 c2 0] = [Bc1 Bc2 0] ≠ I for any matrix B.

Corollary 6.65. Positive definite matrices are invertible while positive semidefinite matrices that are not definite are noninvertible.

Proof. Consider a positive definite matrix A. For x ∈ ker(A), x^tAx = 0 so x = 0. Hence A has trivial kernel as a linear map. By the theorem, A is invertible. On the other hand, if A is positive semidefinite but not positive definite then there exists some x ≠ 0 such that x^tAx = 0. Consider the quadratic polynomial p(t) = (x + ty)^tA(x + ty) for arbitrary y ∈ V. Then p(t) ≥ 0 since A is positive semidefinite. After expansion, p(t) = (y^tAy)t^2 + 2(y^tAx)t with minimum 0 at t = 0. Hence p′(0) = 2y^tAx = 0. Since y was arbitrary, it follows that Ax = 0, so A has nontrivial kernel. By the theorem, A is noninvertible.

Corollary 6.66. For A ∈ M(m, n, R), the product A^tA is positive definite iff A has trivial kernel iff A^t has full image.

Proof. Surely x^t(A^tA)x = (Ax)^t(Ax) ≥ 0 for all x ∈ V, so A^tA is positive semidefinite. Everything now follows from corollary 6.65.

6.4.2. Elementary matrices and a method to find A^{-1}. An obvious question is how to find the inverse of an invertible matrix. As the inverse of any invertible A ∈ M(1, F) is obvious, we begin with A ∈ M(2, F).

Theorem 6.67. A square matrix A = [ a b ; c d ] ∈ M(2, F) is invertible iff ad - bc ≠ 0, in which case

    A^{-1} = 1/(ad - bc) [  d -b ]
                         [ -c  a ]


Proof. One can verify that the product of those two matrices equals I ∈ M(2, F).

Example 6.68. The projection matrix in example 6.7 is singular while each reflection matrix in example 6.7 is invertible. One can see this by either computing the inverse as above or by geometry.

Example 6.69. Given the linear system 2x + y = 3, x - y = 4 from before, we could have formed the augmented matrix

    [ 2  1 | 3 ]
    [ 1 -1 | 4 ]

and used Gauss-Jordan elimination to solve it. Now we have

    A = [ 2  1 ],   A^{-1} = [ 1/3  1/3 ]
        [ 1 -1 ]             [ 1/3 -2/3 ]

Hence

    (x, y)^t = A^{-1} (3, 4)^t = (7/3, -5/3)^t.
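The same computation as a NumPy sketch (not from the notes; in practice np.linalg.solve is preferred over forming A^{-1} explicitly):

    import numpy as np

    A = np.array([[2.0,  1.0],
                  [1.0, -1.0]])
    b = np.array([3.0, 4.0])

    x = np.linalg.inv(A) @ b     # via the inverse, as in the example
    print(x)                     # [ 2.33333333 -1.66666667] = (7/3, -5/3)
    print(np.linalg.solve(A, b)) # same answer, without forming A^{-1}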

Finding the inverse of a more general matrix A ∈ M(n, F) is more troublesome. Recall the three elementary row operations: interchanging two rows, multiplying a row by a nonzero constant, and adding a constant times a row to another row.

Definition 6.70. A matrix A ∈ M(n, F) is called elementary if it differs from I by a single elementary row operation.

Definition 6.71. Two matrices A, B ∈ M(n, F) are called row equivalent if they differ by a sequence of elementary row operations, in which case we write A ∼ B.

Example 6.72.

    [ 1 0 ]  --4r2-->  [ 1 0 ]  --r1+r2 to r1-->  [ 1 4 ]  --r1<->r2-->  [ 0 4 ]
    [ 0 1 ]            [ 0 4 ]                    [ 0 4 ]                [ 1 4 ]

Hence [ 0 4 ; 1 4 ] ∼ I but it is not an elementary matrix.

For convenience, we denote each elementary row operation by O and its reverse operation by O^{-1}.

Theorem 6.73. If E is the elementary matrix from performing an elementary row operation O to I then EA = O(A), i.e. multiplying A by E on the left is the same as performing O on A.

Proof. If time permits.

Corollary 6.74. Any elementary matrix E is invertible with inverse E^{-1} = O^{-1}(I).

Proof. If E = O(I) then E ⋅ O^{-1}(I) = O(O^{-1}(I)) = I.

Theorem 6.75. For A ∈ M(n, F) the following statements are equivalent,

(1) A is invertible.
(2) The reduced row echelon form of A is I.
(3) A ∼ I, that is, I = Er . . . E1 A = Or(. . . O1(A)) for some elementary matrices Ei with corresponding operations Oi.

Proof. If time permits.

Corollary 6.76. If a sequence of elementary row operations reduces an invertible matrix A to I then that same sequence changes I to A^{-1}.


Proof. If A is invertible then by theorem 6.75 there exist O1, . . . , Or such that Or(. . . O1(A)) = I. But then Or(. . . O1(I))A = Or(. . . O1(A)) = I by theorem 6.73, so Or(. . . O1(I)) = A^{-1}.

This corollary gives us an algorithm to invert any nonsingular matrix. It will stall on a singular matrix.
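Before the worked example below, here is the algorithm as a NumPy sketch (not the notes' own code; partial pivoting is added so the routine detects singular input cleanly):

    import numpy as np

    def gauss_jordan_inverse(A, tol=1e-12):
        """Invert A by row-reducing the augmented matrix [A | I] to [I | A^{-1}]."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.eye(n)])            # augmented matrix [A | I]
        for j in range(n):
            p = j + np.argmax(np.abs(M[j:, j]))  # pivot row (partial pivoting)
            if abs(M[p, j]) < tol:
                raise ValueError("matrix is singular")
            M[[j, p]] = M[[p, j]]                # interchange two rows
            M[j] /= M[j, j]                      # scale the pivot row
            for i in range(n):
                if i != j:
                    M[i] -= M[i, j] * M[j]       # clear the rest of column j
        return M[:, n:]

    A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
    print(gauss_jordan_inverse(A))               # [[ 0.75 -0.25 -0.25] ...]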

Example 6.77. Given

    A = [ 2 1 1 ]
        [ 1 2 1 ]
        [ 1 1 2 ]

we perform

    [ 2 1 1 | 1 0 0 ]                           [ 1 1/2 1/2 |  1/2 0 0 ]
    [ 1 2 1 | 0 1 0 ]  --r1/2, r2-r1, r3-r1-->  [ 0 3/2 1/2 | -1/2 1 0 ]
    [ 1 1 2 | 0 0 1 ]                           [ 0 1/2 3/2 | -1/2 0 1 ]

    --2r2/3, r3-r2/2-->  [ 1 1/2 1/2 |  1/2   0   0 ]
                         [ 0  1  1/3 | -1/3  2/3  0 ]
                         [ 0  0  4/3 | -1/3 -1/3  1 ]

    --3r3/4, r2-r3/3, r1-r3/2-->  [ 1 1/2 0 |  5/8  1/8 -3/8 ]
                                  [ 0  1  0 | -1/4  3/4 -1/4 ]
                                  [ 0  0  1 | -1/4 -1/4  3/4 ]

    --r1-r2/2-->  [ 1 0 0 |  3/4 -1/4 -1/4 ]
                  [ 0 1 0 | -1/4  3/4 -1/4 ]   = [ I | A^{-1} ]
                  [ 0 0 1 | -1/4 -1/4  3/4 ]

Corollary 6.78. A triangular matrix A ∈ M(n, F) is invertible iff aii ≠ 0 for all i.

Proof. The algorithm to invert A will produce A^{-1} iff aii ≠ 0 for all i.

6.4.3. Determinant of a Matrix and Cofactor Expansion. One invariant we have associated with a square matrix A ∈ M(n, F) is its trace. Now we associate a second invariant to A, called the determinant and written det(A), that encodes some innate characteristics of A. We define it by induction.

Definition 6.79. For A = [ a b ; c d ] ∈ M(2, F) we define

    det(A) = | a b | = ad - bc.
             | c d |

Example 6.80.

    det [ 1 2 ] = 1 ⋅ 4 - 2 ⋅ 3 = -2.
        [ 3 4 ]

Suppose we have defined determinant for matrices in M(n - 1, F). We consider two more ingredients for the determinant of a general n × n matrix.

Definition 6.81. For A = (aij) ∈ M(n, F) we define the minor Mij of entry aij to be the determinant of the (n - 1) × (n - 1) submatrix that remains after deleting the ith row and jth column from A. We define the cofactor Cij of entry aij to be (-1)^{i+j} Mij.

Definition 6.82. For A = (aij) ∈ M(n, F) we define its cofactor expansion along row i to be Σ_{j=1}^{n} aij Cij and its cofactor expansion along column j to be Σ_{i=1}^{n} aij Cij.

One obvious question is whether cofactor expansion depends on the choice of row vs. column, or the choice of which row, or the choice of which column. Here is the answer.


Theorem 6.83. Cofactor expansion of A = (aij) ∈ M(n, F) is independent of the choice of row vs. column, of the choice of row i, and of the choice of column j.

Proof. One way is to explicitly write out any two cofactor expansions in terms of the aij and compare. It is tedious bookkeeping.

Finally we can define the determinant of a general n × n matrix A.

Definition 6.84. For A = (aij) ∈ M(n, F) we define its determinant det(A) = Σ_{j=1}^{n} aij Cij = Σ_{i=1}^{n} aij Cij, any cofactor expansion along any row or any column.

It took us a while to define the determinant, but it is not hard to compute.

Example 6.85. Expanding along the first row and then, for comparison, along the second column,

    det [ 1 2 3 ]  = 1 det [ 3 1 ] - 2 det [ 2 1 ] + 3 det [ 2 3 ]
        [ 2 3 1 ]          [ 1 2 ]         [ 3 2 ]         [ 3 1 ]
        [ 3 1 2 ]
                   = -2 det [ 2 1 ] + 3 det [ 1 3 ] - 1 det [ 1 3 ]
                            [ 3 2 ]         [ 3 2 ]         [ 2 1 ]

                   = ... = -18.

Example 6.86. Expanding along the fourth column,

    det [ 1 2 3 4 ]  = 7 det [ 1 2 3 ] - 6 det [ 1 2 3 ] + 5 det [ 1 2 3 ] - 4 det [ 2 3 1 ]
        [ 2 3 1 5 ]          [ 2 3 1 ]         [ 2 3 1 ]         [ 3 1 2 ]         [ 3 1 2 ]
        [ 3 1 2 6 ]          [ 3 1 2 ]         [ 2 1 3 ]         [ 2 1 3 ]         [ 2 1 3 ]
        [ 2 1 3 7 ]

                     = ... = -36.
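Definition 6.84 translates directly into a recursive procedure. A sketch (not from the notes; it runs in roughly n! time, so it is for illustration only, and the row-reduction method below is far cheaper):

    import numpy as np

    def det_cofactor(A):
        """Determinant by cofactor expansion along the first row."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # M_{1j}
            total += (-1) ** j * A[0, j] * det_cofactor(minor)     # a_{1j} C_{1j}
        return total

    print(det_cofactor([[1, 2, 3], [2, 3, 1], [3, 1, 2]]))   # -18.0
    print(det_cofactor([[1, 2, 3, 4], [2, 3, 1, 5],
                        [3, 1, 2, 6], [2, 1, 3, 7]]))        # -36.0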

Computation of determinant is easier for matrices full of zeros. If A has a row or a column full of zeros then we choose to calculate the cofactor expansion along that row or column, which gives det(A) = 0 immediately. As a special case, any triangular matrix A has determinant Π_{i=1}^{n} aii. Another way to compute determinant is via row reduction. For this, we need to understand how each elementary operation affects the determinant.

Theorem 6.87. For A ∈ M(n, F),

(1) If O multiplies a row or column of A by λ then det(O(A)) = λ det(A).
(2) If O interchanges two rows or two columns of A then det(O(A)) = -det(A).
(3) If O adds a multiple of one row to another or a multiple of one column to another then det(O(A)) = det(A).

Proof. Straightforward from the definition of determinant by cofactor expansion.

Corollary 6.88. If two rows or two columns of A ∈ M(n, R) are proportional then det(A) = 0.

Proof. By theorem 6.87(3) we can reduce A to a matrix with a row of zeros or a column of zeros, and both have the same determinant 0.

    We apply theorem 6.87 to elementary matrices first.


Example 6.89.

    det [ 1 0 0 ] = 2,   det [ 0 0 1 ] = 1,   det [ 1 0 3 ] = 1.
        [ 0 1 0 ]            [ 1 0 0 ]            [ 0 1 0 ]
        [ 0 0 2 ]            [ 0 1 0 ]            [ 0 0 1 ]

In any case, det(E) ≠ 0 for any elementary matrix E.

Example 6.90.

    det [ 1 2 3 ] = 0
        [ 3 4 5 ]
        [ 3 6 9 ]

since the first and third rows are proportional. We apply theorem 6.87 to calculate the determinant of a general n × n matrix.

Example 6.91. Reduce

    [ 0 1 2 ]
    [ 3 4 5 ]
    [ 6 7 8 ]

to diagonal form by elementary row operations, keeping track of how det changes along the way. This could have been done with column operations as well.
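A sketch of determinant-by-row-reduction (not from the notes), tracking the sign flips from interchanges and the factors taken out of pivot rows, exactly as theorem 6.87 prescribes:

    import numpy as np

    def det_by_reduction(A, tol=1e-12):
        """Determinant via elementary row operations (theorem 6.87)."""
        A = np.asarray(A, dtype=float).copy()
        n = A.shape[0]
        det = 1.0
        for j in range(n):
            p = j + np.argmax(np.abs(A[j:, j]))
            if abs(A[p, j]) < tol:
                return 0.0                    # no pivot in this column: singular
            if p != j:
                A[[j, p]] = A[[p, j]]         # row interchange flips the sign
                det = -det
            det *= A[j, j]                    # factor taken out of the pivot row
            A[j] /= A[j, j]
            A[j + 1:] -= np.outer(A[j + 1:, j], A[j])  # row additions: det unchanged
        return det

    print(det_by_reduction([[0, 1, 2], [3, 4, 5], [6, 7, 8]]))   # 0.0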

Example 6.92.

    [ 0 0 1 ]   [ 1 0 3 ]   [ 1 0 4 ]
    [ 1 0 0 ] + [ 0 1 0 ] = [ 1 1 0 ]
    [ 0 1 0 ]   [ 0 0 1 ]   [ 0 1 1 ]

but the two summands have determinants 1 and 1 while the sum has determinant 5. Determinant does not respect addition.

However, determinant does respect matrix multiplication. Toward this we have a prelude.

Lemma 6.93. If E is an elementary matrix then det(EA) = det(E)det(A) for any A ∈ M(n, R).

Proof. If E = O(I) then EA = O(A) and the result follows from theorem 6.87.

    We now list the immediate properties of determinant.

Theorem 6.94. If A, B, C ∈ M(n, F) are square matrices then,

(1) (Compatibility with transpose) det(A^t) = det(A).
(2) det(λA) = λ^n det(A).
(3) If A, B, C differ only in row i and aij + bij = cij for all j then det(A) + det(B) = det(C). The result holds for columns as well.
(4) (Compatibility with multiplication) det(AB) = det(A)det(B).

Proof. (1), (2) and (3) follow from the definition of determinant. For (4) we consider two cases. If A is not invertible then neither is AB, and det(AB) = det(A)det(B) = 0 by theorem 6.96 below. If A is invertible then we write A = Ek . . . E1, hence det(AB) = det(Ek . . . E1B) = det(Ek) . . . det(E1)det(B) = det(A)det(B) by lemma 6.93.

Example 6.95. We can verify some instances of the property det(AB) = det(A)det(B). For instance,

    det ( [ 0 0 1 ] [ 1 0 3 ] ) = det [ 0 0 1 ] det [ 1 0 3 ]
        ( [ 1 0 0 ] [ 0 1 0 ] )       [ 1 0 0 ]     [ 0 1 0 ]
        ( [ 0 1 0 ] [ 0 0 1 ] )       [ 0 1 0 ]     [ 0 0 1 ]

as both sides equal 1.

We arrive at the most important piece of information encoded in the determinant.


Theorem 6.96. A matrix A ∈ M(n, F) is invertible iff det(A) ≠ 0. In that case, det(A^{-1}) = det(A)^{-1}.

Proof. Reduce A to its reduced row echelon form R = Ek . . . E1A; then det(A) ≠ 0 iff det(R) ≠ 0 iff R = I iff A is invertible. In that case, det(A)det(A^{-1}) = det(AA^{-1}) = det(I) = 1.

Example 6.97. We can test whether a matrix A ∈ M(n, F) is invertible before using row reduction to find its inverse. If det(A) = 0 then we stop, else we go.

6.4.4. Adjugate matrix and another way to find A^{-1}. There is another way to find the inverse of an invertible matrix besides using elementary row operations. We have actually seen it in the 2 × 2 case:

    A = [ a b ]   and   A^{-1} = 1/det(A) [  d -b ]
        [ c d ]                           [ -c  a ]

Definition 6.98. If A ∈ M(n, F) is a square matrix then the matrix (Cij), with Cij the cofactor of aij, is called the matrix of cofactors of A, and its transpose (Cij)^t is called the adjugate (or adjoint in some literature) of A, denoted adj(A).

Example 6.99. The matrix

    [ 1 2 3 ]
    [ 3 2 1 ]
    [ 2 1 3 ]

has cofactors C11 = 5, C12 = -7, C13 = -1, C21 = -3, C22 = -3, C23 = 3, C31 = -4, C32 = 8, C33 = -4. Hence its matrix of cofactors is

    [  5 -7 -1 ]
    [ -3 -3  3 ]
    [ -4  8 -4 ]

and its adjugate is

    [  5 -3 -4 ]
    [ -7 -3  8 ]
    [ -1  3 -4 ]

We state and prove our suspicion for the n × n case.

Theorem 6.100. If A ∈ M(n, F) is invertible then A^{-1} = (1/det(A)) adj(A).

Proof. Consider A ⋅ adj(A) = (aij)(Cij)^t = (dij) where dij = ai1 Cj1 + . . . + ain Cjn. If i = j then dij is precisely det(A); else dij is the determinant of the matrix obtained from A by replacing its jth row with its ith row. Since this matrix has two rows that are the same, its determinant is 0 by corollary 6.88. Hence A ⋅ adj(A) = det(A)I and we are done.

Example 6.101. Since A = [ 1 2 3 ; 3 2 1 ; 2 1 3 ] has determinant -12, it is invertible with inverse

    A^{-1} = -(1/12) [  5 -3 -4 ]   [ -5/12  3/12   4/12 ]
                     [ -7 -3  8 ] = [  7/12  3/12  -8/12 ]
                     [ -1  3 -4 ]   [  1/12 -3/12   4/12 ]

as already seen in example 6.58.
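Theorem 6.100 as a NumPy sketch (not from the notes; np.linalg.det is used for the minors, and the method is only practical for small matrices):

    import numpy as np

    def adjugate(A):
        """Transpose of the matrix of cofactors (definition 6.98)."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        C = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)  # cofactor C_ij
        return C.T

    A = np.array([[1, 2, 3], [3, 2, 1], [2, 1, 3]], dtype=float)
    Ainv = adjugate(A) / np.linalg.det(A)      # A^{-1} = adj(A)/det(A)
    print(np.allclose(A @ Ainv, np.eye(3)))    # True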

6.4.5. General matrices. When V --f--> W is a linear map between vector spaces V and W of possibly different dimensions n and m, its matrix representation A_f has size m × n and we have more to keep track of. If m > n then A_f has more rows than columns and is called a tall matrix. If m < n then A_f has more columns than rows and is called a wide matrix.


Definition 6.102. A matrix A ∈ M(m, n, F) is called left invertible if there exists a matrix B ∈ M(n, m, F) such that BA = I ∈ M(n, F).

Theorem 6.103. A matrix A ∈ M(m, n, R) is left invertible iff A has trivial kernel.

Proof. If A has a left inverse B then A(v) = 0 for v ∈ V implies v = I(v) = BA(v) = B(0) = 0, so A has trivial kernel. Conversely, suppose A has trivial kernel. By corollary 6.66, A^tA is positive definite, hence invertible with inverse (A^tA)^{-1}. But then ((A^tA)^{-1}A^t)A = (A^tA)^{-1}(A^tA) = I, so A is left invertible.

Corollary 6.104. Any left invertible matrix A must be square or tall.

Proof. By theorem 6.103, A has trivial kernel and V embeds into W, so m ≥ n (recall the Nullity-Rank theorem).

This left inverse (A^tA)^{-1}A^t for A is called the Moore-Penrose pseudoinverse; note that left inverses in general are not unique. Recall that a matrix A is called orthonormal if A^tA = I, so it has a left inverse and consequently at least as many rows as columns. The case m ≤ n is mirrored.

Definition 6.105. A matrix A ∈ M(m, n, F) is called right invertible if there exists a matrix B ∈ M(n, m, F) such that AB = I ∈ M(m, F).

Theorem 6.106. A matrix A ∈ M(m, n, R) is right invertible iff A has full image.

Proof. We see A is right invertible iff A^t is left invertible iff A^t has trivial kernel by theorem 6.103 iff (A^t)^t = A has full image by corollary 6.66.

Corollary 6.107. Any right invertible matrix A must be square or wide.

Proof. If A is right invertible then A^t has trivial kernel, hence is left invertible, hence square or tall. So A must be square or wide.

A right inverse for a right invertible matrix A is A^t(AA^t)^{-1}, as we can imagine. It is also called a Moore-Penrose pseudoinverse. Note that a right inverse for A need not be a left inverse and vice versa.
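Both one-sided inverses as a NumPy sketch (not from the notes; the function names are mine, and the formulas assume trivial kernel, resp. full image, so that the inverted factors exist):

    import numpy as np

    def left_inverse(A):
        """(A^t A)^{-1} A^t: a left inverse when A has trivial kernel."""
        return np.linalg.inv(A.T @ A) @ A.T

    def right_inverse(A):
        """A^t (A A^t)^{-1}: a right inverse when A has full image."""
        return A.T @ np.linalg.inv(A @ A.T)

    A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])    # tall, trivial kernel
    print(np.allclose(left_inverse(A) @ A, np.eye(2)))    # True
    B = A.T                                               # wide, full image
    print(np.allclose(B @ right_inverse(B), np.eye(2)))   # True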

    6.5. Exercises.

    Exercise 6.108. page 41: 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.10, 2.12.

    Exercise 6.109. page 63: 3.1, 3.4, 3.6, 3.17