THE BASICS OF VECTORS, MATRICES AND LINEAR ALGEBRA

LUKE COLLINS
maths.com.mt/notes

Version 0.4* (30th March, 2019)

Contents

1 Introduction
  1.1 Vector Operations
  1.2 Distances and Angles
  1.3 Matrix Operations
2 Computational Methods
  2.1 Systems of Linear Equations
  2.2 Elementary Row Operations
  2.3 Determinants
  2.4 The Adjugate Matrix
  2.5 Gaussian Elimination
3 Some Geometry
  3.1 Lines
  3.2 Planes
  3.3 Mensuration
4 Linear Maps
  4.1 Matrices are Linear Maps
  4.2 Rank and Nullity
  4.3 The Dimension Theorem
  4.4 Eigenvalues and Eigenvectors
Appendix A Preliminaries
  A.1 Naïve Set Theory
  A.2 Big Operators
Appendix B Solutions to Exercises

*If you find any mathematical, grammatical or typographical errors whilst reading these notes, please let the author know via email: [email protected].

1 Introduction

You are encouraged to look through appendix A before you start reading these notes.

Recall that all pairs (x, y) of real numbers are regarded as points in the xy-plane, where the set of all such points is denoted by

R^2 = R × R.

Here we will interpret the pair (x, y) in two ways: sometimes as the point (x, y) in the plane just as before, which we will call the position (x, y); other times as the directed line segment taking us from the origin (0, 0) to the point (x, y), which we call the vector (x, y).

FIGURE 1: The position (4, 5)

FIGURE 2: The vector (4, 5)

The distinction between the two interpretations is rarely important, and when it does matter, the intended one is usually clear from the context.

Note that vectors which are translated in the plane (that is, vectors which are moved so that their tails do not sit at the origin (0, 0)) correspond to the same pair of coordinates (x, y), since what the pair of numbers represents in this case is the displacement from the tail of the arrow to its head. Thus if a vector is translated, we treat the tail as the "new origin", and read off the coordinates at the head of the arrow, thus obtaining the same pair (x, y).

These ideas easily extend to ordered triples (x, y, z) of real numbers, corresponding to points or vectors in three-dimensional space

R^3 = R × R × R.


FIGURE 3: Still the vector (4, 5)

FIGURE 4: A vector in R^3 (the vector (2, 2, 3))

Nothing stops us from considering the set R^n = ∏_{i=1}^n R of ordered n-tuples. Although there is no geometric meaning for n > 3, it is convenient to use geometric language. Thus, we still call these tuples points or vectors, their entries are called coordinates or components, and the set as a whole we call n-dimensional (Euclidean) space.

1.1 Vector Operations

Here we introduce some operations on vectors. We denote vectors using single letters in bold typeface, and their coordinates are denoted using the same letter with corresponding subscripts. Thus we write

x = (x_1, x_2, . . . , x_n)

for example. We sometimes write \vec{Ox} when we want to emphasise that we consider x to be a directed line segment whose tail sits at the origin (as in figure 2). In this case, \vec{Ox} is called the position vector of x. In R^2 and R^3, we use the letters x, y, z to avoid subscripts, so we write u = (x, y) ∈ R^2 and v = (x, y, z) ∈ R^3 for example. In writing, you are encouraged to underline vectors to distinguish them from numbers, e.g., writing v̲ for v.

Notation. We adopt the notation

v = (v_i)_n

or simply v = (v_i) to stand for v = (v_1, v_2, . . . , v_n), where v_i denotes the general ith component of the vector v.


Definition 1.1 (Vector Addition). Let u = (u_i) and v = (v_i) be two vectors in R^n. Then the sum u + v is defined by

u + v := (u_i + v_i).

Example 1.2. In R^3, if u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3), then

u + v = (u_i + v_i) = (u_1 + v_1, u_2 + v_2, u_3 + v_3).

Remark 1.3. Observe that the vector addition u + v corresponds to the position obtained when translating the vector v such that its tail is at the head of the vector u, or vice-versa, as shown in figure 5. This is a consequence of the fact that we think of vectors as representing only relative displacement, and not position. Think about it this way: first we travel from the tail of the vector u to its head, and then, treating the head as if it were the "new origin", we travel along the vector v. This is known as the parallelogram law.

FIGURE 5: Illustration of the parallelogram law in R^2

Definition 1.4 (Scalar Multiplication). Let λ ∈ R, and let v = (v_i) be a vector in R^n. Then the scalar multiplication of v by λ, denoted λv, is the vector given by

λv = (λv_i).

Example 1.5. In R^3, if u = (x, y, z) then

λu = (λx, λy, λz).
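Definitions 1.1 and 1.4 are both componentwise, so they translate directly into code. Here is a minimal Python sketch; the names add and scale are our own choice, and vectors are represented as plain tuples:

    # A vector in R^n is represented as a tuple of numbers.
    def add(u, v):
        """Componentwise sum u + v, as in definition 1.1."""
        assert len(u) == len(v), "both vectors must lie in the same R^n"
        return tuple(ui + vi for ui, vi in zip(u, v))

    def scale(lam, v):
        """Scalar multiple lambda * v, as in definition 1.4."""
        return tuple(lam * vi for vi in v)

    u, v = (1, 2, 3), (3, 2, -1)
    print(add(u, scale(3, v)))  # (10, 8, 0), cf. example 1.14 below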


Remark 1.6. The reason we call this operation scalar multiplication is that the result of λv is a scaled version of v by a factor of λ. When λ < 0, the direction of v is reversed. In particular, (−1)v, which we denote by −v, corresponds to the vector with the arrow head and tail interchanged.

FIGURE 6: Illustration of scaling in R^2

As a consequence of this scaling behaviour, we call single real numbers scalars instead of numbers throughout. Thus the entries in a vector are scalars, for example.

Notation. As mentioned in remark 1.6, we denote (−1)v by −v, and we also introduce the difference between two vectors, denoted u − v, defined by

u − v = u + (−v).

Remark 1.7. A relative vector is a vector which takes us from a position a to a position b, that is, another vector v such that a + v = b. This vector v is given by b − a, as illustrated in figure 7.

FIGURE 7: Relative vector from a to b in R^2

Sometimes positions are denoted using upper case letters such as A or B. In this case, the vector from A to B is denoted by \vec{AB}, thus

\vec{AB} = \vec{OB} − \vec{OA}.

Example 1.8. The vector from position a = (1, 3, 2) to b = (−1, 0, 1) is given by b − a = (−1, 0, 1) − (1, 3, 2) = (−2, −3, −1). Indeed, if we add (−2, −3, −1) to a, we get

a + (−2, −3, −1) = (1, 3, 2) + (−2, −3, −1) = (−1, 0, 1) = b,

so (−2, −3, −1) "takes us" from a to b, as expected.

Definition 1.9 (Zero vector). The vector 0 = (0) = (0, . . . , 0) is called the zero vector or the origin.

Note. The vector 0 is not the same as the scalar 0: one is a vector with n entries, the other is a single number.

Theorem 1.10 (Vector space properties in R^n). Let u, v, w be three vectors in R^n, and let λ, µ ∈ R be scalars. Then the following properties hold:

I) u + (v + w) = (u + v) + w
II) u + v = v + u
III) u + 0 = u
IV) v + (−v) = 0
V) λ(µv) = (λµ)v
VI) 1v = v
VII) λ(u + v) = λu + λv
VIII) 0v = 0
IX) (λ + µ)v = λv + µv

Proof. These results all easily follow from the definitions and properties of real numbers, e.g. for I, we have

u + (v + w) = (u_i)_n + ((v_i)_n + (w_i)_n)
            = (u_i)_n + (v_i + w_i)_n          (by definition 1.1)
            = (u_i + (v_i + w_i))_n            (by definition 1.1)
            = ((u_i + v_i) + w_i)_n            (by associativity of addition in R)
            = (u_i + v_i)_n + (w_i)_n          (by definition 1.1)
            = ((u_i)_n + (v_i)_n) + (w_i)_n    (by definition 1.1)
            = (u + v) + w,

as required. Similarly for VIII, we have

0v = (0v_i) = (0) = 0.

The proofs of the remaining properties are left as an exercise.

Exercise 1.11. Try to visualise each of the "laws" in theorem 1.10 in terms of scaling and translation, as we illustrated in the various figures (figures 5 to 7). Construct figures which show the equalities of each. Then, provide a proof for each of them.

Definition 1.12 (Basic Unit Vectors). Let δ_ik, known as the Kronecker delta, be defined by

δ_ik = { 1 if i = k
       { 0 otherwise.

Then we define the basic unit vectors e_k for k = 1, . . . , n in R^n by

e_k = (δ_ik)_n,

that is, e_k has a 1 in the kth position, and zeros everywhere else:

e_1 = (1, 0, 0, . . . , 0)
e_2 = (0, 1, 0, . . . , 0)
...
e_n = (0, 0, 0, . . . , 1).

In R^2, we denote e_1 and e_2 by i and j, so i = (1, 0) and j = (0, 1). Similarly in R^3, we denote e_1, e_2 and e_3 by i, j and k, so i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1).
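The Kronecker delta and the basic unit vectors are equally short in code. A sketch of ours, continuing the tuple representation from the earlier snippet (the names delta and e are our own):

    def delta(i, k):
        """Kronecker delta of definition 1.12."""
        return 1 if i == k else 0

    def e(k, n):
        """The kth basic unit vector e_k = (delta_ik)_n in R^n."""
        return tuple(delta(i, k) for i in range(1, n + 1))

    print(e(2, 4))  # (0, 1, 0, 0): a 1 in the 2nd position, zeros elsewhere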


Definition 1.13 (Linear combination of vectors). Let v_1, v_2, . . . , v_k ∈ R^n, and let λ_1, λ_2, . . . , λ_k ∈ R be scalars. Then the vector

v = ∑_{i=1}^k λ_i v_i = λ_1 v_1 + λ_2 v_2 + · · · + λ_k v_k

is said to be a linear combination of the vectors v_1, v_2, . . . , v_k.

Example 1.14. Let u = (1, 2, 3), v = (3, 2, −1) and w = (1, 0, 7) be vectors in R^3. Then the vectors

u + 3v = u + 3v + 0w = (10, 8, 0) and 3u + v − 5w = (1, 8, −27)

are examples of linear combinations of u, v and w.

Theorem 1.15. Every vector v ∈ R^n can be written as a linear combination of the vectors e_1, e_2, . . . , e_n.

Proof. Let v = (v_i) be any vector in R^n. Then

v = (v_1, v_2, . . . , v_n)
  = (v_1, 0, . . . , 0) + (0, v_2, . . . , 0) + · · · + (0, 0, . . . , v_n)
  = v_1(1, 0, . . . , 0) + v_2(0, 1, . . . , 0) + · · · + v_n(0, 0, . . . , 1)
  = v_1 e_1 + v_2 e_2 + · · · + v_n e_n = ∑_{k=1}^n v_k e_k,

as required.

Remark 1.16. In particular, any vector u = (a, b) ∈ R^2 can be expressed in terms of i and j as

ai + bj,

and similarly any vector v = (a, b, c) ∈ R^3 can be written as ai + bj + ck.

1.2 Distances and Angles

So far we have encoded positions in R^n, as well as operations we can carry out when interpreting them as directed line segments (vectors), but we have not yet described the notion of distance between positions; i.e., we do not yet have a way to express that (1, 0) is closer to (1, 2) than to (5, 5), for example.

In the chapter on geometry, we defined the distance function (or metric) d : R^2 × R^2 → R, where for two points A = (a_1, a_2) and B = (b_1, b_2) in R^2, we have

d(A, B) = √((a_1 − b_1)² + (a_2 − b_2)²),

which is inspired by Pythagoras. Here we do things in a different but equivalent way which nicely generalises to R^n. We first define the length of a vector in terms of the dot product.

Definition 1.17 (Dot Product). Let u = (u_i), v = (v_i) be vectors in R^n. Then the dot product or scalar product of u and v, denoted u · v or ⟨u, v⟩, is the scalar defined by

u · v = ∑_{i=1}^n u_i v_i = u_1 v_1 + u_2 v_2 + · · · + u_n v_n.

Example 1.18. (1, 2, 3) · (4, 5, 6) = 1 · 4 + 2 · 5 + 3 · 6 = 32.

Definition 1.19. The length (or magnitude or norm) of a vector v ∈ R^n, denoted ‖v‖ or |v|, is the scalar defined by

‖v‖ = √(v · v).

When a vector v has ‖v‖ = 1, then v is called a unit vector or a direction.

Example 1.20. ‖(1, 2, 3)‖ = √((1, 2, 3) · (1, 2, 3)) = √(1² + 2² + 3²) = √14.

For v = (x, y) ∈ R^2, we have ‖v‖ = √(x² + y²) = √((x − 0)² + (y − 0)²). With our old definition, this turns out to be the distance d((0, 0), (x, y)) from the tail of the vector v to its head.

Definition 1.21 (Distance). Let a and b be two positions in R^n. Then the distance between a and b, denoted d(a, b), is the length of their relative vector; i.e.,

d(a, b) = ‖b − a‖.

Remark 1.22. For A = (a_1, a_2) and B = (b_1, b_2) in R^2, we have

d(A, B) = ‖\vec{AB}‖ = ‖\vec{OB} − \vec{OA}‖ = ‖(b_1, b_2) − (a_1, a_2)‖
        = ‖(b_1 − a_1, b_2 − a_2)‖
        = √((b_1 − a_1)² + (b_2 − a_2)²),

which shows that definition 1.21 agrees with our old definition for d(A, B).
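Definitions 1.17 to 1.21 chain together neatly: the norm is defined through the dot product, and the distance through the norm. A rough Python rendering of ours, with math.sqrt as the square root:

    import math

    def dot(u, v):
        """Dot product of definition 1.17."""
        return sum(ui * vi for ui, vi in zip(u, v))

    def norm(v):
        """Length ||v|| = sqrt(v . v) of definition 1.19."""
        return math.sqrt(dot(v, v))

    def dist(a, b):
        """Distance d(a, b) = ||b - a|| of definition 1.21."""
        return norm(tuple(bi - ai for ai, bi in zip(a, b)))

    print(dot((1, 2, 3), (4, 5, 6)))  # 32, as in example 1.18
    print(norm((1, 2, 3)))            # sqrt(14) = 3.7416...
    print(dist((1, 2), (3, 4)))       # sqrt(8), cf. exercise 1.23.1(a)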


Exercise 1.23.
1. Find the distance between the following pairs of vectors.

   a) (1, 2) and (3, 4)
   b) (1, 2, 3) and (−1, 0, 1)
   c) i and j in R^2
   d) i and j in R^3

2. Show that in general for a, b ∈ R^n,

   d(a, b) = √(∑_{i=1}^n (a_i − b_i)²).

3. Let c = (a, b) ∈ R^2 and r ∈ R where r > 0. Show that the set of points C = {x ∈ R^2 : d(c, x) = r} corresponds to a circle, centred at (a, b) with radius r.

Definition 1.24 (Midpoint). Let u = (u_i), v = (v_i) ∈ R^n. The midpoint of u and v is the position m with coordinates

m = ((u_i + v_i)/2),

i.e., the coordinates of m are the averages of the corresponding coordinates of u and v.

Example 1.25. The midpoint of (1, 3, −5) and (5, −3, 2) is

m = ((1+5)/2, (3−3)/2, (−5+2)/2) = (3, 0, −3/2).

Proposition 1.26. Let u, v ∈ R^n and let m be their midpoint. Then

d(u, m) = d(m, v),

i.e., the midpoint m lies "in the middle" of u and v.

Proof. This goes similarly to the proof for R^2 in geometry:

d(u, m) = d((u_i), ((u_i + v_i)/2))
        = ‖(u_i) − ((u_i + v_i)/2)‖
        = ‖(u_i − (u_i + v_i)/2)‖
        = ‖((u_i − v_i)/2)‖
        = ‖((u_i + v_i)/2 − v_i)‖
        = ‖((u_i + v_i)/2) − (v_i)‖
        = d(((u_i + v_i)/2), (v_i))
        = d(m, v),

as required.

Notation (Scalar division). Let λ ∈ R, λ ≠ 0, and v ∈ R^n. We adopt the notation v/λ to stand for (1/λ)v.

Definition 1.27 (Normalised vector). Let v ∈ R^n be a non-zero vector. The normalised version or direction of v, denoted v̂, is the vector v/‖v‖.

Proposition 1.28. Every normalised (non-zero) vector is a unit vector.

Proof. Let v ∈ R^n ∖ {0}. Then

‖v̂‖² = ‖v/‖v‖‖² = (v/‖v‖) · (v/‖v‖) = (v · v)/‖v‖² = (v · v)/(v · v) = 1,

and so ‖v̂‖ = 1, as required.

Exercise 1.29. Do you believe the proof presented for proposition 1.28? Are there any unjustified steps? Yes! The step (v/‖v‖) · (v/‖v‖) = (v · v)/‖v‖² is not obvious for the dot product (remember this is not ordinary multiplication). Prove using the definition of the dot product that if u, v ∈ R^n and λ, µ ∈ R, then (λu) · (µv) = (λµ)(u · v). (In the case of the proof, this was applied with λ = µ = 1/‖v‖ and u = v.)

Proposition 1.30. Every non-zero vector v ∈ R^n can be written uniquely as λu for some λ > 0 and unit vector u ∈ R^n, where λ = ‖v‖ and u = v̂.

Proof. Clearly v = ‖v‖v̂ by definition of v̂.

Now for uniqueness, suppose v = λu where u is unit, and 0 < λ ≠ ‖v‖. But then

‖v‖ = ‖λu‖ = |λ|‖u‖ = |λ| = λ ≠ ‖v‖    (using exercise 1.31 1(c))

since u is unit, a contradiction. Therefore we must have λ = ‖v‖, and so suppose v = λu where u is unit but this time u ≠ v̂. Then

v = λu = ‖v‖u

since λ must be ‖v‖. But it follows that u = v/‖v‖ = v̂, a contradiction. Therefore we must have λ = ‖v‖ and u = v̂, so uniqueness follows.
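Normalisation is a one-liner on top of the earlier sketches. This version of ours, reusing the norm helper from the earlier snippet, guards explicitly against the zero vector, which definition 1.27 excludes:

    def normalise(v):
        """Return v-hat = v / ||v||; v must be non-zero (definition 1.27)."""
        n = norm(v)
        if n == 0:
            raise ValueError("the zero vector has no direction")
        return tuple(vi / n for vi in v)

    v = (3, 4)
    print(normalise(v))        # (0.6, 0.8)
    print(norm(normalise(v)))  # 1.0, exactly as proposition 1.28 predicts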

Exercise 1.31. We assume that the easy results proved in these exercises are known throughout the rest of the notes.

1. Prove the following for any u, v ∈ R^n and λ ∈ R.

   a) 0 · v = 0
   b) v · v = ‖v‖²
   c) ‖λv‖ = |λ|‖v‖
   d) u · v = ‖u‖‖v‖(û · v̂)

2. Prove the following for any u, v, w ∈ R^n and λ ∈ R.

   a) u · v = v · u
   b) u · u = 0 ⇐⇒ u = 0
   c) u · (v + w) = u · v + u · w
   d) u · (λv) = λ(u · v) = (λu) · v
   e) ‖u + v‖² = ‖u‖² + 2u · v + ‖v‖²
   f) (u + v) · (u − v) = ‖u‖² − ‖v‖²

3. Show, diagrammatically, that any unit vector u ∈ R^2 has the form

   u = (cos θ, sin θ)

   where θ ∈ [−π, π] is the angle u makes with the x-axis.

4. Show that for any vector v ∈ R^n, the ith component of v is given by v_i = ⟨v, e_i⟩, and

   v = ∑_{k=1}^n ⟨v, e_k⟩ e_k.


Proposition 1.32 (Cauchy–Schwarz Inequality). Let u, v ∈ R^n. Then

|u · v| ≤ ‖u‖‖v‖.

Proof. We prove that u · v ≤ ‖u‖‖v‖, because then replacing u with −u yields

‖u‖‖v‖ = ‖−u‖‖v‖ ≥ (−u) · v = ∑_{i=1}^n (−u_i v_i) = −∑_{i=1}^n u_i v_i = −u · v,

so that −‖u‖‖v‖ ≤ u · v.

Clearly for any x, y ∈ R, we have (x − y)² ≥ 0, which expands to give x² + y² ≥ 2xy. If we suppose for now that u and v are unit vectors, by definition we get

u · v = ∑_{i=1}^n u_i v_i ≤ ∑_{i=1}^n (u_i² + v_i²)/2 = ½(∑_{i=1}^n u_i² + ∑_{i=1}^n v_i²)
      = ½(‖u‖² + ‖v‖²) = ½(1 + 1) = 1 = ‖u‖‖v‖.

So the result holds for unit vectors. If u and v are not unit (and not zero), then their normalised versions are unit by proposition 1.28, so û · v̂ ≤ 1, and thus (u · v)/(‖u‖‖v‖) ≤ 1. Finally if either of u, v is zero, the result is immediate.

The Cauchy–Schwarz inequality is a very useful inequality. We need it here for the following definition, because it ensures that the inverse cosine of a dot product of two unit vectors is always defined.

Definition 1.33 (Angle between two vectors). Let u, v ∈ R^n. The angle ∠(u, v) between the vectors u, v is the real number ∠(u, v) in [0, π] defined by

∠(u, v) = cos⁻¹(û · v̂).

Remark 1.34. Note that there is no geometric meaning for angles in R^n for n > 3; that is why we take this definition. Let us show that it agrees with our usual understanding of an angle in R^2.

Let u, v ∈ R^2. By proposition 1.30 and by exercise 1.31.3, we can write

u = ‖u‖(cos θ, sin θ) and v = ‖v‖(cos φ, sin φ)

where θ, φ ∈ [−π, π] are the angles u and v make with the x-axis.

Sketching a quick diagram, we see that the angle between u and v is θ − φ. Indeed, we have

u · v = ‖u‖‖v‖(cos θ cos φ + sin θ sin φ) = ‖u‖‖v‖ cos(θ − φ),

so that cos(θ − φ) = (u · v)/(‖u‖‖v‖) = û · v̂, as required.
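Definition 1.33 computes directly: normalise both vectors, take the dot product, then the inverse cosine. A sketch of ours, reusing the dot and normalise helpers from the earlier snippets; the clamp to [−1, 1] only compensates for floating-point rounding, which the exact Cauchy–Schwarz argument never needs:

    import math

    def angle(u, v):
        """Angle between non-zero u and v, in [0, pi] (definition 1.33)."""
        c = dot(normalise(u), normalise(v))
        c = max(-1.0, min(1.0, c))  # guard against rounding drift
        return math.acos(c)

    print(angle((1, 0), (0, 1)))    # pi/2: i and j are perpendicular
    print(angle((1, 1), (-2, -2)))  # pi: opposite directions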

Definitions 1.35. Let u, v ∈ R^n be two vectors.

(i) u and v are said to be in the same direction if û = v̂, and in opposite directions if û = −v̂. In either case, u and v are said to be parallel, denoted u ∥ v.

(ii) u and v are said to be perpendicular or orthogonal if u · v = 0.

Proposition 1.36. Let u, v ∈ R^n be two vectors. Then

(i) If u and v are in the same direction, the angle between them is 0.

(ii) If u and v are in opposite directions, the angle between them is π.

(iii) If u and v are perpendicular, the angle between them is π/2.

Proof. These easily follow from definition 1.33. For (i), û = v̂ gives that û · v̂ = û · û = ‖û‖² = 1, so in this case ∠(u, v) = cos⁻¹(1) = 0. For (ii), we have û = −v̂, so now û · v̂ = û · (−û) = −(û · û) = −‖û‖² = −1, and therefore ∠(u, v) = cos⁻¹(−1) = π. Finally for (iii), we have

û · v̂ = (u/‖u‖) · (v/‖v‖) = (u · v)/(‖u‖‖v‖) = 0/(‖u‖‖v‖) = 0,

so ∠(u, v) = cos⁻¹(0) = π/2.

Exercise 1.37.
1. A triangle ABC has vertices A(0, −1, 1), B(2, 3, −2) and C(3, 1, 0). Express the vectors \vec{AB}, \vec{BC} and \vec{CA} in terms of i, j and k, and hence find the lengths of the three sides.

2. Suppose u and v are orthogonal vectors in R^n. Show that

   ‖u + v‖² = ‖u‖² + ‖v‖²

   and explain why this is equivalent to Pythagoras' theorem.

3. In triangle ABC, A = (3, 3, −2), B = (−2, 0, 5) and C = (1, −2, 1). If L and M are the midpoints of AB and AC respectively, show that LM is parallel to BC.

4. Simplify the expression ‖b‖² + ‖c‖² − (b − c) · (b − c). By taking b = \vec{AC} and c = \vec{AB}, deduce the cosine formula

   a² = b² + c² − 2bc cos(∠BAC)

   for triangle ABC shown in figure 8.

   FIGURE 8: Triangle ABC, with sides a, b, c opposite the vertices A, B, C

5. Suppose a, b, c and d are positions of the vertices of a parallelogram. Express d in terms of the other three vectors.

6. Suppose (1, 2) and (4, 1) are opposite vertices of a square in R^2. Find the coordinates of the other two vertices.

7. The vectors 0, x, y, and x + y are the position vectors of vertices of a parallelogram. Show that the sum of the squares of the diagonals is equal to the sum of the squares of the sides, i.e.

   ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².

   Deduce Euclid's median formula: in a triangle ABC where M is the midpoint of BC,

   AM² = (AB² + AC²)/2 − BC²/4.

8. Points on the perpendicular bisector of the line segment AB satisfy the equation ‖x − \vec{OA}‖ = ‖x − \vec{OB}‖. Show that for R^2, this expands out to

   2(a_1 − b_1)x + 2(a_2 − b_2)y = a_1² + a_2² − b_1² − b_2²

   where x = (x, y), A = (a_1, a_2) and B = (b_1, b_2).


1.3 Matrix Operations

A matrix is, in some sense, a generalisation of a vector. Consider the set

(R^2)^3 = (R × R)^3 = (R × R) × (R × R) × (R × R).

This contains elements of the form ((a, b), (c, d), (e, f)). We will write these elements instead as

( a c e )
( b d f )

Such an object is what we call a 2 × 3 matrix, and the set of such matrices is denoted R^{2×3}. Let us give a general definition.

Definition 1.38 (Matrix). An m × n matrix is a rectangular array of real numbers, called entries, arranged in m rows and n columns. The expression m × n is called the size of the matrix, and the set of all m × n matrices is denoted R^{m×n}.

An m × n matrix can be expressed in general as

A = ( a_11 a_12 · · · a_1n )
    ( a_21 a_22 · · · a_2n )
    (  ⋮    ⋮    ⋱    ⋮   )
    ( a_m1 a_m2 · · · a_mn )

or, similarly to vectors, concisely as

A = (a_ij)_{m×n}

where a_ij denotes the entry in the ith row and jth column, i and j being called the row and column indices, respectively. As before, we relax this to (a_ij) when the size is clear in context.

As we have seen already, matrices are denoted by single capital letters in bold typeface, and their entries are denoted using the corresponding small letter with two subscripts ranging over the rows and columns (i = 1, . . . , m and j = 1, . . . , n).

When m = n (i.e., size n × n) the matrix is said to be square, and the entries a_ii (that is, a_11, a_22, . . . , a_nn) make up the diagonal of the matrix. If a matrix is not square, it is called rectangular.

We identify n-vectors (i.e., vectors in R^n) in the world of matrices with n × 1 matrices. So for example, the vector (x, y) corresponds to the matrix

( x )
( y )

17 PRELIMINARY VERSION 0.4

Page 18: The Basics of Vectors, Matrices and Linear Algebra · 2020-07-02 · Vectors,Matrices and LinearAlgebra LUKE COLLINS maths.com.mt/notes Version 0.4* (30th March, 2019) Contents 1

§1.3 |Matrix Operations Luke Collins

Examples 1.39. Consider the following matrices.

A = ( 1 2 )      B = ( 1 2 −5 7 12 )      C = ( 4 )
    ( 3 4 )
    ( 5 6 )

D = ( 0    π    e   )      E = ( 2 3 )      x = ( 0 )
    ( 1/2  −1/3 1/4 )          ( 5 1 )          ( 1 )
    ( 0    0    0   )                           ( 0 )

These matrices are, in order, 3 × 2, 1 × 5, 1 × 1, 3 × 3, 2 × 2 and 3 × 1. Matrices C, D and E are square, whereas A, B and x are rectangular.

The matrix x is equivalent to the vector (0, 1, 0), whereas the matrix B is not a vector (1 × n matrices are sometimes called row vectors or covectors, but they are not considered vectors). Notice that we still use lowercase letters for vectors here.

The matrix C is a number (or scalar). We do not distinguish between scalars and 1 × 1 matrices.

Definition 1.40 (Matrix Equality). Let A = (a_ij)_{m×n} and B = (b_ij)_{ℓ×k}. Then the matrices A and B are equal, denoted

A = B,

if m = ℓ, n = k and a_ij = b_ij for all i = 1, . . . , m and j = 1, . . . , n.

Example 1.41. None of the matrices below are equal to each other.

( 1 2 )     ( 1 2 0 )     ( 1 2 )
( 3 4 )     ( 3 4 0 )     ( 3 5 )

Definition 1.42 (Matrix Addition). Let A = (a_ij) and B = (b_ij) be two m × n matrices. Then the sum A + B is the m × n matrix given by

A + B = (a_ij + b_ij).

Matrices of different size cannot be added.

Example 1.43. Consider the matrices

A = ( 1 2 3  )      B = ( 4 −1 0   )
    ( 0 5 −4 )          ( 6 11 −13 )

Then their sum is

A + B = ( 1+4  2−1   3+0   ) = ( 5 1  3   )
        ( 0+6  5+11  −4−13 )   ( 6 16 −17 )

Definition 1.44 (Scalar Multiplication). Let λ ∈ R be a scalar, and let A = (a_ij) be an m × n matrix. Then the scalar multiplication of A with λ, denoted λA, is the matrix given by

λA = (λa_ij).

Example 1.45. Consider the matrix

A = ( 2  0 )
    ( −1 5 )

Then the matrix 5A is

5A = ( 10 0  )
     ( −5 25 )

Remark 1.46. The operations of addition and scalar multiplication which we defined here coincide with those defined earlier for vectors. What this means is that if we treat a vector v ∈ R^n as an n × 1 matrix, and apply the definitions given here for addition and scalar multiplication, then the result will be identical to what we expect using the original definitions.

Notation. Just as we did with vectors, we denote the matrix (−1)A by −A, and introduce the difference A − B between two matrices (of the same size), defined by

A − B = A + (−B).

Now we introduce something which has no vector analogue.

Definition 1.47 (Matrix Multiplication). Let A = (a_ij)_{m×d} and B = (b_ij)_{d×n} be two matrices. Then we define the product of A and B, denoted AB, to be the m × n matrix given by

AB = ( ∑_{k=1}^d a_ik b_kj )_{m×n}.

Remark 1.48. This definition seems rather complicated, so let's break it down. Notice that the index of summation, k, varies the column index j of a_ij, and the row index i of b_ij. Thus the ijth entry of AB is

∑_{k=1}^d a_ik b_kj = a_i1 b_1j + a_i2 b_2j + · · · + a_id b_dj.

If we consider the rows of the matrix A to be the vectors a_1, a_2, . . . , a_m and the columns of the matrix B to be the vectors b_1, b_2, . . . , b_n, then the ijth entry turns out to be the dot product a_i · b_j, that is,

AB = ( — a_1 — ) ( |    |         |   )   ( a_1·b_1  a_1·b_2  · · ·  a_1·b_n )
     ( — a_2 — ) ( b_1  b_2  · · · b_n ) = ( a_2·b_1  a_2·b_2  · · ·  a_2·b_n )
     (    ⋮    ) ( |    |         |   )   (    ⋮        ⋮       ⋱       ⋮    )
     ( — a_m — )                          ( a_m·b_1  a_m·b_2  · · ·  a_m·b_n )

Thus we say that each entry is obtained by doing "row times column". Notice the restriction that this places on the dimensions of the matrices A and B: they need to be of size m × d and d × n, so that the dot product is between two vectors of dimension d (that is, both have d entries).

When two matrices A and B are of size m × n and ℓ × k with n = ℓ as required for multiplication, we say that they are compatible, conformal, or that their inner dimensions match. Otherwise, the product AB does not exist.
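The "row times column" description is exactly a double loop over dot products (or a triple loop). A minimal Python sketch of ours, with a matrix represented as a list of rows:

    def matmul(A, B):
        """Product AB of definition 1.47; A is m x d, B is d x n."""
        m, d, n = len(A), len(A[0]), len(B[0])
        assert d == len(B), "inner dimensions must match"
        return [[sum(A[i][k] * B[k][j] for k in range(d))
                 for j in range(n)]
                for i in range(m)]

    A = [[1, 2, 3], [0, 5, -4]]   # 2 x 3
    B = [[4], [6], [1]]           # 3 x 1 (a small example of our own)
    print(matmul(A, B))           # [[19], [26]], a 2 x 1 result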

Let us give some examples.

Examples 1.49. Take the matrices A, B, C, D, E, x from examples 1.39:

A = ( 1 2 )      B = ( 1 2 −5 7 12 )      C = ( 4 )
    ( 3 4 )
    ( 5 6 )

D = ( 0    π    e   )      E = ( 2 3 )      x = ( 0 )
    ( 1/2  −1/3 1/4 )          ( 5 1 )          ( 1 )
    ( 0    0    0   )                           ( 0 )


The product AD does not exist, since A is of size 3 × 2 and D is of size 3 × 3. DA on the other hand does exist, since D has size 3 × 3 and A has size 3 × 2:

D_{3×3} A_{3×2} = ( 0    π    e   ) ( 1 2 )
                  ( 1/2  −1/3 1/4 ) ( 3 4 )
                  ( 0    0    0   ) ( 5 6 )

We know that the resulting product will have size 3 × 2 by definition (D_{3×3} A_{3×2}). Now each ijth entry is the dot product of the ith row with the jth column:

DA = ( (0, π, e) · (1, 3, 5)          (0, π, e) · (2, 4, 6)          )
     ( (1/2, −1/3, 1/4) · (1, 3, 5)   (1/2, −1/3, 1/4) · (2, 4, 6)   )
     ( (0, 0, 0) · (1, 3, 5)          (0, 0, 0) · (2, 4, 6)          )

   = ( 3π + 5e   4π + 6e )
     ( 3/4       7/6     )
     ( 0         0       )

What follows immediately from this example is that matrix multiplication is not commutative; i.e., in general, AB ≠ BA. In fact, in this case, only one of these products exists.

As another example, let's find the product Dx. The product exists because D is 3 × 3 and x is 3 × 1. The result is 3 × 1.

Dx = ( 0    π    e   ) ( 0 )   ( π    )
     ( 1/2  −1/3 1/4 ) ( 1 ) = ( −1/3 )
     ( 0    0    0   ) ( 0 )   ( 0    )

Exercise 1.50. Find, if they exist, the products

a) AB    b) CB    c) BC    d) AE
e) xB    f) xC    g) (DA)E    h) D(AE)


Remark 1.51. Observe that the 1 × 1 matrix C, when compatible with other matrices, behaves as a scalar multiple. Thus in general 1 × 1 matrices are treated as scalars and are considered "compatible" with all matrices (in the sense of definition 1.44).

Definition 1.52 (Zero matrix). The matrix

O = (0) = ( 0 · · · 0 )
          ( ⋮   ⋱   ⋮ )
          ( 0 · · · 0 )

is called the zero matrix.

Theorem 1.53 (Ring properties for matrices). Let A, B, C be three matrices. Assuming that the matrix dimensions are such that the operations can be performed, we have the following:

I) A + (B + C) = (A + B) + C
II) A + B = B + A
III) A + O = A
IV) A + (−A) = O
V) A(BC) = (AB)C
VI) A(B + C) = AB + AC
VII) (A + B)C = AC + BC

Proof. Just as in theorem 1.10, these results easily follow from the definitions. The only hard one is V. Writing A = (a_ij)_{m×δ}, B = (b_ij)_{δ×d} and C = (c_ij)_{d×n}, we have

A(BC) = (a_ij)((b_ij)(c_ij))
      = (a_ij)(∑_{k=1}^d b_ik c_kj)                 (by definition 1.47)
      = (∑_{ℓ=1}^δ a_iℓ (∑_{k=1}^d b_ℓk c_kj))      (by definition 1.47)
      = (∑_{ℓ=1}^δ ∑_{k=1}^d a_iℓ b_ℓk c_kj)        (by linearity of ∑)
      = (∑_{k=1}^d ∑_{ℓ=1}^δ a_iℓ b_ℓk c_kj)        (finite sums can be interchanged; see proposition A.12)
      = (∑_{k=1}^d (∑_{ℓ=1}^δ a_iℓ b_ℓk) c_kj)      (by linearity of ∑)
      = ((∑_{ℓ=1}^δ a_iℓ b_ℓj))(c_ij)               (by definition 1.47)
      = ((a_ij)(b_ij))(c_ij) = (AB)C,

as required. Proofs of the remaining properties are left as an exercise.

Definition 1.54 (Matrix Transpose). Let A = (a_ij)_{m×n}. The transpose of A, denoted A^T, is the n × m matrix given by

A^T = (a_ji)_{n×m}.

Example 1.55. If A, B and x are given by

A = ( 1 2 )      B = ( 1 0   )      x = ( −1 )
    ( 3 4 )          ( 4 1/2 )          ( 0  )
    ( 5 6 )                             ( 1  )

then we have

A^T = ( 1 3 5 )      B^T = ( 1 4   )      x^T = ( −1 0 1 )
      ( 2 4 6 )            ( 0 1/2 )
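In code, transposition just swaps the two indices; with the list-of-rows representation, Python's zip(*A) does the whole job. A sketch of ours:

    def transpose(A):
        """A^T = (a_ji), as in definition 1.54."""
        return [list(column) for column in zip(*A)]

    A = [[1, 2], [3, 4], [5, 6]]
    print(transpose(A))  # [[1, 3, 5], [2, 4, 6]], as in example 1.55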

Now we introduce a special matrix which behaves analogously to the number 1 in the set of real numbers. The number 1 is called the multiplicative identity in R, because it does nothing to numbers under multiplication (preserving their "identity"):

x · 1 = 1 · x = x.

Analogously we have the additive identity 0 ∈ R, since 0 does not change numbers under addition:

x + 0 = 0 + x = x.

For matrices, the additive identity is simply the zero matrix O, which exhibits the desired behaviour

A + O = O + A = A

for any matrix A (of compatible size), as seen in theorem 1.53 III. What we would like to try and obtain here is a multiplicative identity for matrices. Note that this is not as simple a task as determining an additive identity, firstly because multiplication of matrices is not defined as simply as addition, and secondly because multiplication is not commutative; so even if we find some identity matrix I which satisfies AI = A, it needn't satisfy IA = A.

Here is the definition.

Definition 1.56 (Identity matrix). The identity matrix is the n × n matrix denoted I_n or simply I, defined by

I_n = (δ_ij)

where δ_ij denotes the Kronecker delta (definition 1.12).

Thus the first few identity matrices are

I_1 = ( 1 )

I_2 = ( 1 0 )
      ( 0 1 )

I_3 = ( 1 0 0 )
      ( 0 1 0 )
      ( 0 0 1 )

I_4 = ( 1 0 0 0 )
      ( 0 1 0 0 )
      ( 0 0 1 0 )
      ( 0 0 0 1 )

with ones on the diagonal and zeros everywhere else.

Observe that if A is an m × n matrix where m ≠ n, then it is impossible to have I_n A = A I_n = A for the same n, simply because only one of these products can exist (because of their size). Refer to example 1.57, and verify the computation yourself.

Example 1.57. Suppose A = ( 1 2 3; 4 5 6 ). Then we have

AI_3 = ( 1 2 3 ) ( 1 0 0 )   ( 1 2 3 )   ( 1 0 ) ( 1 2 3 )
       ( 4 5 6 ) ( 0 1 0 ) = ( 4 5 6 ) = ( 0 1 ) ( 4 5 6 ) = I_2 A.
                 ( 0 0 1 )

Let us now prove that I behaves as desired in the general case.

Theorem 1.58. Let A = (a_ij) be an m × n matrix. Then

AI_n = I_m A = A.

Proof. Recall I_n = (δ_ij). Thus

AI_n = (∑_{k=1}^n a_ik δ_kj)
     = (a_i1 δ_1j + · · · + a_i(j−1) δ_(j−1)j + a_ij δ_jj + a_i(j+1) δ_(j+1)j + · · · + a_in δ_nj)
     = (a_i1 · 0 + · · · + a_i(j−1) · 0 + a_ij · 1 + a_i(j+1) · 0 + · · · + a_in · 0)
     = (a_ij) = A,

and by similar reasoning we get I_m A = A.

Note that since, in the general case, different I's are required on the left and the right of a matrix A to act as a multiplicative identity, we call I such that IA = A the left identity, and I such that AI = A the right identity. It is easy to see that a matrix A has the same left and right identity if and only if the matrix A is square.

The final operation we introduce is the analogue of division for matrices. Before we give a definition however, let us again consider the real numbers first. What does it mean to divide? In an infantile treatment of arithmetic, division is introduced as a distinct operation from multiplication, just as subtraction is thought of as being distinct from addition. But the way we actually treat subtraction in more formal considerations is as the addition of some "inverse element", i.e., x − y is shorthand for x + (−y), where −y is a number such that

y + (−y) = (−y) + y = 0.

We call −y the additive inverse of y. (Note that 0 on the right-hand side is the additive identity.)

Likewise, x ÷ y or x/y denotes x · y⁻¹, where y⁻¹ is a number such that

y · y⁻¹ = y⁻¹ · y = 1.

We call y⁻¹ the multiplicative inverse, and this time we have 1 on the right-hand side, since 1 is the multiplicative identity. In the general case, when we have some binary operation ∗ defined on a set X, an element i ∈ X is an identity if

x ∗ i = i ∗ x = x

for all x ∈ X, and the inverse element x⁻¹ ∈ X of x is an element such that

x ∗ x⁻¹ = x⁻¹ ∗ x = i.

In the context of matrices, the additive inverse of A is −A = (−1)A, since A + (−A) = (−A) + A = O. Again, multiplication will prove to be the more challenging case. In fact, we will focus solely on 2 × 2 matrices for now.


First of all, if A is m × n with m ≠ n, we cannot have one matrix A⁻¹ such that

AA⁻¹ = A⁻¹A = I,

simply because of the sizes: if A⁻¹ exists it will have to be n × m, and on the left we get A⁻¹A = I_n, whereas on the right we get AA⁻¹ = I_m. There is a study of left and right inverses, however we will not get into it here, and focus solely on square matrices where everything is n × n (everything meaning the matrix, its inverse, and I). Thus, we have the following definition.

Definition 1.59 (Matrix Inverse). Let A be an n × n (square) matrix. An n × n matrix A⁻¹ such that

AA⁻¹ = A⁻¹A = I_n

is said to be an inverse of A. If A has an inverse, then it is said to be invertible. Otherwise, A is said to be singular.

Exercise 1.60. Suppose

A = ( −5 1 −2 )      B = ( 1  −1 4  )
    ( 2  1 2  )          ( 2  −1 6  )
    ( 2  0 1  )          ( −2 2  −7 )

Work out AB and BA. What do you conclude?

Now clearly for 1 × 1 matrices, being essentially numbers, the inverse of the matrix A = (a) is simply A⁻¹ = (1/a), as long as a ≠ 0. Indeed,

AA⁻¹ = (a)(1/a) = (1) = I_1,

and similarly we get A⁻¹A = I_1.

Theorem 1.61. Let A be a square matrix. Then if A⁻¹ exists, it is unique.

Proof. Suppose B and C are two inverses of A. Then

AB = BA = I and AC = CA = I.

In particular, AB = AC, so

B(AB) = B(AC) =⇒ (BA)B = (BA)C =⇒ IB = IC =⇒ B = C,

thus any two inverses of A are equal, proving that A⁻¹ is unique.


The following theorem is helpful because it saves us having to check both AA⁻¹ = I and A⁻¹A = I; one of them is enough to establish an inverse.

Theorem 1.62. Suppose A and B are square matrices such that AB = I. Then BA = I.

Proof. We give an incomplete proof, because we assume B⁻¹ exists. (It can be shown that B⁻¹ always exists whenever AB = I, but we need more tools before we can do so; it is given as an exercise in future sections.) Indeed,

AB = I =⇒ (AB)B⁻¹ = IB⁻¹
       =⇒ A(BB⁻¹) = B⁻¹
       =⇒ AI = B⁻¹
       =⇒ A = B⁻¹
       =⇒ BA = BB⁻¹
       =⇒ BA = I,

as required.

Combining theorems 1.61 and 1.62, we have the following.

Corollary 1.63. If A and B satisfy AB = I, then A and B are each other's unique inverse.

Proof. Indeed, if AB = I, then BA = I by theorem 1.62, and thus

AB = BA = I.

This gives us that A is an inverse of B, and that B is an inverse of A. Uniqueness then follows from theorem 1.61.

Now let us go to 2 × 2 matrices. We have the following.

Theorem 1.64. Suppose ad − bc ≠ 0. Then the 2 × 2 matrix A = ( a b; c d ) is invertible. Moreover,

A⁻¹ = 1/(ad − bc) ( d  −b )
                  ( −c a  )


Proof. We have

( a b ) ( 1/(ad − bc) ( d  −b ) ) = 1/(ad − bc) ( a b ) ( d  −b )
( c d ) (             ( −c a  ) )               ( c d ) ( −c a  )

= 1/(ad − bc) ( ad − bc   −ab + ab )
              ( cd − cd   −bc + ad )

= 1/(ad − bc) ( ad − bc   0        )
              ( 0         −bc + ad )

= ( 1 0 )
  ( 0 1 ) = I,

which by corollary 1.63 completes the proof.

Note. Just as we required a ≠ 0 for 1 × 1 matrices to be invertible, here we require ad − bc ≠ 0. This special number, ad − bc, is called the determinant of the matrix A, which we denote by |A| or by det A. There are analogous numbers for general n × n matrices which we explore later, together with their algebraic and geometric significance.
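Theorem 1.64 is a complete recipe for the 2 × 2 case: swap the diagonal entries, negate the off-diagonal ones, and divide by the determinant. A sketch of ours, raising an error in the singular case ad − bc = 0:

    def inverse_2x2(A):
        """Inverse of a 2 x 2 matrix via theorem 1.64."""
        (a, b), (c, d) = A
        det = a * d - b * c
        if det == 0:
            raise ValueError("ad - bc = 0: the matrix is singular")
        return [[d / det, -b / det],
                [-c / det, a / det]]

    X = [[2, -1], [5, 3]]
    print(inverse_2x2(X))  # (1/11) * [[3, 1], [-5, 2]], cf. example 1.65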

Example 1.65. Suppose X = ( 2 −1; 5 3 ). Then

X⁻¹ = 1/|X| ( 3  −(−1) ) = 1/(2·3 − (−1)·5) ( 3  1 ) = 1/11 ( 3  1 )
            ( −5 2     )                    ( −5 2 )        ( −5 2 )

Example 1.66. We solve the equation AX + B = C for X, where

A = ( 2 7  )      B = ( 1 4 )      C = ( 10 29 )
    ( 4 −3 )          ( 3 1 )          ( 4  0  )

Indeed,

AX + B = C =⇒ AX = C − B
           =⇒ A⁻¹AX = A⁻¹(C − B)
           =⇒ X = A⁻¹(C − B).

Therefore

X = 1/|A| ( −3 −7 ) [ ( 10 29 ) − ( 1 4 ) ]
          ( −4 2  ) [ ( 4  0  )   ( 3 1 ) ]

  = −1/34 ( −3 −7 ) ( 9 25 )
          ( −4 2  ) ( 1 −1 )

  = −1/34 ( −34 −68  )
          ( −34 −102 )

  = ( 1 2 )
    ( 1 3 )

Remark 1.67. Most of the algebra we carry out to solve matrix equations is analogous to the algebra with real numbers which we are used to. The two main differences are the lack of division, where we instead use inverses, and the lack of commutativity. Notice that in the second implication of example 1.66, we multiplied both sides by A⁻¹. In particular, we multiplied on the left. This is different from multiplying on the right! So in general, if we have the equation LHS = RHS, we can premultiply by a matrix to get A LHS = A RHS, or postmultiply to get LHS A = RHS A. But we cannot do A LHS = RHS A or LHS A = A RHS.

Notation (Matrix Power). For any square matrix A, we define the matrix power Aⁿ by the following recursive definition:

Aⁿ = { A·A^(n−1)   if n > 0
     { I           if n = 0
     { (A⁻¹)^(−n)  if n < 0,

so for example, A³ = AAA, A⁰ = I and A⁻² = A⁻¹A⁻¹.
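The recursive definition of Aⁿ can be coded exactly as written. A sketch of ours, reusing the matmul and inverse_2x2 helpers from the earlier snippets (so negative powers only work for 2 × 2 matrices in this toy version):

    def identity(n):
        """I_n = (delta_ij) (definition 1.56)."""
        return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

    def power(A, n):
        """Matrix power A^n by the recursive definition above."""
        if n > 0:
            return matmul(A, power(A, n - 1))   # A * A^(n-1)
        if n == 0:
            return identity(len(A))             # A^0 = I
        return power(inverse_2x2(A), -n)        # (A^{-1})^{-n} for n < 0

    A = [[2, 1], [0, 1]]
    print(power(A, 3))  # [[8, 7], [0, 1]]
    print(power(A, 0))  # the 2 x 2 identity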

Exercise 1.68.
1. Calculate

   ( 1  8  4  )   ( 3 0 12 )
   ( 2  9  5  ) + ( 2 7 3  )
   ( −3 13 10 )   ( 3 3 14 )

   giving your answer in the form kA, k ∈ N.

2. Consider the following matrices with their sizes given below:

   A: 4 × 3,  B: 3 × 3,  C: 4 × 4,  D: 2 × 5,  E: 3 × 2,  F: 4 × 1

   Determine the size of:

   a) AB    b) BA^T    c) D^T E
   d) F^T CAB    e) (BE + E)^T    f) F^T CF

3. Calculate:

   a) ( 0 −8 2 1 ) ( 1  8 7  3  2  )
      ( 1 −6 1 9 ) ( 1  1 0  1  2  )
      ( 3 −2 4 5 ) ( 3  1 −4 −3 5  )
      ( 8 2  5 1 ) ( −6 2 3  5  −1 )
      ( 7 2  7 4 )

   b) ( 1 )
      ( 2 )
      ( 3 ) ( 6 7 8 9 10 )
      ( 4 )
      ( 5 )

4. Given the matrices A = ( 1 2 3; 4 5 6; 7 8 9 ), B = ( 2 0; 7 5; −3 8 ), C = ( 2 1; 0 5 ) and D = ( 1; 4; 9 ), find where possible:

   a) AB    b) BA    c) CB
   d) ABC − B    e) AD + D    f) C⁻¹
   g) BC + BC⁻¹    h) DC    i) 4A³
   j) D^T A    k) B^T A    l) (A^T − A)^T
   m) AA^T    n) B^T A^T − AB^T    o) D^T D
   p) (AA^T)^T    q) ½(A − A^T)    r) B^T AB

5. A matrix A is said to be symmetric if A^T = A and skew-symmetric if A^T = −A. Let A be any n × n square matrix. Prove that:

   a) The matrix S = ½(A + A^T) is always symmetric.
   b) The matrix V = ½(A − A^T) is always skew-symmetric.
   c) Any square matrix can be split as the sum of a symmetric and a skew-symmetric matrix. (Hint: use S and V.)


6. Given that

   A = ( −1 4  1  )      B = ( 6 −42 −32 )
       ( 2  −4 7  )          ( 3 −12 −9  )
       ( −3 6  −9 )          ( 0 6   4   )

   find AB and deduce A⁻¹.

7. Theorem 1.64 gives us the implication

   ad − bc ≠ 0 =⇒ ( a b; c d ) has an inverse.

   Prove that the converse is also true, that is, if the matrix ( a b; c d ) has an inverse, then it must be that ad − bc ≠ 0. This way, we get that for any 2 × 2 matrix A, "A is invertible" is equivalent to |A| ≠ 0. [i.e., we get ad − bc ≠ 0 ⇐⇒ ( a b; c d ) has an inverse.]

9. Invert the following matrices.12

( −4 23 −1

)a)

( 9 −2−4 1

)b)

(1 11 0

)c)

(2 22 −1

)d)

10. Explain why(

1 24 8

)is singular.

11. Given the matrices A = ( 2 1; 3 −1 ), B = ( 3; 2 ) and C = ( 1; 1 ), solve the equation Ax + B = 10C.

12. Consider the matrices A = ( 1 2; 5 0 ), B = ( 7 −3; 1 4 ) and C = ( 6 −8; 2 3 ). Find the inverses A⁻¹, B⁻¹, C⁻¹ and the product ABC. Hence verify that (ABC)⁻¹ = C⁻¹B⁻¹A⁻¹, and prove that the result holds in general for any three invertible matrices A, B and C (of the same size).

13. A diagonal matrix is a square matrix with entries in the diagonal, and zeros everywhere else. Prove that, in general, the product of two diagonal matrices (where it exists) is another diagonal matrix. Hence invert the matrix ( 1/3 0 0; 0 −2 0; 0 0 1 ).

14. The matrix A is given by ( 1 −1 2; 3 2 5; 5 1 −3 ).

    a) Find A² and A³. Express A³ + λA + µI as a single 3 × 3 matrix.


    b) Find values of λ and µ such that A³ + λA + µI = O, where O is the 3 × 3 zero matrix. Hence, express A⁻¹ as a single 3 × 3 matrix.

15. A matrix P(k) is given by P(k) = ( k 2; k−6 k−5 ) for k ∈ R.

    a) Determine the values of k for which P(k) has no inverse. What is such a matrix called?
    b) Find, in terms of k, the inverse matrix P⁻¹(k) for when k is not equal to any of the values found in part (a).

2 Computational Methods

In this section we demonstrate how matrices can help us to solve systems of linear equations in multiple variables, and then explore the notions of determinants and inverses for matrices larger than 2 × 2.

2.1 Systems of Linear Equations

A system of linear equations (or a linear system) is a collection of linear equations involving the same set of variables. For example,

{ 5x − 2y = 7
{ 7x + 3y = 4

is a linear system involving the variables x and y. The use of the word "system" indicates that the equations are to be considered collectively, rather than individually. This is also why we use a curly bracket ({) to group the equations together notationally.

A solution to a linear system is an assignment of the variables, x and y in this case, such that all the equations are simultaneously satisfied. A solution to the system above is given by the assignment x = 1 = −y.

Notice that the system above can be written as

{ (5, −2) · (x, y) = 7
{ (7, 3) · (x, y) = 4

where · denotes the dot product of vectors. Even more concisely, the definition of matrix multiplication (and matrix equality) gives us that the system is equivalent to the matrix equation

( 5 −2 ) ( x ) = ( 7 )
( 7 3  ) ( y )   ( 4 )

We call the matrix A = ( 5 −2; 7 3 ) the matrix of coefficients, the vector x = (x, y) the solution vector, and b = (7, 4) the vector of constant terms. Thus this system is simply a matrix equation of the form Ax = b.

Indeed, this is true in general for any linear system of equations (System ⇐⇒ Ax = b):

{ ax + by = α      ⇐⇒   ( a b ) ( x ) = ( α )
{ cx + dy = β           ( c d ) ( y )   ( β )

{ ax + by + cz = α      ⇐⇒   ( a b c ) ( x ) = ( α )
{ dx + ey + fz = β           ( d e f ) ( y )   ( β )
{ gx + hy + iz = γ           ( g h i ) ( z )   ( γ )

{ ax + by + cz + dw = α      ⇐⇒   ( a b c d ) ( x ) = ( α )
{ ex + fy + gz + hw = β           ( e f g h ) ( y )   ( β )
{ ix + jy + kz + lw = γ           ( i j k l ) ( z )   ( γ )
{ mx + ny + oz + pw = δ           ( m n o p ) ( w )   ( δ )

and so on.

We can also have a different number of equations from variables, e.g.:

{ au + bv + cw + dx = α      ⇐⇒   ( a b c d ) ( u ) = ( α )
{ eu + fv + gw + hx = β           ( e f g h ) ( v )   ( β )
                                              ( w )
                                              ( x )

{ ax + by = α      ⇐⇒   ( a b ) ( x ) = ( α )
{ cx + dy = β           ( c d ) ( y )   ( β )
{ ex + fy = γ           ( e f )         ( γ )


and in general,

{ a_11 x_1 + a_12 x_2 + · · · + a_1n x_n = b_1
{ a_21 x_1 + a_22 x_2 + · · · + a_2n x_n = b_2
{ ⋮
{ a_m1 x_1 + a_m2 x_2 + · · · + a_mn x_n = b_m

⇐⇒ (a_ij)_{m×n} (x_i)_n = (b_i)_m.

However for now we focus on the case where the number of equations is equal to the number of variables (and therefore we have a square n × n matrix of coefficients).

What is the advantage of representing linear systems in this way? The answer is simple: it reduces the problem of solving the system to finding the matrix inverse, since

Ax = b =⇒ x = A⁻¹b,

assuming that the matrix of coefficients A is invertible. For 2 × 2 matrices, this is equivalent to requiring that |A| ≠ 0 (by exercise 1.68.7). Indeed, let us try and solve the system

{ 5x − 2y = 7
{ 7x + 3y = 4

which we gave initially, by inverting the matrix of coefficients. Since this can be written as

( 5 −2 ) ( x ) = ( 7 )
( 7 3  ) ( y )   ( 4 ),

we have

( x ) = ( 5 −2 )⁻¹ ( 7 ) = 1/(5·3 − (−2)·7) ( 3  2 ) ( 7 )
( y )   ( 7 3  )   ( 4 )                    ( −7 5 ) ( 4 )

      = 1/29 ( 3·7 + 2·4  )
             ( −7·7 + 5·4 )

      = 1/29 ( 29  ) = ( 1  )
             ( −29 )   ( −1 ),

and indeed the solution is x = 1 = −y.
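Once a 2 × 2 inverse is available, the entire computation above is a few lines of code. A sketch of ours, reusing the inverse_2x2 and matmul helpers from the snippets of section 1.3:

    A = [[5, -2], [7, 3]]          # matrix of coefficients
    b = [[7], [4]]                 # constant terms, as a 2 x 1 matrix
    x = matmul(inverse_2x2(A), b)  # x = A^{-1} b
    print(x)                       # [[1.0], [-1.0]], i.e. x = 1, y = -1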


Remark 2.1. We will prove later that each system of n linear equations in n variables (thus n × n matrix of coefficients) has a unique solution if and only if the determinant of its matrix of coefficients is non-zero.

Example 2.2. Consider the matrices A = ( −3 −2 0; −1 1 −3; 7 0 7 ) and B = ( 7 14 6; −14 −21 −9; −7 −14 −5 ). We find the product AB and hence solve the system

{ −3x − 2y = 3
{ −x + y − 3z = −1
{ 7x + 7z = 0.

Indeed, we have

AB = ( −3 −2 0  ) ( 7   14  6  )   ( 7 0 0 )
     ( −1 1  −3 ) ( −14 −21 −9 ) = ( 0 7 0 ) = 7I.
     ( 7  0  7  ) ( −7  −14 −5 )   ( 0 0 7 )

Now observe that the system we have is Ax = b, where x = (x, y, z) and b = (3, −1, 0). Thus we can find the solution vector x since x = A⁻¹b, and

AB = 7I =⇒ (1/7)AB = I =⇒ A((1/7)B) = I

and thus A⁻¹ = (1/7)B. Therefore

x = A⁻¹b = ((1/7)B)b = (1/7) ( 7   14  6  ) ( 3  )   ( 1  )
                             ( −14 −21 −9 ) ( −1 ) = ( −3 )
                             ( −7  −14 −5 ) ( 0  )   ( −1 )

and thus the solution is x = 1, y = −3 and z = −1.

Exercise 2.3.
1. Solve the following systems of equations by inverting the matrix of coefficients.

   a) { 2x − 5y = −21        b) { 5x + 4y = 40
      { 4x + 3y = 23            { 3x − 9y = −33

   c) { 3x − 6y = −3         d) { x + 9y = 34
      { 5x − 6y = 7             { 4x − 5y = 13

   e) { 6x − 3y = 3          f) { 6x + 4y = 65
      { 4x − 3y = −5            { 6x + 8y = 86

   g) { 9x + 8y = 42         h) { −9x + 8y = 4
      { 3x − 2y = 0             { 3x + 5y = 37

2. The matrix A = ( −3 0 3; −6 4 1; 1 −2 −1 ) has inverse

   1/(15k) ( −k  2ℓ     −12 )
           ( −5  k+ℓ+1  5ℓ  )
           ( 4k  −6     −12 )

   Find the values of k and ℓ, and hence or otherwise, solve the system of equations

   { −u + w = 2
   { −6u + 4v + w = 5
   { −u + 2v + w = 6.

   Ans: u = 1, v = 2, w = 3

3. Consider the matrix M = ( 1 4 0; 5 −2 −1; −5 0 1 ).

   a) Determine constants λ, µ such that M³ = λM + µI.
   b) Hence, determine M⁻¹ and solve the system of equations

      { x + 4y = 5
      { 5x − 2y − z = 3
      { −5x + z = −5.

      Ans: x = y = 1, z = 0

Notation. | a_11 a_12; a_21 a_22 | is shorthand for |( a_11 a_12; a_21 a_22 )| = det( a_11 a_12; a_21 a_22 ).

4. Consider the lines ℓ_1, ℓ_2 ⊆ R^2 whose respective equations are ax + by = k and cx + dy = m. Show that ℓ_1 and ℓ_2 are parallel if and only if

   | a b |
   | c d | = 0.

5. (Cramer's Rule). Let A = ( a b; c d ) and b = (α, β). Prove that the solution x = (x, y) of the system of equations corresponding to the equation Ax = b is given by the equations

   x = | α b | / det(A)    and    y = | a α | / det(A).
       | β d |                        | c β |

2.2 Elementary Row Operations

Elementary row operations are simple operations one can carry out on the rows of a given matrix. There are three such operations.

I. Row switching. Interchanging row i with row j, denoted by writing R_i ↔ R_j. For example,

   ( 1 2 3 )                 ( 4 5 6 )
   ( 4 5 6 )  —R_1 ↔ R_2→    ( 1 2 3 )
   ( 7 8 9 )                 ( 7 8 9 )

II. Row scaling. Multiplying row i by a non-zero scalar λ ∈ R, denoted by writing λR_i → R_i. For example,

   ( 1 2 3 )                 ( 1  2  3  )
   ( 4 5 6 )  —5R_2 → R_2→   ( 20 25 30 )
   ( 7 8 9 )                 ( 7  8  9  )

III. Row adding. Replacing row i by the sum of itself with a scalar multiple of another row j, where j ≠ i, denoted by writing R_i + λR_j → R_i. For example,

   ( 1 2 3 )                        ( 1 2 3 )
   ( 4 5 6 )  —R_3 + (−3)R_1 → R_3→ ( 4 5 6 )
   ( 7 8 9 )                        ( 4 2 0 )
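Each operation touches only one or two rows. A sketch of ours, mutating a list-of-rows matrix in place, with rows numbered from 1 as in the notation above:

    def row_switch(A, i, j):
        """R_i <-> R_j."""
        A[i - 1], A[j - 1] = A[j - 1], A[i - 1]

    def row_scale(A, lam, i):
        """lam R_i -> R_i, for non-zero lam."""
        A[i - 1] = [lam * x for x in A[i - 1]]

    def row_add(A, i, lam, j):
        """R_i + lam R_j -> R_i, with j != i."""
        A[i - 1] = [x + lam * y for x, y in zip(A[i - 1], A[j - 1])]

    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    row_add(A, 3, -3, 1)
    print(A)  # [[1, 2, 3], [4, 5, 6], [4, 2, 0]], as in example III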

We focus on these three operations in particular because it turns out that carrying out an elementary row operation on a matrix A can be achieved simply by pre-multiplying by some other matrix E. Such matrices, that is, matrices which carry out elementary row operations when multiplied on the left, are called elementary matrices.

Definitions 2.4 (Elementary Matrices). An elementary matrix is an n × n matrix E which falls under one of the following definitions.

(i) The swap matrix S_kℓ is an n × n matrix defined by

S_kℓ = ( δ_ij(1 − δ_ik)(1 − δ_iℓ) + δ_ik δ_jℓ + δ_iℓ δ_jk ),

where δ_ij denotes the Kronecker delta.

(ii) The row-scaling matrix is an n × n matrix L_k(λ) defined by

L_k(λ) = ( δ_ij(1 + (λ − 1)δ_ik) ).

(iii) The row-adding matrix is an n × n matrix R_kℓ(λ), k ≠ ℓ, defined by

R_kℓ(λ) = ( δ_ij + λδ_ik δ_jℓ ).

We then have that these matrices behave as we wish them to:

Theorem 2.5 (Elementary Row Operations). Let A be an n × n matrix. Then

(i) The resulting matrix after applying the row operation R_i ↔ R_j to A is given by S_ij A.

(ii) Let λ ∈ R be a non-zero scalar. The resulting matrix after applying the row operation λR_i → R_i to A is given by L_i(λ)A.

(iii) Let λ ∈ R be a scalar. The resulting matrix after applying the row operation R_i + λR_j → R_i to A is given by R_ij(λ)A.

The proof of this fact is a straightforward expansion of the definitions of each of the matrices and the definition of matrix multiplication (1.47), similar to that of theorem 1.58. We leave it as an exercise.

Remark 2.6. Even though the matrices given in definitions 2.4 may seem complicated when expressed in terms of δ's, they are actually equivalent to the matrices obtained by applying the corresponding elementary row operation to the identity matrix.

For example, a 4 × 4 S_24 matrix (which corresponds to R_2 ↔ R_4) is simply the 4 × 4 identity matrix with rows 2 and 4 interchanged:
$$S_{24} = \begin{pmatrix} 1&0&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \\ 0&1&0&0 \end{pmatrix},$$
a 3 × 3 L_2(6) matrix (which corresponds to 6R_2 → R_2) is simply the 3 × 3 identity matrix with row 2 multiplied by 6:
$$L_2(6) = \begin{pmatrix} 1&0&0 \\ 0&6&0 \\ 0&0&1 \end{pmatrix},$$
and a 5 × 5 R_53(−2) matrix (which corresponds to R_5 + (−2)R_3 → R_5) is simply the 5 × 5 identity matrix with −2 times row 3 added to row 5:
$$R_{53}(-2) = \begin{pmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&-2&0&1 \end{pmatrix}.$$

Exercise 2.7. Using the definitions in definitions 2.4 and the definition of matrix multiplication (1.47), show that

a) S_ij⁻¹ = S_ij

b) L_i(λ)⁻¹ = L_i(1/λ)

2.3 Determinants

We have already seen that the determinant of the 2 × 2 matrix $A = \begin{pmatrix} a&b \\ c&d \end{pmatrix}$ is given by
$$|A| = \begin{vmatrix} a&b \\ c&d \end{vmatrix} = ad - bc.$$

In exercise 1.68.7, we saw that the determinant of a matrix being non-zero is equivalent to the matrix being invertible. Moreover, in exercise 2.3.4, we arrived at this conclusion with geometric intuition about lines in two dimensions (R²).


For 3 × 3 matrices, the determinant also exists, but before introducing it, we need some definitions.

Definition 2.8 (Submatrix). Let A = (a_ij) be an n × n matrix. We associate an (n − 1) × (n − 1) matrix with each entry a_ij of A, called the submatrix of a_ij in A, denoted by A_ij, obtained simply by deleting row i and column j from A. In other words, we have
$$A_{k\ell} = \big(a_{(i + [i \geq k])(j + [j \geq \ell])}\big)_{(n-1)\times(n-1)},$$
where [φ] denotes the Iverson bracket.³

Example 2.9. Suppose $A = \begin{pmatrix} 1&2&3 \\ 4&5&6 \\ 7&8&9 \end{pmatrix}$. Then
$$A_{12} = \begin{pmatrix} \square&\square&\square \\ 4&\square&6 \\ 7&\square&9 \end{pmatrix} = \begin{pmatrix} 4&6 \\ 7&9 \end{pmatrix}, \qquad A_{22} = \begin{pmatrix} 1&\square&3 \\ \square&\square&\square \\ 7&\square&9 \end{pmatrix} = \begin{pmatrix} 1&3 \\ 7&9 \end{pmatrix},$$
$$\text{and}\qquad A_{33} = \begin{pmatrix} 1&2&\square \\ 4&5&\square \\ \square&\square&\square \end{pmatrix} = \begin{pmatrix} 1&2 \\ 4&5 \end{pmatrix}.$$

In addition to the submatrix, we associate a sign (+ or −) with each entry in a matrix A. These signs follow a chequerboard-like pattern, starting from + in the top-left corner:
$$\begin{pmatrix} +&-&+&\cdots \\ -&+&-&\cdots \\ +&-&+&\cdots \\ \vdots&\vdots&\vdots&\ddots \end{pmatrix}$$
Indeed, it is not hard to see that the sign corresponding to the entry a_ij is + if i + j is even, and − if i + j is odd. So we can nicely express the sign corresponding to the entry a_ij simply as (−1)^(i+j). Now we are ready to give the most important definition before going on to introduce the determinant:

³The Iverson bracket is a notation defined by
$$[\varphi] = \begin{cases} 1 & \text{if } \varphi \text{ is true} \\ 0 & \text{otherwise,} \end{cases}$$
where φ is a statement which can be true or false. In this case, we are using it to add 1 to the matrix indices i and j so that we "skip over" the column/row we are deleting.


Definition 2.10 (Cofactor). Let A = (a_ij) be an n × n matrix. The cofactor corresponding to the entry a_ij, denoted co(a_ij), is defined by
$$\operatorname{co}(a_{ij}) = (-1)^{i+j}\det(A_{ij}).$$
In other words, the cofactor of a_ij is the determinant of the submatrix A_ij paired with the entry's corresponding sign (as in the chequerboard-like pattern above).

Example 2.11. Again, suppose $A = \begin{pmatrix} 1&2&3 \\ 4&5&6 \\ 7&8&9 \end{pmatrix}$. Then
$$\operatorname{co}(a_{23}) = (-1)^{2+3}\begin{vmatrix} 1&2&\square \\ \square&\square&\square \\ 7&8&\square \end{vmatrix} = -\begin{vmatrix} 1&2 \\ 7&8 \end{vmatrix} = -(1 \cdot 8 - 2 \cdot 7) = -(-6) = 6,$$
and
$$\operatorname{co}(a_{31}) = (-1)^{3+1}\begin{vmatrix} \square&2&3 \\ \square&5&6 \\ \square&\square&\square \end{vmatrix} = +\begin{vmatrix} 2&3 \\ 5&6 \end{vmatrix} = 2 \cdot 6 - 3 \cdot 5 = -3.$$

Now we are ready to introduce the determinant for 3 × 3 matrices, and, shortly after, for any n × n matrix. The definition we give is due to Laplace.

Definition 2.12 (3 × 3 Determinant). The determinant of the 3 × 3 matrix A = (a_ij) is defined by
$$|A| = \sum_{k=1}^{3} a_{1k}\operatorname{co}(a_{1k}) = a_{11}\operatorname{co}(a_{11}) + a_{12}\operatorname{co}(a_{12}) + a_{13}\operatorname{co}(a_{13}).$$
In other words, we are defining the determinant of a 3 × 3 matrix as the sum of the entries of the first row, each multiplied by their corresponding cofactor.

Example 2.13. Again take $A = \begin{pmatrix} 1&2&3 \\ 4&5&6 \\ 7&8&9 \end{pmatrix}$. We find |A|.
$$|A| = \begin{vmatrix} 1&2&3 \\ 4&5&6 \\ 7&8&9 \end{vmatrix} = 1 \cdot \left(+\begin{vmatrix} 5&6 \\ 8&9 \end{vmatrix}\right) + 2 \cdot \left(-\begin{vmatrix} 4&6 \\ 7&9 \end{vmatrix}\right) + 3 \cdot \left(+\begin{vmatrix} 4&5 \\ 7&8 \end{vmatrix}\right)$$
$$= \begin{vmatrix} 5&6 \\ 8&9 \end{vmatrix} - 2\begin{vmatrix} 4&6 \\ 7&9 \end{vmatrix} + 3\begin{vmatrix} 4&5 \\ 7&8 \end{vmatrix} = (5 \cdot 9 - 6 \cdot 8) - 2(4 \cdot 9 - 6 \cdot 7) + 3(4 \cdot 8 - 5 \cdot 7) = -3 - 2(-6) + 3(-3) = 0.$$

Although we have not yet proved that it is the case for 3 × 3 matrices, a zero determinant is in fact equivalent to having a non-invertible matrix. Thus we have that A is not invertible.

Example 2.14 (General formula). Here we will derive a general formula for the 3 × 3 determinant, in the style of $\left|\begin{smallmatrix} a&b \\ c&d \end{smallmatrix}\right| = ad - bc$ for 2 × 2 matrices. Memorising it is not recommended!
$$\begin{vmatrix} a&b&c \\ d&e&f \\ g&h&i \end{vmatrix} = a\begin{vmatrix} e&f \\ h&i \end{vmatrix} - b\begin{vmatrix} d&f \\ g&i \end{vmatrix} + c\begin{vmatrix} d&e \\ g&h \end{vmatrix} = aei - afh - bdi + bfg + cdh - ceg.$$

We can now give the general definition of the determinant.

Definition 2.15 (Determinant). Let A = (a_ij) be an n × n matrix. Then the determinant of A, denoted det(A) or |A|, is the number defined by
$$\det(A) = \begin{cases} a_{11} & \text{if } n = 1 \\ \displaystyle\sum_{j=1}^{n} a_{1j}\operatorname{co}(a_{1j}) & \text{otherwise.} \end{cases}$$

Notice that this is a straightforward generalisation of the 3 × 3 case, which works for both smaller and larger matrices. Indeed, for n = 1, 2 and 4, the definition gives
$$|a| = a,$$
$$\begin{vmatrix} a&b \\ c&d \end{vmatrix} = a\left(+\begin{vmatrix} \square&\square \\ \square&d \end{vmatrix}\right) + b\left(-\begin{vmatrix} \square&\square \\ c&\square \end{vmatrix}\right) = a|d| - b|c| = ad - bc,
$$


$$\begin{vmatrix} a&b&c&d \\ e&f&g&h \\ i&j&k&l \\ m&n&o&p \end{vmatrix} = a\begin{vmatrix} f&g&h \\ j&k&l \\ n&o&p \end{vmatrix} - b\begin{vmatrix} e&g&h \\ i&k&l \\ m&o&p \end{vmatrix} + c\begin{vmatrix} e&f&h \\ i&j&l \\ m&n&p \end{vmatrix} - d\begin{vmatrix} e&f&g \\ i&j&k \\ m&n&o \end{vmatrix},$$
which, expanding each 3 × 3 determinant in turn, works out to
$$\begin{aligned} &dgjm - chjm - dfkm + bhkm + cflm - bglm - dgin + chin \\ &\quad + dekn - ahkn - celn + agln + dfio - bhio - dejo + ahjo \\ &\quad + belo - aflo - cfip + bgip + cejp - agjp - bekp + afkp. \end{aligned}$$

As is clear from the 4 × 4 case, large determinants become too laborious to work out by hand. Indeed, as is illustrated, a 4 × 4 determinant requires four 3 × 3 determinants to be worked out, each of which in turn requires three 2 × 2 determinants, meaning that a 4 × 4 determinant requires twelve 2 × 2 determinants in all. (It's not hard to see that in general, an n × n determinant requires the computation of n!/2 two-by-two determinants.)
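The recursive definition 2.15 transcribes into code almost word for word. The following is a deliberately naive Python sketch (the names are ours) which mirrors the definition rather than efficient practice, and so suffers exactly the cost just described:

```python
def submatrix(A, k, l):
    """Delete row k and column l of A (1-indexed), where A is a list of lists."""
    return [[A[i][j] for j in range(len(A)) if j != l - 1]
            for i in range(len(A)) if i != k - 1]

def cofactor(A, i, j):
    """co(a_ij) = (-1)^(i+j) det(A_ij), as in definition 2.10."""
    return (-1) ** (i + j) * det(submatrix(A, i, j))

def det(A):
    """Laplace expansion along the first row, exactly as in definition 2.15."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum(A[0][j - 1] * cofactor(A, 1, j) for j in range(1, n + 1))

print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0, as in example 2.13
print(det([[1, 2, 3, 4], [0, 0, 2, 0],
           [5, 9, 2, 1], [3, 2, 1, 1]]))       # 130 (see example 2.17 below)
```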

It seems strange that the definition of the determinant specifically involves the first row of the matrix A (notice only a_1j appears in the definition). Is there something inherently important about the first row of a matrix? It turns out that the answer is no: we can perform the sum of entries times their cofactor along any row and the result will be the same determinant! This is stated below.

Theorem 2.16 (Laplace Expansion). Let A = (a_ij) be an n × n matrix, n ≥ 2, and pick any row i ∈ {1, . . . , n}. Then
$$\det(A) = \sum_{j=1}^{n} a_{ij}\operatorname{co}(a_{ij}).$$


The proof of this result requires elementary row operations, so we revisit it later.

Example 2.17. Here we illustrate the advantage of this result. Suppose we wish to evaluate the determinant
$$\begin{vmatrix} 1&2&3&4 \\ 0&0&2&0 \\ 5&9&2&1 \\ 3&2&1&1 \end{vmatrix}.$$
It would be a lot easier if we were allowed to expand the determinant along the second row instead of the first row as we have been doing so far, because that would give
$$\begin{vmatrix} 1&2&3&4 \\ 0&0&2&0 \\ 5&9&2&1 \\ 3&2&1&1 \end{vmatrix} = 0 \cdot \operatorname{co}(a_{21}) + 0 \cdot \operatorname{co}(a_{22}) + 2 \cdot \left(-\begin{vmatrix} 1&2&4 \\ 5&9&1 \\ 3&2&1 \end{vmatrix}\right) + 0 \cdot \operatorname{co}(a_{24}) = -2\begin{vmatrix} 1&2&4 \\ 5&9&1 \\ 3&2&1 \end{vmatrix} = \cdots = 130.$$
Thus thanks to theorem 2.16, we can evaluate this 4 × 4 determinant by working out one 3 × 3 determinant instead of four!

Remark 2.18. In general, by expanding along the row with the most zeros we minimise the amount of computation. Always take care to allocate the correct signs to the cofactors: remember the chequerboard pattern!
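Theorem 2.16 and remark 2.18 suggest a small strategy in code: pick the row with the most zeros, then expand. Below is a self-contained variant of the earlier sketch (names illustrative) checking that expansion along every row of the matrix in example 2.17 gives the same answer:

```python
def submatrix(A, k, l):
    return [[A[i][j] for j in range(len(A)) if j != l - 1]
            for i in range(len(A)) if i != k - 1]

def cofactor(A, i, j):
    return (-1) ** (i + j) * det_along_row(submatrix(A, i, j), 1)

def det_along_row(A, i):
    """Laplace expansion along row i (1-indexed); by theorem 2.16 the
    result does not depend on the choice of i."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[i - 1][j - 1] * cofactor(A, i, j) for j in range(1, len(A) + 1))

def best_row(A):
    """The row with the most zeros minimises the work (remark 2.18)."""
    return 1 + max(range(len(A)), key=lambda r: A[r].count(0))

M = [[1, 2, 3, 4], [0, 0, 2, 0], [5, 9, 2, 1], [3, 2, 1, 1]]
print(best_row(M))                                     # 2
print({i: det_along_row(M, i) for i in (1, 2, 3, 4)})  # 130 for every row
```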

Another immediate consequence of this result is the following:

Proposition 2.19. Let A be an n × n matrix. If a row of A consists entirely of zeros, then |A| = 0.

Proof. (The idea: expand along the row consisting solely of zeros.) Suppose row i is the row consisting entirely of zeros, i.e. a_ij = 0 for all j = 1, . . . , n.


If n = 1, then there is only one row/column and a_ij = a_11 = 0, so that det(A) = det(a_11) = a_11 = 0. If n ≥ 2, then by theorem 2.16
$$\det(A) = \sum_{j=1}^{n} a_{ij}\operatorname{co}(a_{ij}) = \sum_{j=1}^{n} 0 \cdot \operatorname{co}(a_{ij}) = 0,$$
as required.

It also turns out that we have the following result, which we prove later using elementary row operations.

Theorem 2.20. Let A be an n × n matrix. Then
$$\det(A) = \det(A^{T}).$$

Remark 2.21. Consequently, to find the determinant of a matrix A, we can also choose to find the determinant of the transpose, which involves expanding along some row in Aᵀ. But this is equivalent to expanding along some column of A. In other words, we can also find the determinant of A by expanding along a column, which might be useful if some column contains more zeros than any row.

Example 2.22. We evaluate the following determinant by expanding along the third column, since it contains two zeros:
$$\begin{vmatrix} 1&2&-1&7 \\ 2&-4&0&5 \\ 1&9&0&6 \\ 2&-6&9&7 \end{vmatrix}.$$

Indeed,
$$\begin{vmatrix} 1&2&-1&7 \\ 2&-4&0&5 \\ 1&9&0&6 \\ 2&-6&9&7 \end{vmatrix} = -1 \cdot \left(+\begin{vmatrix} 2&-4&5 \\ 1&9&6 \\ 2&-6&7 \end{vmatrix}\right) + 9 \cdot \left(-\begin{vmatrix} 1&2&7 \\ 2&-4&5 \\ 1&9&6 \end{vmatrix}\right)$$
$$= -\left(2\begin{vmatrix} 9&6 \\ -6&7 \end{vmatrix} + 4\begin{vmatrix} 1&6 \\ 2&7 \end{vmatrix} + 5\begin{vmatrix} 1&9 \\ 2&-6 \end{vmatrix}\right) - 9\left(\begin{vmatrix} -4&5 \\ 9&6 \end{vmatrix} - 2\begin{vmatrix} 2&5 \\ 1&6 \end{vmatrix} + 7\begin{vmatrix} 2&-4 \\ 1&9 \end{vmatrix}\right)$$
$$= -(198 - 20 - 120) - 9(-69 - 14 + 154) = -697.$$

Exercise 2.23.

1. Find the following determinants.

a) $\begin{vmatrix} 2&1&3 \\ 3&7&2 \\ 0&2&-4 \end{vmatrix}$   b) $\begin{vmatrix} 4&1&6 \\ 5&-2&0 \\ 5&1&7 \end{vmatrix}$   c) $\begin{vmatrix} 3&2&9 \\ 1&-2&4 \\ 0&2&0 \end{vmatrix}$   d) $\begin{vmatrix} 0&1&x \\ 1&x&x^2 \\ x&x^2&x^4 \end{vmatrix}$   e) $\begin{vmatrix} k&1&3 \\ 3&k&2 \\ 0&k&-4 \end{vmatrix}$   f)

2. Find $\begin{vmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0&0&1 \end{vmatrix}$.

3. Let
$$f(x) = \begin{vmatrix} x&x^2&x^3 \\ a&a^2&a^3 \\ b&b^2&b^3 \end{vmatrix}$$
a)

2.4 The Adjugate Matrix

Here we introduce an explicit formula for the inverse which can be used to invert any n × n matrix, although in practice, this method is not very efficient and is only useful for the 2 × 2 and 3 × 3 (maybe 4 × 4) cases.

First we need the following definitions.

Definitions 2.24. Let A = (a_ij) be an n × n matrix.

(i) The matrix of cofactors of A, denoted co(A), is the n × n matrix defined by
$$\operatorname{co}(A) = (\operatorname{co}(a_{ij})).$$
In other words, the matrix of cofactors of A is the matrix consisting of the cofactors of each entry in that entry's position.


(ii) The adjugate matrix of A, denoted adj(A), is simply defined as the transpose of the matrix of cofactors; i.e.
$$\operatorname{adj}(A) = [\operatorname{co}(A)]^{T}.$$

Note. The adjugate has sometimes been called the "adjoint", but today the adjoint of a matrix A normally refers to the so-called Hermitian adjoint A*, which for n × n matrices with real entries is equivalent to its transpose.

Example 2.25. Let $A = \begin{pmatrix} 0&2&3 \\ 4&5&6 \\ 7&8&9 \end{pmatrix}$. Then
$$\operatorname{co}(A) = \begin{pmatrix} +\begin{vmatrix} 5&6 \\ 8&9 \end{vmatrix} & -\begin{vmatrix} 4&6 \\ 7&9 \end{vmatrix} & +\begin{vmatrix} 4&5 \\ 7&8 \end{vmatrix} \\[1ex] -\begin{vmatrix} 2&3 \\ 8&9 \end{vmatrix} & +\begin{vmatrix} 0&3 \\ 7&9 \end{vmatrix} & -\begin{vmatrix} 0&2 \\ 7&8 \end{vmatrix} \\[1ex] +\begin{vmatrix} 2&3 \\ 5&6 \end{vmatrix} & -\begin{vmatrix} 0&3 \\ 4&6 \end{vmatrix} & +\begin{vmatrix} 0&2 \\ 4&5 \end{vmatrix} \end{pmatrix} = \begin{pmatrix} -3&6&-3 \\ 6&-21&14 \\ -3&12&-8 \end{pmatrix},$$
and
$$\operatorname{adj}(A) = [\operatorname{co}(A)]^{T} = \begin{pmatrix} -3&6&-3 \\ 6&-21&14 \\ -3&12&-8 \end{pmatrix}^{\!T} = \begin{pmatrix} -3&6&-3 \\ 6&-21&12 \\ -3&14&-8 \end{pmatrix}.$$

Observe that the product A adj(A) interestingly yields
$$A\operatorname{adj}(A) = \begin{pmatrix} 0&2&3 \\ 4&5&6 \\ 7&8&9 \end{pmatrix}\begin{pmatrix} -3&6&-3 \\ 6&-21&12 \\ -3&14&-8 \end{pmatrix} = \begin{pmatrix} 3&0&0 \\ 0&3&0 \\ 0&0&3 \end{pmatrix} = 3I,$$
and moreover a quick calculation shows that 3 = det(A).
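Here is a small Python sketch (helper names ours) which builds co(A) and adj(A) exactly as in definitions 2.24 and reproduces the computation of example 2.25:

```python
import numpy as np

def cofactor_matrix(A):
    """co(A): entry (i, j) is (-1)^(i+j) times the determinant of the
    submatrix obtained by deleting row i and column j."""
    n = len(A)
    co = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
            co[i, j] = (-1) ** (i + j) * np.linalg.det(sub)
    return co

A = np.array([[0.0, 2, 3], [4, 5, 6], [7, 8, 9]])
adjA = cofactor_matrix(A).T   # adj(A) = [co(A)]^T
print(np.round(adjA))         # [[-3, 6, -3], [6, -21, 12], [-3, 14, -8]]
print(np.round(A @ adjA))     # 3I, and indeed det(A) = 3
```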

In fact, we have the following.

Theorem 2.26 (Adjugate Theorem). Let A be an n × n matrix. Then
$$A\operatorname{adj}(A) = \operatorname{adj}(A)A = |A|I.$$

Proof. We study the entries of A adj(A). By definitions 1.47 and 1.54,
$$A\operatorname{adj}(A) = A[\operatorname{co}(A)]^{T} = (a_{ij})(\operatorname{co}(a_{ij}))^{T} = (a_{ij})(\operatorname{co}(a_{ji})) = \left(\sum_{k=1}^{n} a_{ik}\operatorname{co}(a_{jk})\right).$$
Notice that for i = j, i.e. along the diagonal, the entries are precisely the determinant |A| by theorem 2.16. What is left to show is that for i ≠ j, the sum evaluates to zero.

2.5 Gaussian Elimination

3 Some Geometry

3.1 Lines

3.2 Planes

3.3 Mensuration

4 Linear Maps

4.1 Matrices are Linear Maps

4.2 Rank and Nullity

4.3 The Dimension Theorem

4.4 Eigenvalues and Eigenvectors


Appendices

A Preliminaries

In this appendix, we go over a few preliminary ideas and notation which are assumed throughout the notes.

A.1 Naïve Set Theory

We start with the notion of a set. Informally, a set is a collection of distinct "objects". In particular, the defining characteristic of a set is the idea of membership: an object x is either a member of a set S, or not. We write x ∈ S for "x is an element of the set S" (or x is in S), and similarly y ∉ S for the negation "y is not an element of S". Sets may be defined by listing their elements between curly brackets, e.g.

A = {1, 2, 3, 4, 5}

defines the set A whose elements are 1, 2, 3, 4 and 5. We have 1 ∈ A, but 0 ∉ A (for example). It is conventional to use capital letters for sets.

Notice that our definition of a set is an imprecise one: we do not clearly state what counts as an "object" in a set, nor which sets we are explicitly allowed to construct; we simply define sets by describing their elements verbally. Can sets contain other sets? Are we allowed to define strange sets such as "the set of all sets which are not members of themselves"? This vagueness leads to fundamental problems in mathematics and philosophy, but we will not concern ourselves with these issues here.⁴

Some of the important sets which we encounter are:

• The empty set, denoted by the symbol ∅, is the set such that

x ∉ ∅ for all x.

• The set of natural numbers, denoted by the symbol N, is the infinite set containing all positive whole numbers:
$$\mathbb{N} = \{1, 2, 3, 4, \ldots\}.$$

⁴The interested reader is encouraged to look up the graphic novel Logicomix to get an idea of the historical significance of these problems, or, to get stuck in to formal (i.e., non-naïve) set theory itself, take a look at the textbook Introduction to Set Theory by Hrbácek and Jech or this free pdf textbook: https://www.math.uwaterloo.ca/~randre/1aaset_theory_140613.pdf.


• The set of integers, denoted by the symbol Z (for the German zählen, meaning counting), is the infinite set containing the positive whole numbers, the negative whole numbers and zero:
$$\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}.$$

• The set of rational numbers, denoted by the symbol Q (for quotient), is the set of all numbers which can be expressed as a ratio of two integers. For example, this set contains the numbers 1/2, 22/7, −1/3, 0 and 5.

• The set of real numbers, denoted by the symbol R, contains all the rational numbers, together with all the numbers which have infinite decimal expansions. Some of these are rational (e.g. 1/3 = 0.333. . . and 1/7 = 0.142857142. . . ), but others are irrational, that is, not rational (e.g. √2 = 1.41421. . . , π = 3.14159. . . and e = 2.7182818. . . ).

It is not easy to see that some numbers are irrational. The easiest number to prove irrational is √2, and a proof is provided in the following pages.

Optional Reading: The Irrationality of √2

It is not easy to convince students that there are irrational numbers. The ancient Greeks, in particular, the Pythagoreans, believed that numbers were either whole (that is, integers) or parts of a whole (that is, rationals). Pythagoras is famous for his theorem relating the lengths of sides in a right-angled triangle,ᵃ and perhaps the simplest case we can consider is when the legs of the right-angled triangle are both equal to 1.

FIGURE 9: Right-angled triangle with legs of unit length.

By Pythagoras' theorem, we get that the hypotenuse h must satisfy h² = 1² + 1², from which one easily obtains h = √2. Naturally, since numbers are either whole or parts of a whole, there must be a way to express
$$\sqrt{2} = \frac{a}{b}$$

for some integers a, b ∈ Z, right? That’s what the Pythagoreans believed.

Let us prove that this is impossible. (If you wish, you can watch a youtube version of this proof here: https://youtu.be/LmpAntNjPj0.) We will do this by contradiction; that is, we will assume that there exist integers a and b such that √2 = a/b, and show that this assumption leads us to an absurd conclusion. Before we proceed with the proof, we need to make two easy observations. An even number is an integer which is divisible by 2, that is, a number of the form 2n for some n ∈ Z, whereas an odd number is an integer which has the form 2n + 1 for some n ∈ Z. Indeed, 6 is even because 6 = 2(3), and −7 is odd because −7 = 2(−4) + 1.

Lemma A.1. Suppose n ∈ Z. Then n is either odd or even.

Proof. This seems obvious, but it requires proof: we must show that any n ∈ Z can be written either as 2k or 2k + 1 for some other integer k. To prove this, we will use a neat trick which mathematicians use all the time: we will consider a minimal counterexample. Indeed, suppose there exists some integer n > 0 which is neither even nor odd (we aim to conclude that there is no such n), and suppose that n is the smallest such integer. Since n is the smallest integer with this property, then n − 1 must be even or odd (otherwise n − 1 would be the smallest). But if n − 1 is even, i.e., n − 1 = 2k for some k, then we get that n = 2k + 1, i.e., that n is odd, contradicting our assumption. Thus the alternative must be true; that is, n − 1 must be odd, i.e., n − 1 = 2k + 1 for some k. But this gives n = 2k + 2 = 2(k + 1), which means that n is even, also contradicting the assumption. Therefore there is no smallest n > 0 with this property, and therefore all integers n > 0 are odd or even. We can identically prove this for negative values by considering a largest counterexample, and thus we have that all integers are either odd or even.

Lemma A.2. Suppose n ∈ Z. If n2 is even, then n is even.

Proof. This is the same as saying that if n is not even, then n² is not even (contrapositiveᵇ), and since n is either odd or even by the previous lemma, this is therefore equivalent to showing that if n is odd, then n² is odd.

Indeed, if n is odd, then n = 2k + 1 for some k ∈ Z, which means that
$$\begin{aligned} n^2 &= (2k+1)^2 = (2k+1)(2k+1) \\ &= t(2k+1) \quad (\text{where } t = 2k+1) \\ &= 2kt + t = 2k(2k+1) + 2k + 1 \\ &= 4k^2 + 2k + 2k + 1 = 4k^2 + 4k + 1 \\ &= 2(2k^2 + 2k) + 1 = 2m + 1 \quad (\text{where } m = 2k^2 + 2k), \end{aligned}$$
and thus n² is odd since it is of the required form.


Now we are ready to prove that √2 is irrational. Indeed, suppose there exist two integers a and b such that a/b = √2. We can also assume that a and b share no common divisors, since if they did, we can cancel them out.ᶜ

By definition of √, if √2 = a/b, then
$$\sqrt{2} = \frac{a}{b} \implies 2 = \left(\frac{a}{b}\right)^2 \implies 2 = \frac{a^2}{b^2} \implies a^2 = 2b^2.$$
In particular, this means that a² is even, which by lemma A.2, means a is even. But since a is even, then a = 2n for some n ∈ Z, which means
$$a^2 = 2b^2 \implies (2n)^2 = 2b^2 \implies 4n^2 = 2b^2 \implies 2n^2 = b^2.$$
This similarly gives us that b² is even, which again by lemma A.2 means that b is also even. Therefore a and b are both divisible by 2. But we chose a and b so that they have no common divisors! This must mean that our assumption was incorrect, that is, the assumption that "there exist integers a and b with no common divisors such that √2 = a/b" is incorrect. It follows that √2 is irrational, which concludes the proof.

It is said that one of the disciples of Pythagoras, Hippasos of Metapontion, presented an argument to Pythagoras that √2 is irrational. He was so outraged by this proof that he had Hippasos killed by throwing him into the sea!

Thus we have proved that there is at least one x ∈ R where x ∉ Q.

ᵃAlthough there is evidence which suggests that the theorem was known to the Babylonians before Pythagoras.

ᵇThe contrapositive of a statement "If P then Q" is "If not Q then not P". The two statements are logically equivalent, that is, if one is true, so is the other. For example, "If it is raining, then the grass gets wet" is equivalent to "If the grass does not get wet, then it is not raining". Note that this is not the same as "If it is not raining, the grass does not get wet", which is not necessarily true!

ᶜFor example, 4/6 can be written as 2/3, since 4 and 6 have 2 as a common divisor, whereas 2 and 3 now share no divisors.

Notice that each of the sets we defined (∅, N, Z, Q, R) contains all the elements of the previous one. When a set B contains all the elements of A, or more formally, if

For all x, if x ∈ A then x ∈ B,

we say A is a subset of B and write A ⊆ B. For example, if A = {1, 2, 3}, B = {1, 2, 3, 4}, then A ⊆ B. Note that by this definition, every set S is a subset of itself. Also note that it is not necessarily the case that a set is a subset of the other; for example if C = {2, 4, 6, 8}, we neither have A ⊆ C nor C ⊆ A.⁵

If A contains every element of B, and B contains every element of A, that is, if both A ⊆ B and B ⊆ A, we say that A is equal to B, written A = B. Observe that
$$\emptyset \subseteq \mathbb{N} \subseteq \mathbb{Z} \subseteq \mathbb{Q} \subseteq \mathbb{R},$$
but none of these are equal. In particular, by proving that √2 ∈ R but √2 ∉ Q, we showed that Q ≠ R.

Notation. One final group of notations we introduce are the subsets of real numbers called the real intervals:

• [a, b] is the set of x ∈ R such that a ≤ x ≤ b

• [a, b) or [a, b[ is the set of x ∈ R such that a ≤ x < b

• (a, b] or ]a, b] is the set of x ∈ R such that a < x ≤ b

• (a, b) or ]a, b[ is the set of x ∈ R such that a < x < b

• [a, ∞) or [a, ∞[ is the set of x ∈ R such that a ≤ x

• (a, ∞) or ]a, ∞[ is the set of x ∈ R such that a < x

• (−∞, b] or ]−∞, b] is the set of x ∈ R such that x ≤ b

• (−∞, b) or ]−∞, b[ is the set of x ∈ R such that x < b

So for example, if x is a real number such that 1 ≤ x ≤ 2, then x ∈ [1, 2]. If moreover, x ≠ 2, then x ∈ [1, 2). If y is a positive real number, then y ∈ (0, ∞), whereas if y is a non-negative real number, then y ∈ [0, ∞).

Exercise A.3.

1. Consider the sets A = {1, 2, 3}, B = {2, 4, 6, 8}, C = {−1, 0, 1} and D = {√2, e, π}. For each of the following, say whether they are true or false.

a) 1 ∈ A   b) {1, 2} ∈ A   c) 4 ∉ A
d) A ⊆ B   e) A = C   f) C ⊆ C
g) D ⊆ Q   h) C ⊆ Z   i) √2 ⊆ R
j) [−1, 2] ⊆ [−2, 2)   k) (−3, 3) ⊆ [−3, ∞)

⁵Unlike the similar-looking relation ≤ for real numbers, where it must be the case that x ≤ y or y ≤ x. Because of this, ⊆ is called a partial order, and ≤ is called a total order.


2. (Optional)

(a) Adapt the proof that √2 is irrational to prove that √3 is irrational.

(b) Why would the proof fail if one tries to show √4 is irrational?

Now, let us define some set operations, that is, ways to combine sets to create new sets.

Definitions A.4. Suppose A and B are two sets. Then

(i) The union of A and B, denoted A ∪ B, is the set defined by the property

If x ∈ A OR x ∈ B, then x ∈ A ∪ B.

(ii) The intersection of A and B, denoted A ∩ B, is the set defined by the property

If x ∈ A AND x ∈ B, then x ∈ A ∩ B.

(iii) The difference between A and B, denoted A ∖ B, is the set defined by the property

If x ∈ A AND x ∉ B, then x ∈ A ∖ B.

Examples A.5. If A = {1, 2, 3, 4, 5}, B = {2, 4, 6, 8, 10} and C = {−1, 0, 1}, then

A ∪ B = {1, 2, 3, 4, 5, 6, 8, 10}
A ∩ B = {2, 4}
A ∖ B = {1, 3, 5}
(A ∪ B) ∩ C = {1, 2, 3, 4, 5, 6, 8, 10} ∩ {−1, 0, 1} = {1}
A ∪ (B ∩ C) = {1, 2, 3, 4, 5} ∪ ∅ = {1, 2, 3, 4, 5} = A
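Python's built-in set type implements these three operations directly, so examples A.5 can be checked mechanically (a quick illustrative snippet):

```python
A = {1, 2, 3, 4, 5}
B = {2, 4, 6, 8, 10}
C = {-1, 0, 1}

print(A | B)        # union: {1, 2, 3, 4, 5, 6, 8, 10}
print(A & B)        # intersection: {2, 4}
print(A - B)        # difference: {1, 3, 5}
print((A | B) & C)  # {1}
print(A | (B & C))  # {1, 2, 3, 4, 5}, i.e. A itself
```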

The final operation, and probably the most important for these notes, is the Cartesian product A × B of two sets. We first need to define the idea of an ordered pair. First of all, for a finite set A, the number of elements of A is called its cardinality, denoted by |A|. For example, |{−1, 0, 1}| = 3.

An unordered pair is a set P such that |P| = 2. This makes sense of course, since if a set contains two elements, we call it a pair. Why unordered? Well,


a set gives no sense of "order", only membership. For example, {1, 2} = {2, 1}. How do we represent a pair of objects with an idea of which is the first one, and which is the second one? What we do is the following. We define the notation (a, b) to denote the set

(a, b) = {{a}, {a, b}},

and take this as our definition of an ordered pair. Why this way? Well, it satisfies the property we want it to satisfy, namely, (a, b) = (c, d) if and only if a = c and b = d. This is easy to prove by definition of set equality.

We can similarly define the ordered triple (a, b, c) by the pair (a, (b, c)), the ordered quadruple (a, b, c, d) by the pair (a, (b, c, d)), and so on. In general, we define an ordered k-tuple (a₁, a₂, . . . , aₖ) by the pair (a₁, (a₂, a₃, . . . , aₖ)).
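This set-theoretic encoding can even be tried out in code using frozensets (immutable, hashable sets); a purely illustrative sketch, with the name kuratowski_pair being ours:

```python
def kuratowski_pair(a, b):
    """Encode the ordered pair (a, b) as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# (1, 2) != (2, 1), even though {1, 2} == {2, 1} as plain sets:
print(kuratowski_pair(1, 2) == kuratowski_pair(2, 1))  # False
print(kuratowski_pair(1, 2) == kuratowski_pair(1, 2))  # True
```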

Now we define our final operator.

Definition A.6 (Cartesian Product). Let A and B be two sets. The Cartesian product of A and B, denoted A × B, is the set defined by the property

If a ∈ A AND b ∈ B, then (a, b) ∈ A × B.

If A = B, then A × B = A × A is denoted by A².

Thus the set A × B consists of all the ordered pairs (a, b) such that a ∈ A and b ∈ B. Note that we do not have A ⊆ A × B nor B ⊆ A × B. Unlike the other operators, the Cartesian product does not contain any of the same objects as A and B themselves.

Example A.7. If A = {1, 2, 3, 4} and B = {a, b, c}, then

A × B = {(1, a), (1, b), (1, c), (2, a), (2, b), (2, c), (3, a), (3, b), (3, c), (4, a), (4, b), (4, c)}.
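In Python, itertools.product enumerates exactly these pairs; a quick illustrative check:

```python
from itertools import product

A = [1, 2, 3, 4]
B = ["a", "b", "c"]
print(list(product(A, B)))
# [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ..., (4, 'c')]
print(len(list(product(A, B))) == len(A) * len(B))  # True: |A x B| = |A||B|
```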

Example A.8. The set Q consisting of all rational numbers can be identified with the set Z × (Z ∖ {0}), where the pair (p, q) is interpreted as p/q. There is not a one-to-one correspondence however, since many equivalent representations exist: the pairs (1, 2), (3, 6), (−5, −10) each correspond to the rational number 1/2.

Example A.9. The enormous set R × R = R² consists of all pairs (x, y) of real numbers. We can identify each element of this set with a point in the xy-plane.


FIGURE 10: The point (x, y) in the plane.

Thus we consider the plane, in some sense, "equivalent" to the set R²; or rather, a way to visualise its points.

Exercise A.10.

1. Consider the sets A = {1, 2, 3, 4, 5, 6, 7, 8}, B = {2, 4, 6, 8, 10, 12, 14} and C = {2, 3, 6, 9}. Determine:

a) A ∪ B   b) A ∩ B   c) B ∪ C
d) A ∖ C   e) B ∩ C   f) B ∖ A
g) A ∖ B   h) A ∩ C   i) C ∖ B
j) C ∪ (B ∩ A)   k) (A ∖ B) ∪ (B ∖ C) ∪ (C ∖ A)

2. Consider the sets X = {4, 7, 2, 1}, Y = {4, 6, 12, 7, 3} and Z = {0, 1, 2}. Determine:

a) X ∪ Y   b) X ∖ Y
c) (X ∖ Y) ∪ (Y ∖ X)   d) X ∩ Y ∩ Z
e) X ∪ (Y ∩ Z)   f) (X ∪ Y) ∩ Z
g) X ∖ (Y ∖ (Z ∖ X))   h) Z³

3. Let A = {1, 2, 3, 4}, B = {a, b}, C = {0, 1, 2} and D = {−1, 1}. Find:


a) A × B   b) A²
c) ∅ × B   d) A × B²
e) B × A   f) B²
g) B³   h) C × D
i) D × ∅   j) B × C
k) B × C × D   l) (A × C) ∩ (C × D)

A.2 Big Operators

This short subsection simply introduces Σ and Π notation. When we repeatedly perform an operation, say, three times, four times, and in general, n times, it becomes tedious to write out expressions such as
$$a_m + a_{m+1} + a_{m+2} + \cdots + a_{n-2} + a_{n-1} + a_n,$$
where a_i is some expression which changes with respect to the value of i. Thus we introduce the summation notation
$$\sum_{i=m}^{n} a_i.$$

This is equivalent to the expression above. The 'i = m' below the Greek letter Σ (S for sum) means that we plug in i = m into the expression to the right. We successively add terms, incrementing i by 1 each time, until n is reached. So for example,
$$\sum_{i=1}^{5} \cos(ix) = \cos x + \cos 2x + \cos 3x + \cos 4x + \cos 5x,$$
$$\sum_{i=-3}^{3} \frac{x^{2i}}{|i-4|} = \frac{1}{7x^6} + \frac{1}{6x^4} + \frac{1}{5x^2} + \frac{1}{4} + \frac{x^2}{3} + \frac{x^4}{2} + x^6,$$
and
$$\sum_{i=1}^{10} x = x + x + x + x + x + x + x + x + x + x = 10x$$
(in this last one, the expression on the right did not involve i, so the terms are constant with respect to the index of summation).


For a more formal treatment, we can define Σ recursively by
$$\sum_{i=m}^{n} a_i = \begin{cases} a_n & \text{if } m = n \\ a_m + \displaystyle\sum_{i=m+1}^{n} a_i & \text{otherwise.} \end{cases}$$
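This recursive definition transcribes directly into code; a small illustrative sketch (the name big_sum is ours):

```python
def big_sum(a, m, n):
    """Compute sum_{i=m}^{n} a(i) following the recursive definition:
    the base case m == n gives a(n), otherwise a(m) + sum_{i=m+1}^{n} a(i)."""
    if m == n:
        return a(n)
    return a(m) + big_sum(a, m + 1, n)

print(big_sum(lambda i: i, 1, 100))  # 5050
print(big_sum(lambda i: 7, 1, 10))   # constant terms: 10 * 7 = 70
```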

An important property of the Σ-operator is its linearity.

Proposition A.11 (Linearity of Σ). Let k be a function which is constant with respect to the index of summation i (i.e., does not depend on i), and suppose a_i and b_i are functions. Then
$$\sum_{i=m}^{n} k a_i = k \sum_{i=m}^{n} a_i \qquad\text{and}\qquad \sum_{i=m}^{n} (a_i + b_i) = \sum_{i=m}^{n} a_i + \sum_{i=m}^{n} b_i.$$

A proof of this result is best done by induction on the recursive definition. We give an informal proof here.

Informal Proof. We have
$$\sum_{i=m}^{n} k a_i = k a_m + k a_{m+1} + \cdots + k a_{n-1} + k a_n = k(a_m + a_{m+1} + \cdots + a_{n-1} + a_n) = k \sum_{i=m}^{n} a_i,$$
and
$$\sum_{i=m}^{n} (a_i + b_i) = (a_m + b_m) + (a_{m+1} + b_{m+1}) + \cdots + (a_n + b_n) = (a_m + \cdots + a_n) + (b_m + \cdots + b_n) = \sum_{i=m}^{n} a_i + \sum_{i=m}^{n} b_i,$$
as required.

Another useful result is the following.


Proposition A.12. Suppose f is a function of i and j. Then
$$\sum_{i=a}^{b}\sum_{j=c}^{d} f(i,j) = \sum_{j=c}^{d}\sum_{i=a}^{b} f(i,j).$$

Again, a formal proof of this result is by induction on the definition.

Informal Proof. $\sum_{i=a}^{b}\sum_{j=c}^{d} f(i,j)$ sums the following terms row by row:
$$\begin{matrix} f(a,c) & f(a,c+1) & \cdots & f(a,d-1) & f(a,d) \\ f(a+1,c) & f(a+1,c+1) & \cdots & f(a+1,d-1) & f(a+1,d) \\ \vdots & \vdots & & \vdots & \vdots \\ f(b-1,c) & f(b-1,c+1) & \cdots & f(b-1,d-1) & f(b-1,d) \\ f(b,c) & f(b,c+1) & \cdots & f(b,d-1) & f(b,d) \end{matrix}$$
while $\sum_{j=c}^{d}\sum_{i=a}^{b} f(i,j)$ sums the same terms column by column.

The idea of a large operator is not something unique to addition. We use the Greek letter Π (P for product) when we want × instead of +, that is,
$$\prod_{i=m}^{n} a_i = a_m a_{m+1} a_{m+2} \cdots a_{n-2} a_{n-1} a_n.$$

We define it recursively, similarly to Σ:
$$\prod_{i=m}^{n} a_i = \begin{cases} a_n & \text{if } m = n \\ a_m \left(\displaystyle\prod_{i=m+1}^{n} a_i\right) & \text{otherwise.} \end{cases}$$

Since both use the symbol ×, we will also use this to denote the Cartesian product for more than two sets:

Definition A.13 (Cartesian Product). Let X₁, X₂, . . . , Xₙ be sets. Then the Cartesian product of X₁, X₂, . . . , Xₙ, denoted $\prod_{i=1}^{n} X_i$, is the set defined by the property

If x_i ∈ X_i for all i ∈ {1, 2, . . . , n}, then (x₁, x₂, . . . , xₙ) ∈ $\prod_{i=1}^{n} X_i$.

If X₁ = · · · = Xₙ = X, then the product $\prod_{i=1}^{n} X_i = \prod_{i=1}^{n} X$ is denoted Xⁿ.


Examples A.14. The set of all triples (x, y, z) of real numbers is the set $\mathbb{R}^3 = \mathbb{R}\times\mathbb{R}\times\mathbb{R} = \prod_{i=1}^{3}\mathbb{R}$. We can identify each element of this set with a point in 3D space.

If X = {a, b}, then

X⁴ = {(a, a, a, a), (a, a, a, b), (a, a, b, a), . . . , (b, b, b, a), (b, b, b, b)}.

Remark A.15. One can also define the large operators
$$\bigcap_{i=m}^{n} A_i = A_m \cap A_{m+1} \cap \cdots \cap A_{n-1} \cap A_n \qquad\text{and}\qquad \bigcup_{i=m}^{n} A_i = A_m \cup A_{m+1} \cup \cdots \cup A_{n-1} \cup A_n$$
for intersections and unions.

In general, if ⊗ denotes some binary operation on a set X and x_i ∈ X for i ∈ {m, m + 1, . . . , n − 1, n}, then its corresponding (right-associative) big operator is defined recursively by
$$\bigotimes_{i=m}^{n} x_i = \begin{cases} x_n & \text{if } m = n \\ x_m \otimes \left(\displaystyle\bigotimes_{i=m+1}^{n} x_i\right) & \text{otherwise.} \end{cases}$$

B Solutions to Exercises

1.29: Unjustified step
Hint: show that
$$\sum_{i=1}^{n} (\lambda u_i)(\mu v_i) = \lambda\mu \sum_{i=1}^{n} u_i v_i.$$

1.31: Distances and Angles (Easy stuff)

1. For each of these, expand the definition of dot product as in the previous exercise.

2. Similar to question 1.

3. If u = (u₁, u₂) is a unit vector, then ‖u‖ = √(u₁² + u₂²) = 1, so 0 ≤ u₁² = 1 − u₂² ≤ 1, i.e. 0 ≤ u₁² ≤ 1, so u₁ ∈ [−1, 1], and thus we may define θ = cos⁻¹(u₁). The construction of an easy diagram of u with tail at the origin shows that this value θ is indeed the angle u makes with the x-axis.


Then by definition of θ, we immediately have u₁ = cos θ, and u₂ = √(1 − u₁²) = √(1 − cos²θ) = √(sin²θ) = sin θ, so that u = (cos θ, sin θ).

4. Hint: Recall how e_i is defined in terms of δ_ik. Then expand the definitions as usual!

1.37: Distances and Angles (Interesting problems)

1. $\overrightarrow{AB} = 2\mathbf{i} + 4\mathbf{j} - 3\mathbf{k}$, $\overrightarrow{BC} = \mathbf{i} - 2\mathbf{j} + 2\mathbf{k}$, $\overrightarrow{CA} = -3\mathbf{i} - 2\mathbf{j} + \mathbf{k}$. Side lengths are $\|\overrightarrow{AB}\| = \sqrt{29}$, $\|\overrightarrow{BC}\| = 3$, $\|\overrightarrow{CA}\| = \sqrt{14}$.

2. If u and v are orthogonal, then u · v = 0, and then the result follows from exercise 1.31.2(e). This is equivalent to Pythagoras' theorem because a right-angled triangle can be represented by taking u and v as its orthogonal sides, and then the length of the hypotenuse is given by v − u.

3. Hint: show that $\overrightarrow{LM} \parallel \overrightarrow{BC}$. In other words, that the normalised versions of each of the vectors are equal (±).

4. Simplified: ‖b‖² + ‖c‖² − (‖b‖² − 2b · c + ‖c‖²) = 2b · c. Hence substituting b = $\overrightarrow{AC}$ and c = $\overrightarrow{AB}$, we get
$$\|\overrightarrow{AC}\|^2 + \|\overrightarrow{AB}\|^2 - (\overrightarrow{AC} - \overrightarrow{AB})\cdot(\overrightarrow{AC} - \overrightarrow{AB}) = 2\,\overrightarrow{AC}\cdot\overrightarrow{AB}$$
$$\implies \|\overrightarrow{AC}\|^2 + \|\overrightarrow{AB}\|^2 - \overrightarrow{BC}\cdot\overrightarrow{BC} = 2\,\overrightarrow{AC}\cdot\overrightarrow{AB}$$
$$\implies b^2 + c^2 - a^2 = 2\|\overrightarrow{AC}\|\|\overrightarrow{AB}\|\,\frac{\overrightarrow{AC}\cdot\overrightarrow{AB}}{\|\overrightarrow{AC}\|\|\overrightarrow{AB}\|}$$
$$\implies b^2 + c^2 - a^2 = 2bc\left(\frac{\overrightarrow{AC}}{\|\overrightarrow{AC}\|}\cdot\frac{\overrightarrow{AB}}{\|\overrightarrow{AB}\|}\right)$$
$$\implies b^2 + c^2 - a^2 = 2bc\cos\big(\angle(\overrightarrow{AC}, \overrightarrow{AB})\big)$$
$$\implies a^2 = b^2 + c^2 - 2bc\cos(B\hat{A}C),$$
as required.

5. There are in fact three possible choices for d given a, b and c. These are a + b − c, a + c − b, and b + c − a. In each case, we have three parallelograms.

6.

7. The formula follows easily by expanding the dot product definition.


A.3: Set Facts

1. a) true   b) false (⊆, not ∈)   c) true
d) false   e) false   f) true
g) false   h) true   i) false (∈ not ⊆)
j) false (2 ∈ [−1, 2] but 2 ∉ [−2, 2))   k) true

2. Hint: first show that if a² is a multiple of 3, then a must also be a multiple of 3.

3. Hint: if a² is a multiple of 4, is a necessarily a multiple of 4?

A.10: Set Operations

1. a) {1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14}   b) {2, 4, 6, 8}
c) {2, 3, 4, 6, 8, 9, 10, 12, 14}   d) {1, 4, 5, 7, 8}
e) {2, 6}   f) {10, 12, 14}   g) {1, 3, 5, 7}
h) {2, 3, 6}   i) {3, 9}   j) {2, 3, 4, 6, 8, 9}
k) {1, 3, 4, 5, 7, 8, 9, 10, 12, 14}

2. a) {1, 2, 3, 4, 6, 7, 12}   b) {1, 2}   c) {1, 2, 3, 6, 12}
d) ∅   e) {1, 2, 4, 7}   f) {1, 2}   g) {1, 2}
h) {(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0), (0, 1, 1), (0, 1, 2), (0, 2, 0), (0, 2, 1), (0, 2, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 2, 0), (1, 2, 1), (1, 2, 2), (2, 0, 0), (2, 0, 1), (2, 0, 2), (2, 1, 0), (2, 1, 1), (2, 1, 2), (2, 2, 0), (2, 2, 1), (2, 2, 2)}

3. a) {(1, a), (2, a), (3, a), (4, a), (1, b), (2, b), (3, b), (4, b)}

b) {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2), (3, 3), (3, 4), (4, 1), (4, 2), (4, 3), (4, 4)}

c) ∅


d) {(1, (a, a)), (1, (a, b)), (1, (b, a)), (1, (b, b)), (2, (a, a)), (2, (a, b)), (2, (b, a)), (2, (b, b)), (3, (a, a)), (3, (a, b)), (3, (b, a)), (3, (b, b)), (4, (a, a)), (4, (a, b)), (4, (b, a)), (4, (b, b))}

e) {(a, 1), (a, 2), (a, 3), (a, 4), (b, 1), (b, 2), (b, 3), (b, 4)}

f) {(a, a), (a, b), (b, a), (b, b)}

g) {(a, a, a), (a, a, b), (a, b, a), (a, b, b), (b, a, a), (b, a, b), (b, b, a), (b, b, b)}

h) {(0, −1), (0, 1), (1, −1), (1, 1), (2, −1), (2, 1)}

i) ∅

j) {(a, 0), (a, 1), (a, 2), (b, 0), (b, 1), (b, 2)}

k) {(a, 0, −1), (a, 0, 1), (a, 1, −1), (a, 1, 1), (a, 2, −1), (a, 2, 1), (b, 0, −1), (b, 0, 1), (b, 1, −1), (b, 1, 1), (b, 2, −1), (b, 2, 1)}

l) {(1, 1), (2, 1)}
