
Source: maths.nuigalway.ie/~niall/MA385/3-LinearAlgebra.pdf

Chap. 3: Numerical Linear Algebra (1/103)

Chap. 3: Numerical Linear Algebra
§3.1 Introduction to solving linear systems

MA385 – Numerical Analysis 1

November 2019


§3.1 Solving Linear Systems (2/103)

This section of the course is concerned with solving systems of linear equations (“simultaneous equations”). All problems are of the form: find the set of real numbers x1, x2, . . . , xn such that

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
    ⋮
an1x1 + an2x2 + · · · + annxn = bn

where the aij and bi are real numbers.

It is natural to rephrase this using the language of linear algebra: find x = (x1, x2, . . . , xn)T ∈ Rn such that

Ax = b, where A ∈ Rn×n, and b ∈ Rn. (1)



§3.1 Solving Linear Systems (3/103)

In this section, we’ll try to find a clever way to solve this system. In particular,

1. we’ll argue that it’s unnecessary and (more importantly) expensive to try to compute A−1;

2. we’ll have a look at Gaussian Elimination, but will study it in the context of LU-factorisation;

3. after a detour to talk about matrix norms, we calculate the condition number of a matrix;

4. this last task will require us to be able to estimate the eigenvalues of matrices, so we will finish the module by studying how to do that.


A short review of linear algebra (4/103)

A vector x ∈ Rn is an ordered n-tuple of real numbers, x = (x1, x2, . . . , xn)T.

A matrix A ∈ Rm×n is a rectangular array of n columns of m real numbers. (But we will only deal with square matrices, i.e., ones where m = n.)

You already know how to multiply a matrix by a vector, and a matrix by a matrix (right??).

You can write the scalar product as (x, y) = xTy = ∑_{i=1}^n x_i y_i.

Recall that (Ax)T = xTAT, so (x, Ay) = xTAy = (ATx)Ty = (ATx, y).

Letting I be the identity matrix, if there is a matrix B such that AB = BA = I, then we call B the inverse of A and write B = A−1. If there is no such matrix, we say that A is singular.


A short review of linear algebra (5/103)

Theorem 3.1
The following statements are equivalent.
1. For any b, Ax = b has a solution.
2. If there is a solution to Ax = b, it is unique.
3. If Ax = 0 then x = 0.
4. The columns (or rows) of A are linearly independent.
5. There exists a matrix A−1 ∈ Rn×n such that AA−1 = I = A−1A.
6. det(A) ≠ 0.
7. A has rank n.

Item 5 of the above list suggests that we could solve Ax = b as follows: find A−1 and then compute x = A−1b. However this is not the best way to proceed. We only need x, and computing A−1 requires quite a bit of work.


The time it takes to compute det(A) (6/103)

In introductory linear algebra courses, you might compute

A−1 = (1/det(A)) A∗,

where A∗ is the adjoint matrix. But it turns out that computing det(A) directly can be very time-consuming.

Let dn be the number of multiplications required to compute the determinant of an n × n matrix using the Method of Minors (also called “the Laplacian determinant expansion”). We know that if

A = [ a  b ]
    [ c  d ]

then det(A) = ad − bc. So d2 = 2.


The time it takes to compute det(A) (7/103)

If

A = [ a  b  c ]
    [ d  e  f ]
    [ g  h  j ]

then

det(A) = a · | e  f |  −  b · | d  f |  +  c · | d  e |
             | h  j |         | g  j |         | g  h |

so d3 ≥ 3d2.

(See notes from class)

In general we find that it takes n(n − 1) · · · (3)(2)(1) = n! operations to compute the determinant this way. And that is a lot of operations.
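As a sketch (the helper below and its name are ours, not from the notes), the Method of Minors can be instrumented to count its multiplications. The counts follow the recurrence d_n = n(d_{n−1} + 1), which grows at least as fast as n!:

```python
def det_minors(M, ops):
    """Determinant by Laplace expansion along row 0; ops[0] counts multiplications."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        ops[0] += 1  # the multiplication a_{1j} * det(minor)
        total += (-1) ** j * M[0][j] * det_minors(minor, ops)
    return total

ops = [0]
det_minors([[1, 2], [3, 4]], ops)
print(ops[0])  # 2, i.e. d_2 = 2

ops = [0]
det_minors([[2, 0, 1], [1, 3, 0], [0, 1, 4]], ops)
print(ops[0])  # 9 = 3 * (d_2 + 1) >= 3 * d_2
```

The 3 × 3 count confirms the bound d3 ≥ 3d2 from the expansion above.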


Exercises (8/103)

Exercise 3.1
Suppose you had a computer that could perform 2 billion operations per second. Give a lower bound for the amount of time required to compute the determinant of a 10-by-10, 20-by-20, and 30-by-30 matrix using the method of minors.

Exercise 3.2
The Cauchy-Binet formula for determinants tells us that, if A and B are square matrices of the same size, then det(AB) = det(A) det(B). Using it, or otherwise, show that det(σA) = σ^n det(A) for any A ∈ Rn×n and any scalar σ ∈ R.
Note: this exercise gives us another reason to avoid trying to calculate the determinant of the coefficient matrix in order to find the inverse, and thus the solution to the problem. For example, if A ∈ R16×16 and the system is rescaled by σ = 0.1, then det(A) is rescaled by 10−16. On standard computing platforms, like MATLAB, it would be indistinguishable from a singular matrix!
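As a rough sanity check on Exercise 3.1, the arithmetic can be sketched directly, taking n! as the operation count and the 2 × 10^9 operations-per-second rate given in the exercise:

```python
import math

# Back-of-envelope lower bounds for Exercise 3.1: n! operations at 2e9 ops/sec.
rate = 2_000_000_000  # operations per second, as stated in the exercise
for n in (10, 20, 30):
    print(n, math.factorial(n) / rate, "seconds")
```

For n = 10 this is a few milliseconds; for n = 20 it is already decades; for n = 30 it is around 10^23 seconds, vastly longer than the age of the universe.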


§3.2 Gaussian Elimination (9/103)

Chap. 3: Numerical Linear Algebra
§3.2 Gaussian Elimination (and triangular matrices)

MA385/MA530 – Numerical Analysis 1

November 2019


Gaussian Elimination (10/103)

Gaussian Elimination is an exact method for solving linear systems: we replace the problem with one that is easier to solve and has the same solution. This is in contrast to the approximate methods studied earlier in the module.

There are approximate methods for solving linear systems too, but they are not part of this module.

Carl Friedrich Gauß, Germany, 1777-1855. Although he produced many very important original ideas, this wasn’t one of them: the Chinese knew of “Gaussian Elimination” about 2000 years ago. His actual contributions included major discoveries in the areas of number theory, geometry, and astronomy.


Gaussian Elimination (11/103)

Example 3.2

Consider the problem:

[ −1  3 −1 ] [ x1 ]   [  5 ]
[  3  1 −2 ] [ x2 ] = [  0 ]
[  2 −2  5 ] [ x3 ]   [ −9 ]

We can perform a sequence of elementary row operations to yield the system:

[ −1  3 −1 ] [ x1 ]   [  5 ]
[  0  2 −1 ] [ x2 ] = [  3 ]
[  0  0  5 ] [ x3 ]   [ −5 ].


Row ops = matrix multiplication (12/103)

Gaussian Elimination: perform elementary row operations such as

A = [ a11  a12  a13 ]
    [ a21  a22  a23 ]
    [ a31  a32  a33 ]

being replaced by

[ a11            a12            a13          ]              [  0    0    0  ]
[ a21 + µ21a11   a22 + µ21a12   a23 + µ21a13 ]  =  A + µ21  [ a11  a12  a13 ]
[ a31            a32            a33          ]              [  0    0    0  ]

where µ21 = −a21/a11, so that a21 + µ21a11 = 0.


Row ops = matrix multiplication (13/103)

Note that

[  0    0    0  ]   [ 0  0  0 ] [ a11  a12  a13 ]
[ a11  a12  a13 ] = [ 1  0  0 ] [ a21  a22  a23 ]
[  0    0    0  ]   [ 0  0  0 ] [ a31  a32  a33 ]

so we can write the row operation as (I + µ21E^(21))A, where E^(pq) is the matrix of all zeros, except for e_pq = 1.

In general each of the row operations in Gaussian Elimination can be written as

(I + µpq E^(pq)) A, where 1 ≤ q < p ≤ n, (2)

and (I + µpq E^(pq)) is an example of a Unit Lower Triangular Matrix.
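This can be checked directly. A small sketch (the helper names `elem` and `matmul`, and the 3 × 3 example matrix, are ours) builds I + µ21E^(21) and multiplies it into A:

```python
# Left-multiplying by I + mu*E^(pq) performs the row operation "row p += mu * row q".
def elem(n, p, q, mu):
    """Return I + mu * E^(pq) (1-based indices p, q) as nested lists."""
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M[p - 1][q - 1] += mu
    return M

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 1.0],
     [6.0, 1.0, 5.0]]
mu21 = -A[1][0] / A[0][0]          # mu21 = -a21/a11 = -2.0
B = matmul(elem(3, 2, 1, mu21), A)
print(B[1])                        # [0.0, 1.0, -1.0]: the (2,1) entry is eliminated
```

Only row 2 changes; the other rows of A pass through unchanged, exactly as in the displayed row operation.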


Row ops = matrix multiplication (14/103)

We can conclude that each step of the process will involvemultiplying A by a unit lower triangular matrix, resulting in anupper triangular matrix.


Triangular Matrices (15/103)

Definition 3.3 (Lower Triangular)
L ∈ Rn×n is a Lower Triangular (LT) matrix if the only non-zero entries are on or below the main diagonal, i.e., if lij = 0 for 1 ≤ i < j ≤ n.
It is a unit Lower Triangular matrix if, in addition, lii = 1.

Examples:


Triangular Matrices (16/103)

Definition 3.4 (Upper Triangular)
U ∈ Rn×n is an Upper Triangular (UT) matrix if uij = 0 for 1 ≤ j < i ≤ n. It is a Unit Upper Triangular matrix if uii = 1.

Examples:


Triangular Matrices (17/103)

Triangular matrices have many important properties. A very important one is: the determinant of a triangular matrix is the product of the diagonal entries. For a proof, see Exercise 3.4.


Triangular Matrices (18/103)

There are other important properties of triangular matrices, butfirst we need the idea of matrix partitioning.

Definition 3.5 (Submatrix)
X is a submatrix of A if it can be obtained by deleting some rows and columns of A.

Example:


Triangular Matrices (19/103)

Definition 3.6 (Leading Principal Submatrix)
The Leading Principal Submatrix of order k of A ∈ Rn×n is A(k) ∈ Rk×k, obtained by deleting all but the first k rows and columns of A. (Simply put, it’s the k × k matrix in the top left-hand corner of A.)

Example:


Triangular Matrices (20/103)

Matrix partitioning
To partition a matrix means to divide it into contiguous blocks that are submatrices.

Example:


Triangular Matrices (21/103)

Next recall that if A and V are matrices of the same size, and each is partitioned as

A = [ B  C ],   V = [ W  X ],
    [ D  E ]        [ Y  Z ]

where B is the same size as W, C is the same size as X, etc., then

AV = [ BW + CY   BX + CZ ]
     [ DW + EY   DX + EZ ].


Triangular Matrices (22/103)

Theorem 3.7 (Properties of Lower Triangular Matrices)

For any integer n ≥ 2:
(i) If L1 and L2 are n × n Lower Triangular (LT) matrices, then so too is their product L1L2.
(ii) If L1 and L2 are n × n Unit Lower Triangular matrices, then so too is their product L1L2.
(iii) L1 is nonsingular if and only if all the lii ≠ 0. In particular, all Unit LT matrices are nonsingular.
(iv) The inverse of an LT matrix is an LT matrix. The inverse of a unit LT matrix is a unit LT matrix.


Triangular Matrices (23/103)

We restate Part (iv) as follows:

Suppose that L ∈ Rn×n is a lower triangular matrix withn ≥ 2, and that there is a matrix L−1 ∈ Rn×n such thatL−1L = In. Then L−1 is also a lower triangular matrix.


Triangular Matrices (25/103)

Theorem 3.8 (Properties of Upper Triangular Matrices)

Statements analogous to those concerning the properties of lower triangular matrices hold for upper triangular and unit upper triangular matrices. (For proofs, see the exercises at the end of this section.)


Exercises (26/103)

Exercise 3.3
Every step of Gaussian Elimination can be thought of as a left multiplication by a unit lower triangular matrix. That is, we obtain an upper triangular matrix U by multiplying A by k unit lower triangular matrices: LkLk−1 · · · L2L1A = U, where each Li = I + µpq E^(pq), and E^(pq) is the matrix whose only non-zero entry is e_pq = 1. Give an expression for k in terms of n.

Exercise 3.4

Let L be a lower triangular n × n matrix. Show that det(L) = ∏_{j=1}^n l_jj. Hence give a necessary and sufficient condition for L to be invertible. What does that tell us about Unit Lower Triangular Matrices?

Exercise 3.5
Let L be a lower triangular matrix. Show that each diagonal entry ljj of L is an eigenvalue of L.


Exercises (27/103)

Exercise 3.6
Prove Parts (i)–(iii) of Theorem 3.7.

Exercise 3.7
Prove Theorem 3.8.

Exercise 3.8
Construct an alternative proof of the first part of Theorem 3.7 (iv) as follows: Suppose that L is a non-singular lower triangular matrix. Show that if b ∈ Rn is such that bi = 0 for i = 1, . . . , k ≤ n, and y solves Ly = b, then yi = 0 for i = 1, . . . , k. (Hint: partition L by the first k rows and columns.) Now use this to give an alternative proof of the fact that the inverse of a lower triangular matrix is itself lower triangular.


§3.3 LU-factorisation (28/103)

Solving Linear Systems
§3.3 LU-factorisation

MA385/530 – Numerical Analysis 1

November 2019

In these slides,

LT means “lower triangular”, and
UT means “upper triangular”.


§3.3 LU-factorisation (29/103)

The goal of this section is to demonstrate that the process of Gaussian Elimination applied to a matrix A is equivalent to factoring A as the product of a unit lower triangular and an upper triangular matrix.

In Section 3.2 we saw that each elementary row operation in Gaussian Elimination involves replacing A with (I + µrs E^(rs))A.

Example: For the 3 × 3 case, this involved computing

(I + µ32 E^(32))(I + µ31 E^(31))(I + µ21 E^(21)) A.


§3.3 LU-factorisation (30/103)

In general we multiply A by a sequence of matrices (I + µrs E^(rs)), all of which are unit lower triangular matrices.

When we are finished we have reduced A to an upper triangular matrix.

So we can write the whole process as

LkLk−1 · · · L2L1A = U, (3)

where each of the Li is a unit LT matrix.


§3.3 LU-factorisation (31/103)

But from Theorem 3.7, we know that the product of unit LT matrices is itself a unit LT matrix. So we can write the whole process described in (3) as

LA = U. (4)

But Theorem 3.7 also tells us that the inverse of a unit LT matrix exists and is a unit LT matrix. So we can write (4) as

A = LU,

where L is unit lower triangular and U is upper triangular.

This is called “LU-factorisation”.


§3.3 LU-factorisation (32/103)

Definition 3.9
The LU-factorisation of a matrix A is a unit lower triangular matrix L and an upper triangular matrix U such that LU = A.

Example 3.10

If A = [  3  2 ]
       [ −1  2 ]

then:


§3.3 LU-factorisation (33/103)

Example 3.11

If A = [ 3 −1  1 ]
       [ 2  4  3 ]
       [ 0  2 −4 ]

then:


A formula for LU-factorisation (34/103)

We now want to work out formulae for L and U where

a_ij = (LU)_ij = ∑_{k=1}^n l_ik u_kj,  1 ≤ i, j ≤ n.

Since L and U are triangular,

If i ≤ j then a_ij = ∑_{k=1}^i l_ik u_kj, (5a)

If j < i then a_ij = ∑_{k=1}^j l_ik u_kj. (5b)


A formula for LU-factorisation (35/103)

The first of these equations can be written as

a_ij = ∑_{k=1}^{i−1} l_ik u_kj + l_ii u_ij.

But l_ii = 1, so:

u_ij = a_ij − ∑_{k=1}^{i−1} l_ik u_kj,   i = 1, . . . , j,  j = 1, . . . , n. (6a)

And from the second:

l_ij = (1/u_jj) ( a_ij − ∑_{k=1}^{j−1} l_ik u_kj ),   i = 2, . . . , n,  j = 1, . . . , i − 1. (6b)


A formula for LU-factorisation (36/103)

Example 3.12

Find the LU-factorisation of

A = [ −1  0  1  2 ]
    [ −2 −2  1  4 ]
    [ −3 −4 −2  4 ]
    [ −4 −6 −5  0 ]


A formula for LU-factorisation (37/103)

Full details of Example 3.12: First, using (6a) with i = 1 we have u_1j = a_1j:

U = [ −1   0    1    2  ]
    [  0  u22  u23  u24 ]
    [  0   0   u33  u34 ]
    [  0   0    0   u44 ].

Then, using (6b) with j = 1, we have l_i1 = a_i1/u_11:

L = [ 1   0    0   0 ]
    [ 2   1    0   0 ]
    [ 3  l32   1   0 ]
    [ 4  l42  l43  1 ].

Next, using (6a) with i = 2, we have u_2j = a_2j − l_21 u_1j:

U = [ −1   0    1    2  ]
    [  0  −2   −1    0  ]
    [  0   0   u33  u34 ]
    [  0   0    0   u44 ],


A formula for LU-factorisation (38/103)

then (6b) with j = 2 gives l_i2 = (a_i2 − l_i1 u_12)/u_22:

L = [ 1  0   0   0 ]
    [ 2  1   0   0 ]
    [ 3  2   1   0 ]
    [ 4  3  l43  1 ]

Etc....


Existence of an LU -factorisation (39/103)

Not every matrix has an LU-factorisation. So we need to characterise the matrices that do.

To prove the next theorem we need the Cauchy-Binet Formula: det(AB) = det(A) det(B).¹

Theorem 3.13

If n ≥ 2 and A ∈ Rn×n is such that every leading principal submatrix A(k), 1 ≤ k < n, is nonsingular, then A has an LU-factorisation.

¹Wikipedia disagrees with this attribution.


Exercises (41/103)

Exercise 3.9

Many textbooks and computing systems compute the factorisation A = LDU, where L and U are unit lower and unit upper triangular matrices respectively, and D is a diagonal matrix. Show such a factorisation exists, provided that n ≥ 2, A ∈ Rn×n, and every leading principal submatrix A(k), 1 ≤ k < n, is nonsingular.


§3.4 Solving linear systems (42/103)

Solving Linear Systems

§3.4 Solving linear systems
(i.e., actually solving Ax = b)

MA385/MA530 – Numerical Analysis 1

November 2019

We now know:

- there are sufficient conditions that guarantee we can factorise A as LU;
- how to compute L and U.

But our overarching goal is to solve: “find x ∈ Rn such that Ax = b, for some b ∈ Rn”. We do this by first solving Ly = b for y ∈ Rn and then Ux = y. Because L and U are triangular, this is easy: Ly = b is solved by forward substitution, and Ux = y by back-substitution.


Solving LUx = b (43/103)

Example 3.14
Use LU-factorisation to solve

[ −1  0  1  2 ] [ x1 ]   [ −2 ]
[ −2 −2  1  4 ] [ x2 ] = [ −3 ]
[ −3 −4 −2  4 ] [ x3 ]   [ −1 ]
[ −4 −6 −5  0 ] [ x4 ]   [  1 ]

Solution: In Example 3.12 of Section 3.3, we saw that

L = [ 1  0  0  0 ]       U = [ −1  0  1  2 ]
    [ 2  1  0  0 ],          [  0 −2 −1  0 ]
    [ 3  2  1  0 ]           [  0  0 −3 −2 ]
    [ 4  3  2  1 ]           [  0  0  0 −4 ]

So then...
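The two triangular solves can be written out explicitly. A sketch (the names `forward_sub` and `back_sub` are ours; `forward_sub` assumes L is unit lower triangular, as here), using the factors quoted in Example 3.14:

```python
def forward_sub(L, b):
    """Solve Ly = b for unit lower triangular L (so no division is needed)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    return y

def back_sub(U, y):
    """Solve Ux = y for nonsingular upper triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

L = [[1, 0, 0, 0], [2, 1, 0, 0], [3, 2, 1, 0], [4, 3, 2, 1]]
U = [[-1, 0, 1, 2], [0, -2, -1, 0], [0, 0, -3, -2], [0, 0, 0, -4]]
b = [-2, -3, -1, 1]
x = back_sub(U, forward_sub(L, b))
print(x)  # x = (1, 0, -1, 0), up to the sign of zero
```

Substituting back into the original system confirms that x = (1, 0, −1, 0)T satisfies Ax = b.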


Pivoting (not covered in class) (45/103)

Example 3.15
Suppose we want to compute the LU-factorisation of

A = [ 0  2 −4 ]
    [ 2  4  3 ]
    [ 3 −1  1 ].

We can’t compute l21 because u11 = 0. But if we swap rows 1 and 3, then we can (we did this as Example 3.11). This is like changing the order of the linear equations we want to solve. If

P = [ 0  0  1 ]               [ 3 −1  1 ]
    [ 0  1  0 ]   then   PA = [ 2  4  3 ]
    [ 1  0  0 ]               [ 0  2 −4 ].

This is called Pivoting, and P is the permutation matrix.


Pivoting (not covered in class) (46/103)

Definition 3.16
P ∈ Rn×n is a Permutation Matrix if every entry is either 0 or 1 (it is a Boolean matrix) and if all the row and column sums are 1.

Theorem 3.17

For any A ∈ Rn×n there exists a permutation matrix P such thatPA = LU .

For a proof, see p. 53 of the textbook.


The computational cost (47/103)

How efficient is the method of LU-factorisation for solving Ax = b? That is, how many computational steps (additions and multiplications) are required? In Section 2.6 of the textbook, you’ll find a discussion that goes roughly as follows. Suppose we want to compute l_ij. Recall formula (6b) from §3.3:

l_ij = (1/u_jj) ( a_ij − ∑_{k=1}^{j−1} l_ik u_kj ),   i = 2, . . . , n, and j = 1, . . . , i − 1.

We see that this requires j − 2 additions, j − 1 multiplications, 1 subtraction and 1 division: a total of 2j − 1 operations.


The computational cost (48/103)

In Exercise 3.11 we will show that

1 + 2 + · · · + k = (1/2) k(k + 1), and
1² + 2² + · · · + k² = (1/6) k(k + 1)(2k + 1).

So the number of operations required for computing L is

∑_{i=2}^{n} ∑_{j=1}^{i−1} (2j − 1) = ∑_{i=2}^{n} (i² − 2i + 1) = (1/6) n(n + 1)(2n + 1) − n(n + 1) + n ≤ Cn³

for some C.
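The closed form can be checked against the double sum directly (a small sketch; the function names are ours):

```python
# Compare the double sum of (2j - 1) with its closed form.
def direct(n):
    return sum(2 * j - 1 for i in range(2, n + 1) for j in range(1, i))

def closed(n):
    # (1/6) n(n+1)(2n+1) - n(n+1) + n, using exact integer arithmetic
    return n * (n + 1) * (2 * n + 1) // 6 - n * (n + 1) + n

for n in (2, 5, 10, 50):
    assert direct(n) == closed(n)
print(direct(50))  # 40425, roughly n^3/3 for n = 50
```

The count is about n³/3 multiplications and additions, consistent with the Cn³ bound above.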


The computational cost (49/103)

A similar (slightly smaller) number of operations is required forcomputing U . (For a slightly different approach that yields cruderestimates, but requires a little less work, have a look at Lecture 10of Stewart’s Afternotes on Numerical Analysis).


This doesn’t tell us how long a computer program will take to run,but it does tell us how the execution time grows with n. Forexample, if n = 100 and the program takes a second to execute,then if n = 1000 we’d expect it to take about a thousand seconds.


Towards an error analysis (50/103)

Unlike the other methods we have studied so far in this course, we shouldn’t have to do an error analysis, in the sense of estimating the difference between the true solution and our numerical one. That is because the LU-factorisation approach should give us exactly the true solution.

However, things are not that simple. Unlike the methods in earlier sections, the effects of (inexact) floating point computations become very pronounced. In the next section, we’ll develop the ideas needed to quantify these effects.


Exercises (51/103)

Exercise 3.10
Suppose that A has an LDU-factorisation (see Exercise 3.9). How could this factorisation be used to solve Ax = b?

Exercise 3.11
Prove that

1 + 2 + · · · + k = (1/2) k(k + 1), and
1² + 2² + · · · + k² = (1/6) k(k + 1)(2k + 1).


§3.5 Vector and Matrix Norms (52/103)

§3 Solving linear systems
§3.5 Vector and Matrix Norms

MA385 – Numerical Analysis 1

November 2019



§3.5 Vector and Matrix Norms (53/103)

All computer implementations of algorithms that involvefloating-point numbers (roughly, finite decimal approximations ofreal numbers) contain errors due to round-off error.

It transpires that computer implementations of LU -factorization,and related methods, lead to these round-off errors being greatlymagnified: this phenomenon is the main focus of this final sectionof the course.


§3.5 Vector and Matrix Norms (54/103)

You might remember from earlier sections of the course that we had to assume functions were well-behaved in the sense that

|f(x) − f(y)| / |x − y| ≤ L,

for some number L, so that our numerical schemes (e.g., fixed point iteration, Euler’s method, etc.) would work. If a function doesn’t satisfy a condition like this, we say it is “ill-conditioned”.

One of the consequences is that a small error in the inputs gives a large error in the outputs.

We’d like to be able to express similar ideas about matrices: that A(u − v) = Au − Av is not too “large” compared to u − v. To do this we use the notion of a “norm” to describe the relative sizes of the vectors u and Au.


Three vector norms (55/103)

When we want to consider the size of a real number, without regard to sign, we use the absolute value. Important properties of this function are:

1. |x| ≥ 0 for all x.
2. |x| = 0 if and only if x = 0.
3. |λx| = |λ| |x|.
4. |x + y| ≤ |x| + |y| (triangle inequality).

This notion can be extended to vectors and matrices.


Three vector norms (56/103)

Definition 3.18

Let Rn be the set of all vectors of n real numbers. The function ‖ · ‖ is called a norm on Rn if, for all u, v ∈ Rn:

1. ‖v‖ ≥ 0,
2. ‖v‖ = 0 if and only if v = 0,
3. ‖λv‖ = |λ| ‖v‖ for any λ ∈ R,
4. ‖u + v‖ ≤ ‖u‖ + ‖v‖ (triangle inequality).

Norms on vectors in Rn quantify the size of the vector. But there are different ways of doing this...


Three vector norms (57/103)

Definition 3.19
Let v ∈ Rn: v = (v1, v2, . . . , vn−1, vn)T.

(i) The 1-norm (a.k.a. the Taxi cab norm) is ‖v‖1 = ∑_{i=1}^n |vi|.

(ii) The 2-norm (a.k.a. the Euclidean norm) is ‖v‖2 = ( ∑_{i=1}^n vi² )^{1/2}. Note, if v is a vector in Rn, then vTv = v1² + v2² + · · · + vn² = ‖v‖2².

(iii) The ∞-norm (a.k.a. the max-norm) is ‖v‖∞ = max_{i=1,...,n} |vi|.

Example: for v = (−2, 4, −4)T we get ‖v‖1 = 10, ‖v‖2 = 6, and ‖v‖∞ = 4.
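The three norms of the example vector can be computed directly (a sketch with our own helper names):

```python
# The three norms of Definition 3.19, applied to v = (-2, 4, -4).
def norm1(v):
    return sum(abs(x) for x in v)

def norm2(v):
    return sum(x * x for x in v) ** 0.5

def norm_inf(v):
    return max(abs(x) for x in v)

v = [-2, 4, -4]
print(norm1(v), norm2(v), norm_inf(v))  # 10 6.0 4
```

Note that ‖v‖∞ ≤ ‖v‖2 ≤ ‖v‖1 here, as the three sums suggest.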


Three vector norms (58/103)

The unit balls in R² given by ‖ · ‖1 (top left), ‖x‖2 = √(x1² + x2²) = 1 (top right), and ‖ · ‖∞.

[Figure: the three unit balls; each passes through the points (±1, 0) and (0, ±1).]


Three vector norms (59/103)

It is easy to show that ‖ · ‖1 and ‖ · ‖∞ are norms. And it is not hard to show that ‖ · ‖2 satisfies conditions (1), (2) and (3) of Definition 3.18.

It takes a little bit of effort to show that ‖ · ‖2 satisfies the triangle inequality; details are given in Section 3.5.9 of the notes.


Matrix Norms (60/103)

Definition 3.20

Given any norm ‖ · ‖ on Rn, there is a subordinate matrix norm on Rn×n defined by

‖A‖ = max_{v ∈ Rn⋆} ‖Av‖/‖v‖,   (7)

where A ∈ Rn×n and Rn⋆ = Rn \ {0}.

You might wonder why we define a matrix norm like this. The reason is that we like to think of A as an operator on Rn: if v ∈ Rn then Av ∈ Rn. So rather than the norm giving us information about the “size” of the entries of a matrix, it tells us how much the matrix can change the size of a vector.
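This operator view can be made concrete by sampling the ratio ‖Av‖/‖v‖ over many directions v: every sample is a lower bound for ‖A‖, and the maximum over all directions attains it. A sketch in Python (NumPy assumed; the matrix here is just an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

# Sample the ratio ||Av||_2 / ||v||_2 over many random non-zero v.
# Each ratio is a lower bound for ||A||_2; the definition takes the max.
ratios = []
for _ in range(2000):
    v = rng.standard_normal(2)
    ratios.append(np.linalg.norm(A @ v, 2) / np.linalg.norm(v, 2))

estimate = max(ratios)
exact = np.linalg.norm(A, 2)   # NumPy computes the subordinate 2-norm
assert estimate <= exact + 1e-12
print(estimate, exact)
```

With enough samples the estimate gets very close to the true subordinate norm, since the maximising direction is eventually nearly hit.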


Computing Matrix Norms (61/103)

It is not obvious from the above definition how to calculate the norm of a given matrix. We’ll see that:

- The ∞-norm of a matrix is also the largest absolute-value row sum.
- The 1-norm of a matrix is also the largest absolute-value column sum.
- The 2-norm of the matrix A is the square root of the largest eigenvalue of ATA.


The max-norm on Rn×n (62/103)

Theorem 3.21
For any A ∈ Rn×n, the subordinate matrix norm associated with ‖ · ‖∞ on Rn can be computed by

‖A‖∞ = max_{i=1,...,n} ∑_{j=1}^{n} |aij|.
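The row-sum formula (and the analogous column-sum formula for the 1-norm, stated below) can be checked numerically. A sketch in Python (NumPy assumed; the matrix is an arbitrary illustrative choice):

```python
import numpy as np

A = np.array([[ 1.0, -7.0,  2.0],
              [ 3.0,  4.0, -1.0],
              [-2.0,  5.0,  6.0]])

# Infinity-norm: the largest absolute-value row sum.
row_sums = np.sum(np.abs(A), axis=1)   # [10, 8, 13]
norm_inf = np.max(row_sums)            # 13
assert norm_inf == np.linalg.norm(A, np.inf)

# 1-norm: the largest absolute-value column sum.
col_sums = np.sum(np.abs(A), axis=0)   # [6, 16, 9]
norm_1 = np.max(col_sums)              # 16
assert norm_1 == np.linalg.norm(A, 1)
```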


The max-norm on Rn×n (63/103)


The max-norm on Rn×n (64/103)

A similar result holds for the 1-norm, the proof of which is left as an exercise.

Theorem 3.22

For any A ∈ Rn×n, the subordinate matrix norm associated with ‖ · ‖1 on Rn can be computed by

‖A‖1 = max_{j=1,...,n} ∑_{i=1}^{n} |ai,j|.   (8)


Computing ‖A‖2 Eigenvalues (65/103)

Computing the 2-norm of a matrix is a little harder than computing the 1- or ∞-norms. However, later we’ll need estimates not just for ‖A‖, but also for ‖A−1‖. And, unlike the 1- and ∞-norms, we can estimate ‖A−1‖2 without explicitly forming A−1.

We begin by recalling some important facts about eigenvalues and eigenvectors.

Definition 3.23

Let A ∈ Rn×n. We call λ ∈ C an eigenvalue of A if there is a non-zero vector x ∈ Cn such that

Ax = λx.

We call any such x an eigenvector of A associated with λ.


Computing ‖A‖2 Eigenvalues (66/103)

(i) If A is a real symmetric matrix (i.e., A = AT), its eigenvalues are all real, and its eigenvectors can be chosen to be real-valued.

(ii) If λ is an eigenvalue of the nonsingular matrix A, then 1/λ is an eigenvalue of A−1.
(iii) If x is an eigenvector associated with the eigenvalue λ, then so too is ηx for any non-zero scalar η.
(iv) An eigenvector may therefore be normalised so that ‖x‖2² = xTx = 1.


Computing ‖A‖2 Eigenvalues (67/103)

(v) There are n eigenvalues λ1, λ2, . . . , λn associated with the real symmetric matrix A. Let x(1), x(2), . . . , x(n) be the associated normalised eigenvectors. Then the eigenvectors are linearly independent and so form a basis for Rn. That is, any vector v ∈ Rn can be written as a linear combination:

v = ∑_{i=1}^{n} αi x(i).

(vi) Furthermore, these eigenvectors are mutually orthogonal and, being normalised, orthonormal:

(x(i))T x(j) = { 1, i = j,
                 0, i ≠ j.


Computing ‖A‖2 Eigenvalues (68/103)

Here is a useful consequence of (v) and (vi), which we will use repeatedly.
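The worked detail for this slide was left for the lecture; presumably it is the standard eigenbasis-expansion identities that follow from (v) and (vi). A numerical sketch of those identities in Python (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                  # a real symmetric matrix

lam, X = np.linalg.eigh(A)         # columns of X are orthonormal eigenvectors
v = rng.standard_normal(4)

# Expand v in the eigenbasis: v = sum_i alpha_i x^(i), alpha_i = (x^(i))^T v.
alpha = X.T @ v
assert np.allclose(X @ alpha, v)

# Two consequences of orthonormality:
assert np.isclose(np.dot(v, v), np.sum(alpha**2))   # ||v||_2^2 = sum alpha_i^2
assert np.allclose(A @ v, X @ (lam * alpha))        # Av = sum lambda_i alpha_i x^(i)
```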


Computing ‖A‖2 Eigenvalues (69/103)

The singular values of a matrix A are the square roots of the eigenvalues of ATA. They play a very important role in matrix analysis and in areas of applied linear algebra, such as image and text processing. Our interest here is in their relationship to ‖A‖2.

But first we’ll prove a theorem about certain matrices (the so-called “normal matrices”).


Computing ‖A‖2 Eigenvalues (70/103)

Theorem 3.24
For any matrix A, the eigenvalues of ATA are real and non-negative.


Computing ‖A‖2 Eigenvalues (71/103)

Part of the above proof involved showing that, if (ATA)x = λx, then

√λ = ‖Ax‖2 / ‖x‖2.

This at the very least tells us that

‖A‖2 := max_{x ∈ Rn⋆} ‖Ax‖2/‖x‖2 ≥ max_{i=1,...,n} √λi.

With a bit more work, we can show that if λ1 ≤ λ2 ≤ · · · ≤ λn are the eigenvalues of B = ATA, then

‖A‖2 = √λn.


Computing ‖A‖2 Eigenvalues (72/103)

Theorem 3.25
Let A ∈ Rn×n, and let λ1 ≤ λ2 ≤ · · · ≤ λn be the eigenvalues of B = ATA. Then

‖A‖2 = max_{i=1,...,n} √λi = √λn.

Here is the main idea. For full details, see the text-book.
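The identity in Theorem 3.25 (together with the non-negativity from Theorem 3.24) is easy to test numerically. A sketch in Python (NumPy assumed; the matrix is an illustrative choice):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

B = A.T @ A
lam = np.linalg.eigvalsh(B)      # eigenvalues of symmetric B, ascending order
assert np.all(lam >= -1e-12)     # Theorem 3.24: real and non-negative

norm2 = np.sqrt(lam[-1])         # Theorem 3.25: ||A||_2 = sqrt(lambda_n)
assert np.isclose(norm2, np.linalg.norm(A, 2))
```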


Computing ‖A‖2 Eigenvalues (73/103)


Exercises (74/103)

Exercise 3.12 (?)
Show that, for any vector x ∈ Rn, ‖x‖∞ ≤ ‖x‖2 and ‖x‖2² ≤ ‖x‖1‖x‖∞. For each of these inequalities, give an example for which equality holds. Deduce that ‖x‖∞ ≤ ‖x‖2 ≤ ‖x‖1.

Exercise 3.13
Show that if x ∈ Rn, then ‖x‖1 ≤ n‖x‖∞ and ‖x‖2 ≤ √n‖x‖∞.

Exercise 3.14
Show that, for any subordinate matrix norm on Rn×n, the norm of the identity matrix is 1.


Exercises (75/103)

Exercise 3.15 (?)
Prove that

‖A‖1 = max_{j=1,...,n} ∑_{i=1}^{n} |ai,j|.

Hint: Suppose that

∑_{i=1}^{n} |aij| ≤ C,   j = 1, 2, . . . , n.

Show that, for any vector x ∈ Rn,

∑_{i=1}^{n} |(Ax)i| ≤ C‖x‖1.

Now find a vector x such that ∑_{i=1}^{n} |(Ax)i| = C‖x‖1, and deduce the result.


Extra: ‖ · ‖2 is a norm on Rn (76/103)

As mentioned on Slide 59, it takes a little effort to show that ‖ · ‖2 is indeed a norm on Rn; in particular, to show that it satisfies the triangle inequality we need the Cauchy-Schwarz inequality.

Lemma 1 (Cauchy-Schwarz)

| ∑_{i=1}^{n} ui vi | ≤ ‖u‖2‖v‖2,   for all u, v ∈ Rn.

The proof can be found in any text-book on analysis.

We can now apply Cauchy-Schwarz to show that

‖u+ v‖2 ≤ ‖u‖2 + ‖v‖2.

(PTO)


Extra: ‖ · ‖2 is a norm on Rn (77/103)

This is because

‖u + v‖2² = (u + v)T(u + v)
          = uTu + 2uTv + vTv
          ≤ uTu + 2|uTv| + vTv          (by the triangle inequality)
          ≤ uTu + 2‖u‖2‖v‖2 + vTv      (by Cauchy-Schwarz)
          = (‖u‖2 + ‖v‖2)².

It follows directly that

Corollary 2
‖ · ‖2 is a norm.


§3.6 Condition Numbers (78/103)

Solving linear systems of equations
§3.6 Condition Numbers

MA385 – Numerical Analysis 1

November 2019

Numerical solutions to some linear systems are adversely affected by round-off errors.

This phenomenon is related to the matrices in the linear systems. Those matrices for which the issue is particularly prevalent are referred to as being ill-conditioned.

For any matrix, we can assign a numerical score that gives an indication of whether it is ill-conditioned. That score is called the condition number, and is the subject of this section.

The condition number is defined in terms of matrix norms.


Consistency of matrix norms (79/103)

Suppose we have a vector norm ‖ · ‖ and the associated subordinate matrix norm. It is not hard to see that

‖Au‖ ≤ ‖A‖‖u‖ for any u ∈ Rn, A ∈ Rn×n.

Here is why: if u = 0, both sides are zero. If u ≠ 0, then by the definition of the subordinate norm, ‖Au‖/‖u‖ ≤ max_{v ∈ Rn⋆} ‖Av‖/‖v‖ = ‖A‖; multiplying both sides by ‖u‖ gives the result.


Consistency of matrix norms (80/103)

There is an analogous statement for the product of two matrices:

Definition 3.26 (Consistent matrix norm)
A matrix norm ‖ · ‖ is consistent (or “sub-multiplicative”) if

‖AB‖ ≤ ‖A‖‖B‖, for all A,B ∈ Rn×n.

Theorem 3.27Any subordinate matrix norm is consistent.

The proof is left to Exercise 3.17. That exercise also demonstrates that there are matrix norms which are not consistent.


Computer representation of numbers (81/103)

[Please read this slide in your own time!]

Modern computers don’t store numbers in decimal (base 10), but in binary (base 2) “floating point numbers” of the form:

x = ±a × 2^(b−M).

Most use double precision, where 8 bytes (64 bits, or binary digits) are used to store:

- the sign (1 bit),
- a, called the “significand” or “mantissa” (52 bits),
- and the exponent, b − 1023 (11 bits).

Note that a has roughly 16 decimal digits.

(Some older computer systems sometimes use single precision, where a has 23 bits, giving 8 decimal digits, and b has 8; so too do many new GPU-based systems.)


Computer representation of numbers (82/103)

[OK, you can start reading again!] When we try to store a real number x on a computer, we actually store the nearest floating-point number. That is, we end up storing x + δx, where δx is the “round-off” error.

But the quantity we are mainly interested in is the relative error: |δx|/|x|.

Since this is not a course on computer architecture, we’ll simplify a little and just take it that single and double precision systems lead to a relative error of 10−8 and 10−16 respectively.
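These relative errors correspond to the “machine epsilon” values that NumPy reports for each floating-point type. A quick check in Python (NumPy assumed; this is an aside, not part of the printed notes):

```python
import numpy as np

# Unit roundoff for the two common formats.
print(np.finfo(np.float32).eps)   # about 1.19e-07: single precision, ~8 digits
print(np.finfo(np.float64).eps)   # about 2.22e-16: double precision, ~16 digits

# Storing x introduces a relative error |dx|/|x| of at most about eps.
x = 1.0 / 3.0                     # not exactly representable in binary
x32 = np.float32(x)
rel_err = abs(float(x32) - x) / x
assert rel_err < np.finfo(np.float32).eps
```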


Condition Numbers (83/103)

(See pp. 68–70 of Süli and Mayers for a thorough development of the concept of a condition number.)

Suppose we use, say, LU -factorization and back-substitution on a computer to solve

Ax = b.

Because of the “round-off error” we actually solve

A(x+ δx) = (b+ δb).

Our problem now is: for a given A, if we know the (relative) error in b, can we find an upper bound on the relative error in x?


Condition Numbers (84/103)

Definition 3.28
The condition number of a matrix, with respect to a particular matrix norm ‖ · ‖⋆, is

κ⋆(A) = ‖A‖⋆‖A−1‖⋆.

If κ⋆(A) ≫ 1 then we say A is ill-conditioned.

Example: Find the condition number κ∞ of

A = ( 10    12
      0.08  0.1 ).
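The worked answer was left for class, but it can be checked numerically (Python with NumPy assumed). Here det(A) = 1 − 0.96 = 0.04, so A−1 has rows (2.5, −300) and (−2, 250); hence ‖A‖∞ = 22, ‖A−1‖∞ = 302.5, and κ∞(A) = 6655:

```python
import numpy as np

A = np.array([[10.0, 12.0],
              [0.08, 0.1]])

Ainv = np.linalg.inv(A)   # det(A) = 0.04, so A is invertible
kappa = np.linalg.norm(A, np.inf) * np.linalg.norm(Ainv, np.inf)
print(kappa)              # about 6655: ||A||_inf = 22, ||A^-1||_inf = 302.5
```

A condition number of several thousand means this small matrix is already noticeably ill-conditioned.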


Condition Numbers (85/103)

Theorem 3.29

Suppose that A ∈ Rn×n is nonsingular and that b, x ∈ Rn are non-zero vectors. If Ax = b and A(x + δx) = b + δb, then

‖δx‖/‖x‖ ≤ κ(A) ‖δb‖/‖b‖.


Calculating κ∞ and κ1 (86/103)

Example 3.30

Suppose we are using a computer to solve Ax = b, where

A = ( 10    12
      0.08  0.1 )   and   b = ( 1
                                1 ).

But, due to round-off error, the right-hand side has a relative error (in the ∞-norm) of 10−6. Give a bound for the relative error in x in the ∞-norm.
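By Theorem 3.29 the bound is κ∞(A) · 10−6, which comes to about 6.7 × 10−3; roughly three digits of accuracy can be lost. A numerical check in Python (NumPy assumed):

```python
import numpy as np

A = np.array([[10.0, 12.0],
              [0.08, 0.1]])

# kappa_inf(A) = ||A||_inf * ||A^-1||_inf
kappa_inf = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)

rel_err_b = 1e-6
bound = kappa_inf * rel_err_b   # Theorem 3.29: ||dx||/||x|| <= kappa * ||db||/||b||
print(bound)                    # about 6.66e-3
```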


Calculating κ∞ and κ1 (87/103)

For every matrix norm we get a different condition number.

Example 3.31
Let A be the n × n matrix

A = ( 1  0  0  . . .  0
      1  1  0  . . .  0
      1  0  1  . . .  0
      ...
      1  0  0  . . .  1 ).

What are κ1(A) and κ∞(A)?

First we compute ‖A‖1 and ‖A‖∞.


Calculating κ∞ and κ1 (88/103)

For this very special example, it is easy to write down the inverse of A:

A−1 = (  1  0  0  . . .  0
        −1  1  0  . . .  0
        −1  0  1  . . .  0
        ...
        −1  0  0  . . .  1 ).
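The class computation can be checked numerically for a particular n. Since the largest absolute column sum of both A and A−1 is n, while the largest absolute row sum of each is 2, the values work out as κ1(A) = n² and κ∞(A) = 4. A sketch in Python (NumPy assumed):

```python
import numpy as np

def make_A(n):
    """The n-by-n matrix of Example 3.31: identity, plus ones in column 1."""
    A = np.eye(n)
    A[:, 0] = 1.0
    return A

n = 6
A = make_A(n)
Ainv = np.linalg.inv(A)

kappa_1 = np.linalg.norm(A, 1) * np.linalg.norm(Ainv, 1)
kappa_inf = np.linalg.norm(A, np.inf) * np.linalg.norm(Ainv, np.inf)
assert np.isclose(kappa_1, n**2)    # ||A||_1 = ||A^-1||_1 = n
assert np.isclose(kappa_inf, 4.0)   # ||A||_inf = ||A^-1||_inf = 2
```

Note that κ1 grows like n² while κ∞ stays bounded: different norms can give very different impressions of conditioning.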


Estimating κ2 (89/103)

To compute κ1(A) and κ∞(A), we need to know A−1, which is usually not practical. However, for κ2, we are able to estimate the condition number of A without knowing A−1.

Recall that ‖A‖2 = √λn, where λn is the largest eigenvalue of B = ATA.

We can also show that ‖A−1‖2 = 1/√λ1, where λ1 is the smallest eigenvalue of B (see Section 3.6.5 of the notes). So

κ2(A) = (λn/λ1)^(1/2).
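This eigenvalue-ratio formula is simple to verify numerically. A sketch in Python (NumPy assumed; `np.linalg.cond` computes the subordinate 2-norm condition number directly):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

lam = np.linalg.eigvalsh(A.T @ A)    # eigenvalues of B = A^T A, ascending
kappa2 = np.sqrt(lam[-1] / lam[0])   # kappa_2(A) = sqrt(lambda_n / lambda_1)
assert np.isclose(kappa2, np.linalg.cond(A, 2))
```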


Motivated by this, we’ll finish MA385 by studying an easy way of estimating the eigenvalues of a matrix.


Exercises (90/103)

Exercise 3.17
(i) Prove that, if ‖ · ‖ is a subordinate matrix norm, then it is consistent, i.e., for any pair of n × n matrices A and B, we have ‖AB‖ ≤ ‖A‖‖B‖.
(ii) One might think it intuitive to define the “max” norm of a matrix as follows:

‖A‖∞ = max_{i,j} |aij|.

Show that this is indeed a norm on Rn×n. Show, however, that it is not consistent.

Exercise 3.18
Let A be the matrix

A = ( 0.1   0    0
      10    0.1  10
      0     0    0.1 ).

Compute κ∞(A). Suppose we wish to solve the system of equations Ax = b on a single precision computer system (i.e., the relative error in any stored number is approximately 10−8). Give an upper bound on the relative error in the computed solution x.


§3.7 Gerschgorin’s Theorems (91/103)

Solving linear systems of equations
§3.7 Gerschgorin’s Theorems

MA385 – Numerical Analysis 1

November 2018

There are some extra details posted as an “Appendix” to this section.


Gerschgorin’s theorems (92/103)

The goal of this final section is to learn a technique for estimating the eigenvalues of matrices.

The idea dates from 1931, and is as simple as it is useful. Although known to mathematicians in the USSR, the original paper was not widely read.

It received mainstream attention in the West following the work of Olga Taussky (A recurring theorem on determinants, American Mathematical Monthly, vol. 56, pp. 672–676, 1949).

See also https://www.math.wisc.edu/hans/paper_archive/other_papers/hs057.pdf


Gerschgorin’s First Theorem (93/103)

(See Section 5.4 of Suli and Mayers).

Theorem 3.32 (Gerschgorin’s First Theorem)

Given a matrix A ∈ Rn×n, define the n Gerschgorin discs D1, D2, . . . , Dn as the discs in the complex plane where Di has centre aii and radius

ri = ∑_{j=1, j≠i}^{n} |aij|.

So Di = {z ∈ C : |aii − z| ≤ ri}. All the eigenvalues of A are contained in the union of the Gerschgorin discs.
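The theorem is easy to check numerically: compute the discs from the rows of A, then verify that every eigenvalue lies in at least one of them. A sketch in Python (NumPy assumed; the random matrix is just an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))

centres = np.diag(A)
# r_i = sum over j != i of |a_ij|: total absolute row sum minus the diagonal.
radii = np.sum(np.abs(A), axis=1) - np.abs(centres)

# Every (possibly complex) eigenvalue must lie in some disc |a_ii - z| <= r_i.
for lam in np.linalg.eigvals(A):
    assert any(abs(lam - c) <= r + 1e-10 for c, r in zip(centres, radii))
```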


Gerschgorin’s First Theorem (94/103)

Proof.


Gerschgorin’s First Theorem (95/103)

The proof makes no assumption about A being symmetric, or the eigenvalues being real. However, if A is symmetric, then its eigenvalues are real and so the theorem can be simplified: the eigenvalues of A are contained in the union of the intervals Ii = [aii − ri, aii + ri], for i = 1, . . . , n.

Example 3.33

Let

A = (  4  −2   1
      −2  −3   0
       1   0   2 )


Gerschgorin’s 2nd Theorem (96/103)

Theorem 3.34 (Gerschgorin’s Second Theorem)

Given a matrix A ∈ Rn×n, let the n Gerschgorin discs be as defined in Theorem 3.32. If k of the discs are disjoint from the others (i.e., have an empty intersection with them), their union contains exactly k eigenvalues.

Proof: not covered in class. If interested, see the appendix, or thetextbooks.


Using Gerschgorin’s theorems (97/103)

Example 3.35
Locate the regions containing the eigenvalues of

A = ( −3   1   2
       1   4   0
       2   0  −6 )

(The eigenvalues are approximately −7.018, −2.130 and 4.144.)
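Since this A is symmetric, the discs reduce to the real intervals [−6, 0], [3, 5] and [−8, −4]; a quick check in Python (NumPy assumed) confirms each eigenvalue lies in one of them:

```python
import numpy as np

A = np.array([[-3.0, 1.0,  2.0],
              [ 1.0, 4.0,  0.0],
              [ 2.0, 0.0, -6.0]])

centres = np.diag(A)                                  # -3, 4, -6
radii = np.sum(np.abs(A), axis=1) - np.abs(centres)   # 3, 1, 2

# A is symmetric, so eigenvalues are real and the discs become intervals.
eigs = np.sort(np.linalg.eigvalsh(A))
for lam in eigs:
    assert any(c - r <= lam <= c + r for c, r in zip(centres, radii))
print(eigs)   # approximately [-7.018, -2.130, 4.144]
```

Note that [3, 5] is disjoint from the other two intervals, so by Gerschgorin’s Second Theorem it contains exactly one eigenvalue (namely 4.144).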


Using Gerschgorin’s theorems (98/103)

Example 3.36
Use Gerschgorin’s Theorems to find upper and lower bounds for the singular values of the matrix

A = ( 4  −1   2
      2   3   1
      1   1   4 ).

Hence give an upper bound for κ2(A).


Exercises (99/103)

Exercise 3.20
A real matrix A = {ai,j} is Strictly Diagonally Dominant if

|aii| > ∑_{j=1, j≠i}^{n} |ai,j|,   for i = 1, . . . , n.

Show that all strictly diagonally dominant matrices are nonsingular.

Exercise 3.21
Let

A = (  4  −1   0
      −1   4  −1
       0  −1  −3 ).

Use Gerschgorin’s theorems to give an upper bound for κ2(A).


Appendix Proof of Gerschgorin 1 (100/103)

Proof of Gerschgorin’s First Theorem (Thm 3.32)

Let λ be an eigenvalue of A, so Ax = λx for the corresponding eigenvector x. Suppose that xi is the entry of x with the largest absolute value; that is, |xi| = ‖x‖∞. Looking at the ith entry of the vector Ax, we see that

(Ax)i = λxi   =⇒   ∑_{j=1}^{n} aij xj = λxi.

This can be rewritten as

aii xi + ∑_{j=1, j≠i}^{n} aij xj = λxi,

which gives

(aii − λ) xi = − ∑_{j=1, j≠i}^{n} aij xj.


Appendix Proof of Gerschgorin 1 (101/103)

By the triangle inequality,

|aii − λ| |xi| = | ∑_{j=1, j≠i}^{n} aij xj | ≤ ∑_{j=1, j≠i}^{n} |aij| |xj| ≤ |xi| ∑_{j=1, j≠i}^{n} |aij|,

since |xi| ≥ |xj| for all j. Dividing by |xi| gives

|aii − λ| ≤ ∑_{j=1, j≠i}^{n} |aij|,

as required.


Appendix Proof of Gerschgorin 2 (102/103)

Proof of Gerschgorin’s 2nd Thm (Thm 3.34). We didn’t do the proof in class, and you are not expected to know it. Here is a sketch of it.

Let B(ε) be the matrix with entries

bij = { aij,   i = j,
        εaij,  i ≠ j.

So B(1) = A, and B(0) is the diagonal matrix whose entries are the diagonal entries of A.

Each of the eigenvalues of B(0) corresponds to one of its diagonal entries and (obviously) coincides with a Gerschgorin disc of B(0), i.e., a centre of one of the Gerschgorin discs of A.

The eigenvalues of B(ε) are the zeros of its characteristic polynomial det(B(ε) − λI). Since the coefficients of this polynomial depend continuously on ε, so too do the eigenvalues.


Appendix Proof of Gerschgorin 2 (103/103)

Now as ε varies from 0 to 1, the eigenvalues of B(ε) trace paths in the complex plane, and at the same time the radii of the Gerschgorin discs of B(ε) increase from 0 to the radii of the discs of A. If a particular eigenvalue was in a certain disc for ε = 0, the corresponding eigenvalue is in the corresponding disc for all ε.

Thus if one of the discs of A is disjoint from the others, it must contain an eigenvalue.

The same reasoning applies if k of the discs of A are disjoint from the others; their union must contain k eigenvalues.