
MATH5316 Lecture Notes
Daniel R. Reynolds, SMU Mathematics
Spring 2019

Chapter 5 – Eigenvalues and Eigenvectors

5.1 Introduction

This chapter focuses on a well-examined topic from our introductory linear algebra class – eigenvalues and eigenvectors. We will recall the basic definitions and properties covered in that previous course, but most of our time will be spent on algorithms that can be used to compute eigenvalues and eigenvectors of a matrix.

Definition 5.1.1 (Eigenvector, Eigenvalue, Eigenpair). Given A ∈ ℂ^{n×n}, a nonzero vector v ∈ ℂ^n is called an eigenvector of A if ∃λ ∈ ℂ such that

Av = λv.

The value λ is the eigenvalue corresponding to v. We refer to the pair (v, λ) as an eigenpair of A.

Notes:

• There are no restrictions on the value of λ (zero is allowed), although v must be nonzero.

• If v is an eigenvector of A, then for any nonzero α ∈ ℂ, the vector αv is also an eigenvector of A:

A(αv) = α(Av) = α(λv) = λ(αv),

so it is typical to normalize the eigenvectors to achieve some degree of uniqueness (although these still are not unique, since both v/‖v‖ and −v/‖v‖ are unit vectors parallel to v).

Definition 5.1.2 (Spectrum). The set of all eigenvalues of A ∈ ℂ^{n×n} is called the spectrum of A. For an n × n matrix, its spectrum contains n (possibly repeated) values. This is often denoted

λ(A) = {λ1, λ2, . . . , λn}.

Theorem 5.1.3. λ ∈ ℂ is an eigenvalue of A if and only if det(λI − A) = 0.

Proof. Let λ ∈ ℂ be an eigenvalue of A, i.e., for some v ≠ 0,

Av = λv ⇔ λv − Av = 0 ⇔ (λI − A)v = 0.


Since v ≠ 0, we have v ∈ null(λI − A), and hence (λI − A) is singular; thus det(λI − A) = 0.

For the reverse direction, let det(λI − A) = 0. Then (λI − A) is singular, and hence ∃v ∈ ℂ^n, v ≠ 0, such that

(λI − A)v = 0 ⇔ Av = λv,

so λ is an eigenvalue of A.

Definition 5.1.4 (Characteristic equation, Characteristic polynomial). Given A ∈ ℂ^{n×n}, we call

det(λI − A) = 0

the characteristic equation of A.

The function p(λ) = det(λI − A) arising in this equation is a polynomial of degree exactly n, and is referred to as the characteristic polynomial of A.

Notes:

• The “Fundamental Theorem of Algebra” therefore guarantees that p(λ) has exactly n roots (possibly complex, possibly repeated).

• Even if A ∈ ℝ^{n×n}, λ(A) may contain complex eigenvalues; however in this case they must appear in complex conjugate pairs, i.e., if λ = α + iβ is an eigenvalue of A ∈ ℝ^{n×n}, then λ̄ = α − iβ must also be an eigenvalue of A.

• If A ∈ ℝ^{n×n} has eigenvalue λk ∈ ℝ, then the eigenvector associated with λk is also real (modulo scaling by a complex number), since the equation

(λkI − A)v = 0

has a real-valued matrix and right-hand side vector.

Theorem 5.1.5. Let T ∈ ℂ^{n×n} be triangular (upper, lower, or diagonal). Then λ(T) = {t11, t22, . . . , tnn}.

Proof. The matrix λI − T is also triangular, with diagonal entries (λ − tii). The determinant of a triangular matrix is the product of the diagonal values, so

0 = det(λI − T) = (λ − t11)(λ − t22) · · · (λ − tnn),

whose roots are exactly t11, . . . , tnn.

Theorem 5.1.6. Let A ∈ ℂ^{n×n} be a block triangular matrix, e.g.

$$A = \begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1m} \\
       & A_{22} & \cdots & A_{2m} \\
       &        & \ddots & \vdots \\
       &        &        & A_{mm}
\end{bmatrix}.$$


Then λ(A) = λ(A11) ∪ λ(A22) ∪ · · · ∪ λ(Amm).

Proof. Since λI − A is block triangular,

det(λI − A) = det(λI − A11) det(λI − A22) · · · det(λI − Amm),

which is zero iff det(λI − Aii) = 0 for some i; hence λ(A) = λ(A11) ∪ · · · ∪ λ(Amm).

Theorem 5.1.7. Let v1, v2, . . . , vk be eigenvectors of A associated with distinct eigenvalues λ1, λ2, . . . , λk. Then {v1, v2, . . . , vk} are linearly independent.

(proof for homework/exams)

Definition 5.1.8 (Semisimple, defective, nondefective). A matrix A ∈ ℂ^{n×n} is called semisimple, or nondefective, if it has n linearly independent eigenvectors. Otherwise, if A does not have n linearly independent eigenvectors, then A is called defective.

Note: combining the previous Theorem and Definition, we see that if A ∈ ℂ^{n×n} has n distinct eigenvalues, then A is nondefective.

Definition 5.1.9 (Algebraic/Geometric Multiplicity). The algebraic multiplicity of λk ∈ λ(A) is the multiplicity of λk as a root of det(λI − A) = 0.

The geometric multiplicity of λk ∈ λ(A) is the dimension of null(λkI − A).

Reminders from introductory linear algebra:

• The geometric multiplicity of an eigenvalue never exceeds its algebraic multiplicity:

1 ≤ geometric multiplicity of λk ≤ algebraic multiplicity of λk

• If the geometric multiplicity of λk is less than the algebraic multiplicity of λk for some eigenvalue λk ∈ λ(A), then A is defective.

5.1.1 Correlation between eigenvalue problems and polynomial equations

Suppose that q(λ) = b0 + b1λ + · · · + bnλⁿ, with bn ≠ 0. Then q(λ) has degree n, with n roots.

Consider the monic polynomial p(λ) = a0 + a1λ + · · · + anλⁿ with ai = bi/bn. Then an = 1 (hence “monic”), and p(λ) has the same roots as q(λ).

In other words, any study of polynomial roots may be considered for only the monic case.


Given a monic polynomial, p(λ) = a0 + a1λ + · · · + an−1λⁿ⁻¹ + λⁿ, we may consider the companion matrix

$$A = \begin{bmatrix}
-a_{n-1} & -a_{n-2} & \cdots & -a_1 & -a_0 \\
1 & & & & \\
& 1 & & & \\
& & \ddots & & \\
& & & 1 & 0
\end{bmatrix} \in \mathbb{C}^{n\times n},$$

i.e., rows 2 through n of A are the first n − 1 rows of the identity matrix, and the first row holds the negated coefficients −an−1, . . . , −a0.

Theorem 5.1.10. Let p(λ) = a0 + a1λ + · · · + λⁿ, with companion matrix A ∈ ℂ^{n×n}. Then det(λI − A) = p(λ), and roots of p(λ) = 0 are eigenvalues of A.

(proof for homework/exams)
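This correspondence is easy to check numerically. Below is a minimal numpy sketch (the helper name companion and the cubic test polynomial are our own illustrative choices, not part of the notes): we build the companion matrix of a monic cubic and confirm that its eigenvalues are the polynomial's roots.

```python
import numpy as np

def companion(a):
    """Companion matrix of the monic polynomial
    p(x) = a[0] + a[1]*x + ... + a[n-1]*x**(n-1) + x**n."""
    n = len(a)
    A = np.zeros((n, n))
    A[0, :] = -np.asarray(a)[::-1]     # first row: -a_{n-1}, ..., -a_0
    A[1:, :-1] = np.eye(n - 1)         # shifted identity below
    return A

# p(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
A = companion([-6.0, 11.0, -6.0])
print(np.sort(np.linalg.eigvals(A)))   # approximately [1. 2. 3.]
```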

What does this mean?

There is no general formula for the roots of a polynomial of degree ≥ 5 (n = 2 has the quadratic formula, n = 3 has Cardano's formula, n = 4 has Ferrari's formula, but that's it!). Instead, all methods for finding roots of polynomials of degree ≥ 5 are iterative. Hence, the problem of finding λ(A) for a given A ∈ ℂ^{n×n} with n ≥ 5 requires an iterative algorithm (e.g., Newton's method).

Iterative methods for the eigenvalue/eigenvector problem create a sequence of vectors q1, q2, . . . in the hopes that qj → v as j → ∞, where v is an eigenvector of A.

Questions:

(a) Under what conditions is convergence guaranteed?

(b) How rapidly does the iteration converge?

(c) How can we measure when we’re “done”?

(d) How difficult is it to generate qj+1 from {q1, . . . , qj}?

In our “introduction” notes at the start of the semester we discussed vector convergence: the sequence {qj}, j ≥ 1, converges to a vector v iff ‖qj − v‖ → 0 as j → ∞.

• We can measure this in any norm, since in all norms ‖x‖ = 0 ⇔ x = 0.

• However, we stop the iteration before ‖qj − v‖ = 0; we instead stop when ‖qj − v‖ < tolerance, so the choice of norm will affect the value of “tolerance” we would like to use.


5.2 The Power Method (and Simple Extensions)

Let A ∈ ℝ^{n×n}, and let (v1, λ1), . . . , (vn, λn) be the eigenpairs of A. For now, assume that all eigenvalues of A are distinct (and hence all eigenvectors are linearly independent). WLOG, assume that the eigenvalues are sorted by magnitude:

|λ1| ≥ |λ2| ≥ · · · ≥ |λn|.

Then if |λ1| > |λ2|, we call λ1 the dominant eigenvalue of A, with corresponding dominant eigenvector v1.

The power method seeks (λ1, v1) via computing powers of A. Given some guess q ∈ ℝ^n, we compute

q, Aq, A²q, . . . .

In fact, we compute these one at a time, since Aᵏq = A(Aᵏ⁻¹q) = · · · .

5.2.1 Convergence

Since the eigenvectors v1, . . . , vn are linearly independent, they must span ℝ^n, so q = c1v1 + · · · + cnvn for constants c1, . . . , cn. Assume that c1 ≠ 0 (otherwise, we may instead choose a different q). Then

$$\begin{aligned}
Aq &= A(c_1v_1 + \cdots + c_nv_n) \\
   &= c_1(Av_1) + \cdots + c_n(Av_n) \\
   &= c_1\lambda_1 v_1 + \cdots + c_n\lambda_n v_n, \\
A^2 q &= c_1\lambda_1^2 v_1 + \cdots + c_n\lambda_n^2 v_n, \\
   &\;\;\vdots \\
A^j q &= c_1\lambda_1^j v_1 + \cdots + c_n\lambda_n^j v_n \\
   &= \lambda_1^j\left(c_1v_1 + c_2\left(\frac{\lambda_2}{\lambda_1}\right)^j v_2 + \cdots + c_n\left(\frac{\lambda_n}{\lambda_1}\right)^j v_n\right).
\end{aligned}$$

Since |λ1| > |λ2| ≥ · · · ≥ |λn|, each ratio satisfies |λk/λ1| < 1 for k > 1, so (λk/λ1)ʲ → 0 as j → ∞, leaving only

$$\lim_{j\to\infty} A^j q = \lim_{j\to\infty} c_1 \lambda_1^j v_1,$$

which can go to ∞ if |λ1| > 1.

However, we only care about v1, and not the constant in front of it. So we may modify this algorithm to instead compute:

$$q_j = \frac{A^j q}{\lambda_1^j},$$


in which case lim_{j→∞} qj = c1v1, since

$$\begin{aligned}
\|q_j - c_1v_1\| &= \left\| c_2\left(\frac{\lambda_2}{\lambda_1}\right)^j v_2 + \cdots + c_n\left(\frac{\lambda_n}{\lambda_1}\right)^j v_n \right\| \\
&\le |c_2|\left|\frac{\lambda_2}{\lambda_1}\right|^j \|v_2\| + \cdots + |c_n|\left|\frac{\lambda_n}{\lambda_1}\right|^j \|v_n\| \\
&\le \left|\frac{\lambda_2}{\lambda_1}\right|^j \big(|c_2|\,\|v_2\| + \cdots + |c_n|\,\|v_n\|\big),
\end{aligned}$$

where we have used the fact that |λ2| ≥ · · · ≥ |λn|. Hence if we define

C = |c2| ‖v2‖+ · · ·+ |cn| ‖vn‖

then

$$\|q_j - c_1v_1\| \le C \left|\frac{\lambda_2}{\lambda_1}\right|^j.$$

Since |λ2/λ1| < 1, this will therefore converge. Moreover, we have the recursion

$$\|q_j - c_1v_1\| \le \left|\frac{\lambda_2}{\lambda_1}\right| \|q_{j-1} - c_1v_1\|,$$

so we see that qj → c1v1 linearly, with rate |λ2/λ1| < 1.

Unfortunately, we cannot compute qj = Aʲq/λ1ʲ since we do not know λ1.

So, the algorithm instead proceeds as

$$q_{j+1} = \frac{A q_j}{s_{j+1}},$$

where sj+1 is a scaling factor. Typically, this is chosen to be sj+1 = ±‖Aqj‖∞, the largest-magnitude entry of (Aqj), with the sign chosen to match that entry itself, so that eventually we converge to a vector qj whose largest entry is identically equal to 1.
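A minimal numpy sketch of this scaled iteration (the name power_method, the stopping test, and the 2 × 2 test matrix are our own illustrative choices):

```python
import numpy as np

def power_method(A, q, tol=1e-10, maxit=1000):
    """Power method sketch: returns an approximate dominant eigenpair
    (s, q) of A, scaling by the largest-magnitude (signed) entry of
    A @ q at each step, as described above."""
    for _ in range(maxit):
        y = A @ q
        k = np.argmax(np.abs(y))      # index of largest-magnitude entry
        s = y[k]                      # scaling factor s_{j+1}, with sign
        q_new = y / s                 # largest entry of q_new is now 1
        if np.linalg.norm(q_new - q, np.inf) < tol:
            return s, q_new
        q = q_new
    return s, q

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, v = power_method(A, np.array([1.0, 0.0]))
print(lam)   # approaches the dominant eigenvalue (5 + sqrt(5))/2 = 3.618...
```

Note that sj carries the sign of the largest entry, so qj settles down to a fixed vector rather than flipping sign at each step.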

5.2.2 Cost

Each iteration of the power method requires one matrix-vector product (∼2n² flops) and one normalization (O(n)), so the overall cost is dominated by matrix-vector products. Hence for m iterations of the power method, the cost is approximately 2mn² operations.

Note: this cost can be significantly cheaper if A is sparse.

However, if |λ2/λ1| is close to 1, then m may need to be very large to capture an eigenvector to any reasonable accuracy.


5.2.3 The eigenvalue

As the algorithm converges, we should eventually obtain

Aq = λ1q,

so in the algorithm the values sj → λ1 while qj → v1 (suitably scaled).

Example 5.2.1. The most well-known example of the power iteration is the Google matrix! There, the dominant eigenvector is used to rank the importance of webpages, and |λ2/λ1| ≈ 0.85, so it doesn't require many iterations to obtain a “decent” eigenvector v1.

5.2.4 Inverse Iteration

The first extension to the power method that we'll investigate is called the inverse iteration.

Again, assume that A ∈ ℝ^{n×n} is semisimple, with

|λ1| ≥ |λ2| ≥ · · · ≥ |λn|.

Theorem 5.2.2. If A ∈ ℂ^{n×n} is nonsingular, then λn ≠ 0, and A⁻¹ has the same eigenvectors as A, with eigenvalues λk⁻¹.

Proof. Since A is nonsingular, det(A) ≠ 0, and hence

det(0I − A) = (−1)ⁿ det(A) ≠ 0,

so 0 cannot be an eigenvalue of A.

The eigenpairs (vk, λk) of A satisfy Avk = λkvk. Since λk ≠ 0 and A is nonsingular,

Avk = λkvk ⇔ vk = λkA⁻¹vk ⇔ λk⁻¹vk = A⁻¹vk,

which shows that (vk, λk⁻¹) is an eigenpair of A⁻¹.

While the eigenvalues of A were sorted from largest-to-smallest,

|λ1| ≥ |λ2| ≥ · · · ≥ |λn|,


the corresponding eigenvalues of A⁻¹ are sorted from smallest-to-largest:

|λ1⁻¹| ≤ |λ2⁻¹| ≤ · · · ≤ |λn⁻¹|,

so the power method applied to A⁻¹ will converge to the eigenpair (vn, λn⁻¹) linearly, with rate

$$\left|\frac{\lambda_{n-1}^{-1}}{\lambda_n^{-1}}\right| = \left|\frac{\lambda_n}{\lambda_{n-1}}\right|.$$

However, unlike the power method, each iteration will now require a solve instead of a multiply, since

y = A⁻¹x ⇔ Ay = x.

To this end, we always solve the linear system Ay = x and NEVER compute the matrix A⁻¹ (although perhaps we will factor A = PᵀLU to streamline the solves).

Theorem 5.2.3. Let A ∈ ℂ^{n×n} and ρ ∈ ℂ. Then if (v, λ) is an eigenpair of A, (v, λ − ρ) is an eigenpair of (A − ρI).

Proof. (A − ρI)v = Av − ρv = λv − ρv = (λ − ρ)v.

Notes:

• If λ(A) = {λ1, . . . , λn} then λ(A− ρI) = {λ1 − ρ, . . . , λn − ρ}.

• If λ(A) = {λ1, . . . , λn} then λ((A − ρI)⁻¹) = {(λ1 − ρ)⁻¹, . . . , (λn − ρ)⁻¹}. The largest of these will correspond to the λk closest to ρ.

• We call ρ the shift.

• If ρ ≈ λk (but it is not exact), then |λk − ρ| ≪ |λi − ρ| ∀i ≠ k. More specifically, if λl is the second-closest eigenvalue to ρ, we'll have |λk − ρ| ≪ |λl − ρ|, so if we do the power iteration with the matrix (A − ρI)⁻¹, it will converge to (λk − ρ)⁻¹ linearly with rate

$$\left|\frac{\lambda_k - \rho}{\lambda_l - \rho}\right| \ll 1.$$

• The resulting algorithm is called the shift-and-invert strategy.

To perform shift-invert iterations, we do not multiply by (A − ρI)⁻¹, since that would incur extraneous cost. Instead, since

y = (A − ρI)⁻¹x ⇔ (A − ρI)y = x,

we

(a) solve (A − ρI) y = qj for y, and

(b) set qj+1 = y/sj+1, where sj+1 is the largest-magnitude component of y (i.e., ±‖y‖∞).


If step (a) above uses Gaussian Elimination, we may precompute P, L and U such that P(A − ρI) = LU, and instead solve with P, L and U at each iteration (i.e., O(n²) cost per iteration).
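A sketch of the shift-and-invert iteration with scipy (the name inverse_iteration, the shift ρ = 1.2, and the test matrix are our own choices for illustration): the LU factorization of A − ρI is computed once, and only O(n²) triangular solves are done per iteration.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, rho, q, tol=1e-12, maxit=500):
    """Shift-and-invert sketch: factor (A - rho*I) once with LU,
    then reuse the factors for every solve."""
    n = A.shape[0]
    lu, piv = lu_factor(A - rho * np.eye(n))   # P(A - rho*I) = LU
    for _ in range(maxit):
        y = lu_solve((lu, piv), q)             # (A - rho*I) y = q_j
        k = np.argmax(np.abs(y))
        s = y[k]                               # largest-magnitude component
        q_new = y / s
        if np.linalg.norm(q_new - q, np.inf) < tol:
            break
        q = q_new
    lam = rho + 1.0 / s      # since s -> (lambda_k - rho)^{-1}
    return lam, q_new

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, v = inverse_iteration(A, rho=1.2, q=np.array([1.0, 1.0]))
print(lam)   # eigenvalue nearest 1.2: (5 - sqrt(5))/2 = 1.381...
```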

5.2.5 Rayleigh Quotient

Question: why do we need to use the same shift ρ at each iteration?

• Reuse of the same shift allows reuse of an existing LU factorization, but

• convergence accelerates when ρ is closer to λk, so updating the shift ρ may be helpful.

Suppose that q ∈ ℂ^n is an approximation of an eigenvector of A. If q were exactly an eigenvector, then Aq = λq, but if q has some error, then Aq ≠ ρq for any constant ρ.

Think of Aq = ρq as an overdetermined linear system based on only one unknown, ρ. Let r = Aq − ρq. Then we can find the ρ such that ρq “best approximates” Aq, i.e., ‖r‖ is minimized. Consider the normal equations (now for complex matrices) for the problem qρ = Aq:

• q is now our “matrix”

• Aq is our “right-hand side”

• ρ is our “solution vector”

Hence,

$$(q^*q)\rho = q^*Aq \quad\Leftrightarrow\quad \rho = \frac{q^*Aq}{q^*q}.$$

This value, ρ = (q∗Aq)/(q∗q), is called the Rayleigh quotient of q with respect to A.

Theorem 5.2.4. Let A ∈ ℂ^{n×n} and q ∈ ℂ^n with q ≠ 0. Then the unique complex number ρ that minimizes ‖Aq − ρq‖2 is the Rayleigh quotient, ρ = (q∗Aq)/(q∗q).

Note: if ‖q‖2 = 1, the Rayleigh quotient equals q∗Aq.

Theorem 5.2.5. Let A ∈ ℂ^{n×n} with (v, λ) ∈ ℂ^n × ℂ an eigenpair of A, and assume that ‖v‖2 = 1. Let q ∈ ℂ^n with ‖q‖2 = 1, and let ρ = q∗Aq. Then

|λ − ρ| ≤ 2‖A‖2 ‖v − q‖2.

Proof. Since Av = λv and ‖v‖2 = 1, then λ = v∗Av. Hence

λ− ρ = v∗Av − q∗Aq [def. ρ and formula for λ]

= v∗Av − v∗Aq + v∗Aq − q∗Aq [add 0]

= v∗A(v − q) + (v − q)∗Aq [reorganizing],


so by the triangle inequality,

|λ− ρ| ≤ |v∗A(v − q)|+ |(v − q)∗Aq|.

By the Cauchy-Schwarz inequality, ‖v‖2 = 1, and properties of the induced matrix norm,

|v∗A(v − q)| ≤ ‖v‖2 ‖A(v − q)‖2 ≤ ‖A‖2 ‖v − q‖2.

Similarly, since ‖q‖2 = 1, |z∗| = |z| and ‖A∗‖2 = ‖A‖2,

|(v − q)∗Aq| = | (A∗(v − q))∗ q| ≤ ‖A∗(v − q)‖2 ‖q‖2 ≤ ‖A‖2 ‖v − q‖2.

Combining these results, we have

|λ − ρ| ≤ |v∗A(v − q)| + |(v − q)∗Aq| ≤ 2‖A‖2 ‖v − q‖2.

5.2.5.1 Rayleigh Quotient Iteration

We may therefore use the Rayleigh quotient to adjust our shift at each iteration of our previous method. So unlike our previous linearly-convergent Inverse Iteration, the so-called Rayleigh Quotient Iteration (RQI):

• Computes the shift ρj = (qj∗Aqj)/(qj∗qj),

• Solves (A − ρjI) y = qj for y, and

• Updates qj+1 = y/sj+1, where sj+1 is the largest-magnitude component of y.

Although convergence analysis for this method is much more difficult than earlier, when it does converge that convergence is quadratic! However, due to the changing shift ρj, we cannot reuse any LU or Cholesky factorization between iterations.
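A sketch of RQI for real symmetric A (the function name and the test matrix are ours; in the complex case the inner products would need conjugation, and a robust code would guard the solve more carefully):

```python
import numpy as np

def rayleigh_quotient_iteration(A, q, maxit=50, tol=1e-12):
    """RQI sketch: the shift is refreshed with the Rayleigh quotient at
    every step, so each step needs a fresh solve (no reusable LU)."""
    n = A.shape[0]
    for _ in range(maxit):
        rho = (q @ (A @ q)) / (q @ q)          # Rayleigh quotient shift
        try:
            y = np.linalg.solve(A - rho * np.eye(n), q)
        except np.linalg.LinAlgError:          # rho is (numerically) exact
            return rho, q
        q_new = y / y[np.argmax(np.abs(y))]    # scale largest entry to 1
        if np.linalg.norm(q_new - q, np.inf) < tol:
            return rho, q_new
        q = q_new
    return rho, q

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, v = rayleigh_quotient_iteration(A, np.array([1.0, 0.0]))
print(lam)   # converges rapidly to an eigenvalue of A
```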

However, we may perform a “crude” convergence analysis for RQI, as follows.

Let {qj} be the sequence of vectors obtained via RQI, and assume for simplicity that ‖qj‖2 = 1 for all j. Suppose also that qj → vi as j → ∞, for some eigenvector vi of A with ‖vi‖2 = 1. Assume that λi has algebraic multiplicity one, and let λk be the closest eigenvalue to λi with k ≠ i. Then the jth step of RQI consists of shift-invert with the matrix (A − ρjI)⁻¹, so

‖vi − qj+1‖2 ≈ rj‖vi − qj‖2, (1)


where rj is the ratio of the two largest eigenvalues of (A − ρjI)⁻¹. From our earlier theorem, ρj → λi, so once ρj is “close enough” to λi, the two largest eigenvalues of (A − ρjI)⁻¹ should be (λi − ρj)⁻¹ and (λk − ρj)⁻¹. Hence,

$$r_j = \left|\frac{(\lambda_k - \rho_j)^{-1}}{(\lambda_i - \rho_j)^{-1}}\right| = \left|\frac{\lambda_i - \rho_j}{\lambda_k - \rho_j}\right|.$$

However, from our last Theorem,

|λi − ρj| ≤ 2‖A‖2‖vi − qj‖2,

and since ρj ≈ λi, we may approximate |λk − ρj| ≈ |λk − λi|. Plugging this into our earlier result, we have

$$r_j \lesssim \frac{2\|A\|_2\,\|v_i - q_j\|_2}{|\lambda_k - \lambda_i|} = C\,\|v_i - q_j\|_2.$$

Inserting this back into (1), we have

$$\|v_i - q_{j+1}\|_2 \lesssim C\,\|v_i - q_j\|_2^2,$$

giving quadratic convergence.

5.2.5.2 Hessenberg matrices

Definition 5.2.6 (upper Hessenberg). A matrix A ∈ ℂ^{n×n} is called upper Hessenberg if ai,j = 0 ∀i > j + 1, i.e., it is upper triangular with one additional subdiagonal:

$$\begin{bmatrix}
\# & \# & \# & \cdots & \# & \# \\
\# & \# & \# & \cdots & \# & \# \\
   & \# & \# & \cdots & \# & \# \\
   &    & \ddots & \ddots & \vdots & \vdots \\
   &    &    & \# & \# & \# \\
   &    &    &    & \# & \#
\end{bmatrix}$$

Notes:

• The PA = LU factorization for an upper Hessenberg matrix is cheap (∼n² flops), and the resulting L is bidiagonal with ones on the main diagonal.

• The A = QR factorization for an upper Hessenberg matrix can be performed with only n − 1 rotators (O(n²) flops) instead of the usual O(n²) rotators (O(n³) flops); see the sketch below.
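A numpy sketch of that Hessenberg QR factorization (hessenberg_qr and the random test matrix are our own illustration); rotation j touches only rows j and j + 1, so the total work is O(n²):

```python
import numpy as np

def hessenberg_qr(H):
    """QR of an upper Hessenberg H using n-1 Givens rotations:
    rotation j zeros the single subdiagonal entry H[j+1, j]."""
    n = H.shape[0]
    R = H.astype(float).copy()
    Q = np.eye(n)
    for j in range(n - 1):
        a, b = R[j, j], R[j + 1, j]
        r = np.hypot(a, b)
        if r == 0.0:
            continue                        # nothing to zero out
        c, s = a / r, b / r
        G = np.array([[c, s], [-s, c]])     # 2x2 rotator
        R[j:j+2, j:] = G @ R[j:j+2, j:]     # update rows j, j+1 only
        Q[:, j:j+2] = Q[:, j:j+2] @ G.T     # accumulate Q
    return Q, R

H = np.triu(np.random.rand(5, 5), -1)       # random upper Hessenberg matrix
Q, R = hessenberg_qr(H)
print(np.allclose(Q @ R, H), np.allclose(np.tril(R, -1), 0.0))
```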

So although the RQI can be costly in general due to the changing shift (A − ρjI), if A is upper Hessenberg then this added cost is minimal.

This realization led to the “workhorse” eigenvalue calculation algorithm, Francis' Algorithm (the topic of Section 5.5). In the next few sections, we'll learn the requisite information to transform a general A ∈ ℂ^{n×n} to an upper Hessenberg matrix, thereby facilitating use of RQI to “finish off” the eigenvalue calculations.


5.3 Similarity Transformations

Definition 5.3.1 (similar matrices). We call two matrices A, B ∈ ℂ^{n×n} similar if ∃S ∈ ℂ^{n×n}, S nonsingular, such that

B = S⁻¹AS ⇔ A = SBS⁻¹ ⇔ AS = SB.

The above equations are called similarity transformations, where S is the transforming matrix.

Theorem 5.3.2. Similar matrices have the same eigenvalues.

Proof. Let B = S⁻¹AS. Then

λI − B = S⁻¹λIS − S⁻¹AS   [mult. left/right by S⁻¹ and S, resp.]
       = S⁻¹(λI − A)S     [factor],

so

det(λI − B) = det(S⁻¹(λI − A)S)           [insert above]
            = det(S⁻¹) det(λI − A) det(S) [det product]
            = det(λI − A)                 [det(S⁻¹) = det(S)⁻¹].

Thus both A and B have the same characteristic polynomial, and hence they have the same eigenvalues.

Theorem 5.3.3. Suppose B = S⁻¹AS. Then v is an eigenvector of A with eigenvalue λ iff S⁻¹v is an eigenvector of B with eigenvalue λ.

Proof. Let Av = λv. Then

B(S⁻¹v) = S⁻¹ASS⁻¹v [def. B]
        = S⁻¹Av      [SS⁻¹v = Iv = v]
        = S⁻¹λv      [(v, λ) eigenpair of A]
        = λ(S⁻¹v)    [move scalar λ].

Similarly, suppose that B(S⁻¹v) = λ(S⁻¹v). Then

Av = SBS⁻¹v [def. similar matrices]
   = SλS⁻¹v [B(S⁻¹v) = λ(S⁻¹v)]
   = λSS⁻¹v [move scalar λ]
   = λv     [SS⁻¹v = Iv = v].


Theorem 5.3.4. Let A ∈ ℂ^{n×n} be a semisimple matrix with linearly independent eigenvectors v1, v2, . . . , vn and eigenvalues λ1, λ2, . . . , λn. Define D = diag(λ1, . . . , λn) and V = [v1 · · · vn]. Then V⁻¹AV = D. Conversely, suppose A satisfies V⁻¹AV = D where D is diagonal and V nonsingular. Then the columns of V are n linearly independent eigenvectors of A, with eigenvalues the diagonal entries of D, and hence A is semisimple.

(proof is covered in the Intro. class. It is straightforward, especially when using V⁻¹AV = D ⇔ AV = V D)

This theorem says that we can completely solve the eigenvalue problem if we can find a similarity transform that converts A to diagonal (called the diagonalization of A).
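A quick numerical check of this statement (the 2 × 2 semisimple test matrix is an arbitrary choice of ours):

```python
import numpy as np

# Sketch verifying Theorem 5.3.4: for a semisimple A, the eigenvector
# matrix V returned by numpy diagonalizes A.
A = np.array([[4.0, 1.0], [2.0, 3.0]])   # eigenvalues 5 and 2
lam, V = np.linalg.eig(A)                # columns of V are eigenvectors
D = np.linalg.inv(V) @ A @ V             # V^{-1} A V
print(np.allclose(D, np.diag(lam)))      # True: D = diag(lambda_1, ..., lambda_n)
```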

Recall unitary matrices, U ∈ ℂ^{n×n}, which satisfy the definition U∗U = UU∗ = I. These satisfy the properties:

(a) If U , V are unitary, then UV is unitary.

(b) If U is unitary, then U−1 is unitary.

(c) If U is unitary, then 〈Ux, Uy〉 = 〈x, y〉 and ‖Ux‖2 = ‖x‖2 ∀x, y ∈ ℂ^n.

(d) We may construct complex analogues of rotators and reflectors that are unitary.

(e) If A ∈ ℂ^{n×n}, then A = QR with Q unitary and R upper triangular.

(f) U is unitary iff the columns of U are orthonormal in the complex inner product.

(these properties result from exercises 3.2.50 through 3.2.58 in the book).

Definition 5.3.5 (Unitarily similar). We say that A, B ∈ ℂ^{n×n} are unitarily similar if ∃ a unitary matrix U ∈ ℂ^{n×n} such that B = U⁻¹AU = U∗AU. If A, B, U are all real-valued, then U is orthogonal, and A, B are called orthogonally similar.

Theorem 5.3.6. If A = A∗ (i.e., A is Hermitian), and A is unitarily similar to B, then B = B∗.

Proof.

B∗ = (U∗AU)∗ [def. unitarily similar]

= U∗A∗ (U∗)∗ [distribute ∗]

= U∗A∗U [(U∗)∗ = U ]

= U∗AU [A Hermitian]

= B [def. B].


Theorem 5.3.7 (Schur's theorem). Let A ∈ ℂ^{n×n}. Then ∃ a unitary U ∈ ℂ^{n×n} and upper triangular T ∈ ℂ^{n×n} such that T = U∗AU ⇔ A = UTU∗.

This is called the Schur decomposition of A.

Proof. (induction on n):

Base case (n = 1): U = [1], T = A.

Inductive step: assume that the result holds for all matrices of size n − 1. Let A ∈ ℂ^{n×n} with eigenpair (v, λ) such that ‖v‖2 = 1. Let U1 = [v W] ∈ ℂ^{n×n} be unitary. Since U1 is unitary, it has orthonormal columns, so W∗v = 0. Let

$$A_1 = U_1^* A U_1 = \begin{bmatrix} v^* \\ W^* \end{bmatrix} A \begin{bmatrix} v & W \end{bmatrix} = \begin{bmatrix} v^*Av & v^*AW \\ W^*Av & W^*AW \end{bmatrix}.$$

Since Av = λv and ‖v‖2 = 1, then v∗Av = λ and W∗Av = λW∗v = 0. Let Â = W∗AW and z∗ = v∗AW. Then

$$A_1 = \begin{bmatrix} \lambda & z^* \\ 0 & \hat{A} \end{bmatrix}.$$

Since Â ∈ ℂ^{(n−1)×(n−1)}, the inductive hypothesis gives Â = Û₂T̂Û₂∗, where Û₂ is unitary and T̂ is upper triangular, with T̂ = Û₂∗ÂÛ₂. Let

$$U_2 = \begin{bmatrix} 1 & 0 \\ 0 & \hat{U}_2 \end{bmatrix}.$$

Then U2 is unitary, and

$$U_2^* A_1 U_2 = \begin{bmatrix} \lambda & z^*\hat{U}_2 \\ 0 & \hat{U}_2^*\hat{A}\hat{U}_2 \end{bmatrix} = \begin{bmatrix} \lambda & z^*\hat{U}_2 \\ 0 & \hat{T} \end{bmatrix} = T.$$

So, defining U = U1U2, we have

T = U2∗A1U2 = U2∗U1∗AU1U2 = U∗AU.

Notes:

• The diagonal values of T are the eigenvalues of A.

• The first column of U is the eigenvector of A corresponding to t1,1. The other columns of U are not generally eigenvectors of A.

• Schur's theorem holds for all matrices, not just semisimple ones, so it is less elegant but more broadly applicable.

• Schur’s theorem uses unitary transformations, which as before are numerically well-conditioned.
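In practice we rarely build a Schur decomposition by hand; scipy exposes one directly. A small sketch (the random test matrix is ours; output='complex' requests the triangular, rather than quasi-triangular, form):

```python
import numpy as np
from scipy.linalg import schur

A = np.random.rand(4, 4)
T, U = schur(A, output='complex')                # A = U T U^*
print(np.allclose(U @ T @ U.conj().T, A))        # reconstructs A
print(np.allclose(np.sort(np.diag(T)),           # diagonal of T holds
                  np.sort(np.linalg.eigvals(A))))  # the eigenvalues of A
```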


Theorem 5.3.8 (Spectral theorem for Hermitian matrices). Let A ∈ ℂ^{n×n} be Hermitian. Then ∃ a unitary U ∈ ℂ^{n×n} and a real diagonal D ∈ ℝ^{n×n} such that D = U∗AU ⇔ A = UDU∗. The columns of U are orthonormal eigenvectors of A, while the diagonal entries of D are the eigenvalues.

Proof. Combine the last two theorems. Since A is unitarily similar to T, and A is Hermitian, then T∗ = T, which means that T is diagonal and real-valued!

Corollary 5.3.9. The eigenvalues of a Hermitian matrix are real (including those of real, symmetric matrices).

Corollary 5.3.10. Every Hermitian matrix in ℂ^{n×n} has a set of n orthonormal eigenvectors, i.e., every Hermitian matrix is semisimple.
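numpy's eigh routine is built for exactly this structure; a quick sketch with an arbitrary 2 × 2 Hermitian example of ours:

```python
import numpy as np

A = np.array([[2.0, 1.0 + 1.0j],
              [1.0 - 1.0j, 3.0]])                     # A = A^*
lam, U = np.linalg.eigh(A)
print(lam)                                            # real eigenvalues: [1. 4.]
print(np.allclose(U.conj().T @ U, np.eye(2)))         # U is unitary
print(np.allclose(U @ np.diag(lam) @ U.conj().T, A))  # A = U D U^*
```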

Theorem 5.3.11. Let A ∈ ℂ^{n×n} be Hermitian, with eigenpairs (v, λ) and (w, µ), where λ ≠ µ. Then v and w must be orthogonal.

Proof. Suppose Av = λv and Aw = µw with λ 6= µ. Then

λw∗v = w∗Av [Av = λv]

= w∗A∗v [A Hermitian]

= (Aw)∗ v [distribute ∗]

= µw∗v [Aw = µw; µ ∈ ℝ by Cor. 5.3.9].

Hence

0 = λw∗v − µw∗v = (λ − µ)w∗v.

Since λ ≠ µ, then w∗v = 0.

There are other classes of matrices that arise frequently in eigenvalue applications:

• Skew Hermitian: A∗ = −A (similarly, skew-symmetric matrices satisfy Aᵀ = −A).

• Normal: A∗A = AA∗. Note that normal matrices include Hermitian, skew Hermitian, and even unitary matrices.

Theorem 5.3.12 (Spectral theorem for Normal matrices). Let A ∈ ℂ^{n×n}. Then A is normal iff ∃ a unitary U ∈ ℂ^{n×n} and diagonal D ∈ ℂ^{n×n} such that D = U∗AU ⇔ A = UDU∗.

(proof for homework/exams)

Corollary 5.3.13. (a) Every normal matrix A ∈ ℂ^{n×n} has a set of n orthonormal eigenvectors.

(b) If A ∈ ℂ^{n×n} has a set of n orthonormal eigenvectors, then A is normal.

(c) Every normal matrix is semisimple.


(d) Every skew-Hermitian matrix has a set of n orthonormal eigenvectors and purely imaginary eigenvalues.

We now return to real-valued matrices, A ∈ ℝ^{n×n}. Although, in general, eigenpairs (v, λ) of A may be complex for real-valued matrices, there is one special case where everything is “nice”.

Theorem 5.3.14 (Spectral theorem for symmetric matrices). Let A ∈ ℝ^{n×n} be symmetric. Then ∃ an orthogonal U ∈ ℝ^{n×n} and diagonal D ∈ ℝ^{n×n} such that D = UᵀAU ⇔ A = UDUᵀ.

(proof is similar to Schur’s theorem, using induction on n)

Corollary 5.3.15. Let A ∈ ℝ^{n×n} be symmetric. Then A has a set of n real, orthonormal eigenvectors (and A is semisimple).

Definition 5.3.16. A matrix T ∈ ℝ^{n×n} is called quasi-triangular if it has block-triangular form, i.e.,

$$T = \begin{bmatrix}
T_{1,1} & T_{1,2} & \cdots & T_{1,m} \\
        & T_{2,2} & \cdots & T_{2,m} \\
        &         & \ddots & \vdots \\
        &         &        & T_{m,m}
\end{bmatrix},$$

where each diagonal block is either 1 × 1 or 2 × 2, and each 2 × 2 block has a complex conjugate eigenvalue pair.

Theorem 5.3.17 (Wintner-Murnaghan). Let A ∈ ℝ^{n×n}. Then ∃ orthogonal U ∈ ℝ^{n×n} and quasi-triangular T ∈ ℝ^{n×n} such that T = UᵀAU ⇔ A = UTUᵀ.

(proof is similar to Schur's theorem, using strong induction on n, and where the dimension must be reduced by two whenever we encounter a complex conjugate eigenvalue pair.)

Under this “quasi-triangular” similarity transformation, every 1 × 1 diagonal block of T contains an eigenvalue of A, and we may find the complex conjugate eigenvalue pairs from every 2 × 2 diagonal block of T by applying the quadratic formula.

5.4 Reduction to Hessenberg and Tridiagonal Form

As is clear from the previous section, if we can find a similarity transformation B = S⁻¹AS such that B has “nicer” structure than A, then we can instead solve the eigenproblem

Bwi = λiwi, i = 1, . . . , n


and obtain the eigen-decomposition of A, since the eigenvalues λi are the same for both A and B, and the eigenvectors may be easily computed via

vi = Swi, i = 1, . . . , n.

Recall, a matrix A ∈ ℂ^{n×n} is upper Hessenberg iff ai,j = 0 ∀i > j + 1. We'll deduce an algorithm that performs this portion of the process in ∼(10/3)n³ flops.

Note: if A is Hermitian, then since unitary similarity transformations retain Hermitian structure, B = Q∗AQ will be tridiagonal (not just upper Hessenberg):

• if a matrix is tridiagonal, we can do the LU factorization (without pivoting) in O(n) flops,

• if a matrix is tridiagonal, we can do the QR factorization in O(n) flops, as long as we only store the parts to compute Q and do not compute Q explicitly.

The algorithm is relatively simple – do something like the QR factorization using Householder reflectors, but only go to upper Hessenberg form instead of upper triangular.

Let

$$A = \begin{bmatrix} a_{1,1} & c^* \\ b & \hat{A} \end{bmatrix},$$

and let Q̂1 be a reflector (real or complex) such that

$$\hat{Q}_1 b = \begin{bmatrix} -\tau_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad |\tau_1| = \|b\|_2,$$

and let

$$Q_1 = \begin{bmatrix} 1 & 0^T \\ 0 & \hat{Q}_1 \end{bmatrix}.$$

Define

$$A_{1/2} = Q_1 A = \begin{bmatrix} a_{1,1} & c^* \\ \hat{Q}_1 b & \hat{Q}_1\hat{A} \end{bmatrix},$$

whose first column is (a_{1,1}, −τ1, 0, . . . , 0)ᵀ. Since reflectors are Hermitian and unitary, Q1⁻¹ = Q1∗ = Q1, so multiplying on the right by Q1 will finish off the first step of our similarity transformation:

$$A_1 = Q_1 A Q_1 = A_{1/2} Q_1 = \begin{bmatrix} a_{1,1} & c^*\hat{Q}_1 \\ \hat{Q}_1 b & \hat{Q}_1\hat{A}\hat{Q}_1 \end{bmatrix} = \begin{bmatrix} a_{1,1} & d^* \\ \hat{Q}_1 b & \hat{A}_1 \end{bmatrix}.$$

Note that since Q1 has first column equal to e1 and first row equal to e1ᵀ, right-multiplication leaves the first column of A_{1/2} alone, so A1 has the correct structure! If we had tried for full upper triangular structure, it would have messed up the first column.

The next step is like the first: zero out the values below the first subdiagonal of A1. Let


Q̂2 ∈ ℂ^{(n−2)×(n−2)} be this reflector, and embed it as

$$Q_2 = \begin{bmatrix} 1 & 0 & 0^T \\ 0 & 1 & 0^T \\ 0 & 0 & \hat{Q}_2 \end{bmatrix}.$$

Then A_{3/2} = Q_2 A_1 retains the first column (a_{1,1}, −τ1, 0, . . . , 0)ᵀ and has second column (#, #, −τ2, 0, . . . , 0)ᵀ, and we complete the similarity transformation by defining

$$A_2 = Q_2 A_1 Q_2 = A_{3/2} Q_2 = \begin{bmatrix}
a_{1,1} & \# & \# & \cdots & \# \\
-\tau_1 & \# & \# & \cdots & \# \\
0 & -\tau_2 & \# & \cdots & \# \\
\vdots & 0 & \# & \cdots & \# \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \# & \cdots & \#
\end{bmatrix}$$

(right-multiplication by Q2 leaves the first two columns untouched, since Q2 has a leading 2 × 2 identity block),

and repeat. When finished, Q = Q1Q2 · · ·Qn−2, or equivalently, Q∗ = Qn−2Qn−3 · · ·Q1.

Note: if A ∈ ℝ^{n×n}, then all of the Q's are real and orthogonal, so B = QᵀAQ is orthogonally similar to A.

An algorithm that does this for real numbers, and that only stores the parts to make Q, is on page 354 of the book.
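Below is a dense, illustrative numpy sketch of the reduction (the name hessenberg and the random test matrix are ours). For clarity it forms each reflector explicitly and multiplies full matrices, which costs more than the ∼(10/3)n³ flops of the carefully-implemented version in the book:

```python
import numpy as np

def hessenberg(A):
    """Householder reduction to upper Hessenberg form (real case):
    returns H and orthogonal Q with Q.T @ A @ Q = H."""
    n = A.shape[0]
    H = A.astype(float).copy()
    Q = np.eye(n)
    for k in range(n - 2):
        b = H[k+1:, k]                  # entries below the subdiagonal
        tau = np.linalg.norm(b)
        if tau == 0.0:
            continue
        u = b.copy()
        u[0] += np.sign(b[0]) * tau if b[0] != 0 else tau  # stable sign
        u /= np.linalg.norm(u)
        Qk = np.eye(n)
        Qk[k+1:, k+1:] -= 2.0 * np.outer(u, u)   # embedded reflector
        H = Qk @ H @ Qk                 # similarity transformation
        Q = Q @ Qk
    return H, Q

A = np.random.rand(6, 6)
H, Q = hessenberg(A)
print(np.allclose(Q.T @ A @ Q, H), np.allclose(np.tril(H, -2), 0.0))
```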

5.4.1 The Hermitian case (tridiagonal form)

Now assume that A ∈ ℂ^{n×n} and A = A∗. Then as before we may write

$$A = \begin{bmatrix} a_{1,1} & b^* \\ b & \hat{A} \end{bmatrix}$$

and

$$A_1 = \begin{bmatrix} a_{1,1} & b^*\hat{Q}_1 \\ \hat{Q}_1 b & \hat{Q}_1\hat{A}\hat{Q}_1 \end{bmatrix} = \begin{bmatrix}
a_{1,1} & -\tau_1 & 0 & \cdots & 0 \\
-\tau_1 & & & & \\
0 & & & & \\
\vdots & & \hat{A}_1 & & \\
0 & & & &
\end{bmatrix}$$

since (Q̂1b)∗ = b∗Q̂1 (reflectors are Hermitian), and where Â1 = Q̂1ÂQ̂1 is still Hermitian.

The costliest part of this algorithm is the calculation of Â1 = Q̂1ÂQ̂1. We may leverage symmetry to perform this calculation; for brevity, write A for Â and A1 for Â1, and let Q̂1 = I − γuu∗ for the “correct” u. Then

A1 = (I − γuu∗)A(I − γuu∗) = (I − γuu∗)(A − γAuu∗) = A − γAuu∗ − γuu∗A + γ²uu∗Auu∗.


Let v = −γAu. Then

−γAuu∗ = vu∗,  −γuu∗A = uv∗,  and  γ²uu∗Auu∗ = −γuu∗vu∗.

So if we define α = −½γu∗v, then

−γuu∗vu∗ = 2αuu∗,

and hence,

A1 = A + vu∗ + uv∗ + 2αuu∗.

If we then define w = v + αu, then

A1 = A+ wu∗ + uw∗,

since

vu∗ + uv∗ + 2αuu∗ = (vu∗ + αuu∗) + (uv∗ + αuu∗) = wu∗ + uw∗.

So now, the update for A1 has been reduced to

(i) v = −γAu [one matvec]

(ii) α = −½γu∗v [one inner product]

(iii) w = v + αu [one vector sum]

(iv) A1 = A + wu∗ + uw∗ [two outer-product updates]

for a total of ∼4n² flops.
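A sketch of steps (i)-(iv) for a real symmetric A (the function name, test matrix, and check against the explicit product are our own illustration):

```python
import numpy as np

def symmetric_reflector_update(A, u, gamma):
    """Apply (I - gamma*u*u^T) A (I - gamma*u*u^T) to symmetric A
    in ~4n^2 flops via two rank-one (outer-product) updates."""
    v = -gamma * (A @ u)            # (i)   one matvec
    alpha = -0.5 * gamma * (u @ v)  # (ii)  one inner product
    w = v + alpha * u               # (iii) one vector sum
    return A + np.outer(w, u) + np.outer(u, w)   # (iv) two outer products

n = 5
A = np.random.rand(n, n); A = A + A.T            # symmetric test matrix
u = np.random.rand(n)
gamma = 2.0 / (u @ u)                            # Q = I - gamma*u*u^T is a reflector
Q = np.eye(n) - gamma * np.outer(u, u)
print(np.allclose(symmetric_reflector_update(A, u, gamma), Q @ A @ Q))
```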

Subsequent steps are similar since the submatrices continue to be Hermitian, except that since these work on ever-smaller submatrices the total work is only around

4n² + 4(n − 1)² + 4(n − 2)² + · · · + 4 ≈ (4/3)n³.

5.5 Francis’s Algorithm

Let A ∈ ℂ^{n×n}, and let A0 = Q∗AQ be its reduction to upper Hessenberg form (according to the previous section).

Francis's algorithm constructs a sequence of matrices {Ak}, k = 1, 2, . . . , all of which are unitarily similar to A, that [hopefully] converge to upper-triangular form (or in real arithmetic, to quasi-triangular form).

We describe a single iteration, Ak → Ak+1, and for simplicity we will denote A = Ak and Â = Ak+1.

We call an upper Hessenberg matrix with all subdiagonal entries nonzero, i.e. aj+1,j ≠ 0 ∀j = 1, . . . , n − 1, a proper Hessenberg matrix. If any subdiagonal entries are zero, we may rewrite A in block form as

$$\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix},$$

with A11 and A22 proper Hessenberg, and we may then work on A11 and A22 separately instead of having to work on all of A.


5.5.1 Francis iteration of degree one (single-shift implicit QR)

Assume that A is properly Hessenberg. Choose a shift ρ. Then A − ρI has first column

$$p = \begin{bmatrix} a_{11} - \rho \\ a_{21} \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

Let Q0 be a rotator such that Q0∗p = (#, 0, . . . , 0)ᵀ, i.e., Q0 zeros out a21.

Then using the similarity transformation A → Q0∗AQ0, we will:

(a) do A → Q0∗A, which only modifies rows 1 and 2 (leaving Hessenberg form intact), and then

(b) do Q0∗A → Q0∗AQ0, which modifies columns 1 and 2. This disturbs Hessenberg form, placing a nonzero in the (3, 1) position, e.g.,

$$\begin{bmatrix}
\# & \# & \# & \cdots & \# & \# \\
\# & \# & \# & \cdots & \# & \# \\
+ & \# & \# & \cdots & \# & \# \\
  &    & \ddots & \ddots & \vdots & \vdots \\
  &    &    & \# & \# & \# \\
  &    &    &    & \# & \#
\end{bmatrix},$$

i.e., there is a “bulge” (the + entry) in the matrix.

We continue by returning this to upper Hessenberg form as before, but now this is much simpler since we may use rotators to just “push” the bulge from row-to-row toward the bottom:

• Let Q1 be the rotator that uses the (2, 1) component to zero out the (3, 1) component, so

Q0∗AQ0 → Q1∗Q0∗AQ0 (upper Hessenberg)

Q1∗Q0∗AQ0 → Q1∗Q0∗AQ0Q1 (bulge in (4, 2) spot)

• Let Q2 be the rotator that uses the (3, 2) component to zero out the (4, 2) component, so

Q1∗Q0∗AQ0Q1 → Q2∗Q1∗Q0∗AQ0Q1 (upper Hessenberg)

Q2∗Q1∗Q0∗AQ0Q1 → Q2∗Q1∗Q0∗AQ0Q1Q2 (bulge in (5, 3) spot)

...

until we have pushed the bulge out! Hence

Â = Qn−2∗ · · · Q1∗Q0∗ A Q0Q1 · · · Qn−2.

This is nicknamed the “bulge chasing algorithm” – it is the “cover art” for our book!
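A simplified numpy sketch of one degree-one sweep (the names francis_sweep and givens are ours; production codes apply each rotator to only O(1) rows and columns of a banded representation, and add deflation tests, both of which we omit):

```python
import numpy as np

def givens(a, b):
    """2x2 rotation G with G @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return np.eye(2) if r == 0 else np.array([[a, b], [-b, a]]) / r

def francis_sweep(H, rho):
    """One single-shift Francis iteration (implicit QR) on a proper
    upper Hessenberg H: create the bulge from the first column of
    H - rho*I, then chase it off the bottom. Returns Q^* H Q."""
    n = H.shape[0]
    H = H.astype(float).copy()
    G = givens(H[0, 0] - rho, H[1, 0])       # Q_0, built from p
    H[0:2, :] = G @ H[0:2, :]                # rows 1, 2
    H[:, 0:2] = H[:, 0:2] @ G.T              # cols 1, 2 -> bulge at (3,1)
    for j in range(n - 2):                   # chase the bulge down
        G = givens(H[j+1, j], H[j+2, j])     # zero the bulge entry
        H[j+1:j+3, :] = G @ H[j+1:j+3, :]
        H[:, j+1:j+3] = H[:, j+1:j+3] @ G.T  # bulge moves down one spot
        H[j+2, j] = 0.0                      # clean up roundoff
    return H

H = np.triu(np.random.rand(5, 5), -1)
H1 = francis_sweep(H, rho=H[-1, -1])         # Rayleigh quotient shift
print(np.allclose(np.sort(np.linalg.eigvals(H1)),
                  np.sort(np.linalg.eigvals(H))))   # similarity: same spectrum
```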


5.5.1.1 Hermitian case

As before, A is now tridiagonal, so when we compute Q0∗AQ0 we have two bulges (marked +):

$$\begin{bmatrix}
\# & \# & + & & & \\
\# & \# & \# & & & \\
+ & \# & \# & \# & & \\
  &    & \# & \# & \# & \\
  &    &    & \ddots & \ddots & \ddots \\
  &    &    &        & \# & \#
\end{bmatrix}$$

and Q1∗Q0∗AQ0Q1 has chased both bulges down one spot:

$$\begin{bmatrix}
\# & \# & & & & \\
\# & \# & \# & + & & \\
  & \# & \# & \# & & \\
  & + & \# & \# & \# & \\
  &   &    & \ddots & \ddots & \ddots \\
  &   &    &        & \# & \#
\end{bmatrix}$$

5.5.1.2 Choosing the shift

Still considering the case of Hermitian A, there are two main approaches for choosing the shift in Francis's algorithm:

1. Set ρ = ann (called the Rayleigh quotient shift)

2. Compute the 2 real eigenvalues of the trailing 2 × 2 submatrix,

$$\begin{bmatrix} a_{n-1,n-1} & a_{n-1,n} \\ a_{n-1,n}^* & a_{n,n} \end{bmatrix},$$

and choose ρ as the eigenvalue closer to an,n (called the Wilkinson shift).
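A sketch of the Wilkinson shift for a real symmetric (tridiagonal) A (the function name and test matrix are ours):

```python
import numpy as np

def wilkinson_shift(A):
    """Wilkinson shift sketch (Hermitian case): the eigenvalue of the
    trailing 2x2 submatrix closest to the (n,n) entry."""
    a = A[-2, -2].real            # diagonal of a Hermitian A is real
    c = A[-1, -1].real
    b = A[-1, -2]
    d = (a - c) / 2.0
    s = np.sqrt(d * d + abs(b) ** 2)
    mid = (a + c) / 2.0           # eigenvalues are mid + s and mid - s
    return mid + s if abs(mid + s - c) < abs(mid - s - c) else mid - s

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 1.0]])
print(wilkinson_shift(A))   # eigenvalue of [[3, .5], [.5, 1]] nearer to 1
```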
