
4. Nonsymmetric Eigenvalue Problems

4.1. Introduction

We discuss canonical forms (in section 4.2), perturbation theory (in section 4.3), and algorithms for the eigenvalue problem for a single nonsymmetric matrix A (in section 4.4). Chapter 5 is devoted to the special case of real symmetric matrices A = A^T (and the SVD). Section 4.5 discusses generalizations to eigenvalue problems involving more than one matrix, including motivating applications from the analysis of vibrating systems, the solution of linear differential equations, and computational geometry. Finally, section 4.6 summarizes all the canonical forms, algorithms, costs, applications, and available software in a list.

One can roughly divide the algorithms for the eigenproblem into two groups: direct methods and iterative methods. This chapter considers only direct methods, which are intended to compute all of the eigenvalues and (optionally) eigenvectors. Direct methods are typically used on dense matrices and cost O(n^3) operations to compute all eigenvalues and eigenvectors; this cost is relatively insensitive to the actual matrix entries.

The main direct method used in practice is QR iteration with implicit shifts (see section 4.4.8). It is interesting that after more than 30 years of dependable service, convergence failures of this algorithm have quite recently been observed, analyzed, and patched [25, 65]. But there is still no global convergence proof, even though the current algorithm is considered quite reliable. So the problem of devising an algorithm that is numerically stable and globally (and quickly) convergent remains open. (Note that "direct" methods must still iterate, since finding eigenvalues is mathematically equivalent to finding zeros of polynomials, for which no noniterative methods can exist. We call a method direct if experience shows that it (nearly) never fails to converge in a fixed number of iterations.)

Iterative methods, which are discussed in Chapter 7, are usually applied to sparse matrices or matrices for which matrix-vector multiplication is the only convenient operation to perform. Iterative methods typically provide


approximations only to a subset of the eigenvalues and eigenvectors and are usually run only long enough to get a few adequately accurate eigenvalues rather than a large number. Their convergence properties depend strongly on the matrix entries.

4.2. Canonical Forms

DEFINITION 4.1. The polynomial p(λ) = det(A − λI) is called the characteristic polynomial of A. The roots of p(λ) = 0 are the eigenvalues of A.

Since the degree of the characteristic polynomial p(λ) equals n, the dimension of A, it has n roots, so A has n eigenvalues.

DEFINITION 4.2. A nonzero vector x satisfying Ax = λx is a (right) eigenvector for the eigenvalue λ. A nonzero vector y such that y*A = λy* is a left eigenvector. (Recall that y* = (ȳ)^T is the conjugate transpose of y.)

Most of our algorithms will involve transforming the matrix A into simpler, or canonical, forms, from which it is easy to compute its eigenvalues and eigenvectors. These transformations are called similarity transformations (see below). The two most common canonical forms are called the Jordan form and Schur form. The Jordan form is useful theoretically but is very hard to compute in a numerically stable fashion, which is why our algorithms will aim to compute the Schur form instead.

To motivate Jordan and Schur forms, let us ask which matrices have the property that their eigenvalues are easy to compute. The easiest case would be a diagonal matrix, whose eigenvalues are simply its diagonal entries. Equally easy would be a triangular matrix, whose eigenvalues are also its diagonal entries. Below we will see that a matrix in Jordan or Schur form is triangular. But recall that a real matrix can have complex eigenvalues, since the roots of its characteristic polynomial may be real or complex. Therefore, there is not always a real triangular matrix with the same eigenvalues as a real general matrix, since a real triangular matrix can only have real eigenvalues. Therefore, we must either use complex numbers or look beyond real triangular matrices for our canonical forms for real matrices. It will turn out to be sufficient to consider block triangular matrices, i.e., matrices of the form

\[
A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1b} \\ & A_{22} & \cdots & A_{2b} \\ & & \ddots & \vdots \\ & & & A_{bb} \end{pmatrix}, \tag{4.1}
\]

where each A_ii is square and all entries below the A_ii blocks are zero. One can easily show that the characteristic polynomial det(A − λI) of A is the product


$\prod_{i=1}^{b} \det(A_{ii} - \lambda I)$ of the characteristic polynomials of the A_ii, and therefore that the set λ(A) of eigenvalues of A is the union $\cup_{i=1}^{b} \lambda(A_{ii})$ of the sets of eigenvalues of the diagonal blocks A_ii (see Question 4.1). The canonical forms that we compute will be block triangular and will proceed computationally by breaking up large diagonal blocks into smaller ones. If we start with a complex matrix A, the final diagonal blocks will be 1-by-1, so the ultimate canonical form will be triangular. If we start with a real matrix A, the ultimate canonical form will have 1-by-1 diagonal blocks (corresponding to real eigenvalues) and 2-by-2 diagonal blocks (corresponding to complex conjugate pairs of eigenvalues); such a block triangular matrix is called quasi-triangular.

It is also easy to find the eigenvectors of a (block) triangular matrix; see section 4.2.1.

DEFINITION 4.3. Let S be any nonsingular matrix. Then A and B = S^{-1}AS are called similar matrices, and S is a similarity transformation.

PROPOSITION 4.1. Let B = S^{-1}AS, so A and B are similar. Then A and B have the same eigenvalues, and x (or y) is a right (or left) eigenvector of A if and only if S^{-1}x (or S*y) is a right (or left) eigenvector of B.

Proof. Using the fact that det(XY) = det(X)det(Y) for any square matrices X and Y, we can write det(A − λI) = det(S^{-1}(A − λI)S) = det(B − λI), so A and B have the same characteristic polynomials. Ax = λx holds if and only if S^{-1}ASS^{-1}x = λS^{-1}x, i.e., B(S^{-1}x) = λ(S^{-1}x). Similarly, y*A = λy* if and only if y*SS^{-1}AS = λy*S, i.e., (S*y)*B = λ(S*y)*. □

THEOREM 4.1. Jordan canonical form. Given A, there exists a nonsingular S such that S^{-1}AS = J, where J is in Jordan canonical form. This means that J is block diagonal, with J = diag(J_{n_1}(λ_1), J_{n_2}(λ_2), ..., J_{n_k}(λ_k)) and

\[
J_{n_i}(\lambda_i) = \begin{pmatrix} \lambda_i & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix} \quad (n_i \times n_i).
\]

J is unique, up to permutations of its diagonal blocks.

For a proof of this theorem, see a book on linear algebra such as [110] or [139].

Each J_m(λ) is called a Jordan block with eigenvalue λ of algebraic multiplicity m. If some n_i = 1, and λ_i is an eigenvalue of only that one Jordan block, then λ_i is called a simple eigenvalue. If all n_i = 1, so that J is diagonal, A is called diagonalizable; otherwise it is called defective. An n-by-n defective matrix does not have n eigenvectors, as described in more detail in the next


Fig. 4.1. Damped, vibrating mass-spring system. Here x_i = position of the i-th mass (0 = equilibrium), m_i = mass of the i-th mass, k_i = spring constant of the i-th spring, and b_i = damping constant of the i-th damper.

proposition. Although defective matrices are "rare" in a certain well-defined sense, the fact that some matrices do not have n eigenvectors is a fundamental fact confronting anyone designing algorithms to compute eigenvectors and eigenvalues. In section 4.3, we will see some of the difficulties that such matrices cause. Symmetric matrices, discussed in Chapter 5, are never defective.

PROPOSITION 4.2. A Jordan block has one right eigenvector, e_1 = [1, 0, ..., 0]^T, and one left eigenvector, e_n = [0, ..., 0, 1]^T. Therefore, a matrix has n eigenvectors matching its n eigenvalues if and only if it is diagonalizable. In this case, S^{-1}AS = diag(λ_i). This is equivalent to AS = S diag(λ_i), so the ith column of S is a right eigenvector for λ_i. It is also equivalent to S^{-1}A = diag(λ_i)S^{-1}, so the conjugate transpose of the ith row of S^{-1} is a left eigenvector for λ_i. If all n eigenvalues of a matrix A are distinct, then A is diagonalizable.

Proof. Let J = J_m(λ) for ease of notation. It is easy to see Je_1 = λe_1 and e_n^T J = λe_n^T, so e_1 and e_n are right and left eigenvectors of J, respectively. To see that J has only one right eigenvector (up to scalar multiples), note that any eigenvector x must satisfy (J − λI)x = 0, so x is in the null space of

\[
J - \lambda I = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix}.
\]

But the null space of J − λI is clearly span(e_1), so there is just one eigenvector. If all eigenvalues of A are distinct, then all its Jordan blocks must be 1-by-1, so J = diag(λ_1, ..., λ_n) is diagonal. □

EXAMPLE 4.1. We illustrate the concepts of eigenvalue and eigenvector with a problem of mechanical vibrations. We will see a defective matrix arise in a natural physical context. Consider the damped mass-spring system in Figure 4.1, which we will use to illustrate a variety of eigenvalue problems.


Newton's law F = ma applied to this system yields

\[
m_i \ddot{x}_i(t) = k_i\,(x_{i-1}(t) - x_i(t)) + k_{i+1}\,(x_{i+1}(t) - x_i(t)) - b_i\,\dot{x}_i(t), \tag{4.2}
\]

where the three terms on the right are the force on mass i from spring i, the force on mass i from spring i + 1, and the force on mass i from damper i, respectively; or

\[
M\ddot{x}(t) = -B\dot{x}(t) - Kx(t), \tag{4.3}
\]

where M = diag(m_1, ..., m_n), B = diag(b_1, ..., b_n), and

\[
K = \begin{pmatrix}
k_1 + k_2 & -k_2 & & & \\
-k_2 & k_2 + k_3 & -k_3 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -k_{n-1} & k_{n-1} + k_n & -k_n \\
 & & & -k_n & k_n
\end{pmatrix}.
\]

We assume that all the masses m_i are positive. M is called the mass matrix, B is the damping matrix, and K is the stiffness matrix.

Electrical engineers analyzing linear circuits arrive at an analogous equation by applying Kirchhoff's and related laws instead of Newton's law. In this case x represents branch currents, M represents inductances, B represents resistances, and K represents admittances (reciprocal capacitances).

We will use a standard trick to change this second-order differential equation to a first-order differential equation, changing variables to

\[
y(t) = \begin{pmatrix} \dot{x}(t) \\ x(t) \end{pmatrix}.
\]

This yields

\[
\dot{y}(t) = \begin{pmatrix} \ddot{x}(t) \\ \dot{x}(t) \end{pmatrix}
= \begin{pmatrix} -M^{-1}B\dot{x}(t) - M^{-1}Kx(t) \\ \dot{x}(t) \end{pmatrix}
= \begin{pmatrix} -M^{-1}B & -M^{-1}K \\ I & 0 \end{pmatrix} y(t) \equiv A\,y(t). \tag{4.4}
\]

To solve ẏ(t) = Ay(t), we assume that y(0) is given (i.e., the initial positions x(0) and velocities ẋ(0) are given).

One way to write down the solution of this differential equation is y(t) = e^{At} y(0), where e^{At} is the matrix exponential. We will give another more elementary solution in the special case where A is diagonalizable; this will be


Fig. 4.2. Positions and velocities of a mass-spring system with four masses m_1 = m_4 = 2 and m_2 = m_3 = 1. The spring constants are all k_i = 1. The damping constants are all b_i = .4. The initial displacements are x_1(0) = .25, x_2(0) = x_3(0) = 0, and x_4(0) = .25. The initial velocities are v_1(0) = 1, v_2(0) = v_3(0) = 0, and v_4(0) = 1. The equilibrium positions are 1, 2, 3, and 4. The software for solving and plotting an arbitrary mass-spring system is HOMEPAGE/Matlab/massspring.m.

true for almost all choices of m_i, k_i, and b_i. We will return to consider other situations later. (The general problem of computing matrix functions such as e^{At} is discussed further in section 4.5.1 and Question 4.4.)

When A is diagonalizable, we can write A = SΛS^{-1}, where Λ = diag(λ_1, ..., λ_n). Then ẏ(t) = Ay(t) is equivalent to ẏ(t) = SΛS^{-1}y(t), or S^{-1}ẏ(t) = ΛS^{-1}y(t), or ż(t) = Λz(t), where z(t) ≡ S^{-1}y(t). This diagonal system of differential equations ż_i(t) = λ_i z_i(t) has solutions z_i(t) = e^{λ_i t} z_i(0), so y(t) = S diag(e^{λ_1 t}, ..., e^{λ_n t}) S^{-1} y(0) = S e^{Λt} S^{-1} y(0). A sample numerical solution for four masses and springs is shown in Figure 4.2.
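Here is a minimal Matlab sketch (an assumption of this writeup, not the book's massspring.m) that builds A as in equation (4.4) for the four-mass system of Figure 4.2 and evaluates y(t) by the eigendecomposition formula above:

    % Build A from M, B, K for n masses and solve y'(t) = A*y(t) via the
    % eigendecomposition, assuming A is diagonalizable. The constants
    % follow the caption of Figure 4.2.
    n = 4;
    m = [2 1 1 2];  k = ones(1,n);  b = .4*ones(1,n);
    M = diag(m);  B = diag(b);
    K = diag([k(1:n-1) + k(2:n), k(n)]) - diag(k(2:n),1) - diag(k(2:n),-1);
    A = [-M\B, -M\K; eye(n), zeros(n)];          % as in equation (4.4)
    [S, Lambda] = eig(A);                        % A = S*Lambda*inv(S)
    y0 = [1; 0; 0; 1; .25; 0; 0; .25];           % initial velocities, then positions
    t = 1.0;
    y = real(S * diag(exp(diag(Lambda)*t)) * (S \ y0));  % y(t) = S e^{Lambda t} S^{-1} y(0)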

To see the physical significance of the nondiagonalizability of A for a mass-spring system, consider the case of a single mass, spring, and damper, whose differential equation we can simplify to m ẍ(t) = −b ẋ(t) − k x(t), and so A = [−b/m, −k/m; 1, 0]. The two eigenvalues of A are λ± = (b/2m)(−1 ± (1 − 4km/b²)^{1/2}). When 4km/b² < 1, the system is overdamped, and there are two negative real eigenvalues, whose mean value is −b/2m. In this case the solution eventually decays monotonically to zero. When 4km/b² > 1, the system is underdamped, and there are two complex conjugate eigenvalues with real part −b/2m. In this case the solution oscillates while decaying to zero. In both cases the system is diagonalizable since the eigenvalues are distinct. When 4km/b² = 1, the system


is critically damped, there are two real eigenvalues equal to −b/2m, and A has a single 2-by-2 Jordan block with this eigenvalue. In other words, the nondiagonalizable matrices form the "boundary" between two physical behaviors: oscillation and monotonic decay.

When A is diagonalizable but S is an ill-conditioned matrix, so that S^{-1} is difficult to evaluate accurately, the explicit solution y(t) = S e^{Λt} S^{-1} y(0) will be quite inaccurate and useless numerically. We will use this mechanical system as a running example because it illustrates so many eigenproblems. ◊

To continue our discussion of canonical forms, it is convenient to define the following generalization of an eigenvector.

DEFINITION 4.4. An invariant subspace of A is a subspace X of R^n, with the property that x ∈ X implies that Ax ∈ X. We also write this as AX ⊆ X.

The simplest, one-dimensional invariant subspace is the set span(x) of all scalar multiples of an eigenvector x. Here is the analogous way to build an invariant subspace of larger dimension. Let X = [x_1, ..., x_m], where x_1, ..., x_m are any set of independent eigenvectors with eigenvalues λ_1, ..., λ_m. Then X = span(X) is an invariant subspace since x ∈ X implies x = Σ_{i=1}^m α_i x_i for some scalars α_i, so Ax = Σ_{i=1}^m α_i A x_i = Σ_{i=1}^m α_i λ_i x_i ∈ X. AX will equal X unless some eigenvalue λ_i equals zero. The next proposition generalizes this.

PROPOSITION 4.3. Let A be n-by-n, let X = [x_1, ..., x_m] be any n-by-m matrix with independent columns, and let X = span(X), the m-dimensional space spanned by the columns of X. Then X is an invariant subspace if and only if there is an m-by-m matrix B such that AX = XB. In this case the m eigenvalues of B are also eigenvalues of A. (When m = 1, X = [x_1] is an eigenvector and B is an eigenvalue.)

Proof. Assume first that X is invariant. Then each Ax_i is also in X, so each Ax_i must be a linear combination of a basis of X, say, Ax_i = Σ_{j=1}^m b_{ji} x_j. This last equation is equivalent to AX = XB. Conversely, AX = XB means that each Ax_i is a linear combination of columns of X, so X is invariant.

Now assume AX = XB. Choose any n-by-(n − m) matrix X̃ such that X̂ = [X, X̃] is nonsingular. Then A and X̂^{-1} A X̂ are similar and so have the same eigenvalues. Write X̂^{-1} = [Y; Ỹ], where Y is m-by-n and Ỹ is (n − m)-by-n, so X̂^{-1} X̂ = I implies YX = I and ỸX = 0. Then

\[
\hat{X}^{-1} A \hat{X} = \begin{pmatrix} Y \\ \tilde{Y} \end{pmatrix} [AX, A\tilde{X}]
= \begin{pmatrix} YAX & YA\tilde{X} \\ \tilde{Y}AX & \tilde{Y}A\tilde{X} \end{pmatrix}
= \begin{pmatrix} YXB & YA\tilde{X} \\ \tilde{Y}XB & \tilde{Y}A\tilde{X} \end{pmatrix}
= \begin{pmatrix} B & YA\tilde{X} \\ 0 & \tilde{Y}A\tilde{X} \end{pmatrix}.
\]

Thus by Question 4.1 the eigenvalues of A are the union of the eigenvalues of B and the eigenvalues of ỸAX̃. □

For example, write the Jordan canonical form S^{-1}AS = J = diag(J_{n_i}(λ_i)) as AS = SJ, where S = [S_1, S_2, ..., S_k] and S_i has n_i columns


(the same as J_{n_i}(λ_i); see Theorem 4.1 for notation). Then AS = SJ implies AS_i = S_i J_{n_i}(λ_i), i.e., span(S_i) is an invariant subspace.

The Jordan form tells everything that we might want to know about a matrix and its eigenvalues, eigenvectors, and invariant subspaces. There are also explicit formulas based on the Jordan form to compute e^A or any other function of a matrix (see section 4.5.1). But it is bad to compute the Jordan form for two numerical reasons:

First reason: It is a discontinuous function of A, so any rounding error can change it completely.

EXAMPLE 4.2. Let

\[
J(0) = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix},
\]

which is in Jordan form. For arbitrarily small ε, adding i·ε to the (i, i) entry changes the eigenvalues to the n distinct values i·ε, and so the Jordan form changes from J(0) to diag(ε, 2ε, ..., nε). ◊

Second reason: It cannot be computed stably in general. In other words,

when we have finished computing S and J, we cannot guarantee that S^{-1}(A + δA)S = J for some small δA.

EXAMPLE 4.3. Suppose S^{-1}AS = J exactly, where S is very ill-conditioned. (κ(S) = ||S|| · ||S^{-1}|| is very large.) Suppose that we are extremely lucky and manage to compute S exactly and J with just a tiny error δJ with ||δJ|| = O(ε)||A||. How big is the backward error? In other words, how big must δA be so that S^{-1}(A + δA)S = J + δJ? We get δA = S δJ S^{-1}, and all that we can conclude is that ||δA|| ≤ ||S|| · ||δJ|| · ||S^{-1}|| = O(ε)κ(S)||A||. Thus ||δA|| may be much larger than ε||A||, which prevents backward stability. ◊

So instead of computing S^{-1}AS = J, where S can be an arbitrarily ill-conditioned matrix, we will restrict S to be orthogonal (so κ_2(S) = 1) to guarantee stability. We cannot get a canonical form as simple as the Jordan form any more, but we do get something almost as good.

THEOREM 4.2. Schur canonical form. Given A, there exists a unitary matrix Q and an upper triangular matrix T such that Q*AQ = T. The eigenvalues of A are the diagonal entries of T.

Proof. We use induction on n. It is obviously true if A is 1-by-1. Now let λ be any eigenvalue and u a corresponding eigenvector normalized so ||u||_2 = 1. Choose Ũ so that U = [u, Ũ] is a square unitary matrix. (Note that λ and u may be complex even if A is real.) Then


\[
U^* A U = \begin{pmatrix} u^* \\ \tilde{U}^* \end{pmatrix} A\, [u, \tilde{U}]
= \begin{pmatrix} u^* A u & u^* A \tilde{U} \\ \tilde{U}^* A u & \tilde{U}^* A \tilde{U} \end{pmatrix}.
\]

Now as in the proof of Proposition 4.3, we can write u*Au = λu*u = λ and Ũ*Au = λŨ*u = 0, so

\[
U^* A U = \begin{pmatrix} \lambda & a_{12}^* \\ 0 & A_{22} \end{pmatrix}.
\]

By induction, there is a unitary P so that P* A_22 P = T̃ is upper triangular. Then

\[
U^* A U = \begin{pmatrix} \lambda & a_{12}^* \\ 0 & P\tilde{T}P^* \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & P \end{pmatrix}
  \begin{pmatrix} \lambda & a_{12}^* P \\ 0 & \tilde{T} \end{pmatrix}
  \begin{pmatrix} 1 & 0 \\ 0 & P^* \end{pmatrix},
\]

so

\[
\begin{pmatrix} 1 & 0 \\ 0 & P^* \end{pmatrix} U^* A U \begin{pmatrix} 1 & 0 \\ 0 & P \end{pmatrix}
= \begin{pmatrix} \lambda & a_{12}^* P \\ 0 & \tilde{T} \end{pmatrix}
\]

is upper triangular, and Q = U [1, 0; 0, P] is unitary as desired. □

Notice that the Schur form is not unique, because the eigenvalues may appear on the diagonal of T in any order.

This introduces complex numbers even when A is real. When A is real, we prefer a canonical form that uses only real numbers, because it will be cheaper to compute. As mentioned at the beginning of this section, this means that we will have to sacrifice a triangular canonical form and settle for a block triangular canonical form.

THEOREM 4.3. Real Schur canonical form. If A is real, there exists a real orthogonal matrix V such that V^T A V = T is quasi-upper triangular. This means that T is block upper triangular with 1-by-1 and 2-by-2 blocks on the diagonal. Its eigenvalues are the eigenvalues of its diagonal blocks. The 1-by-1 blocks correspond to real eigenvalues, and the 2-by-2 blocks to complex conjugate pairs of eigenvalues.

Proof. We use induction as before. Let λ be an eigenvalue. If λ is real, it has a real eigenvector u and we proceed as in the last theorem. If λ is complex, let u be a (necessarily) complex eigenvector, so Au = λu. Since A is real, Aū = λ̄ū, so λ̄ and ū are also an eigenvalue/eigenvector pair. Let u_R = (u + ū)/2 be the real part of u and u_I = (u − ū)/(2i) be the imaginary part. Then span{u_R, u_I} = span{u, ū} is a two-dimensional invariant subspace. Let [u_R, u_I] = QR be its QR decomposition, so span{Q} = span{u_R, u_I} is invariant. Choose Q̃ so that U = [Q, Q̃] is real and orthogonal, and compute

\[
U^T A U = \begin{pmatrix} Q^T \\ \tilde{Q}^T \end{pmatrix} A\, [Q, \tilde{Q}]
= \begin{pmatrix} Q^T A Q & Q^T A \tilde{Q} \\ \tilde{Q}^T A Q & \tilde{Q}^T A \tilde{Q} \end{pmatrix}.
\]

Since Q spans an invariant subspace, there is a 2-by-2 matrix B such that AQ = QB. Now as in the proof of Proposition 4.3, we can write Q^T A Q = Q^T Q B = B and Q̃^T A Q = Q̃^T Q B = 0, so

\[
U^T A U = \begin{pmatrix} B & Q^T A \tilde{Q} \\ 0 & \tilde{Q}^T A \tilde{Q} \end{pmatrix}.
\]

Now apply induction to Q̃^T A Q̃. □


4.2.1. Computing Eigenvectors from the Schur Form

Let Q*AQ = T be the Schur form. Then if Tx = λx, we have AQx = QTx = λQx, so Qx is an eigenvector of A. So to find eigenvectors of A, it suffices to find eigenvectors of T.

Suppose that λ = t_ii has multiplicity 1 (i.e., it is simple). Write (T − λI)x = 0 as

\[
(T - \lambda I)x =
\begin{pmatrix} T_{11} - \lambda I & T_{12} & T_{13} \\ 0 & 0 & T_{23} \\ 0 & 0 & T_{33} - \lambda I \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} (T_{11} - \lambda I)x_1 + T_{12}x_2 + T_{13}x_3 \\ T_{23}x_3 \\ (T_{33} - \lambda I)x_3 \end{pmatrix} = 0,
\]

where T_11 is (i−1)-by-(i−1), T_22 = λ is 1-by-1, T_33 is (n−i)-by-(n−i), and x is partitioned conformably. Since λ is simple, both T_11 − λI and T_33 − λI are nonsingular, so (T_33 − λI)x_3 = 0 implies x_3 = 0. Therefore (T_11 − λI)x_1 = −T_12 x_2. Choosing (arbitrarily) x_2 = 1 means x_1 = (λI − T_11)^{-1} T_12, so

\[
x = \begin{pmatrix} (\lambda I - T_{11})^{-1} T_{12} \\ 1 \\ 0 \end{pmatrix}.
\]

In other words, we just need to solve a triangular system for x_1. To find a real eigenvector from real Schur form, we get a quasi-triangular system to solve. Computing complex eigenvectors from real Schur form using only real arithmetic also just involves equation solving but is a little trickier. See subroutine strevc in LAPACK for details.
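The computation is a short triangular solve; here is a minimal Matlab sketch (an assumed illustration, not LAPACK's strevc):

    % Eigenvector of A for the simple eigenvalue T(i,i), given the Schur
    % form [Q,T] = schur(A,'complex'). Solve (lambda*I - T11) x1 = T12,
    % then map back to an eigenvector of A.
    [Q, T] = schur(randn(6), 'complex');
    i = 3;                              % index of the desired (simple) eigenvalue
    lam = T(i,i);
    x1 = (lam*eye(i-1) - T(1:i-1,1:i-1)) \ T(1:i-1,i);
    x  = [x1; 1; zeros(size(T,1)-i,1)];
    v  = Q*x;  v = v/norm(v);           % eigenvector of A: A*v is close to lam*v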

4.3. Perturbation Theory

In this section we will concentrate on understanding when eigenvalues are ill-conditioned and thus hard to compute accurately. In addition to providing error bounds for computed eigenvalues, we will also relate eigenvalue condition numbers to related quantities, including the distance to the nearest matrix with an infinitely ill-conditioned eigenvalue, and the condition number of the matrix of eigenvectors.

We begin our study by asking when eigenvalues have infinite condition numbers. This is the case for multiple eigenvalues, as the following example illustrates.

EXAMPLE 4.4. Let

\[
A = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ \varepsilon & & & 0 \end{pmatrix}
\]


be an n-by-n matrix. Then A has characteristic polynomial λ^n − ε = 0, so λ = ε^{1/n} (n possible values). The nth root of ε grows much faster than any multiple of ε for small ε. More formally, the condition number is infinite because dλ/dε = ε^{1/n − 1}/n = ∞ at ε = 0 for n ≥ 2. For example, take n = 16 and ε = 10^{-16}. Then for each eigenvalue |λ| = .1. ◊

So we expect a large condition number if an eigenvalue is "close to multiple"; i.e., there is a small δA such that A + δA has exactly a multiple eigenvalue. Having an infinite condition number does not mean that such eigenvalues cannot be computed with any correct digits, however.

PROPOSITION 4.4. Eigenvalues of A are continuous functions of A, even if they are not differentiable.

Proof. It suffices to prove the continuity of roots of polynomials, since the coefficients of the characteristic polynomial are continuous (in fact polynomial) functions of the matrix entries. We use the argument principle from complex analysis [2]: the number of roots of a polynomial p inside a simple closed curve γ is (1/2πi) ∮_γ (p′(z)/p(z)) dz. If p is changed just a little, p′(z)/p(z) is changed just a little, so (1/2πi) ∮_γ (p′(z)/p(z)) dz is changed just a little. But since it is an integer, it must be constant, so the number of roots inside the curve γ is constant. This means that the roots cannot pass outside the curve γ (no matter how small γ is, provided that we perturb p by little enough), so the roots must be continuous. □

In what follows, we will concentrate on computing the condition number of a simple eigenvalue. If λ is a simple eigenvalue of A and δA is small, then we can identify an eigenvalue λ + δλ of A + δA "corresponding to" λ: it is the closest one to λ. We can easily compute the condition number of a simple eigenvalue.

THEOREM 4.4. Let λ be a simple eigenvalue of A with right eigenvector x and left eigenvector y, normalized so that ||x||_2 = ||y||_2 = 1. Let λ + δλ be the corresponding eigenvalue of A + δA. Then

\[
\delta\lambda = \frac{y^*\,\delta A\, x}{y^* x} + O(\|\delta A\|^2), \quad \text{or} \quad
|\delta\lambda| \le \frac{\|\delta A\|}{|y^* x|} + O(\|\delta A\|^2) = \sec\theta(y,x)\,\|\delta A\| + O(\|\delta A\|^2),
\]

where θ(y, x) is the acute angle between y and x. In other words, sec θ(y, x) = 1/|y*x| is the condition number of the eigenvalue λ.

Proof. Subtract Ax = λx from (A + δA)(x + δx) = (λ + δλ)(x + δx) to get

A δx + δA x + δA δx = λ δx + δλ x + δλ δx.

Ignore the second-order terms (those with two "δ terms" as factors: δA δx and δλ δx) and multiply by y* to get y* A δx + y* δA x = y* λ δx + y* δλ x.


Now y*Aδx cancels y*λδx, so we can solve for δλ = (y* δA x)/(y* x) as desired. □

Note that a Jordan block has right and left eigenvectors e_1 and e_n, respectively, so the condition number of its eigenvalue is 1/|e_n^* e_1| = 1/0 = ∞, which agrees with our earlier analysis.
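Theorem 4.4 translates directly into a few lines of Matlab; the following sketch (an assumed illustration) estimates the condition number 1/|y*x| of each eigenvalue of a random matrix:

    % Eigenvalue condition numbers sec(theta(y,x)) = 1/|y'*x|.
    % Columns of X are right eigenvectors of A; columns of Y, the
    % eigenvectors of A', are left eigenvectors once matched by eigenvalue.
    A = randn(5);
    [X, D] = eig(A);                  % right eigenvectors: A*X = X*D
    [Y, E] = eig(A');                 % y = Y(:,k) satisfies y'*A = lambda*y'
    lam = diag(D);  mu = conj(diag(E));
    for j = 1:length(lam)
        [~, k] = min(abs(mu - lam(j)));   % left eigenvector matching lam(j)
        x = X(:,j);  y = Y(:,k);          % eig returns unit 2-norm columns
        condlam = 1/abs(y'*x)             % large value = ill-conditioned eigenvalue
    end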

At the other extreme, in the important special case of symmetric matrices, the condition number is 1, so the eigenvalues are always accurately determined by the data.

COROLLARY 4.1. Let A be symmetric (or normal: AA* = A*A). Then |δλ| ≤ ||δA||_2 + O(||δA||_2^2).

4.4.1. Power Method

ALGORITHM 4.1. Power method: Given x_0, we iterate

i = 0
repeat
    y_{i+1} = A x_i
    x_{i+1} = y_{i+1} / ||y_{i+1}||_2    (approximate eigenvector)
    λ̃_{i+1} = x_{i+1}^T A x_{i+1}        (approximate eigenvalue)
    i = i + 1
until convergence

To begin the convergence analysis, assume first that A = diag(λ_1, ..., λ_n), with |λ_1| > |λ_2| ≥ ... ≥ |λ_n|. In this case the eigenvectors are

just the columns e_i of the identity matrix. Note that x_i can also be written x_i = A^i x_0 / ||A^i x_0||_2, since the factors 1/||y_{i+1}||_2 only scale x_{i+1} to be a unit vector and do not change its direction. Then, writing x_0 = Σ_{j=1}^n ξ_j e_j, we get

\[
A^i x_0 = \begin{pmatrix} \xi_1 \lambda_1^i \\ \vdots \\ \xi_n \lambda_n^i \end{pmatrix}
= \xi_1 \lambda_1^i \left( e_1 + \sum_{j=2}^{n} \frac{\xi_j}{\xi_1} \Big( \frac{\lambda_j}{\lambda_1} \Big)^i e_j \right),
\]

where we have assumed ξ_1 ≠ 0. Since all the fractions λ_j/λ_1 are less than 1 in absolute value, A^i x_0 becomes more and more nearly parallel to e_1, so x_i = A^i x_0 / ||A^i x_0||_2 becomes closer and closer to ±e_1, the eigenvector corresponding to the largest eigenvalue λ_1. The rate of convergence depends on how much smaller than 1 the ratios |λ_2/λ_1| ≥ ... ≥ |λ_n/λ_1| are, the smaller the faster. Since x_i converges to ±e_1, λ̃_i = x_i^T A x_i converges to λ_1, the largest eigenvalue.

In showing that the power method converges, we have made several assumptions, most notably that A is diagonal. To analyze a more general case, we now assume that A = SΛS^{-1} is diagonalizable, with Λ = diag(λ_1, ..., λ_n)


and the eigenvalues sorted so that |λ_1| > |λ_2| ≥ ... ≥ |λ_n|. Write S = [s_1, ..., s_n], where the columns s_i are the corresponding eigenvectors and also satisfy ||s_i||_2 = 1; in the last paragraph we had S = I. This lets us write x_0 = S(S^{-1}x_0) ≡ S([ξ_1, ..., ξ_n]^T). Also, since A = SΛS^{-1}, we can write

\[
A^i = (S\Lambda S^{-1}) \cdots (S\Lambda S^{-1}) \quad (i \text{ times}) \quad = S\Lambda^i S^{-1},
\]

since all the S^{-1}S pairs cancel. This finally lets us write

\[
A^i x_0 = (S\Lambda^i S^{-1})\, S \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}
= S\Lambda^i \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}
= \xi_1 \lambda_1^i \, S \left( e_1 + \sum_{j=2}^{n} \frac{\xi_j}{\xi_1} \Big( \frac{\lambda_j}{\lambda_1} \Big)^i e_j \right).
\]

As before, the vector in brackets converges to e_1, so A^i x_0 gets closer and closer to a multiple of S e_1 = s_1, the eigenvector corresponding to λ_1. Therefore, λ̃_i = x_i^T A x_i converges to s_1^T A s_1 = s_1^T λ_1 s_1 = λ_1.

A minor drawback of this method is the assumption that ξ_1 ≠ 0, i.e., that x_0 is not in the invariant subspace span{s_2, ..., s_n}; this is true with very high probability if x_0 is chosen at random. A major drawback is that it converges to the eigenvalue/eigenvector pair only for the eigenvalue of largest absolute magnitude, and its convergence rate depends on |λ_2/λ_1|, a quantity which may be close to 1 and thus cause very slow convergence. Indeed, if A is real and the largest eigenvalue is complex, there are two complex conjugate eigenvalues of largest absolute value |λ_1| = |λ_2|, and so the above analysis does not work at all. In the extreme case of an orthogonal matrix, all the eigenvalues have the same absolute value, namely, 1.

    To plot the convergence of the power method, see HOMEPAGE/Matlab/powerplot.m.
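For reference, a minimal Matlab sketch of the power method itself (an assumed illustration, not powerplot.m):

    % Converges to the eigenvalue of largest absolute value when
    % |lambda_1| > |lambda_2| and x0 has a component along s_1.
    A = [2 1 0; 1 3 1; 0 1 4];
    x = randn(3,1);  x = x/norm(x);
    for i = 1:100
        y = A*x;
        x = y/norm(y);          % approximate eigenvector
        lambda = x'*A*x;        % approximate eigenvalue (Rayleigh quotient)
    end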

4.4.2. Inverse Iteration

We will overcome the drawbacks of the power method just described by applying the power method to (A − σI)^{-1} instead of A, where σ is called a shift. This will let us converge to the eigenvalue closest to σ, rather than just λ_1. This method is called inverse iteration or the inverse power method.

ALGORITHM 4.2. Inverse iteration: Given x_0, we iterate

i = 0
repeat
    y_{i+1} = (A − σI)^{-1} x_i
    x_{i+1} = y_{i+1} / ||y_{i+1}||_2    (approximate eigenvector)
    λ̃_{i+1} = x_{i+1}^T A x_{i+1}        (approximate eigenvalue)
    i = i + 1
until convergence

To analyze the convergence, note that A = SΛS^{-1} implies A − σI = S(Λ −

σI)S^{-1}, and so (A − σI)^{-1} = S(Λ − σI)^{-1}S^{-1}. Thus (A − σI)^{-1} has the same eigenvectors s_i as A, with corresponding eigenvalues ((Λ − σI)^{-1})_{ii} = (λ_i − σ)^{-1}. The same analysis as before tells us to expect x_i to converge to the eigenvector corresponding to the largest eigenvalue in absolute value. More specifically, assume that |λ_k − σ| is smaller than all the other |λ_i − σ|, so that (λ_k − σ)^{-1} is the largest eigenvalue in absolute value. Also, write x_0 = S[ξ_1, ..., ξ_n]^T as before, and assume ξ_k ≠ 0. Then

\[
(A - \sigma I)^{-i} x_0 = \big( S(\Lambda - \sigma I)^{-i} S^{-1} \big)\, S \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}
= \xi_k (\lambda_k - \sigma)^{-i}\, S \left( e_k + \sum_{j \neq k} \frac{\xi_j}{\xi_k} \Big( \frac{\lambda_k - \sigma}{\lambda_j - \sigma} \Big)^i e_j \right),
\]

where the 1 is in entry k of the vector in brackets. Since all the fractions (λ_k − σ)/(λ_j − σ) are less than one in absolute value, the vector in brackets approaches e_k, so (A − σI)^{-i} x_0 gets closer and closer to a multiple of S e_k = s_k, the eigenvector corresponding to λ_k. As before, λ̃_i = x_i^T A x_i also converges to λ_k.

The advantage of inverse iteration over the power method is the ability to converge to any desired eigenvalue (the one nearest the shift σ). By choosing σ very close to a desired eigenvalue, we can converge very quickly and thus not be as limited by the proximity of nearby eigenvalues as is the original power method. The method is particularly effective when we have a good approximation to an eigenvalue and want only its corresponding eigenvector (for example, see section 5.3.4). Later we will explain how to choose such a σ without knowing the eigenvalues, which is what we are trying to compute in the first place.
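Here is a minimal Matlab sketch of inverse iteration (an assumed illustration); note that A − σI is factored once and reused in every iteration:

    % Power method applied to (A - sigma*I)^{-1}; converges to the
    % eigenpair whose eigenvalue is closest to the shift sigma.
    A = [2 1 0; 1 3 1; 0 1 4];
    sigma = 1.2;                       % shift near a desired eigenvalue
    [L, U, P] = lu(A - sigma*eye(3));  % factor once, reuse each iteration
    x = randn(3,1);  x = x/norm(x);
    for i = 1:50
        y = U \ (L \ (P*x));           % y = (A - sigma*I) \ x
        x = y/norm(y);                 % approximate eigenvector
        lambda = x'*A*x;               % approximate eigenvalue
    end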

4.4.3. Orthogonal Iteration

Our next improvement will permit us to converge to a (p > 1)-dimensional invariant subspace, rather than one eigenvector at a time. It is called orthogonal iteration (and sometimes subspace iteration or simultaneous iteration).


ALGORITHM 4.3. Orthogonal iteration: Let Z_0 be an n × p orthogonal matrix. Then we iterate

i = 0
repeat
    Y_{i+1} = A Z_i
    Factor Y_{i+1} = Z_{i+1} R_{i+1}    using Algorithm 3.2 (QR decomposition)
                                        (Z_{i+1} spans an approximate invariant subspace)
    i = i + 1
until convergence
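A minimal Matlab sketch of orthogonal iteration (an assumed illustration; the matrix and p are arbitrary choices):

    % Z converges to a basis of the dominant p-dimensional invariant
    % subspace when |lambda_p| > |lambda_{p+1}|.
    A = randn(6);  A = A + A';       % symmetric example for clarity
    p = 2;
    [Z, ~] = qr(randn(6,p), 0);      % random n-by-p orthonormal start
    for i = 1:200
        Y = A*Z;
        [Z, R] = qr(Y, 0);           % thin QR: Z spans an approximate invariant subspace
    end
    eig(Z'*A*Z)                      % approximates the p largest-magnitude eigenvalues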

Here is an informal analysis of this method. Assume |λ_p| > |λ_{p+1}|. If p = 1, this method and its analysis are identical to the power method. When p > 1, we write span{Z_{i+1}} = span{Y_{i+1}} = span{A Z_i}, so span{Z_i} = span{A^i Z_0} = span{S Λ^i S^{-1} Z_0}. Note that

\[
S\Lambda^i S^{-1} Z_0 = S\,\mathrm{diag}(\lambda_1^i, \ldots, \lambda_n^i)\, S^{-1} Z_0
= \lambda_p^i\, S\,\mathrm{diag}\!\left( \Big(\frac{\lambda_1}{\lambda_p}\Big)^i, \ldots, \Big(\frac{\lambda_n}{\lambda_p}\Big)^i \right) S^{-1} Z_0.
\]

Since |λ_j/λ_p| ≥ 1 if j ≤ p


Note that if we follow only the first p


Similarly, Figure 4.4 plots the actual values ||A_i(p+1 : n, 1 : p)||_2 as solid lines and the approximations |λ_{p+1}/λ_p|^i as dotted lines; they also match well. Comparing A_0 with A_19 shows the progress of the iteration: the diagonal entries of A_19 have converged to the eigenvalues 30.000, 6.0000, 2.0000, and 1.0000, and its below-diagonal entries are small, some as small as 10^{-29}.

See HOMEPAGE/Matlab/qriter.m for Matlab software to run this and similar examples. ◊

EXAMPLE 4.6. To see why the assumption in Theorem 4.8 about nonsingularity of S(1 : j, 1 : j) is necessary, suppose that A is diagonal with the eigenvalues not in decreasing order on the diagonal. Then orthogonal iteration yields Z_i = diag(±1) (a diagonal matrix with diagonal entries ±1) and A_i = A for all i, so the eigenvalues do not move into decreasing order. To see why the assumption that the eigenvalues have distinct absolute values is necessary, suppose that A is orthogonal, so all its eigenvalues have absolute value 1. Again, the algorithm leaves A_i essentially unchanged. (The rows and columns may be multiplied by ±1.) ◊

4.4.4. QR Iteration

Our next goal is to reorganize orthogonal iteration to incorporate shifting and inverting, as in section 4.4.2. This will make it more efficient and eliminate the assumption that eigenvalues differ in magnitude, which was needed in Theorem 4.8 to prove convergence.

ALGORITHM 4.4. QR iteration: Given A_0, we iterate

i = 0
repeat
    Factor A_i = Q_i R_i    (the QR decomposition)
    A_{i+1} = R_i Q_i
    i = i + 1
until convergence
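A minimal Matlab sketch of Algorithm 4.4 (an assumed illustration):

    % Repeated "factor, then multiply in reverse order" drives A toward
    % Schur form when the eigenvalues have distinct absolute values.
    A = [2 1 0; 1 3 1; 0 1 4];
    for i = 1:100
        [Q, R] = qr(A);
        A = R*Q;            % A_{i+1} = R_i Q_i = Q_i' * A_i * Q_i
    end
    A                       % nearly upper triangular; diagonal approximates eigenvalues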

Fig. 4.3. Convergence of diagonal entries during orthogonal iteration. (Four panels, one per eigenvalue: λ_1 = 30, λ_2 = 6, λ_3 = 2, λ_4 = 1.)

Fig. 4.4. Convergence to Schur form during orthogonal iteration. (Three panels: ||A_i(2:4, 1:1)||, ||A_i(3:4, 1:2)||, and ||A_i(4:4, 1:3)||.)


Since A_{i+1} = R_i Q_i = Q_i^T (Q_i R_i) Q_i = Q_i^T A_i Q_i, A_{i+1} and A_i are orthogonally similar.

We claim that the A_i computed by QR iteration is identical to the matrix Z_i^T A Z_i implicitly computed by orthogonal iteration.

LEMMA 4.3. Let A_i be the matrix computed by Algorithm 4.4. Then A_i = Z_i^T A Z_i, where Z_i is the matrix computed from orthogonal iteration (Algorithm 4.3) starting with Z_0 = I. Thus A_i converges to Schur form if all the eigenvalues have different absolute values.

Proof. We use induction. Assume A_i = Z_i^T A Z_i. From Algorithm 4.3, we can write A Z_i = Z_{i+1} R_{i+1}, where Z_{i+1} is orthogonal and R_{i+1} is upper triangular. Then Z_i^T A Z_i = Z_i^T (Z_{i+1} R_{i+1}) is the product of an orthogonal matrix Q = Z_i^T Z_{i+1} and an upper triangular matrix R = R_{i+1} = Z_{i+1}^T A Z_i; this must be the QR decomposition A_i = QR, since the QR decomposition is unique (except for possibly multiplying each column of Q and row of R by −1). Then

Z_{i+1}^T A Z_{i+1} = (Z_{i+1}^T A Z_i)(Z_i^T Z_{i+1}) = R_{i+1}(Z_i^T Z_{i+1}) = RQ.

This is precisely how QR iteration maps A_i to A_{i+1}, so Z_{i+1}^T A Z_{i+1} = A_{i+1} as desired. □

To see a variety of numerical examples illustrating the convergence of QR iteration, run the Matlab code referred to in Question 4.15.

From earlier analysis, we know that the convergence rate depends on the ratios of eigenvalues. To speed convergence, we use shifting and inverting.

ALGORITHM 4.5. QR iteration with a shift: Given A_0, we iterate

i = 0
repeat
    Choose a shift σ_i near an eigenvalue of A
    Factor A_i − σ_i I = Q_i R_i    (QR decomposition)
    A_{i+1} = R_i Q_i + σ_i I
    i = i + 1
until convergence
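In Matlab, one pass of shifted QR looks like the following sketch (an assumed illustration, not production code):

    % Shifted QR iteration with the simple shift sigma_i = A(n,n); near
    % convergence this shift gives the local quadratic convergence
    % discussed below.
    A = [2 1 0; 1 3 1; 0 1 4];  n = size(A,1);
    for i = 1:20
        sigma = A(n,n);
        [Q, R] = qr(A - sigma*eye(n));
        A = R*Q + sigma*eye(n);     % A_{i+1} = R_i Q_i + sigma_i I
    end
    A                               % last row approaches [0 ... 0 lambda]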

LEMMA 4.4. A_i and A_{i+1} are orthogonally similar.

Proof. A_{i+1} = R_i Q_i + σ_i I = Q_i^T Q_i R_i Q_i + σ_i Q_i^T Q_i = Q_i^T (Q_i R_i + σ_i I) Q_i = Q_i^T A_i Q_i. □

If R_i is nonsingular, we may also write

A_{i+1} = R_i Q_i + σ_i I = R_i Q_i R_i R_i^{-1} + σ_i R_i R_i^{-1} = R_i (Q_i R_i + σ_i I) R_i^{-1} = R_i A_i R_i^{-1}.


If σ_i is an exact eigenvalue of A_i, then we claim that QR iteration converges in one step: since σ_i is an eigenvalue, A_i − σ_i I is singular, so R_i is singular, and so some diagonal entry of R_i must be zero. Suppose R_i(n, n) = 0. This implies that the last row of R_i Q_i is 0, so the last row of A_{i+1} = R_i Q_i + σ_i I equals σ_i e_n^T, where e_n is the nth column of the n-by-n identity matrix. In other words, the last row of A_{i+1} is zero except for the eigenvalue σ_i appearing in the (n, n) entry. This means that the algorithm has converged, because A_{i+1} is block upper triangular, with a trailing 1-by-1 block σ_i; the leading (n−1)-by-(n−1) block A' is a new, smaller eigenproblem which QR iteration can solve without ever modifying σ_i again:

\[
A_{i+1} = \begin{pmatrix} A' & a \\ 0 & \sigma_i \end{pmatrix}.
\]

When σ_i is not an exact eigenvalue, we will accept A_{i+1}(n, n) as having converged when the lower left block A_{i+1}(n, 1 : n−1) is small enough. Recall from our earlier analysis that we expect A_{i+1}(n, 1 : n−1) to shrink by a factor |λ_k − σ_i| / min_{j≠k} |λ_j − σ_i|, where |λ_k − σ_i| = min_j |λ_j − σ_i|. So if σ_i is a very good approximation to the eigenvalue λ_k, we expect fast convergence.

Here is another way to see why convergence should be fast, by recognizing that QR iteration is implicitly doing inverse iteration. When σ_i is an exact eigenvalue, the last column q_i of Q_i will be a left eigenvector of A_i for the eigenvalue σ_i, since q_i^* A_i = q_i^* (Q_i R_i + σ_i I) = e_n^T R_i + σ_i q_i^* = σ_i q_i^* (the term e_n^T R_i, the last row of R_i, vanishes because R_i(n, n) = 0). When σ_i is close to an eigenvalue, we expect q_i to be close to an eigenvector for the following reason: q_i is parallel to ((A_i − σ_i I)^*)^{-1} e_n (we explain why below). In other words, q_i is the same as would be obtained from inverse iteration on (A_i − σ_i I)^* (and so we expect it to be close to a left eigenvector).

Here is the proof that q_i is parallel to ((A_i − σ_i I)^*)^{-1} e_n. A_i − σ_i I = Q_i R_i implies (A_i − σ_i I) R_i^{-1} = Q_i. Inverting and taking the conjugate transpose of both sides leave the right-hand side Q_i unchanged and change the left-hand side to ((A_i − σ_i I)^*)^{-1} R_i^*, whose last column is ((A_i − σ_i I)^*)^{-1} [0, ..., 0, r̄_nn]^T, which is proportional to the last column of ((A_i − σ_i I)^*)^{-1}.

How do we choose σ_i to be an accurate approximate eigenvalue, when eigenvalues are what we are trying to compute in the first place? We will say more about this later, but for now note that near convergence to a real eigenvalue, A_i(n, n) is close to that eigenvalue, so σ_i = A_i(n, n) is a good choice of shift. In fact, it yields local quadratic convergence, which means that the number of correct digits doubles at every step. We explain why quadratic convergence occurs as follows. Suppose at step i that ||A_i(n, 1 : n−1)|| / ||A|| ≡ η ≪ 1. If we were to set A_i(n, 1 : n−1) to exactly 0, we would make A_i block upper triangular and so perturb a true eigenvalue λ_k to make it equal to A_i(n, n). If this eigenvalue is far from the other eigenvalues, it will not be ill-conditioned, so this perturbation will be O(η||A||). In other words, |λ_k − A_i(n, n)| = O(η||A||). On the next iteration, if we choose σ_i = A_i(n, n), we expect A_{i+1}(n, 1 : n−1) to shrink by a factor |λ_k − σ_i| / min_{j≠k} |λ_j − σ_i| = O(η), implying that ||A_{i+1}(n, 1 : n−1)|| = O(η²||A||), or ||A_{i+1}(n, 1 : n−1)|| / ||A|| = O(η²). Decreasing the error this way from η to O(η²) is quadratic convergence.


EXAMPLE 4.7. Here are some shifted QR iterations starting with the same 4-by-4 matrix A_0 as in Example 4.5, with shift σ_i = A_i(4, 4). The convergence is a bit erratic at first but eventually becomes quadratic near the end, with ||A_i(4, 1 : 3)|| ≈ |A_i(4, 3)| approximately squaring at each of the last three steps. Also, the number of correct digits in A_i(4, 4) doubles at the fourth through second-to-last steps.

A_0(4,:)  = [ +1.9          +56.          +40.           −10.558 ]
A_1(4,:)  = [ −.85          −4.9          +2.2·10^{−2}    −6.6068 ]
A_2(4,:)  = [ +.35          +.86          +.30             0.74894 ]
A_3(4,:)  = [ −1.2·10^{−2}  −.17          −.70             1.4672 ]
A_4(4,:)  = [ −1.5·10^{−4}  −1.8·10^{−2}  −.38             1.4045 ]
A_5(4,:)  = [ −3.0·10^{−6}  −2.2·10^{−3}  −.50             1.1403 ]
A_6(4,:)  = [ −1.4·10^{−8}  −6.3·10^{−5}  −7.8·10^{−2}     1.0272 ]
A_7(4,:)  = [ −1.4·10^{−11} −3.6·10^{−7}  −2.3·10^{−3}     0.99941 ]
A_8(4,:)  = [ +2.8·10^{−16} +4.2·10^{−11} +1.4·10^{−6}     0.9999996468853453 ]
A_9(4,:)  = [ −3.4·10^{−24} −3.0·10^{−18} −4.8·10^{−13}    0.9999999999998767 ]
A_10(4,:) = [ +1.5·10^{−38} +7.4·10^{−32} +6.0·10^{−26}    1.000000000000001 ]

By the time we reach A_10, the rest of the matrix has made a lot of progress toward convergence as well, so later eigenvalues will be computed very quickly, in one or two steps each: A_10 is upper triangular to working precision, with diagonal entries 30.000, 6.0000, 2.0000, and 1.0000 and below-diagonal entries ranging from about 6.2·10^{−6} down to 1.5·10^{−38}. ◊

4.4.5. Making QR Iteration Practical

Here are some remaining problems we have to solve to make the algorithm more practical:

1. The iteration is too expensive. The QR decomposition costs O(n³) flops, so if we were lucky enough to do only one iteration per eigenvalue, the cost would be O(n⁴). But we seek an algorithm with a total cost of only O(n³).

2. How shall we choose σ_i to accelerate convergence to a complex eigenvalue? Choosing σ_i complex means all arithmetic has to be complex, increasing the cost by a factor of about 4 when A is real. We seek an algorithm that uses all real arithmetic if A is real and converges to real Schur form.

3. How do we recognize convergence?

The solutions to these problems, which we will describe in more detail later, are as follows:


1. We will initially reduce the matrix to upper Hessenberg form; this means that A is zero below the first subdiagonal (i.e., a_ij = 0 if i > j + 1) (see section 4.4.6). Then we will apply a step of QR iteration implicitly, i.e., without computing Q or multiplying by it explicitly (see section 4.4.8). This will reduce the cost of one QR iteration from O(n³) to O(n²) and the overall cost from O(n⁴) to O(n³) as desired.

When A is symmetric, we will reduce it to tridiagonal form instead, reducing the cost of a single QR iteration further to O(n). This is discussed in section 4.4.7 and Chapter 5.

2. Since complex eigenvalues of real matrices occur in complex conjugate pairs, we can shift by σ_i and σ̄_i simultaneously; it turns out that this will permit us to maintain real arithmetic (see section 4.4.8). If A is symmetric, all eigenvalues are real, and this is not an issue.

3. Convergence occurs when subdiagonal entries of A_i are "small enough." To help choose a practical threshold, we use the notion of backward stability: Since A_i is related to A by a similarity transformation by an orthogonal matrix, we expect A_i to have roundoff errors of size O(ε||A||) in it anyway. Therefore, any subdiagonal entry of A_i smaller than O(ε||A||) in magnitude may as well be zero, so we set it to zero.^16 When A is upper Hessenberg, setting a_{p+1,p} to zero will make A into a block upper triangular matrix

\[
A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix},
\]

where A_11 is p-by-p and A_11 and A_22 are both Hessenberg. Then the eigenvalues of A_11 and A_22 may be found independently to get the eigenvalues of A. When all these diagonal blocks are 1-by-1 or 2-by-2, the algorithm has finished.

4.4.6. Hessenberg Reduction

Given a real matrix A, we seek an orthogonal Q so that QAQ^T is upper Hessenberg. The algorithm is a simple variation on the idea used for the QR decomposition.

EXAMPLE 4.8. We illustrate the general pattern of Hessenberg reduction with a 5-by-5 example. Each Q_i below is a 5-by-5 Householder reflection, chosen to zero out entries i + 2 through n in column i and leave entries 1 through i unchanged.

16. In practice, we use a slightly more stringent condition, replacing ||A|| with the norm of a submatrix of A, to take into account matrices which may be "graded," with large entries in one place and small entries elsewhere. We can also set a subdiagonal entry to zero when the product a_{p+1,p} a_{p+2,p+1} of two adjacent subdiagonal entries is small enough. See the LAPACK routine slahqr for details.


1. Choose Q_1 so that

Q_1 A =
x x x x x
x x x x x
o x x x x
o x x x x
o x x x x

and A_1 ≡ Q_1 A Q_1 has the same pattern. Q_1 leaves the first row of Q_1 A unchanged, and Q_1 leaves the first column of Q_1 A Q_1 unchanged, including the zeros.

2. Choose Q_2 so that

Q_2 A_1 =
x x x x x
x x x x x
o x x x x
o o x x x
o o x x x

and A_2 ≡ Q_2 A_1 Q_2 has the same pattern. Q_2 changes only the last three rows of A_1, and Q_2 leaves the first two columns of Q_2 A_1 Q_2 unchanged, including the zeros.

3. Choose Q_3 so that

Q_3 A_2 =
x x x x x
x x x x x
o x x x x
o o x x x
o o o x x

and A_3 = Q_3 A_2 Q_3 has the same pattern, which is upper Hessenberg. Altogether A_3 = (Q_3 Q_2 Q_1) A (Q_3 Q_2 Q_1)^T ≡ QAQ^T. ◊

The general algorithm for Hessenberg reduction is as follows.

ALGORITHM 4.6. Reduction to upper Hessenberg form:

if Q is desired, set Q = I
for i = 1 : n−2
    u_i = House(A(i+1 : n, i))
    P_i = I − 2 u_i u_i^T    /* Q_i = diag(I_{i×i}, P_i) */
    A(i+1 : n, i : n) = P_i · A(i+1 : n, i : n)
    A(1 : n, i+1 : n) = A(1 : n, i+1 : n) · P_i
    if Q is desired
        Q(i+1 : n, i : n) = P_i · Q(i+1 : n, i : n)    /* Q = Q_i · Q */
    end if
end for
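A minimal Matlab sketch of Algorithm 4.6 (an assumed illustration that forms P_i explicitly for clarity, which the text below notes one would not do in practice):

    % Reduce A to upper Hessenberg form with Householder reflections,
    % accumulating Q so that Q*A0*Q' equals the final A for the original A0.
    % Compare with Matlab's built-in hess(A). Assumes each column piece
    % x = A(i+1:n,i) is nonzero, which holds almost surely for randn.
    n = 5;  A = randn(n);  Q = eye(n);
    for i = 1:n-2
        x = A(i+1:n, i);
        s = norm(x);  if x(1) < 0, s = -s; end
        u = x;  u(1) = u(1) + s;  u = u/norm(u);  % Householder vector ("House")
        P = eye(n-i) - 2*(u*u');                  % P_i = I - 2 u u^T
        A(i+1:n, i:n) = P * A(i+1:n, i:n);
        A(1:n, i+1:n) = A(1:n, i+1:n) * P;
        Q(i+1:n, :)   = P * Q(i+1:n, :);          % accumulate Q = Q_{n-2} ... Q_1
    end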


As with the QR decomposition, one does not form P_i explicitly but instead multiplies by I − 2u_i u_i^T via matrix-vector operations. The u_i vectors can also be stored below the subdiagonal, similar to the QR decomposition. They can be applied using Level 3 BLAS, as described in Question 3.17. This algorithm is available as the Matlab command hess or the LAPACK routine sgehrd.

The number of floating point operations is easily counted to be (10/3)n³ + O(n²), or (14/3)n³ + O(n²) if the product Q = Q_{n−2} ⋯ Q_1 is computed as well.

The advantage of Hessenberg form under QR iteration is that it costs only 6n² + O(n) flops per iteration instead of O(n³), and its form is preserved so that the matrix remains upper Hessenberg.

PROPOSITION 4.5. Hessenberg form is preserved by QR iteration.

Proof. It is easy to confirm that the QR decomposition of an upper Hessenberg matrix like A_i − σI yields an upper Hessenberg Q (since the jth column of Q is a linear combination of the leading j columns of A_i − σI). Then it is easy to confirm that RQ remains upper Hessenberg, and adding σI does not change this. □

DEFINITION 4.5. An upper Hessenberg matrix H is unreduced if all its subdiagonal entries are nonzero.

It is easy to see that if H is reduced because h_{i+1,i} = 0, then its eigenvalues are those of its leading i-by-i Hessenberg submatrix and its trailing (n−i)-by-(n−i) Hessenberg submatrix, so we need consider only unreduced matrices.

4.4.7. Tridiagonal and Bidiagonal Reduction

If A is symmetric, the Hessenberg reduction process leaves A symmetric at each step, so zeros are created in symmetric positions. This means we need work on only half the matrix, reducing the operation count to (4/3)n³ + O(n²), or (8/3)n³ + O(n²) to form Q_{n−2} ⋯ Q_1 as well. We call this algorithm tridiagonal reduction. We will use this algorithm in Chapter 5. This routine is available as LAPACK routine ssytrd.

Looking ahead a bit to our discussion of computing the SVD in section 5.4, we recall from section 3.2.3 that the eigenvalues of the symmetric matrix A^T A are the squares of the singular values of A. Our eventual SVD algorithm will use this fact, so we would like to find a form for A which implies that A^T A is tridiagonal. We will choose A to be upper bidiagonal, i.e., nonzero only on the diagonal and first superdiagonal. Thus, we want to compute orthogonal matrices Q and V such that QAV is bidiagonal. The algorithm, called bidiagonal reduction, is very similar to Hessenberg and tridiagonal reduction.

EXAMPLE 4.9. Here is a 4-by-4 example of bidiagonal reduction, which illustrates the general pattern:


1. Choose Q_1 so that

Q_1 A =
x x x x
o x x x
o x x x
o x x x

and V_1 so that

A_1 = Q_1 A V_1 =
x x o o
o x x x
o x x x
o x x x

Q_1 is a Householder reflection, and V_1 is a Householder reflection that leaves the first column of Q_1 A unchanged.

2. Choose Q_2 so that

Q_2 A_1 =
x x o o
o x x x
o o x x
o o x x

and V_2 so that

A_2 ≡ Q_2 A_1 V_2 =
x x o o
o x x o
o o x x
o o x x

Q_2 is a Householder reflection that leaves the first row of A_1 unchanged. V_2 is a Householder reflection that leaves the first two columns of Q_2 A_1 unchanged.

3. Choose Q_3 so that

Q_3 A_2 =
x x o o
o x x o
o o x x
o o o x

and V_3 = I, so A_3 = Q_3 A_2. Q_3 is a Householder reflection that leaves the first two rows of A_2 unchanged. ◊

In general, if A is n-by-n, then we get orthogonal matrices Q = Q_{n−1} ⋯ Q_1 and V = V_1 ⋯ V_{n−2} such that QAV = A' is upper bidiagonal. Note that A'^T A' = V^T A^T Q^T Q A V = V^T A^T A V, so A'^T A' has the same eigenvalues as A^T A; i.e., A' has the same singular values as A.

The cost of this bidiagonal reduction is (8/3)n³ + O(n²) flops, plus another 4n³ + O(n²) flops to compute Q and V. This routine is available as LAPACK routine sgebrd.

4.4.8. QR Iteration with Implicit Shifts

In this section we show how to implement QR iteration cheaply on an upper Hessenberg matrix. The implementation will be implicit in the sense that we do not explicitly compute the QR factorization of a matrix H but rather construct Q implicitly as a product of Givens rotations and other simple orthogonal


matrices. The implicit Q theorem described below shows that this implicitly constructed Q is the Q we want. Then we show how to incorporate a single shift σ, which is necessary to accelerate convergence. To retain real arithmetic in the presence of complex eigenvalues, we then show how to do a double shift, i.e., combine two consecutive QR iterations with complex conjugate shifts σ and σ̄; the result after this double shift is again real. Finally, we discuss strategies for choosing shifts σ and σ̄ to provide reliable quadratic convergence. However, there have been recent discoveries of rare situations where convergence does not occur [25, 65], so finding a completely reliable and fast implementation of QR iteration remains an open problem.

Implicit Q Theorem

Our eventual implementation of QR iteration will depend on the following theorem.

THEOREM 4.9. Implicit Q theorem. Suppose that Q^T A Q = H is unreduced upper Hessenberg. Then columns 2 through n of Q are determined uniquely (up to signs) by the first column of Q.

This theorem implies that to compute A_{i+1} = Q_i^T A_i Q_i from A_i in the QR algorithm, we will need only to

1. compute the first column of Q_i (which is parallel to the first column of A_i − σ_i I and so can be gotten just by normalizing this column vector);

2. choose other columns of Q_i so that Q_i is orthogonal and A_{i+1} is unreduced Hessenberg.

Then by the implicit Q theorem, we know that we will have computed A_{i+1} correctly because Q_i is unique up to signs, which do not matter. (Signs do not matter because changing the signs of the columns of Q_i is the same as changing A_i − σ_i I = Q_i R_i to (Q_i S_i)(S_i R_i), where S_i = diag(±1, ..., ±1). Then A_{i+1} = (S_i R_i)(Q_i S_i) + σ_i I = S_i (R_i Q_i + σ_i I) S_i, which is an orthogonal similarity that just changes the signs of the columns and rows of A_{i+1}.)

Proof of the implicit Q theorem. Suppose that Q^T A Q = H and V^T A V = G are unreduced upper Hessenberg, Q and V are orthogonal, and the first columns of Q and V are equal. Let (X)_i denote the ith column of X. We wish to show (Q)_i = ±(V)_i for all i > 1, or equivalently, that W ≡ V^T Q = diag(±1, ..., ±1).

Since W = V^T Q, we get GW = GV^T Q = V^T A Q = V^T Q H = WH. Now GW = WH implies G(W)_i = (GW)_i = (WH)_i = Σ_{j=1}^{i+1} h_{ji} (W)_j, so h_{i+1,i} (W)_{i+1} = G(W)_i − Σ_{j=1}^{i} h_{ji} (W)_j. Since (W)_1 = [1, 0, ..., 0]^T and G is upper Hessenberg, we can use induction on i to show that (W)_i is nonzero in entries 1 to i only; i.e., W is upper triangular. Since W is also orthogonal, W is diagonal = diag(±1, ..., ±1). □


    Nonsymmetric Eigenvalue Problems 169

Implicit Single Shift QR Algorithm

To see how to use the implicit Q theorem to compute A_1 from A_0 = A, we use a 5-by-5 example.

EXAMPLE 4.10.

1. Choose

    Q_1 = [ c_1   s_1              ]
          [ -s_1  c_1              ]
          [             1          ]
          [                1       ]
          [                   1    ]

so

    A_1 = Q_1^T A Q_1 = [ x x x x x ]
                        [ x x x x x ]
                        [ + x x x x ]
                        [ 0 0 x x x ]
                        [ 0 0 0 x x ].

We discuss how to choose c_1 and s_1 below; for now they may be any Givens rotation. The + in position (3,1) is called a bulge and needs to be gotten rid of to restore Hessenberg form.

2. Choose Q_2, a Givens rotation in rows 2 and 3, so that Q_2^T A_1 has a zero in entry (3,1); then

    A_2 = Q_2^T A_1 Q_2 = [ x x x x x ]
                          [ x x x x x ]
                          [ 0 x x x x ]
                          [ 0 + x x x ]
                          [ 0 0 0 x x ].

Thus the bulge has been "chased" from (3,1) to (4,2).

3. Choose Q_3, a Givens rotation in rows 3 and 4, so that Q_3^T A_2 has a zero in entry (4,2); then

    A_3 = Q_3^T A_2 Q_3 = [ x x x x x ]
                          [ x x x x x ]
                          [ 0 x x x x ]
                          [ 0 0 x x x ]
                          [ 0 0 + x x ].

The bulge has been chased from (4,2) to (5,3).

4. Choose Q_4, a Givens rotation in rows 4 and 5, so that Q_4^T A_3 has a zero in entry (5,3); then

    A_4 = Q_4^T A_3 Q_4 = [ x x x x x ]
                          [ x x x x x ]
                          [ 0 x x x x ]
                          [ 0 0 x x x ]
                          [ 0 0 0 x x ]

so we are back to upper Hessenberg form.

Altogether Q^T A Q is upper Hessenberg, where

    Q = Q_1 Q_2 Q_3 Q_4 = [ c_1  x  x  x  x ]
                          [ s_1  x  x  x  x ]
                          [      x  x  x  x ]
                          [         x  x  x ]
                          [            x  x ]

so the first column of Q is [c_1, s_1, 0, ..., 0]^T, which by the implicit Q theorem has uniquely determined the other columns of Q (up to signs). We now choose the first column of Q to be proportional to the first column of A − σI, namely [a_11 − σ, a_21, 0, ..., 0]^T. This means Q is the same as in the QR decomposition of A − σI, as desired. ◊
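The bulge chase just illustrated is short enough to code directly. The following Python/NumPy sketch (our own illustration, with explicit 2-by-2 rotations rather than a tuned LAPACK-style kernel) performs one implicit single shift QR step on an unreduced Hessenberg matrix; with the shift σ = a_{n,n} of section 4.4.4, the subdiagonal entry a_{n,n−1} typically shrinks quadratically when the relevant eigenvalue is real.

    import numpy as np
    from scipy.linalg import hessenberg

    def givens(a, b):
        # c, s with [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T.
        r = np.hypot(a, b)
        return (1.0, 0.0) if r == 0 else (a / r, b / r)

    def implicit_single_shift_step(H, sigma):
        # One implicit shifted QR step H -> Q^T H Q for unreduced Hessenberg H.
        H = H.copy()
        n = H.shape[0]
        c, s = givens(H[0, 0] - sigma, H[1, 0])  # from 1st column of H - sigma*I
        for k in range(n - 1):
            R = np.array([[c, s], [-s, c]])
            H[k:k+2, :] = R @ H[k:k+2, :]        # rotate rows k, k+1
            H[:, k:k+2] = H[:, k:k+2] @ R.T      # rotate columns k, k+1
            if k < n - 2:                        # next rotation zeroes the bulge
                c, s = givens(H[k+1, k], H[k+2, k])
        return H

    H = hessenberg(np.random.default_rng(1).standard_normal((5, 5)))
    for it in range(4):
        H = implicit_single_shift_step(H, H[-1, -1])
        print(it, abs(H[-1, -2]))   # typically -> 0 quadratically; may stall
                                    # if the bottom eigenvalue is complex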

The cost of one implicit QR iteration for an n-by-n matrix is 6n^2 + O(n).

Implicit Double Shift QR Algorithm

This section describes how to maintain real arithmetic by shifting by σ and σ̄ at the same time. This is essential for an efficient practical implementation but not for a mathematical understanding of the algorithm and may be skipped on a first reading.

The results of shifting by σ and σ̄ in succession are

    A_0 − σI = Q_1 R_1,
    A_1 = R_1 Q_1 + σI,    so A_1 = Q_1^* A_0 Q_1,
    A_1 − σ̄I = Q_2 R_2,
    A_2 = R_2 Q_2 + σ̄I,    so A_2 = Q_2^* A_1 Q_2 = Q_2^* Q_1^* A_0 Q_1 Q_2.

LEMMA 4.5. We can choose Q_1 and Q_2 so that


(1) Q_1 Q_2 is real,
(2) A_2 is therefore real,
(3) the first column of Q_1 Q_2 is easy to compute.

Proof. Since Q_2 R_2 = A_1 − σ̄I = R_1 Q_1 + (σ − σ̄)I, we get

    Q_1 Q_2 R_2 R_1 = Q_1 (R_1 Q_1 + (σ − σ̄)I) R_1
                    = Q_1 R_1 Q_1 R_1 + (σ − σ̄) Q_1 R_1
                    = (A_0 − σI)(A_0 − σI) + (σ − σ̄)(A_0 − σI)
                    = A_0^2 − 2 Re(σ) A_0 + |σ|^2 I ≡ M.

Thus (Q_1 Q_2)(R_2 R_1) is the QR decomposition of the real matrix M, and therefore Q_1 Q_2, as well as R_2 R_1, can be chosen real. This means that A_2 = (Q_1 Q_2)^T A_0 (Q_1 Q_2) is also real.

The first column of Q_1 Q_2 is proportional to the first column of M = A_0^2 − 2 Re(σ) A_0 + |σ|^2 I, which is

    [ a_11^2 + a_12 a_21 − 2 Re(σ) a_11 + |σ|^2 ]
    [ a_21 (a_11 + a_22 − 2 Re(σ))              ]
    [ a_21 a_32                                 ]
    [ 0                                         ]
    [ ...                                       ]

The rest of the columns of Q_1 Q_2 are computed implicitly using the implicit Q theorem. The process is still called "bulge chasing," but now the bulge is 2-by-2 instead of 1-by-1.

EXAMPLE 4.11. Here is a 6-by-6 example of bulge chasing.

1. Choose Q_1 = diag(Q̂_1, I), where the first column of the 3-by-3 matrix Q̂_1 is proportional to the vector given above, so

    A_1 = Q_1^T A Q_1 = [ x x x x x x ]
                        [ x x x x x x ]
                        [ + x x x x x ]
                        [ + + x x x x ]
                        [ 0 0 0 x x x ]
                        [ 0 0 0 0 x x ].

We see that there is a 2-by-2 bulge, indicated by plus signs.

2. Choose a Householder reflection Q_2, which affects only rows 2, 3, and 4 of Q_2^T A_1, zeroing out entries (3,1) and (4,1) of A_1 (this means that Q_2 is the identity matrix outside rows and columns 2 through 4):

    A_2 = Q_2^T A_1 Q_2 = [ x x x x x x ]
                          [ x x x x x x ]
                          [ 0 x x x x x ]
                          [ 0 + x x x x ]
                          [ 0 + + x x x ]
                          [ 0 0 0 0 x x ]

and the 2-by-2 bulge has been "chased" one column.

3. Choose a Householder reflection Q_3, which affects only rows 3, 4, and 5 of Q_3^T A_2, zeroing out entries (4,2) and (5,2) of A_2 (this means that Q_3 is the identity outside rows and columns 3 through 5):

    A_3 = Q_3^T A_2 Q_3 = [ x x x x x x ]
                          [ x x x x x x ]
                          [ 0 x x x x x ]
                          [ 0 0 x x x x ]
                          [ 0 0 + x x x ]
                          [ 0 0 + + x x ].

4. Choose a Householder reflection Q_4, which affects only rows 4, 5, and 6 of Q_4^T A_3, zeroing out entries (5,3) and (6,3) of A_3 (this means that Q_4 is the identity matrix outside rows and columns 4 through 6):

    A_4 = Q_4^T A_3 Q_4 = [ x x x x x x ]
                          [ x x x x x x ]
                          [ 0 x x x x x ]
                          [ 0 0 x x x x ]
                          [ 0 0 0 x x x ]
                          [ 0 0 0 + x x ].

5. Choose a Givens rotation Q_5 in rows 5 and 6, zeroing out entry (6,4) of A_4, so

    A_5 = Q_5^T A_4 Q_5 = [ x x x x x x ]
                          [ x x x x x x ]
                          [ 0 x x x x x ]
                          [ 0 0 x x x x ]
                          [ 0 0 0 x x x ]
                          [ 0 0 0 0 x x ]

and we are back to upper Hessenberg form. ◊
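As a quick numerical sanity check (our own, not from the text), the three nonzero entries of M e_1 given after Lemma 4.5 can be compared against M = A^2 − 2 Re(σ) A + |σ|^2 I formed explicitly:

    import numpy as np
    from scipy.linalg import hessenberg

    A = hessenberg(np.random.default_rng(2).standard_normal((6, 6)))
    sigma = 1.5 + 0.7j
    M = A @ A - 2 * sigma.real * A + abs(sigma) ** 2 * np.eye(6)

    col = np.zeros(6)
    col[0] = A[0, 0]**2 + A[0, 1]*A[1, 0] - 2*sigma.real*A[0, 0] + abs(sigma)**2
    col[1] = A[1, 0] * (A[0, 0] + A[1, 1] - 2 * sigma.real)
    col[2] = A[1, 0] * A[2, 1]
    print(np.allclose(M[:, 0], col))   # True: only the first 3 entries are nonzero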


Choosing a Shift for the QR Algorithm

To completely specify one iteration of either single shift or double shift Hessenberg QR iteration, we need to choose the shift σ (and σ̄). Recall from the end of section 4.4.4 that a reasonable choice of single shift, one that resulted in asymptotic quadratic convergence to a real eigenvalue, was σ = a_{n,n}, the bottom right entry of A. The generalization for double shifting is to use the Francis shift, which means that σ and σ̄ are the eigenvalues of the bottom 2-by-2 corner of A_i:

    [ a_{n-1,n-1}  a_{n-1,n} ]
    [ a_{n,n-1}    a_{n,n}   ].

This will let us converge to either two real eigenvalues in the bottom 2-by-2 corner or a single 2-by-2 block with complex conjugate eigenvalues. When we are close to convergence, we expect a_{n-1,n-2} (and possibly a_{n,n-1}) to be small, so that the eigenvalues of this 2-by-2 matrix are good approximations for eigenvalues of A. Indeed, one can show that this choice leads to quadratic convergence asymptotically. This means that once a_{n-1,n-2} (and possibly a_{n,n-1}) is small enough, its magnitude will square at each step and quickly approach zero. In practice, this works so well that on average only two QR iterations per eigenvalue are needed for convergence for almost all matrices. This justifies calling QR iteration a "direct" method.

In practice, the QR iteration with the Francis shift can fail to converge (indeed, it leaves

    [ 0 0 1 ]
    [ 1 0 0 ]
    [ 0 1 0 ]

unchanged). So the practical algorithm in use for decades had an "exceptional shift" every 10 shifts if convergence had not occurred. Still, tiny sets of matrices where that algorithm did not converge were discovered only recently [25, 65]; matrices in a small neighborhood of

    [ 0  1  0  0 ]
    [ 1  0  h  0 ]
    [ 0 -h  0  1 ]
    [ 0  0  1  0 ]

where h is a few thousand times machine epsilon, form such a set. So another "exceptional shift" was recently added to the algorithm to patch this case. But it is still an open problem to find a shift strategy that guarantees fast convergence for all matrices.
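The fixed point above is easy to observe numerically. The sketch below (ours, using the explicit rather than implicit form of one double shift step) computes the Francis shifts from the bottom 2-by-2 corner of the cyclic permutation matrix, both of which are zero, and shows that the step reproduces the matrix up to signs:

    import numpy as np

    A = np.array([[0., 0., 1.],
                  [1., 0., 0.],
                  [0., 1., 0.]])
    corner = A[1:, 1:]
    s = np.trace(corner)               # sigma + sigma_bar = 0
    p = np.linalg.det(corner)          # sigma * sigma_bar = 0
    M = A @ A - s * A + p * np.eye(3)  # M = A^2 here
    Q, R = np.linalg.qr(M)
    A_next = Q.T @ A @ Q
    print(A_next)   # same pattern as A (up to signs): the iteration stagnates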

4.5. Other Nonsymmetric Eigenvalue Problems

4.5.1. Regular Matrix Pencils and Weierstrass Canonical Form

The standard eigenvalue problem asks for which scalars λ the matrix A − λI is singular; these scalars are the eigenvalues. This notion generalizes in several important ways.


DEFINITION 4.6. A − λB, where A and B are m-by-n matrices, is called a matrix pencil, or just a pencil. Here λ is an indeterminate, not a particular numerical value.

DEFINITION 4.7. If A and B are square and det(A − λB) is not identically zero, the pencil A − λB is called regular. Otherwise it is called singular. When A − λB is regular, p(λ) ≡ det(A − λB) is called the characteristic polynomial of A − λB, and the eigenvalues of A − λB are defined to be

(1) the roots of p(λ) = 0,
(2) ∞ (with multiplicity n − deg(p)) if deg(p) < n.

EXAMPLE 4.12. Let

    A − λB = [ 1 − 2λ           ]
             [          1       ]
             [             −λ   ].

Then p(λ) = det(A − λB) = (1 − 2λ)(1 − 0·λ)(0 − λ), so the eigenvalues are λ = 1/2, 0, and ∞. ◊
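SciPy's generalized eigensolver handles this pencil directly (a small illustration of ours; the infinite eigenvalue is reported as inf because the corresponding diagonal entry of B is zero):

    import numpy as np
    from scipy.linalg import eig

    A = np.diag([1., 1., 0.])
    B = np.diag([2., 0., 1.])
    print(eig(A, B, right=False))   # approximately [0.5, inf, 0.0]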

Matrix pencils arise naturally in many mathematical models of physical systems; we give examples below. The next proposition relates the eigenvalues of a regular pencil A − λB to the eigenvalues of a single matrix.

PROPOSITION 4.6. Let A − λB be regular. If B is nonsingular, all eigenvalues of A − λB are finite and the same as the eigenvalues of AB^{-1} or B^{-1}A. If B is singular, A − λB has eigenvalue ∞ with multiplicity n − rank(B). If A is nonsingular, the eigenvalues of A − λB are the same as the reciprocals of the eigenvalues of A^{-1}B or BA^{-1}, where a zero eigenvalue of A^{-1}B corresponds to an infinite eigenvalue of A − λB.

Proof. If B is nonsingular and λ' is an eigenvalue, then 0 = det(A − λ'B) = det(AB^{-1} − λ'I) · det(B) = det(B) · det(B^{-1}A − λ'I), so λ' is also an eigenvalue of AB^{-1} and B^{-1}A. If B is singular, then take p(λ) = det(A − λB), write the SVD of B as B = UΣV^T, and substitute to get

    p(λ) = det(A − λUΣV^T) = det(U(U^T A V − λΣ)V^T) = ± det(U^T A V − λΣ).

Since rank(B) = rank(Σ), only rank(B) λ's appear in U^T A V − λΣ, so the degree of the polynomial det(U^T A V − λΣ) is at most rank(B).

If A is nonsingular, det(A − λB) = 0 if and only if det(I − λA^{-1}B) = 0 or det(I − λBA^{-1}) = 0. This equality can hold only if λ ≠ 0 and 1/λ is an eigenvalue of A^{-1}B or BA^{-1}. □


DEFINITION 4.8. Let λ' be a finite eigenvalue of the regular pencil A − λB. Then x ≠ 0 is a right eigenvector if (A − λ'B)x = 0, or equivalently Ax = λ'Bx. If λ' = ∞ is an eigenvalue and Bx = 0, then x is a right eigenvector. A left eigenvector of A − λB is a right eigenvector of (A − λB)^*.

EXAMPLE 4.13. Consider the pencil A − λB in Example 4.12. Since A and B are diagonal, the right and left eigenvectors are just the columns of the identity matrix. ◊

EXAMPLE 4.14. Consider the damped mass-spring system from Example 4.1. There are two matrix pencils that arise naturally from this problem. First, we can write the eigenvalue problem

    A x = [ −M^{-1}B  −M^{-1}K ] x = λx
          [     I          0   ]

as

    [ −B  −K ] x = λ [ M  0 ] x.
    [  I   0 ]       [ 0  I ]

This may be a superior formulation if M is very ill-conditioned, so that M^{-1}B and M^{-1}K are hard to compute accurately.

Second, it is common to consider the case B = 0 (no damping), so the original differential equation is Mẍ(t) + Kx(t) = 0. Seeking solutions of the form x(t) = e^{λ_i t} x_i(0), we get λ_i^2 e^{λ_i t} M x_i(0) + e^{λ_i t} K x_i(0) = 0, or λ_i^2 M x_i(0) + K x_i(0) = 0. In other words, −λ_i^2 is an eigenvalue and x_i(0) is a right eigenvector of the pencil K − λM. Since we are assuming that M is nonsingular, these are also an eigenvalue and right eigenvector of M^{-1}K. ◊

Infinite eigenvalues also arise naturally in practice. For example, later in this section we will show how infinite eigenvalues correspond to impulse response in a system described by ordinary differential equations with linear constraints, or differential-algebraic equations [41]. See also Question 4.16 for an application of matrix pencils to computational geometry and computer graphics.

Recall that all of our theory and algorithms for the eigenvalue problem of a single matrix A depended on finding a similarity transformation S^{-1}AS of A that is in "simpler" form than A. The next definition shows how to generalize the notion of similarity to matrix pencils. Then we show how the Jordan form and Schur form generalize to pencils.

DEFINITION 4.9. Let P_L and P_R be nonsingular matrices. Then the pencils A − λB and P_L A P_R − λP_L B P_R are called equivalent.

PROPOSITION 4.7. The equivalent regular pencils A − λB and P_L A P_R − λP_L B P_R have the same eigenvalues. The vector x is a right eigenvector of A − λB if


and only if P_R^{-1} x is a right eigenvector of P_L A P_R − λP_L B P_R. The vector y is a left eigenvector of A − λB if and only if (P_L^*)^{-1} y is a left eigenvector of P_L A P_R − λP_L B P_R.

Proof.

    det(A − λB) = 0 if and only if det(P_L(A − λB)P_R) = 0.
    (A − λB)x = 0 if and only if P_L(A − λB)P_R P_R^{-1} x = 0.
    (A − λB)^* y = 0 if and only if P_R^*(A − λB)^* P_L^* (P_L^*)^{-1} y = 0. □

The following theorem generalizes the Jordan canonical form to regular

matrix pencils.

THEOREM 4.10. Weierstrass canonical form. Let A − λB be regular. Then there are nonsingular P_L and P_R such that

    P_L(A − λB)P_R = diag(J_{n_1}(λ_1) − λI_{n_1}, ..., J_{n_k}(λ_k) − λI_{n_k}, N_{m_1}, ..., N_{m_l}),

where J_{n_i}(λ_i) is an n_i-by-n_i Jordan block with eigenvalue λ_i,

    J_{n_i}(λ_i) = [ λ_i  1           ]
                   [      ⋱    ⋱    ]
                   [           λ_i  1 ]
                   [               λ_i ],

and N_{m_i} is a "Jordan block for λ = ∞ with multiplicity m_i,"

    N_{m_i} = [ 1  −λ          ]
              [     ⋱    ⋱   ]   = I_{m_i} − λJ_{m_i}(0).
              [          1  −λ ]
              [             1  ]

    For a proof, see [110].

Application of Jordan and Weierstrass Forms to Differential Equations

Consider the linear differential equation ẋ(t) = Ax(t) + f(t), x(0) = x_0. An explicit solution is given by x(t) = e^{At} x_0 + ∫_0^t e^{A(t−τ)} f(τ) dτ. If we know the Jordan form A = SJS^{-1}, we may change variables in the differential equation to y(t) = S^{-1}x(t) to get ẏ(t) = Jy(t) + S^{-1}f(t), with solution y(t) = e^{Jt} y_0 + ∫_0^t e^{J(t−τ)} S^{-1} f(τ) dτ. There is an explicit formula to compute e^{Jt}, or any other function f(J) of a matrix in Jordan form J. (We should not use this formula numerically! For the basis of a better algorithm, see Question 4.4.) Suppose that f is given by its Taylor series f(z) = Σ_{i=0}^∞ f^{(i)}(0) z^i / i!, and J is a single Jordan block J = λI + N, where N has ones on the first superdiagonal and zeros elsewhere. Then

    f(J) = Σ_{i=0}^∞ [f^{(i)}(0)/i!] (λI + N)^i
         = Σ_{i=0}^∞ [f^{(i)}(0)/i!] Σ_{j=0}^{i} (i choose j) λ^{i−j} N^j    (by the binomial theorem)
         = Σ_{j=0}^∞ N^j Σ_{i=j}^∞ [f^{(i)}(0)/i!] (i choose j) λ^{i−j}     (reversing the order of summation)
         = Σ_{j=0}^{n−1} N^j Σ_{i=j}^∞ [f^{(i)}(0)/i!] (i choose j) λ^{i−j},

where in the last equality we used the fact that N^j = 0 for j > n − 1. Note that N^j has ones on the jth superdiagonal and zeros elsewhere. Finally, note that Σ_{i=j}^∞ [f^{(i)}(0)/i!] (i choose j) λ^{i−j} is the Taylor expansion for f^{(j)}(λ)/j!. Thus

    f(J) = Σ_{j=0}^{n−1} N^j f^{(j)}(λ)/j!
         = [ f(λ)  f'(λ)  f''(λ)/2  ⋯  f^{(n−1)}(λ)/(n−1)! ]
           [        f(λ)   f'(λ)         ⋮                 ]
           [               ⋱        f''(λ)/2               ]     (4.6)
           [                    ⋱    f'(λ)                 ]
           [                          f(λ)                 ]

so that f(J) is upper triangular with f^{(j)}(λ)/j! on the jth superdiagonal.
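As a numerical illustration of equation (4.6) — ours, and, as the text warns, not a recommended algorithm — take f = exp on a single Jordan block, where every derivative f^{(j)}(λ) equals e^λ, and compare with scipy.linalg.expm:

    import numpy as np
    from math import factorial
    from scipy.linalg import expm

    lam, n = 0.3, 4
    J = lam * np.eye(n) + np.diag(np.ones(n - 1), 1)   # J = lam*I + N
    F = sum(np.diag(np.full(n - j, np.exp(lam) / factorial(j)), j)
            for j in range(n))                          # e^lam/j! on superdiagonal j
    print(np.allclose(F, expm(J)))   # True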

To solve the more general problem Bẋ = Ax + f(t), with A − λB regular, we use the Weierstrass form: let P_L(A − λB)P_R be in Weierstrass form, and rewrite the equation as P_L B P_R P_R^{-1} ẋ = P_L A P_R P_R^{-1} x + P_L f(t). Let P_R^{-1} x = y and P_L f(t) = g(t). Now the problem has been decomposed into independent subproblems:

    diag(I_{n_1}, ..., I_{n_k}, J_{m_1}(0), ..., J_{m_l}(0)) ẏ
        = diag(J_{n_1}(λ_1), ..., J_{n_k}(λ_k), I_{m_1}, ..., I_{m_l}) y + g(t).

Each subproblem ẏ = J_{n_i}(λ_i) y + g(t) ≡ Jy + g(t) is a standard linear ODE as above, with solution

    y(t) = e^{Jt} y(0) + ∫_0^t e^{J(t−τ)} g(τ) dτ.

The solution of J_m(0) ẏ = y + g(t) is gotten by back substitution starting from the last equation: write J_m(0) ẏ = y + g(t) as

    [ 0  1          ] [ ẏ_1 ]   [ y_1 ]   [ g_1 ]
    [     ⋱    ⋱  ] [  ⋮  ] = [  ⋮  ] + [  ⋮  ]
    [          0  1 ] [  ⋮  ]   [  ⋮  ]   [  ⋮  ]
    [             0 ] [ ẏ_m ]   [ y_m ]   [ g_m ].

The mth (last) equation says 0 = y_m + g_m, or y_m = −g_m. The ith equation says ẏ_{i+1} = y_i + g_i, so y_i = ẏ_{i+1} − g_i, and thus

    y_i = −Σ_{k=i}^{m} (d^{k−i}/dt^{k−i}) g_k(t).

Therefore the solution depends on derivatives of g, not an integral of g as in the usual ODE. Thus a continuous g which is not differentiable can cause a discontinuity in the solution; this is sometimes called an impulse response and occurs only if there are infinite eigenvalues. Furthermore, to have a continuous solution, y must satisfy certain consistency conditions at t = 0:

    y_i(0) = −Σ_{k=i}^{m} (d^{k−i}/dt^{k−i}) g_k(0).

Numerical methods, based on time-stepping, for solving such differential-algebraic equations, or ODEs with algebraic constraints, are described in [41].

Generalized Schur Form for Regular Pencils

Just as we cannot compute the Jordan form stably, we cannot compute its generalization by Weierstrass stably. Instead, we compute the generalized Schur form.


THEOREM 4.11. Generalized Schur form. Let A − λB be regular. Then there exist unitary Q_L and Q_R so that Q_L A Q_R = T_A and Q_L B Q_R = T_B are both upper triangular. The eigenvalues of A − λB are then T_{A,ii}/T_{B,ii}, the ratios of the diagonal entries of T_A and T_B.

Proof. The proof is very much like that for the usual Schur form. Let λ' be an eigenvalue and x be a unit right eigenvector: ‖x‖_2 = 1. Since Ax − λ'Bx = 0, both Ax and Bx are multiples of the same unit vector y (even if one of Ax or Bx is zero). Now let X = [x, X̂] and Y = [y, Ŷ] be unitary matrices with first columns x and y, respectively. Then

    Y^* A X = [ a_11  A_12 ]     and     Y^* B X = [ b_11  B_12 ]
              [  0    A_22 ]                       [  0    B_22 ]

by construction. Apply this process inductively to A_22 − λB_22. □

If A and B are real, there is a generalized real Schur form too: real orthogonal Q_L and Q_R, where Q_L A Q_R is quasi-upper triangular and Q_L B Q_R is upper triangular.

The QR algorithm and all its refinements generalize to compute the generalized (real) Schur form; the result is called the QZ algorithm and is available as LAPACK subroutine sgges. In Matlab one uses the command eig(A,B).
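In SciPy the same factorization is exposed as scipy.linalg.qz (a sketch of ours; the double precision LAPACK variants underlie these calls):

    import numpy as np
    from scipy.linalg import qz, eig

    rng = np.random.default_rng(3)
    A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
    TA, TB, Q, Z = qz(A, B, output='complex')   # A = Q @ TA @ Z^*, B = Q @ TB @ Z^*
    print(np.allclose(Q @ TA @ Z.conj().T, A))
    print(np.sort_complex(np.diag(TA) / np.diag(TB)))   # eigenvalues as ratios
    print(np.sort_complex(eig(A, B, right=False)))      # agree with eig(A, B)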

Definite Pencils

A simpler special case that often arises in practice is the pencil A − λB, where A = A^T, B = B^T, and B is positive definite. Such pencils are called definite pencils.

THEOREM 4.12. Let A = A^T, and let B = B^T be positive definite. Then there is a real nonsingular matrix X so that X^T A X = diag(α_1, ..., α_n) and X^T B X = diag(β_1, ..., β_n). In particular, all the eigenvalues α_i/β_i are real and finite.

Proof. The proof that we give is actually the algorithm used to solve the problem:

(1) Let B = LL^T be the Cholesky decomposition.
(2) Let H = L^{-1} A L^{-T}; note that H is symmetric.
(3) Let H = QΛQ^T, with Q orthogonal, Λ real and diagonal.

Then X = L^{-T} Q satisfies X^T A X = Q^T L^{-1} A L^{-T} Q = Λ and X^T B X = Q^T L^{-1} B L^{-T} Q = I. □

Note that the theorem is also true if αA + βB is positive definite for some scalars α and β. Software for this problem is available as LAPACK routine ssygv.
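The three steps of the proof translate directly into code. The sketch below (ours) carries them out with NumPy and checks the result against scipy.linalg.eigh, which solves the generalized symmetric definite problem directly (the double precision analogue of ssygv):

    import numpy as np
    from scipy.linalg import cholesky, eigh

    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 5)); A = (A + A.T) / 2            # A = A^T
    C = rng.standard_normal((5, 5)); B = C @ C.T + 5 * np.eye(5)  # B s.p.d.

    L = cholesky(B, lower=True)             # (1) B = L L^T
    Y = np.linalg.solve(L, A)               #     L^-1 A
    H = np.linalg.solve(L, Y.T).T           # (2) H = L^-1 A L^-T (symmetric)
    lam, Q = np.linalg.eigh(H)              # (3) H = Q diag(lam) Q^T
    X = np.linalg.solve(L.T, Q)             #     X = L^-T Q
    print(np.allclose(X.T @ A @ X, np.diag(lam)))   # X^T A X diagonal
    print(np.allclose(X.T @ B @ X, np.eye(5)))      # X^T B X = I
    print(np.allclose(lam, eigh(A, B, eigvals_only=True)))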

EXAMPLE 4.15. Consider the pencil K − λM from Example 4.14. This is a definite pencil since the stiffness matrix K is symmetric and the mass matrix


M is symmetric and positive definite. In fact, K is tridiagonal and M is diagonal in this very simple example, so M's Cholesky factor L is also diagonal, and H = L^{-1} K L^{-T} is also symmetric and tridiagonal. In Chapter 5 we will consider a variety of algorithms for the symmetric tridiagonal eigenproblem. ◊

4.5.2. Singular Matrix Pencils and the Kronecker Canonical Form

Now we consider singular pencils A − λB. Recall that A − λB is singular if either A and B are nonsquare or they are square and det(A − λB) = 0 for all values of λ. The next example shows that care is needed in extending the definition of eigenvalues to this case.

EXAMPLE 4.16. Let

    A = [ 0  0 ]    and    B = [ 0  0 ].
        [ 0  0 ]               [ 0  0 ]

Then by making arbitrarily small changes to get Ã = diag(ε_1, ε_2) and B̃ = diag(ε_3, ε_4), the eigenvalues become ε_1/ε_3 and ε_2/ε_4, which can be arbitrary complex numbers. So the eigenvalues are infinitely sensitive. ◊
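A two-line numerical experiment (ours) makes the sensitivity vivid: different choices of tiny ε_i yield completely different eigenvalues.

    import numpy as np
    from scipy.linalg import eig

    for eps in [(1e-12, 2e-12, 1e-12, 1e-12), (1e-14, 3e-14, 2e-14, 1e-14)]:
        At, Bt = np.diag(eps[:2]), np.diag(eps[2:])
        print(eig(At, Bt, right=False))   # eigenvalues eps1/eps3 and eps2/eps4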

Despite this extreme sensitivity, singular pencils are used in modeling certain physical systems, as we describe below.

We continue by showing how to generalize the Jordan and Weierstrass forms to singular pencils. In addition to Jordan and "infinite Jordan" blocks, we get two new "singular blocks" in the canonical form.

THEOREM 4.13. Kronecker canonical form. Let A and B be arbitrary rectangular m-by-n matrices. Then there are square nonsingular matrices P_L and P_R so that P_L A P_R − λP_L B P_R is block diagonal with four kinds of blocks:

    J_m(λ') − λI_m :            m-by-m Jordan block for the finite eigenvalue λ';

    N_m = I_m − λJ_m(0) :       m-by-m Jordan block for λ = ∞;

    L_m = [ 1  −λ          ]
          [     ⋱    ⋱   ]    m-by-(m+1) right singular block;
          [          1  −λ ]

    L_m^T :                     (m+1)-by-m left singular block.

We call L_m a right singular block since it has a right null vector [λ^m, λ^{m−1}, ..., 1]^T for all λ; L_m^T has an analogous left null vector. For a proof, see [110].

Just as the Schur form generalized to regular matrix pencils in the last section,

it can be generalized to arbitrary singular pencils as well. For the canonical form, perturbation theory, and software, see [27, 79, 246].

Singular pencils are used to model systems arising in systems and control. We give two examples.

Application of Kronecker Form to Differential Equations

Suppose that we want to solve Bẋ = Ax + f(t), where A − λB is a singular pencil. Write P_L B P_R P_R^{-1} ẋ = P_L A P_R P_R^{-1} x + P_L f(t) to decompose the problem into independent blocks. There are four kinds, one for each kind in the Kronecker form. We have already dealt with the J_m(λ') − λI and N_m blocks when we considered regular pencils and the Weierstrass form, so we have to consider only the L_m and L_m^T blocks. From the L_m blocks we get the m equations in m + 1 unknowns

    ẏ_2 = y_1 + g_1,   or   y_2(t) = y_2(0) + ∫_0^t (y_1(τ) + g_1(τ)) dτ,
    ẏ_3 = y_2 + g_2,   or   y_3(t) = y_3(0) + ∫_0^t (y_2(τ) + g_2(τ)) dτ,
        ⋮
    ẏ_{m+1} = y_m + g_m,   or   y_{m+1}(t) = y_{m+1}(0) + ∫_0^t (y_m(τ) + g_m(τ)) dτ.

This means that we can choose y_1 as an arbitrary integrable function and use the above recurrence relations to get a solution. This is because we have one more unknown than equations, so the ODE is underdetermined. From the L_m^T blocks we get


the m + 1 equations in m unknowns

    0 = y_1 + g_1,
    ẏ_1 = y_2 + g_2,
        ⋮
    ẏ_{m−1} = y_m + g_m,
    ẏ_m = g_{m+1}.

Starting with the first equation, we solve to get

    y_1 = −g_1,
    y_2 = −g_2 − ġ_1,
        ⋮
    y_m = −g_m − ġ_{m−1} − ··· − (d^{m−1}/dt^{m−1}) g_1,

and the consistency condition g_{m+1} = ẏ_m = −ġ_m − g̈_{m−1} − ··· − (d^m/dt^m) g_1. So unless the g_i satisfy this equation, there is no solution. Here we have one more equation than unknowns, and the subproblem is overdetermined.

Application of Kronecker Form to Systems and Control Theory

The controllable subspace of ẋ(t) = Ax(t) + Bu(t) is the space in which the system state x(t) can be "controlled" by choosing the control input u(t) starting at x(0) = 0. This equation is used to model (feedback) control systems, where u(t) is chosen by the control system engineer to make x(t) have certain desirable properties, such as boundedness. From

    x(t) = ∫_0^t e^{A(t−τ)} B u(τ) dτ = ∫_0^t Σ_{i=0}^∞ [A^i B (t−τ)^i / i!] u(τ) dτ
         = Σ_{i=0}^∞ A^i B ∫_0^t [(t−τ)^i / i!] u(τ) dτ,

one can prove that the controllable space is span{[B, AB, A^2 B, ..., A^{n−1} B]}; any components of x(t) outside this space cannot be controlled by varying u(t). To compute this space in practice, in order to determine whether the physical system being modeled can in fact be controlled by input u(t), one applies a QR-like algorithm to the singular pencil [B, A − λI]. For details, see [78, 246, 247].
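As a small illustration (ours; it uses the explicit power basis, which the text advises against for serious computation in favor of a QR-like algorithm on [B, A − λI]), here is the textbook rank test on the double integrator ẍ = u:

    import numpy as np

    def controllability_matrix(A, B):
        n = A.shape[0]
        blocks, X = [], B.copy()
        for _ in range(n):
            blocks.append(X)
            X = A @ X
        return np.hstack(blocks)

    A = np.array([[0., 1.], [0., 0.]])
    B = np.array([[0.], [1.]])          # input acts on the velocity only
    C = controllability_matrix(A, B)
    print(np.linalg.matrix_rank(C))     # 2: the double integrator is controllable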


4.5.3. Nonlinear Eigenvalue Problems

Finally, we consider the nonlinear eigenvalue problem or matrix polynomial

    Σ_{i=0}^{d} λ^i A_i = λ^d A_d + λ^{d−1} A_{d−1} + ··· + λA_1 + A_0.    (4.7)

Suppose for simplicity that the A_i are n-by-n matrices and A_d is nonsingular.

DEFINITION 4.10. The characteristic polynomial of the matrix polynomial (4.7) is p(λ) = det(Σ_{i=0}^{d} λ^i A_i). The roots of p(λ) = 0 are defined to be the eigenvalues. One can confirm that p(λ) has degree d·n, so there are d·n eigenvalues. Suppose that γ is an eigenvalue. A nonzero vector x satisfying Σ_{i=0}^{d} γ^i A_i x = 0 is a right eigenvector for γ. A left eigenvector y is defined analogously by Σ_{i=0}^{d} γ^i y^* A_i = 0.

EXAMPLE 4.17. Consider Example 4.1 once again. The ODE arising there in equation (4.3) is Mẍ(t) + Bẋ(t) + Kx(t) = 0. If we seek solutions of the form x(t) = e^{λ_i t} x_i(0), we get e^{λ_i t}(λ_i^2 M x_i(0) + λ_i B x_i(0) + K x_i(0)) = 0, or λ_i^2 M x_i(0) + λ_i B x_i(0) + K x_i(0) = 0. Thus λ_i is an eigenvalue and x_i(0) is an eigenvector of the matrix polynomial λ^2 M + λB + K. ◊

Since we are assuming that A_d is nonsingular, we can multiply through by A_d^{-1} to get the equivalent problem λ^d I + λ^{d−1} A_d^{-1}A_{d−1} + ··· + A_d^{-1}A_0. Therefore, to keep the notation simple, we will assume A_d = I (see section 4.6 for the general case). In the very simplest case, where each A_i is 1-by-1, i.e., a scalar, the original matrix polynomial is equal to the characteristic polynomial.

We can turn the problem of finding the eigenvalues of a matrix polynomial into a standard eigenvalue problem by using a trick analogous to the one used to change a high-order ODE into a first-order ODE. Consider first the simplest case n = 1, where each A_i is a scalar. Suppose that γ is a root. Then the vector x' = [γ^{d−1}, γ^{d−2}, ..., γ, 1]^T satisfies

    Cx' = [ −A_{d−1}  −A_{d−2}  ⋯  −A_1  −A_0 ] [ γ^{d−1} ]   [ γ^d     ]
          [    1         0      ⋯    0     0  ] [ γ^{d−2} ]   [ γ^{d−1} ]
          [    0         1           0     0  ] [    ⋮    ] = [    ⋮    ] = γx',
          [    ⋮              ⋱           ⋮  ] [    γ    ]   [   γ^2   ]
          [    0         0      ⋯    1     0  ] [    1    ]   [    γ    ]

where the first entry uses γ^d = −(A_{d−1}γ^{d−1} + ··· + A_1 γ + A_0), since γ is a root and A_d = 1.


Thus x' is an eigenvector and γ is an eigenvalue of the matrix C, which is called the companion matrix of the polynomial (4.7). (The Matlab routine roots for finding roots of a polynomial applies the Hessenberg QR iteration of section 4.4.8 to the companion matrix C, since this is currently one of the most reliable, if expensive, methods known [100, 117, 241]. Cheaper alternatives are under development.)

The same idea works when the A_i are matrices. C becomes an (nd)-by-(nd) block companion matrix, where the 1's and 0's below the top row become n-by-n identity and zero matrices, respectively. Also, x' becomes

    x' = [ γ^{d−1} x ]
         [ γ^{d−2} x ]
         [     ⋮     ]
         [    γx     ]
         [     x     ],

where x is a right eigenvector of the matrix polynomial. It again turns out that Cx' = γx'.

    _ M -1 B M -1 KC 0.This is the same as the matrix A in equation 4.4 ofJExample 4.1. o

Finally, Question 4.16 shows how to use matrix polynomials to solve a problem in computational geometry.

4.6. Summary

The following list summarizes all the canonical forms, algorithms, their costs, and applications to ODEs described in this chapter. It also includes pointers to algorithms exploiting symmetry, although these are discussed in more detail in the next chapter. Algorithms for sparse matrices are discussed in Chapter 7.

• A − λI

  - Jordan form: For some nonsingular S,

        A − λI = S · diag(..., J_{n_i}(λ_i) − λI_{n_i}, ...) · S^{-1}.

  - Schur form: For some unitary Q, A − λI = Q(T − λI)Q^*, where T is triangular.

  - Real Schur form of real A: For some real orthogonal Q, A − λI = Q(T − λI)Q^T, where T is real quasi-triangular.

  - Application to ODEs: Provides solution of ẋ(t) = Ax(t) + f(t).

  - Algorithm: Do Hessenberg reduction (Algorithm 4.6), followed by QR iteration to get Schur form (Algorithm 4.5, implemented as described in section 4.4.8). Eigenvectors can be computed from the Schur form (as described in section 4.2.1).

  - Cost: This costs 10n^3 flops if eigenvalues only are desired, 25n^3 if T and Q are also desired, and a little over 27n^3 if eigenvectors are also desired. Since not all parts of the algorithm can take advantage of the Level 3 BLAS, the cost is actually higher than a comparison with the 2n^3 cost of matrix multiply would indicate: instead of taking (10n^3)/(2n^3) = 5 times longer to compute eigenvalues than to multiply matrices, it takes 23 times longer for n = 100 and 19 times longer for n = 1000 on an IBM RS6000/590 [10, page 62]. Instead of taking (27n^3)/(2n^3) = 13.5 times longer to compute eigenvalues and eigenvectors, it takes 41 times longer for n = 100 and 60 times longer for n = 1000 on the same machine. Thus computing eigenvalues of nonsymmetric matrices is expensive. (The symmetric case is much cheaper; see Chapter 5.)

  - LAPACK: sgees for Schur form or sgeev for eigenvalues and eigenvectors; sgeesx or sgeevx for error bounds too.

  - Matlab: schur for Schur form or eig for eigenvalues and eigenvectors.

  - Exploiting symmetry: When A = A^*, better algorithms are discussed in Chapter 5, especially section 5.3.

• Regular A − λB (det(A − λB) ≢ 0)

  - Weierstrass form: For some nonsingular P_L and P_R,

        A − λB = P_L · diag(J_{n_1}(λ_1) − λI_{n_1}, ..., I_{m_1} − λJ_{m_1}(0), ...) · P_R^{-1}.

  - Generalized Schur form: For some unitary Q_L and Q_R, A − λB = Q_L(T_A − λT_B)Q_R^*, where T_A and T_B are triangular.


  - Generalized real Schur form of real A and B: For some real orthogonal Q_L and Q_R, A − λB = Q_L(T_A − λT_B)Q_R^T, where T_A is real quasi-triangular and T_B is real triangular.

  - Application to ODEs: Provides solution of Bẋ(t) = Ax(t) + f(t), where the solution is uniquely determined but may depend nonsmoothly on the data (impulse response).

  - Algorithm: Hessenberg/triangular reduction followed by QZ iteration (QR iteration applied implicitly to AB^{-1}).

  - Cost: Computing T_A and T_B costs 30n^3. Computing Q_L and Q_R in addition costs 66n^3. Computing eigenvectors as well costs a little less than 69n^3 in total. As before, Level 3 BLAS cannot be used in all parts of the algorithm.

  - LAPACK: sgges for Schur form or sggev for eigenvalues; sggesx or sggevx for error bounds too.

  - Matlab: eig for eigenvalues and eigenvectors.

  - Exploiting symmetry: When A = A^*, B = B^*, and B is positive definite, one can convert the problem to finding the eigenvalues of a single symmetric matrix using Theorem 4.12. This is done in LAPACK routines ssygv, sspgv (for symmetric matrices in "packed storage"), and ssbgv (for symmetric band matrices).

• Singular A − λB

  - Kronecker form: For some nonsingular P_L and P_R,

        A − λB = P_L · diag(Weierstrass blocks, ..., L_{n_i}, ..., L_{m_j}^T, ...) · P_R^{-1}.

  - Generalized upper triangular form: For some unitary Q_L and Q_R, A − λB = Q_L(T_A − λT_B)Q_R^*, where T_A and T_B are in generalized upper triangular form, with diagonal blocks corresponding to different parts of the Kronecker form. See [79, 246] for details of the form and algorithms.

  - Cost: The most general and reliable version of the algorithm can cost as much as O(n^4), depending on the details of the Kronecker structure; this is much more than for regular A − λB. There is also a slightly less reliable O(n^3) algorithm [27].


  - Application to ODEs: Provides solution of Bẋ(t) = Ax(t) + f(t), where the solution may be overdetermined or underdetermined.

  - Software: NETLIB/linalg/guptri.

• Matrix polynomials Σ_{i=0}^{d} λ^i A_i [118]

  - If A_d = I (or A_d is square and well-conditioned enough to replace each A_i by A_d^{-1}A_i), then linearize to get the standard problem

        [ −A_{d−1}  −A_{d−2}  ⋯  −A_0 ]
        [     I         0     ⋯    0  ]
        [     0         I          0  ]
        [     ⋮              ⋱    ⋮  ]
        [     0         0     I    0  ].

  - If A_d is ill-conditioned or singular, linearize to get the pencil

        [ A_{d−1}  A_{d−2}  ⋯  A_0 ]       [ −A_d           ]
        [    I                     ]       [       I        ]
        [        ⋱                ]  − λ  [          ⋱    ].
        [              I        0  ]       [             I  ]

4.7. References and Other Topics for Chapter 4

For a general discussion of properties of eigenvalues and eigenvectors, see [139]. For more details about perturbation theory of eigenvalues and eigenvectors, see [161, 237, 52] and chapter 4 of [10]. For a proof of Theorem 4.7, see [69]. For a discussion of Weierstrass and Kronecker canonical forms, see [110, 118]. For their application to systems and control theory, see [246, 247, 78]. For applications to computational geometry, graphics, and mechanical CAD, see [181, 182, 165]. For a discussion of parallel algorithms for the nonsymmetric eigenproblem, see [76].

4.8. Questions for Chapter 4

QUESTION 4.1. (Easy) Let A be defined as in equation (4.1). Show that det(A) = Π_{i=1}^{b} det(A_ii), and then that det(A − λI) = Π_{i=1}^{b} det(A_ii − λI). Conclude that the set of eigenvalues of A is the union of the sets of eigenvalues of A_11 through A_bb.

QUESTION 4.2. (Medium; Z. Bai) Suppose that A is normal; i.e., AA^* = A^*A. Show that if A is also triangular, it must be diagonal. Use this to show that an n-by-n matrix is normal if and only if it has n orthonormal eigenvectors. Hint: Show that A is normal if and only if its Schur form is normal.


QUESTION 4.3. (Easy; Z. Bai) Let λ and μ be distinct eigenvalues of A, let x be a right eigenvector for λ, and let y be a left eigenvector f