Part VIIIb: Eigenvalue Conditioning
MA 580; Numerical Analysis I
C. T. Kelley, NC State University
Version of November 14, 2016
NCSU, Fall 2016
© C. T. Kelley, I. C. F. Ipsen, 2016
References
This part of the notes comes from
Applied Numerical Linear Algebra, Demmel, SIAM 1997
Matrix Computations, Golub and Van Loan, Johns Hopkins, 2013
Numerical Linear Algebra, Trefethen and Bau, SIAM 1997
Introduction to Matrix Computations, Stewart, Academic Press, 1973
Eigenvalue Conditioning
Here’s some bad news.
A = \begin{pmatrix}
0 & 1 &        &        &   \\
  & 0 & 1      &        &   \\
  &   & \ddots & \ddots &   \\
  &   &        & 0      & 1 \\
  &   &        &        & 0
\end{pmatrix},
\quad
B = \begin{pmatrix}
0 & 1 &        &        &   \\
  & 0 & 1      &        &   \\
  &   & \ddots & \ddots &   \\
  &   &        & 0      & 1 \\
ε &   &        &        & 0
\end{pmatrix}.

Both are N × N; B differs from A only in the (N, 1) entry ε.
Eigenvalues
A is a Jordan block. So
σ(A) = {0}.
Algebraic multiplicity: N.
Geometric multiplicity: 1.
As for B . . .
Spectrum of B
Suppose Bx = λx and x_1 = 1. Then

\begin{pmatrix}
0 & 1 &        &        &   \\
  & 0 & 1      &        &   \\
  &   & \ddots & \ddots &   \\
  &   &        & 0      & 1 \\
ε &   &        &        & 0
\end{pmatrix}
\begin{pmatrix} 1 \\ x_2 \\ \vdots \\ x_{N-1} \\ x_N \end{pmatrix}
=
\begin{pmatrix} x_2 \\ x_3 \\ \vdots \\ x_N \\ ε \end{pmatrix}
= λ
\begin{pmatrix} 1 \\ x_2 \\ \vdots \\ x_{N-1} \\ x_N \end{pmatrix}
and so . . .
λ = x_2, x_3 = λx_2 = λ^2, . . . , x_N = λx_{N-1} = λ^{N-1}, ε = λx_N = λ^N.

So ε = λ^N. We have N solutions

λ_k = ε^{1/N} e^{2πik/N}, 1 ≤ k ≤ N,

which are evenly spaced on the circle of radius ε^{1/N} in the complex plane.
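The code in these notes is MATLAB; here is a quick sanity check of this formula in plain Python instead (a self-contained sketch; the values N = 5 and ε = 10⁻¹⁰ are arbitrary illustrative choices):

```python
import cmath

# Hypothetical choices for illustration: a 5x5 perturbed Jordan block
# with corner perturbation epsilon = 1e-10.
N = 5
eps = 1e-10

# The N eigenvalues of B: evenly spaced on the circle |z| = eps**(1/N).
lams = [eps ** (1.0 / N) * cmath.exp(2j * cmath.pi * k / N)
        for k in range(1, N + 1)]

for lam in lams:
    # Each lambda solves lambda**N = eps ...
    assert abs(lam ** N - eps) < 1e-12
    # ... and lies on the circle of radius eps**(1/N).
    assert abs(abs(lam) - eps ** (1.0 / N)) < 1e-12
```

Each root has modulus ε^{1/N} = 0.01 here, which is enormous compared to ε = 10⁻¹⁰ itself.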
Condition number
Any sensible definition of condition number is the ratio of

the (relative) size of the change in the output, which is ε^{1/N},

to the size of the change in the input, which is O(ε).

So

κ = ε^{1/N}/ε = ε^{-(N-1)/N} → ∞

as ε → 0. So, Jordan blocks are bad things. Any hope for nicer problems?
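To put numbers on this, a minimal pure-Python sketch (N = 16 and ε = 10⁻¹⁶ are arbitrary choices; only the ratio matters):

```python
# Output change: the zero eigenvalue of the Jordan block moves to the
# circle of radius eps**(1/N).  Input change: a single entry of size eps.
N = 16
eps = 1e-16

out_change = eps ** (1.0 / N)
kappa = out_change / eps

# A perturbation at roughly double-precision roundoff level (1e-16)
# moves the eigenvalues a distance 0.1: a condition number of 1e15.
assert abs(out_change - 0.1) < 1e-12
assert kappa > 1e14

# And kappa = eps**(-(N-1)/N) grows without bound as eps shrinks.
assert (1e-20) ** (1.0 / N) / 1e-20 > kappa
```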
The characteristic polynomial
Recall that the characteristic polynomial of A is

p(z, A) = det(zI − A)

and its roots are the eigenvalues of A. The roots of a polynomial are continuous functions of the coefficients, so

σ(A + δA) → σ(A) as ‖δA‖ → 0.

As the Jordan block example shows, eigenvalues need not be differentiable functions of the coefficients.
Simple eigenvalues, because I'm tired of Jordan blocks
Suppose λ is a simple eigenvalue of A.
Is a nearby simple eigenvalue λ+ δλ of A + δA out there?
Is there a useful definition of condition number?
Left and Right eigenvectors
Ax = λx says "x is a right eigenvector."

σ(A) = σ(A^T), so there's also a left eigenvector:

A^T y = λy, or y^T A = λy^T.

From now on, x will be a right eigenvector, y a left eigenvector, and

‖x‖ = ‖y‖ = 1.
Perturbation theory for simple eigenvalues
Theorem: Assume
λ is a simple eigenvalue of A,
x (y) are normalized right (left) eigenvectors,
λ+ δλ is the eigenvalue of A + δA nearest to λ.
Let θ(y, x) be the acute angle between y and x. Note that sec(θ(y, x)) = 1/|y^T x|. Then . . .
Perturbation estimates
δλ = (y^T δA x)/(y^T x) + O(‖δA‖^2)

and

|δλ| ≤ |y^T δA x|/|y^T x| + O(‖δA‖^2) ≤ sec(θ(y, x)) ‖δA‖ + O(‖δA‖^2).

So sec(θ(y, x)) = 1/|y^T x| is the condition number of the simple eigenvalue λ.
Proof: I
We’ve done things like this for equations.
(A + δA)(x + δx) − Ax = (λ + δλ)(x + δx) − λx,

so

Aδx + δAx + δAδx = λδx + δλx + δλδx.

Ignore for now any term with two δ's in it and multiply by y^T:

y^T Aδx + y^T δAx ≈ λy^T δx + δλ y^T x.

Note that y^T Aδx = λy^T δx because y^T A = λy^T. So . . .
Proof: II
y^T δAx ≈ δλ y^T x,

so

δλ ≈ (y^T δAx)/(y^T x).

The terms we ignored, after multiplying by y^T, are

y^T δAδx and δλ y^T δx.

If we now put them back we get . . .
Proof: III
δλ = (y^T δAx)/(y^T x) + (y^T (δAδx − δλδx))/(y^T x).

The terms we neglected are smaller than the main term if δA is sufficiently small, so

|δλ| = O(‖δA‖/|y^T x|).

We now assume that δA is small enough that we can ignore factors of 1/y^T x in the higher order terms. This means that

(y^T (δAδx − δλδx))/(y^T x) = O(‖δA‖ ‖δx‖).
Proof: IV
The power method says that

‖δx‖ = O(|δλ|)

if δA is sufficiently small. That's it.
Observations
If A = A^T, then |y^T x| = 1, and the conditioning is perfect.

For the Jordan block example,

x = (1, 0, . . . , 0)^T and y = (0, . . . , 0, 1)^T,

so y^T x = 0 and the condition number is infinite.
Gershgorin Theorem
Let B be a square matrix. The eigenvalues of B lie in the union of the disks

G_i = { z : |z − b_ii| ≤ ∑_{j≠i} |b_ij| }, 1 ≤ i ≤ N.
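A small pure-Python check (the 3×3 tridiagonal matrix below is a hypothetical example whose eigenvalues, 2 − √2, 2, and 2 + √2, are known in closed form):

```python
import math

# Hypothetical 3x3 example: symmetric tridiagonal with known spectrum.
B = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
N = 3

# Gershgorin disks: center b_ii, radius sum_{j != i} |b_ij|.
disks = [(B[i][i], sum(abs(B[i][j]) for j in range(N) if j != i))
         for i in range(N)]

eigs = [2.0 - math.sqrt(2.0), 2.0, 2.0 + math.sqrt(2.0)]

# Every eigenvalue lies in the union of the disks.
for lam in eigs:
    assert any(abs(lam - c) <= r + 1e-12 for c, r in disks)
```

Here the disks are centered at 2 with radii 1, 2, 1, so their union is the interval [0, 4], which indeed contains all three eigenvalues.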
Proof of Gershgorin Theorem: I
Let λ ∈ σ(A) and let x be a corresponding eigenvector. Let i besuch that
|xi | = ‖x‖∞.
Since Ax = λx,

(λ − a_ii) x_i = ∑_{j≠i} a_ij x_j.
Proof of Gershgorin Theorem: II
We picked i so that |x_j|/|x_i| ≤ 1 for j ≠ i, so

|λ − a_ii| ≤ ∑_{j≠i} |a_ij| |x_j|/|x_i| ≤ ∑_{j≠i} |a_ij|,

as asserted.
Diagonalizable Matrices and the Bauer-Fike Theorem
Theorem: Suppose

A is diagonalizable with only simple eigenvalues {λ_i},

x_i (y_i) are the normalized right (left) eigenvectors corresponding to λ_i.

Then the eigenvalues of A + δA lie in disks B_i where

B_i = { z : |z − λ_i| ≤ N‖δA‖/|y_i^T x_i| }.
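A minimal numeric check in plain Python, assuming the easiest possible case: A = diag(1, 2) is normal, so S = I and y_i^T x_i = 1, and the eigenvalues of the perturbed 2×2 matrix come from the quadratic formula (the perturbation size e = 0.01 is an arbitrary choice):

```python
import math

# Hypothetical check: A = diag(1, 2), perturbed by the symmetric
# delta_A = [[0, e], [e, 0]], whose 2-norm is e.
e = 0.01
lam1, lam2 = 1.0, 2.0
N = 2

# Eigenvalues of A + delta_A = [[1, e], [e, 2]] via the quadratic formula.
tr, det = lam1 + lam2, lam1 * lam2 - e * e
disc = math.sqrt(tr * tr - 4.0 * det)
mus = [(tr - disc) / 2.0, (tr + disc) / 2.0]

# Each perturbed eigenvalue lies in some disk of radius N*||delta_A||.
for mu in mus:
    assert min(abs(mu - lam1), abs(mu - lam2)) <= N * e
```

The perturbed eigenvalues actually move only O(e²) here, well inside the disks of radius Ne, since a normal matrix is as well conditioned as the theorem allows.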
Proof: preliminaries
Lemma (Diagonalization): Let S be the matrix with the right eigenvectors as columns. Then

S^{-1} = ( y_1/(y_1^T x_1), y_2/(y_2^T x_2), . . . , y_N/(y_N^T x_N) )^T.

Proof: plug in.
Proof: more preliminaries
Lemma: Suppose the columns of S are normalized (‖s_i‖ = 1). Then ‖S‖ ≤ √N.

Proof: Let x be the unit vector so that ‖Sx‖ = ‖S‖. Use Cauchy-Schwarz,

∑_{i=1}^N |a_i| |b_i| ≤ ‖a‖_2 ‖b‖_2:

‖S‖ = ‖Sx‖ = ‖∑_{i=1}^N s_i x_i‖ ≤ ∑_{i=1}^N ‖s_i‖ |x_i| ≤ √(∑_{i=1}^N ‖s_i‖^2) √(∑_{i=1}^N x_i^2) = √N.

(The last step uses ‖s_i‖ = 1 and ‖x‖_2 = 1.)
Proof of Bauer-Fike: I
Note that S is the diagonalizing transformation for A, so

S^{-1}AS = Λ.

Apply Gershgorin to

B = S^{-1}(A + δA)S = Λ + F,

where F = S^{-1}δAS. Gershgorin says that the eigenvalues of B lie in the disks

G_i = { λ : |λ − (λ_i + f_ii)| ≤ ∑_{j≠i} |f_ij| }.
Proof of Bauer-Fike: II
Since

G_i = { λ : |λ − (λ_i + f_ii)| ≤ ∑_{j≠i} |f_ij| },

any λ ∈ G_i satisfies

|λ − λ_i| − |f_ii| ≤ ∑_{j≠i} |f_ij|, which implies that

|λ − λ_i| ≤ ∑_{j=1}^N |f_ij| ≤ √N √(∑_{j=1}^N |f_ij|^2) = √N ‖F(i, :)‖.
Proof of Bauer-Fike: III
So we need a bound on the ith row of F. Note that if B = B_1 B_2, then

‖B(i, :)‖ ≤ ‖B_1(i, :)‖ ‖B_2‖,

as you can see from the rules for matrix-matrix multiply. So, since F = S^{-1}δAS,

‖F(i, :)‖ ≤ ‖S^{-1}(i, :)‖ ‖δA‖ ‖S‖,

and we have formulae to estimate all this stuff . . .
Proof of Bauer-Fike: IV
Since the columns of S are the normalized eigenvectors,

‖S‖ ≤ √N

by one of the lemmas.

Use the other lemma and ‖y_i‖ = 1 to get

‖S^{-1}(i, :)‖ ≤ 1/|y_i^T x_i|,

and glue everything together to get . . .
Proof of Bauer-Fike: V
‖F(i, :)‖ ≤ (√N/|y_i^T x_i|) ‖δA‖.

Plug into

|λ − λ_i| ≤ ∑_{j=1}^N |f_ij| ≤ √N √(∑_{j=1}^N |f_ij|^2) = √N ‖F(i, :)‖

and we're done.
ℓ_p estimates: Bauer-Fike revisited
Theorem: Suppose

A is diagonalizable with eigenvalues {λ_i},

x_i are the normalized right eigenvectors corresponding to λ_i,

μ ∈ σ(A + δA).

Then

min_{λ∈σ(A)} |μ − λ| ≤ κ_p(S) ‖δA‖_p,

where S is the matrix whose columns are the eigenvectors of A.
Proof: I
If μ ∈ σ(A), then the left side of the estimate is 0. Here we let ‖ · ‖ be any ℓ_p norm.

Otherwise, the matrix Λ − μI is not singular, but

S^{-1}(A + δA − μI)S = S^{-1}(A − μI)S + S^{-1}δAS = (Λ − μI) + S^{-1}δAS

is singular.
Proof: II
Multiply the singular matrix by (Λ − μI)^{-1} to see that

I + (Λ − μI)^{-1}(S^{-1}δAS)

is also singular. Hence

1 ≤ ‖(Λ − μI)^{-1}(S^{-1}δAS)‖ ≤ ‖(Λ − μI)^{-1}‖ ‖S^{-1}‖ ‖δA‖ ‖S‖ = max_{λ∈σ(A)} (1/|λ − μ|) κ_p(S) ‖δA‖.

That's it, since max(1/x) = 1/(min x).
The QR algorithm
Consider this iteration:
A_0 = A
for k = 0, 1, . . . do
    Factor A_k = QR
    A_{k+1} = RQ
end for
What does this have to do with eigenvalues?
Note that

A_{k+1} = RQ = Q^T (QR) Q = Q^T A_k Q

is similar to A_k, so it has the same eigenvalues. Let's give it a shot.
A=[1 2 3; 4 5 6; 7 8 9];
for i=1:10
[q,r]=qr(A); A=r*q;
end
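The same experiment can be done without MATLAB. Below is a plain-Python sketch for a 2×2 symmetric matrix, where the QR factorization is a single Givens rotation (the matrix [[2, 1], [1, 2]], with eigenvalues 3 and 1, is a hypothetical example):

```python
import math

# Hypothetical 2x2 symmetric example with eigenvalues 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]

def qr2(M):
    # QR factorization of a 2x2 matrix with one Givens rotation chosen
    # to zero out the (2,1) entry: M = Q R, Q orthogonal, R upper triangular.
    r = math.hypot(M[0][0], M[1][0])
    c, s = M[0][0] / r, M[1][0] / r
    Q = [[c, -s], [s, c]]
    # R = Q^T M; the (2,1) entry is zero by construction.
    R = [[c * M[0][0] + s * M[1][0], c * M[0][1] + s * M[1][1]],
         [0.0, -s * M[0][1] + c * M[1][1]]]
    return Q, R

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

for _ in range(30):
    Q, R = qr2(A)
    A = matmul(R, Q)  # A_{k+1} = RQ is similar to A_k

# The iterates converge to diag(3, 1): the eigenvalues appear on the
# diagonal and the off-diagonal entries go to zero.
assert abs(A[0][0] - 3.0) < 1e-8
assert abs(A[1][1] - 1.0) < 1e-8
assert abs(A[1][0]) < 1e-8 and abs(A[0][1]) < 1e-8
```

After 30 unshifted iterations the iterate is numerically diag(3, 1); the off-diagonal entry shrinks by roughly the eigenvalue ratio 1/3 per step.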
Results
The eigenvalues are
>> eig(A)
ans =
1.6117e+01
-1.1168e+00
-1.3037e-15
and when the loop’s done
A =
1.6117e+01 4.8990e+00 -6.9295e-16
-8.0448e-11 -1.1168e+00 1.6506e-15
0 0 0
What?
For diagonalizable A with distinct real eigenvalues
The iteration converges to an upper triangular matrix,
which is similar to A,
and therefore has the same eigenvalues.
You can understand this via the power method. This is the core of MATLAB's eig code.
What a real code must do
Reduce A to a form with a cheap QR factorization (upper Hessenberg),
deal with multiple eigenvalues,
deal with complex conjugate pairs of eigenvalues,
build in shifts, . . .
A feel-good theorem
Suppose:
A is symmetric.
A is nonsingular.
The QR iterates A_n, Q_n, R_n converge to Ā, Q̄, R̄.

Then Ā is diagonal with the eigenvalues of A along the diagonal.
Feel-good proof: I
Convergence implies that

Ā = Q̄R̄ = R̄Q̄.

Then symmetry implies that

Ā^T = Q̄^T R̄^T = R̄^T Q̄^T = Ā = Q̄R̄ = R̄Q̄.

So

R̄^T R̄ = R̄^T Q̄^T Q̄ R̄ = Ā^T Ā = Ā^2 = R̄ Q̄ Q̄^T R̄^T = R̄ R̄^T.
Feel-good proof: II
Since R̄ is upper triangular and

R̄^T R̄ = R̄ R̄^T,

R̄ is diagonal. Let's prove this.

Lemma: Suppose U is upper triangular and UU^T = U^T U. Then U is diagonal.
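Before the induction proof, a tiny pure-Python sanity check of the lemma on hypothetical 2×2 instances:

```python
# Sanity check of the lemma: an upper triangular U satisfies
# U U^T = U^T U only when it is diagonal.
def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

U = [[1.0, 0.5], [0.0, 2.0]]  # upper triangular, not diagonal
assert mul(U, transpose(U)) != mul(transpose(U), U)

D = [[1.0, 0.0], [0.0, 2.0]]  # diagonal
assert mul(D, transpose(D)) == mul(transpose(D), D)
```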
Proof of Lemma: I
The proof is via induction. It's clear for N = 1. Assume that the theorem holds for dimensions up to N − 1. Let U be N × N upper triangular and decompose it as

U = \begin{pmatrix} U_1 & x \\ 0 & α \end{pmatrix},

where U_1 is (N − 1) × (N − 1) upper triangular, x ∈ R^{N−1}, and α is real. Assume that UU^T = U^T U; then . . .
Proof of Lemma: II
UU^T = \begin{pmatrix} U_1 & x \\ 0 & α \end{pmatrix} \begin{pmatrix} U_1^T & 0 \\ x^T & α \end{pmatrix} = \begin{pmatrix} U_1 U_1^T + xx^T & αx \\ αx^T & α^2 \end{pmatrix}

U^T U = \begin{pmatrix} U_1^T & 0 \\ x^T & α \end{pmatrix} \begin{pmatrix} U_1 & x \\ 0 & α \end{pmatrix} = \begin{pmatrix} U_1^T U_1 & U_1^T x \\ x^T U_1 & α^2 + x^T x \end{pmatrix}

Equating the (2,2) blocks gives α^2 = α^2 + x^T x, so x = 0 and . . .
Proof of Lemma: III
U_1^T U_1 = U_1 U_1^T, so

U_1 is diagonal by the induction hypothesis.

x = 0 implies that U is diagonal.
We are almost done.
Feel-good proof: III
Now that R̄ is diagonal, we can use

Ā = Q̄R̄ = R̄Q̄ and Ā = Ā^T

to conclude that

Q̄R̄ = Q̄^T R̄^T = Q̄^T R̄

(using R̄^T = R̄ for diagonal R̄); since A is nonsingular, so is R̄, and we must have

Q̄ = Q̄^T = Q̄^{-1}.
Feel-good proof: IV
So R̄ is diagonal and Q̄ is symmetric. This means that

Ā^2 = Q̄R̄R̄Q̄ = Q̄R̄^2 Q̄

is a spectral decomposition of Ā^2, so

the columns of Q̄ are eigenvectors of Ā^2,

and hence they are eigenvectors of Ā (symmetry).

So I can order the eigenvalues of A so that

Ā = Q̄ΛQ̄

is a spectral decomposition of Ā.
Feel-good proof: V
We're done because

Q̄Λ = ĀQ̄ = R̄Q̄Q̄ = R̄,

which means that

Λ = Q̄R̄ = Ā

is diagonal, and the eigenvalues of A (which are the eigenvalues of Ā) are its diagonal entries.
Convergence Theory for Happy Matrices
Assume that A has real distinct eigenvalues and
|λ_1| < |λ_2| < . . . < |λ_N|.

Then A_n → R where R has the eigenvalues of A on the diagonal. If A is symmetric, then A_n → Λ. Moreover

‖R − A_n‖ = O([max_i |λ_i|/|λ_{i+1}|]^n),

which sure does smell like the power method.