Linear Systems & Naive Gaussian Elimination
– Part B –
Prof. Dr. Florian Rupp
German University of Technology in Oman (GUtech)
Introduction to Numerical Methods for ENG & CS
(Mathematics IV)
Spring Term 2017
Today, we will discuss further numerical aspects of naive Gaussian elimination
Today’s topics:
■ Continuing the Computer Lab
■ Computing errors & accuracy of the naive Gaussian algorithm
■ Residual vectors and their meaning
■ Testing the algorithm
■ Ill-conditioning and the Vandermonde matrix
■ Motivation of pivoting strategies
Corresponding textbook chapters: 2.1 and 2.2
Computer Lab – Part 2: Error & Residual Vector
Recap: The naive Gaussian elimination algorithm runs with O(n³) long operations
Phase 1 (forward elimination):
for k = 1 to n − 1
    for i = k + 1 to n
        xmult ← a_{i,k} / a_{k,k}
        a_{i,k} ← xmult
        for j = k + 1 to n
            a_{i,j} ← a_{i,j} − xmult · a_{k,j}
        end for
        b_i ← b_i − xmult · b_k
    end for
end for

Three nested loops of length n with one multiplication in the innermost loop. We assume O(n³) long operations.
Phase 2 (back substitution):
x_n ← b_n / a_{n,n}
for i = n − 1 down to 1
    sum ← b_i
    for j = i + 1 to n
        sum ← sum − a_{i,j} · x_j
    end for
    x_i ← sum / a_{i,i}
end for

Two nested loops of length n with one multiplication in the innermost loop. We assume O(n²) long operations.
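Putting the two phases together, a minimal MATLAB sketch could look as follows; it mirrors the pseudocode above, while the function name and interface are chosen here only for illustration and may differ from our course file NaiveGauss.

function x = naive_gauss_sketch(A, b)
% Minimal sketch of naive Gaussian elimination (no pivoting).
% A: n-by-n matrix, b: n-by-1 right-hand side; returns the solution x.
n = length(b);
% Phase 1: forward elimination (the multipliers overwrite the lower part of A)
for k = 1:n-1
    for i = k+1:n
        xmult = A(i,k) / A(k,k);        % breaks down if the pivot A(k,k) is zero
        A(i,k) = xmult;
        for j = k+1:n
            A(i,j) = A(i,j) - xmult * A(k,j);
        end
        b(i) = b(i) - xmult * b(k);
    end
end
% Phase 2: back substitution
x = zeros(n,1);
x(n) = b(n) / A(n,n);
for i = n-1:-1:1
    s = b(i);
    for j = i+1:n
        s = s - A(i,j) * x(j);
    end
    x(i) = s / A(i,i);
end
end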
Reviewing the highlights from last time (1/3)
Page 375, exercise 1 a (easier version)
Use naive Gaussian elimination to LU-factorize the following matrix into a unit lower triangular matrix L and an upper triangular matrix U:

A := (  1   3   0 )
     (  0  −1   3 )
     ( −3   0   3 ) .

We have

L = (  1   0   0 )              ( 1   3   0 )
    (  0   1   0 ) ,  and  U =  ( 0  −1   3 )
    ( −3  −9   1 )              ( 0   0  30 ) .
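A quick MATLAB check of this factorization (a small sketch using the matrices above): the strictly lower part of L holds exactly the multipliers of the forward elimination, and the product L·U must reproduce A.

A = [1 3 0; 0 -1 3; -3 0 3];
L = [1 0 0; 0 1 0; -3 -9 1];    % multipliers -3 and -9 below the unit diagonal
U = [1 3 0; 0 -1 3; 0 0 30];
norm(L*U - A)                   % returns 0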
Reviewing the highlights from last time (2/3)
Computer Exercise
The upward velocity of a rocket is given at three different times as follows: v1 = 106.8 [m/s] at time t1 = 5 [s], v2 = 177.2 [m/s] at time t2 = 8 [s], and v3 = 279.2 [m/s] at time t3 = 12 [s]. Applying the method of least squares approximation we obtain the linear system

(  5²   5  1 ) ( x1 )   ( 106.8 )
(  8²   8  1 ) ( x2 ) = ( 177.2 )
( 12²  12  1 ) ( x3 )   ( 279.2 ) .
Solve this system with the help of our NaiveGauss-function.
We have:
x1 = 0.29047…, x2 = 19.69047…, and x3 = 1.08571…
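As a quick cross-check in MATLAB (a sketch; our NaiveGauss function should return the same digits, only its calling syntax may differ):

A = [5^2 5 1; 8^2 8 1; 12^2 12 1];   % rows [t_i^2, t_i, 1]
b = [106.8; 177.2; 279.2];           % measured velocities
x = A \ b                            % approx. [0.2905; 19.6905; 1.0857]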
A remark on the computer exercise
The computed solution elements x1, x2, x3 are the coefficients of a quadratic least-squares approximation

p(t) = x1·t² + x2·t + x3 .

Think of it as interpolating the speed of the rocket such that the quadratic error between the approximation and the given data points is minimal. For t ∈ [0, 13] we see that this quadratic interpolation gives very nice results:
[Figure: plot of the quadratic p(t) over roughly t ∈ [4, 13] s, with values between 50 and 350 m/s.]
Back to our computer code
There are still some open tasks:
■ Measure the quality of our computation (see next slide: error & residual vector)
■ Test the performance of the algorithm
■ Include special cases … see “Reviewing the highlights from last time (3/3)”
Residual & error vectors as measurements for the quality of a computation (1/2)
Next, it is important to get a feeling for how good our computed results are.
For a linear system Ax = b, let x be the true solution and x̃ be the computed solution. Then, we define the
■ error vector: e := x̃ − x
■ residual vector: r := Ax̃ − b
Of course, we would accept a solution if e and/or r are small (in some vector norm).
The exact solution x is often not known, so it is common to take the residual vector r as an error measure.
However, due to a possible sensitivity to roundoff errors – i.e., if the problem is ill-conditioned (think of the π-example) – a small (or smaller) residual vector does not mean that there is no better solution of the given linear system.
Residual & error vectors as measurements for the quality of a computation (2/2)
The question of whether a computed solution to a linear system is a good solution is extremely difficult to answer and beyond the scope of this lecture. Though, we will have a short glimpse at the fundamental ideas that lead to satisfactory answers to this question.
An important relationship between the error vector and the residual vector is
Ae = r
because Ae = A(x̃ − x) = Ax̃ − Ax = Ax̃ − b = r .
Please, keep this relationship in mind. We will use it later on when discussing
condition and stability of algorithms.
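A small numerical illustration of this relationship (a sketch; the system is the rocket example from above and the perturbation of the computed solution is artificial):

A = [25 5 1; 64 8 1; 144 12 1];
b = [106.8; 177.2; 279.2];
x      = A \ b;                   % take this as the exact solution x
xtilde = x + 1e-6*[1; -1; 1];     % an artificially perturbed "computed" solution
e = xtilde - x;                   % error vector
r = A*xtilde - b;                 % residual vector
norm(A*e - r)                     % zero up to roundoff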
The famous MATLAB backslash operator
In MATLAB the system of equations Ax = b is solved with the backslash command: x = A\b.
When A is a square n × n matrix, MATLAB automatically examines A to solve the system with the method that gives the least roundoff error and fewest operations.
■ If A is a permutation of a triangular system, then the appropriate triangular solver is used.
■ If A appears to be symmetric and positive definite, then a Cholesky factorization and two triangular solves are attempted.
■ If the Cholesky factorization fails or if A does not appear to be symmetric, an LU factorization and two triangular solves are attempted.
Let us assume that with x = A\b we come as close to the exact solution of a linear system as possible in MATLAB. (Be careful, the problem itself may be ill-conditioned; more on that later.)
Let’s apply this to our computer code
Extending the code:
■ Add the calculation of the error and residual vector for the computed results of the circuit example.
■ How accurate is the result in your opinion?
Computer Lab – Part 3: Testing the Code
Constructing a test case for the naive Gaussian algorithm (1/2)
One good way to test a procedure is to set up an artificial problem whose solution is known beforehand. Sometimes the test problem includes a parameter that can be changed to vary the difficulty. The next example illustrates this:
In general, the Vandermonde matrix is given as

V := ( 1    2       4        8      …    2^(n−1)    )
     ( 1    3       9       27      …    3^(n−1)    )
     ( ⋮                                             )
     ( 1   n+1   (n+1)²   (n+1)³    …   (n+1)^(n−1) ) ,

i.e., V = (v_{i,j})_{i,j=1,…,n} with v_{i,j} = (1 + i)^(j−1)

(you know this matrix already from your computer exercise – the rocket’s speed).
Constructing a test case for the naive Gaussian algorithm (2/2)
If we define a column vector

b := (b_i)_{i=1,…,n} with b_i = (1/i)·((1 + i)^n − 1) ,

we get a nice linear system V x = b, the analytic solution of which is

x = (1, 1, 1, 1, …, 1)^T ∈ R^n .
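In MATLAB this test system can be set up in a few lines (a sketch; the course file VandermondeMat.m may build V in a slightly different way):

n = 8;
i = (1:n)';
V = (1 + i) .^ (0:n-1);        % v_{i,j} = (1+i)^(j-1)
b = ((1 + i).^n - 1) ./ i;     % b_i = ((1+i)^n - 1)/i
x = V \ b;                     % replace the backslash by our NaiveGauss to test it
max(abs(x - 1))                % error against the exact solution of all ones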
Deriving the analytic solution of V x = b (1/2)
The following procedure is similar to recovering the rocket’s speed in your homework. Assume the polynomial

p(t) = 1 + t + t² + t³ + t⁴ + ⋯ + t^(n−1) = Σ_{j=1}^n t^(j−1) = Σ_{j=1}^n x_j·t^(j−1)

is given, where all coefficients x_j are by definition equal to one.
Next, we simply forget that we know the values of the coefficients and want to recover them from the known evaluations of the polynomial p(t) at the integers t = 2, 3, …, n + 1.
Deriving the analytic solution of V x = b (2/2)
This gives the following system of n equations for the n unknowns (one equation for each i = 1, …, n):

Σ_{j=1}^n x_j·(1 + i)^(j−1) = p(1 + i) = Σ_{j=1}^n (1 + i)^(j−1) = ((1 + i)^n − 1) / ((1 + i) − 1) = (1/i)·((1 + i)^n − 1) ,

which is equivalent to V x = b. Here, we used the formula for the sum of a geometric series.
This is actually the ansatz for polynomial interpolation as we will see in a
month or so.
Let’s apply this to our computer code
You will find a file VandermondeMat.m on our Blackboard course page; download it to the same folder as our MATLAB naive Gaussian elimination implementation.
First, for increasing size n = 5, 6, 7, … of the Vandermonde matrix measure
■ the time required by naive Gaussian elimination (tic - toc), and
■ the number of long operations performed.
Next, regarding the accuracy of our computations:
■ modify the code of your program such that a one is subtracted from each component of the output vector x of NaiveGauss for the Vandermonde system, i.e., compute the absolute error (which equals the relative error in this case).
■ What do you recognize for n = 10, 11, 12? (A possible skeleton for these measurements is sketched below.)
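A possible skeleton (the calling syntaxes V = VandermondeMat(n) and x = NaiveGauss(V, b) are assumptions here; adapt them to the actual interfaces of our files):

for n = 5:12
    i = (1:n)';
    V = (1 + i) .^ (0:n-1);             % or: V = VandermondeMat(n);
    b = ((1 + i).^n - 1) ./ i;
    tic;
    x = V \ b;                          % or: x = NaiveGauss(V, b);
    t = toc;
    err = max(abs(x - 1));              % absolute (= relative) error
    fprintf('n = %2d   time = %.2e s   max error = %.2e\n', n, t, err);
end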
Interpretation of the Vandermonde-system test case
When increasing n, we suddenly obtain a huge relative error for one of the components, so that the result of our computation is actually worthless.
At that stage the roundoff error that is present in computing x_i is propagated and magnified throughout the backward substitution phase.
We say that the Vandermonde matrix is ill-conditioned because its solution is so prone to errors (i.e., to small changes in the input data). Note that being ill-conditioned is a property of the problem, not of the algorithm.
A first remark on conditioning
Learn by heart:
Conditioning is a property of the problem, stability is a property of the algorithm, and both have effects on the accuracy of the solution.
Thus, if the answer is not right, the algorithm should not be blamed automatically; the condition of the problem may be bad (ill-conditioned problem).
If the problem is ill-conditioned, no matter what algorithm is used, accuracy cannot be gained.
Reviewing the highlights from last time (3/3)
Page 80, exercise 3 c
Apply naive Gaussian elimination to the following system and account for the failures. Solve the system by other means if possible:
0·x1 + 2·x2 = 4
   x1 −  x2 = 5
Naive Gaussian elimination does not work, because the pivot element a_{1,1} = 0 vanishes.
We have to permute the rows of this system of linear equations first, to get a non-vanishing pivot element. If we do this, we see that the system is already in upper echelon form; backward substitution gives the result.
Naive Gaussian Elimination Can Fail: Reasons & Remedies
Stability of the naive Gaussian elimination
As said, Gaussian elimination can fail. Today, we “pimp up” our naive Gaussian elimination algorithm such that it avoids most reasons for not performing well.
Therefore, we first discuss examples where failure occurs in
■ the forward elimination and/or
■ the backward substitution.
Note:
■ Naive Gaussian elimination is a very efficient algorithm, but it is a highly unstable one. It does not avoid magnifying small errors (we will illustrate this by replacing a_{1,1} = 0 with a_{1,1} = ε ≪ 1 in the next example).
■ Even the most efficient and stable algorithm fails if the problem is ill-conditioned. I.e., there may be problems that you simply cannot solve without transforming them into a well-conditioned equivalent one.
Failure 1: a pivot is close to zero (1/3)
We already know that naive Gaussian elimination fails for the system
( 0  1 ) ( x1 )   ( 1 )
( 1  1 ) ( x2 ) = ( 2 ) .
If a numerical procedure actually fails for some values of the data, then it is likely to fail for data near the failing values, too. To “test” this dictum, we consider the system

( ε  1 ) ( x1 )   ( 1 )
( 1  1 ) ( x2 ) = ( 2 ) ,

in which 0 < ε ≪ 1 is a small positive number different from zero.¹

¹ Negative values of ε with a small absolute value would work analogously.
Failure 1: a pivot is close to zero (2/3)
In this ε-perturbed case the naive Gaussian algorithm works and its forward elimination results in

( ε  1 | 1 )        ( ε     1      |    1     )
( 1  1 | 2 )   −→   ( 0  1 − ε⁻¹  | 2 − ε⁻¹  )
and, analytically, its backward substitution gives for small values of ε:
x2 = (2 − ε⁻¹) / (1 − ε⁻¹) ≈ 1 ,

and

x1 = ε⁻¹·(1 − x2) = ε⁻¹·( 1 − (2 − ε⁻¹)/(1 − ε⁻¹) ) = 1/(1 − ε) ≈ 1 .

If ε is very small, ε⁻¹ is huge. So if the calculation is performed by a computer with finite word length, we will get another picture …
Failure 1: a pivot is close to zero (3/3)
In this ε-perturbed case the naive Gaussian algorithm works and its forward elimination results on a computer with finite word length in

( ε  1 | 1 )        ( ε     1      |    1     )       ( ε    1    |   1    )
( 1  1 | 2 )   −→   ( 0  1 − ε⁻¹  | 2 − ε⁻¹  )   =    ( 0  −ε⁻¹  | −ε⁻¹  )

as 1 − ε⁻¹ ≈ −ε⁻¹ and 2 − ε⁻¹ ≈ −ε⁻¹ if ε⁻¹ ≫ 1. Backward substitution gives

x2 = (−ε⁻¹)/(−ε⁻¹) = 1 ,

and

x1 = ε⁻¹·(1 − x2) = ε⁻¹·0 = 0 .
The relative error for the computed solution of x1 is thus 100 %.
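This failure is easy to reproduce in MATLAB’s double precision (a sketch; ε = 10⁻²⁰ is just an illustrative value):

eps0 = 1e-20;
A = [eps0 1; 1 1];   b = [1; 2];
% forward elimination without pivoting
xmult = A(2,1)/A(1,1);              % huge multiplier 1/eps0 = 1e20
a22 = A(2,2) - xmult*A(1,2);        % 1 - 1e20 is rounded to -1e20
b2  = b(2)   - xmult*b(1);          % 2 - 1e20 is rounded to -1e20
x2  = b2/a22                        % = 1
x1  = (b(1) - A(1,2)*x2)/A(1,1)     % = 0, although the true x1 is about 1
% interchanging the two rows first gives x1 = x2 = 1 up to roundoff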
Side remark: adding/subtracting two numbers of different magnitude
Example
Given an 8-digit decimal machine with a 16-digit accumulator (i.e., the “calculator”) and ε = 10⁻⁹. What is then the result of 2 − ε⁻¹?
To subtract, the computer must interpret the numbers as

ε⁻¹ = 10⁹ = 0.10000000 · 10¹⁰ = 0.1000000000000000 · 10¹⁰
  2       = 0.20000000 · 10¹  = 0.0000000002000000 · 10¹⁰
ε⁻¹ − 2                       = 0.0999999998000000 · 10¹⁰

and finally rounding to 8 decimal digits gives 0.10000000 · 10¹⁰ = ε⁻¹ again, i.e., the 2 is completely absorbed.
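The same absorption effect can be seen in MATLAB’s double precision, which carries roughly 16 significant decimal digits (a small sketch):

(1e17 - 2) == 1e17        % true: the 2 is completely absorbed
1e17 - 2 - 1e17           % returns 0 instead of -2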
Remedy (?): permute the rows, i.e. find a non-zero pivot row
If we permute the equations, the naive Gaussian algorithm works perfectly fine for

( 0  1 | 1 )   I↔II   ( 1  1 | 2 )
( 1  1 | 2 )   −→     ( 0  1 | 1 )      =⇒   x2 = 1 ,  x1 = 1
as well as for
( ε  1 | 1 )   I↔II   ( 1  1 | 2 )        ( 1    1    |    2     )
( 1  1 | 2 )   −→     ( ε  1 | 1 )   −→   ( 0  1 − ε  | 1 − 2ε  )

which gives

x2 = (1 − 2ε)/(1 − ε) ≈ 1 ,   x1 = 2 − x2 ≈ 1 .

Hypothesis: The difficulty of obtaining correct results is not simply due to ε being small, but rather to its being small relative to the other coefficients in the same row.
Failure 2: elements in the same row are of extremely different magnitude (1/2)
Let us “check” this alarming hypothesis first with an example, where all elements in a row are of the same magnitude O(ε). Here, naive Gaussian elimination works without problems:

( ε  2ε | 3ε )                                 ( ε  2ε |  3ε )        ( ε  2ε | 3ε )
( 4   5 |  0 )   (plus −4/ε times the 1st row)  −→  ( 0  −3 | −12 )   −→   ( 0   1 |  4 )   =⇒   x1 = −5 ,  x2 = 4
Next, we consider the worst possible case, where there is actually a non-zero pivot element that is extremely small compared to all other elements in its row. Here, we will face the same problems as in the example where one of the pivot elements was actually very small.
Failure 2: elements in the same row are of extremely different magnitude (2/2)
Let the pivot element be one and the other element in its row be of O(ε⁻¹), then

( 1  ε⁻¹ | ε⁻¹ )        ( 1     ε⁻¹    |    ε⁻¹    )
( 1   1  |  2  )   −→   ( 0  1 − ε⁻¹  |  2 − ε⁻¹  )

and we face exactly the same issue as discussed.
This situation can be resolved again by interchanging the two rows (though there would be no need for that, as the pivot element is clearly different from zero). This gives:

( 1  ε⁻¹ | ε⁻¹ )   I↔II   ( 1   1  |  2  )        ( 1      1     |     2     )
( 1   1  |  2  )   −→     ( 1  ε⁻¹ | ε⁻¹ )   −→   ( 0  ε⁻¹ − 1  |  ε⁻¹ − 2  )

and hence the correct solution

x2 = (ε⁻¹ − 2)/(ε⁻¹ − 1) ≈ 1 ,   x1 = 2 − x2 ≈ 1 .
Thus, an intelligent row pivoting strategy is required
In total we have
naive Gaussian elimination
+ row switching if the pivot element is vanishing relative to the remaining entries in its row
= stable algorithm
This improvement strategy is called scaled pivoting.
We will formalize how a computer can decide whether a “pivot element is vanishing relative to the remaining entries in its row” or not.
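As a preview, a rough sketch of the selection rule behind scaled pivoting (the scale factors and the ratio test below are the textbook's standard choice, assumed here; details next time): among the remaining rows, pick as pivot row the one whose entry in the pivot column is largest relative to the largest entry of its own row.

A = [1 1e9; 1 1];      % the pivot 1 is "vanishing" relative to 1e9 in its row
k = 1;                 % current elimination step
n = size(A,1);
s = max(abs(A), [], 2);                   % s(i) = largest entry of row i
[~, p] = max(abs(A(k:n,k)) ./ s(k:n));    % row with the largest scaled pivot
pivotRow = k - 1 + p                      % = 2, so rows 1 and 2 would be swapped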
Summary & Outlook
Naive Gaussian elimination opens the door to a plethora of further tasks
Naive Gaussian Elimination
■ Pivoting Strategies
■ Computing Costs
■ Matrix Factorization (LU, Cholesky, etc.)
■ Error Estimation (i.e., condition & stability)
■ What happens if the matrix has a special structure?
■ What happens if the matrix is really large?
■ Eigenvalue Computations
Major concepts covered today (1/2): residual & error vectors
■ When solving the linear system Ax = b, if the true or exact solution is x and the approximate or computed solution is x̃, then important quantities are
◆ error vector: e = x̃ − x
◆ residual vector: r = Ax̃ − b
■ For an n × n system of linear equations Ax = b the forward elimination phase of the naive Gaussian elimination involves approximately O(n³) long operations (multiplications or divisions), whereas the back substitution requires only O(n²) long operations.
■ A problem is ill-conditioned if small changes in its input data have a huge impact on its result.
Major concepts covered today (2/2): condition & stability
■ Conditioning is a property of the problem, stability is a property of the algorithm, and both have effects on the accuracy of the solution. Thus, if the answer is not right, the algorithm should not be blamed automatically; the condition of the problem may be bad (ill-conditioned problem). If the problem is ill-conditioned, no matter what algorithm is used, accuracy cannot be gained.
■ The naive Gaussian algorithm is a highly efficient algorithm, but it is not stable, i.e., it does not avoid magnifying small errors. We have seen this in our pivoting example, where we replaced a_{1,1} = 0 with a_{1,1} = ε.
■ The naive Gaussian algorithm is stabilized when we apply a scaled pivoting strategy (row switching) if the pivot element is vanishing relative to the remaining entries in its row.
Preparation for the next lecture (1/ 2)
Please, prepare these short exercises for the next lecture:
1. Page 80, exercise 2
For what values of α ∈ R does naive Gaussian elimination produce erroneous answers for the system

( 1  1 ) ( x1 )   (   2   )
( α  1 ) ( x2 ) = ( 2 + α )
Explain what happens in the computer.
Preparation for the next lecture (2/ 2)
Please, prepare these short exercises for the next lecture:
2. Page 376, exercise 9 (easier version)
Give the LU-factorization of

A := ( 2  −1  2 )
     ( 2  −3  3 )
     ( 6  −1  8 ) .
3. Page 82, computer exercise 9
Carry out the test for our function NaiveGauss based on the Vandermonde matrix but reverse the order of the equations, i.e., in the code replace i by n − i + 1 in the appropriate places.