Numerical Analysis — an Introduction

Review

www.maths.lth.se/na/courses/FMN011

Carmen Arévalo

Textbook: Numerical Analysis, by Timothy Sauer.

Pearson Addison Wesley.

Numerisk Analys, Matematikcentrum, Lunds Universitet, 2013

Error analysis

◮ the absolute error is E_p = |p − p̂|, where p̂ approximates p

◮ the relative error is R_p = |p − p̂| / |p|

◮ correct (significant) digits

◮ types of errors: truncation, round-off, noise

◮ loss of significant digits

◮ If f(r) = 0 and x approximates r, the residual is |f(x)| and the error is |r − x|. Desirable: small residual ⇒ small error

Bisection theorem (to solve f(x) = 0)

Suppose

◮ f is continuous in [a, b]

◮ f(r) = 0 for some r ∈ [a, b]

◮ f(a) and f(b) have opposite signs

If {c_n} is the sequence produced by the bisection method, then

|r − c_n| ≤ (b_n − a_n)/2 = (b − a)/2^{n+1},

so lim_{n→∞} c_n = r
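A minimal Python sketch of the method (function name, tolerance, and example are illustrative, not from the slides):

```python
def bisect(f, a, b, tol=1e-10):
    """Bisection: assumes f continuous on [a, b] with f(a)*f(b) < 0."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while (b - a) / 2 > tol:
        c = (a + b) / 2              # midpoint c_n
        if f(a) * f(c) <= 0:         # the root lies in [a, c]
            b = c
        else:                        # the root lies in [c, b]
            a = c
    return (a + b) / 2

# Example: the root of x^3 + x - 1 = 0 on [0, 1]
print(bisect(lambda x: x**3 + x - 1, 0.0, 1.0))
```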

Fixed Point Iteration (to solve f(x) = 0)

Rewrite f(x) = 0 as x = g(x)

r is a fixed point of the function g if r = g(r)

Theorem

g : [a, b] → R has a unique fixed point if:

◮ g is continuous on [a, b]

◮ g : [a, b] → [a, b] (assures existence)

◮ |g′(x)| < 1 for all x ∈ [a, b] (assures uniqueness)

A fixed point iteration has the form pk+1 = g(pk)

◮ |g′(r)| ≤ K < 1 ⇒ {p_n} → r

◮ |g′(r)| > 1 ⇒ {p_n} ↛ r
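A small illustrative Python sketch (names and stopping rule are my own, not from the slides):

```python
import math

def fixed_point(g, p0, tol=1e-10, maxit=100):
    """Iterate p_{k+1} = g(p_k); converges when |g'(r)| < 1 near r."""
    p = p0
    for _ in range(maxit):
        p_new = g(p)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    raise RuntimeError("no convergence; check that |g'(r)| < 1")

# Example: x = cos(x) has a fixed point near 0.739 (|g'(r)| < 1 there)
print(fixed_point(math.cos, 1.0))
```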

Newton-Raphson Method (to solve f(x) = 0)

To solve f(x) = 0 with quadratic convergence

p_{k+1} = p_k − f(p_k)/f′(p_k)

Multiple roots: linear convergence; modified Newton's method for a root of multiplicity m (quadratic convergence):

p_{k+1} = p_k − m f(p_k)/f′(p_k)

Secant method: convergence rate ≈ 1.6

p_k = p_{k−1} − f(p_{k−1})(p_{k−1} − p_{k−2}) / (f(p_{k−1}) − f(p_{k−2}))
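Hedged Python sketches of both iterations (the stopping criteria are illustrative):

```python
def newton(f, fprime, p0, tol=1e-12, maxit=50):
    """Newton-Raphson: p_{k+1} = p_k - f(p_k)/f'(p_k); quadratic for simple roots."""
    p = p0
    for _ in range(maxit):
        p_new = p - f(p) / fprime(p)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

def secant(f, p0, p1, tol=1e-12, maxit=50):
    """Secant method: replaces f' by a difference quotient; rate about 1.6."""
    for _ in range(maxit):
        p2 = p1 - f(p1) * (p1 - p0) / (f(p1) - f(p0))
        if abs(p2 - p1) < tol:
            return p2
        p0, p1 = p1, p2
    return p1

# Example: the root of x^2 - 2 (i.e., sqrt(2))
print(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0))
print(secant(lambda x: x * x - 2, 1.0, 2.0))
```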

Newton’s Method for Systems

If f(x) = [f_1(x), . . . , f_n(x)]^T, Newton's method has the form

p_{k+1} = p_k − J_f(p_k)^{−1} f(p_k)

where J_f(x) = [∂f(x)/∂x_1, ∂f(x)/∂x_2, . . . , ∂f(x)/∂x_n] is the Jacobian matrix of f
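A sketch in Python with NumPy; in practice one solves J_f(p_k) s = −f(p_k) instead of inverting the Jacobian (the example system is my own):

```python
import numpy as np

def newton_system(f, jac, p0, tol=1e-12, maxit=50):
    """Multivariate Newton: p_{k+1} = p_k + s, where J_f(p_k) s = -f(p_k)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(maxit):
        s = np.linalg.solve(jac(p), -f(p))   # linear solve, no inverse
        p = p + s
        if np.linalg.norm(s) < tol:
            break
    return p

# Example: u^2 + v^2 = 1, u - v = 0  ->  (1/sqrt(2), 1/sqrt(2))
f = lambda p: np.array([p[0]**2 + p[1]**2 - 1.0, p[0] - p[1]])
jac = lambda p: np.array([[2*p[0], 2*p[1]], [1.0, -1.0]])
print(newton_system(f, jac, [1.0, 0.5]))
```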

Solving a system of equations, Ax = b

Equivalent systems have the same solution

Elementary operations on rows that yield an equivalent system:

◮ Row interchanges

◮ Multiplication by a constant

◮ row_r ← row_r − m_{rp} × row_p (subtract a multiple of one row from another)

To solve a system:

1. Perform a Gaussian elimination (to obtain an upper triangular matrix)

2. Perform a back substitution

Solving Triangular Linear Systems

Upper triangular matrix: back substitution

Lower triangular matrix: forward substitution

Computational complexity: total number of operations = N^2
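A minimal NumPy sketch of back substitution (forward substitution is the mirror image):

```python
import numpy as np

def back_substitution(U, b):
    """Solve Ux = b for upper triangular U; about N^2 operations."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                    # last equation first
        x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0], [0.0, 3.0]])
print(back_substitution(U, np.array([4.0, 6.0])))     # -> [1, 2]
```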

Triangular factorization, A = LU

Ax = b

1. Solve Ly = b with forward substitution to get y

2. Use y in Ux = y and solve with back substitution to get x

Computational complexity:

Total number of operations = 2N^3/3 − N^2/2 − N/6
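A sketch using SciPy's LU routines (assuming SciPy is available); the point is that the O(N^3) factorization is done once and each new right-hand side costs only O(N^2):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 3.0], [6.0, 3.0]])
b = np.array([10.0, 12.0])

lu, piv = lu_factor(A)          # one factorization (with pivoting)
x = lu_solve((lu, piv), b)      # forward + back substitution
print(x)                        # -> [1, 2]
```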

Vector and matrix norms

◮ 1-norm: ‖x‖_1 = Σ_{i=1}^n |x_i|;  ‖A‖_1 = max_j Σ_{i=1}^n |a_ij|

◮ 2-norm: ‖x‖_2 = ( Σ_{i=1}^n |x_i|^2 )^{1/2};  ‖A‖_2 = √( ρ(A^T A) )

◮ ∞-norm: ‖x‖_∞ = max_i |x_i|;  ‖A‖_∞ = max_i Σ_{j=1}^n |a_ij|
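These norms can be checked numerically with NumPy (a small illustration, not course code):

```python
import numpy as np

x = np.array([1.0, -2.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

print(np.linalg.norm(x, 1), np.linalg.norm(x, 2), np.linalg.norm(x, np.inf))
print(np.linalg.norm(A, 1))        # maximum absolute column sum
print(np.linalg.norm(A, 2))        # sqrt of the spectral radius of A^T A
print(np.linalg.norm(A, np.inf))   # maximum absolute row sum
```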

Ill conditioning and pivoting

Ax = b is ill conditioned if small perturbations in the coefficients of A or b produce large changes in x

κ_p(A) = ‖A‖_p · ‖A^{−1}‖_p

If κ(A) ≈ 10^k, about k significant digits will be lost in solving Ax = b.

Partial pivoting: choose the entry of largest magnitude in the column as the pivot

LU factorization with pivoting

Permutation matrix P: its rows are a permutation of the rows of I (so P is orthogonal and P^{−1} = P^T).

If A is nonsingular, there is a P such that PA = LU

Ax = b ⇒ LUx = Pb

1. Compute L, U and P

2. Compute Pb

3. Solve Ly = Pb with forward substitution

4. Solve Ux = y with backward substitution

Iterative Methods for Linear Systems

Given x0, we construct the method

xk+1 = Bxk + c

so that a fixed point of g(x) = Bx+ c is a solution of Ax = b.

A = M −N with M nonsingular

xk+1 = M−1Nxk +M−1b

x_0 can be arbitrary; however, convergence will be faster if we start with a good guess of the solution.

Jacobi, Gauss-Seidel and SOR methods

Separate A into lower triangular, diagonal, and upper triangular parts: A = L + D + U

◮ Jacobi: M = D

◮ Gauss-Seidel: M = L+D

◮ SOR: accelerates GS with parameter 1 ≤ ω < 2

A is strictly diagonally dominant if |a_kk| > Σ_{j=1, j≠k}^N |a_kj| for each k

If A is strictly diagonally dominant, then these methods converge for any choice of x_0.
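A minimal Jacobi sketch in NumPy (Gauss-Seidel differs only in using updated entries immediately); the example matrix is strictly diagonally dominant, so convergence is guaranteed:

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, maxit=200):
    """Jacobi: M = D, so x_{k+1} = D^{-1} (b - (L + U) x_k)."""
    D = np.diag(A)                    # diagonal of A
    R = A - np.diagflat(D)            # off-diagonal part, L + U
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([5.0, 5.0])
print(jacobi(A, b, np.zeros(2)))      # -> [1, 2]
```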

Convergence Theorems

Spectral radius of A: the radius of the smallest circle centered at 0 in the complex plane that contains all eigenvalues of A

ρ(A) = max{|λ| : det(λI −A) = 0}

Suppose we have an iterative method

xk+1 = Bxk + c

1. The iterative method converges for any x_0 if ‖B‖_p < 1 for some p.

2. The iterative method converges for any x_0 if and only if ρ(B) < 1.

Interpolation

y = f(x) interpolates {(x_1, y_1), (x_2, y_2), . . . , (x_n, y_n)} if f(x_i) = y_i for each i = 1, 2, . . . , n

Basis functions Φ_1, Φ_2, . . . , Φ_n:  f(x) = Σ_{j=1}^n y_j Φ_j(x)

To determine the coefficients y_j, solve

[ Φ_1(x_1)  Φ_2(x_1)  · · ·  Φ_n(x_1) ] [ y_1 ]   [ f(x_1) ]
[ Φ_1(x_2)  Φ_2(x_2)  · · ·  Φ_n(x_2) ] [ y_2 ] = [ f(x_2) ]
[    ...       ...    . . .     ...   ] [ ... ]   [  ...   ]
[ Φ_1(x_n)  Φ_2(x_n)  · · ·  Φ_n(x_n) ] [ y_n ]   [ f(x_n) ]

Polynomial interpolation

There is a unique polynomial of degree at most n − 1 through n distinct points

◮ Monomial: {1, x, x^2, . . . , x^{n−1}}; Vandermonde matrix

◮ Lagrange: L_j(x) = Π_{k=1, k≠j}^n (x − x_k) / Π_{k=1, k≠j}^n (x_j − x_k); identity matrix

◮ Newton: 1, x − x_1, . . . , (x − x_1)(x − x_2) · · · (x − x_{n−1}); triangular matrix (table of divided differences)

◮ Bernstein: B_i^n(t) = (n choose i) (1 − t)^{n−i} t^i,  t ∈ [0, 1]
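A small pure-Python sketch that evaluates the interpolant in Lagrange form (data and names are illustrative):

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the interpolating polynomial sum_j ys[j] * L_j(x)."""
    total = 0.0
    n = len(xs)
    for j in range(n):
        Lj = 1.0
        for k in range(n):
            if k != j:
                Lj *= (x - xs[k]) / (xs[j] - xs[k])   # Lagrange basis factor
        total += ys[j] * Lj
    return total

# Example: the parabola through (0,1), (1,2), (2,5) is x^2 + 1
print(lagrange_eval([0.0, 1.0, 2.0], [1.0, 2.0, 5.0], 1.5))   # -> 3.25
```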

Interpolation error and Chebyshev nodes

f(x) − P(x) = f^{(n)}(θ)/n! · (x − x_1)(x − x_2) . . . (x − x_n)

where θ ∈ [x_1, x_n] is unknown.

The error is reduced by choosing {x_1, x_2, . . . , x_n} as the zeros of the Chebyshev polynomials.

These nodes minimize the maximum of e(x) = |(x − x_1)(x − x_2) . . . (x − x_n)|, and the error e (not the set of points) is distributed evenly over [−1, 1].

To interpolate on [a, b], take the Chebyshev nodes on [−1, 1] and use the transformation

x = (b + a)/2 + (b − a)/2 · t,  t ∈ [−1, 1],

to get the nodes on [a, b].
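A sketch of the node computation, using the standard formula cos((2i − 1)π/(2n)) for the zeros of the degree-n Chebyshev polynomial:

```python
import numpy as np

def chebyshev_nodes(a, b, n):
    """Chebyshev nodes on [a, b]: roots of T_n mapped from [-1, 1]."""
    i = np.arange(1, n + 1)
    t = np.cos((2 * i - 1) * np.pi / (2 * n))   # nodes on [-1, 1]
    return (b + a) / 2 + (b - a) / 2 * t        # transformation to [a, b]

print(chebyshev_nodes(0.0, 1.0, 5))
```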

Piecewise polynomials

Large number of data points: use low-degree polynomials over subintervals.

Partition: a = x_1 < x_2 < x_3 < · · · < x_n = b

A different polynomial is used on each [x_{i−1}, x_i]

Splines: polynomial pieces joined together with certain smoothness conditions.

Cubic splines: 2 endpoint conditions must be imposed. The matrix is strictly diagonally dominant, so the system has a unique solution.

Parametric curves

If p ∈ Π_n([a, b]), we can write it as a linear combination of Bernstein polynomials:

p(t) = Σ_{i=0}^n b_i B_i^n( (t − a)/(b − a) )

The coefficients bi are called Bézier or control points.

Bézier curves

Given a set of control points {P_i = (x_i, y_i)}_{i=1}^n, a parametric Bézier curve is

X(t) = x_1 B_0^{n−1}(t) + · · · + x_n B_{n−1}^{n−1}(t),  t ∈ [0, 1]

Y(t) = y_1 B_0^{n−1}(t) + · · · + y_n B_{n−1}^{n−1}(t),  t ∈ [0, 1]

de Casteljau's algorithm: points on the curve are evaluated by successive linear interpolation.
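A minimal sketch of de Casteljau's algorithm for points in the plane (the control polygon is illustrative):

```python
def de_casteljau(points, t):
    """Evaluate a Bezier curve at t by repeated linear interpolation."""
    pts = list(points)
    while len(pts) > 1:                 # each pass shortens the list by one
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

# Example: cubic Bezier with 4 control points, evaluated at t = 0.5
ctrl = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
print(de_casteljau(ctrl, 0.5))          # -> (0.5, 0.75)
```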

Properties of Cubic Bézier curves

◮ P_1 = P(0) and P_4 = P(1) lie on the Bézier curve

◮ P(t) is continuous and has derivatives of all orders

◮ P′(0) = 3(P_2 − P_1) and P′(1) = 3(P_4 − P_3)

◮ The Bézier curve lies in the convex hull of its set of control points

For planar objects, the convex hull is the polygon formedby "an elastic band encompassing the given object".

Composite Bézier curves: to make the curves meet smoothly, make the meeting point and the two control points adjacent to it collinear.

Least Squares Fitting

m data points, n unknown parameters (m > n)

1. Choose model (with unknown parameters x)

2. Substitute data into model (construct system Ax = b)

3. Solve the normal equations (A^T A x = A^T b)

x is the least squares solution of the inconsistent system Ax = b.

The least squares solution minimizes ‖b − Ax‖_2; r = b − Ax is the residual vector of the least squares solution.
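The three steps in NumPy for a straight-line model (data values are made up for illustration):

```python
import numpy as np

# 1. Model: y = x1 + x2 t.  2. Substitute the data to build A and b.
t = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.1, 2.9, 4.2, 4.8])
A = np.column_stack([np.ones_like(t), t])

# 3. Solve the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
r = b - A @ x                           # residual vector
print(x, np.linalg.norm(r, 2))
```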

Periodic data

If g has period P, take as model a trigonometric polynomial of order M:

T_M(x) = a_0 + Σ_{j=1}^M ( a_j cos(2πjx/P) + b_j sin(2πjx/P) )

For even functions (f(−x) = f(x)): b_j = 0.
For odd functions (f(−x) = −f(x)): a_j = 0.

Model linearization

Model linearization (e.g., y = c e^{kt}):

◮ Linearize (ln y = ln c+ kt)

◮ Substitute (Y = ln y, C = ln c) to get a linear equation (Y = kt + C)

◮ Solve normal equations to get parameters (C and k)

◮ Convert to original parameters (c = eC)
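The same steps in NumPy, fitting y = c e^{kt} (the data are made up so that c ≈ 2 and k ≈ 1):

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 5.4, 14.9, 40.2])

k, C = np.polyfit(t, np.log(y), 1)   # least squares line Y = k t + C
c = np.exp(C)                        # convert back: c = e^C
print(c, k)
```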

Gram-Schmidt Orthogonalization

Orthogonalize the set {v_1, v_2, . . . , v_k}:

1. y_1 = v_1,  q_1 = y_1/‖y_1‖_2

2. y_2 = v_2 − q_1(q_1^T v_2),  q_2 = y_2/‖y_2‖_2

3. · · ·

4. y_i = v_i − q_1(q_1^T v_i) − · · · − q_{i−1}(q_{i−1}^T v_i),  q_i = y_i/‖y_i‖_2

Note that proj_{q_j} v_i = q_j(q_j^T v_i) and q_j ⊥ q_i for j ≠ i

Complete the orthonormal basis by adding vectors q_{k+1}, . . . , q_n

Least Squares by QR-factorization

Given the n × k overdetermined system Ax = b (n > k), find A = QR and set

◮ R̂ = the upper k × k submatrix of R

◮ d̂ = the upper k elements of d = Q^T b

Solve R̂x = d̂ for the least squares solution x.

The least squares solution minimizes ‖b − Ax‖_2 = ‖b − QRx‖_2 = ‖Q^T b − Rx‖_2

QR-factorization with Householder Reflectors

◮ x_1 is the first column of A

◮ w_1 = ±(‖x_1‖_2, 0, . . . , 0)^T

◮ v_1 = w_1 − x_1;  P = v_1 v_1^T / (v_1^T v_1)

◮ H_1 = I − 2P; for example, for a 4 × 3 matrix A,

H_1 A =
[ x x x ]
[ 0 x x ]
[ 0 x x ]
[ 0 x x ]

◮ x_2 is the second column of the submatrix starting at the second row

Repeat the process with submatrices to get

A = H_1 H_2 H_3 R = QR
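A compact (unoptimized) sketch of the process; the sign of w_1 is chosen to avoid cancellation when forming v_1:

```python
import numpy as np

def householder_qr(A):
    """QR via reflectors H = I - 2 v v^T / (v^T v)."""
    R = A.astype(float).copy()
    n, k = R.shape
    Q = np.eye(n)
    for j in range(k):
        x = R[j:, j]
        w = np.zeros_like(x)
        sign = 1.0 if x[0] >= 0 else -1.0
        w[0] = -sign * np.linalg.norm(x)    # w = ±(||x||, 0, ..., 0)
        v = w - x
        if v @ v > 0:
            H = np.eye(n)
            H[j:, j:] -= 2.0 * np.outer(v, v) / (v @ v)
            R = H @ R                       # zeros column j below the diagonal
            Q = Q @ H                       # H is symmetric and orthogonal
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A))                # -> True
```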

Gram-Schmidt vs Householder

Number of operations:

◮ Gram-Schmidt: k^3

◮ Householder: (2/3) k^3

Householder has lower memory requirements and less error amplification

With Gram-Schmidt the orthogonality property of Q might be lost because of possible cancellation in a computation like

y_3 = v_3 − q_1(q_1^T v_3) − q_2(q_2^T v_3)

Some Properties of Eigenvalues and Eigenvectors

◮ If u is an eigenvector, then ku (k ≠ 0) is one too.

◮ The corresponding eigenvalue of u is the Rayleigh quotient, λ = u^T A u / u^T u

◮ λ eigenvalue of A ⇒ λ^{−1} eigenvalue of A^{−1} (same eigenvector)

◮ λ eigenvalue of A ⇒ λ − s eigenvalue of A − sI (same eigenvector)

◮ (λ − s)^{−1} eigenvalue of (A − sI)^{−1} (same eigenvector)

◮ If A = S^{−1}BS, then A and B have the same eigenvalues (but not the same eigenvectors)

The Power Method
Computing the dominant eigenvalue/eigenvector

Suppose:

◮ The eigenvectors of A form a basis

◮ A has a unique eigenvalue λ_1 of maximum modulus

Start with x_0 and define

y_{k−1} = x_{k−1}/‖x_{k−1}‖_2
x_k = A y_{k−1}
λ_k = y_{k−1}^T x_k

Speed of convergence is linear, and governed by |λ_2/λ_1|
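A direct transcription of the iteration in NumPy (matrix and iteration count are illustrative):

```python
import numpy as np

def power_method(A, x0, maxit=200):
    """Dominant eigenpair via the recursion above."""
    x = np.asarray(x0, dtype=float)
    lam = 0.0
    for _ in range(maxit):
        y = x / np.linalg.norm(x)     # y_{k-1}
        x = A @ y                     # x_k = A y_{k-1}
        lam = y @ x                   # lambda_k = y^T x (Rayleigh quotient)
    return lam, x / np.linalg.norm(x)

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 3 and 1
lam, v = power_method(A, [1.0, 0.0])
print(lam)                                # about 3
```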

The Shifted Inverse Power Method

To find the eigenvalue nearest to s:

Start with x_0 and set B = A − sI. For each k:

Set y_{k−1} = x_{k−1}/‖x_{k−1}‖_2
Solve B x_k = y_{k−1}
Set η_k = x_k^T y_{k−1}

λ = 1/η + s, where η = lim_{k→∞} η_k
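A sketch with NumPy/SciPy; B is factored once so each iteration costs only a substitution (the shift s must not itself be an eigenvalue):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_power(A, s, x0, maxit=50):
    """Approximate the eigenvalue of A nearest the shift s."""
    B = A - s * np.eye(A.shape[0])
    lu, piv = lu_factor(B)                 # factor once, reuse each step
    x = np.asarray(x0, dtype=float)
    eta = 1.0
    for _ in range(maxit):
        y = x / np.linalg.norm(x)
        x = lu_solve((lu, piv), y)         # solve B x_k = y_{k-1}
        eta = x @ y                        # eta_k = x_k^T y_{k-1}
    return 1.0 / eta + s

A = np.array([[2.0, 1.0], [1.0, 2.0]])     # eigenvalues 3 and 1
print(shifted_inverse_power(A, 0.9, [1.0, 0.0]))   # about 1
```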

QR Algorithm

A_0 ≡ A = Q_1 R_1
A_1 ≡ R_1 Q_1 = Q_2 R_2
A_2 ≡ R_2 Q_2 = Q_3 R_3
A_3 ≡ R_3 Q_3 = Q_4 R_4

· · ·

If A is symmetric with |λ_1| > |λ_2| > · · · > |λ_m|, the iteration converges linearly to a diagonal matrix containing the eigenvalues of A, and Q_1 · · · Q_j converges to a matrix whose columns are the corresponding eigenvectors of A.

The modified QR algorithm for nonsymmetric A converges to an upper triangular matrix.
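The unshifted iteration is a few lines in NumPy (shifts are what make it practical, but they are omitted here):

```python
import numpy as np

def qr_algorithm(A, iters=100):
    """A_{j+1} = R_j Q_j; similarity transforms preserve the eigenvalues."""
    A = A.astype(float).copy()
    Qtot = np.eye(A.shape[0])
    for _ in range(iters):
        Q, R = np.linalg.qr(A)
        A = R @ Q
        Qtot = Qtot @ Q            # accumulates approximate eigenvectors
    return np.diag(A), Qtot

A = np.array([[2.0, 1.0], [1.0, 2.0]])
print(qr_algorithm(A)[0])          # about [3, 1]
```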

Singular Values and Singular Vectors

The eigenvalues of A^T A are λ_1 = s_1^2 ≥ λ_2 = s_2^2 ≥ · · · ≥ λ_n = s_n^2 ≥ 0

with orthonormal eigenvectors v1, . . . , vn.

Take s_i ≥ 0. Define u_i, i = 1, . . . , m:

◮ If s_i ≠ 0, u_i = A v_i / s_i

◮ If s_i = 0, u_i is any unit vector orthogonal to u_1, . . . , u_{i−1}

◮ {v_1, . . . , v_n} are the right singular vectors

◮ {u_1, . . . , u_m} are the left singular vectors

◮ A v_i = s_i u_i, with s_1 ≥ · · · ≥ s_n ≥ 0 (the s_i are the singular values)

Singular Value Decomposition

A = U S V^T

◮ SVD of symmetric matrices: s_i = |λ_i|; the v_i are the corresponding unit eigenvectors of A; u_i = v_i if λ_i ≥ 0 and u_i = −v_i if λ_i < 0

◮ rank(A) = rank(S) = the number of nonzero entries of S

◮ |det(A)| = s_1 · · · s_n

◮ A^{−1} = V S^{−1} U^T

SVD and low-rank approximation, compression

Low rank approximation:

A = Σ_{i=1}^{rank(A)} s_i u_i v_i^T

The best least squares approximation to A of rank p ≤ r is provided by retaining the first p terms of the sum

If A is an n × n matrix, it contains n^2 entries, but each term in the sum requires only 2n + 1 numbers

If the first singular value is much larger than the rest, mostof the information is captured by the first term.
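A low-rank approximation in NumPy; by the best-approximation property, the 2-norm error of the rank-p truncation equals the first discarded singular value:

```python
import numpy as np

A = np.random.default_rng(0).random((8, 8))
U, s, Vt = np.linalg.svd(A)                 # A = U S V^T

p = 2
A_p = U[:, :p] @ np.diag(s[:p]) @ Vt[:p, :] # keep the first p terms
print(np.linalg.norm(A - A_p, 2), s[p])     # the two numbers agree
```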

Fourier matrix

The DFT of x = [x_0, . . . , x_{n−1}]^T is (1/√n) F x, where F is the n × n matrix with entries ω^{jk}, j, k = 0, 1, . . . , n − 1:

F =
[ ω^0  ω^0      ω^0        · · ·  ω^0         ]
[ ω^0  ω^1      ω^2        · · ·  ω^{n−1}     ]
[ ω^0  ω^2      ω^4        · · ·  ω^{2(n−1)}  ]
[ ω^0  ω^3      ω^6        · · ·  ω^{3(n−1)}  ]
[ ...                                         ]
[ ω^0  ω^{n−1}  ω^{2(n−1)} · · ·  ω^{(n−1)^2} ]

where ω = e^{−i2π/n}.

Discrete Fourier Transform

F_n x = y, where

y_k = (1/√n) Σ_{j=0}^{n−1} x_j ω^{jk}

F_n^{−1} = conj(F_n) (the entrywise complex conjugate)

Unitary matrix: F^{−1} = conj(F)^T

Orthogonal (real) ↔ Unitary (complex)

If x ∈ R^n, then y_0 ∈ R and y_{n−k} = conj(y_k)
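NumPy's FFT matches this convention (ω = e^{−i2π/n}) when asked for the unitary scaling:

```python
import numpy as np

x = np.array([1.0, 0.0, -1.0, 0.0])
y = np.fft.fft(x, norm="ortho")     # DFT with the 1/sqrt(n) factor used here
print(y)
print(np.allclose(np.fft.ifft(y, norm="ortho"), x))  # inverse recovers x
print(np.allclose(y[3], np.conj(y[1])))               # y_{n-k} = conj(y_k)
```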

Fast Fourier Transform

Algorithm for computing the DFT: at each stage it transforms the vector into 2 half-length vectors.

For n = 2N , the computational complexity is n log2 n.

For n prime it is n2.

DFT interpolation

Given x0, x1, . . . , xn−1, lettj = c+ j(d− c)/n, j = 0, 1, . . . , n− 1. Then

Q(t) =1√n

n−1∑

k=0

ykei2πk(t−c)/(d−c)

where yk = Fnxk, satisfies Q(tj) = xj for j = 0, . . . , n− 1.

If x ∈ R^n and y_k = a_k + i b_k, then

Q(t) = (1/√n) Σ_{k=0}^{n−1} ( a_k cos(2πk(t−c)/(d−c)) − b_k sin(2πk(t−c)/(d−c)) )

Evaluation of trigonometric functions

To plot the interpolating trigonometric function, we can invert the expanded DFT. The steps are the following:

1. Calculate the DFT of the evenly spaced data points: x → F_n x

2. Multiply by √(p/n): F_n x → √(p/n) F_n x

3. Expand the n points to p points: add zeros in positions n/2 + 1 to p − n/2

4. Invert: √(p/n) F_n x → F_p^{−1} √(p/n) F_n x
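A sketch of the four steps for real data with even n (taking the real part at the end, since the simple zero split ignores the Nyquist-term subtlety):

```python
import numpy as np

def dft_interp(x, p):
    """Evaluate the DFT interpolant of x at p >= n evenly spaced points."""
    n = len(x)
    y = np.fft.fft(x, norm="ortho") * np.sqrt(p / n)     # steps 1 and 2
    yp = np.concatenate([y[:n // 2],
                         np.zeros(p - n),                # step 3: pad middle
                         y[n // 2:]])
    return np.fft.ifft(yp, norm="ortho").real            # step 4

x = np.array([1.0, 0.0, -1.0, 0.0])
print(dft_interp(x, 8))    # entries 0, 2, 4, 6 reproduce x
```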

Orthogonal Function Interpolation

If

A =
[ f_0(t_0)      f_0(t_1)      · · ·  f_0(t_{n−1})     ]
[ f_1(t_0)      f_1(t_1)      · · ·  f_1(t_{n−1})     ]
[    ...            ...                  ...          ]
[ f_{n−1}(t_0)  f_{n−1}(t_1)  · · ·  f_{n−1}(t_{n−1}) ]

is a real orthogonal matrix, then a function that interpolates the points (t_j, x_j) is

F(t) = Σ_{k=0}^{n−1} y_k f_k(t),

where y = Ax.

Least squares with DFT

Let {t_0 = c, t_1, . . . , t_{n−1} = c + (n − 1)(d − c)/n} be the n (even) equally spaced points on [c, d], and suppose we want to use only the m < n functions {f_0(t), f_1(t), . . . , f_{m−1}(t)}, where m is even.

The normal equations are

c = A_m x (no solving, just a matrix-vector product!)

and the least squares approximation using the first m basisfunctions is

F_m(t) = Σ_{k=0}^{m−1} y_k f_k(t)

Applications: filtering for audio compression or noise removal

Discrete cosine transform

y = Cx

C is a real orthogonal matrix and consists only of cosines.

Like the DFT, the DCT transforms n data points into n interpolation coefficients.

As with the DFT, choosing m < n coefficients y_0, . . . , y_{m−1} gives a least-squares approximation.

2D-DCT in image processing

Y = C X C^T

X = C^T Y C
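With SciPy (assuming it is available), the orthonormal type-II DCT realizes these formulas:

```python
import numpy as np
from scipy.fft import dctn, idctn

X = np.arange(16.0).reshape(4, 4)       # a small "image" block
Y = dctn(X, norm="ortho")               # 2D-DCT: Y = C X C^T
X_back = idctn(Y, norm="ortho")         # inverse: X = C^T Y C
print(np.allclose(X, X_back))           # -> True
```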

Image compression

Crude compression: replace each k × k pixel block by its average value

DCT compression:

1. take the 2D-DCT for each k × k matrix block,

2. do a least-squares approximation,

3. apply the inverse 2D-DCT.

Quantization

Quantization (mod q): round(y/q)
Dequantization: ȳ = q · round(y/q)
With a quantization matrix: Y_Q = [round(y_kl / q_kl)]

The larger q_kl, the greater the loss and the greater the compression.
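A tiny NumPy illustration with hypothetical coefficients and a hypothetical quantization matrix:

```python
import numpy as np

Y = np.array([[231.5, -12.3], [8.7, 1.2]])   # hypothetical DCT coefficients
Q = np.array([[16.0, 11.0], [12.0, 14.0]])   # hypothetical quantization matrix

YQ = np.round(Y / Q)       # quantize: small coefficients become 0
Y_back = Q * YQ            # dequantize: the rounding error is lost for good
print(YQ)
print(Y_back)
```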

Huffman coding

Shannon information

I = − Σ_{i=1}^k p_i log_2 p_i

Huffman tree

Assign shorter codes to symbols with higher probabilities. From the bottom up, join the symbols with the smallest probabilities. Assign a 0 to left branches, a 1 to right branches.
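A sketch of the bottom-up construction using a heap (the symbol probabilities are made up; ties can produce different but equally optimal codes):

```python
import heapq

def huffman_codes(probs):
    """Build Huffman codes by repeatedly joining the two smallest trees."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)                        # tie-breaker for the heap
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)      # two smallest probabilities
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}        # 0 on the left
        merged.update({s: "1" + c for s, c in c2.items()})  # 1 on the right
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

print(huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```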

Huffman coding for JPEG

The code for y_00 (the DC component) has two parts: the first is obtained from the DPCM tree, and the second from the integer identifying table. The DC coefficient is the binary string formed by concatenating these two parts.

AC components are coded as run-length pairs (n, L), where n is the length of a run of zeros and L is the length of the next nonzero entry. A Huffman AC tree is then used to code these pairs. After that comes the integer identifying code.
