Numerical Analysis — an Introduction

Review

www.maths.lth.se/na/courses/FMN011

Carmen Arévalo

Textbook: Numerical Analysis, by Timothy Sauer.

Pearson Addison Wesley.

Numerisk Analys, Matematikcentrum, Lunds Universitet, 2013

Error analysis

◮ the absolute error is E_p = |p − p̂|, where p̂ approximates p

◮ the relative error is R_p = |p − p̂| / |p|

◮ correct (significant) digits

◮ types of errors: truncation, round-off, noise

◮ loss of significant digits

◮ If f(r) = 0 and x approximates r, the residual is |f(x)| and the error is |r − x|. Desirable: small residual ⇒ small error

Bisection theorem (to solve f(x) = 0)

Suppose

◮ f is continuous in [a, b]

◮ f(r) = 0 for some r ∈ [a, b]

◮ f(a) and f(b) have opposite signs

If {c_n} is the sequence produced by the bisection method, then

|r − c_n| ≤ (b_n − a_n)/2 = (b − a)/2^{n+1},

so lim_{n→∞} c_n = r
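A minimal Python sketch of the method (function name, tolerance, and example are illustrative, not from the slides):

```python
def bisect(f, a, b, tol=1e-10):
    """Bisection: assumes f continuous on [a, b] with f(a)*f(b) < 0."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while (b - a) / 2 > tol:
        c = (a + b) / 2              # midpoint c_n
        if f(a) * f(c) <= 0:         # the root lies in [a, c]
            b = c
        else:                        # the root lies in [c, b]
            a = c
    return (a + b) / 2

# Example: the root of x^3 + x - 1 = 0 on [0, 1]
print(bisect(lambda x: x**3 + x - 1, 0.0, 1.0))
```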

Fixed Point Iteration (to solve f(x) = 0)

Rewrite f(x) = 0 as x = g(x)

r is a fixed point of the function g if r = g(r)

Theorem

g : [a, b] → R has a unique fixed point if:

◮ g is continuous on [a, b]

◮ g : [a, b] → [a, b] (assures existence)

◮ |g′(x)| < 1 for all x ∈ [a, b] (assures uniqueness)

A fixed point iteration has the form pk+1 = g(pk)

◮ |g′(r)| ≤ K < 1 ⇒ {p_n} → r

◮ |g′(r)| > 1 ⇒ {p_n} ↛ r
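A small illustrative Python sketch (names and stopping rule are my own, not from the slides):

```python
import math

def fixed_point(g, p0, tol=1e-10, maxit=100):
    """Iterate p_{k+1} = g(p_k); converges when |g'(r)| < 1 near r."""
    p = p0
    for _ in range(maxit):
        p_new = g(p)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    raise RuntimeError("no convergence; check that |g'(r)| < 1")

# Example: x = cos(x) has a fixed point near 0.739 (|g'(r)| < 1 there)
print(fixed_point(math.cos, 1.0))
```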

Newton-Raphson Method (to solve f(x) = 0)

To solve f(x) = 0 with quadratic convergence

p_{k+1} = p_k − f(p_k)/f′(p_k)

Multiple roots: linear convergence; modified Newton's method for a root of multiplicity m (quadratic convergence):

p_{k+1} = p_k − m f(p_k)/f′(p_k)

Secant method: convergence rate ≈ 1.6

p_k = p_{k−1} − f(p_{k−1})(p_{k−1} − p_{k−2}) / (f(p_{k−1}) − f(p_{k−2}))
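Hedged Python sketches of both iterations (the stopping criteria are illustrative):

```python
def newton(f, fprime, p0, tol=1e-12, maxit=50):
    """Newton-Raphson: p_{k+1} = p_k - f(p_k)/f'(p_k); quadratic for simple roots."""
    p = p0
    for _ in range(maxit):
        p_new = p - f(p) / fprime(p)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

def secant(f, p0, p1, tol=1e-12, maxit=50):
    """Secant method: replaces f' by a difference quotient; rate about 1.6."""
    for _ in range(maxit):
        p2 = p1 - f(p1) * (p1 - p0) / (f(p1) - f(p0))
        if abs(p2 - p1) < tol:
            return p2
        p0, p1 = p1, p2
    return p1

# Example: the root of x^2 - 2 (i.e., sqrt(2))
print(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0))
print(secant(lambda x: x * x - 2, 1.0, 2.0))
```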

Newton’s Method for Systems

If f(x) = [f_1(x), . . . , f_n(x)]^T, Newton's method has the form

p_{k+1} = p_k − J_f(p_k)^{−1} f(p_k)

where J_f(x) = [∂f(x)/∂x_1, ∂f(x)/∂x_2, . . . , ∂f(x)/∂x_n] is the Jacobian matrix of f
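A sketch in Python with NumPy; in practice one solves J_f(p_k) s = −f(p_k) instead of inverting the Jacobian (the example system is my own):

```python
import numpy as np

def newton_system(f, jac, p0, tol=1e-12, maxit=50):
    """Multivariate Newton: p_{k+1} = p_k + s, where J_f(p_k) s = -f(p_k)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(maxit):
        s = np.linalg.solve(jac(p), -f(p))   # linear solve, no inverse
        p = p + s
        if np.linalg.norm(s) < tol:
            break
    return p

# Example: u^2 + v^2 = 1, u - v = 0  ->  (1/sqrt(2), 1/sqrt(2))
f = lambda p: np.array([p[0]**2 + p[1]**2 - 1.0, p[0] - p[1]])
jac = lambda p: np.array([[2*p[0], 2*p[1]], [1.0, -1.0]])
print(newton_system(f, jac, [1.0, 0.5]))
```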

Solving a system of equations, Ax = b

Equivalent systems have the same solution

Elementary operations on rows that yield an equivalent system:

◮ Row interchanges

◮ Multiplication by a constant

◮ row_r ← row_r − m_{rp} × row_p (subtract a multiple of one row from another)

To solve a system:

1. Perform a Gaussian elimination (to obtain an upper triangular matrix)

2. Perform a back substitution

Solving Triangular Linear Systems

Upper triangular matrix: back substitution

Lower triangular matrix: forward substitution

Computational complexity: total number of operations = N^2
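A minimal NumPy sketch of back substitution (forward substitution is the mirror image):

```python
import numpy as np

def back_substitution(U, b):
    """Solve Ux = b for upper triangular U; about N^2 operations."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                    # last equation first
        x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0], [0.0, 3.0]])
print(back_substitution(U, np.array([4.0, 6.0])))     # -> [1, 2]
```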

Triangular factorization, A = LU

Ax = b

1. Solve Ly = b with forward substitution to get y

2. Use y in Ux = y and solve with back substitution to get x

Computational complexity:

Total number of operations = 2N^3/3 − N^2/2 − N/6
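A sketch using SciPy's LU routines (assuming SciPy is available); the point is that the O(N^3) factorization is done once and each new right-hand side costs only O(N^2):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 3.0], [6.0, 3.0]])
b = np.array([10.0, 12.0])

lu, piv = lu_factor(A)          # one factorization (with pivoting)
x = lu_solve((lu, piv), b)      # forward + back substitution
print(x)                        # -> [1, 2]
```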

Vector and matrix norms

◮ 1-norm: ‖x‖_1 = Σ_{i=1}^n |x_i|;  ‖A‖_1 = max_j Σ_{i=1}^n |a_ij|

◮ 2-norm: ‖x‖_2 = ( Σ_{i=1}^n |x_i|^2 )^{1/2};  ‖A‖_2 = √( ρ(A^T A) )

◮ ∞-norm: ‖x‖_∞ = max_i |x_i|;  ‖A‖_∞ = max_i Σ_{j=1}^n |a_ij|
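These norms can be checked numerically with NumPy (a small illustration, not course code):

```python
import numpy as np

x = np.array([1.0, -2.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

print(np.linalg.norm(x, 1), np.linalg.norm(x, 2), np.linalg.norm(x, np.inf))
print(np.linalg.norm(A, 1))        # maximum absolute column sum
print(np.linalg.norm(A, 2))        # sqrt of the spectral radius of A^T A
print(np.linalg.norm(A, np.inf))   # maximum absolute row sum
```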

Ill conditioning and pivoting

Ax = b is ill conditioned if small perturbations in the coefficients of A or b produce large changes in x

κ_p(A) = ‖A‖_p · ‖A^{−1}‖_p

If κ(A) ≈ 10^k, about k significant digits will be lost in solving Ax = b.

Partial pivoting: choose the entry of largest magnitude in the column as the pivot

LU factorization with pivoting

Permutation matrix P: its rows are a permutation of the rows of I (so P is orthogonal and P^{−1} = P^T).

If A is nonsingular, there is a P such that PA = LU

Ax = b ⇒ LUx = Pb

1. Compute L, U and P

2. Compute Pb

3. Solve Ly = Pb with forward substitution

4. Solve Ux = y with backward substitution

Iterative Methods for Linear Systems

Given x0, we construct the method

xk+1 = Bxk + c

so that a fixed point of g(x) = Bx+ c is a solution of Ax = b.

A = M −N with M nonsingular

xk+1 = M−1Nxk +M−1b

x_0 can be arbitrary; however, convergence will be faster if we start with a good guess of the solution.

Jacobi, Gauss-Seidel and SOR methods

Separate A into lower triangular, diagonal, and upper triangular parts: A = L + D + U

◮ Jacobi: M = D

◮ Gauss-Seidel: M = L+D

◮ SOR: accelerates GS with parameter 1 ≤ ω < 2

A is strictly diagonally dominant if |a_kk| > Σ_{j=1, j≠k}^N |a_kj| for each k

If A is strictly diagonally dominant, then these methods converge for any choice of x_0.
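A minimal Jacobi sketch in NumPy (Gauss-Seidel differs only in using updated entries immediately); the example matrix is strictly diagonally dominant, so convergence is guaranteed:

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, maxit=200):
    """Jacobi: M = D, so x_{k+1} = D^{-1} (b - (L + U) x_k)."""
    D = np.diag(A)                    # diagonal of A
    R = A - np.diagflat(D)            # off-diagonal part, L + U
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([5.0, 5.0])
print(jacobi(A, b, np.zeros(2)))      # -> [1, 2]
```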

Convergence Theorems

Spectral radius of A: the radius of the smallest circle centered at 0 in the complex plane that contains all eigenvalues of A

ρ(A) = max{|λ| : det(λI −A) = 0}

Suppose we have an iterative method

xk+1 = Bxk + c

1. The iterative method converges for any x_0 if ‖B‖_p < 1 for some p.

2. The iterative method converges for any x_0 if and only if ρ(B) < 1.

Interpolation

y = f(x) interpolates {(x_1, y_1), (x_2, y_2), . . . , (x_n, y_n)} if f(x_i) = y_i for each i = 1, 2, . . . , n

Basis functions Φ_1, Φ_2, . . . , Φ_n:  f(x) = Σ_{j=1}^n y_j Φ_j(x)

To determine the coefficients y_j, solve

[ Φ_1(x_1)  Φ_2(x_1)  · · ·  Φ_n(x_1) ] [ y_1 ]   [ f(x_1) ]
[ Φ_1(x_2)  Φ_2(x_2)  · · ·  Φ_n(x_2) ] [ y_2 ] = [ f(x_2) ]
[    ...       ...    . . .     ...   ] [ ... ]   [  ...   ]
[ Φ_1(x_n)  Φ_2(x_n)  · · ·  Φ_n(x_n) ] [ y_n ]   [ f(x_n) ]

Polynomial interpolation

There is a unique polynomial of degree at most n − 1 through n distinct points

◮ Monomial: {1, x, x^2, . . . , x^{n−1}}; Vandermonde matrix

◮ Lagrange: L_j(x) = Π_{k=1, k≠j}^n (x − x_k) / Π_{k=1, k≠j}^n (x_j − x_k); identity matrix

◮ Newton: 1, x − x_1, . . . , (x − x_1)(x − x_2) · · · (x − x_{n−1}); triangular matrix (table of divided differences)

◮ Bernstein: B_i^n(t) = (n choose i) (1 − t)^{n−i} t^i,  t ∈ [0, 1]
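A small pure-Python sketch that evaluates the interpolant in Lagrange form (data and names are illustrative):

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the interpolating polynomial sum_j ys[j] * L_j(x)."""
    total = 0.0
    n = len(xs)
    for j in range(n):
        Lj = 1.0
        for k in range(n):
            if k != j:
                Lj *= (x - xs[k]) / (xs[j] - xs[k])   # Lagrange basis factor
        total += ys[j] * Lj
    return total

# Example: the parabola through (0,1), (1,2), (2,5) is x^2 + 1
print(lagrange_eval([0.0, 1.0, 2.0], [1.0, 2.0, 5.0], 1.5))   # -> 3.25
```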

Interpolation error and Chebyshev nodes

f(x) − P(x) = f^{(n)}(θ)/n! · (x − x_1)(x − x_2) . . . (x − x_n)

where θ ∈ [x_1, x_n] is unknown.

The error is reduced by choosing {x_1, x_2, . . . , x_n} as the zeros of the Chebyshev polynomials.

These nodes minimize the maximum of e(x) = |(x − x_1)(x − x_2) . . . (x − x_n)|, and the error e (not the set of points) is distributed evenly over [−1, 1].

To interpolate on [a, b], take the Chebyshev nodes on [−1, 1] and use the transformation

x = (b + a)/2 + (b − a)/2 · t,  t ∈ [−1, 1],

to get the nodes on [a, b].
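A sketch of the node computation, using the standard formula cos((2i − 1)π/(2n)) for the zeros of the degree-n Chebyshev polynomial:

```python
import numpy as np

def chebyshev_nodes(a, b, n):
    """Chebyshev nodes on [a, b]: roots of T_n mapped from [-1, 1]."""
    i = np.arange(1, n + 1)
    t = np.cos((2 * i - 1) * np.pi / (2 * n))   # nodes on [-1, 1]
    return (b + a) / 2 + (b - a) / 2 * t        # transformation to [a, b]

print(chebyshev_nodes(0.0, 1.0, 5))
```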

Piecewise polynomials

Large number of data points: use low-degree polynomials over subintervals.

Partition: a = x_1 < x_2 < x_3 < · · · < x_n = b

A different polynomial is used on each [x_{i−1}, x_i]

Splines: polynomial pieces joined together with certain smoothness conditions.

Cubic splines: 2 endpoint conditions must be imposed. The matrix is strictly diagonally dominant, so the system has a unique solution.

Parametric curves

If p ∈ Π_n([a, b]), we can write it as a linear combination of Bernstein polynomials:

p(t) = Σ_{i=0}^n b_i B_i^n( (t − a)/(b − a) )

The coefficients bi are called Bézier or control points.

Bézier curves

Given a set of control points {P_i = (x_i, y_i)}_{i=1}^n, a parametric Bézier curve is

X(t) = x_1 B_0^{n−1}(t) + · · · + x_n B_{n−1}^{n−1}(t),  t ∈ [0, 1]

Y(t) = y_1 B_0^{n−1}(t) + · · · + y_n B_{n−1}^{n−1}(t),  t ∈ [0, 1]

de Casteljau's algorithm: points on the curve are evaluated by successive linear interpolation.
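A minimal sketch of de Casteljau's algorithm for points in the plane (the control polygon is illustrative):

```python
def de_casteljau(points, t):
    """Evaluate a Bezier curve at t by repeated linear interpolation."""
    pts = list(points)
    while len(pts) > 1:                 # each pass shortens the list by one
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

# Example: cubic Bezier with 4 control points, evaluated at t = 0.5
ctrl = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
print(de_casteljau(ctrl, 0.5))          # -> (0.5, 0.75)
```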

Properties of Cubic Bézier curves

◮ P_1 = P(0) and P_4 = P(1) lie on the Bézier curve

◮ P(t) is continuous and has derivatives of all orders

◮ P′(0) = 3(P_2 − P_1) and P′(1) = 3(P_4 − P_3)

◮ The Bézier curve lies in the convex hull of its set of control points

For planar objects, the convex hull is the polygon formedby "an elastic band encompassing the given object".

Composite Bézier curves: to make the curves meet smoothly, make the meeting point and the two control points adjacent to it collinear.

Least Squares Fitting

m data points, n unknown parameters (m > n)

1. Choose model (with unknown parameters x)

2. Substitute data into model (construct system Ax = b)

3. Solve the normal equations (A^T A x = A^T b)

x is the least squares solution of the inconsistent system Ax = b.

The least squares solution minimizes ‖b − Ax‖_2; r = b − Ax is the residual vector of the least squares solution.
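The three steps in NumPy for a straight-line model (data values are made up for illustration):

```python
import numpy as np

# 1. Model: y = x1 + x2 t.  2. Substitute the data to build A and b.
t = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.1, 2.9, 4.2, 4.8])
A = np.column_stack([np.ones_like(t), t])

# 3. Solve the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
r = b - A @ x                           # residual vector
print(x, np.linalg.norm(r, 2))
```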

Periodic data

If g has period P, take as model a trigonometric polynomial of order M:

T_M(x) = a_0 + Σ_{j=1}^M ( a_j cos(2πjx/P) + b_j sin(2πjx/P) )

For even functions (f(−x) = f(x)): b_j = 0.
For odd functions (f(−x) = −f(x)): a_j = 0.

Model linearization

Model linearization (e.g., y = c e^{kt}):

◮ Linearize (ln y = ln c+ kt)

◮ Substitute (Y = ln y, C = ln c) to get a linear equation (Y = kt + C)

◮ Solve normal equations to get parameters (C and k)

◮ Convert to original parameters (c = eC)
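The same steps in NumPy, fitting y = c e^{kt} (the data are made up so that c ≈ 2 and k ≈ 1):

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 5.4, 14.9, 40.2])

k, C = np.polyfit(t, np.log(y), 1)   # least squares line Y = k t + C
c = np.exp(C)                        # convert back: c = e^C
print(c, k)
```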

Gram-Schmidt Orthogonalization

Orthogonalize the set {v_1, v_2, . . . , v_k}:

1. y_1 = v_1,  q_1 = y_1/‖y_1‖_2

2. y_2 = v_2 − q_1(q_1^T v_2),  q_2 = y_2/‖y_2‖_2

3. · · ·

4. y_i = v_i − q_1(q_1^T v_i) − · · · − q_{i−1}(q_{i−1}^T v_i),  q_i = y_i/‖y_i‖_2

Note that proj_{q_j} v_i = q_j(q_j^T v_i) and q_j ⊥ q_i for j ≠ i

Complete the orthonormal basis by adding vectors q_{k+1}, . . . , q_n

Least Squares by QR-factorization

Given the n × k overdetermined system Ax = b (n > k), find A = QR and set

◮ R̂ = the upper k × k submatrix of R

◮ d̂ = the upper k elements of d = Q^T b

Solve R̂x = d̂ for the least squares solution x.

The least squares solution minimizes ‖b − Ax‖_2 = ‖b − QRx‖_2 = ‖Q^T b − Rx‖_2

QR-factorization with Householder Reflectors

◮ x_1 is the first column of A

◮ w_1 = ±(‖x_1‖_2, 0, . . . , 0)^T

◮ v_1 = w_1 − x_1;  P = v_1 v_1^T / (v_1^T v_1)

◮ H_1 = I − 2P; for example, for a 4 × 3 matrix A,

H_1 A =
[ x x x ]
[ 0 x x ]
[ 0 x x ]
[ 0 x x ]

◮ x_2 is the second column of the submatrix starting at the second row

Repeat the process with submatrices to get

A = H_1 H_2 H_3 R = QR
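A compact (unoptimized) sketch of the process; the sign of w_1 is chosen to avoid cancellation when forming v_1:

```python
import numpy as np

def householder_qr(A):
    """QR via reflectors H = I - 2 v v^T / (v^T v)."""
    R = A.astype(float).copy()
    n, k = R.shape
    Q = np.eye(n)
    for j in range(k):
        x = R[j:, j]
        w = np.zeros_like(x)
        sign = 1.0 if x[0] >= 0 else -1.0
        w[0] = -sign * np.linalg.norm(x)    # w = ±(||x||, 0, ..., 0)
        v = w - x
        if v @ v > 0:
            H = np.eye(n)
            H[j:, j:] -= 2.0 * np.outer(v, v) / (v @ v)
            R = H @ R                       # zeros column j below the diagonal
            Q = Q @ H                       # H is symmetric and orthogonal
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A))                # -> True
```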

Gram-Schmidt vs Householder

Number of operations:

◮ Gram-Schmidt: k^3

◮ Householder: (2/3) k^3

Householder has lower memory requirements and less error amplification

With Gram-Schmidt the orthogonality property of Q might be lost because of possible cancellation in a computation like

y_3 = v_3 − q_1(q_1^T v_3) − q_2(q_2^T v_3)

Some Properties of Eigenvalues and Eigenvectors

◮ If u is an eigenvector, then ku (k ≠ 0) is one too.

◮ The corresponding eigenvalue of u is the Rayleigh quotient, λ = u^T A u / u^T u

◮ λ eigenvalue of A ⇒ λ^{−1} eigenvalue of A^{−1} (same eigenvector)

◮ λ eigenvalue of A ⇒ λ − s eigenvalue of A − sI (same eigenvector)

◮ (λ − s)^{−1} eigenvalue of (A − sI)^{−1} (same eigenvector)

◮ If A = S^{−1}BS, then A and B have the same eigenvalues (but not the same eigenvectors)

The Power Method
Computing the dominant eigenvalue/eigenvector

Suppose:

◮ The eigenvectors of A form a basis

◮ A has a unique eigenvalue λ_1 of maximum modulus

Start with x_0 and define

y_{k−1} = x_{k−1}/‖x_{k−1}‖_2
x_k = A y_{k−1}
λ_k = y_{k−1}^T x_k

Speed of convergence is linear, and governed by |λ_2/λ_1|
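A direct transcription of the iteration in NumPy (matrix and iteration count are illustrative):

```python
import numpy as np

def power_method(A, x0, maxit=200):
    """Dominant eigenpair via the recursion above."""
    x = np.asarray(x0, dtype=float)
    lam = 0.0
    for _ in range(maxit):
        y = x / np.linalg.norm(x)     # y_{k-1}
        x = A @ y                     # x_k = A y_{k-1}
        lam = y @ x                   # lambda_k = y^T x (Rayleigh quotient)
    return lam, x / np.linalg.norm(x)

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 3 and 1
lam, v = power_method(A, [1.0, 0.0])
print(lam)                                # about 3
```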

The Shifted Inverse Power Method

To find the eigenvalue nearest to s:

Start with x_0 and set B = A − sI. For each k:

Set y_{k−1} = x_{k−1}/‖x_{k−1}‖_2
Solve B x_k = y_{k−1}
Set η_k = x_k^T y_{k−1}

λ = 1/η + s, where η = lim_{k→∞} η_k
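A sketch with NumPy/SciPy; B is factored once so each iteration costs only a substitution (the shift s must not itself be an eigenvalue):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_power(A, s, x0, maxit=50):
    """Approximate the eigenvalue of A nearest the shift s."""
    B = A - s * np.eye(A.shape[0])
    lu, piv = lu_factor(B)                 # factor once, reuse each step
    x = np.asarray(x0, dtype=float)
    eta = 1.0
    for _ in range(maxit):
        y = x / np.linalg.norm(x)
        x = lu_solve((lu, piv), y)         # solve B x_k = y_{k-1}
        eta = x @ y                        # eta_k = x_k^T y_{k-1}
    return 1.0 / eta + s

A = np.array([[2.0, 1.0], [1.0, 2.0]])     # eigenvalues 3 and 1
print(shifted_inverse_power(A, 0.9, [1.0, 0.0]))   # about 1
```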

QR Algorithm

A_0 ≡ A = Q_1 R_1
A_1 ≡ R_1 Q_1 = Q_2 R_2
A_2 ≡ R_2 Q_2 = Q_3 R_3
A_3 ≡ R_3 Q_3 = Q_4 R_4

· · ·

If A is symmetric with |λ_1| > |λ_2| > · · · > |λ_m|, the iteration converges linearly to a diagonal matrix containing the eigenvalues of A, and Q_1 · · · Q_j converges to a matrix whose columns are the corresponding eigenvectors of A.

The modified QR algorithm for nonsymmetric A converges to an upper triangular matrix.
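The unshifted iteration is a few lines in NumPy (shifts are what make it practical, but they are omitted here):

```python
import numpy as np

def qr_algorithm(A, iters=100):
    """A_{j+1} = R_j Q_j; similarity transforms preserve the eigenvalues."""
    A = A.astype(float).copy()
    Qtot = np.eye(A.shape[0])
    for _ in range(iters):
        Q, R = np.linalg.qr(A)
        A = R @ Q
        Qtot = Qtot @ Q            # accumulates approximate eigenvectors
    return np.diag(A), Qtot

A = np.array([[2.0, 1.0], [1.0, 2.0]])
print(qr_algorithm(A)[0])          # about [3, 1]
```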

Singular Values and Singular Vectors

The eigenvalues of A^T A are λ_1 = s_1^2 ≥ λ_2 = s_2^2 ≥ · · · ≥ λ_n = s_n^2 ≥ 0

with orthonormal eigenvectors v1, . . . , vn.

Take s_i ≥ 0. Define u_i, i = 1, . . . , m:

◮ If s_i ≠ 0, u_i = A v_i / s_i

◮ If s_i = 0, u_i is any unit vector orthogonal to u_1, . . . , u_{i−1}

◮ {v_1, . . . , v_n} are the right singular vectors

◮ {u_1, . . . , u_m} are the left singular vectors

◮ A v_i = s_i u_i, with s_1 ≥ · · · ≥ s_n ≥ 0 (the s_i are the singular values)

Singular Value Decomposition

A = U S V^T

◮ SVD of symmetric matrices: s_i = |λ_i|; the v_i are the corresponding unit eigenvectors of A; u_i = v_i if λ_i ≥ 0 and u_i = −v_i if λ_i < 0

◮ rank(A) = rank(S) = the number of nonzero entries of S

◮ |det(A)| = s_1 · · · s_n

◮ A^{−1} = V S^{−1} U^T

SVD and low-rank approximation, compression

Low rank approximation:

A = Σ_{i=1}^{rank(A)} s_i u_i v_i^T

The best least squares approximation to A of rank p ≤ r is provided by retaining the first p terms of the sum

If A is an n × n matrix, it contains n^2 entries, but each term in the sum requires only 2n + 1 numbers

If the first singular value is much larger than the rest, mostof the information is captured by the first term.
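A low-rank approximation in NumPy; by the best-approximation property, the 2-norm error of the rank-p truncation equals the first discarded singular value:

```python
import numpy as np

A = np.random.default_rng(0).random((8, 8))
U, s, Vt = np.linalg.svd(A)                 # A = U S V^T

p = 2
A_p = U[:, :p] @ np.diag(s[:p]) @ Vt[:p, :] # keep the first p terms
print(np.linalg.norm(A - A_p, 2), s[p])     # the two numbers agree
```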

Fourier matrix

The DFT of x = [x_0, . . . , x_{n−1}]^T is (1/√n) F x, where F is the n × n matrix with entries ω^{jk}, j, k = 0, 1, . . . , n − 1:

F =
[ ω^0  ω^0      ω^0        · · ·  ω^0         ]
[ ω^0  ω^1      ω^2        · · ·  ω^{n−1}     ]
[ ω^0  ω^2      ω^4        · · ·  ω^{2(n−1)}  ]
[ ω^0  ω^3      ω^6        · · ·  ω^{3(n−1)}  ]
[ ...                                         ]
[ ω^0  ω^{n−1}  ω^{2(n−1)} · · ·  ω^{(n−1)^2} ]

where ω = e^{−i2π/n}.

Discrete Fourier Transform

F_n x = y, where

y_k = (1/√n) Σ_{j=0}^{n−1} x_j ω^{jk}

F_n^{−1} = conj(F_n) (the entrywise complex conjugate)

Unitary matrix: F^{−1} = conj(F)^T

Orthogonal (real) ↔ Unitary (complex)

If x ∈ R^n, then y_0 ∈ R and y_{n−k} = conj(y_k)
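NumPy's FFT matches this convention (ω = e^{−i2π/n}) when asked for the unitary scaling:

```python
import numpy as np

x = np.array([1.0, 0.0, -1.0, 0.0])
y = np.fft.fft(x, norm="ortho")     # DFT with the 1/sqrt(n) factor used here
print(y)
print(np.allclose(np.fft.ifft(y, norm="ortho"), x))  # inverse recovers x
print(np.allclose(y[3], np.conj(y[1])))               # y_{n-k} = conj(y_k)
```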

Fast Fourier Transform

Algorithm for computing the DFT: at each stage it transforms the vector into 2 half-length vectors.

For n = 2N , the computational complexity is n log2 n.

For n prime it is n2.

DFT interpolation

Given x0, x1, . . . , xn−1, lettj = c+ j(d− c)/n, j = 0, 1, . . . , n− 1. Then

Q(t) =1√n

n−1∑

k=0

ykei2πk(t−c)/(d−c)

where yk = Fnxk, satisfies Q(tj) = xj for j = 0, . . . , n− 1.

If x ∈ R^n and y_k = a_k + i b_k, then

Q(t) = (1/√n) Σ_{k=0}^{n−1} ( a_k cos(2πk(t−c)/(d−c)) − b_k sin(2πk(t−c)/(d−c)) )

Evaluation of trigonometric functions

To plot the interpolating trigonometric function, we can invert the expanded DFT. The steps are the following:

1. Calculate the DFT of the evenly spaced data points: x → F_n x

2. Multiply by √(p/n): F_n x → √(p/n) F_n x

3. Expand the n points to p points: add zeros in positions n/2 + 1 to p − n/2

4. Invert: √(p/n) F_n x → F_p^{−1} √(p/n) F_n x
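A sketch of the four steps for real data with even n (taking the real part at the end, since the simple zero split ignores the Nyquist-term subtlety):

```python
import numpy as np

def dft_interp(x, p):
    """Evaluate the DFT interpolant of x at p >= n evenly spaced points."""
    n = len(x)
    y = np.fft.fft(x, norm="ortho") * np.sqrt(p / n)     # steps 1 and 2
    yp = np.concatenate([y[:n // 2],
                         np.zeros(p - n),                # step 3: pad middle
                         y[n // 2:]])
    return np.fft.ifft(yp, norm="ortho").real            # step 4

x = np.array([1.0, 0.0, -1.0, 0.0])
print(dft_interp(x, 8))    # entries 0, 2, 4, 6 reproduce x
```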

Orthogonal Function Interpolation

If

A =
[ f_0(t_0)      f_0(t_1)      · · ·  f_0(t_{n−1})     ]
[ f_1(t_0)      f_1(t_1)      · · ·  f_1(t_{n−1})     ]
[    ...            ...                  ...          ]
[ f_{n−1}(t_0)  f_{n−1}(t_1)  · · ·  f_{n−1}(t_{n−1}) ]

is a real orthogonal matrix, then a function that interpolates the points (t_j, x_j) is

F(t) = Σ_{k=0}^{n−1} y_k f_k(t),

where y = Ax.

Least squares with DFT

Let {t_0 = c, t_1, . . . , t_{n−1} = c + (n − 1)(d − c)/n} be the n (even) equally spaced points on [c, d], and suppose we want to use only the m < n functions {f_0(t), f_1(t), . . . , f_{m−1}(t)}, where m is even.

The normal equations are

c = A_m x (no solving, just a matrix-vector product!)

and the least squares approximation using the first m basisfunctions is

F_m(t) = Σ_{k=0}^{m−1} y_k f_k(t)

Applications: filtering for audio compression or noise removal

Discrete cosine transform

y = Cx

C is a real orthogonal matrix and consists only of cosines.

Like the DFT, the DCT transforms n data points into n interpolation coefficients.

As with the DFT, choosing m < n coefficients y_0, . . . , y_{m−1} gives a least-squares approximation.

2D-DCT in image processing

Y = C X C^T

X = C^T Y C
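With SciPy (assuming it is available), the orthonormal type-II DCT realizes these formulas:

```python
import numpy as np
from scipy.fft import dctn, idctn

X = np.arange(16.0).reshape(4, 4)       # a small "image" block
Y = dctn(X, norm="ortho")               # 2D-DCT: Y = C X C^T
X_back = idctn(Y, norm="ortho")         # inverse: X = C^T Y C
print(np.allclose(X, X_back))           # -> True
```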

Image compression

Crude compression: replace each k × k pixel block by its average value

DCT compression:

1. take the 2D-DCT for each k × k matrix block,

2. do a least-squares approximation,

3. apply the inverse 2D-DCT.

Quantization

Quantization (mod q): round(y/q)
Dequantization: ȳ = q · round(y/q)
With a quantization matrix: Y_Q = [round(y_kl / q_kl)]

The larger q_kl, the greater the loss and the greater the compression.
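A tiny NumPy illustration with hypothetical coefficients and a hypothetical quantization matrix:

```python
import numpy as np

Y = np.array([[231.5, -12.3], [8.7, 1.2]])   # hypothetical DCT coefficients
Q = np.array([[16.0, 11.0], [12.0, 14.0]])   # hypothetical quantization matrix

YQ = np.round(Y / Q)       # quantize: small coefficients become 0
Y_back = Q * YQ            # dequantize: the rounding error is lost for good
print(YQ)
print(Y_back)
```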

Huffman coding

Shannon information

I = − Σ_{i=1}^k p_i log_2 p_i

Huffman tree

Assign shorter codes to symbols with higher probabilities. From the bottom up, join the symbols with the smallest probabilities. Assign a 0 to left branches, a 1 to right branches.
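A sketch of the bottom-up construction using a heap (the symbol probabilities are made up; ties can produce different but equally optimal codes):

```python
import heapq

def huffman_codes(probs):
    """Build Huffman codes by repeatedly joining the two smallest trees."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)                        # tie-breaker for the heap
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)      # two smallest probabilities
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}        # 0 on the left
        merged.update({s: "1" + c for s, c in c2.items()})  # 1 on the right
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

print(huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```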

Huffman coding for JPEG

The code for y_00 (the DC component) has two parts: the first is obtained from the DPCM tree, and the second from the integer identifying table. The DC coefficient is the binary string formed by concatenating these two parts.

AC components are coded as run-length pairs (n, L), where n is the length of a run of zeros and L is the length of the next nonzero entry. A Huffman AC tree is then used to code these pairs. After that comes the integer identifying code.
