
Lecture Notes on Numerical Analysis

Manuel Julio García
Department of Mechanical Engineering
EAFIT University, Medellín, Colombia

April 5, 2011

Contents

Preface
Notation

1 Linear Systems
  1.1 Linear Systems of Equations
    1.1.1 Geometric interpretation
    1.1.2 Singular cases
  1.2 Gauss Elimination
    1.2.1 Forward elimination
    1.2.2 Backward substitution
    1.2.3 Operations in a matrix
    1.2.4 Forward elimination: General case
    1.2.5 Backward substitution: General case
  1.3 LU Decomposition
    1.3.1 Cholesky Factorisation
  1.4 Special Types of Matrices
  1.5 Exercises

2 Iterative Methods
  2.1 Vector Space
  2.2 Vector Norms
    2.2.1 Distance between two vectors
    2.2.2 Some norms in R^n
  2.3 Sequence of Vectors
    2.3.1 Equivalent norms
  2.4 Inner Product
  2.5 Matrix Norms
    2.5.1 Natural norms
  2.6 Eigenvalues
    2.6.1 Applications
  2.7 Iterative Methods
    2.7.1 Spectral Radius
    2.7.2 Convergent Matrix
    2.7.3 General characteristics of Iterative Methods
    2.7.4 Jacobi's Method
    2.7.5 Gauss-Seidel
    2.7.6 Steepest Descent
    2.7.7 Conjugate Gradient
  2.8 Condition Number
  2.9 Exercises

3 Interpolation
  3.1 Introduction
    3.1.1 Polynomial approximation
  3.2 Lagrange polynomials
    3.2.1 Second order Lagrange polynomials
    3.2.2 General case
    3.2.3 Other Approximations
  3.3 Polynomials defined by parts
    3.3.1 One-dimensional Interpolation
    3.3.2 Two-dimensional interpolation
  3.4 Computation of the gradient of a scalar field
    3.4.1 Computation of the gradient for an element
    3.4.2 Computation of the gradient for a field
  3.5 Exercises

4 The Finite Element Method
  4.1 Classification of the Partial Differential Equations (PDE)
  4.2 Boundary Value Problems
    4.2.1 One-dimensional boundary problems
  4.3 Preliminary mathematics
    4.3.1 The Divergence (Gauss) theorem
    4.3.2 Green's equation
    4.3.3 Bilinear Operators
  4.4 Overview of the Finite Element Method
    4.4.1 Weak Form
    4.4.2 Abstract Form
    4.4.3 Variational Form
    4.4.4 Discrete Problem (Galerkin Method)
  4.5 Variational Formulation
    4.5.1 Reduction to Homogeneous Boundary Conditions
    4.5.2 The Ritz-Galerkin Method
    4.5.3 Other Methods
  4.6 One-dimensional problems
    4.6.1 Dirichlet boundary conditions
    4.6.2 Pragmatics
    4.6.3 Computation of the ℓ vector
    4.6.4 von Neumann Boundary Conditions
  4.7 Exercises

5 Two-dimensional Elliptic Problems
  5.1 Poisson's equation
  5.2 Weak form of the problem
    5.2.1 Weak form of the Dirichlet homogeneous boundary problem
    5.2.2 Weak form of the von Neumann homogeneous boundary problem
  5.3 Discrete problem
  5.4 Computation of the stiffness matrix
  5.5 Non-homogeneous Dirichlet boundary problem
  5.6 Non-homogeneous von Neumann boundary problems
  5.7 Fourier boundary conditions
  5.8 Exercises

6 Affine Transformations
  6.1 Change of variable in an integral
  6.2 Transformation of a Standard Element
    6.2.1 Computation of the transformation function
    6.2.2 Base functions for the standard element
    6.2.3 Computation of integrals over a finite element
  6.3 Exercises

7 Parabolic Problems
  7.1 Finite Difference Approximation
  7.2 Method for time integration

A Barycentric Coordinates

B Gradient
  B.1 Computation of the gradient of a function

Preface

The aim of Engineering Analysis is to model the physical behaviour of structures under the interaction of external loads such as forces and temperatures, in order to verify that they comply with design specifications.

The course Advanced Methods in Numerical Analysis is a postgraduate course of the Mechanical Engineering Department at EAFIT University in Medellín, Colombia. The aim of the course is to study the numerical (computational) solution of mathematically formulated physical problems. The course covers an introduction to state-of-the-art linear algebra matrix methods as well as an introduction to the numerical formulation of continuum mechanics problems such as heat transfer, potential flow, diffusion, and others that can be modelled by a second-order differential equation. The method of solution studied during this part is the Finite Element Method.

These notes grew from my undergraduate notes taken while attending professor Jose Rafael Toro's course, Numerical Analysis, at Los Andes University. From there they evolved over several years of teaching the subject, first at Los Andes University and then at EAFIT University. They do not pretend to be a mathematical theory of numerical methods or Finite Elements. Instead, they intend to introduce students to powerful mathematical and numerical techniques and to motivate them to further study of the topic. Numerical modelling can be summarised as follows: start by establishing the nature of the problem and the mathematical equations that model its physics, transform the mathematical formulation into a form suitable to be solved by a computer, and finally implement the solution in a computer code. That covers the whole cycle. The course does not try to be all-inclusive in topics, but it does try to cover and explain every single step involved in the process of solving a basic continuum problem by computer. There is no preference for computer languages, but Matlab, Maple, and C++ are used extensively. However, the algorithms presented in the text are generic and are not tied to any of these languages.

Care must be taken: these notes are not in their final version and may have some typos and missing parts. I apologise for that. They are intended as a guide to help the students understand the concepts and algorithms given during the course. New examples will be added continuously. I hope you find this work to be of help.

A preliminary course in calculus of several variables and a computer programming course are required for success in the course.

Manuel García
Medellín, September 2002

Notation

A          a matrix
a_ij       component in row i, column j of the matrix A
a_j        the j-th column of A
a_i        the i-th row of A
x          a vector of n elements
x^T        the transpose of x, a row vector
α, β       scalar quantities
E_i        the i-th equation of a system of equations
(E1)       a property of a vector space
N(x)       a norm of x
‖x‖        a norm of x
W_i        the i-th base function
Ω          a domain
∂Ω         the boundary of a domain
a(u, v)    a bilinear operator
ℓ(v)       a linear operator
⟨·, ·⟩      inner product
A_D        Dirichlet stiffness matrix; it does not contain the rows or columns corresponding to the Dirichlet boundary
∇v         gradient of v
∇²v        Laplacian of v

Chapter 1

    Linear Systems

    1.1 Linear Systems of Equations

This section deals with finding the values of the variables x_1, x_2, x_3, ..., x_n that simultaneously satisfy a set of linear equations, which appear in the solution of many physical problems. In general, a system of n linear equations can be represented as

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
&\;\;\vdots\\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned}$$

where the a_ij are real coefficients and the b_i are independent real constants. A system of equations has a geometric interpretation that helps to understand its meaning, the mechanics of the solution method, and the meaning of singular systems. In the next section we introduce these concepts starting with a system of two equations and later extend them to the general case.

    1.1.1 Geometric interpretation

Let us consider the following system of two equations:

$$2x_1 + 4x_2 = 2 \qquad (1.1)$$
$$4x_1 + 11x_2 = 1. \qquad (1.2)$$

This system can be written in matrix form as

$$\begin{bmatrix} 2 & 4 \\ 4 & 11 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

or in compact notation

$$A\mathbf{x} = \mathbf{b}$$

or

$$[A]\mathbf{x} = \mathbf{b}$$

with

$$A = \begin{bmatrix} 2 & 4 \\ 4 & 11 \end{bmatrix}
\qquad\text{and}\qquad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

We refer to x_i as the i-th component of the vector x. In the literature it is also common to refer to the i-th component of x as [x]_i or x(i). In the same way, a_ij represents the component in the i-th row and j-th column of the matrix A. Unless explicitly stated otherwise, a_ij and x_i are real numbers, that is, x_i, a_ij ∈ R. Rows and columns are denoted using Matlab colon notation: A[i, :] represents the i-th row and A[:, i] represents the i-th column.

A system of equations has a geometric interpretation that differs depending on whether we look at the rows or at the columns of the system.

    Row representation (System of two equations)

Equations 1.1 and 1.2 are linear equations that can be plotted as lines in a two-dimensional plane. The simultaneous solution is the intersection of the two lines in the plane, see figure 1.1.

Figure 1.1: Geometric representation of the solution of a system of equations. Row representation.

    Column representation (System of two equations)

Notice that equations 1.1 and 1.2 can also be written in the following way:

$$x_1 \begin{pmatrix} 2 \\ 4 \end{pmatrix} + x_2 \begin{pmatrix} 4 \\ 11 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.$$

That is the sum of two vectors, each multiplied by a scalar x_i. Remember that when a vector is multiplied by a scalar quantity, its magnitude is changed by that factor. Figure 1.2 shows the vectors represented by the columns of the system. If we multiply the first column vector (2, 4) by 3 and the second (4, 11) by −1 and then add them, the result is the vector (2, 1), which is the right-hand side of the system. This result is not a coincidence at all. As the columns are linearly independent, they form a basis of the space, which in this particular case is a basis of R². Therefore any vector in the space can be written as a linear combination of the elements of the basis.


    Figure 1.2: Column representation of a system of two equations.

An important conclusion is that any vector in the plane can be reproduced by these two vectors if we choose x_1 and x_2 properly. Notice that this result can also be extended to any two vectors different from (2, 4) and (4, 11) if and only if they are not parallel to each other. Why?

Row representation (System of three equations)

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$

Every row represents a plane in three-dimensional space. For example, (a_11, a_12, a_13) is the vector normal to the plane represented by the first row; it can also be written as A[1, :] using the Matlab colon notation. In the same way (a_21, a_22, a_23) = A[2, :] defines another plane, and so forth.

The solution of the system of equations is given by the intersection of the planes. That is, plane A[1, :] intersects plane A[2, :] in a line, and that line intersects plane A[3, :] in a point, which is the solution of the system.

    Column representation (System of three equations)

In the same way as in the n = 2 case, each matrix column represents a vector in three-dimensional space. In Matlab colon notation the columns are written as

$$\begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \end{bmatrix} = A(:,1), \qquad
\begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \end{bmatrix} = A(:,2), \qquad
\begin{bmatrix} a_{13} \\ a_{23} \\ a_{33} \end{bmatrix} = A(:,3)$$

and the solution vector is obtained as a linear combination of these three vectors, that is, by multiplying the column vectors A[:,1], A[:,2], and A[:,3] by the scalar quantities x_1, x_2 and x_3 and then adding the results. See figure 1.3.

Figure 1.3: Column representation of the solution of a system of 3 linear equations.

    1.1.2 Singular cases

There are cases in which the system does not have a solution. This can be interpreted geometrically as follows. If we use the row representation, each matrix row represents a plane and the solution is the simultaneous intersection of the planes. If one of the planes is parallel to the intersection of the other two planes, then the three planes never intersect in a single point and therefore there is no solution. See figure 1.4(a). On the other hand, when we plot the column vectors of the same singular system, they lie in the same plane. See figure 1.4(b). This means the vectors do not form a basis of R³.

Figure 1.4: Geometric representation of a singular system. (a) Row representation; (b) Column representation.

1.2 Gauss Elimination

From the geometric point of view, each equation (row) of a system of n equations represents a hyperplane in n-dimensional space. Finding the solution of the system is therefore equivalent to finding the intersection of all n hyperplanes. This procedure can be illustrated as follows: eliminate one variable by intersecting two hyperplanes; the result is a hyperplane of dimension n − 1. Then, consecutively find the intersection of the resulting hyperplane with the next hyperplane. Every intersection reduces the dimension of the resulting hyperplane by one. After n − 1 operations we obtain a single point, which is the solution of the system.

Gauss elimination consists of two main steps: forward elimination and backward substitution.

    1.2.1 Forward elimination

The purpose of forward elimination is to reduce the set of equations to an upper triangular system. The process starts by eliminating the coefficients of the first column from the second equation down to the last equation. Then it eliminates the coefficients of the second column from the third equation onwards, and so forth until the (n − 1)-th column of the system. In that way the last equation will have only one unknown. To eliminate a coefficient a_ik from the i-th equation, equation k is multiplied by −a_ik/a_kk and added to equation i. The equation used to eliminate is called the pivot equation and the term a_kk is called the pivot coefficient.

    1.2.2 Backward substitution

After the forward elimination, the original matrix is transformed into an upper triangular matrix. The last of the equations (n) has only one unknown: a_nn x_n = b_n. The unknown x_n is found and replaced back into equation n − 1, a_{n−1,n−1} x_{n−1} + a_{n−1,n} x_n = b_{n−1}, which now has only one unknown, x_{n−1}. The equation is solved and the procedure is repeated until we reach the first equation and all the values are known.

Known problems:

- During forward elimination and backward substitution a division by zero can occur. In the same way, due to finite computer arithmetic, a pivot that is not zero but close to zero causes the same problem.
- Round-off errors can result in inexact solutions.
- Ill-conditioned systems of equations, where small changes in the coefficients give rise to large changes in the solution.
- Singular systems of equations.

    1.2.3 Operations in a matrix

Let A ∈ R^{n×n} and x, b ∈ R^n. The system of n equations can be written in terms of A, x, and b as

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\qquad \begin{matrix} (E_1) \\ (E_2) \\ \vdots \\ (E_n) \end{matrix}$$

where (E_i) denotes equation i of the system. The following operations can be performed without altering the solution.

- An equation E_i can be multiplied by a scalar number λ, with λ ≠ 0, and the resulting equation can replace E_i:
  (λE_i) → (E_i)
- An equation E_j can be multiplied by λ and added to equation E_i:
  (E_i + λE_j) → (E_i)
- The order of the equations can be changed:
  (E_i) ↔ (E_j)

Example 1.1
Let us take the following system of two equations,

$$2x_1 + 4x_2 = 2 \qquad (1.3)$$
$$4x_1 + 11x_2 = 1. \qquad (1.4)$$

To eliminate the first variable we multiply equation (1.3) by −4/2 and add the result to equation (1.4),

$$2x_1 + 4x_2 = 2$$
$$0 + 3x_2 = -3.$$

From the last equation we solve for x_2 as x_2 = −1. The second step consists of replacing x_2 = −1 back into equation (1.3) and solving for x_1 (back substitution), which gives x_1 = 3.

Example 1.2
Solve the following system of equations, represented in matrix form, using forward elimination and backward substitution:

$$\begin{bmatrix} 2 & 4 & 0 \\ 1 & 4 & 1 \\ 2 & 2 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}
\qquad \begin{matrix} (E_1) \\ (E_2) \\ (E_3) \end{matrix}.$$

We operate on the system in the following way. First, E_2 ← E_2 − (1/2)E_1:

$$\begin{bmatrix} 2 & 4 & 0 \\ 0 & 2 & 1 \\ 2 & 2 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} 2 \\ 0 \\ 4 \end{bmatrix}.$$

Then E_3 ← E_3 − (2/2)E_1:

$$\begin{bmatrix} 2 & 4 & 0 \\ 0 & 2 & 1 \\ 0 & -2 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix}.$$

Notice that the variable x_1 has been eliminated from all equations below the first (column one is zero under the diagonal). Now we can proceed with column two, E_3 ← E_3 + E_2:

$$\begin{bmatrix} 2 & 4 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix}.$$

The second step consists of back-substituting the unknowns. From the last equation we have

$$x_3 = \frac{2}{3};$$

then, after successive replacements,

$$2x_2 + \frac{2}{3} = 0 \;\Rightarrow\; x_2 = -\frac{1}{3},$$
$$2x_1 + 4\left(-\frac{1}{3}\right) = 2 \;\Rightarrow\; x_1 = \frac{5}{3}.$$

Notice that the forward elimination transformed the original system Ax = b into an upper triangular system Ux = c with

$$U = \begin{bmatrix} 2 & 4 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}
\qquad\text{and}\qquad
c = \begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix}$$

which has the same solution, since equality was maintained by applying the same operations to A and b during the process.
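As a quick numerical check of Example 1.2 (a sketch in Python with NumPy, not part of the original notes), the same system can be solved directly:

    import numpy as np

    # System of Example 1.2
    A = np.array([[2.0, 4.0, 0.0],
                  [1.0, 4.0, 1.0],
                  [2.0, 2.0, 2.0]])
    b = np.array([2.0, 1.0, 4.0])

    x = np.linalg.solve(A, b)
    print(x)   # expected: [5/3, -1/3, 2/3] ~ [1.6667, -0.3333, 0.6667]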

1.2.4 Forward elimination: General case

Given the following set of n linear equations, E_1, ..., E_n,

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \qquad (E_1)\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \qquad (E_2)\\
&\;\;\vdots\\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n \qquad (E_n)
\end{aligned}$$

we want to find x_i for i = 1, ..., n. The solution is obtained by applying forward elimination followed by back substitution. Forward elimination transforms the original system into an upper triangular one. Algorithm 1 presents the forward elimination algorithm. For each row i it puts zeros in the positions underneath the diagonal (outer loop). To do this it divides equation E_i by a_ii so that the diagonal entry becomes one, that is, a_ii = 1. Then it multiplies E_i by −a_ji (the value we want to eliminate, with opposite sign) and adds it to E_j. The resulting equation replaces E_j.

Algorithm 1 Forward Elimination

    n is the dimension of the matrix                                (operations)
    for i = 1 to n do
      {divide the i-th equation (row) by a_ii so that a_ii = 1}
      E_i <- E_i / a_ii                                             (n - i + 2)
      {put zeros in the i-th column under the diagonal}
      for j > i do
        E_j <- E_j - a_ji * E_i                                     (n - i + 2)
      end for
    end for

In order to have a measure of how long the algorithm takes to solve a system, we compute the total number of operations (multiplications, subtractions, divisions and sums). We can approximate this value by counting only the number of multiplications and divisions. For each cycle i the number of operations op_i is (n − i + 2), from dividing equation E_i by a_ii, plus the operations of the j loop, which is executed (n − i) times; each pass of the j loop takes (n − i + 2) operations, resulting from multiplying a_ji by E_i. That is,

$$\begin{aligned}
op_i &= (n-i+2) + (n-i)(n-i+2)\\
     &= (n-i+2)(1+n-i)\\
     &= n^2 - 2ni + 3n + i^2 - 3i + 2,
\end{aligned}$$

which is the total number of operations in the i-th cycle. Remembering that

$$\sum_{j=1}^{m} 1 = m, \qquad \sum_{j=1}^{m} j = \frac{m(m+1)}{2}, \qquad \sum_{j=1}^{m} j^2 = \frac{m(m+1)(2m+1)}{6},$$

the total number of operations after n cycles will be

$$\begin{aligned}
op &= \sum_{i=1}^{n}\left(n^2 - 2ni + 3n + i^2 - 3i + 2\right)\\
   &= n^2\sum_{i=1}^{n}1 - 2n\sum_{i=1}^{n}i + 3n\sum_{i=1}^{n}1 + \sum_{i=1}^{n}i^2 - 3\sum_{i=1}^{n}i + 2\sum_{i=1}^{n}1\\
   &= n^2 n - 2n\left(\frac{n(n+1)}{2}\right) + 3n\,n + \frac{n(n+1)(2n+1)}{6} - 3\left(\frac{n(n+1)}{2}\right) + 2n\\
   &= n^2 + \frac{2}{3}n + \frac{1}{3}n^3.
\end{aligned}$$

When n is large, the number of operations op will be of the order of O(n³).

1.2.5 Backward substitution: General case

After applying the forward elimination algorithm the matrix is transformed into an upper triangular system,

$$U\mathbf{x} = \begin{bmatrix}
U_{11} & \cdots & U_{1i} & \cdots & U_{1n}\\
       & \ddots &        &        & \vdots\\
       &        & U_{ii} & \cdots & U_{in}\\
       & 0      &        & \ddots & \vdots\\
       &        &        &        & U_{nn}
\end{bmatrix}\mathbf{x} = \mathbf{c}.$$

To solve an upper triangular system we solve for the components of x starting from the last equation and moving backwards. For the last equation we have

$$U_{nn}x_n = c_n \;\Rightarrow\; x_n = c_n/U_{nn};$$

with x_n known, we replace this value into row (equation) n − 1,

$$U_{n-1,n-1}\,x_{n-1} + U_{n-1,n}\,x_n = c_{n-1},$$

and solve for x_{n−1},

$$U_{n-1,n-1}\,x_{n-1} = c_{n-1} - U_{n-1,n}\,x_n
\;\Rightarrow\;
x_{n-1} = \frac{c_{n-1} - U_{n-1,n}\,x_n}{U_{n-1,n-1}}.$$

Now with x_n and x_{n−1} known, we can move up to row n − 2 and solve for x_{n−2}. This procedure is repeated for equation n − 3 and so forth. In general, suppose we are replacing in equation j. Equation j has the following form:

$$U_j^T\,\mathbf{x} = U_{jj}x_j + U_{j,j+1}x_{j+1} + \cdots + U_{jn}x_n = c_j.$$


Figure 1.5: The product of the j-th row of U with the vector x contains only one unknown value, x_j.

Notice that equation E_j is obtained by multiplying the j-th row of the matrix by the vector x. Figure 1.5 illustrates this multiplication. At this stage the vector x consists of two parts: the upper part, with unknown values, and the lower, already known, part. When x is multiplied by the j-th row of the matrix, all the unknown values are cancelled except the x_j term. This is because U is an upper triangular matrix and therefore U_jk = 0 for k < j:

$$\underbrace{U_{j,1}x_1 + \cdots}_{=\,0} + U_{j,j}x_j + \underbrace{U_{j,j+1}x_{j+1} + \cdots + U_{jn}x_n}_{x\ \text{known}} = c_j.$$

Then, solving for x_j,

$$U_{j,j}x_j = c_j - \left(U_{j,j+1}x_{j+1} + \cdots + U_{jn}x_n\right) = c_j - \sum_{k>j}^{n} U_{jk}x_k,$$

$$x_j = \frac{c_j - \sum_{k>j}^{n} U_{jk}x_k}{U_{j,j}}.$$

This result is summarised in Algorithm 2.

Algorithm 2 Backward Substitution

    n is the dimension of the matrix
    x_n = c_n / U_nn
    for j = n - 1 to 1 do
      temp = 0
      for k = j + 1 to n do
        temp = temp + U_jk * x_k
      end for
      x_j = (c_j - temp) / U_jj
    end for

The total number of multiplications and divisions will be

$$\sum_{j=n}^{1}\left[(n-j)+1\right]
= n\sum_{j=n}^{1} 1 - \sum_{j=n}^{1} j + \sum_{j=n}^{1} 1
= n\,n - \frac{n(n+1)}{2} + n
= n^2 - \frac{n^2}{2} - \frac{n}{2} + n
= \frac{n^2+n}{2},$$

which is of order O(n²), recalling that the first part of the algorithm (forward elimination) was of order O(n³). This shows the advantage of doing backward substitution instead of setting zeros in the upper triangular positions of the matrix (backward elimination), and leads us to the conclusion that substitution must be preferred over elimination due to the order of complexity of the algorithm.
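To make Algorithms 1 and 2 concrete, here is a short Python/NumPy sketch (not part of the original notes) that applies forward elimination followed by backward substitution; it assumes the pivots a_ii never become zero, since pivoting is not discussed in this section.

    import numpy as np

    def gauss_solve(A, b):
        """Forward elimination (Algorithm 1) + backward substitution (Algorithm 2)."""
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        n = len(b)
        # Forward elimination: normalise the pivot row, then zero column i below it.
        for i in range(n):
            pivot = A[i, i]                    # assumed nonzero (no pivoting)
            A[i, i:] = A[i, i:] / pivot
            b[i] = b[i] / pivot
            for j in range(i + 1, n):
                factor = A[j, i]
                A[j, i:] = A[j, i:] - factor * A[i, i:]
                b[j] = b[j] - factor * b[i]
        # Backward substitution on the resulting upper triangular system.
        x = np.zeros(n)
        for j in range(n - 1, -1, -1):
            temp = A[j, j + 1:] @ x[j + 1:]
            x[j] = (b[j] - temp) / A[j, j]     # A[j, j] == 1 after normalisation
        return x

    # Example 1.2: expected solution (5/3, -1/3, 2/3)
    A = np.array([[2., 4., 0.], [1., 4., 1.], [2., 2., 2.]])
    b = np.array([2., 1., 4.])
    print(gauss_solve(A, b))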

    1.3 LU Decomposition

Backward substitution starts from an upper triangular system of equations and solves for x in Ux = b. As we showed in the previous section, this procedure is of a lower order of complexity than forward elimination. So if a system of equations can be written as

$$LU\mathbf{x} = \mathbf{b}, \qquad (1.5)$$

with L a lower triangular and U an upper triangular matrix, then it can be solved with two substitution procedures: i) use forward substitution to find c from Lc = b, and ii) use backward substitution to find x from Ux = c. These two substitutions are of lower order than one elimination plus one substitution. However, not many systems can be easily expressed in the form (1.5). In general, one can obtain the factorisation during the elimination procedure itself. This of course does not have any advantage in terms of efficiency, because the factorisation itself is an O(n³) algorithm. Nevertheless, it is worthwhile when we solve systems with the same matrix but several right-hand sides b.

To illustrate the factorisation procedure, let us consider the following system of four equations:

$$A\mathbf{x} = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} =
\begin{bmatrix} 2 \\ -1 \\ 4 \\ -8 \end{bmatrix} = \mathbf{b}.$$

Reducing the system to an upper triangular system is a straightforward procedure because of the tridiagonal nature of the matrix, where most of the terms are already zero. We proceed using forward elimination. To eliminate a_21, we multiply equation E_1 by l_21 = a_21/a_11 = −1/2 and subtract it from equation E_2,

$$E_2 \leftarrow E_2 - \left(-\tfrac{1}{2}\right)E_1 \qquad
\begin{bmatrix} 2 & -1 & 0 & 0 \\ 0 & 3/2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}.$$

Now, to eliminate a_32, we multiply equation E_2 by l_32 = a_32/a_22 = −2/3 and subtract it from equation E_3,

$$E_3 \leftarrow E_3 - \left(-\tfrac{2}{3}\right)E_2 \qquad
\begin{bmatrix} 2 & -1 & 0 & 0 \\ 0 & 3/2 & -1 & 0 \\ 0 & 0 & 4/3 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}.$$

Finally, to eliminate a_43, we multiply equation E_3 by l_43 = a_43/a_33 = −3/4 and subtract it from equation E_4,

$$E_4 \leftarrow E_4 - \left(-\tfrac{3}{4}\right)E_3 \qquad
\begin{bmatrix} 2 & -1 & 0 & 0 \\ 0 & 3/2 & -1 & 0 \\ 0 & 0 & 4/3 & -1 \\ 0 & 0 & 0 & 5/4 \end{bmatrix}.$$

The result is an upper triangular matrix. At the same time we operate on the matrix, we must also operate on the vector b, which is transformed into the vector c:

$$b_2 \leftarrow b_2 - l_{21}\,b_1:\;
\begin{bmatrix} 2 \\ 0 \\ 4 \\ -8 \end{bmatrix}, \qquad
b_3 \leftarrow b_3 - l_{32}\,b_2:\;
\begin{bmatrix} 2 \\ 0 \\ 4 \\ -8 \end{bmatrix}, \qquad
b_4 \leftarrow b_4 - l_{43}\,b_3:\;
\begin{bmatrix} 2 \\ 0 \\ 4 \\ -5 \end{bmatrix} = \mathbf{c},$$

where the same multiples l_ij (used for the matrix) are used to operate on the vector. The original problem Ax = b was transformed into an upper triangular system Ux = c:

$$\underbrace{\begin{bmatrix} 2 & -1 & 0 & 0 \\ 0 & 3/2 & -1 & 0 \\ 0 & 0 & 4/3 & -1 \\ 0 & 0 & 0 & 5/4 \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}}_{\mathbf{x}} =
\underbrace{\begin{bmatrix} 2 \\ 0 \\ 4 \\ -5 \end{bmatrix}}_{\mathbf{c}}.$$

Now we concentrate on how c was obtained. First notice that the operations only involve multiplications by the l_ij multiples. Observing this sequence of operations in detail, one notices that if these multiples are put into matrix form (L), then the steps from b to c are exactly the same as solving Lc = b. That is, the procedure from b to c is equivalent to applying forward substitution to the following system:

$$\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ -1/2 & 1 & 0 & 0 \\ 0 & -2/3 & 1 & 0 \\ 0 & 0 & -3/4 & 1 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} 2 \\ 0 \\ 4 \\ -5 \end{bmatrix}}_{\mathbf{c}} =
\underbrace{\begin{bmatrix} 2 \\ -1 \\ 4 \\ -8 \end{bmatrix}}_{\mathbf{b}}.$$

In general,

$$\begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ l_{21} & 1 & 0 & \cdots & 0 \\ l_{31} & l_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & 0 \\ l_{n1} & l_{n2} & \cdots & l_{n,n-1} & 1 \end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_n \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix}.$$

    As c is given by

    Lc = b

    and

    Ux = c

    then

    Ux = L1b

    therefore

    LU x = b.

    Which implies the original matrix A was factorised into a lower L and an upper U , trian-gular matrix. A good Gau elimination code consists of two main steps:

    i. Factorise A into L and U and

    ii. Compute x from LU x = b.

    Please note that we used a tridiagonal symmetric system only to simplify the compu-tation. In general, this method applies to fully populated non-symmetrical matrices.
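As a sketch of these two steps (Python/NumPy, not from the original notes; it assumes no pivoting is required), the following code factorises A into L and U using the same multiples l_ij as above, and then solves LUx = b with one forward and one backward substitution.

    import numpy as np

    def lu_factor(A):
        """Doolittle LU factorisation without pivoting: A = L U."""
        n = A.shape[0]
        U = A.astype(float).copy()
        L = np.eye(n)
        for k in range(n - 1):
            for i in range(k + 1, n):
                L[i, k] = U[i, k] / U[k, k]      # multiple l_ik
                U[i, k:] -= L[i, k] * U[k, k:]   # eliminate column k in row i
        return L, U

    def lu_solve(L, U, b):
        n = len(b)
        c = np.zeros(n)
        x = np.zeros(n)
        for i in range(n):                        # forward substitution: L c = b
            c[i] = b[i] - L[i, :i] @ c[:i]
        for j in range(n - 1, -1, -1):            # backward substitution: U x = c
            x[j] = (c[j] - U[j, j + 1:] @ x[j + 1:]) / U[j, j]
        return x

    # Tridiagonal example from the text
    A = np.array([[2., -1., 0., 0.], [-1., 2., -1., 0.],
                  [0., -1., 2., -1.], [0., 0., -1., 2.]])
    b = np.array([2., -1., 4., -8.])
    L, U = lu_factor(A)
    print(lu_solve(L, U, b))    # should agree with np.linalg.solve(A, b)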

    1.3.1 Cholesky Factorisation

Theorem 1.3.1. If A is a symmetric positive definite matrix, then A can be factorised as A = CCᵀ, where C is a lower triangular matrix and Cᵀ is its transpose.

Proof. The proof serves at the same time to deduce the algorithm. To prove that A = CCᵀ we construct C explicitly, proceeding by induction: first we show that we can calculate the values of the first column of C; then, supposing the values of the first p − 1 columns are known, we show that we can compute the values of the p-th column.

First column

$$A_{i1} = \sum_{k=1}^{n} C_{ik}C^T_{k1}, \qquad i = 1, \ldots, n.$$

As C is lower triangular, C_ik = 0 for k > i; therefore

$$A_{i1} = C_{i1}C^T_{11} + C_{i2}C^T_{21} + \cdots + C_{in}C^T_{n1},$$

and using C_ij = Cᵀ_ji we have

$$A_{i1} = C_{i1}C_{11} + C_{i2}C_{12} + \cdots + C_{in}C_{1n}.$$

Because C is lower triangular, C_1k = 0 for k > 1, so all the terms on the right-hand side except the first vanish:

$$A_{i1} = C_{i1}C_{11},$$

and solving for C_i1,

$$C_{i1} = \frac{A_{i1}}{C_{11}}.$$

For i = 1 we have

$$A_{11} = C_{11}C_{11} = C_{11}^2 \;\Rightarrow\; C_{11} = \sqrt{A_{11}}.$$

Because A is positive definite, A_11 > 0. We can then compute the values of the first column of C as

$$C_{i1} = \frac{A_{i1}}{\sqrt{A_{11}}}.$$

Now suppose we know the values of the first p − 1 columns of C; we want to show that we can compute the values of the p-th column. We accomplish this by expanding a component of A in the p-th column (see figure 1.6). That is,

$$A_{ip} = \sum_{k=1}^{n} C_{ik}C^T_{kp},$$

and by definition of Cᵀ,

$$A_{ip} = \sum_{k=1}^{n} C_{ik}C_{pk},$$


    Figure 1.6: Cholesky factorisation.

which is equivalent to multiplying rows i and p of the matrix C (see figure 1.6). Notice that C_pk = 0 for k > p, therefore the upper limit of the sum can be reduced from n to p:

$$A_{ip} = \sum_{k=1}^{p} C_{ik}C_{pk}.$$

Expanding the last term,

$$A_{ip} = \sum_{k=1}^{p-1} C_{ik}C_{pk} + C_{ip}C_{pp},$$

and solving for C_ip we have

$$C_{ip} = \frac{A_{ip} - \sum_{k=1}^{p-1} C_{ik}C_{pk}}{C_{pp}} \qquad (1.6)$$

where C_pp can be calculated by setting i = p in the last equation:

$$C_{pp} = \sqrt{A_{pp} - \sum_{k=1}^{p-1} C_{pk}C_{pk}}. \qquad (1.7)$$

Equations 1.6 and 1.7 demonstrate that we can compute the matrix C for any positive definite A.

    1.4 Special Types of Matrices

    Diagonally Dominant Matrix

A square matrix A ∈ R^{n×n} is strictly diagonally dominant if

$$|a_{ii}| > \sum_{\substack{j=1 \\ j\neq i}}^{n} |a_{ij}|, \qquad i = 1, \ldots, n.$$

A strictly diagonally dominant matrix is nonsingular. A symmetric diagonally dominant real matrix with nonnegative diagonal entries is positive semidefinite.
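As a small illustration (Python/NumPy, not from the notes), strict diagonal dominance can be checked directly from the definition:

    import numpy as np

    def is_strictly_diagonally_dominant(A):
        """Check |a_ii| > sum_{j != i} |a_ij| for every row i."""
        abs_A = np.abs(A)
        off_diag_sums = abs_A.sum(axis=1) - np.diag(abs_A)
        return bool(np.all(np.diag(abs_A) > off_diag_sums))

    print(is_strictly_diagonally_dominant(np.array([[4., 1., 1.],
                                                    [0., 3., -1.],
                                                    [2., 1., 5.]])))   # True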

Positive definite

A matrix A ∈ R^{n×n} is positive definite if it is symmetric and xᵀAx > 0 for all x ≠ 0, x ∈ R^n.

Banded

A matrix A is banded if there exist integers p and q, 1 < p, q < n, such that a_ij = 0 whenever j ≥ i + p or i ≥ j + q. In other words, the non-zero values of the matrix lie in a band around the diagonal. See figure 1.7.

Figure 1.7: Banded matrix. The non-zero positions lie in a band around the diagonal.

    1.5 Exercises

1. Matrix and vector operations. Assume that α ∈ R, x, y, z ∈ R^n and A ∈ R^{n×n}, and implement the following operations in computer code.

   (a) Scalar-vector multiplication: z = αx, z_i = α x_i
   (b) Vector addition: z = x + y, z_i = x_i + y_i
   (c) Dot product (inner product): α = x · y = xᵀy = ∑ x_i y_i
   (d) Element-wise vector multiplication: z_i = x_i y_i
   (e) Scalar-matrix multiplication: B = αA, B_ij = α A_ij
   (f) Matrix addition: C = A + B, C_ij = A_ij + B_ij
   (g) Matrix-vector multiplication: y = Ax, y_i = ∑_j a_ij x_j
   (h) Matrix-matrix multiplication: let A ∈ R^{n×p} and B ∈ R^{p×m}; then C = AB, C_ij = ∑_{k=1}^{p} a_ik b_kj, and in colon notation C_ij = A(i,:) B(:,j)

2. Given the matrix

   $$A = \begin{bmatrix} 4 & 1 \\ 1 & 5/4 \end{bmatrix},$$

   find the Cholesky factorisation C such that CCᵀ = A.

3. Given b = (5, 9/4)ᵀ, solve the system Ax = b using the Cholesky method and the factorisation found in the previous exercise.

4. The stress tensor [σ] for a two-dimensional case is represented by a 2 × 2 matrix given by

   $$[\sigma] = \lambda\,\mathrm{tr}(\epsilon)\,[I] + 2\mu\,[\epsilon] \qquad (1.8)$$

   where [ε] is a 2 × 2 matrix that represents the strain tensor, [I] is the identity matrix, and λ and μ are the Lamé constants, which are related to the Young modulus E and the Poisson ratio ν by

   $$\lambda = \frac{E\nu}{(1-2\nu)(1+\nu)}, \qquad \mu = \frac{E}{2(1+\nu)}. \qquad (1.9)$$

   The traction q (force per unit area) over a plane with normal e = (e_1, e_2) is given by the following expression:

   $$q = [\sigma]\,e. \qquad (1.10)$$

   Write a computer program that, given the strain tensor ε, the material constants E and ν, and the plane normal vector e, computes the traction vector over the plane.

5. Given the following matrices L and S, show that L is the inverse of S:

   $$L = \begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & 0 & 1 \end{bmatrix}, \qquad
   S = \begin{bmatrix} 1 & 0 & 0 \\ -l_{21} & 1 & 0 \\ -l_{31} & 0 & 1 \end{bmatrix}.$$

6. Write the system Ax = b corresponding to the following plot.

   [Plot: two straight lines in the x-y plane meeting at the point (2, 5).]

7. Show that if A = CCᵀ then A is symmetric.

8. A matrix A is positive definite if xᵀAx > 0 whenever x ≠ 0. Show that the matrix A defined by

   $$A = \begin{bmatrix} a & 0 \\ 0 & c \end{bmatrix}$$

   is positive definite whenever a and c are greater than zero.

9. Let A be a positive definite, symmetric matrix in R^{n×n}. Compute the number of operations (multiplications and divisions) needed to accomplish the Cholesky factorisation A = CCᵀ. Assume that the matrix is stored in complete form (lower and upper entries) and that its bandwidth equals b. The number of operations should be expressed in terms of n and b.

Chapter 2

    Iterative Methods

    2.1 Vector Space

A vector space E is a set with two operations defined on it: addition and scalar multiplication. Additionally, if x, y, and z belong to E and α and β are real numbers, then the following axioms are satisfied.

(E1) Commutativity: x + y = y + x
(E2) Associativity: (x + y) + z = x + (y + z)
(E3) Identity element: there is an element in E, denoted by 0, such that 0 + x = x + 0 = x
(E4) Inverse element: for each x in E there is an element −x in E such that x + (−x) = (−x) + x = 0
(E5) Distributivity of scalar multiplication with respect to scalar addition: (α + β)x = αx + βx
(E6) Distributivity of scalar multiplication with respect to vector addition: α(x + y) = αx + αy
(E7) Compatibility of scalar multiplication with scalar multiplication: α(βx) = (αβ)x
(E8) Identity element of scalar multiplication: 1 · x = x


    2.2 Vector Norms

A vector norm on E is a function N defined from E to R with the following properties. Let x, y ∈ E and α ∈ R; then

i) N(x) ≥ 0, ∀x ∈ E
ii) N(x) = 0 ⟺ x = (0, 0, ..., 0)ᵀ = 0
iii) N(αx) = |α| N(x)
iv) N(x + y) ≤ N(x) + N(y).

2.2.1 Distance between two vectors

Let x, y ∈ E. The distance d ∈ R between the two vectors with respect to the norm ‖·‖ is defined as

d = N(x − y);

d is then referred to as a metric for R^n.

Properties of a metric

Let x, y, z ∈ E; then

i) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y
ii) d(x, y) = d(y, x)
iii) d(x, y) ≤ d(x, z) + d(z, y).

2.2.2 Some norms in R^n

The norms l_1, l_2, and l_∞ are defined as follows:

$$\|x\|_1 = \sum |x_i|, \qquad
\|x\|_2 = \left\{\sum x_i^2\right\}^{1/2}, \qquad
\|x\|_\infty = \max_{1\le i\le n} |x_i|.$$

Example 2.1
Show that ‖x‖_∞ is a norm.

Properties i) to iii) are straightforward and are left as an exercise. For property iv) we have

$$\|x+y\|_\infty = \max|x_i + y_i| \le \max(|x_i| + |y_i|) \le \max|x_i| + \max|y_i| = \|x\|_\infty + \|y\|_\infty.$$

Example 2.2
Find the l_1, l_2 and l_∞ norms of the following vectors:

a. x = (1, 1, −2)ᵀ
b. x = (3, 4, 0, 3/2)ᵀ
c. x = (sin k, cos k, 2^k)ᵀ, k ∈ Z⁺.

For case a. we have

$$\|x\|_1 = 1 + 1 + 2 = 4,$$
$$\|x\|_2 = \sqrt{(1)^2 + (1)^2 + (-2)^2} = \sqrt{6} = 2.449,$$
$$\|x\|_\infty = \max\{|1|, |1|, |-2|\} = 2.$$

Example 2.3
Given x = (1, 1, 1) and y = (1.2, 1, 1), find the distance between x and y.
Ans.

$$\|x - y\|_2 = 0.21356 \qquad (2.1)$$
$$\|x - y\|_\infty = 0.2 \qquad (2.2)$$

Example 2.4
In an experiment, theory predicts that the solution to a problem is x = (1, 1, 1), but the experimental result is x_e = (1.001, 0.989, 0.93). Find the distance between the two results using l_∞ and l_2.

l_∞:
$$d = \|(1, 1, 1) - (1.001, 0.989, 0.93)\|_\infty = \|(-0.001, 0.011, 0.07)\|_\infty = 0.07.$$

l_2:
$$d = \|(-0.001, 0.011, 0.07)\|_2 = \sqrt{0.001^2 + 0.011^2 + 0.07^2} = 0.070866.$$
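The distances of Example 2.4 can be checked with a few lines of Python/NumPy (not part of the original notes):

    import numpy as np

    x  = np.array([1.0, 1.0, 1.0])
    xe = np.array([1.001, 0.989, 0.93])

    d = x - xe
    print(np.linalg.norm(d, 1))        # l1 distance
    print(np.linalg.norm(d, 2))        # l2 distance, ~0.070866 as in Example 2.4
    print(np.linalg.norm(d, np.inf))   # l-infinity distance, 0.07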

Some norms, such as the l_2 norm, require a bit more work to be proved. The following theorem is an intermediate result necessary to prove that l_2 is a well-defined norm.

Theorem 2.2.1 (The Cauchy-Buniakowsky-Schwarz inequality). For all x, y ∈ R^n,

$$\sum_{i=1}^{n}|x_iy_i| \le \left\{\sum_{i=1}^{n}x_i^2\right\}^{1/2}\left\{\sum_{i=1}^{n}y_i^2\right\}^{1/2}.$$

Proof. If y = 0 or x = 0 the result is straightforward, because both sides of the inequality are zero.

Let us suppose then that y ≠ 0 and x ≠ 0. For each λ ∈ R we have

$$0 \le \|x-\lambda y\|_2^2 = \sum_{i=1}^{n}(x_i-\lambda y_i)^2
= \sum_{i=1}^{n}x_i^2 - 2\lambda\sum_{i=1}^{n}x_iy_i + \lambda^2\sum_{i=1}^{n}y_i^2,$$

then

$$2\lambda\sum_{i=1}^{n}x_iy_i \le \sum_{i=1}^{n}x_i^2 + \lambda^2\sum_{i=1}^{n}y_i^2
= \|x\|_2^2 + \lambda^2\|y\|_2^2.$$

Because ‖x‖_2 > 0 and ‖y‖_2 > 0, the last inequality holds in particular for λ = ‖x‖_2/‖y‖_2. Then

$$2\,\frac{\|x\|_2}{\|y\|_2}\sum_{i=1}^{n}x_iy_i \le \|x\|_2^2 + \left(\frac{\|x\|_2}{\|y\|_2}\right)^2\|y\|_2^2 = 2\|x\|_2^2;$$

dividing by 2‖x‖_2/‖y‖_2,

$$\sum_{i=1}^{n} x_iy_i \le \|x\|_2\,\|y\|_2.$$

Now define x̃ as the vector whose i-th component is x̃_i = x_i if x_i y_i ≥ 0 and x̃_i = −x_i otherwise. It is true that ‖x̃‖_2 = ‖x‖_2 and

$$\sum_{i=1}^{n}|x_iy_i| = \sum_{i=1}^{n}\tilde{x}_iy_i \le \|\tilde{x}\|_2\,\|y\|_2 = \|x\|_2\,\|y\|_2,$$

that is,

$$\sum_{i=1}^{n}|x_iy_i| \le \left\{\sum_{i=1}^{n}|x_i|^2\right\}^{1/2}\left\{\sum_{i=1}^{n}|y_i|^2\right\}^{1/2}.$$

Example 2.5
Prove that ‖·‖_2 is a well-defined norm:

i. ‖x‖_2 ≥ 0, and ‖x‖_2 = 0 ⟺ x = 0
ii. ‖αx‖_2 = |α| ‖x‖_2
iii. ‖x + y‖_2 ≤ ‖x‖_2 + ‖y‖_2

Properties i. and ii. are immediate. To prove iii. we use the Cauchy-Buniakowsky-Schwarz inequality. Let x, y ∈ R^n; then

$$\|x+y\|_2^2 = \sum(x_i+y_i)^2 = \sum\left(x_i^2 + 2x_iy_i + y_i^2\right)
\le \sum x_i^2 + 2\|x\|_2\|y\|_2 + \sum y_i^2
= \|x\|_2^2 + 2\|x\|_2\|y\|_2 + \|y\|_2^2
= \left(\|x\|_2+\|y\|_2\right)^2.$$

Then ‖x + y‖_2² ≤ (‖x‖_2 + ‖y‖_2)², and therefore

$$\|x+y\|_2 \le \|x\|_2 + \|y\|_2.$$

    2.3 Sequence of Vectors

A sequence of vectors {x^(k)}_{k=1}^∞ converges to x in R^n if, for every ε > 0, there exists N(ε) such that ‖x^(k) − x‖ < ε for all k > N(ε).

Theorem 2.3.1. A sequence of vectors {x^(k)} converges to x ∈ R^n with respect to ‖·‖_∞ if and only if

$$\lim_{k\to\infty} x_i^{(k)} = x_i, \qquad i = 1,\ldots,n.$$

Proof. It consists of two parts.

a. Suppose that {x^(k)}_{k=1}^∞ converges to x with respect to the ‖·‖_∞ norm. Then, given any ε > 0, there exists an integer N(ε) such that for all k ≥ N(ε)

$$\|x^{(k)} - x\|_\infty < \varepsilon,$$

that is,

$$\max_i |x_i^{(k)} - x_i| < \varepsilon.$$

This implies that |x_i^(k) − x_i| < ε for all i, and therefore

$$\lim_{k\to\infty} x_i^{(k)} = x_i.$$

b. Suppose that lim_{k→∞} x_i^(k) = x_i for all i = 1, 2, ..., n. That is, for a given ε > 0 there exists N_i(ε) such that |x_i^(k) − x_i| < ε whenever k > N_i(ε). Now define N(ε) = max_i N_i(ε). Therefore, if k ≥ N(ε), then |x_i^(k) − x_i| < ε for each i and in particular

$$\max_i |x_i^{(k)} - x_i| < \varepsilon, \qquad (2.3)$$
$$\|x^{(k)} - x\|_\infty < \varepsilon, \qquad (2.4)$$

which implies that {x^(k)}_{k=1}^∞ converges to x.

Example 2.6
Show that the following sequence of vectors converges:

$$x^{(k)} = \left(1,\; 2 + 1/k,\; 3/k^2,\; e^{-k}\sin k\right)^T.$$

According to the theorem, to show that x^(k) converges with respect to the norm ‖·‖_∞ it is enough to show that each of its components converges. We have

$$\lim_{k\to\infty} 1 = 1, \qquad \lim_{k\to\infty} (2 + 1/k) = 2, \qquad \lim_{k\to\infty} 3/k^2 = 0, \qquad \lim_{k\to\infty} e^{-k}\sin k = 0;$$

as each component converges, we conclude that x^(k) converges.

    2.3.1 Equivalent norms

Definition: Let N1 and N2 be two norms on a vector space E. These norms are equivalent if there exist positive real numbers c and d such that

$$c\,N_1(x) \le N_2(x) \le d\,N_1(x)$$

for all x ∈ E. An equivalent condition is that there exists a number C > 0 such that

$$\frac{1}{C}\,N_1(x) \le N_2(x) \le C\,N_1(x)$$

for all x ∈ E. To see the equivalence, set C = max{1/c, d}.

Some key results are as follows:

i. On a finite dimensional vector space all norms are equivalent. The same is not true for vector spaces of infinite dimension (Kreyszig, 1978).

ii. It follows that on a finite dimensional vector space one can check the convergence of a sequence with respect to any norm: if a sequence converges in one norm, it converges in all norms.

iii. If two norms are equivalent on a vector space E, they induce the same topology on E (Kreyszig, 1978).

Theorem 2.3.2. Let x ∈ R^n. The norms l_2 and l_∞ are equivalent in R^n:

$$\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty.$$

Graphically this result is illustrated in figure 2.1 for the case n = 2.

Proof. Let us select x_j as the maximum component of x, that is, |x_j| = max_i |x_i|. It follows that

$$\|x\|_\infty^2 = |x_j|^2 = x_j^2 \le \sum_{i=1}^{n} x_i^2 = \|x\|_2^2.$$

Additionally,

$$\|x\|_2^2 = \sum_{i=1}^{n} x_i^2 \le \sum_{i=1}^{n} x_j^2 = x_j^2\sum_{i=1}^{n} 1 = n\,x_j^2 = n\,\|x\|_\infty^2.$$

Figure 2.1: Equivalence of the l_∞ and l_2 norms.

Example 2.7
Given

$$x^{(k)} = \left(1,\; 2 + 1/k,\; 3/k^2,\; e^{-k}\sin k\right)^T,$$

show that {x^(k)} converges to x = (1, 2, 0, 0)ᵀ with respect to the norm ‖·‖_2.

Solution. We already proved that this sequence of vectors converges with respect to the l_∞ norm. Therefore, given any ε ∈ R, ε > 0, there exists an integer N(ε/2) with the property that

$$\|x^{(k)} - x\|_\infty < \varepsilon/2$$

whenever k > N(ε/2). Using the result from theorem 2.3.2 (here n = 4, so √n = 2),

$$\|x^{(k)} - x\|_2 \le 2\,\|x^{(k)} - x\|_\infty < \varepsilon$$

whenever k > N(ε/2); therefore {x^(k)} converges to x with respect to ‖·‖_2.

Note: it can be shown that all the norms of R^n are equivalent with respect to convergence.

    2.4 Inner Product

An inner product is a way to multiply vectors together such that the result of the multiplication is a scalar. For a vector space V, an inner product is a map ⟨·,·⟩ : V × V → R that satisfies the following properties. Let x, y, and z be elements of V and α a scalar; then:

i. ⟨x, y⟩ = ⟨y, x⟩
ii. ⟨αx, y⟩ = α⟨x, y⟩
iii. ⟨x + z, y⟩ = ⟨x, y⟩ + ⟨z, y⟩
iv. ⟨x, x⟩ ≥ 0
v. ⟨x, x⟩ = 0 ⟺ x = 0

A vector space together with an inner product on it is called an inner product space. This definition also applies to an abstract vector space over any field.

Examples of inner product spaces include:

a. The Euclidean space R^n, where the inner product is given by the dot product

$$\langle (x_1, x_2, \ldots, x_n), (y_1, y_2, \ldots, y_n) \rangle = \sum_{i=1}^{n} x_i y_i.$$

b. The vector space of real functions whose domain is a closed interval [a, b], with inner product

$$\langle f, g \rangle = \int_a^b f\,g\; dx.$$

Every inner product space is a metric space, with metric given by

$$d(v, w) = \sqrt{\langle v - w,\, v - w \rangle}. \qquad (2.6)$$

If this results in a complete metric space (a metric space in which every Cauchy sequence is convergent), it is called a Hilbert space.

  • 2.5 Matrix Norms 27

    2.5 Matrix Norms

Let A and B ∈ R^{n×n} and α ∈ R. A matrix norm on R^{n×n} is a function ‖·‖ defined from R^{n×n} to R with the following properties:

i) ‖A‖ ≥ 0
ii) ‖A‖ = 0 if and only if A = 0 (the zero matrix)
iii) ‖αA‖ = |α| ‖A‖
iv) ‖A + B‖ ≤ ‖A‖ + ‖B‖
v) ‖AB‖ ≤ ‖A‖ ‖B‖

The distance between two matrices can be defined in the usual way as d = ‖A − B‖.

Figure 2.2: Geometric effect of multiplying a matrix A by a set of vectors x with norm equal to one.

    2.5.1 Natural norms

A natural matrix norm is derived from a vector norm. To understand how a vector norm can be used to define a matrix norm, let us first observe the geometric effect of matrix-vector multiplication. When a matrix A is multiplied by a vector x, the result is a new vector which is rotated and scaled in comparison with the original vector x. Figure 2.2 illustrates this transformation for a set of vectors x ∈ R² with Euclidean norm equal to one. This set of vectors represents a circle. When the operator A is applied to these vectors, the original circle is transformed into an ellipse: all vectors are scaled by different factors. If we choose a factor C large enough to be larger than the maximum scale factor, then we can affirm that

$$\|Ax\| \le C\,\|x\|.$$

If C is the smallest number for which the inequality holds for all x, that is, C is the maximum factor by which A can stretch a vector, then ‖A‖ is defined as the supremum of C over all vectors,

$$\|A\| = \sup C = \sup \frac{\|Ax\|}{\|x\|},$$

or equivalently

$$\|A\| = \sup_{\|x\|=1} \|Ax\|.$$

    Figure 2.3: Examples of norms.

Theorem 2.5.1. If ‖·‖ is a vector norm in R^n, then

$$\|A\| = \max_{\|x\|=1} \|Ax\|$$

is a matrix norm, called the natural norm.

Proof. In the following proof we use Einstein notation to avoid the overuse of the summation symbol. In Einstein notation the symbol ∑ is suppressed and summation is assumed over repeated indices; that is, ∑_j a_ij x_j is written a_ij x_j.

i. ‖A‖ ≥ 0, with equality only for the zero matrix:

$$\|Ax\| = \left\| \begin{bmatrix} a_{1j}x_j \\ \vdots \\ a_{ij}x_j \\ \vdots \\ a_{nj}x_j \end{bmatrix} \right\| > 0
\quad\text{if and only if}\quad a_{ij}x_j \neq 0;$$

because x ≠ 0, a_ij x_j = 0 for every such x if and only if a_ij = 0.

ii. ‖αA‖ = |α| ‖A‖:

$$\|\alpha A\| = \max_{\|x\|=1}\|\alpha\,a_{ij}x_j\| = |\alpha|\max_{\|x\|=1}\|a_{ij}x_j\| = |\alpha|\,\|A\|.$$

iii. ‖A + B‖ ≤ ‖A‖ + ‖B‖:

$$\|A+B\| = \max_{\|x\|=1}\|(a_{ij}+b_{ij})x_j\| = \max_{\|x\|=1}\|a_{ij}x_j + b_{ij}x_j\|
\le \max_{\|x\|=1}\left(\|a_{ij}x_j\| + \|b_{ij}x_j\|\right) \le \|A\| + \|B\|.$$

Example 2.8
It can be shown (see exercise 4) that if A = (a_ij), then ‖A‖_∞ can be calculated as

$$\|A\|_\infty = \max_{1\le i\le n} \sum_{j=1}^{n} |a_{ij}|.$$

If A is defined as

$$A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 3 & -1 \\ 5 & -1 & 1 \end{bmatrix}$$

then

$$\sum_j |a_{1j}| = 4, \qquad \sum_j |a_{2j}| = 4, \qquad \sum_j |a_{3j}| = 7,$$

and therefore ‖A‖_∞ = max{4, 4, 7} = 7.
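A quick numerical check of the maximum absolute row sum (Python/NumPy, not part of the notes):

    import numpy as np

    A = np.array([[1., 2., -1.],
                  [0., 3., -1.],
                  [5., -1., 1.]])

    row_sums = np.abs(A).sum(axis=1)     # [4, 4, 7]
    print(row_sums.max())                # 7.0
    print(np.linalg.norm(A, np.inf))     # same value, computed by NumPy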

    2.6 Eigenvalues

For any square matrix A we can look for vectors x that are in the same direction as Ax. These vectors are called eigenvectors. Multiplication by a matrix A normally changes the direction of a vector, but for certain exceptional vectors Ax is a multiple of x, that is,

$$A\mathbf{x} = \lambda\mathbf{x}.$$

In this way, the effect of multiplying x by A is to stretch, contract, or reverse x by a factor λ. This is illustrated in figure 2.4.

Figure 2.4: Multiplication of A by one of these special vectors (Ax = λx) only changes the vector's magnitude.

Example 2.9
The matrix

$$A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

has the following eigenvalues and eigenvectors:

$$\lambda_1 = 3,\; \mathbf{x}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}; \qquad
\lambda_2 = 2,\; \mathbf{x}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}; \qquad
\lambda_3 = 1,\; \mathbf{x}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

Additionally, any vector in the space can be written as a linear combination of the eigenvectors (provided the eigenvectors are linearly independent):

$$\mathbf{y} = \alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2 + \alpha_3\mathbf{x}_3$$

with α_i ∈ R and x_i ∈ R³. If we apply A to the vector y, we have

$$A\mathbf{y} = A(\alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2 + \alpha_3\mathbf{x}_3)
= \alpha_1 A\mathbf{x}_1 + \alpha_2 A\mathbf{x}_2 + \alpha_3 A\mathbf{x}_3
= \alpha_1\lambda_1\mathbf{x}_1 + \alpha_2\lambda_2\mathbf{x}_2 + \alpha_3\lambda_3\mathbf{x}_3.$$

The action of A on any vector y is thus determined by the eigenvectors and eigenvalues.

Diagonal matrices are certainly the simplest: the eigenvalues are the diagonal entries themselves and the eigenvectors point along the Cartesian axes. For other matrices we find the eigenvalues first and then the eigenvectors, in the following way. If Ax = λx, then (A − λI)x = 0, and because x ≠ 0 the matrix A − λI has linearly dependent columns and the determinant of A − λI must be zero; in other words, shifting the matrix by λI makes it singular.

Example 2.10
Let

$$A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix};$$

then

$$A - \lambda I = \begin{bmatrix} 2-\lambda & -1 \\ -1 & 2-\lambda \end{bmatrix}. \qquad (2.7)$$

The eigenvalues of A are the numbers λ that make the determinant of A − λI equal to zero:

$$\det(A - \lambda I) = (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda-1)(\lambda-3) = 0,$$

which leads to λ_1 = 1 and λ_2 = 3. Replacing λ_1 = 1 into equation 2.7,

$$A - \lambda_1 I = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix};$$

because (A − λ_1 I)x^(1) = 0,

$$\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} x^{(1)}_x \\ x^{(1)}_y \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \end{bmatrix},$$

which leads to

$$x^{(1)} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

For λ_2 = 3,

$$A - \lambda_2 I = \begin{bmatrix} -1 & -1 \\ -1 & -1 \end{bmatrix}$$

and

$$x^{(2)} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$

In the case n = 2 the characteristic polynomial is quadratic and there is an exact formula for computing its roots. For n = 3 and n = 4 the characteristic polynomials are of order 3 and 4, for which formulas for the roots also exist. For n > 4 there is not (and there will never be) such a formula, and numerical methods must be used in those cases. See (Golub and Loan, 1996) for a detailed presentation of such algorithms.
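As a numerical cross-check of Example 2.10 (Python/NumPy, not part of the notes; numerical eigenvectors are returned normalised, so they may differ from the hand-computed ones by a scale factor):

    import numpy as np

    A = np.array([[2., -1.],
                  [-1., 2.]])

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)      # 1 and 3 (ordering may vary)
    print(eigenvectors)     # columns are the (normalised) eigenvectors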

    2.6.1 Applications

There is probably no other mathematical property with wider physical application than the eigenvalues and eigenvectors of a matrix.

In solid mechanics, the stress tensor represents the state of internal forces at a point in a continuum. The actual force over a surface is given by the multiplication of the stress matrix by a vector normal to the surface. In this way, depending on the orientation of the plane, there will be different components of shear or normal stress. The directions given by the eigenvectors of the stress tensor are the principal stress directions, in which the normal stress is maximum and the shear stress is zero.

A second example is found in vibration analysis. A dynamic system can be identified in terms of the masses and spring constants of its components. When the equations of motion for the system are written, the result is a set of differential equations that can be written in matrix form. The solution of these equations leads to the conclusion that the eigenvectors of the matrix identifying the system correspond to the natural vibration modes of the dynamic system (Clough and Penzien, 1993).

Principal component analysis is a well-established method used to explain the relationship between multidimensional data (n observations) and a lower number of variables m. The method computes the covariance matrix of the data and finds its eigenvectors and eigenvalues; the eigenvectors represent the principal components of the data analysed (Fukunaga, 1990).

There are many other applications, in fields as varied as web search engines, for example the famous Google PageRank (Berry and Browne, 2005; Vise and Malseed).

    2.7 Iterative Methods

The first part of this section introduces some basic concepts essential to the understanding of iterative methods in linear algebra.

2.7.1 Spectral Radius

The spectral radius of a matrix A is defined as ρ(A) = max |λ|, where λ ranges over the eigenvalues of A.

Theorem 2.7.1 (Norm of Spectral Radius). If A is an n × n matrix, then:

i. ‖A‖_2 = √(ρ(AᵀA));
ii. ρ(A) ≤ ‖A‖ for any natural norm ‖·‖.

Proof. The first part is left as an exercise. To prove part two, suppose that λ is an eigenvalue of A with associated eigenvector x, normalised so that ‖x‖ = 1. Then

$$|\lambda| = |\lambda|\,\|x\| = \|\lambda x\| = \|Ax\| \le \|A\|\,\|x\| = \|A\|;$$

since this holds for every eigenvalue, and in particular for the one of maximum modulus,

$$\rho(A) = \max|\lambda| \le \|A\|.$$

2.7.2 Convergent Matrix

A matrix A is said to be convergent if lim_{k→∞} (A^k)_{ij} = 0 for each i, j.

Theorem 2.7.2 (Convergent Matrix). For a given n × n matrix A the following statements are equivalent:

i. A is convergent;
ii. lim_{n→∞} ‖Aⁿ‖ = 0 for some natural norm;
iii. lim_{n→∞} ‖Aⁿ‖ = 0 for every natural norm;
iv. ρ(A) < 1;
v. lim_{n→∞} Aⁿx = 0 for every x.

Proof. The proof of this theorem can be found in (Isaacson and Keller, 1966), page 14.

    2.7.3 General characteristics of Iterative Methods

A sequence of vectors can be generated in the following way:

$$x^{(k)} = T\,x^{(k-1)} + c.$$

Examples of iterative methods include the Jacobi and Gauss-Seidel methods and, more recently, the Conjugate Gradient method (Hestenes and Stiefel, 1952), GMRES (Saad and Schultz, 1986), and many others. The development of, and interest in, iterative methods has increased greatly over the years. While they do not seem to be efficient for small systems of equations, they are a complete success for large linear systems. The following results are useful to understand when they converge.

Theorem 2.7.3. If the spectral radius ρ(T) satisfies ρ(T) < 1, then (I − T)⁻¹ exists and

$$(I-T)^{-1} = I + T + T^2 + \cdots = \sum_{j=0}^{\infty} T^j.$$

Proof. Since Tx = λx holds for an eigenvalue λ and its eigenvector x, multiplying by −1 and adding x to both sides we have

$$x - Tx = x - \lambda x,$$
$$(I-T)x = (1-\lambda)x,$$

meaning that (1 − λ) is an eigenvalue of the matrix (I − T). But |λ| ≤ ρ(T) < 1, therefore λ = 1 is not an eigenvalue of T, and hence 0 is not an eigenvalue of (I − T). Therefore (I − T)x ≠ 0 for x ≠ 0, which means that (I − T)⁻¹ exists.

Let S_m = I + T + T² + ⋯ + Tᵐ; then

$$(I-T)S_m = (I + T + T^2 + \cdots + T^m) - (T + T^2 + \cdots + T^{m+1}) = I - T^{m+1}.$$

As T is convergent,

$$\lim_{m\to\infty}(I-T)S_m = \lim_{m\to\infty}(I - T^{m+1}) = I,$$

and therefore

$$(I-T)^{-1} = \lim_{m\to\infty} S_m = I + T + T^2 + \cdots = \sum_{j=0}^{\infty} T^j.$$

Theorem 2.7.4. For all x^(0) ∈ R^n, the sequence {x^(k)}_{k=0}^∞ defined by

$$x^{(k)} = T\,x^{(k-1)} + c, \qquad k \ge 1,$$

converges to the unique solution of x = Tx + c if and only if ρ(T) < 1.

Proof. Let us first suppose that ρ(T) < 1. Then

$$x^{(k)} = T\,x^{(k-1)} + c,$$

but in turn x^(k−1) is expressed in terms of x^(k−2), and so forth:

$$\begin{aligned}
x^{(k)} &= T(Tx^{(k-2)} + c) + c\\
        &= T^2x^{(k-2)} + Tc + c\\
        &= T^2x^{(k-2)} + (T + I)c\\
        &\;\;\vdots\\
        &= T^kx^{(0)} + (T^{k-1} + T^{k-2} + \cdots + I)c\\
        &= T^kx^{(0)} + S_{k-1}\,c.
\end{aligned}$$

In the limit, because T is convergent,

$$\lim_{k\to\infty} x^{(k)} = 0 + \lim_{k\to\infty} S_{k-1}\,c.$$

Additionally,

$$\lim_{k\to\infty} S_{k-1} = (I-T)^{-1}.$$

Therefore

$$\lim_{k\to\infty} x^{(k)} = (I-T)^{-1}c = x, \qquad (I-T)x = c, \qquad x - Tx = c,$$

that is, the sequence converges to the solution of x = Tx + c.

Now the opposite implication: if x^(k) = Tx^(k−1) + c converges to x = Tx + c for every initial guess x^(0), then ρ(T) < 1. For an arbitrary initial guess x^(0), let us define z = x^(0) − x. For the exact value x we have

$$x = Tx + c,$$

from which we subtract x^(k) = Tx^(k−1) + c to obtain

$$x - x^{(k)} = Tx - Tx^{(k-1)} = T(x - x^{(k-1)}).$$

Using this relation recursively for x − x^(k−1),

$$x - x^{(k)} = T\left(T(x - x^{(k-2)})\right) = T^2(x - x^{(k-2)}) = \cdots = T^k(x - x^{(0)}) = T^k(-z).$$

Because the sequence converges for every initial guess, T^k(−z) → 0 as k → ∞ for every z; then T is convergent and ρ(T) < 1.

2.7.4 Jacobi's Method

Matrix-vector multiplication can be expressed in index notation as

$$E_i:\quad \sum_j a_{ij}x_j = b_i,$$

which corresponds to the multiplication of the i-th row of A by the vector x. Taking the i-th term of the sum (a_ii x_i) apart from the rest we have

$$E_i:\quad a_{ii}x_i + \sum_{j\neq i} a_{ij}x_j = b_i.$$

Solving for x_i,

$$x_i = \frac{-\sum_{j\neq i} a_{ij}x_j + b_i}{a_{ii}},$$

which can be written as

$$x_i = \sum_j T_{ij}x_j + c_i \qquad (2.8)$$

where

$$T_{ij} = \begin{cases} 0 & \text{if } i = j,\\[4pt] -\dfrac{a_{ij}}{a_{ii}} & \text{if } i \neq j, \end{cases}
\qquad\text{and}\qquad c_i = \frac{b_i}{a_{ii}}.$$

Equation 2.8 can be used to produce a sequence of the form

$$x^{(k)} = T\,x^{(k-1)} + c$$

with the stop condition

$$\frac{\|x^{(k)} - x^{(k-1)}\|}{\|x^{(k)}\|} < \varepsilon.$$

If we define

$$(D)_{ij} = a_{ij} \text{ if } i = j, \text{ and } 0 \text{ if } i \neq j,$$
$$(L)_{ij} = -a_{ij} \text{ if } i > j, \text{ and } 0 \text{ if } i \le j,$$
$$(U)_{ij} = -a_{ij} \text{ if } i < j, \text{ and } 0 \text{ if } i \ge j,$$

then A = D − L − U, and we can write the same result in compact matrix notation as

$$Ax = b,$$
$$(D - L - U)x = b,$$
$$Dx = (L + U)x + b,$$
$$x = D^{-1}(L + U)x + D^{-1}b.$$
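A compact sketch of the resulting iteration (Python/NumPy, not from the original notes; it assumes nonzero diagonal entries and uses the relative stopping criterion given above):

    import numpy as np

    def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
        """Jacobi iteration x^(k) = T x^(k-1) + c with T = D^{-1}(L+U), c = D^{-1} b."""
        n = len(b)
        x = np.zeros(n) if x0 is None else x0.astype(float).copy()
        D = np.diag(A)                    # diagonal entries a_ii
        R = A - np.diagflat(D)            # off-diagonal part of A, equal to -(L + U)
        for _ in range(max_iter):
            x_new = (b - R @ x) / D       # x_i = (b_i - sum_{j!=i} a_ij x_j) / a_ii
            if np.linalg.norm(x_new - x) / np.linalg.norm(x_new) < tol:
                return x_new
            x = x_new
        return x

    A = np.array([[4., 1., 1.], [1., 5., 2.], [0., 1., 3.]])   # diagonally dominant
    b = np.array([6., 8., 4.])
    print(jacobi(A, b))                   # should agree with np.linalg.solve(A, b)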

2.7.5 Gauß-Seidel

In the Jacobi method, at each iteration $k$ the vector $x^{(k)}$ is computed using only the information of the previous iterate $x^{(k-1)}$. However, when computing the component $x^{(k)}_i$ one can already use the freshly computed components $x^{(k)}_j$ for $j < i$. That is, from equation 2.8 we have
\[
x^{(k)}_i = \frac{-\sum_{j \ne i} a_{ij} x^{(k-1)}_j + b_i}{a_{ii}}.
\]
Dividing the sum in two,
\[
x^{(k)}_i = \frac{-\sum_{j < i} a_{ij} x^{(k-1)}_j - \sum_{j > i} a_{ij} x^{(k-1)}_j + b_i}{a_{ii}},
\]
and replacing $x^{(k-1)}_j$ by $x^{(k)}_j$ in the first sum,
\[
x^{(k)}_i = \frac{-\sum_{j < i} a_{ij} x^{(k)}_j - \sum_{j > i} a_{ij} x^{(k-1)}_j + b_i}{a_{ii}},
\]
which can be seen as an improvement over the Jacobi method. However, neither of these methods is known for its convergence speed.
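A sketch of the corresponding Gauss-Seidel sweep is shown below; as in the Jacobi sketch, the test data are assumed for illustration only. The only change is that the already-updated components $x^{(k)}_j$, $j < i$, are used inside the loop.

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=500):
    """Gauss-Seidel: like Jacobi, but uses the already-updated components x_j^(k), j < i."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for k in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s1 = A[i, :i] @ x[:i]           # sum over j < i uses new values
            s2 = A[i, i+1:] @ x_old[i+1:]   # sum over j > i uses old values
            x[i] = (b[i] - s1 - s2) / A[i, i]
        if np.linalg.norm(x - x_old) / np.linalg.norm(x) < tol:
            return x, k + 1
    return x, max_iter

A = np.array([[4.0, 1.0], [2.0, 5.0]])      # assumed test system
b = np.array([9.0, 12.0])
print(gauss_seidel(A, b))
```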


Theorem 2.7.5 (Symmetric Matrix). If $A$ is symmetric then
\[
y^T A x = x^T A y.
\]

Proof.
\[
y^T (A x) = \sum_i y_i \sum_j a_{ij} x_j.
\]
Since $A = A^T$, $a_{ij} = a_{ji}$; then
\[
y^T (A x) = \sum_i \sum_j y_i\, a_{ji}\, x_j = \sum_j x_j \sum_i a_{ji}\, y_i = x^T A y,
\]
which completes the proof.

Positive definite matrix

A symmetric matrix $A \in \mathbb{R}^{n\times n}$ is said to be positive definite if
\[
\langle x, Ax \rangle > 0 \qquad \forall x \in \mathbb{R}^n,\ x \ne 0.
\]

Theorem 2.7.6. If $A$ is a positive definite matrix, then the quadratic
\[
P(x) = \tfrac{1}{2}\langle x, Ax\rangle - \langle x, b\rangle
\]
is minimised at the point where $Ax = b$, and the minimum value is
\[
P(A^{-1}b) = -\tfrac{1}{2}\langle b, A^{-1}b\rangle.
\]

Proof. For the case $n = 1$ it is quite simple:
\[
P(x) = \tfrac{1}{2} a x^2 - b x, \qquad \frac{dP(x)}{dx} = a x - b = 0, \qquad ax = b.
\]
If $a$ is positive, then $P(x)$ is a parabola which opens upward.


For $n > 1$, suppose $x$ is the solution to $Ax = b$; we want to show that at any other point $y$, $P(y)$ is larger than $P(x)$:
\[
P(y) - P(x) = \tfrac{1}{2}\langle y, Ay\rangle - \langle y, b\rangle - \tfrac{1}{2}\langle x, Ax\rangle + \langle x, b\rangle.
\]
Replacing $b = Ax$, we have
\[
\begin{aligned}
P(y) - P(x) &= \tfrac{1}{2}\langle y, Ay\rangle - \langle y, Ax\rangle - \tfrac{1}{2}\langle x, Ax\rangle + \langle x, Ax\rangle\\
            &= \tfrac{1}{2}\langle y, Ay\rangle - \langle y, Ax\rangle + \tfrac{1}{2}\langle x, Ax\rangle.
\end{aligned}
\]
Because $A$ is symmetric,
\[
\langle y, Ax\rangle = \tfrac{1}{2}\langle y, Ax\rangle + \tfrac{1}{2}\langle x, Ay\rangle;
\]
then
\[
\begin{aligned}
P(y) - P(x) &= \tfrac{1}{2}\langle y, Ay\rangle - \left(\tfrac{1}{2}\langle y, Ax\rangle + \tfrac{1}{2}\langle x, Ay\rangle\right) + \tfrac{1}{2}\langle x, Ax\rangle\\
            &= \tfrac{1}{2}\langle y, Ay - Ax\rangle - \tfrac{1}{2}\langle x, Ay - Ax\rangle\\
            &= \tfrac{1}{2}\langle y, A(y - x)\rangle - \tfrac{1}{2}\langle x, A(y - x)\rangle\\
            &= \tfrac{1}{2}\langle (y - x), A(y - x)\rangle.
\end{aligned}
\]
Since $A$ is positive definite, the last expression can never be negative, and it is equal to zero only if $y = x$. Therefore $P(y)$ is larger than $P(x)$ and the minimum occurs at $x = A^{-1}b$:
\[
P_{\min} = \tfrac{1}{2}\langle A^{-1}b, A(A^{-1}b)\rangle - \langle A^{-1}b, b\rangle
         = \tfrac{1}{2}\langle A^{-1}b, b\rangle - \langle A^{-1}b, b\rangle
         = -\tfrac{1}{2}\langle A^{-1}b, b\rangle.
\]

In conclusion, minimising $P(x)$ is equivalent to solving $Ax = b$. Figure 2.5 illustrates this result in $\mathbb{R}^2$: $P(x)$ represents a paraboloid facing upward, with its minimum value at $x = A^{-1}b$.

Figure 2.5: Two dimensional representation of P(x)
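This equivalence can be checked numerically. In the sketch below (with an arbitrarily chosen symmetric positive definite matrix, an assumption for illustration), P evaluated at the solution of $Ax = b$ equals $-\tfrac{1}{2}\langle b, A^{-1}b\rangle$ and is smaller than P at a perturbed point.

```python
import numpy as np

def P(x, A, b):
    """Quadratic form P(x) = 1/2 <x, Ax> - <x, b>."""
    return 0.5 * x @ A @ x - x @ b

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric positive definite (example)
b = np.array([1.0, 2.0])
x_star = np.linalg.solve(A, b)           # the minimiser A^-1 b

print(P(x_star, A, b))                   # minimum value of P
print(-0.5 * b @ x_star)                 # -1/2 <b, A^-1 b>, the same value
# any perturbation away from x_star increases P
print(P(x_star + np.array([0.1, -0.2]), A, b) > P(x_star, A, b))   # True
```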

    2.7.6 Steepest Descent

One of the simplest strategies to minimise $P(x)$ is the Steepest Descent method. At a given point $x_c$ the function $P(x)$ decreases most rapidly in the direction of the negative gradient, $-\nabla P(x_c) = b - A x_c$. We call $r_c = b - A x_c$ the residual of $x_c$. If the residual is non-zero, then there exists a positive $\alpha$ such that $P(x_c + \alpha r_c) < P(x_c)$. In the method of steepest descent (with exact line search) we set $\alpha = \langle r_c, r_c\rangle / \langle r_c, A r_c\rangle$, thereby minimising $P(x_c + \alpha r_c)$.

To show this, let us expand $P$ at the point $x_c + \alpha r_c$. If $P(x)$ is defined as
\[
P(x) = \tfrac{1}{2}\langle x, Ax\rangle - \langle x, b\rangle,
\]
then
\[
\begin{aligned}
P(x_c + \alpha r_c) &= \tfrac{1}{2}\langle x_c + \alpha r_c, A(x_c + \alpha r_c)\rangle - \langle x_c + \alpha r_c, b\rangle\\
&= \tfrac{1}{2}\big[\langle x_c, A(x_c + \alpha r_c)\rangle + \alpha\langle r_c, A(x_c + \alpha r_c)\rangle\big] - \langle x_c + \alpha r_c, b\rangle\\
&= \tfrac{1}{2}\big[\langle x_c, Ax_c\rangle + \alpha\langle x_c, Ar_c\rangle + \alpha\langle r_c, Ax_c\rangle + \alpha^2\langle r_c, Ar_c\rangle\big] - \langle x_c, b\rangle - \alpha\langle r_c, b\rangle.
\end{aligned}
\]
Sorting terms we have
\[
P(x_c + \alpha r_c) = \tfrac{1}{2}\langle x_c, Ax_c\rangle - \langle x_c, b\rangle + \tfrac{1}{2}\big[\alpha\langle x_c, Ar_c\rangle + \alpha\langle r_c, Ax_c\rangle + \alpha^2\langle r_c, Ar_c\rangle\big] - \alpha\langle r_c, b\rangle.
\]
Notice that the first two terms represent $P(x_c)$. Since $A$ is symmetric,
\[
\begin{aligned}
P(x_c + \alpha r_c) &= P(x_c) + \tfrac{1}{2}\big[2\alpha\langle x_c, Ar_c\rangle + \alpha^2\langle r_c, Ar_c\rangle\big] - \alpha\langle r_c, b\rangle\\
&= P(x_c) + \alpha\langle x_c, Ar_c\rangle + \tfrac{1}{2}\alpha^2\langle r_c, Ar_c\rangle - \alpha\langle r_c, b\rangle.
\end{aligned}
\]
Replacing $b = r_c + A x_c$,
\[
P(x_c + \alpha r_c) = P(x_c) + \alpha\langle x_c, Ar_c\rangle + \tfrac{1}{2}\alpha^2\langle r_c, Ar_c\rangle - \alpha\langle r_c, r_c + Ax_c\rangle.
\]

Expanding the last term and using the symmetry of $A$ again,
\[
\begin{aligned}
P(x_c + \alpha r_c) &= P(x_c) + \alpha\langle x_c, Ar_c\rangle + \tfrac{1}{2}\alpha^2\langle r_c, Ar_c\rangle - \alpha\langle r_c, r_c\rangle - \alpha\langle r_c, Ax_c\rangle\\
&= P(x_c) + \tfrac{1}{2}\alpha^2\langle r_c, Ar_c\rangle - \alpha\langle r_c, r_c\rangle,
\end{aligned}
\]
which is minimum when
\[
\frac{d}{d\alpha}\left(\tfrac{1}{2}\alpha^2\langle r_c, Ar_c\rangle - \alpha\langle r_c, r_c\rangle\right) = 0,
\]
that is,
\[
\alpha\langle r_c, Ar_c\rangle = \langle r_c, r_c\rangle,
\]
therefore
\[
\alpha = \frac{\langle r_c, r_c\rangle}{\langle r_c, Ar_c\rangle}.
\]

Algorithm 3 summarises these results. Notice that in a practical computer implementation it is not necessary to store all the terms ($x^{(k)}$, $r^{(k)}$, and $\alpha^{(k)}$) of the sequence.

Algorithm 3 Steepest Descent

    Set initial guess x(0)
    Set ε to a small tolerance
    Compute r(0) = b − Ax(0)
    k = 0
    while ‖r(k)‖ > ε do
        α(k) = ⟨r(k), r(k)⟩ / ⟨r(k), Ar(k)⟩
        x(k+1) = x(k) + α(k) r(k)
        r(k+1) = b − Ax(k+1)
        k = k + 1
    end while
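A direct transcription of Algorithm 3 into Python/NumPy might look as follows; the test system is the one used in exercise 5 of Section 2.9, and the tolerance is an assumed parameter.

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Steepest descent (Algorithm 3) for a symmetric positive definite matrix A."""
    x = x0.astype(float)
    r = b - A @ x                       # initial residual
    k = 0
    while np.linalg.norm(r) > tol and k < max_iter:
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)      # exact line search step
        x = x + alpha * r
        r = b - A @ x                   # new residual
        k += 1
    return x, k

A = np.array([[8.0, 1.0], [1.0, 2.0]])  # system from exercise 5, Section 2.9
b = np.array([9.0, 3.0])
x, k = steepest_descent(A, b, np.zeros(2))
print(x, k)                             # converges to [1, 1]
```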

    2.7.7 Conjugate Gradient

This is an improvement over the previous method. The goal, as before, is to minimise the function $P(x)$ defined by
\[
P(x) = \tfrac{1}{2}\langle x, Ax\rangle - \langle x, b\rangle. \qquad (2.9)
\]
The general search directions are defined by an arbitrary direction vector $v$ as
\[
x^{(k)} = x^{(k-1)} + t^{(k)} v^{(k)} \qquad (2.10)
\]


and this search direction will be optimum for $t$ given by
\[
t^{(k)} = \frac{\langle v^{(k)}, r^{(k-1)}\rangle}{\langle v^{(k)}, A v^{(k)}\rangle}, \qquad (2.11)
\]
where $r$ is the residual. This result can easily be demonstrated using the same procedure as in the Steepest Descent method to find the optimum when the search direction is the negative of the gradient.

In contrast with the Steepest Descent method, the search directions in the Conjugate Gradient method are not based on local information only; instead, the method accounts for all the previous search directions, taking care that the new direction be A-orthogonal to all the previous ones. This has some advantages that will be mentioned later, but first let us define the meaning of A-orthogonal.

The Method of A-orthogonality

A set of vectors $V = \{v^{(1)}, v^{(2)}, \ldots, v^{(n)}\}$ is called A-orthogonal with respect to a matrix $A$ if
\[
\langle v^{(i)}, A v^{(j)}\rangle = 0 \quad \text{if } i \ne j. \qquad (2.12)
\]
As a result, the set of vectors $V$ is linearly independent.

Theorem 2.7.7. Given $V = \{v^{(1)}, v^{(2)}, \ldots, v^{(n)}\}$, a set of A-orthogonal vectors in $\mathbb{R}^n$, and given $x^{(0)}$, an arbitrary vector in $\mathbb{R}^n$, the sequence
\[
x^{(k)} = x^{(k-1)} + t^{(k)} v^{(k)},
\]
with $t^{(k)}$ as in (2.11), will converge to the exact solution of $Ax = b$ in $n$ iterations, assuming exact arithmetic.

Proof. A proof can be found in (Burden, 1985).

Theorem 2.7.8. The residual vectors $r^{(k)}$, with $k = 1, 2, \ldots, n$, of the conjugate gradient method satisfy
\[
\langle r^{(k)}, v^{(j)}\rangle = 0 \quad \text{for } j = 1, 2, \ldots, k.
\]

Proof. A proof can be found in reference (Burden, 1985).

Given that the residual $r^{(k-1)}$ is orthogonal to all the previous search directions $v^{(j)}$ ($j = 1, 2, \ldots, k-1$), $v^{(k)}$ can be constructed in terms of $r^{(k-1)}$ and $v^{(k-1)}$ as
\[
v^{(k)} = r^{(k-1)} + s^{(k-1)} v^{(k-1)}. \qquad (2.13)
\]
The selection of this direction also implies that any two different residuals of the sequence are orthogonal; see (Shewchuk, 1994) for a proof. As we want $v^{(k)}$ to be A-orthogonal to all the previous directions, it should satisfy
\[
\langle v^{(k-1)}, A v^{(k)}\rangle = 0,
\]


and replacing $v^{(k)}$ by (2.13),
\[
\begin{aligned}
0 &= \langle v^{(k-1)}, A v^{(k)}\rangle\\
  &= \langle v^{(k-1)}, A(r^{(k-1)} + s^{(k-1)} v^{(k-1)})\rangle\\
  &= \langle v^{(k-1)}, A r^{(k-1)}\rangle + s^{(k-1)} \langle v^{(k-1)}, A v^{(k-1)}\rangle.
\end{aligned}
\]
Solving for $s^{(k-1)}$,
\[
s^{(k-1)} = -\frac{\langle v^{(k-1)}, A r^{(k-1)}\rangle}{\langle v^{(k-1)}, A v^{(k-1)}\rangle}. \qquad (2.14)
\]

To compute $x^{(k)}$ we first need to compute $t^{(k)}$ from (2.11):
\[
\begin{aligned}
t^{(k)} &= \frac{\langle v^{(k)}, r^{(k-1)}\rangle}{\langle v^{(k)}, A v^{(k)}\rangle}
         = \frac{\langle r^{(k-1)} + s^{(k-1)} v^{(k-1)}, r^{(k-1)}\rangle}{\langle v^{(k)}, A v^{(k)}\rangle}\\
        &= \frac{\langle r^{(k-1)}, r^{(k-1)}\rangle + s^{(k-1)}\langle v^{(k-1)}, r^{(k-1)}\rangle}{\langle v^{(k)}, A v^{(k)}\rangle},
\end{aligned}
\]
and by Theorem 2.7.8, $\langle v^{(k-1)}, r^{(k-1)}\rangle = 0$, therefore
\[
t^{(k)} = \frac{\langle r^{(k-1)}, r^{(k-1)}\rangle}{\langle v^{(k)}, A v^{(k)}\rangle}. \qquad (2.15)
\]
Finally, the new residual is found as $r^{(k)} = b - A x^{(k)}$. Equations (2.14), (2.13), (2.15), and (2.10) can be used in that order to produce the sequence of $x^{(k)}$ vectors. However, a further simplification can be made for $s^{(k-1)}$, avoiding one matrix-vector multiplication. Let us first re-write the residual $r^{(k)} = b - A x^{(k)}$ using equation (2.10):

\[
r^{(k)} = b - A x^{(k)} = b - A\big(x^{(k-1)} + t^{(k)} v^{(k)}\big),
\]
that is,
\[
r^{(k)} = r^{(k-1)} - t^{(k)} A v^{(k)}. \qquad (2.16)
\]
Using this result in the inner product of $r^{(k)}$ with itself, and recalling that any two different residuals are orthogonal, we have
\[
\begin{aligned}
\langle r^{(k)}, r^{(k)}\rangle &= \langle r^{(k)}, r^{(k-1)} - t^{(k)} A v^{(k)}\rangle\\
 &= \langle r^{(k)}, r^{(k-1)}\rangle - t^{(k)}\langle r^{(k)}, A v^{(k)}\rangle\\
 &= -t^{(k)}\langle r^{(k)}, A v^{(k)}\rangle.
\end{aligned}
\]
Besides, we can get an expression for $\langle v^{(k)}, A v^{(k)}\rangle$ from equation (2.15):
\[
\langle v^{(k)}, A v^{(k)}\rangle = \big(1/t^{(k)}\big)\,\langle r^{(k-1)}, r^{(k-1)}\rangle.
\]


Replacing these two last results into equation 2.14 for $s^{(k)}$,
\[
s^{(k)} = -\frac{\langle v^{(k)}, A r^{(k)}\rangle}{\langle v^{(k)}, A v^{(k)}\rangle}
        = -\frac{\langle r^{(k)}, A v^{(k)}\rangle}{\langle v^{(k)}, A v^{(k)}\rangle}
        = \frac{\big(1/t^{(k)}\big)\langle r^{(k)}, r^{(k)}\rangle}{\big(1/t^{(k)}\big)\langle r^{(k-1)}, r^{(k-1)}\rangle},
\]
that is,
\[
s^{(k)} = \frac{\langle r^{(k)}, r^{(k)}\rangle}{\langle r^{(k-1)}, r^{(k-1)}\rangle}. \qquad (2.17)
\]
Notice that this equation now gives $s^{(k)}$ instead of $s^{(k-1)}$, which changes the order in which the equations are applied to produce the sequence of $x^{(k)}$. Before completing the algorithm, it is necessary to set the initial guess $x^{(0)}$ and the initial residual and search direction. The new order to evaluate a step of the iteration process is (2.15), (2.10), (2.16), (2.17), and (2.13). This is summarised in Algorithm 4. However, this version of the method does not have fast convergence properties. The preconditioned version of the method, in which the system is pre-multiplied at each iteration by a special matrix, largely improves its convergence properties. This is out of the scope of this book, but the reader is referred to the SIAM book Templates for the Solution of Linear Systems, see reference (Barrett et al., 1994).

Algorithm 4 Conjugate Gradient

    Set initial guess x(0)
    Compute r(0) = b − Ax(0) and set v(1) = r(0)
    for k = 1 to n do
        t(k) = ⟨r(k−1), r(k−1)⟩ / ⟨v(k), Av(k)⟩
        x(k) = x(k−1) + t(k) v(k)
        r(k) = r(k−1) − t(k) Av(k)
        s(k) = ⟨r(k), r(k)⟩ / ⟨r(k−1), r(k−1)⟩
        v(k+1) = r(k) + s(k) v(k)
    end for
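Algorithm 4 translates almost line by line into Python/NumPy. The sketch below follows the order (2.15), (2.10), (2.16), (2.17), (2.13); the test system is an assumed example.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-12):
    """Unpreconditioned conjugate gradient (Algorithm 4) for symmetric positive definite A."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x          # r^(0)
    v = r.copy()           # v^(1) = r^(0)
    rr_old = r @ r
    for k in range(n):     # at most n steps in exact arithmetic (Theorem 2.7.7)
        Av = A @ v
        t = rr_old / (v @ Av)          # (2.15)
        x = x + t * v                  # (2.10)
        r = r - t * Av                 # (2.16)
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        s = rr_new / rr_old            # (2.17)
        v = r + s * v                  # (2.13)
        rr_old = rr_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # assumed test system
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))          # solution of Ax = b
```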

    2.8 Condition Number

The condition number of a non-singular matrix $A$ relative to a norm $\|\cdot\|$ is
\[
\kappa(A) = \|A\|\,\|A^{-1}\|.
\]

The base-$b$ logarithm of $\kappa(A)$ is an estimate of how many base-$b$ digits are lost in solving a linear system with that matrix; it approximates the worst-case loss of precision. For every non-singular matrix $A$ and any norm $\|\cdot\|$,
\[
1 = \|I\| = \|A A^{-1}\| \le \|A\|\,\|A^{-1}\| = \kappa(A).
\]
A system is said to be singular if the condition number is infinite, and ill-conditioned if the condition number is too large, where "too large" depends roughly on the precision of the matrix entries.

Consider for example
\[
A = \begin{bmatrix} 1 & 2 \\ 1.0001 & 2 \end{bmatrix},
\]
for which $\|A\|_\infty = 3.0001$. This is not a large number. However,
\[
A^{-1} = \begin{bmatrix} -10000 & 10000 \\ 5000.5 & -5000 \end{bmatrix},
\]
and its norm is $\|A^{-1}\|_\infty = 20000$. Therefore, the condition number is $\kappa(A) = (3.0001)(20000) = 60002$.
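The same numbers can be reproduced with NumPy, which also computes condition numbers directly (here in the infinity norm, the norm used in the example above):

```python
import numpy as np

A = np.array([[1.0,    2.0],
              [1.0001, 2.0]])

norm_A    = np.linalg.norm(A, np.inf)                  # 3.0001 (max row sum)
norm_Ainv = np.linalg.norm(np.linalg.inv(A), np.inf)   # 20000
print(norm_A * norm_Ainv)                              # approximately 60002
print(np.linalg.cond(A, np.inf))                       # same condition number
```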

    2.9 Exercises

1. Show that the norm defined by
\[
\|A\| = \max_{i,j} |a_{ij}|
\]
is not a consistent norm, that is, it does not satisfy the property
\[
\|AB\| \le \|A\|\,\|B\|.
\]
Hint: find a counter-example with $A = B$.

2. The Hilbert-Schmidt or Frobenius norm is defined as
\[
\|A\|_F = \sqrt{\sum_{i,j} (a_{ij})^2}.
\]
Show the triangle inequality for this norm.

3. Show that if $\|A\|_1$ is the matrix norm induced by the vector norm $\|x\|_1 = \sum_{i=1}^{m} |x_i|$, then $\|A\|_1$ is equal to the maximum sum of the columns of $A$, that is,
\[
\|A\|_1 = \max_j \left(\sum_{i=1}^{n} |a_{ij}|\right).
\]


4. Show that if $\|A\|_\infty$ is the matrix norm induced by the vector norm $\|x\|_\infty = \max_i |x_i|$, then $\|A\|_\infty$ is equal to the maximum sum of the rows of $A$, that is,
\[
\|A\|_\infty = \max_i \sum_{j=1}^{n} |a_{ij}|.
\]

5. For the following system
\[
\begin{bmatrix} 8 & 1 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} 9 \\ 3 \end{bmatrix},
\]
find the first two terms of the sequence generated by the steepest descent method. If the solution is $[x, y] = [1, 1]$, find the maximum-norm error and the two-norm error for these two terms.

6. Show that if $A$ is a symmetric matrix then its eigenvalues are real numbers and the eigenvectors are perpendicular.

7. Prove that $\|A\|_2 = \sqrt{\rho(A^T A)}$.

8. Show that the gradient of the quadratic form $P(x) = \frac{1}{2}\langle x, Ax\rangle - \langle x, b\rangle$ satisfies $-\nabla P(x_c) = b - A x_c$.

Chapter 3

    Interpolation

    3.1 Introduction

Suppose that, as the result of an experiment, a set of data points $(x, u)$ related to each other is obtained. The relationship between $x$ and $u$ is expressed as $u(x)$, but from the experimental data we only know the values at certain points, that is, $u_i = u(x_i)$. For example, suppose we have the following data:

    u   1.29  1.74  2.38  3.19  4.03  4.65  4.55  3.04
    x   1.00  1.90  2.80  3.70  4.60  5.50  6.40  7.30

This data is plotted in figure 3.1. The continuous line represents the function we want to interpolate. Unfortunately this function is usually unknown, and the only information available is that presented in the table. The problem that arises is to evaluate the function $u(x)$ at intermediate points $x$, that is, $x_i < x < x_{i+1}$. We therefore need to find a continuous function $\tilde u(x)$ that approximates $u(x)$ in such a way that it is exact at the data points, $\tilde u(x_i) = u_i$.


    Figure 3.1: Approximation of a set of data points

    3.1.1 Polynomial approximation

Polynomials are functions that provide advantages for approximating a set of points: they are continuous, differentiable, and integrable, and these operations can be accomplished and implemented straight away. In general, a polynomial of order $n$ can be written as

\[
P_n(x) = a_n x^n + \cdots + a_0,
\]
with $n$ a positive integer and $a_n, \ldots, a_0$ real coefficients. Its derivative is
\[
P'_n(x) = n a_n x^{n-1} + \cdots + a_1,
\]
and its integral
\[
\int P_n(x)\,dx = \frac{a_n x^{n+1}}{n+1} + \cdots + a_0 x + C.
\]

Weierstrass theorem of approximation

Suppose that $f$ is a function defined and continuous on the interval $[a, b]$. Then for each $\epsilon > 0$ there exists a polynomial $P(x)$ defined over $[a, b]$ with the following property:
\[
|f(x) - P(x)| < \epsilon \qquad \forall x \in [a, b].
\]
Figure 3.2 illustrates such a result.

    Figure 3.2: Weierstrass theorem of approximation

Example 3.1 Taylor Polynomials.
Taylor polynomials are very good for locally approximating a function around a point. Suppose that $f(x) \in C^n[a, b]$, that $f^{(n+1)}$ exists in $[a, b]$, and that $x_0 \in [a, b]$. Then for all $x \in [a, b]$ at a distance $h = x - x_0$ it is true that
\[
f(x) = f(x_0 + h) = f(x_0) + f'(x_0)\,h + \frac{f''(x_0)}{2!}\,h^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}\,h^n + O(h^{n+1}),
\]
where $O(h^{n+1})$ is the error of the $n$th-order approximation and is a function of $h^{n+1}$. Figure 3.3 plots $f(x) = \cos(x)$ and the local Taylor approximations, of order zero and one, around the point $x = 1$. For $n = 0$, $p_0$ is a constant function defined by $p_0(x) = \cos(1)$. For $n = 1$, $p_1$ is a line given by $p_1(x) = \cos(1) - \sin(1)(x - 1)$. Thus, when the order of the Taylor polynomial increases, a better approximation of the function in the neighbourhood of $x = 1$ is obtained, as it approximates not only the function but also its derivatives. However, the error of the approximation increases as we move away from $x = 1$.


Figure 3.3: Taylor polynomial approximation of the cos function at the point x = 1. The zero and first order approximations p0(x) and p1(x) are shown.
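The two approximations plotted in Figure 3.3 can be reproduced with a few lines of Python (matplotlib is assumed to be available; the expansion point x0 = 1 follows the example):

```python
import numpy as np
import matplotlib.pyplot as plt

x0 = 1.0
x = np.linspace(0.0, 2.0, 200)

f  = np.cos(x)
p0 = np.cos(x0) * np.ones_like(x)           # zero order: constant value cos(1)
p1 = np.cos(x0) - np.sin(x0) * (x - x0)     # first order: tangent line at x0

plt.plot(x, f, label="cos(x)")
plt.plot(x, p0, "--", label="p0(x)")
plt.plot(x, p1, "-.", label="p1(x)")
plt.legend()
plt.show()
```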

    3.2 Lagrange polynomials

Suppose that you want to find a first degree polynomial that passes through the points $(x_0, u_0)$ and $(x_1, u_1)$. There are many ways to find a straight line that passes through these points. The main idea behind Lagrange polynomials is to find a set of polynomials of first degree whose linear combination is equal to the desired polynomial. In this case, the line that joins the points $(x_0, u_0)$ and $(x_1, u_1)$ is given by the first degree polynomial $P(x)$ defined as
\[
P(x) = \frac{x - x_1}{x_0 - x_1}\,u_0 + \frac{x - x_0}{x_1 - x_0}\,u_1,
\]
which defines the line crossing the points $(x_0, u_0)$ and $(x_1, u_1)$ by adding the lines $P_1 = \frac{x - x_1}{x_0 - x_1}\,u_0$ and $P_2 = \frac{x - x_0}{x_1 - x_0}\,u_1$. If we define the functions $W_0(x)$ and $W_1(x)$ as
\[
W_0(x) = \frac{x - x_1}{x_0 - x_1}, \qquad W_1(x) = \frac{x - x_0}{x_1 - x_0},
\]

then $P(x)$ can also be written as
\[
P(x) = W_0(x)\,u_0 + W_1(x)\,u_1.
\]
Notice that the $W$ functions were found using the following rule:
\[
W_0(x_0) = 1, \quad W_0(x_1) = 0,
\]
and
\[
W_1(x_0) = 0, \quad W_1(x_1) = 1.
\]
Consequently, when multiplying the polynomial $W_0$ by the function value at the point $x_0$, that is $u_0 = u(x_0)$, the function $W_0 u_0$ crosses the points $(x_0, u_0)$ and $(x_1, 0)$. The same applies to $W_1 u_1$, which crosses the points $(x_0, 0)$ and $(x_1, u_1)$. When both functions are added, the resulting first order polynomial crosses through the given points $(x_0, u_0)$ and $(x_1, u_1)$.


    3.2.1 Second order Lagrange polynomials

Suppose now that you have three points and you want to find the polynomial of second degree that interpolates those points. Using the technique explained above, we want to find functions $W_0(x)$, $W_1(x)$, $W_2(x)$ such that
\[
P(x) = W_0(x)\,u_0 + W_1(x)\,u_1 + W_2(x)\,u_2,
\]
with the $W_i$ second order polynomials (the sum of second order polynomials is again a second order polynomial). Additionally they must satisfy
\[
P(x_0) = u_0, \quad P(x_1) = u_1, \quad P(x_2) = u_2. \qquad (3.1)
\]
In other words, the function must go through (interpolate) the points.

In order to obtain the $W_i$ functions we proceed in the same way as for the first order polynomials: to guarantee the condition in equation 3.1 it is enough that the functions $W_i$ be defined as
\[
W_i(x_j) =
\begin{cases}
1 & \text{if } i = j,\\
0 & \text{if } i \ne j.
\end{cases} \qquad (3.2)
\]
That is, $W_i$ is equal to one at the point $x_i$ and zero at the other points. This function can be constructed in different ways. For example, to construct the function $W_0$, which cancels at each $x_i$ with $i \ne 0$, we choose a product of binomial factors, each one cancelling at one of the points $x_i$:
\[
W_0 = (x - x_1)(x - x_2). \qquad (3.3)
\]
Now, in order to satisfy $W_0(x_0) = 1$, we divide this result by the same product of binomials as in equation 3.3 evaluated at $x = x_0$:
\[
W_0 = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)}. \qquad (3.4)
\]
In this way $W_0$ complies with 3.2. In a similar way we can construct the functions $W_1$ and $W_2$. Figure 3.4 shows these three functions. Notice that any quadratic function passing through the points $x_0$, $x_1$ and $x_2$ can be constructed by a linear combination of these three functions.

    3.2.2 General case

Given a set of $n + 1$ discrete points through which a function passes, the set of points can be interpolated by a polynomial of degree $n$, obtained by the linear combination of $n+1$ polynomials of degree $n$ with the following property:
\[
W_i(x_k) =
\begin{cases}
1 & \text{if } i = k,\\
0 & \text{if } i \ne k,
\end{cases}
\]



    Figure 3.4: Second degree Lagrange polynomials

which can be constructed by
\[
W_i = \prod_{\substack{k=0 \\ k \ne i}}^{n} \frac{(x - x_k)}{(x_i - x_k)}.
\]
Polynomial interpolation provides continuous and differentiable functions; besides, differentiation and integration of polynomials is straightforward. However, when the order of the polynomial increases, the polynomial tends to oscillate, resulting in a poor local approximation. So high order polynomials must be avoided in the construction of interpolated functions. Instead, interpolation by parts must be considered. This will be discussed in the next section.
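The product formula for $W_i$ translates directly into code. The following sketch (function names and sample data are assumptions for illustration) evaluates the interpolating polynomial $\sum_i u_i W_i(x)$ at arbitrary points:

```python
import numpy as np

def lagrange_basis(xs, i, x):
    """W_i(x) = prod_{k != i} (x - x_k) / (x_i - x_k)."""
    W = np.ones_like(x, dtype=float)
    for k, xk in enumerate(xs):
        if k != i:
            W *= (x - xk) / (xs[i] - xk)
    return W

def lagrange_interpolate(xs, us, x):
    """P(x) = sum_i u_i W_i(x); exact at the data points (x_i, u_i)."""
    return sum(us[i] * lagrange_basis(xs, i, x) for i in range(len(xs)))

xs = np.array([0.0, 0.5, 1.0])        # assumed sample nodes
us = np.array([1.0, 1.2, 0.8])        # assumed nodal values
x  = np.linspace(0.0, 1.0, 5)
print(lagrange_interpolate(xs, us, x))
```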

    3.2.3 Other Approximations

Lagrange polynomials are not the only way of interpolating a set of data points. In general, if $v$ is a function defined over $n$ points $x_i$ by its values $v_i$, with
\[
v_i = v(x_i),
\]
then the function $v(x)$ can be interpolated by $\tilde v(x) \in V^n$, defined in terms of the $W_i(x)$ functions as
\[
\tilde v(x) = \sum_i v_i W_i(x).
\]
The $W_i$ functions form a basis that generates an approximation space. For example, the space of second order functions $P^2$ can be generated by the basis
\[
\operatorname{span}\{W_1(x), W_2(x), W_3(x)\}.
\]


Example 3.2
Given the functions $W_1 = 1$, $W_2 = x$, $W_3 = x^2$, any polynomial $p \in P^2$ is given by a linear combination
\[
p = a_1 W_1 + a_2 W_2 + a_3 W_3.
\]
For example, find $a_1$, $a_2$ and $a_3$ such that $p$ is equal to $p(x) = 3x^2 + 2x + 5$.

    3.3 Polynomials defined by parts


    Figure 3.5: Oscillation due to high order polynomial approximation

One disadvantage of polynomial interpolation is that as the number of points increases, the order of the polynomial increases. A higher polynomial degree does not necessarily mean a good approximation, as it can introduce undesirable oscillations, as shown in figure 3.5. To avoid this, the domain can be subdivided and the function can be approximated by a series of functions, one in each subdomain.

The domain $\Omega$ of a function $u$, defined as
\[
u : \Omega \to \mathbb{R},
\]
can be expressed in terms of subdivisions of the domain such that
\[
\Omega = K_1 \cup K_2 \cup K_3 \cup \cdots \cup K_n,
\]
where $K_i$ represents a finite element and $K_i$ intersects $K_j$ at most at the boundary.

    Some examples of domains defined by parts for the one-, two-, and three-dimensionalcase are shown next.

    One dimensional case

Figure 3.6: One dimensional domain defined by parts

Figure 3.6 presents a one-dimensional domain defined by parts. The segment $[a, b]$ is subdivided into a set of elements $K_i$. Each element consists of a segment of line and, in this particular case, has four nodes per element. Additionally, the union of the elements is equal to the domain,
\[
\bigcup K_i = \Omega,
\]
and two elements only intersect at one point,
\[
K_i \cap K_j = \{\text{1 vertex}\}.
\]

    Two dimensional case


    Figure 3.7: Two dimensional domain defined by parts

Figure 3.7 presents a two dimensional domain defined by parts. The domain is subdivided into a set of triangles $K_i$. Each triangle consists of three vertices and three line segments. In this particular case the nodes are defined at the vertices of the triangle. The union of the elements is equal to the domain,
\[
\bigcup K_i = \Omega,
\]
and two elements only intersect at one point or at an edge,
\[
K_i \cap K_j =
\begin{cases}
\text{1 vertex},\\
\text{1 edge}.
\end{cases}
\]

    Three dimensional case

Figure 3.8: Three dimensional domain defined by parts

Figure 3.8 presents a three dimensional domain defined by parts. The domain is subdivided into a set of tetrahedral elements $K_i$. Each tetrahedron consists of four vertices, four triangular faces and six line segments. In this particular case the nodes are defined at the vertices of the tetrahedra. The union of the elements is equal to the domain,
\[
\bigcup K_i = \Omega,
\]
and two elements only intersect at one face, edge or vertex,
\[
K_i \cap K_j =
\begin{cases}
\text{1 vertex},\\
\text{1 edge},\\
\text{1 face}.
\end{cases}
\]

    Base (Shape) functions

Base functions are defined for the elements $K_i$ in the following way:
\[
P^n = \left\{ u \in P^n(K_i) \;\middle|\; u(x) = \sum_{i=0}^{n} a_i x^i \right\}. \qquad (3.5)
\]
So in two dimensions we have
\[
P^n = \left\{ u \in P^n(K_i) \;\middle|\; u(x) = \sum_{i+j \le n} a_{ij}\, x^i y^j \right\}.
\]
It can be observed that a function of (3.5) can be written in terms of the base functions of $P^n$ by imposing the nodal values
\[
u(x_1) = u_1, \quad u(x_2) = u_2, \quad \ldots, \quad u(x_n) = u_n.
\]
Expanding these conditions in terms of the coefficients yields
\[
\begin{aligned}
a_0 + a_1 x_1 + \cdots + a_n x_1^n &= u_1\\
&\ \ \vdots\\
a_0 + a_1 x_i + \cdots + a_n x_i^n &= u_i,
\end{aligned}
\]
which can be written as
\[
[A]\,a = u,
\]
where $a$ is the vector of coefficients.

Notice that these functions are defined for each element $K_i$. Therefore the limits of the elements cannot be arbitrarily chosen: two adjacent elements should have common vertices in order to preserve the continuity of the function. Therefore, if $x_i$ is a common node of two elements, $u_r(x)$ is the function defined on element $K_r$ and $u_s(x)$ is the function defined on element $K_s$, then it must be true that
\[
u_r(x_i) = u_s(x_i).
\]

    3.3.1 One-dimensional Interpolation

    First order polynomials

A function whose value is only known at certain points can be represented by a pair of vectors:
\[
x = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}, \qquad
u = \begin{bmatrix} u_1 \\ \vdots \\ u_m \end{bmatrix}.
\]
The domain is represented by the interval $[x_1, x_m]$. With this information a polynomial $P_n(x) = \tilde u(x)$ that interpolates the function $u(x)$ at the $m$ points can be found. Suppose that you want to know the value of the function at a certain point $x$ of the domain. Proceed as follows (a code sketch is given after this list):

i. Find $K_i$ such that $x \in K_i$.

ii. Compute the base functions $W|_{K_i}$.

iii. Compute $\tilde u(x)$ as $\tilde u(x) = \sum_i u_i W_i(x)$.
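A minimal sketch of this procedure for first order (two-node) elements is given below; the nodal data are those of the table in Section 3.1, and the function name is an assumption.

```python
import numpy as np

def piecewise_linear_eval(xs, us, x):
    """Evaluate the piecewise linear interpolant at a point x:
    (i) locate the element containing x, (ii) evaluate its two base functions,
    (iii) combine them with the nodal values."""
    i = np.searchsorted(xs, x) - 1              # element K_i = [x_i, x_{i+1}]
    i = min(max(i, 0), len(xs) - 2)             # clamp to the domain [x_1, x_m]
    x0, x1 = xs[i], xs[i + 1]
    W0 = (x1 - x) / (x1 - x0)                   # local base functions on K_i
    W1 = (x - x0) / (x1 - x0)
    return us[i] * W0 + us[i + 1] * W1

xs = np.array([1.00, 1.90, 2.80, 3.70, 4.60, 5.50, 6.40, 7.30])   # nodes from Section 3.1
us = np.array([1.29, 1.74, 2.38, 3.19, 4.03, 4.65, 4.55, 3.04])
print(piecewise_linear_eval(xs, us, 3.0))       # value between x = 2.8 and x = 3.7
```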

    Differential Operators

Let $v$ be a function known at $n$ points $x_i$, that is, $v_i = v(x_i)$ is known for the $n$ points. Then the function is interpolated in $V$ as
\[
\tilde v(x) = \sum_i v_i W_i(x),
\]
where the functions $W_i$ form the basis of $V$. Figure 3.9 plots a function and its interpolation using five elements. The derivative of $\tilde v \in V$ is given by
\[
\frac{d\tilde v}{dx} = \frac{d}{dx}\sum_i v_i W_i(x) = \sum_i v_i \frac{dW_i(x)}{dx}.
\]
Notice that while the continuity of the interpolated function is guaranteed by the interpolation, its derivative is only differentiable by parts, as at the node points the derivative is not well defined. One could try to interpolate this result again, but in general the derivative of the interpolated function is different from the interpolation of the derivative.



    Figure 3.9: Piece-wise linear approximation of a smooth function

    Second order polynomials

The base functions are defined by
\[
W_i(x_j) =
\begin{cases}
1 & \text{if } i = j,\\
0 & \text{if } i \ne j.
\end{cases}
\]
Let us analyse the meaning of having a domain defined by parts. First, take a data point inside an element, for example the point $x_4 \in K_2$. Notice that $W_4$ is completely defined in the element $K_2$; outside of it, it takes the value zero. However, when we look at the function $W_3$ at the point $x_3$, which belongs to the elements $K_1$ and $K_2$, we notice that the function is defined over these two elements. See figure 3.10.

    Figure 3.10: Piece-wise quadratic interpolation using Lagrange polynomials


    Figure 3.11: Interpolation using a second order polynomial defined by parts

    Example

Figure 3.11 shows a second order polynomial interpolating u by parts. Notice that each element consists of three nodes and therefore a second order polynomial can be obtained on each element.

    Exercises

i. Using Matlab or Maple, draw the function $u(x) = (x/5)\sin(5x) + x$ defined over the interval $[0, 4]$.

ii. Using the same partition of the domain as in the last example, plot the second order Lagrange polynomials $\tilde u(x)$ that interpolate $u(x)$.

iii. Global numbering vs. local numbering. From the last example we see the need of having a global numbering of the domain nodes, but in order to compute the base functions it is convenient to have a local numbering.

    3.3.2 Two dimensional interpolation

Let $\Omega$ be the domain of a function $u(x, y)$; $\Omega$ is bounded by a polygon and $\Omega = \bigcup K_i$ (the union of subdomains), where $K_i$ can be a triangle or a square. The approximation space $P^n(K_i)$ is defined for each element $K_i$.

    Example

Approximation space with $P^2$:
\[
P \in P^2, \qquad P(x) = \sum_{i+j \le n} a_{ij}\, x^i y^j.
\]
The number of nodes needed to define a polynomial is given by the number of coefficients of the polynomial. In that way,
\[
P^1 \to N(1) = 3, \qquad P^2 \to N(2) = 6, \qquad P^3 \to N(3) = 10.
\]
Bases in 2-D:
\[
P^1 = \operatorname{span}\{W_1(x), W_2(x), W_3(x)\}, \qquad
P^2 = \operatorname{span}\{W_1(x), \ldots, W_6(x)\}.
\]
In general $W_r(x)$ is defined as
\[
W_r(x) = \sum_{i+j \le n} a^r_{ij}\, x^i y^j.
\]


3.4.1 Gradient computation for an element

We must compute
\[
\nabla u = \left(\frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}\right). \qquad (3.6)
\]
Additionally, $u$ is approximated in terms of the base functions as
\[
u = \sum u_i\, \varphi_i(x, y), \qquad (3.7)
\]
with $u_i$ the function evaluated at the vertices, for the case of $P^1$ elements. Replacing the value of the approximated function in the computation of the gradient, we have
\[
\frac{\partial u}{\partial x} = \frac{\partial \sum u_i \varphi_i}{\partial x} = \sum u_i \frac{\partial \varphi_i}{\partial x}, \qquad (3.8)
\]
which means that only the gradient of the base functions over the element needs to be computed. For $P^1$ elements the base functions are linear, of the form
\[
\varphi_i(x, y) = a_i x + b_i y + c_i,
\]
therefore
\[
\nabla \varphi_i = (a_i, b_i). \qquad (3.9)
\]
The gradient can then be computed for each of the nodes of the triangle using equations 3.8 and 3.9.

3.4.2 Gradient computation for a field

Once the gradient has been computed for a node, such as node 3 of the figure, we will notice that its value differs depending on the element used for the computation. An acceptable option is to average the results of the elements. That is, if there are $n$ elements sharing a node $i$,
\[
\frac{\partial u_i}{\partial x_j} = \frac{1}{n}\sum_{e=1}^{n} \left(\frac{\partial u_i}{\partial x_j}\right)_e, \qquad (3.10)
\]
where $\left(\frac{\partial u_i}{\partial x_j}\right)_e$ is the value obtained using element $e$.
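A sketch of both steps, the constant gradient on one P1 triangle (equations 3.8-3.9) and the nodal average of equation 3.10, could look as follows; the mesh data structures and the toy mesh are assumptions for illustration.

```python
import numpy as np

def element_gradient(coords, u):
    """Constant gradient of the linear (P1) interpolant on one triangle.
    coords: 3x2 array of vertex coordinates, u: the 3 nodal values."""
    # Fit u(x, y) = a*x + b*y + c through the three vertices; grad u = (a, b)
    M = np.column_stack([coords, np.ones(3)])
    a, b, c = np.linalg.solve(M, u)
    return np.array([a, b])

def nodal_gradient(points, triangles, u):
    """Average the element gradients over all elements sharing each node (eq. 3.10)."""
    grad = np.zeros((len(points), 2))
    count = np.zeros(len(points))
    for tri in triangles:
        g = element_gradient(points[tri], u[tri])
        grad[tri] += g
        count[tri] += 1
    return grad / count[:, None]

# Assumed toy mesh: two triangles sharing an edge
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
triangles = np.array([[0, 1, 2], [1, 3, 2]])
u = points[:, 0] + 2 * points[:, 1]    # u = x + 2y, exact gradient (1, 2) everywhere
print(nodal_gradient(points, triangles, u))
```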

    3.5 Exercises

1. Given the points $\{x_0 = 0,\ x_1 = 0.6,\ x_2 = 0.9\}$, defining a real interval and an interior point, find the Lagrange polynomials $W_i(x)$ of first and second order.

2. For the functions $f(x) = \cos(x)$, $f(x) = \ln(x + 1)$, and $f(x) = \tan(x)$, find the values of the function at the points, $f_i = f(x_i)$, and use the Lagrange polynomials from the previous point to find the interpolating function
\[
g(x) = \sum_i f_i W_i(x).
\]


3. Find the approximation error for each of the functions of the previous point when first and second order Lagrange interpolation is used. The error is defined as
\[
\text{error} = \int_a^b |f(x) - g(x)|\,dx.
\]

4. The aim of this exercise is to explore the interpolation and approximation functions available in commercial software such as Matlab, Maple, and Excel/OpenOffice.

Consider the following table¹, which relates density to altitude.

    Altitude h (m)    Density (kg/m³)
    0                 1.225
    2000              1.007
    4000              0.819
    6000              0.660
    8000              0.526
    10000             0.414
    12000             0.312
    14000             0.228
    16000             0.166
    18000             0.122
    20000             0.089

(a) Find the first, second, third and fourth order polynomials that best approximate the curve. With these functions construct a vector of approximated values and compute the $\ell_2$ and $\ell_\infty$ norm errors.

(b) If $dP/dz = -\rho g$ is the relation between pressure and altitude for a hydrostatic fluid, find the function that describes the pressure as a function of height. With this function write a table and compare the results with the values found in standard tables.

5. Compute the first order Lagrange polynomials $W_1, W_2, W_3$ for the triangle shown in the figure, using the following equation for the coefficients of the $W_i$ function:
\[
\begin{bmatrix}
x_1 & y_1 & 1\\
x_2 & y_2 & 1\\
x_3 & y_3 & 1
\end{bmatrix}
\begin{bmatrix}
a_i\\ b_i\\ c_i
\end{bmatrix}
= e_i,
\]
where $e_i$ is a canonical unit vector in the $i$ direction (it is equal to one in the $i$th position and zero otherwise). The coordinates of the triangle are given in the following figure.

    1Data from U.S. Standard Atmosphere, 1962, U.S. Government Printing Office


Coordinates from the figure: (3.5, 4), (4, 3.5), (3, 3), (5, 4), (4, 5), (4.5, 4.5).

6. Find the interpolation $g(x, y) = \sum_i f_i W_i(x, y)$, with $f_i = f(x_i, y_i)$, for the following functions:

(a) $f(x, y) = 2x^2 + 3y^2 + 1$

(b) $f(x, y) = \sin(3x)\cos(3y)$.

    Plot each of these functions and its interpolation g(x, y).

    7. Select te