University College Dublin
An Coláiste Ollscoile, Baile Átha Cliath
School of Mathematical Sciences
Scoil na nEolaíochtaí Matamaitice
Computational Science (ACM 20030)
Dr Lennon Ó Náraigh
Lecture notes in Computational Science, January 2014
Computational Science (ACM 20030)
• Subject: Applied and Computational Maths
• School: Mathematical Sciences
• Module coordinator: Dr Edward Cox; Lecturer: Dr Lennon Ó Náraigh
• Credits: 5
• Level: 2
• Semester: Second
Typically, problems in Applied Mathematics are modelled using a set of equations that can be written
down but cannot be solved analytically. In this module we examine numerical methods that can be
used to solve such problems on a desktop computer. Practical computer lab sessions will cover the
implementation of these methods using mathematical software (Matlab). No previous knowledge of
computing is assumed.
Topics and techniques discussed include but are not limited to the following list. Computer archi-
tecture: The Von Neumann model of a computer, memory hierarchies, the compiler. Floating-
point representation: Binary and decimal notation, floating-point arithmetic, the IEEE double
precision standard, rounding error. Elementary programming constructions: Loops, logical
statements, precedence, array operations, vectorization. Root-finding for single-variable func-
tions: Bracketing and Bisection, Newton–Raphson method. Error and reliability analyses for the
Newton–Raphson method. Numerical integration: Midpoint, Trapezoidal and Simpson methods.
Error analysis. Solving ordinary differential equations (ODEs): Euler Method, Runge–Kutta
method. Stability and accuracy for the Euler method. Linear systems of equations: Gaussian
elimination, partial pivoting. The condition number of a matrix: quantifying the idea that a
matrix can be ‘almost’ singular, investigating the consequences of this idea for the robustness of
numerical solutions of linear systems. Fitting data to polynomials using the method of least
squares. Random-number generation using the linear congruential method.
What will I learn?
On completion of this module students should be able to
1. Describe the architecture of a modern computer using the Von Neumann model.
2. Describe how numbers are represented on a computer.
3. Use floating-point arithmetic, having due regard for rounding error.
4. Do elementary operations in Matlab, such as 'for' and 'while' loops, logical statements, precedence.
5. Do array operations using loops; and equivalently, using vectorization.
6. Describe elementary root-finding procedures, analyse their robustness, and implement them
in Matlab.
7. Describe elementary numerical integration schemes, analyse their accuracy, and
implement them in Matlab.
8. Solve ODEs numerically using standard algorithms, analyse their accuracy and stability, and
implement them numerically.
9. Solve systems of linear equations using Gaussian elimination.
10. Analyse ill-conditioned systems of equations.
11. Fit data to polynomials.
Editions
First edition: January 2013
This edition: January 2014
Contents

Module description

1 Introduction
2 Floating-Point Arithmetic
3 Computer architecture and Compilers
4 Our very first Matlab function
5 Vectors, Arrays, and Loops in Matlab
6 Operations using for-loops and their built-in Matlab analogues
7 While loops, logical operations, precedence, subfunctions
8 Plotting in Matlab
9 Root-finding
10 The Newton–Raphson method
11 Interlude: One-dimensional maps
12 Newton–Raphson method: Failure analysis
13 Numerical Quadrature – Introduction
14 Numerical Quadrature – Simpson's rule
15 Ordinary Differential Equations – Euler's method
16 Euler's method – Accuracy and Stability
17 Runge–Kutta methods
18 Gaussian Elimination
19 Gaussian Elimination – the algorithm
20 Gaussian Elimination – performance and operation count
21 Operator norm, condition number
22 Condition number, continued
23 Eigenvalues – the power method
24 Fitting polynomials to data
25 Random-number generation
A Calculus theorems you should know
B Facts about Linear Algebra you should know
Chapter 1
Introduction
1.1 Module summary
Here is the executive summary of the module:
You will learn enough numerical analysis to enable you to solve ODEs, integrate functions, find
roots, and fit curves to data. At the same time, you will learn the basics of Matlab. You will
also learn about Matlab’s powerful built-in functions that make numerical calculations effortless.
In more detail, we will follow the following programme of work:
1. The architecture of a modern computer: Von Neumann model, memory hierarchies.
2. Representation of numbers on a computer: binary versus decimal. Floating-point arithmetic.
Rounding error.
3. Elementary operations in Matlab: 'for' and 'while' loops, logical statements, precedence.
4. Array operations using loops; the superseding of these loop calculations by vectorization.
5. Root-finding: the Intermediate Value Theorem, Bracketing and Bisection, Newton–Raphson
method.
6. Failure analysis for the Newton–Raphson method, including analysis of iterative maps.
7. Numerical integration (quadrature) using the Midpoint, Trapezoidal, and Simpson’s rules.
Error analysis for the same.
8. Solving ODEs numerically: Euler and Runge–Kutta methods. Error analysis for the Euler
method. Stability analysis for the same.
9. Solving systems of linear equations using Gaussian elimination.
10. Analysis of ill-conditioned systems (i.e. systems of linear equations that are ‘barely solvable’).
The condition number.
1.2 Learning and Assessment
Learning
• 36 contact hours, 3 per week, with the following possibilities:
– Three hours of lectures (theory), no computer-aided labs;
– Two hours of lectures, one hour of labs;
– One hour of lectures, two hours of labs.
The split will happen on an ad-hoc basis as the module progresses.
Note finally, there will be precisely three contact hours per week, in spite of appearances to
the contrary on the official timetable.
• The lab sessions will involve using the mathematical software Matlab. No prior knowledge of Matlab or programming is assumed. The students will be taught how to use Matlab in these
lab sessions.
• Supplementary reading and Matlab coding practice.
Assessment
• Three homework assignments, 6⅔% each, for a total of 20%
• One midterm exam, for a total of 20%
• One end-of-semester exam, 60%
Note that the percentage-to-grade conversion table is the one used by the School of Mathematical
Sciences, see
http://mathsci.ucd.ie/tl/grading/en06
Resitting the module
Assessment of resit students will be by one end-of-semester exam only, which will be assessed in the
usual way on a pass/fail basis.
Textbooks
• Lecture notes will be put on the web. These are self-contained. They will be available before class. It is anticipated that you will print them and bring them with you to class. You can
then annotate them and follow the proofs and calculations done on the board in class.
• The lecture notes will also be used as a practical Matlab guide in the lab-based sessions.
• You are still expected to attend all classes and lab sessions, and I will occasionally deviate from the content of the notes, and give revision tips for the final exam.
• Here is a list of the resources on which the notes are based:
– Afternotes on Numerical Analysis, G. W. Stewart (SIAM, 1996).
– For issues concerning numerical linear algebra: Dr Sinéad Ryan’s website:
http://www.maths.tcd.ie/~ryan/TeachingArchive/161/teaching.html
– For issues concerning computer architecture and memory, the course Introduction to
high-performance scientific computing on the website
www.tacc.utexas.edu/~eijkhout/Articles/EijkhoutIntroToHPC.pdf
• Other, more advanced works are referred to very occasionally:
– Chebyshev and Fourier Spectral Methods, J. P. Boyd (Dover, 2001), and the website
http://www-personal.umich.edu/~jpboyd/BOOK_Spectral2000.html
– The Art of Computer Programming, Volume 2, D. Knuth (Addison-Wesley, 3rd Edition,
1997)
– Numerical Recipes in C, W. H. Press et al. (CUP, 1992):
http://apps.nrbook.com/c/index.html
Module dependencies
Some knowledge of Linear Algebra and Calculus is assumed. Important theorems in analysis are
referred to. For a reference, see the book Analysis: An Introduction, R. Beals (CUP, 2004).
Office hours
I do not keep specific office hours. If you have a question, you can visit me whenever you like – from
09:00-18:00 I am usually in my office if not lecturing. It is a bit hard to get to. The office number,
building name, and location are indicated on a map at the back of this introductory chapter.
Otherwise, email me:
Chapter 2
Floating-Point Arithmetic
Overview
Binary and decimal arithmetic, floating-point representation, truncation, truncation errors, IEEE
double precision standard
2.1 Introduction
Computers are electrical devices, and 'on' and 'off' are states that they understand natively. Imagine a computer
made up of lots of tiny switches that can either be on or off. We can represent any number (and
hence, any information) in terms of a sequence of switches, each of which is in an ‘on’ or ‘off’ state.
We do this through binary arithmetic. An ‘on’ or an ‘off’ switch is therefore a fundamental unit
of information in a computer. This unit is called a bit.
2.2 Positional notation and base 2
One of the crowning achievements of human civilization is the ability to represent arbitrarily large and small real numbers in a compact way using only ten digits. For example, the integer 570,123
really means
570,123 = (5 × 10^5) + (7 × 10^4) + (0 × 10^3) + (1 × 10^2) + (2 × 10^1) + (3 × 10^0).
Here,
• The leftmost digit (5) has five digits to its right and therefore comes with a power 10^5,
• The digit second from the left (7) has four digits to its right and therefore comes with a power of 10^4,
• And so on, down to the rightmost digit, which, by definition, has no other digits to its right, and therefore comes with a power of 10^0.
In contrast, the Romans would have struggled to represent this number:
570,123 = \overline{DLXX} CXXIII,

where the overline means multiplication by 1,000.
Rational numbers with absolute value less than unity can be expressed in the same way, e.g.
0.217863:
0.217863 = (2 × 10^-1) + (1 × 10^-2) + (7 × 10^-3) + (8 × 10^-4) + (6 × 10^-5) + (3 × 10^-6).
Other rational numbers have a decimal expansion that is infinite but consists of a periodic repeating
pattern of digits:
1/7 = 0.142857142857 · · · = (1 × 10^-1) + (4 × 10^-2) + (2 × 10^-3) + (8 × 10^-4) + (5 × 10^-5) + (7 × 10^-6)
+ (1 × 10^-7) + (4 × 10^-8) + (2 × 10^-9) + (8 × 10^-10) + (5 × 10^-11) + (7 × 10^-12) + · · ·
Using geometric progressions, it can be checked that 1/7 does indeed equal 0.142857142857 · · ·, since

0.142857142857 · · · = 1 × (1/10 + 1/10^7 + 1/10^13 + · · ·) + 4 × (1/10^2 + 1/10^8 + · · ·)
                     + 2 × (1/10^3 + 1/10^9 + · · ·) + 8 × (1/10^4 + 1/10^10 + · · ·)
                     + 5 × (1/10^5 + 1/10^11 + · · ·) + 7 × (1/10^6 + 1/10^12 + · · ·)

                   = (1/10)(1 + 1/10^6 + 1/10^12 + · · ·) + (4/10^2)(1 + 1/10^6 + 1/10^12 + · · ·) + · · ·

                   = (1 + 1/10^6 + 1/10^12 + · · ·) [1/10 + 4/10^2 + 2/10^3 + 8/10^4 + 5/10^5 + 7/10^6]

                   = (1/(1 − 10^-6)) × ((10^5 + 4 × 10^4 + 2 × 10^3 + 8 × 10^2 + 5 × 10 + 7)/10^6).

Hence,

0.142857142857 · · · = (10^6/(10^6 − 1)) × ((10^5 + 4 × 10^4 + 2 × 10^3 + 8 × 10^2 + 5 × 10 + 7)/10^6)
                     = (10^5 + 4 × 10^4 + 2 × 10^3 + 8 × 10^2 + 5 × 10 + 7)/(10^6 − 1)
                     = 142857/999999
                     = 142857/(7 × 142857)
                     = 1/7.
In a similar way, all real numbers can be represented as a decimal string. The decimal string may
terminate or be periodic (rational numbers), or may be infinite with no repeating pattern (irrational
numbers). For example, a real number y ∈ [0, 1), with
y = Σ_{n=1}^{∞} xn/10^n = 0.x1x2 · · ·,
where xi ∈ {0, 1, · · · , 9}. This number does not as yet have a meaning. However, consider the sequence {yN} of rational numbers, where
yN = Σ_{n=1}^{N} xn/10^n.    (2.1)
This is a sequence that is bounded above and monotone increasing. By the completeness axiom,
the sequence has a limit, hence
y = lim_{N→∞} yN.
The completeness axiom is therefore equivalent to the construction of the real numbers: any real
number can be obtained as the limit of a rational sequence such as Equation (2.1).
Now that we understand how numbers are represented in base 10 using positional notation, we can examine other bases. Consider for example the string
x = 1010110,
in base 2. Using positional notation and base 2, we understand x to be the number
x = (1 × 2^6) + (0 × 2^5) + (1 × 2^4) + (0 × 2^3) + (1 × 2^2) + (1 × 2^1) + (0 × 2^0)
  = 64 + 16 + 4 + 2
  = 86, base 10.
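As a quick check, Matlab's built-in function bin2dec converts a string of binary digits to base 10:

x=bin2dec('1010110');   % convert the binary string to base 10
display(x)              % displays 86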
Numbers with absolute value less than unity can be represented in a similar way. For example, let
x = 0.01101 base 2.
Using positional notation, this is understood as
x = 0/2 + 1/2^2 + 1/2^3 + 0/2^4 + 1/2^5
  = 1/4 + 1/8 + 1/32
  = 8/32 + 4/32 + 1/32
  = 13/32
  = 0.40625, base 10.
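There is no built-in analogue of bin2dec for fractional binary strings, but the sum is easy to check directly at the command line:

x=1/4+1/8+1/32;   % the non-zero digits of 0.01101, base 2
display(x)        % displays 0.40625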
Two binary strings can be added by ‘carrying twos’. For example,
    0.01101
  + 1.11100
  ---------
   10.01001
Let’s check our calculation using base 10:
x1 = 0.01101 = 0/2 + 1/4 + 1/8 + 0/16 + 1/32 = 13/32,

x2 = 1.111 = 1 + 1/2 + 1/4 + 1/8 = 15/8 = 60/32.
Hence,

x1 + x2 = 73/32 = 2 + 9/32 = 2 + 8/32 + 1/32 = 2 + 1/4 + 1/32 = (1 × 2^1) + (0 × 2^0) + 1/2^2 + 1/2^5 = 10.01001, base 2.
Because computers (at least notionally) consist of lots of switches that can be on or off, it makes
sense to store numbers in binary, as a collection of switches in ‘on’ or ‘off’ states can be put into a
one-to-one correspondence with a set of binary numbers. Of course, a computer will always contain
only a finite number of switches, and can therefore only store the following kinds of numbers:
1. Numbers with absolute value less than unity that can be represented as a binary expansion
with a finite number of non-zero digits;
2. Integers less than a certain maximum value;
3. Combinations of the above.
An irrational real number (e.g. √2) will be represented on a computer by a truncation of the true
value. This introduces a potential source of error into numerical calculations – so-called rounding
error.
2.3 Floating-point representation
Rounding error is the original sin of computational mathematics. A partial atonement for this sin is
the idea of floating-point arithmetic. A base-10 floating-point number x consists of a fraction F
containing the significant figures of the number, and an exponent E:
x = F × 10^E,    where 1/10 ≤ F < 1.
Representing floating-point numbers on a computer comes with two kinds of limitations:
1. The range of the exponent is limited, Emin ≤ E ≤ Emax, where Emin is negative and Emax is positive; both have large absolute values. Calculations leading to exponents E > Emax
are said to lead to overflow; calculations leading to exponents E < Emin are said to have
underflowed.
2. The number of digits of the fraction F that can be represented by on and off switches on a
computer is finite. This results in rounding error.
The idea of working with rounded floating-point numbers is that the number of significant figures
(‘precision’) with which an arbitrary real number is represented is independent of the magnitude of
the number. For example,
x1 = 0.0000001234 = 0.1234 × 10^-6,    x2 = 0.5323 × 10^6
are both represented to a precision of four significant figures. However, let us add these numbers,
keeping only four significant figures:
x1 + x2 = 0.0000001234 + 532,300
        = 532,300.0000001234
        = 0.5323000000001234 × 10^6
        = 0.5323 × 10^6, four sig. figs.,
        = x2.
Rounding has completely negated the effect of adding x1 and x2.
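The same swamping effect is easy to reproduce in Matlab's double precision; a minimal sketch (here x1 is chosen to be far below machine precision relative to x2):

x2=1;
x1=1e-20;       % far below machine epsilon relative to x2
s=x1+x2;
display(s-x2)   % displays 0: the contribution of x1 has been rounded away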
When starting with a real number x with a possibly indefinite decimal expansion, and representing it
in floating-point form with a finite number of digits in the fraction F, the rounding can be implemented
in two ways:
1. Rounding up, e.g.
0.12345 = 0.1235, four sig. figs.,
and 0.12344 = 0.1234 and 0.12346 = 0.1235, again to four significant figures;
2. ‘Chopping’, e.g.
0.12345, 0.12344, and 0.12346 all become 0.1234 when truncated to four sig. figs.
The choice between these two procedures appears arbitrary. However, consider
x = a.aaaaB,
which is rounded up to
x̃ = a.aaaC.

If B < 5, then C = a, hence

x − x̃ = 0.0000B = B × 10^-5 < 5 × 10^-5.

On the other hand, if B ≥ 5, then C = a + 1 (the digit is incremented by one). In a worst-case scenario, B = 5, and

x̃ − x = a.aaaC − a.aaaaB = (C − a) × 10^-4 − B × 10^-5 = 10^-4 − 5 × 10^-5 = 5 × 10^-5.
In either case therefore,
|x̃ − x| ≤ 5 × 10^-5.
Assuming a ̸= 0, we have |x| > 1, hence 1/|x| < 1, and
|x̃ − x|/|x| ≤ 5 × 10^-5 = (1/2) × 10^-4.
More generally, rounding x to N decimal digits gives a relative error
|x̃ − x|/|x| ≤ (1/2) × 10^(−N+1).
See if you can show by similar arguments that for chopping, the relative error is twice as large as that for rounding:

|x̃ − x|/|x| ≤ 10^(−N+1),

where x̃ here denotes the chopped value.
A more convenient way of summarizing these results is as follows: Let
x̃ = fl(x)
be the result of rounding the real number x using either rounding up or chopping. Define the signed
relative error
ϵ = (fl(x) − x)/x.    (2.2)
We know that, by definition,

|ϵ| ≤ ϵN = { (1/2) × 10^(−N+1), rounding up,
           { 10^(−N+1),          chopping.        (2.3)
Re-arranging Equation (2.2), we have
fl(x) = x(1 + ϵ), |ϵ| ≤ ϵN .
The value ϵN is called machine epsilon and depends on the floating-point arithmetic of the machine
in question. We can also think of machine epsilon as the largest number x for which the computed
value of 1 + x is 1. It can be computed as follows in Matlab:
x=1;
while( 1+x~=1 )   % halve x until 1+x is indistinguishable from 1
    x=x/2;
end
x=2*x;            % step back to the last x for which 1+x ~= 1
display(x)
However, Matlab will display machine epsilon if you simply enter ‘eps’ at the command prompt.
Common Programming Error:
Thinking that machine epsilon is 'the smallest number (in absolute value) that the computer can represent'. This is wrong. Machine epsilon refers to the maximum relative error between a number and its representation on the computer. Equivalently, you can think of it as follows: let x be the smallest number strictly greater than 1 representable by the computer. Then ϵN = x − 1. If you are still not convinced, we shall see soon, when we study the double-precision format, that the smallest and largest numbers in absolute-value terms are quite distinct from machine epsilon.
2.4 Error accumulation
Most computing standards will have the following property:
fl(a ◦ b) = (a ◦ b)(1 + ϵ),    |ϵ| ≤ ϵN,    (2.4)

where ϵN is the machine epsilon and ◦ represents an arithmetic operation such as ×, +, −, or ÷. This is a good property to have: if the error in representing the numbers a and b is small, then the
error in representing their sum is also small. Because machine epsilon is very small, the compound
error obtained in a long sequence of arithmetic operations (where each component operation has the
property (2.4)) is very small. Errors induced by compounding individual errors such as Equation (2.4)
are therefore almost always negligible. However, error accumulation can still occur in two other ways:
1. The numbers entered into the computer code lack the precision required for a long calculation,
and ‘cancellation errors’ occur;
2. Certain iterative algorithms contain stable and unstable solutions. The unstable solution is
not accessed if the ‘initial condition’ is zero. However, if the initial condition is ϵN , then the
unstable solution can grow over time until it swamps the other, desired solution.
These sources of error will become more apparent in the examples in the homework.
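As a foretaste, here is a minimal sketch of a cancellation error (the numbers are illustrative):

a=1e16;
b=1;
c=(a+b)-a;   % exact arithmetic gives 1
display(c)   % displays 0: the 1 was lost when a+b was rounded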
2.5 Double precision and other formats
The gold standard for approximating an arbitrary real number in rounded floating-point form
x = F × 2^E    (2.5)
is the so-called IEEE double precision. A double-precision number on a computer can be thought
of as 64 contiguous pieces of memory (64 bits). One bit is reserved for the sign of the number,
eleven bits are reserved for the exponent (naturally stored in base 2), and the remaining fifty-two
bits are reserved for the significand. Thus, in IEEE double precision, a real number is approximated and then stored as follows:

x ≈ fl(x) = (−1)^sign × (1 + Σ_{i=1}^{52} b−i/2^i) × 2^(Es − 1023).

Figure 2.1: 64 contiguous bits in memory make up an IEEE floating-point number, with bits reserved for the sign, the exponent, and the fraction. From http://en.wikipedia.org/wiki/Double-precision floating-point format (20/11/2012).
Here, the exponent Es is stored using a contiguous eleven-bit binary string, meaning that Es can in
principle range from Es = 0 to Es = 2047. However, Es = 0 is reserved for underflow to zero, and
Es = 2047 is reserved for overflow to infinity, meaning that the maximum possible finite exponent
is Es = 2046. Accounting for offset, the maximum true exponent is
E = Es,max − 1023 = 2046− 1023 = 1023.
Hence, xmax ≈ 2^1023. Similarly,

xmin = 2^(1−1023) = 2^(−1022).
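These limits can be checked from within Matlab (realmin is built in; realmax is larger than 2^1023 because the fraction can approach 2):

display(2^1023)    % 8.9885e+307, the estimate for xmax above
display(realmin)   % 2.2251e-308, which is 2^(-1022)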
Now, recall the formula
|x − fl(x)|/|x| := ϵ ≤ ϵN = { (1/2) × 10^(−N+1), rounding up,
                            { 10^(−N+1),          chopping,

which gave the truncation error in base 10 for truncation after N figures of significance. Going over
to base two and chopping, we have
|x − fl(x)|/|x| := ϵ ≤ ϵN = 2^(−N+1).
In IEEE double precision, the precision is N = 52 + 1 (the extra 1 comes from the digit stored
implicitly), hence
ϵN = 2^(−53+1) = 2^(−52).
Equivalently, the smallest positive number strictly greater than 1 detectable in this standard is

1 + 0/2 + 0/2^2 + · · · + 1/2^52,
and again,
ϵN = 2^(−52) ≈ 2.220446 × 10^(−16)
gives machine precision.
The IEEE standard also supports extensions to the real numbers, including the symbols Inf (which
will appear when a code has overflowed), and NaN. The symbol NaN will appear as a code’s output
if you do something stupid. Examples in Matlab syntax include the following particularly egregious
one:
x=0/0;       % 0/0 is undefined: x becomes NaN
display(x)
Another datatype is the integer, which is stored in a contiguous chunk of memory like a double,
typically of length 8, 16, 32, or 64 bits. Typically, the integers are defined with respect to an offset
(two’s complement), so that no explicit storage of the sign is required.
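Matlab's integer types can be explored directly; note that, unlike C, Matlab saturates on integer overflow rather than wrapping around:

n=int8(100);              % an 8-bit signed integer, range -128 to 127
display(intmax('int8'))   % displays 127
display(n+n)              % displays 127: the result saturates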
Common Programming Error:
Mixing up integers and doubles. For example, suppose in a computer-programming language such as C, that x has been declared to be a double-precision number. Then, assigning x the value 1, i.e.

x=1;

uses an integer literal where a double is meant. The compiler will quietly convert it, but mixing integers and doubles in this way (think of 1/2, which is integer division and evaluates to 0) can lead to subtle bugs. To keep the types unambiguous, one writes

x=1.0;
Happily, the distinction between integers and doubles is not enforced in Matlab, and
ambiguity about variable types is allowed. However, you should remember this lesson if
you do more advanced programming in high-level languages such as C or Fortran.
As hinted at previously, Matlab implements the IEEE double precision standard, albeit implicitly.
For example, if you type
display(pi)
at the command line, you will only see the answer
3.1416
However, you can rest assured that the built-in working precision of the machine is 53 bits. For
example, typing
display(eps)
yields
2.2204e-016
Also, typing
x=2;
while(x~=Inf)    % keep doubling until x overflows to Inf
    x_old=x;
    x=2*x;
end
display(x_old)   % the largest power of two before overflow
yields
8.9885e+307,
the same as 2^1023 ≈ 8.9885 × 10^307.
Chapter 3
Computer architecture and Compilers
Overview
Computer architecture means the relationship between the different components of hardware in a
computer. In this chapter, this idea is discussed under the following headings: the memory/processor
model, memory organization, processor organization, simple assembly language.
3.1 The memory/processor or von Neumann model
Computer architecture means the relationship between the different components of hardware
in a computer. On a very high level of abstraction, many architectures can be described as von
Neumann architectures. This is a basic design for a computer with two components:
1. An undivided memory that stores both program and data;
2. A processing unit that executes the instructions of the program and operates on the data
(CPU).
This design is different from the earliest computers in which the program was hard-wired. It is
also very clever, as the line between ‘data’ and ‘program’ can become blurred – to our advantage.
When we write a program in a given language, we work with a computer that has other, more
basic programs installed – including a text editor and a compiler. The von Neumann architecture
enables the computer to treat the code we write in the text editor as data, and the compiler is in
this context a ‘super-program’ that operates on these data and converts our high-level code into
instructions that can be read by the machine. Having said this, in this module, we understand ‘data’
to be the collection of numbers to be operated on, and the code is the set of instructions detailing
the operations to be performed.
In conventional computers, the machine instructions generated by the compiled version of our code
do not communicate directly with the memory. Instead, information about the location of data
in the computer memory, and information about where in memory the results of data processing
should go, are stored directly in a part of the CPU called the register. Rather counter-intuitively,
the existence of this ‘middle-man’ register speeds up execution times for the code. Many computer
programs possess locality of reference: the same data are often accessed repeatedly. Rather than
moving these frequently-used data to and from memory, it is best to store them locally on the CPU,
where they can be manipulated at will.
The main statistic that is quoted about CPUs is their Gigahertz rating, implying that the speed of
the processor is the main determining factor of a computer’s performance. While speed certainly
influences performance, memory-related factors are important too. To understand these factors, we
need to describe how computer memory is organized.
3.2 Memory organization
Practically, a pure von Neumann architecture is unrealistic because of the so-called memory wall.
In a modern computer, the CPU performs operations on data on timescales much shorter than the
time required to move data from memory to the CPU. To understand why this is the case, we need
to study how the CPU and the computer memory communicate.
In essence, the CPU and the computer memory communicate via a load of wires called the bus. The
front-side bus (FSB) or ‘North bridge’ connects the computer main memory (or ‘RAM’) directly to
the CPU. The bus is typically much slower than the processor, and operates with clock frequencies
of ∼1 GHz, a fraction of the CPU clock frequency. A processor can therefore consume many items of data fed from the bus in one clock tick – this is the reason for the memory wall.
The memory wall can be broken up further in two parts. Associated with the movement of data are
two limitations: the bandwidth and the latency. During the execution of a process, the CPU will
request data from memory. Stripping out the time required for the actual data to be transferred,
the time required to process this request is called latency. Bandwidth refers to the amount of data
that can be transferred per unit time. Bandwidth is measured in bytes/second, where a byte (to
be discussed below) is a unit of data. In this way, the total time required to for the CPU to request
and receive n bytes from memory is
T(n) = α + βn,
where α is the latency and β is the inverse of the bandwidth (second/byte). Thus, even with infinite
bandwidth (β = 0), the time required for this process to be fulfilled is non-zero.
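For concreteness, a back-of-the-envelope sketch (the latency and bandwidth figures are illustrative assumptions, not measurements of any particular machine):

alpha=100e-9;   % assumed latency: 100 nanoseconds
beta=1/10e9;    % assumed inverse bandwidth: 1/(10 GB per second)
n=8e6;          % one million doubles, 8 bytes each
T=alpha+beta*n  % approximately 8e-4 seconds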
Typically, if the chunk of memory of interest physically lies far away from the CPU, then the latency
is high and the bandwidth is low. It is for this reason that a computer architecture tries to place as much memory as near to the CPU as possible. For that reason, a second chunk of memory close to the CPU is introduced, called the cache. This is shown schematically in Figure 3.1.

Figure 3.1: The different levels of memory shown in a hierarchy.

Data needed in some operation gets copied into the cache on its way to the processor. If, some instructions later,
a data item is needed again, it is searched for in the cache. If it is not found there, it is loaded
from the main memory. Finding data in cache is called a cache hit, and not finding it is called a
cache miss. Again, the cache is a part of the computer’s memory that is located on the die, that
is, on the processor chip. Because this part of the memory is close to the CPU, it is relatively quick
to transfer data to and from the CPU and the cache. For the same reason, the cache is limited
in size. Typically, during the execution of a programme, data will be brought from slower parts
of the computer’s memory to the cache, where it is moved on and off the register, where in turn,
operations are performed on the data. There is a sharp distinction between the register and the
cache. The instructions in machine language that have been generated by our compiled code are
instructions to the CPU and hence, to the register. It is therefore possible in some circumstances
to control movement of data on and off the register. On the other hand, the move from the main
memory to the cache is done purely by the hardware, and is outside of direct programmer control.
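The effect of the cache can be glimpsed from within Matlab itself. Matlab stores arrays column-by-column, so a loop whose innermost index runs down a column walks through memory contiguously; swapping the loop order produces strided access and more cache misses. A minimal sketch (timings will vary from machine to machine):

A=rand(4000);
tic
s=0;
for j=1:4000        % contiguous, column-major access
    for i=1:4000
        s=s+A(i,j);
    end
end
toc
tic
s=0;
for i=1:4000        % strided access: typically slower
    for j=1:4000
        s=s+A(i,j);
    end
end
toc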
3.3 The rest of the memory
The rest of the memory is referred to as ‘RAM’, and is neither built into the CPU (like the registers),
nor collocated with the CPU (like the cache). It is therefore relatively slow but has the redeeming
feature that it is large. The most-commonly known feature of RAM is that the data it contains are
removed when the computer powers off. This is why you must save your work to the hard drive!
RAM itself is broken up into two parts – the stack and the heap.
Stacks are regions of memory where data is added or removed on a last-in-first-out basis. The stack
really does resemble a stack of plates. You can only take a plate on or off the top of a stack – this
is also true of data stored in the stack. Another silly analogy is to imagine a series of postboxes
attached one on top of the other to a vertical pole. Initially, all the postboxes are empty. Then,
the bottommost postbox is filled and a postit note is placed on it, indicating the location of the next available postbox. As letters are put into and removed from postboxes, the postit note
moves up and down the stack of postboxes accordingly. It is therefore very simple to know how
many postboxes are full and how many are empty – a single label suffices. The system for addressing
memory slots in the stack is equally simple and for that reason, accessing the stack is faster than
accessing other kinds of memory.
On the other hand, there is the heap, which is a region of memory where data can be added or
removed at will. The system for addressing memory slots in the heap is therefore much more detailed,
and accessing the heap is therefore much slower than accessing the stack. However, the size of the
stack is fixed at runtime and is usually quite small. Many codes require lots of memory. Trying
to fit lots of data into the relatively small amount of stack that exists can lead to stack overflow
and segmentation faults. Stack overflow is a specific error where the executing program requests
more stack resources than those that exist; segmentation faults are generic errors that occur when
a code tries to access addresses in memory that either do not exist, or are not available to the code.
So ubiquitous and terrifying are these errors that a popular web forum for coders and computer scientists is called http://stackoverflow.com/.
If you ever do beginner's coding in C or Fortran, remember the following lesson:
Common Programming Error:
Never allocate arrays on the stack (Possibly Fatal)!
In this module, these issues will never arise; however, this is a salutary lesson, and one not often
referred to in beginner’s courses on real coding!
All of the different levels of memory and their dependencies are summarized in the diagram at the
end of this chapter (Figure 3.2).
3.4 Multicore architectures
If you open the task manager on a modern machine running Windows, the chances are you will see
two panels by first going to 'Performance' and then 'CPU Usage History'. It would appear that
the machine has two CPUs. In fact, modern computers contain multiple cores. We still consider
the machine to have a single CPU, but two smaller processing units (or cores) are placed on the
same chip. The two cores share some cache (‘L2 cache’), while some other cache is private to each
core ('L1 cache'). This enables the computer to break up a computational task into two parts, work on
each task separately, via the private cache, and communicate necessary shared data via the shared
cache. This architecture therefore facilitates parallel computing, thereby speeding up computation
times. High-level programs such as MATLAB take advantage of multiple-core computing without
any direction from the user. On the other hand, lower-level programming standards (e.g. C, Fortran)
require explicit direction from the user in order to implement multiple-core processing. This is done
using the OpenMP standard.
Unfortunately, the idea of having several cores on a single chip makes the description of this archi-
tecture ambiguous. We reserve the word processor for the entire chip, which will consist of multiple
sub-units called cores. Sometimes the cores are referred to as threads and this kind of computing
is called multi-threaded.
3.5 Compilers
As mentioned in Section 3.1, a standard procedure for writing code is the following:
1. Write the code in a high-level computer language such as C or Fortran. You will do this in a
text editor. Computer code on this level has a definite syntax that is very similar to ordinary
English.
2. Convert this high-level code to machine-readable code using a compiler. You can think of
this as a translator that takes the high-level code (readable to us, and similar in its syntax to
English) into lots of gobbledegook that only the computer can understand.
3. Compilation takes in a text file and outputs a machine-readable executable file. The exe-
cutable can then be run from the command line.
MATLAB sits one level higher than a high-level computer language, with a friendly syntax and all
sorts of clever procedures for allocating memory so that we don’t need to worry about technical
issues. It also has a user-friendly interface so that our high-level Matlab files can be run and the
output interpreted and plotted in a user-friendly fashion. Incidentally, Matlab is written in C, so it is as though two translations happen before the computer executes our code: Matlab → C → (Machine-readable code).
In this course, issues of precision, truncation error, and computer architecture are moot. Now that
we have tentatively (and metaphorically) opened the lid of our computer and seen its architecture,
we will close it firmly, learn Matlab, and compute things. That said, these questions are important
for a number of reasons:
1. Learning stuff is always good!
2. We should never treat something as a 'black box' to be interacted with only by mindlessly
pressing a few buttons. Knowledge is good (point 1 again).
3. Sometimes, things go wrong with our codes (e.g. truncation error). Then, we need to
understand properly how numbers are represented on a computer.
4. Suppose that our calculations become large (requiring long runtimes and large amounts of
memory). Then, knowledge of the computer’s architecture helps us to understand the limi-
tations of the calculations, and extend those limits (e.g. virtual memory, multi-threading /
shared memory, distributed memory). These last topics would be studied typically in an MSc
in High-Performance Computing.
Figure 3.2: (From Wikipedia) Computer architecture showing the interaction between the different levels of memory.
Chapter 4
Our very first Matlab function
Open the Matlab text editor and type the following:
function x=addnumbers(a,b)
x=a+b;
end
Save this as a file called "addnumbers.m". We have thus created a Matlab function "addnumbers"
with filename “addnumbers.m”. We call a, b, and x variables. These are placeholders for a real
number. There are rich analogies between computer syntax and mathematical syntax. Given a
function like f(x) = 2x^2 + x + 1, f(x) and x are placeholders for real numbers, and the real number f(x) is obtained by setting x equal to a definite value and then evaluating the function. Again, just like
in mathematical functions, we have the notion of inputs and outputs:
1. The inputs to the Matlab function are a and b, which can be any real numbers.
2. The output is x = a+ b.
Common Matlab Programming Error:
• Not giving the Matlab function and its filename the same name.
• Matlab is CaSE SensItiVE: a and A are not the same variable. ['Little-a' and 'big-a' are not the same variable.]
Now, at the command line, type
x=addnumbers(1,2);
display(x)
The result should be x = 3. You could get the same result by typing
x=addnumbers(1,2)
Common Matlab Programming Error:
Not using the semicolon to suppress output. This is not fatal, but can lead to lots of
unnecessary numbers being displayed on the GUI.
Matlab functions can have more than one output. For example, consider the following:
function [x,y]=add_and_multiply(a,b)
x=a+b;
y=a*b;
end
After saving this function, one would type at the command line:
[x,y]=add_and_multiply(1,2)
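Matlab then displays both outputs:

x =
     3

y =
     2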
Chapter 5
Vectors, Arrays, and Loops in Matlab
Overview
At its heart Matlab is nothing more than a glorified Linear Algebra package. It is a giant calculator
for doing linear-algebra calculations very efficiently. A main aim of this module is therefore to
understand Matlab’s syntax for handling vectors and matrices (and more generally, arrays).
5.1 Vectors and For Loops
Supposing we have an ordinary three-dimensional vector
v = (1, 2, 3)
This can be stored in Matlab (for example, in RAM, on the command line) by typing
v=[1,2,3];
We can check that the individual components of the vector have been stored properly by typing
display(v(1))
display(v(2))
display(v(3))
Thus, v(i) is the ith component of the vector, in the Matlab syntax. We call i the index. Here,
obviously, i = 1, 2, 3.
The for loop
Accessing the different components of a vector is straightforward for a three-dimensional vector.
However, supposing we had the following vector:
v=rand(100,1);
which is a 100-element column vector with entries that are random numbers between 0 and 1.¹ We might
like to print all of the elements to the screen. Typing
display(v(1))
display(v(2))
display(v(3))
&c &c all the way down to the 100th index would be tiresome and very silly. Happily, we can tell
Matlab to cycle through each of the elements in the vector in a sequential manner, and print the
elements to the screen as Matlab cycles through the vector. This is done with a for loop:
for i=1:100
display(v(i))
end
Granted, the same result could be accomplished by typing
v
but that would be less instructive.
¹The notion of random numbers on a computer is treated in Chapter 25.
The mean of the components
Suppose now that we want to compute the mean of the components of the vector. Mathematically,
we have
v = (v1, · · · , v100),    v̄ := (1/100) Σ_{i=1}^{100} vi.
This can be accomplished with a for loop as follows:
sum_val=0;
for i=1:100
sum_val=sum_val+v(i);
end
sum_val=sum_val/100;
display(sum_val)
I can’t really explain this to you; you will just have to go away and look at it, and play with the
associated Matlab function. After worrying about this for long enough, I promise it will make sense.
Common Matlab Programming Error:
Not initializing sum_val to be zero (Fatal).
Moving on, a keynote of this module is the following principle:
Good Programming Practice:
Operations on vectors can be performed component-wise or equivalently, using inbuilt
vector functions.
In other words, for every for loop that we construct, there is a specialized Matlab command that
does the same thing. For example, typing
sum_val=sum(v)/100
will also give the mean of the random vector; here ‘sum’ is the built-in Matlab function.
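In fact, there is an even more direct built-in function for this job, a one-line alternative:

mean_val=mean(v);   % built-in mean: same result as sum(v)/100
display(mean_val)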
Exercise 5.1 Let
v=rand(1,200), w=rand(1,200)
be two distinct random vectors. Compute the dot product of v and w,
v · w = Σ_{i=1}^{200} vi wi
(i) using a for loop; (ii) using a built-in function to be found by looking at the Matlab Help
pages.
The dot-star operation
Following on from this exercise, we introduce a very useful operation in matlab called dot-star.
This is pointwise multiplication. Given vectors
v = (v1, · · · , vn),    w = (w1, · · · , wn),

a new vector v.*w is defined such that

v.*w = (v1w1, · · · , vnwn).
Thus, an alternative way of doing Exercise 5.1 is to type
newvec=v.*w;
dotprod=sum(newvec);
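For comparison, part (i) of Exercise 5.1 can be done with an explicit for loop; a minimal sketch:

v=rand(1,200);
w=rand(1,200);
dotprod=0;                    % initialize the accumulator!
for i=1:200
    dotprod=dotprod+v(i)*w(i);
end
display(dotprod)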
Common Matlab Programming Error:

Typing v*w when v.*w is meant. The ordinary * operation in Matlab means the multiplication of two scalars, or two matrices (see below).
5.2 Nested for-loops and matrices
Let A ∈ R^(m×n) and B ∈ R^(n×p) be matrices. We can take the product of these matrices: the matrix AB has ijth component

(AB)ij = Σ_{k=1}^{n} Aik Bkj.
Thus, the ijth component is obtained by taking the ith row of A and dotting it with the jth column
of B. For that reason, to do matrix multiplication, the number of elements in a column of A should
be the same as the number of elements in a row of B. This can be remembered in a mnemonic:
(Matrix product) (m× n)(n× p) = (new matrix) (m× p).
It is as if we do a ‘cross multiplication’ whereby ‘the n in the middle cancels’. Using dot products,
we can now multiply two matrices, as in the following example:
A=[3,2,1;1,-1,2];
B=[7,-1,2,6;4,-3,2,5;3,4,-7,-1];
It might be nice to visualize these matrices before we go any further:

A = [ 3   2   1      B = [ 7  −1   2   6
      1  −1   2 ],         4  −3   2   5
                           3   4  −7  −1 ].

The matrix A is a 2 × 3 matrix; B is 3 × 4. Their matrix product AB will be 2 × 4. We now allocate a matrix to hold the result of our calculation:
ABprod=zeros(2,4);
Good Programming Practice:
Always initialize or ‘allocate’ any arrays which are to be accessed using ‘for’ loops. In
some cases, this can speed up the code’s execution times by factors of 10 or 100.
Now, we take the ith row of A and we dot it with the jth column of B. But we have now hit a problem!
There are two labels (or ‘indices’) to ‘loop’ over – and we are only familiar with ‘for loops’ over one
index. The answer is a nested for loop:
for i=1:2
for j=1:4
tempa=A(i,:);
tempb=B(:,j);
ABprod(i,j)=dot(tempa,tempb);
end
end
By now, you should be starting to realise that a main goal of this course is to open up the
‘black box’ made up by Matlab’s built-in functions. For that reason, we can check the results of our
calculation with Matlab’s own built-in method for multiplying matrices:
display(ABprod)
display(A*B)
Chapter 6
Operations using for-loops and their
built-in Matlab analogues
Exercise 6.1 Write a Matlab function to do the following tasks. If possible, verify your answer
using the appropriate built-in functions which can be found in the Matlab ‘help’ documents.
1. Compute the factorial of a non-negative integer.
2. Compute the cross product of two three-dimensional vectors.
3. Compute the square of an n×n matrix. The input must be a square matrix – A, say. The size of A can be obtained from the command
[nx,ny]=size(A);
Because the matrix is square, nx and ny should be the same. Later on we will write code to
check if conditions like this one are true.
4. Using the formula
(1/90) π^4 = Σ_{n=1}^{∞} 1/n^4,    (6.1)
compute π valid to 10 significant figures.
Hints:
• The apparent (i.e. displayed) precision of Matlab can be lengthened by first of all typing
format long
at the Matlab command line, before the function is executed.
• In this exercise, you should write a function that takes in Napprox – a finite truncation order of the sum (6.1). It should return a value πapprox. You should experiment by
executing the function for different (increasing) values of Napprox until there is no change
in the first 10 digits of πapprox.
• You should write two versions of the function. The first version will use a for loop; the second will use only the built-in Matlab functions .*, ./, and sum(). A vector (1, 2, · · · , N) can be defined in Matlab with the command
vecN=1:1:N;
Here, 1 is the starting value of the vector, N is the final value, and the 1 sandwiched
between the colons is the increment.
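To fix ideas, here is one possible shape for the vectorized version (the function and variable names are illustrative, not prescribed by the exercise):

function pi_approx=approx_pi(Napprox)

vecN=1:1:Napprox;         % the vector (1, 2, ..., Napprox)
S=sum(1./vecN.^4);        % truncated version of the sum (6.1)
pi_approx=(90*S)^(1/4);   % invert (1/90)*pi^4 = S

end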
Chapter 7
While loops, logical operations,
precedence, subfunctions
Overview
We introduce some additional operations in Matlab that will be indispensable throughout this mod-
ule.
7.1 The ‘while’ loop
We have seen how the ‘for’ loop provides a means of accessing the elements of a vector or an array
in a sequential fashion, e.g.
v=1:1:10;
for i=1:length(v)
temp_val=v(i);
display(temp_val)
end
The ‘for’ loop passes the counter i through the loop. During each pass through the loop, the
counter is incremented by one. The passes continue through the loop provided the statement
i ≤ 10
is true. When this statement becomes false, the passes through the loop stop. Thus, a sequence of
logical operations (true/false) is carried out automatically, until certain statements become false.
Another way of doing this is with a while loop, as follows:
v=1:1:10;
i=1;
while(i<=length(v))
    temp_val=v(i);
    display(temp_val)
    i=i+1;
end
The ‘while’ loop is therefore more general than a ‘for’ loop. With this extra freedom comes a
requirement for extra caution:
Common Programming Error:
• Forgetting to initialize the counter in the ‘while loop’
• Forgetting to increment the counter in the ‘while loop’
• Performing an operation on the incremented counter (i+ 1) instead of using i.
7.2 Logical operations
We have already mentioned that the counter in 'for' and 'while' loops is incremented until some logical condition becomes false. This suggests that Matlab has a way of checking for truth or falsity. This is indeed correct. Such checks are often encountered in 'if' statements.
‘If’ statements
Suppose that in Chapter 6 we had a Matlab code to compute A^2, where A is a square matrix. This
code would contain the following elements:
function Asq=square_A(A)

[nx,ny]=size(A);

...

end

sample matlab codes/square_A_missing_info.m
If nx ≠ ny there is not really much point in going any further with this calculation, as it will return nonsense. It might be good to have in the code a check to see if nx = ny, and to know what to do
in case nx ̸= ny. The following flowchart indicates what we need:
• If nx = ny we need to get on with the calculation!
• If nx ̸= ny we should exit the code.
This can be implemented in Matlab with an ‘if-else statement’:
1  function Asq=square_A_missing_info1(A)
2
3  [nx,ny]=size(A);
4
5  if(nx==ny)
6      % The code to square A goes here.
7      ...
8  else
9      % We should exit the code and return a value.
10     Asq=0*A;
11     display('Error: A is not a square matrix')
12     display('Returning A^2=0 and exiting code')
13     return
14 end
15
16 end

sample matlab codes/square_A_missing_info1.m
Some notes:
• The condition nx = ny is checked in Line 5, with the piece of code if(nx==ny). The double equals sign is not a typo: this is a logical equals sign, which is an operation to check the
truth of the statement nx = ny.
On the other hand, the piece of code nx=ny is called an assignment equals sign: it is an
operation whereby the variable nx is assigned the value ny.
Common Matlab Programming Error:
Using an assignment equals sign in a logical check.
• On line 8, Matlab is instructed what to do if A is not a square matrix. Because we have written a function, we have in a sense painted ourselves into a corner: we must return some
output to the command line, even if a correct calculation is impossible. We elect to return a
zero matrix of size nx × ny, and alert the user, using the warnings on lines 11 and 12, that a mistake has been made.
As a further example of an ‘if-else statement’, consider a homemade Matlab function to compute
the absolute value of a number:
|x| = { +x, if x ≥ 0,
      { −x, if x < 0.
This is implemented as follows:
function [abs_x]=abs_x_homemade(x)

if(x>=0)
    abs_x=x;
else
    abs_x=-x;
end

end

sample matlab codes/abs_x_homemade.m
Of course, as with many other things in Matlab, there is a built-in function for computing absolute
values:
abs_x=abs(x);
If built-in functions exist, they should always be preferred over their home-made alternatives: armies of Ph.D. computational scientists are paid lots of money by MathWorks to devise clever algorithms; unfortunately, we are rarely likely to beat them at their own game.
Common Matlab Programming Error:
• Using a homemade Matlab function instead of the built-in alternative.
• Calling a homemade function by a name reserved for a built-in function.
Other logical operations are possible. For example, it is possible to check a condition without having
an alternative (‘if without the else’). Further possibilities:
• A series of independent 'if' statements, e.g.

if(i<10)
    ...
end
if(i<20)
    ...
end
• A series of dependent 'if' statements, e.g.

if(i<10)
    ...
elseif(i<20)
    ...
end
A better idea is the following:

function x=sample_if_statements_2(i)

if(i<10)
    ...
elseif(i<20)
    ...
else
    ...
end

end

sample matlab codes/sample_if_statements_2.m

Suppose now that our code relies on the function f(x) being positive at both x = a AND x = b. We check this using a logical 'and' operation:

function check=check_sign_f1()

% We are going to check the sign of f(a) and f(b), for
%
% f(x) = sin(x)+x*cos(x)+exp(x)/(1+x^2).

a=1;
b=2;

fa=sin(a)+a*cos(a)+exp(a)/(1+a^2);
fb=sin(b)+b*cos(b)+exp(b)/(1+b^2);

if((fa>0) && (fb>0))
    check=1;
    display('both function evaluations have positive sign')
else
    check=0;
end

end

sample matlab codes/check_sign_f1.m
On the other hand, suppose that our code relies on f(x) being positive at x = a OR x = b (or
both). We check this using a logical ‘or’ operation:
function check=check_sign_f2()

% We are going to check the sign of f(a) and f(b), for
%
% f(x) = sin(x)+x*cos(x)+exp(x)/(1+x^2).

a=1;
b=2;

fa=sin(a)+a*cos(a)+exp(a)/(1+a^2);
fb=sin(b)+b*cos(b)+exp(b)/(1+b^2);

if((fa>0) || (fb>0))
    check=1;
    display('at least one of the function evaluations has positive sign')
else
    check=0;
end

end

sample matlab codes/check_sign_f2.m
Logical negation
Often it is useful to check if a variable x is NOT equal to some singular value. For example, suppose
we want to compute f(x) = sin(x)/x. Obviously, sin(0)/0 is not defined, but by l’Hôpital’s rule,
we know that it is sensible to define f(0) = 1. We would write the following piece of code:
if(x==0)
fx=1;
else
fx=sin(x)/x;
end
However, the same operation can be achieved using a logical negation:
• If x ̸= 0, then f(x) = sin(x)/x;
• Otherwise, we have x = 0 and we set f(x) = 1.
This is implemented in Matlab as follows:
if(x~=0)
fx=sin(x)/x;
else
fx=1;
end
‘Isnan’ and ‘Isinf’ statements
Finally, there are other checks that one can perform. We might like to see if a variable has overflowed
to become ‘numerical infinity’:
x=1/0;
isinf(x)
Typing isinf(x) in this instance returns the value 1. In logical operations, '1' corresponds to 'true' and '0' to 'false'. Thus, when isinf(x)=1, we know that x has overflowed to become numerical
infinity.
Similarly, we can check to see if a number has been badly defined to become ‘Not a number’:
x=0/0;
isnan(x)
Typing isnan(x) returns the value 1, meaning that it is true that x is not a (double precision)
number. On the other hand, typing
y=1;
isnan(y)
returns 0, meaning that y is well-defined as a double-precision number.
7.3 Precedence
As in ordinary arithmetic, the precedence of operations (i.e. which comes first in a composition of
operations) is BOMDAS. Sensibly, compositions of operations that ordinarily have the same level
of precedence are performed starting with the leftmost operation and then reading to the right.
However, Matlab admits more operations than primary-school arithmetic, so the list is longer. The
following list is not exhaustive, but includes all of the operations you will encounter in this module:
1. Brackets ()
2. Matrix transpose (.’), pointwise power (.∧), Matrix complex-conjugate-transpose (’) and scalar
complex conjugate (’), matrix power (∧)
3. Unary plus (+), unary minus (−), logical negation (∼)
Unary operators (operators involving only one argument) do not really have an independent
existence in Matlab; here +A just means A, and −A means (−1)× A, where A is an array.
4. Pointwise operations: multiplication (.∗), right division (./), left division (.\); Matrix opera-tions: matrix multiplication (∗), matrix right division (/ ), matrix left division (\)
5. Addition (+), subtraction (−)
6. Logical comparisons: less than (<), less than or equal to (<=), greater than (>), greater than or equal to (>=), equal to (==), not equal to (~=)
7. Short-circuit AND (&&)
8. Short-circuit OR (||)
Short-circuit AND and OR means that the second argument of the operation is not evaluated
unless it is needed.
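A few examples of these rules in action, worth trying at the command line:

x=-2^2            % displays -4: the power binds more tightly than the unary minus
y=2+3*4^2         % displays 50: power first, then multiplication, then addition
z=(1<2)&&(3~=4)   % displays 1 (true): the comparisons are evaluated before &&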
7.4 Subfunctions
It is quite common in Matlab to write a function in Matlab (a ‘.m’ file) and to find that within
that file, you need to call other functions. This idea of a ‘function within a function’ can be easily
accommodated in Matlab and is called ‘nesting’.
We re-visit the example in Section 7.2 (check sign f1.m), with a small twist: we check the sign of
the (mathematical) function
f(x) = sin x + x cos x + e^x/(k0^2 + x^2),
at locations x = a and x = b. Here k0 is a user-defined constant that is entered at the command line
when the (Matlab) function is called. Instead of having two near-identical function evaluations at
x = a and x = b, we make a one-off definition of f(x) and reuse it as follows:
function check=check_sign_f3(k0)

% We are going to check the sign of f(a) and f(b), for
%
% f(x) = sin(x)+x*cos(x)+exp(x)/(k0^2+x^2).

a=1;
b=2;

fa=evalf(a);
fb=evalf(b);

if((fa>0) && (fb>0))
    check=1;
    display('both function evaluations have positive sign')
else
    check=0;
end

% *************************************************************************
% Definition of f(x) here.

    function y=evalf(x)
        y=sin(x)+x*cos(x)+exp(x)/(k0^2+x^2);
    end

end

sample matlab codes/check_sign_f3.m
The advantage of this approach is economy. While this economy is not very clear here, one can
imagine that such ‘recycling’ is extremely important when (say) 100 sequential function evaluations
are required.
Writing subfunctions has its pitfalls. In the example above (check_sign_f3.m) the subfunction where
f(x) is defined is nested – it appears between the beginning and the end of the main function. It
is also possible to have a completely independent subfunction:
function check=check_sign_f4(k0)

% We are going to check the sign of f(a) and f(b), for
%
% f(x) = sin(x)+x*cos(x)+exp(x)/(k0^2+x^2).

a=1;
b=2;

fa=evalf(a,k0);
fb=evalf(b,k0);

if((fa>0) && (fb>0))
    check=1;
    display('both function evaluations have positive sign')
else
    check=0;
end

end

% *************************************************************************

function y=evalf(x,k0_loc)
y=sin(x)+x*cos(x)+exp(x)/(k0_loc^2+x^2);
end

sample matlab codes/check_sign_f4.m
However, in this case, none of the variables defined in the main part of the code is defined in the
subfunction. A real programmer would say that the variables in the main function are limited in
scope, or are only locally defined. For that reason, we pass two values to the subfunction f(x) – the value of the variable x, and the value of the parameter k0. For the avoidance of ambiguity, we give the parameter k0 a new variable name in the subfunction, calling it k0_loc (for 'local', as it is locally defined in the subfunction).
Common Matlab Programming Error:
Hoping that local variables will be defined in an independent (non-nested) subfunction.
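To see the pitfall concretely, here is a deliberately broken sketch (the name evalf_broken is invented for this illustration): if the subfunction in check_sign_f4.m did not receive k0 as an argument, Matlab would halt with an undefined-variable error as soon as the subfunction is called.

function y=evalf_broken(x)
% k0 is defined only in the main function and is NOT visible here;
% this line therefore triggers an undefined-variable error at run time.
y=sin(x)+x*cos(x)+exp(x)/(k0^2+x^2);
end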
There is another way around the issue of passing variables limited in scope to independent (non-nested) subfunctions: one can declare a variable to be globally defined. However, to the uninitiated, global variables can be very dangerous, and they are not discussed further in this module.
Chapter 8
Plotting in Matlab
Overview
We learn how to make simple one-dimensional curve plots in Matlab. We also learn how to prettify
these plots in order to create production-level graphics.
8.1 The idea
As we have mentioned before, at its heart, Matlab is a tool for manipulating vectors and matrices. For that reason, the way in which we plot functions is based on the manipulation of vectors.
For example, suppose we wish to plot the function
f(x) = sin x + x cos x + e^x/(1 + x^2)

in the range [0, 6].
We would create a vector of x-locations, spaced apart by a small distance:
x=0:0.01:6;
We would then create a second vector of points, corresponding to f(x):
fx=sin(x)+x.*cos(x)+exp(x)./(1+x.^2);
(note the ‘.*’ operation here). We would then plot the result as follows:
plot(x,fx)
The result looks like the following figure:
[Figure: the resulting plot of f(x) on [0, 6].]
Of course, we have not plotted a continuous curve; rather, we have plotted the value of f(x) at the discrete x-locations x = 0, 0.01, 0.02, · · · . One way to see this explicitly is to put a big 'x' at each of these discrete locations:
plot(x,fx,’-x’)
Clearly, there are lots of these dots, and our grid x=0:0.01:6 is fine enough to give a good
description of the continuous curve (x, f(x)).
[Figure: the same plot, with an 'x' marker at each discrete grid point.]
To see the effects of having too coarse a grid, we de-refine the x-grid as follows (remembering to recompute fx on the new grid):

x=0:0.1:6;
fx=sin(x)+x.*cos(x)+exp(x)./(1+x.^2);
plot(x,fx,'-x')
The result is terrible!
[Figure: the coarse-grid plot; the curve is poorly resolved.]
Clearly, the grid chosen must match the amount of variation in the function. This choice can be
refined by trial-and-error.
8.2 Embellishments
Any Physics student who has survived the gruelling ordeal of lab sessions will know the importance
of labelling graphs clearly. Matlab provides this facility:
[Figure: (a), (b): labelling a plot interactively via the Matlab figure window.]
However, I prefer to do this kind of thing on the command line (it gets quicker with practice, and
it can be automated for batches of plots):
• To create production-quality axis labels:
set(gca,’fontsize’,18,’fontname’,’times new roman’)
Here, ‘gca’ is a handle to the current axes (‘get current axes’).
• To label the graph:
xlabel(’x’)
ylabel(’y=f(x)’)
The order is important here – you must change the font before drawing the labels; otherwise
the labels will be in the default font (small and plain).
• For production-quality graphics, the thickness of the curve ('linewidth') should be set to three. This can be done via the editor, or immediately on creation of the plot, using instead the modified plot command

plot(x,fx,'linewidth',3)
• Sometimes, the line y = 0 can be helpful in a plot to guide the eye. This can be included as follows:
hold on
plot(x,0*x,’linewidth’,1,’color’,’black’)
hold off
Here, the ‘hold on’ command holds the current figure in place so that another plot layer can
be included. Without this ‘hold on’ command, the additional plot command would overwrite
the first plot.
The instruction ...,’color’,’black’ tells Matlab to plot the horizontal line in black. Mat-
lab only takes American spellings!
• To pick out a particular point on the curve (e.g. a point where y = f(x) hits zero), one can use the data cursor.
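Putting these commands together, a complete script for a production-quality version of our plot might look as follows; this is a sketch assembled from the commands above, not necessarily the exact script used to produce Fig. 8.1:

x=0:0.01:6;
fx=sin(x)+x.*cos(x)+exp(x)./(1+x.^2);

plot(x,fx,'linewidth',3)                  % the curve itself, thickness 3
hold on
plot(x,0*x,'linewidth',1,'color','black') % guide line y=0
hold off

set(gca,'fontsize',18,'fontname','times new roman') % change the font first...
xlabel('x')                                         % ...then draw the labels
ylabel('y=f(x)')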
I think the final, embellished result is much nicer than our original attempts (Fig. 8.1)!
[Figure: the data cursor marks the root at X: 2.56, Y: 0; the axes are labelled x and y=f(x).]

Figure 8.1: Final, embellished plot of f(x) = sin x + x cos x + e^x/(1 + x^2) on the range x ∈ [0, 6].
Chapter 9
Root-finding
Overview
In this chapter we study an elementary numerical method to compute roots of the problem
f(x) = 0,
where f(x) is a continuous function.
9.1 Roots
Definition: Let f : R → R be a continuous function. The value x∗ is said to be a root of f if f(x∗) = 0.
Example: x = 1 is a root of f(x) = x^2 − 3x + 2 because f(1) = 1 − 3 + 2 = 0. There is no limit to the number of roots that a function may have. For example, the quadratic function just described has two roots, x∗ = 1, 2. On the other hand, the function f(x) = sin x has infinitely many roots, x∗ = nπ, where n ∈ Z. We do have some theorems, however, that tell us when at least one root should exist:
Theorem 9.1 (Intermediate Value Theorem) Let f : [a, b] → R be a continuous real-valued function, with f(a) < f(b). Then for each real number u with f(a) < u < f(b), there exists at least one value c ∈ (a, b) such that f(c) = u.
No proof is given here but see for example Beales (p. 105); see also Figure 9.1.
Corollary 9.1 If f : [a, b] → R is a continuous real-valued function with f(a) < 0 and f(b) > 0, then there exists at least one value x∗ ∈ (a, b) such that f(x∗) = 0; that is, f has a root strictly between a and b.
Figure 9.1: Sketch for the Intermediate Value Theorem (a) and its corollary (b).
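As a quick numerical illustration of the corollary (the function f(x) = x^2 − 2 and the grid here are illustrative choices), we can scan a fine grid for a sign change:

x=1:0.0001:2;
fx=x.^2-2;                         % f(1)<0 and f(2)>0

% Locate the first pair of neighbouring grid points where f changes sign;
% by Corollary 9.1, a root lies between them.
idx=find(fx(1:end-1).*fx(2:end)<0,1);
display(x(idx))                    % close to sqrt(2)=1.4142...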
9.2 Bracketing and Bisection
Let f : [a, b] → R be a continuous function with f(a) < 0 and f(b) > 0. By the Intermediate Value Theorem, f has at least one root on (a, b). Bracketing and Bisection (B&B) is an algorithm for finding one of these roots:
1. Compute the midpoint c1 = (a+ b)/2.
2. Compute f(c1). If f(c1) < 0 then focus on a new interval [c1, b]. If f(c1) > 0 then focus on
a new interval [a, c1].
3. Compute the midpoint of the new interval, then repeat step 2.
4. Repeat until the root has converged to within the required precision.
Steps (1)–(2) are shown schematically in Figure 9.2, and a sample Matlab code is given below.
1 function xstar=do_bracketing_bisection(a,b)
2
3 % ***************************************************************************
4 % Iterate until the root is converged to within the following
5 % tolerance.
6
7 tol=1e-16;
8
9 % ***************************************************************************
10 % Initial guess for the interval and for the root.
11
12 c1=a;
13 c2=b;
14
15 xstar_old=(c1+c2)/2;
16
17 % ***************************************************************************
18 % Error checking: See if Bracketing and Bisection is possible.
19
20 if((f(a)*f(b))>=0)
21     display('bracketing and bisection not possible; exiting')
22     xstar='rubbish';
23     return
24 end
25
26 % ***************************************************************************
27 % Error checking: See if initial guess is actually the root; if so,
28 % terminate program.
29
30 if(abs(f(xstar_old))<tol)
31     display('initial guess hits root')
32     xstar=xstar_old;
33     return
34 end
35
36 % ***************************************************************************
37 % First pass through the algorithm to find new value of xstar.
38 % There are two sub-algorithms:
39 % 1. One sub-algorithm if f(a)<0 and f(b)>0 -- the one described in the
40 %    text
41 % 2. Another sub-algorithm if f(a)>0 and f(b)<0.
42
43 if(f(a)<0)
44
45     cm=(c1+c2)/2;
46
47     if(f(cm)<0)
48         c1=cm;
49     elseif(f(cm)>0)
50         c2=cm;
51     end
52
53     xstar=(c1+c2)/2;
54
55 else
56
57     cm=(c1+c2)/2;
58
59     if(f(cm)<0)
60         c2=cm;
61     elseif(f(cm)>0)
62         c1=cm;
63     end
64
65     xstar=(c1+c2)/2;
66
67 end
68
69
70 % ***************************************************************************
71 % Iterate steps (1)-(2) until converged to within the tolerance.
72
73
74 % Structure for sub-algorithm 1:
75 %
76 % 1. If f(cm)<0 then the new interval should be [cm,c2];
77 % 2. If f(cm)>0 then the new interval should be [c1,cm];
78 % 3. If f(cm)=0 then we have hit the root exactly and should exit the
79 %    loop.
80
81 if(f(a)<0)
82
83     while(abs(xstar-xstar_old)>tol)
84         cm=(c1+c2)/2;
85         if(f(cm)<0)
86             c1=cm;
87             xstar_old=xstar;
88             xstar=(c1+c2)/2;
89         elseif(f(cm)>0)
90             c2=cm;
91             xstar_old=xstar;
92             xstar=(c1+c2)/2;
93         else
94             xstar_old=(c1+c2)/2;
95             xstar=(c1+c2)/2;
96         end
97     end
98
99 else
100     while(abs(xstar-xstar_old)>tol)
101         cm=(c1+c2)/2;
102         if(f(cm)<0)
103             c2=cm;
104             xstar_old=xstar;
105             xstar=(c1+c2)/2;
106         elseif(f(cm)>0)
107             c1=cm;
108             xstar_old=xstar;
109             xstar=(c1+c2)/2;
110         else
111             xstar_old=(c1+c2)/2;
112             xstar=(c1+c2)/2;
113         end
114     end
115
116 end
117
118
119
120 % ***************************************************************************
121 % End of main program.
122
123
124 end
125
126 % ***************************************************************************
127 % ***************************************************************************
128 % Subfunction to evaluate y=f(x).
129
130 function y=f(x)
131 % y=x.^2-2;
132 % y=x.^3-2*x.^2+x-1;
133 % y=x.^3+10*x.^2+x-1;
134 y=sin(x);
135 end

sample_matlab_codes/do_bracketing_bisection.m
There is a lot to discuss in this code! Let’s go through it line-by-line:
• Lines 12-15. Here I find the initial values for the interval, with c1 = a and c2 = b. I make an initial guess for the root, namely x∗ = (c1 + c2)/2.
Note that I am leaving the definition of f(·) in a subfunction. This is handy: the code can be easily recycled to compute the roots of many different continuous functions.
• Lines 20-24. Here I check to see if there really is a sign change, i.e. if f(a)f(b) < 0. If there is not a sign change, then bracketing and bisection will not work, and the code should be halted. Because the function must return a value, I set the variable xstar equal to a string called rubbish. A string is an array of characters.
• Lines 30-34. These lines are included in case we get very lucky: the starting guess for the root may in fact be the root, to within machine precision. Then we should set x∗ = (c1 + c2)/2 = (a + b)/2 and exit the code.
• Lines 43-69. A first pass through the algorithm (i.e. Steps 1 and 2). I have to split up the algorithm into two sub-algorithms:

1. When f(a) < 0 and f(b) > 0;

2. When f(a) > 0 and f(b) < 0,
since conceptually, there is no reason why B&B should not work in the second case. Let's focus on the first sub-algorithm. I compute the midpoint cm = (c1 + c2)/2 and evaluate f(cm). Since c1 = a and c2 = b, there are two possibilities:

         Case 1    Case 2
f(c1)    < 0       < 0
f(cm)    < 0       > 0
f(c2)    > 0       > 0
In Case 1 I take my new interval to be [cm, c2] and in Case 2 I take my new interval to be [c1, cm].
I compute my new estimate of the root using the new interval endpoints: x∗_new = (c1 + c2)/2.
• Lines 81-116. I check the difference between the initial guess and the new guess, |x∗ − x∗_new|. If this is too large, I repeat steps (1)–(2) of the algorithm. Again, two sub-algorithms are considered.
• Lines 85–96. The first sub-algorithm again, with f(a) < 0. I repeat steps (1)–(2), very similar to Lines 43–69. An extra step is included here, namely the possibility to break out of the while loop if the estimated value of the root is in fact the true root, i.e. if f(cm) = 0. Note the application of the very useful elseif statement here.
Figure 9.2: Sketch for Bracketing and Bisection
Convergence analysis
At each level n of iteration, the estimate of the root is

x∗_n = (c_{1,n} + c_{2,n})/2,

and the maximum possible distance between the estimated value of the root and the true value is given by

Error(n) = max(|c_{2,n} − x∗_n|, |x∗_n − c_{1,n}|).

We have

Error(n) = max(|c_{2,n} − x∗_n|, |x∗_n − c_{1,n}|) ≤ |c_{2,n} − c_{1,n}|/2 := δ_n.
Thus, at the zeroth level of iteration, we have

δ_0 = |b − a|/2.

At the first level, we have (case 1) c1 = a and c2 = (a + b)/2, or (case 2) c1 = (a + b)/2 and c2 = b. In either case,

δ_1 = |b − a|/4.

Guessing the pattern, or doing a proper proof by induction, we have

Error(n) ≤ δ_n = |b − a|/2^{n+1} ≤ |b − a|/2^n.
Also, δ_{n+1}/δ_n = 1/2 is a constant, so the maximum possible error δ_n converges linearly to zero as n → ∞. As we shall see later, linear convergence is rather slow, and B&B is not normally used as the sole method by which a root is found.
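The bound Error(n) ≤ |b − a|/2^n can be inverted to estimate, before running the code, how many bisections are needed for a given tolerance; in the sketch below, the interval endpoints are sample values, while tol matches the setting in do_bracketing_bisection.m:

a=1;
b=4;                                   % sample bracketing interval
tol=1e-16;                             % as in do_bracketing_bisection.m

% Solve |b-a|/2^n <= tol for n, i.e. n >= log2(|b-a|/tol).
n_required=ceil(log2(abs(b-a)/tol));
display(n_required)                    % about 55 iterations for this interval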
Failure analysis
When applied to a continuous function on an interval where a sign change occurs, Bracketing and Bisection will never fail: it will converge (slowly) to a root. Ambiguity can occur, however, when the continuous function possesses multiple roots on the interval (e.g. f(x) = sin(x) on x ∈ (−π/2, 5π/2), with roots at 0, π, 2π, and sin(−π/2) = −1, sin(5π/2) = +1). In this case, B&B will converge to one of the roots; however, it is not obvious in advance which root will be selected.
Bracketing and Bisection is therefore robust but slow. In the next chapter we examine a method
with the opposite properties. The goal is to combine these two methods to produce a hybrid scheme
that is robust and fast.
Chapter 10
The Newton–Raphson method
Overview
In this chapter we study the Newton–Raphson method for solving
f(x) = 0,
where f(x) is a differentiable function.
10.1 The idea
Figure 10.1: Sketch for the Newton–Raphson method
Let f : [a, b] → R be a differentiable function on (a, b), with at least one root in the interval
(a, b). Start with a guess x_n for the root. We refine the guess as follows. Referring to Figure 10.1, construct the tangent line to the curve at x_n, called L_n. Its slope is f′(x_n) and a point on the line is (x_n, f(x_n)). We have

L_n : y − f(x_n) = f′(x_n)[x − x_n].   (10.1)
Our next level of refinement for the root, x_{n+1}, is obtained by moving along the tangent line L_n until the x-axis is crossed. Using Equation (10.1), this is

0 − f(x_n) = f′(x_n)[x_{n+1} − x_n].
Re-arranging, this is

x_{n+1} = x_n − f(x_n)/f′(x_n),   (10.2)

provided of course the tangent line has nonzero slope. The method (10.2), supplemented with a starting value, is called the Newton–Raphson method for root-finding:

x_{n+1} = x_n − f(x_n)/f′(x_n),   x_0 given.   (10.3)
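A minimal Matlab sketch of iteration (10.3), written in the same style as the bracketing-and-bisection code, is the following; the test function f(x) = sin x, the tolerance, and the iteration cap are illustrative choices, not prescribed by the text:

function xstar=do_newton_raphson(x0)

% Iterate x_{n+1}=x_n-f(x_n)/f'(x_n) until successive iterates agree
% to within the tolerance, or until the iteration cap is reached.

tol=1e-16;
maxit=100;

xn=x0;
for n=1:maxit
    xnp1=xn-f(xn)/fprime(xn);
    if(abs(xnp1-xn)<tol)
        break
    end
    xn=xnp1;
end
xstar=xnp1;

end

% Subfunctions to evaluate f(x) and f'(x).

function y=f(x)
y=sin(x);
end

function y=fprime(x)
y=cos(x);
end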
Error analysis
In this section, we require that f be C² on any interval of interest, and that f′(x) ≠ 0 on the same interval. We let ϵ_n = x∗ − x_n be the difference between the root and the nth level of approximation. Then,

ϵ_{n+1} = x∗ − x_{n+1}
        = x∗ − (x_n − f(x_n)/f′(x_n))
        = (x∗ − x_n) + f(x_n)/f′(x_n) = ϵ_n + f(x_n)/f′(x_n).   (10.4)

Also, by definition,

f(x∗) = f(ϵ_n + x_n) = 0.

Hence, by Taylor's remainder theorem, we have the exact expression

f(x_n) + f′(x_n)ϵ_n + (1/2)f″(η)ϵ_n² = 0,   η ∈ [x_n, x_n + ϵ_n].

Re-arrange:

f(x_n)/f′(x_n) = −ϵ_n[1 + (1/2)(f″(η)/f′(x_n))ϵ_n].   (10.5)
Combine Equations (10.4) and (10.5):

ϵ_{n+1} = ϵ_n − ϵ_n[1 + (1/2)(f″(η)/f′(x_n))ϵ_n] = −(1/2)(f″(η)/f′(x_n))ϵ_n².

Thus,

ϵ_{n+1} = −(1/2)(f″(η)/f′(x_n))ϵ_n².

Taking absolute values, with δ_n := |ϵ_n| &c., this becomes

δ_{n+1} = |(1/2)(f″(η)/f′(x_n))| δ_n².

An upper limit on the error is

δ_{n+1} ≤ M δ_n²,   (10.6)

where

M = sup_{x,y ∈ (a,b)} |(1/2)(f″(x)/f′(y))|.
The convergence in the Newton–Raphson method is called quadratic because, by Equation (10.6), δ_{n+1} ∝ δ_n².
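To get a feel for what quadratic convergence means in practice, one can iterate the error relation δ_{n+1} = Mδ_n² directly; the values M = 1 and δ_0 = 0.1 below are illustrative:

M=1;
delta=0.1;

% Each pass squares the error, so the number of correct digits
% roughly doubles at every iteration: 1e-2, 1e-4, 1e-8, 1e-16, ...
for n=1:5
    delta=M*delta^2;
    display(delta)
end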
It would now seem that we have a rather awesome numerical method for root finding, with excellent
convergence properties. However, the result (10.6) should be regarded only as ‘local’: it guarantees
fast convergence only if δ0 is small. In other words, if an initial guess is a small distance away from
a root, then the guess will converge quadratically fast to the true root. However, the method is
very sensitive, and in the next chapters we investigate what happens if the initial guess is not close
to the root.
Chapter 11
Interlude: One-dimensional maps
Overview
The failure analysis for the Newton–Raphson method is linked intimately to the study of one-
dimensional maps. For that reason, we make a brief interlude and study such maps: their definition,
the notion of fixed points, stability, and periodic orbits.
11.1 Definitions
Definition 11.1 A sequence x is a map from non-negative integers to the real numbers:
x : {0} ∪ N → R,   n ↦ x_n.
Example:

{0} ∪ N → {0, 1, 1/2², 1/3², 1/4², · · ·}

is a sequence.
Definition 11.2 An autonomous discrete map F is a sequence where the (n+1)th element depends
on the nth element through a definite functional form:
xn+1 = F (xn),
and where the starting value x_0 is also specified.
Example:
xn+1 = λxn + sin(2πxn), λ ∈ R
is a discrete autonomous map.
Another example is the root-finding procedure in the Newton–Raphson method:
x_{n+1} = F(x_n),   F(x) = x − f(x)/f′(x).
There are more general discrete maps, such as
xn+1 = F (xn, xn−1).
Such maps, involving more than two levels, are often called difference equations. We do not
discuss these any further.
11.2 Fixed points and stability
Definition 11.3 Let
xn+1 = F (xn)
be a discrete autonomous map. The fixed points of the map are those values x∗ for which
F (x∗) = x∗.
Theorem 11.1 (Fixed points of the Newton–Raphson map) Let
x_{n+1} = F(x_n),   F(x) = x − f(x)/f′(x)
be the Newton–Raphson dynamical system. Then the fixed points of the dynamical system are the
roots of f(x).
Proof: Set x∗ = F(x∗), i.e.

x∗ = F(x∗) = x∗ − f(x∗)/f′(x∗).

Cancellation yields

f(x∗)/f′(x∗) = 0,

hence f(x∗) = 0.
Definition 11.4 Let
xn+1 = F (xn)
be a discrete autonomous map with a fixed point at x∗.
• The fixed point is called stable if |F ′(x∗)| < 1;
• The fixed point is called unstable if |F ′(x∗)| > 1.
The reason for this definition is the following. Suppose the initial condition for the map xn+1 =
F (xn) is near the fixed point:
xn=0 = x∗ + δ0, δ0 ≪ 1.
We want to know what the next value of x will be:
xn=1 = F (xn=0) = F (x∗ + δ0).
Now δ0 is small, so we can do a Taylor expansion:
F(x∗ + δ_0) = F(x∗) + F′(x∗)δ_0 + (1/2)F″(x∗)δ_0² + · · · .
However, δ0 is so small that we are going to ignore the quadratic terms:
F (x∗ + δ0) ≈ F (x∗) + F ′(x∗)δ0 = x∗ + F ′(x∗)δ0
since F (x∗) = x∗. Hence,
xn=1 = x∗ + F′(x∗)δ0.
Let us introduce δ1 such that xn=1 = x∗ + δ1. Thus,
δ1 = F′(x∗)δ0.
Imagine repeating the map n times, such that
δn+1 = F′(x∗)δn.
This equation is linear and has solution
δ_n = δ_0 [F′(x∗)]^n.
• If |F′(x∗)| < 1, then lim_{n→∞} δ_n = 0, or lim_{n→∞} x_n = x∗;
• If |F′(x∗)| > 1, then lim_{n→∞} |δ_n| = ∞, and lim_{n→∞} x_n is undetermined from the linearized analysis.
• In the first case, if the system (the map and the x-values) starts near the fixed point, it stays near the fixed point – the fixed point is stable;

• In the second case, if the system starts near the fixed point, it moves away from the fixed point exponentially fast – the fixed point is unstable.
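These ideas are easy to explore numerically. In the sketch below, the map is the Newton–Raphson map for f(x) = x² − 2 (an illustrative choice, written as an anonymous function), and the distance from the fixed point x∗ = √2 is printed at each step:

F=@(x) x-(x^2-2)/(2*x);   % Newton-Raphson map for f(x)=x^2-2
xstar=sqrt(2);            % fixed point of F (a root of f)
x=xstar+0.1;              % start near the fixed point

% The perturbation |x_n - xstar| shrinks at every step, consistent
% with the fixed point being stable (|F'(xstar)|<1).
for n=1:5
    x=F(x);
    display(abs(x-xstar))
end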
Exercise 11.1 Let x∗ be a fixed point of the Newton–Raphson map. Analyse the behaviour
of the map near a fixed point by showing that F ′(x∗) = 0. Such a fixed point is called
superstable.
Chapter 12
Newton–Raphson method: Failure analysis
Overview
We classify the different ways in which the Newton–Raphson method can fail. We apply the theory
of one-dimensional maps to analysing these failures. Finally, we examine Ma