CS-110 Computational Engineering
Part 3
• A. Avudainayagam
• Department of Mathematics
Root Finding: f(x) = 0

Method 1: The Bisection Method
Theorem: If f(x) is continuous in [a,b] and f(a)f(b) < 0, then there is at least one root of f(x) = 0 in (a,b).

[Figures: y = f(x) with a single root x1 in (a,b), where f(a) < 0 and f(b) > 0; and with multiple roots x1, ..., x5 in (a,b).]
The Method
• Find an interval [x0, x1] so that f(x0)f(x1) < 0. (This may not be easy; use your knowledge of the physical phenomenon which the equation represents.)
• Cut the interval length into half with each iteration, by examining the sign of the function at the mid point x2 = (x0 + x1)/2.
• If f(x2) = 0, x2 is the root.
• If f(x2) ≠ 0 and f(x0)f(x2) < 0, the root lies in [x0, x2].
• Otherwise the root lies in [x2, x1].
• Repeat the process until the interval shrinks to a desired level.
Number of Iterations and Error Tolerance
• Length of the interval (where the root lies) after n iterations:
  en = |x1 - x0| / 2^n
• We can fix the number of iterations so that the root lies within an interval of chosen length ε (the error tolerance):
  n ≥ ln(|x1 - x0| / ε) / ln 2
• If n satisfies this, the midpoint lies within a distance ε/2 of the actual root.
Though the root lies in a small interval, |f(x)| may not be small if f(x) has a large slope. Conversely, if |f(x)| is small, x may not be close to the root if f(x) has a small slope. So we use both these facts in the termination criterion: we first choose an error tolerance ε on |f(x)| (stop when |f(x)| < ε) and m, the maximum number of iterations.
Pseudo code (Bisection Method)
1. Input ε > 0, m > 0, x1 > x0 so that f(x0)f(x1) < 0.
   Compute f0 = f(x0).
   k = 1 (iteration count)
2. Do { (a) Compute x2 = (x0 + x1)/2 and f2 = f(x2).
       (b) If f2 f0 < 0, set x1 = x2; otherwise set x0 = x2 and f0 = f2.
       (c) Set k = k + 1. }
3. While |f2| > ε and k ≤ m.
4. Set x = x2, the root.
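The pseudocode above can be sketched in Python; the test function f(x) = x^2 - 5 and the bracket [2, 3] are assumptions for illustration, not from the slides.

```python
# A minimal sketch of the bisection pseudocode above.
def bisect(f, x0, x1, eps=1e-10, m=100):
    """Return an approximate root of f in [x0, x1], given f(x0)*f(x1) < 0."""
    f0 = f(x0)
    for _ in range(m):
        x2 = (x0 + x1) / 2          # midpoint of the current bracket
        f2 = f(x2)
        if abs(f2) <= eps:          # |f(x2)| small enough: accept x2
            break
        if f0 * f2 < 0:             # root lies in [x0, x2]
            x1 = x2
        else:                       # root lies in [x2, x1]
            x0, f0 = x2, f2
    return x2

print(bisect(lambda x: x * x - 5, 2.0, 3.0))  # ~2.2360679... (sqrt 5)
```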
False Position Method (Regula Falsi)
Instead of bisecting the interval [x0, x1], we choose as x2 the point where the straight line through the end points (x0, f0) and (x1, f1) meets the x-axis, and bracket the root with [x0, x2] or [x2, x1] depending on the sign of f(x2).
False Position Method
[Figure: y = f(x); the chord through (x0, f0) and (x1, f1) meets the x-axis at x2.]

Straight line through (x0, f0), (x1, f1):
  y = f1 + (f1 - f0)/(x1 - x0) · (x - x1)
New end point x2:
  x2 = x1 - f1 (x1 - x0)/(f1 - f0)
False Position Method (Pseudo Code)
1. Choose ε > 0 (tolerance on |f(x)|), m > 0 (maximum number of iterations), k = 1 (iteration count), x0, x1 (so that f0 f1 < 0).
2. Do { a. Compute x2 = x1 - f1 (x1 - x0)/(f1 - f0) and f2 = f(x2).
       b. If f0 f2 < 0, set x1 = x2, f1 = f2; otherwise set x0 = x2, f0 = f2.
       c. k = k + 1 }
3. While (|f2| > ε) and (k ≤ m)
4. x = x2, the root.
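A matching Python sketch of the false-position iteration (the test function f(x) = x^2 - 5 is an assumed example):

```python
# A minimal sketch of the false-position pseudocode above.
def false_position(f, x0, x1, eps=1e-10, m=100):
    f0, f1 = f(x0), f(x1)
    for _ in range(m):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # chord meets the x-axis
        f2 = f(x2)
        if abs(f2) <= eps:
            break
        if f0 * f2 < 0:            # root bracketed by [x0, x2]
            x1, f1 = x2, f2
        else:                      # root bracketed by [x2, x1]
            x0, f0 = x2, f2
    return x2

print(false_position(lambda x: x * x - 5, 2.0, 3.0))  # ~2.2360679...
```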
Newton-Raphson Method / Newton's Method
At an approximation xk to the root, the curve is approximated by the tangent to the curve at xk, and the next approximation x_{k+1} is the point where the tangent meets the x-axis.
[Figure: y = f(x); successive tangents at x0, x1, ... lead toward the root.]

Tangent at (xk, fk): y = f(xk) + f'(xk)(x - xk).
This tangent cuts the x-axis at x_{k+1}:
  x_{k+1} = xk - f(xk)/f'(xk)
Warning: if f'(xk) is very small, the method fails.
• Two function evaluations per iteration (f and f').
Newton's Method - Pseudo code
1. Choose ε > 0 (function tolerance |f(x)| < ε), m > 0 (maximum number of iterations), x0 (initial approximation), k = 1 (iteration count). Compute f0 = f(x0).
2. Do { q = f'(x0) (evaluate the derivative at x0)
       x1 = x0 - f0/q
       x0 = x1
       f0 = f(x0)
       k = k + 1 }
3. While (|f0| > ε) and (k ≤ m)
4. x = x1, the root.
Getting caught in a cycle of Newton's Method
Alternate iterations fall at the same point: no convergence.
[Figure: the iterates oscillate between xk and x_{k+1}.]
Newton's Method for finding the square root of a number: x = √a
  f(x) = x² - a = 0
  x_{k+1} = xk - (xk² - a)/(2 xk) = (xk² + a)/(2 xk)
Example: a = 5, initial approximation x0 = 2.
  x1 = 2.25
  x2 = 2.236111111
  x3 = 2.236067978
  x4 = 2.236067978
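The square-root iteration can be written as a short Python sketch (the default starting guess a/2 is an assumption; the slides start from x0 = 2):

```python
# Newton's method specialized to f(x) = x**2 - a.
def newton_sqrt(a, x=None, eps=1e-12, m=50):
    """Iterate x <- (x*x + a) / (2x) until the step is below eps."""
    if x is None:
        x = a / 2.0                 # assumed default starting guess
    for _ in range(m):
        x_new = (x * x + a) / (2 * x)
        if abs(x_new - x) <= eps:
            return x_new
        x = x_new
    return x

print(newton_sqrt(5, x=2.0))  # 2.2360679... as in the table above
```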
The Secant Method
• Newton's Method requires 2 function evaluations per iteration (f, f').
• The Secant Method requires only 1 function evaluation per iteration and converges nearly as fast as Newton's Method at a simple root.
• Start with two points x0, x1 near the root (no need to bracket the root as in the Bisection Method or the Regula Falsi Method).
• x_{k-1} is dropped once x_{k+1} is obtained.
The Secant Method (Geometrical Construction)
• Two initial points x0, x1 are chosen.
• The next approximation x2 is the point where the straight line joining (x0, f0) and (x1, f1) meets the x-axis.
• Take (x1, x2) and repeat.
[Figure: y = f(x); the secant through (x0, f0) and (x1, f1) meets the x-axis at x2.]
The Secant Method (Pseudo Code)
1. Choose ε > 0 (function tolerance |f(x)| < ε), m > 0 (maximum number of iterations), x0, x1 (two initial points near the root), f0 = f(x0), f1 = f(x1), k = 1 (iteration count).
2. Do { x2 = x1 - f1 (x1 - x0)/(f1 - f0)
       x0 = x1, f0 = f1
       x1 = x2, f1 = f(x2)
       k = k + 1 }
3. While (|f1| > ε) and (k ≤ m)
4. x = x2, the root.
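A Python sketch of the secant iteration, again with an assumed test function f(x) = x^2 - 5:

```python
# A minimal sketch of the secant pseudocode above.
def secant(f, x0, x1, eps=1e-12, m=100):
    f0, f1 = f(x0), f(x1)
    for _ in range(m):
        if f1 == f0:               # flat secant: cannot proceed
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # one new f-evaluation per step
        x0, f0 = x1, f1                        # drop the oldest point
        x1, f1 = x2, f(x2)
        if abs(f1) <= eps:
            break
    return x1

print(secant(lambda x: x * x - 5, 2.0, 3.0))  # ~2.2360679...
```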
General Remarks on Convergence
# The false position method in general converges faster than the bisection method (but not always; a counterexample exists).
# The bisection method and the false position method are guaranteed to converge.
# The secant method and the Newton-Raphson method are not guaranteed to converge.
Order of Convergence
# A measure of how fast an algorithm converges.
Let α be the actual root: f(α) = 0.
Let xk be the approximate root at the kth iteration; the error at the kth iteration is ek = |xk - α|.
The algorithm converges with order p if there exists a constant C such that
  e_{k+1} ≤ C (ek)^p
Order of Convergence of the Methods
# Bisection method: p = 1 (linear convergence)
# False position: generally superlinear (1 < p < 2)
# Secant method: superlinear, p = (1 + √5)/2 ≈ 1.618
# Newton-Raphson method: p = 2 (quadratic)
Machine Precision
# The machine epsilon εM is the smallest positive float that can be added to one and produce a sum that is greater than one.

Pseudo code to find 'Machine Epsilon'
1. Set εM = 1
2. Do {
     εM = εM / 2
     x = 1 + εM
   }
3. While (x > 1)
4. εM = 2 εM
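In Python the same halving loop reproduces the IEEE-754 double-precision epsilon reported by the standard library:

```python
# Find the machine epsilon by halving until 1 + eps/2 rounds to 1.
import sys

eps = 1.0
while 1.0 + eps / 2 > 1.0:   # halve while the sum still exceeds one
    eps = eps / 2

print(eps)                   # 2.220446049250313e-16 for IEEE doubles
print(sys.float_info.epsilon)
```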
Solutions Of Linear Algebraic Systems
  AX = B
A : n × n matrix, B : n × 1 vector, X : n × 1 vector (unknown).
• If |A| ≠ 0, the system has a unique solution.
• If |A| = 0, the matrix A is called a singular matrix.
# Direct Methods
# Iterative Methods

The Augmented Matrix
  [A | B] = [ a11 a12 a13 ... a1n | b1 ]
            [ a21 a22 a23 ... a2n | b2 ]
            [ ...                      ]
            [ an1 an2 an3 ... ann | bn ]
Elementary row operations
1. Interchange rows j and k.
2. Multiply row k by λ.
3. Add λ times row j to row k.
• Use elementary row operations to transform the system so that the solution becomes apparent.
Diagonal Matrix (D) = [ a11  0    0  ]
                      [  0  a22   0  ]
                      [  0   0   a33 ]

Upper Triangular Matrix (U) = [ a11 a12 a13 ]
                              [  0  a22 a23 ]
                              [  0   0  a33 ]

Lower Triangular Matrix (L) = [ a11  0    0  ]
                              [ a21 a22   0  ]
                              [ a31 a32  a33 ]
Gauss Elimination Method
Using row operations, the matrix is reduced to an upper triangular matrix, and the solution is obtained by back substitution.
Partial Pivot Method
Step 1: Look for the largest element in absolute value (called the pivot) in the first column, and interchange this row with the first row.
Step 2: Multiply the first row by an appropriate number and subtract it from the second row, so that the first element of the second row becomes zero. Do this for the remaining rows, so that all the elements in the first column except the topmost element are zero.
Now the matrix looks like:
  [ x x x x x ]
  [ 0 x x x x ]
  [ 0 x x x x ]
  [ 0 x x x x ]
Now look for the pivot in the second column below the first row and interchange that row with the second row. Multiply the second row by appropriate numbers and subtract it from the remaining rows (below the second row), so that the matrix looks like:
  [ x x x x x ]
  [ 0 x x x x ]
  [ 0 0 x x x ]
  [ 0 0 x x x ]
Now look for the pivot in the third column below the second row and interchange that row with the third row. Proceed until the matrix looks like:
  [ x x x x x ]
  [ 0 x x x x ]
  [ 0 0 x x x ]
  [ 0 0 0 x x ]
Now the solution is obtained by back substitution.
Gauss Elimination (An example)
  [  1  0  2 ] [x1]   [ 1 ]
  [ -2  1 -3 ] [x2] = [ 1 ]
  [  4  1  8 ] [x3]   [ 2 ]

Augmented Matrix:
  [  1  0  2 | 1 ]
  [ -2  1 -3 | 1 ]
  [  4  1  8 | 2 ]

The pivot in column 1 is 4: interchange rows 1 and 3:
  [  4  1  8 | 2 ]
  [ -2  1 -3 | 1 ]
  [  1  0  2 | 1 ]

Eliminate the first column (row2 + (1/2) row1, row3 - (1/4) row1):
  [ 4   1     8 | 2   ]
  [ 0   1.5   1 | 2   ]
  [ 0  -0.25  0 | 0.5 ]

Eliminate the second column (row3 + (1/6) row2):
  [ 4  1    8      | 2      ]
  [ 0  1.5  1      | 2      ]
  [ 0  0    0.1667 | 0.8333 ]

Solution by back substitution:
  x3 = 0.8333/0.1667 = 5
  x2 = (2 - 5)/1.5 = -2
  x1 = (2 - (-2) - 8(5))/4 = -9
x1 = -9, x2 = -2, x3 = 5
Gauss Elimination (Pseudo Code)
1. Input the matrix A of order n × n and the RHS vector B, and form the augmented matrix [A|B].
2. For j = 1 to n do
   { (a) Compute the pivot index p (j ≤ p ≤ n) such that |Apj| = max_{j ≤ i ≤ n} |Aij|.
     (b) If Apj = 0, print "singular matrix" and exit.
     (c) If p > j, interchange rows p and j.
     (d) For each i > j, subtract Aij/Ajj times row j from row i. }
3. Now the augmented matrix is [C|D] where C is upper triangular.
4. Solve by back substitution: for j = n down to 1 compute
     xj = (1/Cjj) (Dj - Σ_{i=j+1}^{n} Cji xi)
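A Python sketch of this pseudocode, applied to a small 3×3 test system (chosen here for illustration; its exact solution is x = (-9, -2, 5)):

```python
# Gauss elimination with partial pivoting, then back substitution.
def gauss_solve(A, B):
    """Solve AX = B; A is a list of row-lists, B a list. Both are modified."""
    n = len(A)
    for j in range(n):
        # (a) pivot index: largest |A[i][j]| for i >= j
        p = max(range(j, n), key=lambda i: abs(A[i][j]))
        if A[p][j] == 0:
            raise ValueError("singular matrix")
        if p > j:                       # (c) interchange rows p and j
            A[j], A[p] = A[p], A[j]
            B[j], B[p] = B[p], B[j]
        for i in range(j + 1, n):       # (d) eliminate below the pivot
            m = A[i][j] / A[j][j]
            for k in range(j, n):
                A[i][k] -= m * A[j][k]
            B[i] -= m * B[j]
    x = [0.0] * n                       # back substitution
    for j in range(n - 1, -1, -1):
        s = sum(A[j][i] * x[i] for i in range(j + 1, n))
        x[j] = (B[j] - s) / A[j][j]
    return x

print(gauss_solve([[1, 0, 2], [-2, 1, -3], [4, 1, 8]], [1, 1, 2]))
# -> close to [-9.0, -2.0, 5.0]
```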
Determinant of a Matrix
Some properties of the determinant:
1. det(Aᵀ) = det(A)
2. det(AB) = det(A) det(B)
3. det(U) = U11 U22 U33 ... Unn for a triangular matrix U
4. Interchanging two rows, E1(k,j): det(E1 A) = -det A
5. Multiplying row k by λ, E2(k,λ): det(E2 A) = λ det A
6. Adding λ times row j to row k, E3(k,j,λ): det(E3 A) = det A
Matrix Determinant
The Gauss elimination procedure uses two types of row operations:
1. Interchange of rows (for pivoting).
2. Add λ times row j to row k.
The first operation changes the sign of the determinant, while the second has no effect on the determinant.
To obtain the matrix determinant, follow Gauss elimination to the upper triangular form U, keeping count r of the row interchanges.
Then det A = (-1)^r U11 U22 ... Unn.
Matrix Determinant (Pseudo Code)
1. Set r = 0.
2. For j = 1 to n do
   { (a) Compute the pivot index p (j ≤ p ≤ n) such that |Apj| = max_{j ≤ i ≤ n} |Aij|.
     (b) If Apj = 0, print "Singular Matrix" and exit.
     (c) If p > j, interchange rows p and j and set r = r + 1.
     (d) For each i > j, subtract Aij/Ajj times row j from row i. }
3. det = (-1)^r A11 A22 ... Ann
Matrix Determinant (An Example)
  A = [  1  4  0 ]
      [  0  2  6 ]      r = 0
      [ -1  0  1 ]

row3 + row1:
      [ 1  4  0 ]
      [ 0  2  6 ]       r = 0
      [ 0  4  1 ]

The pivot in column 2 is 4: interchange rows 2 and 3:
      [ 1  4  0 ]
      [ 0  4  1 ]       r = 1
      [ 0  2  6 ]

row3 - (1/2) row2:
      [ 1  4  0   ]
      [ 0  4  1   ]     r = 1
      [ 0  0  5.5 ]

det = (-1)^1 (1)(4)(5.5) = -22
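The same elimination, keeping count of the row interchanges, can be sketched in Python (the 3×3 matrix here is a test case whose determinant is -22):

```python
# Determinant via Gauss elimination with partial pivoting.
def det(A):
    A = [row[:] for row in A]       # work on a copy
    n, r = len(A), 0
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(A[i][j]))
        if A[p][j] == 0:
            return 0.0              # singular matrix
        if p > j:
            A[j], A[p] = A[p], A[j]
            r += 1                  # each interchange flips the sign
        for i in range(j + 1, n):
            m = A[i][j] / A[j][j]
            for k in range(j, n):
                A[i][k] -= m * A[j][k]
    d = (-1.0) ** r
    for j in range(n):
        d *= A[j][j]                # product of the diagonal of U
    return d

print(det([[1, 4, 0], [0, 2, 6], [-1, 0, 1]]))  # -22.0
```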
LU Decomposition
• When Gauss elimination is used to solve a linear system, the row operations are applied simultaneously to the RHS vector.
• If the system is solved again for a different RHS vector, the row operations must be repeated.
• The LU decomposition eliminates the need to repeat the row operations each time a new RHS vector is used.

The matrix A (AX = B) is written as the product A = LU of a lower triangular matrix L and an upper triangular matrix U. The diagonal elements of the upper triangular matrix are chosen to be 1, so that the n² equations determine the n² remaining unknowns.
n = 3:
  [ A11 A12 A13 ]   [ L11  0    0  ] [ 1 U12 U13 ]
  [ A21 A22 A23 ] = [ L21 L22   0  ] [ 0  1  U23 ]
  [ A31 A32 A33 ]   [ L31 L32  L33 ] [ 0  0   1  ]

9 equations for the 9 unknowns L11, L21, L22, L31, L32, L33, U12, U13, U23:
  [ A11 A12 A13 ]   [ L11   L11 U12          L11 U13                    ]
  [ A21 A22 A23 ] = [ L21   L21 U12 + L22    L21 U13 + L22 U23          ]
  [ A31 A32 A33 ]   [ L31   L31 U12 + L32    L31 U13 + L32 U23 + L33    ]

First column:  (L11, L21, L31) = (A11, A21, A31)
First row:     U12 = A12/L11 ,  U13 = A13/L11
Second column: L22 = A22 - L21 U12 ,  L32 = A32 - L31 U12
Second row:    U23 = (A23 - L21 U13)/L22
Third column:  L33 = A33 - L31 U13 - L32 U23
LU Decomposition (Pseudo Code)
  Li1 = Ai1 ,  i = 1, 2, ..., n
  U1j = A1j / L11 ,  j = 2, 3, ..., n
  For j = 2, 3, ..., n-1 :
      Lij = Aij - Σ_{k=1}^{j-1} Lik Ukj ,  i = j, j+1, ..., n
      Ujk = (Ajk - Σ_{i=1}^{j-1} Lji Uik) / Ljj ,  k = j+1, ..., n
  Lnn = Ann - Σ_{k=1}^{n-1} Lnk Ukn
LU Decomposition (Example)
  [ 2 5 1 ]   [ 2  0    0 ] [ 1 2.5 0.5 ]
  [ 1 3 1 ] = [ 1  0.5  0 ] [ 0  1   1  ]
  [ 3 4 2 ]   [ 3 -3.5  4 ] [ 0  0   1  ]

The values of U are stored in the zero space of L:
  Q = [L \ U] = [ L11 U12 U13 ... U1n ]
                [ L21 L22 U23 ... U2n ]
                [ ...                 ]
                [ Ln1 Ln2 Ln3 ... Lnn ]
After each element of A is employed, it is not needed again, so Q can be stored in place of A.
After LU decomposition of the matrix A, the system AX = B is solved in 2 steps: (1) forward substitution, (2) back substitution.
AX = LUX = B. Put UX = Y. Then LY = B:
  [ L11  0    0  ] [ Y1 ]   [ B1 ]
  [ L21 L22   0  ] [ Y2 ] = [ B2 ]
  [ L31 L32  L33 ] [ Y3 ]   [ B3 ]
Forward substitution:
  Y1 = B1/L11
  Y2 = (B2 - L21 Y1)/L22
  Y3 = (B3 - L31 Y1 - L32 Y2)/L33

Then UX = Y:
  [ 1 U12 U13 ] [ x1 ]   [ Y1 ]
  [ 0  1  U23 ] [ x2 ] = [ Y2 ]
  [ 0  0   1  ] [ x3 ]   [ Y3 ]
Back substitution:
  x3 = Y3
  x2 = Y2 - U23 x3
  x1 = Y1 - U12 x2 - U13 x3

As in Gauss elimination, LU decomposition must employ pivoting to avoid division by zero and to minimize round-off errors. The pivoting is done immediately after computing each column.
Matrix Inverse Using the LU Decomposition
• LU decomposition can be used to obtain the inverse of the original coefficient matrix.
• Each column j of the inverse is determined by using a unit vector (with 1 in the jth row) as the RHS vector.

Matrix inverse using LU decomposition - an example
  [ 2 5 1 ]   [ 2  0    0 ] [ 1 2.5 0.5 ]
  [ 1 3 1 ] = [ 1  0.5  0 ] [ 0  1   1  ]
  [ 3 4 2 ]   [ 3 -3.5  4 ] [ 0  0   1  ]
     A             L              U

The first column of the inverse of A is given by the solution of LUx = (1, 0, 0)ᵀ.
Forward substitution, L d = (1, 0, 0)ᵀ:
  [ 2  0    0 ] [ d1 ]   [ 1 ]
  [ 1  0.5  0 ] [ d2 ] = [ 0 ]    →   d = (0.5, -1, -1.25)ᵀ
  [ 3 -3.5  4 ] [ d3 ]   [ 0 ]
Back substitution, U x = d:
  [ 1 2.5 0.5 ] [ x1 ]   [  0.5  ]
  [ 0  1   1  ] [ x2 ] = [ -1    ]   →   x = (0.5, 0.25, -1.25)ᵀ
  [ 0  0   1  ] [ x3 ]   [ -1.25 ]
The second and third columns are obtained by taking the RHS vector as (0, 1, 0)ᵀ and (0, 0, 1)ᵀ.
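The Crout-style decomposition used above (unit diagonal on U) and the two substitution sweeps can be sketched in Python; the 3×3 matrix is the example from the slides:

```python
# LU decomposition with unit diagonal on U (Crout), plus solve.
def lu_crout(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[float(i == j) for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j, n):       # column j of L
            L[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(j))
        for k in range(j + 1, n):   # row j of U
            U[j][k] = (A[j][k] - sum(L[j][i] * U[i][k] for i in range(j))) / L[j][j]
    return L, U

def lu_solve(L, U, B):
    n = len(B)
    Y = [0.0] * n                   # forward substitution: L Y = B
    for i in range(n):
        Y[i] = (B[i] - sum(L[i][k] * Y[k] for k in range(i))) / L[i][i]
    X = [0.0] * n                   # back substitution: U X = Y
    for i in range(n - 1, -1, -1):
        X[i] = Y[i] - sum(U[i][k] * X[k] for k in range(i + 1, n))
    return X

L, U = lu_crout([[2.0, 5.0, 1.0], [1.0, 3.0, 1.0], [3.0, 4.0, 2.0]])
print(L)                          # [[2.0, 0.0, 0.0], [1.0, 0.5, 0.0], [3.0, -3.5, 4.0]]
print(lu_solve(L, U, [1, 0, 0]))  # first column of A^-1: [0.5, 0.25, -1.25]
```

Note there is no pivoting in this sketch, so it can fail on matrices where a leading Ljj becomes zero.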
Iterative Methods for the Solution of Linear Systems
• Jacobi iteration
• Gauss-Seidel iteration
Jacobi Iteration
AX = B, n = 3:
  a11 x1 + a12 x2 + a13 x3 = b1
  a21 x1 + a22 x2 + a23 x3 = b2
  a31 x1 + a32 x2 + a33 x3 = b3

Preprocessing:
1. Arrange the equations so that the diagonal terms of the coefficient matrix are not zero.
2. Make row transformations if necessary to make the diagonal elements as large as possible.

Rewrite the equations as
  x1 = (1/a11)(b1 - a12 x2 - a13 x3)
  x2 = (1/a22)(b2 - a21 x1 - a23 x3)
  x3 = (1/a33)(b3 - a31 x1 - a32 x2)

• Choose an initial approximation vector (x1, x2, x3). If no prior information is available, then (0, 0, 0) will do.
Iterate:
  xi^(k+1) = (1/aii) (bi - Σ_{j≠i} aij xj^(k)) ,  1 ≤ i ≤ n

Stop when the largest relative change is small enough:
  max_i |xi^(k+1) - xi^(k)| / |xi^(k+1)| × 100 < εs (a chosen percentage tolerance)
Jacobi Pseudo Code
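A minimal Python sketch of the Jacobi iteration; the 2×2 diagonally dominant system is an assumed example (not from the slides), with solution x1 = 1, x2 = 2:

```python
# Jacobi iteration: every component of the new vector is computed
# from the previous vector.
def jacobi(a, b, tol=1e-10, m=500):
    n = len(b)
    x = [0.0] * n                   # initial approximation (0, ..., 0)
    for _ in range(m):
        x_new = [(b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i))
                 / a[i][i] for i in range(n)]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x

# assumed example: 4 x1 + x2 = 6 , x1 + 3 x2 = 7
print(jacobi([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0]))  # ~[1.0, 2.0]
```

Updating the components in place as soon as they are computed (instead of building `x_new` from the old vector) turns this into the Gauss-Seidel iteration described below.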
Diagonally Dominant Matrices
  |aii| > Σ_{j=1, j≠i}^{n} |aij|  for each i
i.e., the magnitude of the diagonal element of each row is larger than the sum of the absolute values of the other elements in the row.
Diagonal dominance is a sufficient but not necessary condition for the convergence of the Jacobi or Gauss-Seidel iteration.
The Gauss-Seidel Iteration
• In the Jacobi iteration, all the components of the new estimate x1^(k+1), x2^(k+1), ..., xn^(k+1) are computed from the current estimate x1^(k), x2^(k), ..., xn^(k).
• However, when xi^(k+1) is computed, the updated estimates x1^(k+1), ..., x_{i-1}^(k+1) are already available.
• The Gauss-Seidel iteration takes advantage of the latest information available when updating an estimate.
Gauss-Seidel vs Jacobi: First Iteration (superscripts denote the iterate number)
Jacobi:
  x1^(1) = (b1 - a12 x2^(0) - a13 x3^(0)) / a11
  x2^(1) = (b2 - a21 x1^(0) - a23 x3^(0)) / a22
  x3^(1) = (b3 - a31 x1^(0) - a32 x2^(0)) / a33
Gauss-Seidel:
  x1^(1) = (b1 - a12 x2^(0) - a13 x3^(0)) / a11
  x2^(1) = (b2 - a21 x1^(1) - a23 x3^(0)) / a22
  x3^(1) = (b3 - a31 x1^(1) - a32 x2^(1)) / a33
The second and later iterations follow the same pattern, with every superscript raised by one; Gauss-Seidel always uses the most recently updated components.
Matrix Condition Number
• Vector and Matrix Norms
• A norm is a real-valued function that gives a measure of the size or length of a vector or a matrix.
• Example: the length of the vector A = (a, b, c),
    ||A|| = √(a² + b² + c²),
  is a norm.

For the n-dimensional vector A = (x1, x2, x3, ..., xn):
  ||A|| = √(x1² + x2² + ... + xn²)
For the matrix A = (Aij):
  ||A|| = √( Σ_{i=1}^{n} Σ_{j=1}^{n} Aij² )
These are called Euclidean norms.
Other Useful Norms
For vectors:
  ||A||p = ( Σ_{i=1}^{n} |xi|^p )^(1/p)   (the p-norm)
  ||A||1 = Σ_{i=1}^{n} |xi|               (sum of the absolute values of the elements)
  ||A||∞ = max_i |xi|                     (maximum magnitude norm)
For matrices:
  ||A||1 = max_j Σ_{i=1}^{n} |Aij|        (column sum norm)
  ||A||∞ = max_i Σ_{j=1}^{n} |Aij|        (row sum norm)
Properties of norms
1. ||x|| > 0 for x ≠ 0
2. ||x|| = 0 for x = 0
3. ||αx|| = |α| ||x||
4. ||x + y|| ≤ ||x|| + ||y||
Ill Conditioning
Consider the system:
  [ 100  200 ] [x1]   [ 100 ]
  [ 200  401 ] [x2] = [ 100 ]
The exact solution: x1 = 201, x2 = -100.

Consider the slightly perturbed system:
  [ 101  200 ] [x1]   [ 100 ]
  [ 200  400 ] [x2] = [ 100 ]
A11 is increased by 1 percent, A22 is decreased by 0.25 percent (approx.).
Solution of the perturbed system: x1 = 50, x2 = -24.75.
Nearly a 75% change in the solution.

When the solution of a system is highly sensitive to the values of the coefficient matrix or the RHS vector, we say that the system is ill-conditioned or the matrix is ill-conditioned.
Ill-conditioned systems will result in large accumulated round-off errors.
Condition Number of a Matrix
  k(A) = ||A|| ||A⁻¹||
• This is a measure of the relative sensitivity of the solution to small changes in the RHS vector.
• If k is large, the system is regarded as being ill-conditioned.
# The row-sum norm is usually employed to find k(A).

Condition Number of the Matrix above:
  A = [ 100  200 ]      A⁻¹ = [  4.01  -2 ]
      [ 200  401 ]            [ -2      1 ]
Row-sum norm of A: ||A|| = max(300, 601) = 601.
Row-sum norm of A⁻¹: ||A⁻¹|| = max(6.01, 3) = 6.01.
Condition number: k(A) = 601 × 6.01 ≈ 3612 (large).
A is ill-conditioned.
Equations that are not properly scaled may give rise to large condition numbers. Consider
  A = [ 1      1    ]      A⁻¹ = [ 0.5   0.0005 ]
      [ 1000 -1000  ]            [ 0.5  -0.0005 ]
  k(A) = ||A|| ||A⁻¹|| = 2000 × 0.5005 = 1001 (large)

Equilibration: scale each row of A by its largest element:
  A = [ 1   1 ]      A⁻¹ = [ 0.5   0.5 ]
      [ 1  -1 ]            [ 0.5  -0.5 ]
  k(A) = ||A|| ||A⁻¹|| = 2 × 1 = 2
A great improvement in the condition number.
• Checking the answer is not enough when the condition number is high.
• Consider
  [ 34  55 ] [x]   [ 21 ]
  [ 55  89 ] [y] = [ 34 ]
For the proposed values x = -0.11 and y = 0.45:
  34(-0.11) + 55(0.45) - 21 = 0.01 ≈ 0
  55(-0.11) + 89(0.45) - 34 = 0.00 ≈ 0
The residuals are tiny, yet the exact solution is x = -1, y = 1.
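The row-sum-norm condition number can be sketched in Python for 2×2 matrices (the inverse comes from the ad - bc formula); it reproduces both the ill-conditioned example and the equilibrated one:

```python
# k(A) = ||A|| * ||A^-1|| with the row-sum (infinity) norm, 2x2 case.
def row_sum_norm(M):
    return max(sum(abs(v) for v in row) for row in M)

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c             # assumed nonzero
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[100.0, 200.0], [200.0, 401.0]]
k = row_sum_norm(A) * row_sum_norm(inv2(A))
print(k)                            # ~3612: ill-conditioned

B = [[1.0, 1.0], [1.0, -1.0]]       # the equilibrated matrix
print(row_sum_norm(B) * row_sum_norm(inv2(B)))  # 2.0
```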
Interpolation
Given the data (xk, yk), k = 1, 2, 3, ..., n, find a function f passing through each of the data samples:
  f(xk) = yk ,  1 ≤ k ≤ n
Once f is determined, we can use it to predict the value of y at points other than the samples.
[Figure: data points P1, P2, P3, P4 in the x-y plane.]

Piecewise Linear Interpolation
• A straight-line segment is used between each adjacent pair of data points:
  fk(x) = yk + (y_{k+1} - yk)/(x_{k+1} - xk) · (x - xk) ,  xk ≤ x ≤ x_{k+1} ,  1 ≤ k ≤ n-1
Polynomial Interpolation
For the data set (xk, yk), k = 1, ..., n, we find a polynomial of degree (n - 1) that has n constants:
  f(x) = a1 + a2 x + a3 x² + ... + an x^(n-1)
The n interpolation constraints f(xk) = yk give the linear system
  [ 1  x1  x1² ... x1^(n-1) ] [ a1 ]   [ y1 ]
  [ 1  x2  x2² ... x2^(n-1) ] [ a2 ] = [ y2 ]
  [ ...                     ] [ .. ]   [ .. ]
  [ 1  xn  xn² ... xn^(n-1) ] [ an ]   [ yn ]
• Not feasible for large data sets, since the condition number of this matrix increases rapidly with increasing n.
Lagrange Interpolating Polynomial
The Lagrange polynomials Qk(x) are constructed as follows:
  Q1(x) = (x - x2)(x - x3)...(x - xn) / [(x1 - x2)(x1 - x3)...(x1 - xn)]
  Q2(x) = (x - x1)(x - x3)...(x - xn) / [(x2 - x1)(x2 - x3)...(x2 - xn)]
  ...
  Qk(x) = (x - x1)...(x - x_{k-1})(x - x_{k+1})...(x - xn) / [(xk - x1)...(xk - x_{k-1})(xk - x_{k+1})...(xk - xn)]
• Each Qk(x) is a polynomial of degree (n - 1).
• Qk(xj) = 1 for j = k, and 0 for j ≠ k.
• The polynomial curve that passes through the data set (xk, yk), k = 1, 2, ..., n, is
  f(x) = y1 Q1(x) + y2 Q2(x) + ... + yn Qn(x)
• The polynomial is written down directly, without having to solve a system of equations.
Lagrange Interpolation (Pseudo Code)
Choose x, the point where the function value is required.
y = 0
Do for i = 1 to n
    product = yi
    Do for j = 1 to n
        if (i ≠ j)
            product = product · (x - xj)/(xi - xj)
        end if
    end Do
    y = y + product
End Do
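The pseudocode above translates directly to Python; the sample data below (taken from y = x², an assumed test case) shows the interpolant reproducing the underlying quadratic:

```python
# Evaluate the Lagrange interpolating polynomial at x.
def lagrange(xs, ys, x):
    n = len(xs)
    y = 0.0
    for i in range(n):
        product = ys[i]                          # start from y_i
        for j in range(n):
            if i != j:
                product *= (x - xs[j]) / (xs[i] - xs[j])
        y += product
    return y

# three samples of y = x^2; the degree-2 interpolant is x^2 itself
print(lagrange([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 1.5))  # 2.25
```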
Least Squares Fit
If the number of samples is large, or if the dependent variable contains measurement noise, it is often better to find a function which minimizes an error criterion such as
  E = Σ_{k=1}^{n} [f(xk) - yk]²
A function that minimizes E is called the least squares fit.
Straight Line Fit
• To fit a straight line through the n points (x1, y1), (x2, y2), ..., (xn, yn):
• Assume f(x) = a1 + a2 x.
• Error:
  E = Σ_{k=1}^{n} [f(xk) - yk]² = Σ_{k=1}^{n} (a1 + a2 xk - yk)²
• Find a1, a2 which minimize E:
  ∂E/∂a1 = 2 Σ (a1 + a2 xk - yk) = 0
  ∂E/∂a2 = 2 Σ (a1 + a2 xk - yk) xk = 0
• This gives the normal equations:
  [ n     Σ xk  ] [ a1 ]   [ Σ yk    ]
  [ Σ xk  Σ xk² ] [ a2 ] = [ Σ xk yk ]
Straight Line Fit (An example)
Fit a straight line through the five points
  (0, 2.10), (1, 2.85), (2, 1.10), (3, 3.20), (4, 3.90)
  a11 = n = 5
  a12 = Σ xk = 0 + 1 + 2 + 3 + 4 = 10
  a22 = Σ xk² = 0 + 1 + 4 + 9 + 16 = 30
  a21 = a12
  b1 = Σ yk = 2.10 + 2.85 + 1.10 + 3.20 + 3.90 = 13.15
  b2 = Σ xk yk = 2.85 + 2(1.10) + 3(3.20) + 4(3.90) = 30.25

  [  5  10 ] [ a1 ]   [ 13.15 ]
  [ 10  30 ] [ a2 ] = [ 30.25 ]

a1 = 1.84, a2 = 0.395, f(x) = 1.84 + 0.395x
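Solving the 2×2 normal equations directly gives a short Python sketch that reproduces a1 = 1.84, a2 = 0.395 for the example data:

```python
# Least-squares straight line fit via the normal equations.
def line_fit(xs, ys):
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of the 2x2 system
    a1 = (sy * sxx - sx * sxy) / det
    a2 = (n * sxy - sx * sy) / det
    return a1, a2

a1, a2 = line_fit([0, 1, 2, 3, 4], [2.10, 2.85, 1.10, 3.20, 3.90])
print(a1, a2)  # ~1.84, ~0.395
```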
Data Representation
Integers - Fixed Point Numbers

Decimal system - base 10, uses 0, 1, 2, ..., 9:
  (396)10 = (6 × 10⁰) + (9 × 10¹) + (3 × 10²)
Binary system - base 2, uses 0, 1:
  (11001)2 = (1 × 2⁰) + (0 × 2¹) + (0 × 2²) + (1 × 2³) + (1 × 2⁴) = (25)10

Decimal to Binary Conversion
Convert (39)10 to binary form (base = 2):
  39 ÷ 2 = 19, remainder 1
  19 ÷ 2 =  9, remainder 1
   9 ÷ 2 =  4, remainder 1
   4 ÷ 2 =  2, remainder 0
   2 ÷ 2 =  1, remainder 0
   1 ÷ 2 =  0, remainder 1
Read the remainders in reverse order:
  (100111)2 = (1 × 2⁰) + (1 × 2¹) + (1 × 2²) + (0 × 2³) + (0 × 2⁴) + (1 × 2⁵) = (39)10

Largest number that can be stored in m digits:
  base 10: (999...9) = 10^m - 1
  base 2:  (111...1) = 2^m - 1
  e.g. m = 3: (999)10 = 10³ - 1 and (111)2 = 2³ - 1 = (7)10

Limitation: memory cells consist of multiples of 8 bits (1 byte), each position containing 1 binary digit.
• Common cell lengths for integers: k = 16 bits, k = 32 bits.
Sign-Magnitude Notation
The first bit is used for the sign:
  0 - non-negative number, 1 - negative number.
The remaining bits are used to store the binary magnitude of the number.
Limit of a 16-bit cell: 2¹⁵ - 1 = (32767)10
Limit of a 32-bit cell: 2³¹ - 1 = (2 147 483 647)10
Two's Complement Notation
Definition: the two's complement of a negative integer I in a k-bit cell is 2^k + I.
(E.g.) The two's complement of (-3)10 in a 3-bit cell = (2³ - 3)10 = (5)10 = (101)2,
so (-3)10 will be stored as 101.

Storage scheme for storing an integer I in a k-bit cell in two's complement notation:
  stored value C = I ,        I ≥ 0 (first bit = 0)
                 = 2^k + I ,  I < 0

The two's complement notation admits one more negative number than the sign-magnitude notation.
(E.g.) Take a 2-bit cell (k = 2).
Range in sign-magnitude notation: -(2¹ - 1) to 2¹ - 1, i.e. -1 to 1:
  -1 = 11, 1 = 01
Range in two's complement notation: -2 to 1:
  two's complement of -1 = 2² - 1 = (3)10 = (11)2
  two's complement of -2 = 2² - 2 = (2)10 = (10)2
  two's complement of -3 = 2² - 3 = (1)10 - not possible (it collides with +1)
Floating Point Numbers
Integer part + fractional part:
  decimal system - base 10: 235.7846
  binary system - base 2:   10011.11101

Fractional part (0.7846)10 = 7/10 + 8/10² + 4/10³ + 6/10⁴
Fractional part (0.11101)2 = 1/2 + 1/2² + 1/2³ + 0/2⁴ + 1/2⁵

Binary Fraction → Decimal Fraction
(10.11)2:
  integer part (10)2 = 0·2⁰ + 1·2¹ = 2
  fractional part (0.11)2 = 1/2 + 1/2² = 0.5 + 0.25 = 0.75
  decimal value = (2.75)10
Decimal Fraction → Binary Fraction
Convert (0.9)10 to a binary fraction:
  0.9 × 2 = 1.8 → integer part 1
  0.8 × 2 = 1.6 → integer part 1
  0.6 × 2 = 1.2 → integer part 1
  0.2 × 2 = 0.4 → integer part 0
  0.4 × 2 = 0.8 → integer part 0
  0.8 × 2 = ...   repetition
(0.9)10 = (0.11100 1100 1100 ...)2 , with the pattern 1100 repeating forever.
Scientific Notation (Decimal)
  0.0000747 = 7.47 × 10⁻⁵
  31.4159265 = 3.14159265 × 10¹
  9,700,000,000 = 9.7 × 10⁹
Binary
  (10.01)2 = (1.001)2 × 2¹
  (0.110)2 = (1.10)2 × 2⁻¹

The computer stores a binary approximation to x:
  x ≈ ±q × 2ⁿ ,  q = mantissa , n = exponent

(-39.9)10 = (-100111.1110011001100...)2 = (-1.001111110011001100...)2 × (2⁵)10

32 bits: first bit for the sign, next 8 bits for the exponent, 23 bits for the mantissa.
Decimal value of the stored number, with the mantissa cut to 23 bits:
  (-1.00111111001100110011001)2 × 2⁵ = -39.90000152587890625
Round-off Errors can be reduced by Efficient Programming Practice
# The number of operations (multiplications and additions) must be kept to a minimum (complexity theory).

An Example of Efficient Programming
Problem: evaluate the value of the polynomial
  P(x) = a4x⁴ + a3x³ + a2x² + a1x + a0
at a given x.
Evaluating each term separately requires 10 multiplications and 4 additions. A bad program!
An Efficient Method (Horner's Method)
  P(x) = a4x⁴ + a3x³ + a2x² + a1x + a0
       = a0 + x(a1 + x(a2 + x(a3 + x a4)))
Requires 4 multiplications and 4 additions.

Pseudo code for an nth degree polynomial:
  Input a's, n, x
  p = 0
  For i = n, n-1, ..., 0
      p = x p + a(i)
  End
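Horner's rule is a one-loop Python function; the test polynomial P(x) = 2x² + 3x + 1 is an assumed example:

```python
# Horner's rule: n multiplications and n additions for degree n.
def horner(coeffs, x):
    """coeffs = [a_n, a_{n-1}, ..., a_0], highest degree first."""
    p = 0.0
    for a in coeffs:
        p = x * p + a
    return p

# P(x) = 2x^2 + 3x + 1 at x = 2: 2*4 + 6 + 1 = 15
print(horner([2, 3, 1], 2.0))  # 15.0
```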
Numerical Integration
To evaluate ∫_a^b f(x) dx.
• When do we need numerical integration?
  • The antiderivative of f(x) is difficult to find.
  • f(x) is available only in tabular form.

Trapezoidal Rule (n subintervals of width h = (b - a)/n):
  ∫_a^b f(x) dx ≈ (h/2) [ f(a) + 2{f(a+h) + f(a+2h) + ... + f(a+(n-1)h)} + f(b) ]

Simpson's Rule (n even):
  ∫_a^b f(x) dx ≈ (h/3) [ f(a) + 4{f(a+h) + f(a+3h) + ...} + 2{f(a+2h) + f(a+4h) + ...} + f(b) ]
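Both composite rules can be sketched in Python; the integrand x² on [0, 1] is an assumed test case whose exact integral is 1/3 (Simpson's rule is exact for polynomials up to degree 3):

```python
# Composite trapezoidal and Simpson rules on n subintervals.
def trapezoid(f, a, b, n):
    h = (b - a) / n
    s = f(a) + f(b) + 2 * sum(f(a + i * h) for i in range(1, n))
    return h * s / 2

def simpson(f, a, b, n):
    assert n % 2 == 0, "Simpson's rule needs an even n"
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))   # odd points
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))   # even points
    return h * s / 3

print(trapezoid(lambda x: x * x, 0.0, 1.0, 100))  # ~0.33335
print(simpson(lambda x: x * x, 0.0, 1.0, 100))    # ~0.333333...
```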
Euler's Method (An Example)
  y' = f(x, y) = xy ,  y(0) = 1
Exact solution: y = e^(x²/2)

Euler's Method (Pseudo code)
  Input: n (number of steps)
  h = 1/n (step size)
  x = 0, y = 1 (initial condition)
  For i = 1 to n (solution loop)
      y = y + h·x·y
      x = x + h (update the x value)
  end
  print x, y
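The pseudocode runs directly as Python; at x = 1 the exact value is e^(1/2) ≈ 1.6487, and the Euler answer approaches it as n grows:

```python
# Euler's method for y' = x*y, y(0) = 1, integrated to x = 1.
import math

def euler(n):
    h = 1.0 / n            # step size
    x, y = 0.0, 1.0        # initial condition
    for _ in range(n):     # solution loop
        y = y + h * x * y
        x = x + h          # update the x value
    return y

print(euler(1000), math.exp(0.5))
```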
A Second Order Method: Taylor Series Method
  y' = f(x, y) ,  y(x0) = y0
At (x0, y0): y'(x0) = f(x0, y0).
At (x0, y0), y'' can also be calculated:
  y'' = df/dx = ∂f/∂x + (∂f/∂y)(dy/dx) , evaluated at (x0, y0)
Taylor's Method
  y(x0 + h) = y(x0) + h y'(x0) + (h²/2) y''(x0)
Calculate y(x0 + h) from this and proceed to x0 + 2h, ...
Taylor Series Method: An Example
  y' = xy ,  y(0) = 1
  y'(0) = 0
  y'' = ∂f/∂x + (∂f/∂y)(dy/dx) = y + x y'
  y''(0) = 1
Taylor Series solution at x = 0.1 (h = 0.1):
  y(0.1) = y(0) + h y'(0) + (h²/2) y''(0)
         = 1 + 0 + (0.01/2)(1) = 1.005
Exact solution: y = e^((0.1)²/2) = e^0.005 = 1.005
Finite Difference Method for Boundary Value Problems
Central difference approximations at a grid point xi:
  y'(xi) ≈ [ y(xi + h) - y(xi - h) ] / (2h)
  y''(xi) ≈ [ y(xi + h) - 2 y(xi) + y(xi - h) ] / h²
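As a sketch of how the y'' formula turns a boundary value problem into a linear system (the specific BVP y'' = 2, y(0) = 0, y(1) = 1 is an assumed example; its exact solution is y = x², for which the central difference is exact):

```python
# Finite differences for y'' = 2, y(0) = 0, y(1) = 1 on n subintervals.
# Interior equations: (y[i-1] - 2*y[i] + y[i+1]) / h^2 = 2.
def solve_bvp(n):
    h = 1.0 / n
    ya, yb = 0.0, 1.0                   # boundary values
    # tridiagonal system: -2 on the diagonal, 1 off-diagonal
    a = [1.0] * (n - 1)                 # sub-diagonal
    d = [-2.0] * (n - 1)                # diagonal
    c = [1.0] * (n - 1)                 # super-diagonal
    r = [2.0 * h * h] * (n - 1)         # right-hand side
    r[0] -= ya                          # fold in the boundary values
    r[-1] -= yb
    for i in range(1, n - 1):           # Thomas algorithm: forward sweep
        m = a[i] / d[i - 1]
        d[i] -= m * c[i - 1]
        r[i] -= m * r[i - 1]
    y = [0.0] * (n - 1)                 # back substitution
    y[-1] = r[-1] / d[-1]
    for i in range(n - 3, -1, -1):
        y[i] = (r[i] - c[i] * y[i + 1]) / d[i]
    return y                            # interior values y(h), ..., y(1-h)

y = solve_bvp(10)
print(y[4])  # y at x = 0.5; the exact value is 0.25
```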