Floating Point Numbers
It's all just 1s and 0s
Computers are fundamentally driven by logic and thus by bits of data
Manipulation of bits can be done incredibly quickly
Given n bits of information, there are 2^n possible combinations
These 2^n representations can encode pretty much anything you want: letters, numbers, instructions…
Bases of number systems
Base 10 numbers: 0,1,2,3,4,5,6,7,8,9
3107 = 3×10^3 + 1×10^2 + 0×10^1 + 7×10^0
Base 2 numbers: 0,1
Powers of 2: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048
3107 = 1×2^11 + 1×2^10 + 0×2^9 + 0×2^8 + 0×2^7 + 0×2^6 + 1×2^5
     + 0×2^4 + 0×2^3 + 0×2^2 + 1×2^1 + 1×2^0
     = 110000100011 (base 2)
Addition, multiplication, etc. all proceed the same way
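The positional expansion of 3107 can be checked mechanically. The following is a minimal Python sketch; the helper names `to_digits` and `from_digits` are made up for illustration and are not part of the original slides.

```python
def to_digits(value, base):
    """Return the digits of a non-negative integer, most significant first."""
    if value == 0:
        return [0]
    digits = []
    while value > 0:
        value, remainder = divmod(value, base)
        digits.append(remainder)
    return digits[::-1]


def from_digits(digits, base):
    """Rebuild the integer with Horner's rule (same as summing digit * base**position)."""
    total = 0
    for digit in digits:
        total = total * base + digit
    return total


binary_digits = to_digits(3107, 2)
decimal_digits = to_digits(3107, 10)
print(binary_digits)                  # [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1]
print(decimal_digits)                 # [3, 1, 0, 7]
print(from_digits(binary_digits, 2))  # 3107
```

The same two functions work for any base of 2 or more, which is the sense in which arithmetic "proceeds the same way" regardless of base.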
Base Notation
What does 10 mean?
10 in binary = 2 decimal
10 in octal (base 8) = 8 decimal
10 in decimal = 10 decimal
Need some method of differentiating between these possibilities
To avoid confusion, where necessary we write the base as a subscript: 10_10 = 10 decimal, 10_2 = 2 decimal
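Programming languages face the same ambiguity and resolve it with an explicit base. As a small Python illustration (not from the original slides), the built-in `int` takes the base as a second argument, and source-code literals carry a prefix in place of a subscript:

```python
# The same two characters "10" read in three different bases.
as_binary = int("10", 2)
as_octal = int("10", 8)
as_decimal = int("10", 10)
print(as_binary, as_octal, as_decimal)  # 2 8 10

# In source code the base is marked with a prefix rather than a subscript.
print(0b10, 0o10, 10)  # 2 8 10
```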
Integer Representation
Integers fit naturally into this base 2 notation
The remaining challenge is to represent negative numbers
Two common schemes: 2s complement, Excess-N
An extra choice is the order in which the bits (in practice, bytes) are stored
The choice is made chip by chip, which affects portability
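To make the two schemes for negative numbers concrete, here is a rough Python sketch. The function names `twos_complement` and `excess_n`, the 8-bit width, and the bias of 127 are illustrative assumptions, not taken from the slides.

```python
def twos_complement(value, bits):
    """Bit pattern of a signed integer in two's complement, as a string."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the requested width")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def excess_n(value, bits, bias):
    """Bit pattern of a signed integer stored as value + bias (Excess-N)."""
    stored = value + bias
    if not 0 <= stored < (1 << bits):
        raise ValueError("value does not fit in the requested width")
    return format(stored, f"0{bits}b")


print(twos_complement(5, 8))    # 00000101
print(twos_complement(-5, 8))   # 11111011
print(excess_n(-5, 8, 127))     # 01111010

# Storage order: the same 16-bit value (3107) laid out big-endian and little-endian.
print((0x0C23).to_bytes(2, "big").hex())     # 0c23
print((0x0C23).to_bytes(2, "little").hex())  # 230c
```

The last two lines show why storage order matters for portability: the same number produces different byte sequences depending on the convention.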
Floating Point Representation
Computers represent floating point numbers in binary form
For generality, they use a binary form of scientific notation
29.25 = 0.2925 × 10^2
In binary, we can use powers of 2
29.25 = 11101.01 (base 2) = 0.1110101 (base 2) × 2^5
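The binary "scientific notation" of 29.25 can be inspected directly. This short Python sketch uses the standard `math.frexp`, which returns a mantissa in [0.5, 1) and a power-of-two exponent; the loop that prints the mantissa bits is only an illustration.

```python
import math

mantissa, exponent = math.frexp(29.25)   # 29.25 == mantissa * 2**exponent
print(mantissa, exponent)                # 0.9140625 5

# Write the mantissa out bit by bit to see the binary fraction.
bits = []
remainder = mantissa
for _ in range(7):
    remainder *= 2
    bit = int(remainder)
    bits.append(str(bit))
    remainder -= bit
binary_fraction = "0." + "".join(bits)
print(binary_fraction)                   # 0.1110101
```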
Floating Point Size
In IEEE.h (sizes in bytes):
#define IEEE_FLOAT_SIZE 4
#define IEEE_DOUBLE_SIZE 8
#define IEEE_QUAD_SIZE 16
Distribution
Precision   # bits   Mantissa bits   Exponent bits   Sign bit
Single      32       23              8               1
Double      64       52              11              1
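The 1 + 11 + 52 split of a double can be seen by pulling the fields apart. The sketch below assumes that a Python `float` is an IEEE 754 double (true on mainstream platforms); `decode_double` is a hypothetical helper written for this illustration.

```python
import struct


def decode_double(x):
    """Split a Python float (assumed IEEE 754 double) into its three bit fields."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = raw >> 63
    exponent = (raw >> 52) & 0x7FF        # 11 bits, stored with a bias of 1023
    mantissa = raw & ((1 << 52) - 1)      # 52 explicitly stored fraction bits
    return sign, exponent, mantissa


sign, exponent, mantissa = decode_double(29.25)
print(sign, exponent - 1023, format(mantissa, "052b")[:8])  # 0 4 11010100
```

Here 29.25 is stored as 1.110101 (base 2) × 2^4: the leading 1 is implicit, so only the fraction bits 110101 followed by zeros appear in the mantissa field.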
In Decimal Terms
Each binary floating point double holds roughly 16 decimal digits; technically, the relative spacing between adjacent doubles is 2^(-52), about 2.2e-16
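As a quick check of the 2^(-52) figure, the following Python sketch queries the standard `sys.float_info`. MATLAB's `eps` reports the same quantity.

```python
import sys

eps = sys.float_info.epsilon
print(eps == 2.0 ** -52)        # True
print(1.0 + eps > 1.0)          # True: the next double after 1.0
print(1.0 + eps / 2 == 1.0)     # True: too small to register
print(sys.float_info.dig)       # 15 decimal digits always survive a round trip
```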
MATLAB example
Advantages
Scientific notation can work on any scale (all handled by the exponent)
So long as errors are small relative to the scale of the data values, calculations are accurate, right?
Example 1
1e12 + 0.2 - 1e12
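Evaluated in double precision, this expression does not return 0.2. A minimal Python sketch, assuming IEEE 754 doubles:

```python
result = 1e12 + 0.2 - 1e12
print(result)                 # 0.199951171875
print(result == 0.2)          # False

# Near 1e12 adjacent doubles are 2**-13 apart, so 0.2 is rounded to a multiple of that.
spacing = 2.0 ** -13
print(result / spacing)       # 1638.0
```

The error is tiny relative to 1e12 but about 0.02% relative to the 0.2 that was supposed to survive.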
Problem
Nice decimal numbers (such as 0.2) can have non-terminating binary representations
Just as 1/3 = 0.3333333… in decimal, 0.2 in binary is
0.0011 0011 0011 0011…
Analogy with adding and then subtracting a large number
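The stored value of 0.2 can be examined exactly. This Python sketch uses the standard `decimal` and `fractions` modules to show that the double nearest to 0.2 is not 1/5:

```python
from decimal import Decimal
from fractions import Fraction

stored = 0.2
print(Decimal(stored))                      # exact decimal value of the stored double
print(stored.hex())                         # 0x1.999999999999ap-3
print(Fraction(stored) == Fraction(1, 5))   # False
print(0.1 + 0.2 == 0.3)                     # False
```

The repeating hex digit 9 (binary 1001) is the 0011 pattern shifted by the exponent, cut off and rounded at the 52nd fraction bit.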
Roundoff Error
Round-off error will always be present
Roundoff error is more significant when you are subtracting two almost equal quantities
e.g. in decimal, 255.67 - 255.69 = -0.02: five significant digits go in, only one comes out
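The loss of significant digits can be measured. In this Python sketch the inputs are each accurate to about 16 digits, yet the relative error of the difference is thousands of times larger than the 2^(-52) spacing:

```python
a, b = 255.67, 255.69
difference = a - b

# Compare against the intended result, -0.02.
relative_error = abs(difference - (-0.02)) / 0.02
print(difference)
print(relative_error)
print(relative_error / 2.0 ** -52)   # how many times larger than machine precision
```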
Example 2
A = 112000000
B = 100000
C = 0.0009
X = A - B / C
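One way to see why this expression is delicate is to perturb C slightly and watch X. The Python sketch below is an illustration under an assumed relative error of 1e-6 in C; the slides do not specify any particular perturbation.

```python
A = 112000000.0
B = 100000.0
C = 0.0009

X = A - B / C

delta = 1e-6                               # hypothetical relative error in C
X_perturbed = A - B / (C * (1 + delta))

relative_change = abs(X_perturbed - X) / abs(X)
amplification = relative_change / delta    # how much the error in C is magnified in X
print(X)
print(amplification)
```

Dividing by the small C produces a quotient close to A, so the subtraction cancels most of the leading digits and the relative error in C is magnified by roughly a factor of 125.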
Common occurrence
Delta x in finite element methods and numerical differentiation
Places where more closely packed data gives
Example 3: Numerical Diff.
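The original example for this slide is not preserved in the text. As a stand-in, here is a minimal Python sketch of a forward difference applied to sin(x) at x = 1. The test function, evaluation point, and step sizes are assumptions chosen for illustration.

```python
import math


def forward_difference(f, x, h):
    """Approximate f'(x) by (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


x = 1.0
exact = math.cos(x)
errors = {}
for exponent in range(1, 16):
    h = 10.0 ** -exponent
    errors[h] = abs(forward_difference(math.sin, x, h) - exact)
    print(f"h = {h:.0e}   error = {errors[h]:.3e}")

best_h = min(errors, key=errors.get)
print("smallest error at h =", best_h)
```

Shrinking h reduces the truncation error at first, but once h is small enough the subtraction f(x + h) - f(x) cancels nearly all significant digits and the total error grows again.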
Example 4: Recursion
Comparing the sum of delta x with the real sum
t = 0;
N = 10000;
dx = 1/N;
for I = 1:N
    t = t + dx;   % each addition rounds, so the error accumulates
end
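The same loop can be reproduced in Python to measure how far the accumulated total drifts from 1. The comparison against `math.fsum`, which tracks the lost low-order bits, is an addition for illustration and is not part of the original example.

```python
import math

N = 10000
dx = 1 / N

# Naive accumulation: one rounding per addition.
t = 0.0
for _ in range(N):
    t += dx

# Correctly rounded sum of the same N terms.
exact = math.fsum([dx] * N)

print(t)
print(t - 1.0)
print(exact)
```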
Avoiding (Large) Roundoff Error
Avoid subtracting almost-equal quantities
Avoid dividing by small quantities
Avoid sums over large loops, especially with different orders of magnitude in the sum
Avoid recursive calculations, where errors will accumulate
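As one illustration of the first rule, a subtraction of nearly equal square roots can often be rewritten algebraically so that no cancellation occurs. The expression sqrt(x + 1) - sqrt(x) and the value x = 1e12 are examples chosen here, not taken from the slides. The 50-digit `decimal` computation serves only as a reference value.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50
x = 1e12

naive = math.sqrt(x + 1) - math.sqrt(x)           # subtracts two nearly equal numbers
stable = 1 / (math.sqrt(x + 1) + math.sqrt(x))    # same quantity, no subtraction

reference = Decimal(x + 1).sqrt() - Decimal(x).sqrt()   # 50-digit reference value


def relative_error(value):
    """Relative error of a float against the high-precision reference."""
    return abs((Decimal(value) - reference) / reference)


print(naive, float(relative_error(naive)))
print(stable, float(relative_error(stable)))
```

Both lines compute the same mathematical quantity; multiplying and dividing by the conjugate sqrt(x + 1) + sqrt(x) turns the difference into a sum, so the rewritten form keeps nearly full double precision.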