
Floating Point Numbers

It's all just 1s and 0s

Computers are fundamentally driven by logic, and thus by bits of data. Manipulation of bits can be done incredibly quickly.

Given n bits of information, there are $2^n$ possible combinations. These $2^n$ representations can encode pretty much anything you want: letters, numbers, instructions, …
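To make the $2^n$ count concrete, here is a small Python sketch (Python is our choice for these illustrations; the slides themselves use MATLAB). The helper `bit_patterns` is a hypothetical name, not part of any library; it simply lists every n-bit pattern.

```python
from itertools import product


def bit_patterns(n):
    """Return every n-bit pattern as a string of '0' and '1' characters."""
    return ["".join(bits) for bits in product("01", repeat=n)]


patterns = bit_patterns(3)
print(patterns)       # ['000', '001', '010', '011', '100', '101', '110', '111']
print(len(patterns))  # 8, i.e. 2**3
```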

Bases of number systems

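Base 10 numbers: digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

$3107 = 3\times10^3 + 1\times10^2 + 0\times10^1 + 7\times10^0$

Base 2 numbers: digits 0, 1 (place values 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048)

$3107 = 1\times2^{11} + 1\times2^{10} + 0\times2^{9} + 0\times2^{8} + 0\times2^{7} + 0\times2^{6} + 1\times2^{5} + 0\times2^{4} + 0\times2^{3} + 0\times2^{2} + 1\times2^{1} + 1\times2^{0} = 110000100011_2$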
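The expansion above can be run in reverse: repeatedly dividing by the base and collecting the remainders yields the digits, least significant first. A minimal sketch, assuming non-negative integers and bases up to 10; `to_base` is a hypothetical helper (Python's built-in `bin` gives the same base 2 result).

```python
def to_base(value, base):
    """Convert a non-negative integer to a digit string in `base` (2..10)
    by repeated division: each remainder is the next digit, least
    significant first."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, base)
        digits.append(str(remainder))
    return "".join(reversed(digits))


print(to_base(3107, 2))   # 110000100011
print(to_base(3107, 10))  # 3107

# Check the positional expansion: sum of digit * 2**position.
bits = to_base(3107, 2)
print(sum(int(b) * 2**k for k, b in enumerate(reversed(bits))))  # 3107
```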

Addition, multiplication, etc. all proceed the same way in any base.

Base Notation

What does 10 mean?
10 in binary = 2 decimal
10 in octal (base 8) = 8 decimal
10 in decimal = 10 decimal

We need some method of differentiating between these possibilities. To avoid confusion, where necessary we write the base as a subscript: $10_{10} = 10$, $10_2 = 2$, $10_8 = 8$.
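In code, the base is usually supplied separately from the digit string. As an illustration in Python, `int(s, base)` interprets the same string "10" differently depending on `base`, and the literal prefixes `0b` and `0o` play the role of the subscript.

```python
# The same digit string "10" read in three different bases.
values = {base: int("10", base) for base in (2, 8, 10)}
for base, value in values.items():
    print(f"10 in base {base} = {value} decimal")

# Literal prefixes play the role of the subscript.
print(0b10, 0o10, 10)  # 2 8 10
```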

Integer Representation

Integers fit naturally into this base 2 notation. The remaining challenge is representing negative numbers, for which common schemes are:
two's complement
excess-N
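The two schemes for negative integers can be sketched as follows, assuming an 8-bit width and, for excess-N, a bias of N = 127 (chosen for illustration; it is also the bias IEEE single precision uses for its exponent field). `twos_complement` and `excess_n` are hypothetical helper names.

```python
def twos_complement(value, bits):
    """Encode a signed integer in `bits`-bit two's complement.
    Negative values wrap around: the stored pattern is value mod 2**bits."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value out of range")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def excess_n(value, bits, bias):
    """Encode a signed integer in excess-N (biased) form: store value + N."""
    stored = value + bias
    if not 0 <= stored < (1 << bits):
        raise ValueError("value out of range")
    return format(stored, f"0{bits}b")


print(twos_complement(5, 8))    # 00000101
print(twos_complement(-5, 8))   # 11111011
print(excess_n(-5, 8, 127))     # 01111010  (-5 + 127 = 122)
```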

An extra choice is the order of the bits (in practice, the order of the bytes within a word). This choice is made chip by chip, which limits portability.
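In practice the ordering choice that varies between chips is the order of bytes within a multi-byte word (endianness). A short Python sketch using the standard `struct` module, whose `<` and `>` prefixes select little- and big-endian layouts, shows the same 32-bit integer stored both ways.

```python
import struct
import sys

value = 0x01020304
little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first
print(little.hex())   # 04030201
print(big.hex())      # 01020304
print(sys.byteorder)  # byte order of the machine running this script
```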

Floating Point Representation

Computers represent floating point numbers in binary form.

For generality, they use a binary form of scientific notation, as in decimal:

$329.25 = 0.32925 \times 10^3$

In binary, we can use powers of 2:

$29.25 = 11101.01_2 = 0.1110101_2 \times 2^5$
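As a check on the binary scientific notation, Python's `math.frexp` splits a float into a mantissa in [0.5, 1) and a power-of-two exponent, matching the 0.1… × 2^k convention used here. The digit-extraction helper `binary_fraction_digits` is hypothetical. Note that `float.hex` prints the IEEE convention instead, with a leading 1 and the exponent reduced by one.

```python
import math


def binary_fraction_digits(m, count):
    """Return the first `count` binary digits after the point of 0 <= m < 1."""
    digits = []
    for _ in range(count):
        m *= 2
        bit = int(m)          # the integer part is the next binary digit
        digits.append(str(bit))
        m -= bit
    return "".join(digits)


mantissa, exponent = math.frexp(29.25)   # 29.25 == mantissa * 2**exponent
print(mantissa, exponent)                # 0.9140625 5
print("0." + binary_fraction_digits(mantissa, 7), "x 2 **", exponent)  # 0.1110101 x 2 ** 5
print(float.hex(29.25))                  # 0x1.d400000000000p+4 (leading 1, exponent 4)
```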

Floating Point Size

In IEEE.h (sizes in bytes):

#define IEEE_FLOAT_SIZE 4
#define IEEE_DOUBLE_SIZE 8
#define IEEE_QUAD_SIZE 16
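The header IEEE.h is not shown in full here, so as an independent check, Python's `struct` module reports the same standard sizes for single and double precision (format codes `f` and `d`), plus 2 bytes for half precision (`e`). It has no format code for 16-byte quad precision.

```python
import struct

# The '<' prefix selects standard sizes, independent of the platform's C compiler.
formats = (("half", "e"), ("single", "f"), ("double", "d"))
sizes = {name: struct.calcsize("<" + code) for name, code in formats}
print(sizes)  # {'half': 2, 'single': 4, 'double': 8}
```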

Distribution

Precision   Total bits   Mantissa bits   Exponent bits   Sign bit
Single      32           23              8               1
Double      64           52              11              1
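The bit layout in the table can be inspected directly. The sketch below, assuming Python's `struct` module and a hypothetical helper `double_fields`, reinterprets the 8 bytes of a double as an integer and masks out the 1 sign bit, 11 exponent bits and 52 mantissa bits. The exponent is stored in excess-1023 form, and for normal numbers the leading 1 of the mantissa is implicit rather than stored.

```python
import struct


def double_fields(x):
    """Split an IEEE 754 double into its sign, exponent and mantissa bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # the 8 bytes as one integer
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits, stored in excess-1023 form
    mantissa = bits & ((1 << 52) - 1)     # 52 bits; the leading 1 is not stored
    return sign, exponent, mantissa


s, e, m = double_fields(-29.25)           # -29.25 = -1.110101_2 x 2**4
print(s, e - 1023, format(m, "052b")[:8])  # 1 4 11010100
```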

In Decimal Terms

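Each binary floating point double holds roughly 16 significant decimal digits. Technically, the relative gap between 1 and the next larger double (machine epsilon) is $2^{-52} \approx 2.2 \times 10^{-16}$.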
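The $2^{-52}$ figure is exposed in Python as `sys.float_info.epsilon`. A quick sketch of what it means in practice: adding epsilon to 1.0 produces the next double, while adding half of it rounds straight back to 1.0 under the default round-to-nearest-even mode.

```python
import sys

eps = sys.float_info.epsilon
print(eps == 2.0 ** -52)      # True
print(sys.float_info.dig)     # 15: decimal digits that survive a decimal -> double -> decimal round trip
print(1.0 + eps > 1.0)        # True: eps is the gap from 1.0 to the next double
print(1.0 + eps / 2 == 1.0)   # True: an exact tie, rounded to the even neighbour 1.0
```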

MATLAB example

Advantages

Scientific notation can work on any scale (all handled by the exponent). So long as errors are small relative to the scale of the data values, calculations are accurate, right?
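The scale independence can be seen by looking at the gap between neighbouring doubles. In the sketch below, `math.ulp` (available from Python 3.9) gives that gap: its absolute size varies over hundreds of orders of magnitude, but relative to x it stays between $2^{-53}$ and $2^{-52}$ for normal numbers.

```python
import math

scales = (1e-10, 1.0, 1e10, 1e300)
for x in scales:
    gap = math.ulp(x)   # distance from x to the next larger double
    print(f"x = {x:g}  gap = {gap:.3e}  gap / x = {gap / x:.3e}")
```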

Example 1

1e12 + 0.2 - 1e12
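Evaluated in IEEE double precision (shown here in Python, which uses the same format as MATLAB's default `double` type), the expression does not return 0.2. Near 1e12 adjacent doubles are $2^{-13}$ apart, so 1e12 + 0.2 is rounded to the nearest multiple of $2^{-13}$ before 1e12 is subtracted again.

```python
import math

result = 1e12 + 0.2 - 1e12   # evaluated left to right: (1e12 + 0.2) - 1e12
print(result)                # 0.199951171875, not 0.2
print(math.ulp(1e12))        # 0.0001220703125, i.e. 2**-13
```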

Problem

Nice decimal numbers such as 0.2 have non-terminating binary representations. Just as 1/3 = 0.3333… in decimal, 0.2 in binary is

0.0011 0011 0011 0011…
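The repeating pattern can be generated exactly with rational arithmetic: repeatedly doubling the fraction and taking the integer part yields the binary digits. A sketch using Python's `fractions.Fraction`; `binary_digits` is a hypothetical helper. Converting the double 0.2 to a `Fraction` exposes that what is stored is not exactly 1/5.

```python
from fractions import Fraction


def binary_digits(frac, count):
    """First `count` binary digits after the point of a fraction 0 <= frac < 1,
    computed exactly with rational arithmetic."""
    digits = []
    for _ in range(count):
        frac *= 2
        if frac >= 1:
            digits.append("1")
            frac -= 1
        else:
            digits.append("0")
    return "".join(digits)


print(binary_digits(Fraction(1, 5), 16))  # 0011001100110011
print(Fraction(0.2) == Fraction(1, 5))    # False: the stored double is only an approximation
print(Fraction(0.2) - Fraction(1, 5))     # the tiny, nonzero representation error
```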

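Analogy: with a fixed number of significant digits, adding a small number to a large one and then subtracting the large number again loses the small number's trailing digits.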

Roundoff Error

Round-off error will always be present. It is most significant when you subtract two almost equal quantities, e.g. in decimal, 255.67 - 255.69 = -0.02: the matching leading digits cancel, leaving only the least reliable ones.
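The effect can be measured with exact arithmetic. In the sketch below (Python, with `fractions.Fraction` as the exact reference; `rel_error` is a hypothetical helper), both operands carry a relative error of at most $2^{-53}$ from being stored as doubles, and the subtraction itself is exact, yet the relative error of the difference is orders of magnitude larger, because the leading digits that agreed have cancelled.

```python
from fractions import Fraction


def rel_error(computed, exact):
    """Relative error of a float with respect to an exact rational value."""
    return abs(Fraction(computed) - exact) / abs(exact)


a_exact, b_exact = Fraction("255.67"), Fraction("255.69")
a, b = 255.67, 255.69        # each is rounded to the nearest double when stored

diff = a - b                 # exact here: a and b are within a factor of 2 (Sterbenz lemma)
print(float(rel_error(a, a_exact)), float(rel_error(b, b_exact)))  # both below 2**-53
print(float(rel_error(diff, a_exact - b_exact)))                   # far larger
```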

Example 2

A = 112000000
B = 100000
C = 0.0009
X = A - B / C
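A sketch of Example 2 in Python, comparing the double result against an exact rational reference built from the decimal inputs. The point to notice is the ratio (B/C)/X, about 125: any relative error in B/C is magnified by roughly that factor in X, because X is the small difference of two much larger numbers.

```python
from fractions import Fraction

A, B, C = 112000000, 100000, 0.0009
X = A - B / C                                   # computed in double precision

# Exact reference: rational arithmetic on the decimal inputs (equals 8000000/9).
X_exact = Fraction(A) - Fraction(B) / Fraction("0.0009")

print(X, float(X_exact))
print(float(abs(Fraction(X) - X_exact) / X_exact))  # relative error of the double result
print((B / C) / X)   # about 125: the terms are ~125 times larger than their difference
```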

Common occurrence

Delta x in finite element methods and numerical differentiation

Places where more closely packed data gives

Example 3: Numerical Diff.
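The slide's own example is not reproduced in this text, so as a hypothetical stand-in, the sketch below approximates the derivative of sin at x = 1 with a forward difference. Shrinking h first reduces the truncation error, but below roughly h = 1e-8 the subtraction f(x + h) - f(x) cancels most digits and the roundoff error grows again.

```python
import math


def forward_difference(f, x, h):
    """Approximate f'(x) by (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


exact = math.cos(1.0)   # d/dx sin(x) at x = 1
errors = {}
for k in range(1, 16, 2):
    h = 10.0 ** -k
    errors[k] = abs(forward_difference(math.sin, 1.0, h) - exact)
    print(f"h = 1e-{k:02d}  error = {errors[k]:.2e}")
```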

Example 4: Recursion

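Comparing the sum of N increments of delta x with the exact sum, 1 (MATLAB):

t = 0; N = 10000; dx = 1/N;
for i = 1:N
    t = t + dx;   % each addition rounds, so the errors accumulate
end               % in exact arithmetic t - 1 would be 0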
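The same experiment in Python, with one addition for comparison: `math.fsum`, the standard library's accurately rounded summation. The exact reference is N times the stored value of dx, computed with `fractions.Fraction`, so the printed errors measure only the accumulation in the loop, not the representation error of dx itself.

```python
import math
from fractions import Fraction

N = 10000
dx = 1 / N

t = 0.0
for _ in range(N):
    t += dx                       # naive running sum: one rounding per addition

accurate = math.fsum([dx] * N)    # accurately rounded sum of the same N terms

exact = N * Fraction(dx)          # exact sum of N copies of the stored dx
print(t - 1, accurate - 1)
print(float(abs(Fraction(t) - exact)), float(abs(Fraction(accurate) - exact)))
```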

Avoiding (Large) Roundoff Error

Avoid subtracting almost-equal quantities.
Avoid dividing by small quantities.
Avoid sums over large loops, especially with different orders of magnitude in the sum.
Avoid recursive calculations, where errors will accumulate.
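One common remedy for the first item, not taken from the slides, is to rewrite an expression algebraically so that the subtraction disappears. For the illustrative expression sqrt(x + 1) - sqrt(x) with x = 1e12 (our choice), multiplying by the conjugate gives 1 / (sqrt(x + 1) + sqrt(x)). The sketch compares both forms against a 40-digit `decimal` reference.

```python
import math
from decimal import Decimal, localcontext

x = 1e12

naive = math.sqrt(x + 1) - math.sqrt(x)          # subtracts two nearly equal numbers
stable = 1 / (math.sqrt(x + 1) + math.sqrt(x))   # same quantity, no subtraction

with localcontext() as ctx:
    ctx.prec = 40                                # 40 significant digits for the reference
    ref = Decimal(10**12 + 1).sqrt() - Decimal(10**12).sqrt()

naive_err = abs(Decimal(naive) - ref) / ref
stable_err = abs(Decimal(stable) - ref) / ref
print(naive, stable)
print(naive_err, stable_err)
```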

