floating point. agenda history basic terms general representation of floating point constructing...

Floating Point

Agenda

History
Basic Terms
General representation of floating point
Constructing a simple floating-point representation
Floating Point Arithmetic
The IEEE-754 Floating-Point Standard
Range, Precision, and Accuracy

History

The first floating-point representation was used in the “V1” machine (1945). It had a 7-bit exponent, a 16-bit mantissa, and a sign bit.

In 1954, IBM adopted floating-point representation in its commercial computers (the IBM 704).

In 1962, the UNIVAC 1100/2200 series was introduced. It supported both single precision and double precision.

Basic Terms

Scientific notation: A notation that renders numbers with a single digit to the left of the decimal point.

Normalized: A number in floating-point notation that has no leading 0s.

Floating point: Computer arithmetic that represents numbers in which the binary point is not fixed.

Fraction: The value, between 0 and 1, placed in the fraction field of the floating-point number.

Exponent: In the numerical representation system of floating-point arithmetic, the value that is placed in the exponent field.
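Python's standard `math.frexp` splits a number into a fraction and an exponent in a form close to these terms: it returns `m` and `e` with x = m x 2^e and 0.5 <= |m| < 1, i.e. a normalized binary fraction 0.1xxx. The sketch below uses a hypothetical helper, `binary_fraction`, to print the first fraction bits so they can be compared with hand-worked binary conversions. The sign of x is ignored.

```python
import math

def binary_fraction(x: float, bits: int = 8) -> tuple[str, int]:
    """Return (fraction bits after '0.', exponent) so that |x| ~= 0.f x 2**e."""
    m, e = math.frexp(x)  # x == m * 2**e, 0.5 <= |m| < 1 for nonzero finite x
    m = abs(m)            # hypothetical helper: the sign is ignored here
    digits = []
    for _ in range(bits):
        m *= 2            # shift the next binary digit to the left of the point
        bit = int(m)
        digits.append(str(bit))
        m -= bit
    return "".join(digits), e

print(binary_fraction(17.0))  # ('10001000', 5)  i.e. 0.10001 x 2^5
print(binary_fraction(0.25))  # ('10000000', -1) i.e. 0.1 x 2^-1
```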

General representation of floating point
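As a general sketch in our own notation (S, F, and E are labels chosen here, not taken from the original slide), a floating-point value is built from a sign bit S, a significand F, and an exponent E:

```latex
x = (-1)^{S} \times F \times 2^{E}
```

In the simple model that follows, F is a binary fraction 0.f; in IEEE-754, F is 1.f with an implied leading 1.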

Constructing a simple floating-point representation

We will use a 14-bit model: 1 sign bit, a 5-bit exponent, and an 8-bit significand.

For example, consider storing the decimal number 17 in this model. In decimal, 17 = 0.17 x 10^2, but to construct a floating-point representation we have to convert it to binary.

17 (decimal) = 10001 (binary) = 0.10001 x 2^5

We can now construct its representation:

0 00101 10001000

sign (1 bit), exponent (5 bits), significand (8 bits)

Sign field:

0 : positive value

1 : negative value
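The construction above can be sketched in Python. This is a minimal illustration under our own assumptions: the name `encode_simple` is hypothetical, the exponent is stored as a plain unsigned 5-bit number (no bias), and significand bits beyond the first 8 are truncated. Negative exponents cannot be stored, so they raise an error.

```python
import math

def encode_simple(x: float) -> str:
    """Encode x in the 14-bit model: 1 sign bit, 5-bit unsigned exponent,
    8-bit significand holding the fraction 0.f (no bias, no hidden bit)."""
    sign = "1" if x < 0 else "0"
    m, e = math.frexp(abs(x))  # abs(x) == m * 2**e with 0.5 <= m < 1
    if not 0 <= e < 32:
        raise ValueError("exponent does not fit in 5 unsigned bits")
    significand = int(m * 2**8)  # keep the first 8 fraction bits (truncate)
    return f"{sign} {e:05b} {significand:08b}"

print(encode_simple(17))   # 0 00101 10001000
print(encode_simple(-17))  # 1 00101 10001000
```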

What if we want to store a negative exponent value?

The previous scheme has no way to store a negative exponent, so we fix this by using a biased exponent.

For example, to store 0.25 we have 0.25 (decimal) = 0.01 (binary) = 0.1 x 2^-1.

With an excess-16 representation, we add 16 to the true exponent (-1 + 16 = 15) and store the result:

0 01111 10000000
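A minimal sketch of the excess-16 scheme, assuming the same 14-bit layout and truncation as before; `encode_excess16` and `BIAS` are hypothetical names. The only change from an unbiased encoder is adding the bias before storing the exponent. Zero is not treated specially here.

```python
import math

BIAS = 16  # excess-16: stored exponent = true exponent + 16

def encode_excess16(x: float) -> str:
    """Encode x in the 14-bit model with an excess-16 exponent."""
    sign = "1" if x < 0 else "0"
    m, e = math.frexp(abs(x))  # abs(x) == m * 2**e, 0.5 <= m < 1
    stored = e + BIAS          # bias makes small negative exponents storable
    if not 0 <= stored < 32:
        raise ValueError("exponent out of range for 5 bits")
    significand = int(m * 2**8)  # first 8 fraction bits, truncated
    return f"{sign} {stored:05b} {significand:08b}"

print(encode_excess16(0.25))  # 0 01111 10000000
print(encode_excess16(17))    # 0 10101 10001000
```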

Another problem with this method

We don’t have a unique representation for each number. In the excess-16 model, all of the following represent 17:

0 11000 00010001

0 10111 00100010

0 10110 01000100

0 10101 10001000
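To check that these bit patterns really denote the same number, here is a small decoder sketch for the 14-bit excess-16 model (`decode_excess16` is a hypothetical name; the significand is read as the fraction 0.f with no hidden bit).

```python
def decode_excess16(bits: str) -> float:
    """Decode 'S EEEEE FFFFFFFF' from the 14-bit excess-16 model."""
    sign, exponent, significand = bits.split()
    e = int(exponent, 2) - 16              # remove the excess-16 bias
    fraction = int(significand, 2) / 2**8  # significand is the fraction 0.f
    value = fraction * 2**e
    return -value if sign == "1" else value

patterns = [
    "0 11000 00010001",
    "0 10111 00100010",
    "0 10110 01000100",
    "0 10101 10001000",
]
for p in patterns:
    print(p, "=", decode_excess16(p))  # every line prints 17.0
```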

Remedy

This problem can be fixed by normalization.

Normalization is a convention that the leftmost bit of the significand must always be 1. With it, we have only one representation for the decimal value 17:

0 10101 10001000
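Normalization can be sketched as a loop that shifts the significand left until its leftmost bit is 1, lowering the exponent by one per shift so the value is unchanged. The function name `normalize` and the (true exponent, integer significand) pair representation are our own assumptions for illustration.

```python
def normalize(exponent: int, significand: int, width: int = 8) -> tuple[int, int]:
    """Shift the significand left until its leftmost bit is 1,
    decrementing the true exponent once per shift."""
    if significand == 0:
        raise ValueError("zero cannot be normalized")
    while not significand >> (width - 1) & 1:  # leftmost bit still 0?
        significand <<= 1
        exponent -= 1
    return exponent, significand

# The four excess-16 patterns for 17, as (true exponent, significand) pairs.
for e, f in [(8, 0b00010001), (7, 0b00100010), (6, 0b01000100), (5, 0b10001000)]:
    e_norm, f_norm = normalize(e, f)
    print(f"0 {e_norm + 16:05b} {f_norm:08b}")  # 0 10101 10001000 each time
```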

Floating Point Arithmetic

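Addition

To add 11.001000 and 0.10011010 (binary), first align the binary points, then add:

11.001000
+ 0.10011010
= 11.10111010

In the 14-bit excess-16 model:

0 10010 11001000
+ 0 10000 10011010
= 0 10010 11101110

The exact sum needs 10 significand bits, so its two lowest bits are truncated.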
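A sketch of this addition, assuming positive operands given as (true exponent, 8-bit significand) pairs with value = (f / 2^8) x 2^e; `add_simple` is a hypothetical name. It forms the exact sum first and then truncates to 8 bits, matching the worked example above.

```python
def add_simple(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two positive values given as (true exponent e, 8-bit significand f),
    where value = (f / 2**8) * 2**e. The exact sum is truncated to 8 bits."""
    (ea, fa), (eb, fb) = a, b
    low = min(ea, eb)
    # Scale both significands to the smaller exponent so the sum is exact.
    total = (fa << (ea - low)) + (fb << (eb - low))
    e = low
    while total >= 2**8:            # too many bits: drop the lowest (truncate)
        total >>= 1
        e += 1
    while total and total < 2**7:   # normalize: leading bit must be 1
        total <<= 1
        e -= 1
    return e, total

e, f = add_simple((2, 0b11001000), (0, 0b10011010))
print(f"0 {e + 16:05b} {f:08b}")  # 0 10010 11101110
```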

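Multiplication

0 10010 11001000
x 0 10000 10011010
= 0 10001 11110000

The true exponents are added (2 + 0 = 2), the significands are multiplied (0.11001 x 0.10011010 = 0.0111100001010), and the product is normalized to 0.11110000 x 2^1 (1.111 in binary), truncating the extra bits.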
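The same sketch style works for multiplication, again assuming positive operands as (true exponent, 8-bit significand) pairs; `multiply_simple` is a hypothetical name. Exponents are added, significands multiplied exactly, and the product is truncated and normalized back to 8 bits.

```python
def multiply_simple(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Multiply two positive values given as (true exponent e, 8-bit significand f),
    where value = (f / 2**8) * 2**e; the product is truncated to 8 bits."""
    (ea, fa), (eb, fb) = a, b
    total = fa * fb      # exact product of significands, scaled by 2**16
    e = ea + eb - 8      # add exponents; -8 rescales total to the 2**8 convention
    while total >= 2**8:            # drop extra low-order bits (truncate)
        total >>= 1
        e += 1
    while total and total < 2**7:   # normalize: leading bit must be 1
        total <<= 1
        e -= 1
    return e, total

e, f = multiply_simple((2, 0b11001000), (0, 0b10011010))
print(f"0 {e + 16:05b} {f:08b}")  # 0 10001 11110000
```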

Some other problems in floating-point arithmetic:

Division by zero.

Overflow, if the result is too large in magnitude to be represented in the given format.

Underflow, if the result is too small in magnitude (too close to zero) to be represented in the given format.
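These problems can be observed with Python's `float`, which on common platforms is an IEEE-754 double (an assumption about the platform). Note that Python raises `ZeroDivisionError` for float division by zero instead of returning IEEE-754's default result (infinity).

```python
import sys

largest = sys.float_info.max   # largest finite double, about 1.8e308
overflowed = largest * 2       # too large in magnitude: becomes inf
tiny = 5e-324                  # smallest positive (subnormal) double
underflowed = tiny / 2         # too small in magnitude: rounds to 0.0

print(overflowed, underflowed)  # inf 0.0

try:
    1.0 / 0.0
except ZeroDivisionError as exc:
    # Python's choice: raise rather than return +inf.
    print("division by zero:", exc)
```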

The IEEE-754 Floating-Point Standard

This standard was first introduced in 1985. It includes two basic formats: single precision and double precision.

The standard defines:

arithmetic formats: sets of binary and decimal floating-point data, which consist of finite numbers (including negative zero and subnormal numbers), infinities, and special 'not a number' (NaN) values.

interchange formats: encodings (bit strings) that may be used to exchange floating-point data in an efficient and compact form

rounding algorithms: methods to be used for rounding numbers during arithmetic and conversions

operations: arithmetic and other operations on arithmetic formats

exception handling: indications of exceptional conditions (such as division by zero, overflow, underflow, etc.)
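The special values in the arithmetic formats (infinities, NaN, negative zero) can be inspected from Python, assuming its `float` is an IEEE-754 double as on common platforms. This is an illustrative sketch, not a complete tour of the standard.

```python
import math

inf = float("inf")
nan = float("nan")
neg_zero = -0.0

print(inf > 1e308)                   # True: infinity is above every finite value
print(nan == nan)                    # False: NaN compares unequal even to itself
print(neg_zero == 0.0)               # True: the two zeros compare equal...
print(math.copysign(1.0, neg_zero))  # -1.0: ...but the sign bit is kept
print(inf - inf)                     # nan: an invalid operation produces NaN
```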

Single Precision IEEE-754

This representation uses an excess-127 exponent and assumes an implied 1 to the left of the radix point. For example, 1 = 1.0 x 2^0, so its exponent is stored as 0 + 127 = 127.

sign (1 bit), exponent (8 bits), significand (23 bits)

Floating-point number    Single-precision representation

1.0      0 01111111 00000000000000000000000

0.5      0 01111110 00000000000000000000000

19.5     0 10000011 00111000000000000000000

-3.75    1 10000000 11100000000000000000000
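The table can be checked with Python's standard `struct` module, which packs a value as an IEEE-754 single (`'f'`) and lets us reinterpret the 4 bytes as an unsigned integer. The helper name `single_bits` is hypothetical.

```python
import struct

def single_bits(x: float) -> str:
    """Return the IEEE-754 single-precision fields of x as 'S EEEEEEEE F...F'."""
    (n,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret 4 bytes as uint32
    b = f"{n:032b}"
    return f"{b[0]} {b[1:9]} {b[9:]}"  # sign, 8 exponent bits, 23 fraction bits

for x in (1.0, 0.5, 19.5, -3.75):
    print(f"{x:>6} {single_bits(x)}")
```

For 0.5 = 1.0 x 2^-1, the stored exponent is -1 + 127 = 126, i.e. 01111110.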

Double Precision IEEE-754

This representation uses an excess-1023 exponent.

Like single precision, it assumes an implied 1 to the left of the radix point; for example, 1 = 1.0 x 2^0, so its exponent is stored as 0 + 1023 = 1023.

sign (1 bit), exponent (11 bits), significand (52 bits)
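A sketch that splits a double into its three fields and rebuilds a normalized value from them, assuming Python's `float` is an IEEE-754 double. `double_fields` and `from_fields` are hypothetical names; zero, subnormals, infinities, and NaN (stored exponent 0 or 2047) are not handled by `from_fields`.

```python
import struct

def double_fields(x: float) -> tuple[int, int, int]:
    """Split an IEEE-754 double into (sign, biased exponent, 52-bit fraction)."""
    (n,) = struct.unpack(">Q", struct.pack(">d", x))  # reinterpret 8 bytes as uint64
    sign = n >> 63
    exponent = (n >> 52) & 0x7FF        # 11 bits
    fraction = n & ((1 << 52) - 1)      # 52 bits
    return sign, exponent, fraction

def from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normalized double: (-1)^sign * 1.fraction * 2^(exponent - 1023)."""
    value = (1 + fraction / 2**52) * 2.0 ** (exponent - 1023)
    return -value if sign else value

s, e, f = double_fields(1.0)
print(s, e, e - 1023, f)  # 0 1023 0 0
```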

Range, Precision, and Accuracy

Range

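In double precision, for example, the number line divides roughly as follows (boundaries are approximate):

Negative overflow: below -1.0 x 10^308

Expressible negative numbers: from -1.0 x 10^308 to -1.0 x 10^-308

Negative underflow: between -1.0 x 10^-308 and 0

Positive underflow: between 0 and 1.0 x 10^-308

Expressible positive numbers: from 1.0 x 10^-308 to 1.0 x 10^308

Positive overflow: above 1.0 x 10^308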
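The boundaries above are round figures. Python's `sys.float_info` reports the exact limits of the platform's double (assumed IEEE-754), which is a quick way to see where they really lie.

```python
import sys

print(sys.float_info.max)  # 1.7976931348623157e+308, largest finite double
print(sys.float_info.min)  # 2.2250738585072014e-308, smallest positive normalized double
print(5e-324 > 0)          # True: subnormals extend below float_info.min
```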

Accuracy: how close a number is to its true value. For example, we can’t represent 0.1 exactly in binary floating point, but we can still find a representable number that is relatively close to 0.1.
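Python's `fractions` and `decimal` modules can show the exact value a double actually stores for 0.1, and how far it is from the true value (assuming an IEEE-754 double).

```python
from decimal import Decimal
from fractions import Fraction

stored = Fraction(0.1)   # exact value of the double nearest to 0.1
print(stored)            # 3602879701896397/36028797018963968
print(Decimal(0.1))      # 0.1000000000000000055511151231257827021181583404541015625
error = stored - Fraction(1, 10)
print(float(error))      # about 5.55e-18: close to 0.1, but not equal
```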

Precision: how much information we have about a value, i.e., the amount of information used to represent it. For example, 1.666 has 4 decimal digits of precision and 1.6660 has 5. The second number is more precise than the first, but it is not more accurate: both represent the same value.
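In binary formats, precision is fixed by the number of significand bits: 24 for single precision (23 stored plus the implied 1) and 53 for double. A minimal sketch, using a hypothetical helper `to_single` built on the standard `struct` module, rounds 0.1 to single precision to show how much less information that format keeps.

```python
import struct

def to_single(x: float) -> float:
    """Round x to the nearest IEEE-754 single-precision value (24-bit significand)."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(repr(0.1))             # 0.1 (shortest repr of the 53-bit double)
print(repr(to_single(0.1)))  # 0.10000000149011612 (only 24 significand bits kept)
```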


