Floating Point Representations

CDA 3101

Discussion Session 02

Question 1
• Convert the binary number 1010 0100 1001 0010 0100 1001 0010 0100 (base 2) to decimal, if the binary is interpreted as

(a) unsigned, (b) 2's complement, (c) single-precision floating point.

Question 1.1
• Converting binary (unsigned) to decimal: 1010 0100 1001 0010 0100 1001 0010 0100 (base 2)

$1 \cdot 2^{31} + 1 \cdot 2^{29} + \dots + 1 \cdot 2^{8} + 1 \cdot 2^{5} + 1 \cdot 2^{2}$

= 2761050404

Question 1.2
• Converting binary (2's complement) to decimal: 1010 0100 1001 0010 0100 1001 0010 0100 (base 2)

$-1 \cdot 2^{31} + 1 \cdot 2^{29} + \dots + 1 \cdot 2^{8} + 1 \cdot 2^{5} + 1 \cdot 2^{2}$

= -1533916892
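The two integer interpretations can be checked with a short sketch. The helper names `unsigned_value` and `twos_complement_value` are illustrative, not part of the original slides; the only difference between the two readings is that the top bit carries weight $-2^{31}$ instead of $+2^{31}$.

```python
# The 32-bit pattern from Question 1, with spaces removed.
BITS = "1010 0100 1001 0010 0100 1001 0010 0100".replace(" ", "")

def unsigned_value(bits: str) -> int:
    """Sum 2**i for every set bit, counting i from the right."""
    return sum(1 << i for i, b in enumerate(reversed(bits)) if b == "1")

def twos_complement_value(bits: str) -> int:
    """Same sum, except the leading bit has weight -2**(n-1)."""
    n = len(bits)
    u = unsigned_value(bits)
    # Subtracting 2**n turns the +2**(n-1) weight into -2**(n-1).
    return u - (1 << n) if bits[0] == "1" else u

print(unsigned_value(BITS))
print(twos_complement_value(BITS))
```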

Question 1.3
• Converting binary (single-precision FP) to decimal

1010 0100 1001 0010 0100 1001 0010 0100 (base 2)

Sign bit: 1

Exponent: 01001001 = 73

Fraction: 00100100100100100100100 = $1 \cdot 2^{-3} + 1 \cdot 2^{-6} + \dots + 1 \cdot 2^{-15} + 1 \cdot 2^{-18} + 1 \cdot 2^{-21}$

≈ 0.142857074

$(-1)^{S} \times (1.\text{Fraction}) \times 2^{(\text{Exponent} - 127)}$

$= (-1)^{1} \times 1.142857074 \times 2^{(73 - 127)}$

$= -1.142857074 \times 2^{-54}$

$\approx -6.34413119 \times 10^{-17}$

Field layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)
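A minimal sketch of the same decoding in Python, assuming the standard `struct` module is used to let the machine interpret the 32 bits as a single-precision value. The function name `decode_single` is hypothetical; it just splits the word into the three fields of the layout.

```python
import struct

WORD = 0xA4924924  # 1010 0100 1001 0010 0100 1001 0010 0100

def decode_single(word: int):
    """Split a 32-bit word into (sign, biased exponent, fraction) fields."""
    sign = word >> 31                # top 1 bit
    exponent = (word >> 23) & 0xFF   # next 8 bits
    fraction = word & 0x7FFFFF       # low 23 bits
    return sign, exponent, fraction

sign, exponent, fraction = decode_single(WORD)

# Normalized-number formula: (-1)^S * (1.Fraction) * 2^(Exponent - 127)
manual = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)

# Cross-check: reinterpret the same 4 bytes as a big-endian float.
(hardware,) = struct.unpack(">f", WORD.to_bytes(4, "big"))

print(sign, exponent, fraction)
print(manual, hardware)
```

The formula applies only to normalized numbers (biased exponent between 1 and 254); the all-zeros and all-ones exponents are handled separately.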

Question 2
• Show the IEEE 754 binary representation for the floating-point number 0.1 (base 10) in single precision and double precision

Question 2.1
• Converting 0.1 (base 10) to single-precision FP

Step 1: Convert the fraction 0.1 to binary (repeatedly multiply by 2; the integer part of each product is the next bit)
0.1*2 = 0.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, …
giving 0.000110011… (base 2)

$1.10011\ldots \times 2^{-4}$
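The multiply-by-2 procedure can be sketched as follows. The function name `fraction_to_binary` is illustrative, and `fractions.Fraction` is used (an assumption of this sketch) so that 0.1 is held exactly rather than as an already-rounded binary float.

```python
from fractions import Fraction

def fraction_to_binary(value: Fraction, max_bits: int) -> str:
    """Return the first max_bits binary digits after the point of 0 <= value < 1."""
    bits = []
    for _ in range(max_bits):
        value *= 2
        bit = int(value)      # integer part of the product: 0 or 1
        bits.append(str(bit))
        value -= bit          # keep only the fractional part
    return "".join(bits)

print(fraction_to_binary(Fraction(1, 10), 9))  # 000110011
```

Because the remainder 0.2 reappears after four steps, the block 0011 repeats forever, which is why 0.1 has no finite binary representation.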

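Step 2: Express in single-precision format: $(-1)^{S} \times (1.\text{Fraction}) \times 2^{(\text{Exponent} - 127)}$, so the stored (biased) Exponent = actual exponent + 127

$= (-1)^{0} \times (1.10011001100110011001100) \times 2^{-4}$, with biased Exponent = -4 + 127 = 123 = 01111011

0 01111011 10011001100110011001100

(The fraction here is truncated to 23 bits. Under round-to-nearest, the IEEE 754 default, the discarded bits 1100… round the last bit up, giving 0 01111011 10011001100110011001101.)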
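As a quick check of the single-precision pattern, the sketch below asks Python's `struct` module to encode 0.1 as a 32-bit float and prints the three fields. The helper name `single_bits` is hypothetical. Note that the encoder rounds to nearest, so the last fraction bit differs from a plain 23-bit truncation.

```python
import struct

def single_bits(x: float) -> str:
    """Format x as 'sign exponent fraction' using its IEEE 754 single encoding."""
    # Pack as a big-endian float, then reread the same 4 bytes as an unsigned int.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    b = f"{word:032b}"
    return f"{b[0]} {b[1:9]} {b[9:]}"

print(single_bits(0.1))  # 0 01111011 10011001100110011001101
```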

Question 2.2
• Converting 0.1 (base 10) to double-precision FP

Step 1: Convert the fraction 0.1 to binary (repeatedly multiply by 2; the integer part of each product is the next bit)
0.1*2 = 0.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, …
giving 0.000110011… (base 2)

$1.10011\ldots \times 2^{-4}$

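Step 2: Express in double-precision format: $(-1)^{S} \times (1.\text{Fraction}) \times 2^{(\text{Exponent} - 1023)}$, so the stored (biased) Exponent = actual exponent + 1023

$= (-1)^{0} \times (1.1001100110011001100110\ldots) \times 2^{-4}$, with biased Exponent = -4 + 1023 = 1019 = 01111111011

0 01111111011 1001100110011001100110011001100110011001100110011001

(The fraction here is truncated to 52 bits. Under round-to-nearest, the IEEE 754 default, the last four bits 1001 round up to 1010.)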
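The double-precision fields can be inspected the same way. This is a sketch using the standard `struct` module; `double_bits` is a hypothetical helper name. Python floats are already IEEE 754 doubles on common platforms, so the printed pattern is what the machine actually stores for 0.1, including the round-to-nearest adjustment in the final bits.

```python
import struct

def double_bits(x: float) -> str:
    """Format x as 'sign exponent fraction' using its IEEE 754 double encoding."""
    # Pack as a big-endian double, then reread the same 8 bytes as an unsigned int.
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    b = f"{word:064b}"
    return f"{b[0]} {b[1:12]} {b[12:]}"

print(double_bits(0.1))
```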

Question 3
• Convert the following single-precision numbers into decimal

a. 0 11111111 00000000000000000000000
b. 0 00000000 00000000000000000000010

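Question 3.1
• Converting binary (single-precision FP) to decimal: 0 11111111 00000000000000000000000 (base 2)

Sign bit: 0
Exponent: 11111111 = 255 (all ones, reserved for infinity and NaN)
Fraction: 00000000000000000000000 = 0

An all-ones exponent with a zero fraction encodes infinity; the sign bit is 0, so the value is +Infinity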


Question 3.2
• Converting binary (single-precision FP) to decimal: 0 00000000 00000000000000000000010 (base 2)

Sign bit: 0
Exponent: 00000000 = 0 (denormalized, since the fraction is nonzero)
Fraction: 00000000000000000000010 = $1 \cdot 2^{-22}$

≈ 0.000000238

$(-1)^{S} \times (0.\text{Fraction}) \times 2^{-126}$

$= (-1)^{0} \times 2^{-22} \times 2^{-126}$

$= 2^{-148} \approx 2.802596929 \times 10^{-45}$
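Both special cases can be confirmed by reinterpreting the bit patterns as single-precision values. The sketch below uses the standard `struct` module; `single_from_bits` is an illustrative helper, and the patterns are built programmatically so the field widths (1, 8, 23) are explicit.

```python
import math
import struct

def single_from_bits(bits: str) -> float:
    """Interpret a 32-character bit string (spaces allowed) as an IEEE 754 single."""
    word = int(bits.replace(" ", ""), 2)
    (value,) = struct.unpack(">f", word.to_bytes(4, "big"))
    return value

# a. exponent all ones, fraction zero
pattern_a = "0 " + "1" * 8 + " " + "0" * 23
# b. exponent zero, fraction = ...010 (only the 2^-22 bit set)
pattern_b = "0 " + "0" * 8 + " " + "0" * 21 + "10"

inf_value = single_from_bits(pattern_a)
tiny_value = single_from_bits(pattern_b)

print(inf_value)                      # inf
print(tiny_value == 2.0 ** -148)      # True
```

Working with the exact power $2^{-148}$, rather than the rounded decimal 0.000000238, avoids carrying a rounding error into the final digits.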


Question 4
• Consider the 80-bit extended-precision IEEE 754 floating point standard that uses 1 bit for the sign, 16 bits for the biased exponent and 63 bits for the fraction (f). Then, write (i) the 80-bit extended-precision floating point representation in binary and (ii) the corresponding value in base-10 positional (decimal) system of

a. the third smallest positive normalized number
b. the largest (farthest from zero) negative normalized number
c. the third smallest positive denormalized number that can be represented.

Question 4.1
• The third smallest positive normalized number

Bias: $2^{15} - 1 = 32767$
Sign: 0
Biased Exponent: 0000 0000 0000 0001
Fraction (f): 61 zeros followed by 10
Decimal Value: $(-1)^{0} \times 2^{(1 - 32767)} \times (1 + 2^{-62}) = 2^{-32766} + 2^{-32828}$

Question 4.2
• The largest (farthest from zero) negative normalized number

Sign: 1
Biased Exponent: 1111 1111 1111 1110
Fraction: 63 ones
Decimal Value: $(-1)^{1} \times 2^{(65534 - 32767)} \times (1 + 2^{-1} + 2^{-2} + \dots + 2^{-63}) = -2^{32767} (2^{64} - 1) 2^{-63} \approx -2^{32768}$

Question 4.3
• The third smallest positive denormalized number

Sign: 0
Biased Exponent: 0000 0000 0000 0000
Fraction: 61 zeros followed by 11
Decimal Value: $(-1)^{0} \times 2^{-32766} \times (2^{-62} + 2^{-63}) = 3 \times 2^{-32829}$
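The three Question 4 answers can be checked exactly with rational arithmetic, since these magnitudes are far outside the range of a machine double. This is a sketch of the format exactly as the question defines it (1 sign bit, 16 exponent bits, 63 fraction bits, implicit leading 1 for normalized numbers); the `decode` function and its constants are illustrative names, and the all-ones exponent (infinity/NaN) is deliberately not handled.

```python
from fractions import Fraction

EXP_BITS, FRAC_BITS = 16, 63
BIAS = 2 ** (EXP_BITS - 1) - 1  # 32767

def decode(sign: int, biased_exp: int, frac: int) -> Fraction:
    """Exact value of a normalized or denormalized number in the question's format."""
    scale = Fraction(frac, 2 ** FRAC_BITS)  # the fraction field as 0.f
    if biased_exp == 0:
        # Denormalized: 0.f * 2^(1 - bias)
        magnitude = scale * Fraction(2) ** (1 - BIAS)
    else:
        # Normalized: 1.f * 2^(biased_exp - bias)
        magnitude = (1 + scale) * Fraction(2) ** (biased_exp - BIAS)
    return -magnitude if sign else magnitude

third_normal = decode(0, 1, 0b10)                    # 4.1
largest_negative = decode(1, 2**16 - 2, 2**63 - 1)   # 4.2
third_denormal = decode(0, 0, 0b11)                  # 4.3

print(third_normal == Fraction(1, 2**32766) + Fraction(1, 2**32828))
print(largest_negative == -(2**32768 - 2**32704))
print(third_denormal == Fraction(3, 2**32829))
```

The exact value in 4.2 is $-(2^{32768} - 2^{32704})$, which is why the slide reports it as approximately $-2^{32768}$.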
