floating point representations
![Page 1: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/1.jpg)
Floating Point Representations
CDA 3101
Discussion Session 02
![Page 2: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/2.jpg)
Question 1
• Convert the binary number $1010\,0100\,1001\,0010\,0100\,1001\,0010\,0100_2$ to decimal, if the binary is
• Unsigned?
• 2’s complement?
• Single precision floating-point?
![Page 3: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/3.jpg)
Question 1.1
• Converting binary (unsigned) to decimal: $1010\,0100\,1001\,0010\,0100\,1001\,0010\,0100_2$
$1 \cdot 2^{31} + 1 \cdot 2^{29} + \dots + 1 \cdot 2^{8} + 1 \cdot 2^{5} + 1 \cdot 2^{2}$
$= 2761050404$
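The positional sum above can be checked mechanically. The following is a minimal sketch; the names `BITS` and `unsigned_value` are illustrative and not part of the original slides.

```python
# Bit pattern from Question 1, with the grouping spaces removed.
BITS = "1010 0100 1001 0010 0100 1001 0010 0100".replace(" ", "")

def unsigned_value(bits: str) -> int:
    """Sum 2**i over every bit position i (counted from the right) that holds a 1."""
    width = len(bits)
    return sum(2 ** (width - 1 - k) for k, b in enumerate(bits) if b == "1")

print(unsigned_value(BITS))   # 2761050404
print(int(BITS, 2))           # built-in conversion, same result
```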
![Page 4: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/4.jpg)
Question 1.2
• Converting binary (2’s complement) to decimal: $1010\,0100\,1001\,0010\,0100\,1001\,0010\,0100_2$
$-1 \cdot 2^{31} + 1 \cdot 2^{29} + \dots + 1 \cdot 2^{8} + 1 \cdot 2^{5} + 1 \cdot 2^{2}$
$= -1533916892$
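Giving the most significant bit the weight $-2^{31}$ is the same as subtracting $2^{32}$ from the unsigned value whenever that bit is set. A short sketch of that idea follows; `twos_complement_value` is a hypothetical helper name.

```python
def twos_complement_value(bits: str) -> int:
    """Interpret a bit string as a two's complement integer of that width."""
    bits = bits.replace(" ", "")
    width = len(bits)
    unsigned = int(bits, 2)
    # A set MSB contributes -2**(width-1) instead of +2**(width-1),
    # which lowers the unsigned value by exactly 2**width.
    return unsigned - (1 << width) if bits[0] == "1" else unsigned

print(twos_complement_value("1010 0100 1001 0010 0100 1001 0010 0100"))  # -1533916892
```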
![Page 5: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/5.jpg)
Question 1.3
• Converting binary (single-precision FP) to decimal: $1010\,0100\,1001\,0010\,0100\,1001\,0010\,0100_2$
Field layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)
Sign bit: 1
Exponent: $01001001_2 = 73$
Fraction: $00100100100100100100100_2 = 1 \cdot 2^{-3} + 1 \cdot 2^{-6} + \dots + 1 \cdot 2^{-15} + 1 \cdot 2^{-18} + 1 \cdot 2^{-21} \approx 0.142857074$
$(-1)^{S} \times (1.\text{Fraction}) \times 2^{(\text{Exponent} - 127)}$
$= (-1)^{1} \times (1.142857074) \times 2^{(73 - 127)}$
$= -1.142857074 \times 2^{-54}$
$\approx -6.3441312 \times 10^{-17}$
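The field-by-field decoding can be sketched in Python and cross-checked against the standard `struct` module. The function name `decode_normalized_single` is illustrative, and this sketch deliberately handles only normalized patterns, as in this question.

```python
import struct

def decode_normalized_single(bits: str) -> float:
    """Decode a normalized 32-bit IEEE 754 single-precision pattern field by field."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    sign = int(bits[0])
    biased_exp = int(bits[1:9], 2)
    if biased_exp in (0, 255):
        raise ValueError("denormalized/special patterns are not handled in this sketch")
    fraction = int(bits[9:], 2) / 2**23          # value of 0.Fraction
    return (-1) ** sign * (1 + fraction) * 2.0 ** (biased_exp - 127)

PATTERN = "1010 0100 1001 0010 0100 1001 0010 0100"
manual = decode_normalized_single(PATTERN)

# Cross-check: let struct reinterpret the same 32 bits as a big-endian float.
word = int(PATTERN.replace(" ", ""), 2)
reference = struct.unpack(">f", word.to_bytes(4, "big"))[0]

print(manual)
print(manual == reference)   # True
```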
![Page 6: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/6.jpg)
Question 2
• Show the IEEE 754 binary representation for the floating-point number $0.1_{10}$ in single precision and double precision
![Page 7: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/7.jpg)
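Question 2.1
• Converting $0.1_{10}$ to single-precision FP
Step 1: Convert the fraction 0.1 to binary by repeatedly multiplying by 2 and recording the integer part of each product:
0.1*2 = 0.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, …
$0.1_{10} = 0.000110011\ldots_2 = 1.10011\ldots_2 \times 2^{-4}$
Step 2: Express in single-precision format, $(-1)^{S} \times (1.\text{Fraction}) \times 2^{(\text{Exponent} - 127)}$, where the stored (biased) exponent is the actual exponent plus 127: $-4 + 127 = 123 = 01111011_2$
$= (-1)^{0} \times (1.10011001100110011001100) \times 2^{(123 - 127)}$
0 01111011 10011001100110011001100
Note: the fraction above is truncated to 23 bits. Under IEEE 754's default round-to-nearest mode the last bit rounds up, giving 0 01111011 10011001100110011001101.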
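The multiply-by-2 procedure and the field assembly can be sketched as follows. Exact arithmetic via `fractions.Fraction` avoids any floating-point error in the doubling step. The helper `fraction_bits` and the variable names are assumptions made for illustration. The sketch truncates the fraction to 23 bits, as the slide does, and then compares with what Python's `struct` produces, which rounds to nearest.

```python
from fractions import Fraction
import struct

def fraction_bits(x: Fraction, count: int) -> str:
    """First `count` binary digits of 0 <= x < 1, found by repeated doubling."""
    digits = []
    for _ in range(count):
        x *= 2
        if x >= 1:            # integer part is 1: emit 1 and keep the remainder
            digits.append("1")
            x -= 1
        else:
            digits.append("0")
    return "".join(digits)

digits = fraction_bits(Fraction(1, 10), 40)    # "0001100110011..."
shift = digits.index("1") + 1                  # places the leading 1 sits right of the point
exponent = -shift                              # 0.1 = 1.xxx * 2**-4
fraction_field = digits[shift:shift + 23]      # 23 bits after the leading 1, truncated
truncated = "0" + format(exponent + 127, "08b") + fraction_field

# What the platform stores for 0.1 as a single (round-to-nearest).
rounded = format(struct.unpack(">I", struct.pack(">f", 0.1))[0], "032b")

print(truncated)  # 00111101110011001100110011001100
print(rounded)    # 00111101110011001100110011001101
```

The two strings differ only in the last fraction bit, which reflects the truncation versus rounding choice rather than any error in the procedure.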
![Page 8: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/8.jpg)
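Question 2.2
• Converting $0.1_{10}$ to double-precision FP
Step 1: Convert the fraction 0.1 to binary by repeatedly multiplying by 2 and recording the integer part of each product:
0.1*2 = 0.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, …
$0.1_{10} = 0.000110011\ldots_2 = 1.10011\ldots_2 \times 2^{-4}$
Step 2: Express in double-precision format, $(-1)^{S} \times (1.\text{Fraction}) \times 2^{(\text{Exponent} - 1023)}$, where the stored (biased) exponent is the actual exponent plus 1023: $-4 + 1023 = 1019 = 01111111011_2$
$= (-1)^{0} \times (1.1001100110011001100110011001100110011001100110011001) \times 2^{(1019 - 1023)}$
0 01111111011 1001100110011001100110011001100110011001100110011001
Note: the fraction above is truncated to 52 bits. Under IEEE 754's default round-to-nearest mode the stored fraction ends in 1010 instead of 1001: 0 01111111011 1001100110011001100110011001100110011001100110011010.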
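As a quick check of the double-precision fields, the sketch below asks Python's `struct` module for the 64 bits that actually store `0.1` and splits them into sign, biased exponent, and fraction. The variable names are illustrative. Note that the stored fraction is rounded to nearest, so its final four bits are `1010` rather than the truncated `1001`.

```python
import struct

# Reinterpret the 8 bytes of the double 0.1 as an unsigned 64-bit integer.
word = struct.unpack(">Q", struct.pack(">d", 0.1))[0]
bits = format(word, "064b")

sign = bits[0]            # 1 bit
biased_exp = bits[1:12]   # 11 bits
fraction = bits[12:]      # 52 bits

print(sign, biased_exp, fraction)
print(int(biased_exp, 2) - 1023)   # -4, the unbiased exponent
```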
![Page 9: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/9.jpg)
Question 3
• Convert the following single-precision numbers into decimal
a. 0 11111111 00000000000000000000000
b. 0 00000000 00000000000000000000010
![Page 10: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/10.jpg)
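Question 3.1
• Converting binary (single-precision FP) to decimal: $0\ 11111111\ 00000000000000000000000_2$
Field layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)
Sign bit: 0
Exponent: $11111111_2 = 255$ (all ones, reserved for infinity and NaN)
Fraction: $00000000000000000000000_2 = 0$
An all-ones exponent with a zero fraction and sign bit 0 encodes $+\infty$ (Infinity).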
![Page 11: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/11.jpg)
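Question 3.2
• Converting binary (single-precision FP) to decimal: $0\ 00000000\ 00000000000000000000010_2$
Field layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)
Sign bit: 0
Exponent: $00000000_2 = 0$ (all zeros with a nonzero fraction: a denormalized number)
Fraction: $00000000000000000000010_2 = 1 \cdot 2^{-22} \approx 0.000000238$
$(-1)^{S} \times (0.\text{Fraction}) \times 2^{-126}$
$= (-1)^{0} \times 2^{-22} \times 2^{-126}$
$= 2^{-148} \approx 2.802597 \times 10^{-45}$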
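Both special patterns from Question 3 can be checked by reinterpreting the raw bits as a float. This is a small sketch using the standard `struct` module; `bits_to_single` is a hypothetical helper name.

```python
import math
import struct

def bits_to_single(bits: str) -> float:
    """Reinterpret a 32-character bit string as an IEEE 754 single-precision value."""
    word = int(bits.replace(" ", ""), 2)
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

# a. exponent all ones, fraction zero
inf_value = bits_to_single("0 11111111 " + "0" * 23)
# b. exponent all zeros, fraction = 21 zeros followed by 10
tiny = bits_to_single("0 00000000 " + "0" * 21 + "10")

print(inf_value)              # inf
print(tiny == 2.0 ** -148)    # True
```

Comparing against `2.0 ** -148` rather than a decimal literal keeps the check exact, since the denormalized value is a power of two.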
![Page 12: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/12.jpg)
Question 4
• Consider the 80-bit extended-precision IEEE 754 floating point standard that uses 1 bit for the sign, 16 bits for the biased exponent and 63 bits for the fraction (f). Then, write (i) the 80-bit extended-precision floating point representation in binary and (ii) the corresponding value in base-10 positional (decimal) system of
a. the third smallest positive normalized number
b. the largest (farthest from zero) negative normalized number
c. the third smallest positive denormalized number
that can be represented.
![Page 13: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/13.jpg)
Question 4.1
• The third smallest positive normalized number
Bias: $2^{15} - 1 = 32767$
Sign: 0
Biased Exponent: 0000 0000 0000 0001
Fraction (f): 61 zeros followed by 10
Decimal Value: $(-1)^{0} \times 2^{(1 - 32767)} \times (1 + 2^{-62}) = 2^{-32766} + 2^{-32828}$
![Page 14: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/14.jpg)
Question 4.2
• The largest (farthest from zero) negative normalized number
Sign: 1
Biased Exponent: 1111 1111 1111 1110
Fraction: 63 ones
Decimal Value: $(-1)^{1} \times 2^{(65534 - 32767)} \times (1 + 2^{-1} + 2^{-2} + \dots + 2^{-63}) = -2^{32767} (2^{64} - 1)\, 2^{-63} \approx -2^{32768}$
![Page 15: Floating Point Representations](https://reader036.vdocuments.mx/reader036/viewer/2022082819/568139db550346895da18f53/html5/thumbnails/15.jpg)
Question 4.3
• The third smallest positive denormalized number
Sign: 0
Biased Exponent: 0000 0000 0000 0000
Fraction: 61 zeros followed by 11
Decimal Value: $(-1)^{0} \times 2^{-32766} \times (2^{-62} + 2^{-63}) = 3 \times 2^{-32829}$
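These values are far outside the range of a hardware double, so one way to verify the three answers is exact rational arithmetic. The sketch below assumes the layout exactly as Question 4 defines it (1 sign bit, 16 exponent bits, 63 fraction bits, implicit leading 1 for normalized numbers). The function `value` and the constants are illustrative names.

```python
from fractions import Fraction

EXP_BITS, FRAC_BITS = 16, 63
BIAS = 2 ** (EXP_BITS - 1) - 1        # 32767

def value(sign: int, biased_exp: int, frac: int) -> Fraction:
    """Exact value of a pattern; the all-ones exponent (infinity/NaN) is not handled."""
    if biased_exp == 0:
        # Denormalized: no implicit leading 1, exponent fixed at 1 - BIAS.
        significand = Fraction(frac, 2 ** FRAC_BITS)
        exponent = 1 - BIAS
    else:
        # Normalized: implicit leading 1.
        significand = 1 + Fraction(frac, 2 ** FRAC_BITS)
        exponent = biased_exp - BIAS
    return (-1) ** sign * significand * Fraction(2) ** exponent

a = value(0, 1, 0b10)                       # third smallest positive normalized
b = value(1, 2 ** EXP_BITS - 2, 2 ** FRAC_BITS - 1)   # largest-magnitude negative normalized
c = value(0, 0, 0b11)                       # third smallest positive denormalized

print(a == Fraction(1, 2 ** 32766) + Fraction(1, 2 ** 32828))   # True
print(b == -(2 ** 64 - 1) * 2 ** (32767 - 63))                  # True
print(c == Fraction(3, 2 ** 32829))                             # True
```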