1/8/2007 - L24 IEEE Floating Point Basics. Copyright 2006 - Joanne DeGroat, ECE, OSU


1/8/2007 - L24 IEEE Floating Point Basics

Copyright 2006 - Joanne DeGroat, ECE, OSU 1

IEEE Floating Point
The IEEE Floating Point Standard and execution units for it


Lecture overview
  The standard
  Floating Point Basics
  A floating point adder design


The floating point standard: Single Precision

Bit layout: s (1 bit) | e (8 bits) | f (23 bits)

The value v of the stored bits is:
  If e = 255 and f /= 0, then v is NaN regardless of s
  If e = 255 and f = 0, then v = (-1)^s x ∞
  If 0 < e < 255, then v = (-1)^s x 2^(e-127) x (1.f) – normalized number
  If e = 0 and f /= 0, then v = (-1)^s x 2^(-126) x (0.f) – denormalized number, allows for graceful underflow
  If e = 0 and f = 0, then v = (-1)^s x 0 (zero)
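The five cases above can be written out as a short decoder. This is only a sketch in Python (not part of the lecture); the function name `decode_single` is made up for illustration, and it takes the 32-bit word as an integer.

```python
def decode_single(bits: int) -> float:
    """Return the value v of a 32-bit single-precision bit pattern."""
    s = (bits >> 31) & 0x1        # sign, 1 bit
    e = (bits >> 23) & 0xFF       # biased exponent, 8 bits
    f = bits & 0x7FFFFF           # stored fraction, 23 bits
    sign = -1.0 if s else 1.0

    if e == 255:
        # all-ones exponent: NaN if f /= 0, otherwise signed infinity
        return float("nan") if f != 0 else sign * float("inf")
    if e == 0:
        # zero (f = 0) or denormalized: no hidden 1, exponent fixed at -126
        return sign * (f / 2**23) * 2.0**-126
    # normalized number: hidden leading 1, exponent e - 127
    return sign * (1 + f / 2**23) * 2.0**(e - 127)


print(decode_single(0x42C80000))  # 100.0
```

Because Python floats are double precision, every single-precision value (including the denormalized ones) is represented exactly in the return value.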


The floating point standard: Double Precision

Bit layout: s (1 bit) | e (11 bits) | f (52 bits)

The value v of the bits in the word is:
  If e = 2047 and f /= 0, then v is NaN regardless of s
  If e = 2047 and f = 0, then v = (-1)^s x ∞
  If 0 < e < 2047, then v = (-1)^s x 2^(e-1023) x (1.f) – normalized number
  If e = 0 and f /= 0, then v = (-1)^s x 2^(-1022) x (0.f) – denormalized number, allows for graceful underflow
  If e = 0 and f = 0, then v = (-1)^s x 0 (zero)
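To see the double precision fields on real values, the sketch below splits a Python float into s, e and f with the standard `struct` module and names the case it falls into. The helper names `double_fields` and `classify` are assumptions made for this illustration.

```python
import struct


def double_fields(x: float):
    """Split a double-precision value into (s, e, f)."""
    # ">d" packs x as an IEEE 754 double; ">Q" reads the same 8 bytes as an integer
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                    # sign, 1 bit
    e = (bits >> 52) & 0x7FF          # biased exponent, 11 bits
    f = bits & ((1 << 52) - 1)        # stored fraction, 52 bits
    return s, e, f


def classify(e: int, f: int) -> str:
    """Name the case from the double precision rules for given e and f."""
    if e == 2047:
        return "NaN" if f != 0 else "infinity"
    if e == 0:
        return "denormalized" if f != 0 else "zero"
    return "normalized"


s, e, f = double_fields(100.0)
print(s, e, e - 1023, classify(e, f))  # 0 1029 6 normalized
```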


The floating point standard: Notes on single and double precision
  The leading 1 of the mantissa is not stored for normalized numbers (it is implied)
  The representation has both +0 and -0; the sign indicates the direction from which 0 was reached, which can matter when a result was rounded to zero
  Denormalized numbers allow graceful underflow towards 0
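These notes can be checked on actual bit patterns. The sketch below is illustrative only; `single_bits` is a made-up helper that uses Python's `struct` module to round a value to single precision and return its 32-bit pattern.

```python
import struct


def single_bits(x: float) -> int:
    """Bit pattern of x after rounding to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


print(f"{single_bits(0.0):08X}")        # +0
print(f"{single_bits(-0.0):08X}")       # -0: only the sign bit differs
print(0.0 == -0.0)                      # the two zeros still compare equal
print(f"{single_bits(2.0**-126):08X}")  # smallest normalized number (e = 1, f = 0)
print(f"{single_bits(2.0**-127):08X}")  # half of that: denormalized (e = 0, f /= 0)
```

The last two lines show graceful underflow: halving the smallest normalized number does not jump to zero but moves into the e = 0 range, where the leading bit is stored explicitly in f.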


Conversion Examples
Converting from base 10 to the representation
Single precision example: convert 100 (base 10)
  Step 1 – convert to binary: 0110 0100

    128  64  32  16   8   4   2   1
      0   1   1   0   0   1   0   0

  In the binary form 1.xxx we have 0110 0100 = 1.100100 x 2^6


Conversion Example Continued
  1.1001 x 2^6 is binary for 100
  Thus the exponent is 6
  Biased exponent will be 6 + 127 = 133 = 1000 0101
  Sign will be 0 for positive
  Stored fractional part f will be 1001 (followed by zeros)
  Thus we have
    s  e          f
    0  1000 0101  1001000…
  Regrouped into 4-bit groups: 0100 0010 1100 1000 0000 … = 4 2 C 8 0 0 0 0
  In hexadecimal, $42C8 0000 is the representation for 100


Another example
Representation for -175
  175 = 128 + 32 + 8 + 4 + 2 + 1 = 1010 1111, or 1.0101111 x 2^7
  S = 1
  Exponent is 7 + 127 = 134 = 1000 0110
  Fractional part f = 0101111
  Representation: 1100 0011 0010 1111 0000 …
  Or in hex: $C32F 0000
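The conversion steps used for 100 and -175 can be sketched in a few lines of Python. The function name `encode_single` is an assumption for this illustration, and the sketch only covers nonzero integers whose magnitude fits in 24 bits, so no rounding is needed.

```python
def encode_single(n: int) -> int:
    """Single-precision bit pattern for a nonzero integer with |n| < 2**24."""
    s = 1 if n < 0 else 0
    mag = abs(n)
    exp = mag.bit_length() - 1      # position of the leading 1 = unbiased exponent
    e = exp + 127                   # biased exponent
    frac = mag - (1 << exp)         # drop the leading 1 (it is not stored)
    f = frac << (23 - exp)          # left-align the remaining bits in the 23-bit field
    return (s << 31) | (e << 23) | f


print(f"${encode_single(100):08X}")   # $42C80000
print(f"${encode_single(-175):08X}")  # $C32F0000
```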


Converting back
Convert $C32F 0000 into decimal
  Extract components from 1100 0011 0010 1111
  S = 1
  Exponent = 1000 0110 = 128 + 4 + 2 = 134; unbiased: 134 – 127 = 7
  f = 0101111, so mantissa is 1.0101111
  Adjust by exponent: 1010 1111 (move binary point 7 places)
  Or 128 + 32 + 15 = 175
  Sign is negative, so -175


Another example
Convert $41C8 0000 to decimal
  0100 0001 1100 1000 0000 …
  S is 0, so positive number
  Exponent 1000 0011 = 128 + 3 = 131; unbiased: 131 – 127 = 4
  f = 1001, so mantissa is 1.1001
  Moving the binary point 4 positions gives 11001 as the final number, or decimal 25
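The reverse conversion can be sketched the same way. The function `unpack_steps` below is a hypothetical helper that follows the manual steps for a normalized single-precision word written in the `$XXXX XXXX` notation: extract the sign, unbias the exponent, restore the hidden 1, then move the binary point.

```python
def unpack_steps(hex_word: str):
    """Return (s, unbiased exponent, value) for a normalized single-precision word."""
    bits = int(hex_word.replace(" ", "").lstrip("$"), 16)
    s = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127   # remove the bias of 127
    f = bits & 0x7FFFFF
    mantissa = (1 << 23) | f                 # 1.f held as an integer scaled by 2**23
    # moving the binary point 'exponent' places is a scaling by 2**(exponent - 23)
    value = mantissa * 2.0 ** (exponent - 23)
    return s, exponent, (-value if s else value)


print(unpack_steps("$C32F 0000"))  # (1, 7, -175.0)
print(unpack_steps("$41C8 0000"))  # (0, 4, 25.0)
```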


Arithmetic with floating point numbers
Add op1 $42C8 0000 and op2 $41C8 0000
First divide into component parts
  Op1 $42C8 0000 = 0100 0010 1100 1000 0000 …
    S = 0
    E = 1000 0101 = 133; unbiased: 133 – 127 = 6
    Mop1 = 1.10010000…
  Op2 $41C8 0000 = 0100 0001 1100 1000 0000 …
    S = 0
    E = 1000 0011 = 131; unbiased: 131 – 127 = 4
    Mop2 = 1.10010000…


Now add the mantissas
But first align the mantissas
  Op1 1.1001000… (exponent 6)
  Op2 1.1001000… (exponent 4), which is the smaller number and needs to be aligned
  Exponent difference between op1 and op2 is 2
  So shift op2 right by 2 binary places: op2 becomes 0.0110010000…


Add
Add op1 mantissa with the aligned op2 mantissa

    1.1001000000…
  + 0.0110010000…
  ---------------
    1.1111010000…

  Result exponent is 6
  Value is 1111101, or 64 + 32 + 16 + 8 + 4 + 1 = 125
  Values added were 100 and 25


Constructing Result Value
  Sign: 0
  Exponent: 6, stored as E = 6 + 127 = 133 = 1000 0101
  Mantissa of result: 1.1111010000…
  Fractional part: 1111010000…
  Constructed value: 0100 0010 1111 1010 0000 0000 0000 0000 = $42FA 0000 (125)


Floating point representation of 125
  Positive, so s is 0
  Exponent is 6 + 127 = 133 = 1000 0101
  Fractional part from mantissa of 1.111101, or 111101
  Constructed value: 0 1000 0101 111101 00000000000000000 = $42FA 0000
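The addition walk-through (split into fields, align the smaller operand, add the mantissas, rebuild the word) can be sketched in Python as below. This is a deliberately restricted illustration, not the adder unit specified later in the lecture: the name `fp_add_positive` is made up, both operands are assumed positive and normalized, bits shifted out during alignment are simply dropped (no rounding), and NaN, infinity, denormalized inputs and exponent overflow are not handled.

```python
def fp_add_positive(a: int, b: int) -> int:
    """Add two positive, normalized single-precision bit patterns (truncating)."""
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    ma = (1 << 23) | (a & 0x7FFFFF)      # restore the hidden 1 of each mantissa
    mb = (1 << 23) | (b & 0x7FFFFF)
    if ea < eb:                          # make (ea, ma) the operand with the larger exponent
        ea, eb, ma, mb = eb, ea, mb, ma
    mb >>= ea - eb                       # align the smaller operand
    m = ma + mb                          # add the aligned mantissas
    e = ea
    if m >> 24:                          # carry out of the top bit: renormalize
        m >>= 1
        e += 1
    return (e << 23) | (m & 0x7FFFFF)    # sign bit is 0; drop the hidden 1 again


print(f"${fp_add_positive(0x42C80000, 0x41C80000):08X}")  # $42FA0000, i.e. 100 + 25 = 125
```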


Multiplication example
Multiply op1 $42C8 0000 and op2 $41C8 0000
First divide into component parts
  Op1 $42C8 0000 = 0100 0010 1100 1000 0000 …
    S = 0
    E = 1000 0101 = 133; unbiased: 133 – 127 = 6
    Mop1 = 1.10010000…
  Op2 $41C8 0000 = 0100 0001 1100 1000 0000 …
    S = 0
    E = 1000 0011 = 131; unbiased: 131 – 127 = 4
    Mop2 = 1.10010000…


Multiplication basics
  Base 10 example: 3 x 10^2 * 1.1 x 10^2 = 3.3 x 10^4
  Have 2 numbers A x 2^ea and B x 2^eb
  Multiply and get result = A*B x 2^(ea+eb)


So here
  The signs of both operands are +, so the result is +
  Exponent addition
    Both exponents are biased as stored
    If you add the stored binary exponents you need to subtract the extra bias of 127
    Or, using pencil and paper (or PowerPoint), you can just add the unbiased exponent of one operand to the biased exponent of the other
    Here we have 133 + 4 = 137 = 1000 1001


The mantissas
Do a binary multiplication (all-zero partial products omitted)

        1.1001
      x 1.1001
      --------
         11001
      11001
     11001
    ----------
    1001110001

Adjusting for the binary point (4 + 4 = 8 fraction bits) we have 10.01110001


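Final result
  Exponent is 137 biased, or 10 unbiased
  Mantissa is 10.01110001
  Adjusted for exponent: 1001 1100 0100
  Value is 2048 + 256 + 128 + 64 + 4
  Or 2304 + 128 + 68 = 2432 + 68 = 2500
  And we were multiplying 100 * 25
  To store the result, the mantissa is first normalized to 1.001110001 x 2^11, giving a biased exponent of 138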
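The multiplication steps can be sketched in the same restricted style. The function name `fp_mul_positive` is an assumption for this illustration; it handles only positive, normalized single-precision operands, truncates the product instead of rounding, and ignores overflow, underflow and the special values. It also performs the final normalization, so the stored exponent comes out as 138 rather than 137.

```python
def fp_mul_positive(a: int, b: int) -> int:
    """Multiply two positive, normalized single-precision bit patterns (truncating)."""
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    ma = (1 << 23) | (a & 0x7FFFFF)      # 1.f as a 24-bit integer (23 fraction bits)
    mb = (1 << 23) | (b & 0x7FFFFF)
    e = ea + eb - 127                    # both stored exponents are biased: remove one bias
    p = ma * mb                          # up to 48-bit product with 46 fraction bits
    if p >> 47:                          # product is 1x.xxx (>= 2): normalize
        p >>= 1
        e += 1
    m = p >> 23                          # truncate back to 24 bits
    return (e << 23) | (m & 0x7FFFFF)    # sign bit is 0; hidden 1 is not stored


print(f"${fp_mul_positive(0x42C80000, 0x41C80000):08X}")  # $451C4000, i.e. 100 * 25 = 2500
```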


Specification of a FPA
Floating Point Add/Subtract Unit Specification
  Inputs in IEEE 754 Double Precision
  Must perform both addition and subtraction
  Must handle the full floating point standard
    Normalized numbers
    Not-a-Number values (NaNs)
    +/- Infinity
    Denormalized numbers


Specifications continued
  Result will be an IEEE 754 Double Precision representation
  Unit will correctly handle the invalid operation of adding +∞ and -∞, which gives NaN per the standard
  Unit latches its inputs into registers from parallel 64-bit data busses
  There is a separate signal line that indicates the operation, add or subtract
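The invalid-operation case can be observed directly in software, since Python floats follow IEEE 754 double precision on common platforms. The sketch below is illustrative only; `is_invalid_add_sub` and its `subtract` parameter are hypothetical names standing in for the add/subtract signal line, not part of the unit's actual interface.

```python
import math


def is_invalid_add_sub(a: float, b: float, subtract: bool) -> bool:
    """True when the effective operation adds infinities of opposite sign."""
    if subtract:
        b = -b                      # a - b is treated as a + (-b)
    return math.isinf(a) and math.isinf(b) and a != b


inf = float("inf")
print(math.isnan(inf + (-inf)))              # True: the result is NaN
print(is_invalid_add_sub(inf, -inf, False))  # True: (+inf) + (-inf)
print(is_invalid_add_sub(inf, inf, True))    # True: (+inf) - (+inf)
print(is_invalid_add_sub(inf, inf, False))   # False: (+inf) + (+inf) is just +inf
```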


Specifications continued
Outputs
  The correctly represented result
  Flags that are output are
    Zero result
    Overflow to infinity from normalized numbers as inputs
    NaN result
    Overshift (result is the larger of the two operands)
    Denormalized result
    Inexact (result was rounded)
    Invalid operation for addition


High level block diagram
Basic architecture interface
  Data – 64-bit A, B, and C busses
  Control signals – Latch, Add/Sub, Asel, Drive
  Condition Flags Output – 7 flag signals
  Clocks – Phi1 and Phi2 (a 2-phase clocked architecture)

[Block diagram: Floating Point Adder Unit with ports Abus, Bbus, Cbus, Flags, Add/Sub, Latch, Asel, Drive, Phi1, Phi2]


Start the VHDL
  The entity interface
  In the next lecture