
CHAPTER-III

FPGA-BASED DESIGN OF 32-BIT FPAU

The literature survey in the last chapter shows that, since the introduction of FPGAs, research and development have produced dramatic improvements in FPGA speed and area efficiency. This has narrowed the gap between FPGAs and ASICs and made FPGAs the platform of choice for implementing a wide range of digital circuits. FPGAs currently hold significant promise as a fast-to-market replacement for ASICs in many applications, because they can be configured in less than a second and can often be reconfigured. However, the flexibility and reconfigurability of FPGA-based design come at a cost in three major parameters: delay, area and power. Researchers have proposed many techniques and methods to reduce one or another of these parameters, but very little consolidated work has been reported on an approach or methodology that improves the overall performance of FPGA-based digital system design with respect to all three.

This chapter covers the basic development stages of FPGA-based design, followed by the basics of 32-bit floating point representation. It then presents the algorithms, with flow charts, for the VHDL design of an FPGA-based 32-bit floating point arithmetic unit (FPAU) that performs addition/subtraction, multiplication and division. This FPAU is taken as the base digital system for the further work.

3.1 Introduction

After an exhaustive literature survey and review of FPGA-based system design and of the work carried out to improve its overall performance, and in line with the stated objectives of this research, a comprehensive FPGA-based digital system must be designed to serve as the base system on which the further work is carried out. For this purpose, an FPGA-based 32-bit FPAU is designed and presented in this chapter.


Floating point operations are used extensively in fields that require high-precision computation, owing to their large dynamic range, high precision and simple operation rules. Considerable attention has therefore been paid to the design of floating point processing units. The demand for high-speed hardware floating point arithmetic units has become more and more pressing as the use of floating point operations in high-speed digital signal processing and scientific computation has grown. Floating point arithmetic is easy and convenient to use in high-level programming languages, but implementing it in hardware is much more difficult. With the development of very large scale integration (VLSI) technology, devices such as FPGAs have become among the best options for implementing floating point hardware arithmetic units, because of their high integration density, low price, high performance and flexibility.

Floating-point implementation on FPGAs has interested many researchers. The use of custom floating-point formats in FPGAs has been investigated in a long series of works by Belanovic and Leeser (2002), Dido et al (2002), Gaar et al (2002) and Liang, Tessier and Mencer (2003). The earliest work on IEEE floating point, by Fagin and Renard (1994), focused on single precision and found it feasible but extremely slow. Later, Ligon et al (1998) demonstrated that while FPGAs were uncompetitive with CPUs in terms of peak FLOPs, they could provide competitive sustained floating-point performance. Since then, Luo and Martonosi (2000), Belanovic and Leeser (2002), Tessier and Mencer (2003) and Wang and Nelson (2003) have demonstrated, in a variety of works, the growing feasibility of IEEE-compliant single precision floating point arithmetic and of other floating-point formats of approximately the same complexity.

Before the algorithms and flow charts for the VHDL design of the FPGA-based 32-bit FPAU for addition/subtraction, multiplication and division are presented, the basic development stages of FPGA design are described in the following section.

3.2 Development Stages of FPGA

Regardless of the final product, an FPGA designer has to go through the following four basic development stages, as shown in Fig. 3.1:

i. Design

ii. Simulation

iii. Synthesis

iv. Design Implementation (Place and Route + Bit Stream Generation)

Figure 3.1: Development stages of FPGA

3.2.1 Design

The design process converts the requirements into a format that represents the desired digital function(s). Common design formats are schematic capture, hardware description language (HDL), or a combination of the two. Each method has its advantages and disadvantages, but HDLs generally offer the greatest design flexibility.

Schematic capture: Schematic capture is a graphical depiction of a digital design that shows the actual interconnection between the logic gates that produce the desired output function(s). Many of these logic gate symbols involve proprietary information that is available to the designer only through the specific vendor's component library. This makes the design unrecognizable by competitors' FPGA development tools and therefore vendor dependent: the entire design process has to be repeated if a different vendor is used. ViewDraw by Viewlogic and EASE by HDL Works are examples of schematic capture tools. The main advantage of schematic capture is that the graphical representation is easy to understand. Its major drawback is the added cost and time needed to reproduce a design for a different vendor, owing to the design's proprietary nature.

HDL method: Hardware description languages (HDLs) use code to represent digital functions, and the resulting HDL code is often referred to as "firmware". Using HDL code is a common and popular approach to FPGA design, and the source code can be created with any text editor. HDLs can be generic (supported by multiple simulation and synthesis tool sets), like Verilog or VHDL (Very High Speed Integrated Circuit HDL), or vendor specific, like Altera's Hardware Description Language (AHDL), which is recognized only by Altera's design tool set. There are two writing styles for HDL designs: structural and behavioral.

Structural HDL firmware is the software equivalent of a schematic capture design. Like schematic capture, a structural design uses vendor-specific components to construct the desired digital functions, so it is also vendor dependent and has the same disadvantages.

Behavioral HDL firmware describes digital functions in generic or abstract terms that are generally vendor independent. This provides enough flexibility to reuse the code in different vendors' FPGAs with little or no modification, which saves time and cost. Only the components that require vendor-specific resources, such as RAM, need to be changed. VHDL and Verilog are the most popular HDLs. A VHDL file consists of three main parts:

• Library declaration
• Entity declaration
• Architecture section

An optional heading section containing pertinent information, such as the designer's name, the filename, a brief summary of the code and a revision history, should also be included, although VHDL itself does not require it.

Library declaration - The library declaration is the first section in the source file. This is where the library and package call-out statements (library and use clauses) are placed. Libraries and packages define and store components, signal types, functions, procedures and so forth. They may be standardized, such as the IEEE library, or defined by a user (designer) or vendor. This section is complete once all the required libraries and packages are visible.

Entity declaration - The entity declaration immediately follows the library declaration, and each entity has an assigned name. Just as the library declaration makes libraries and packages visible to the design, the entity makes the design's I/Os visible to other source files, and these I/Os can represent physical FPGA pins. A VHDL design can consist of one source file or a hierarchy of multiple files; hierarchical file structures consist of several files connected through the signals declared in their entities. Every entity must be associated with an architecture section.

Architecture section - The architecture section is the body of the VHDL source code and contains the circuit description, in which the libraries, packages and signals work together to implement the desired functions. Like the entity, each architecture must have an assigned name. An architecture is declared with the reserved word 'architecture', followed by its name, the reserved word 'of' and the name of the entity it describes. Signals that are not declared in the entity, i.e. internal signals, are declared in this section.

After declaring all of the design's signals, the designer is ready to develop the code that describes the desired functions. The reserved word 'begin' marks the start of this subsection, which combines concurrent and sequential statements. Concurrent statements can update or change their values at any time, whenever one of their input signals changes; a signal assignment placed directly after 'begin' is an example. Sequential statements are written inside process statements and execute in order. The architecture section closes with the reserved word 'end' followed by the architecture's name.

3.2.2 Simulate or synthesize

Once the design is complete, there are two options available:

a) simulate and then synthesize

b) synthesize and then simulate.

There is no hard and fast rule stating that one must simulate before synthesis. Each option has advantages, and designers must determine which order is most beneficial. Simulating the design before synthesis allows logic errors and design flaws to be resolved early in the development process, whereas synthesizing first lets the designer resolve synthesis errors before logic errors and design flaws. Ideally, the designer performs minimal simulation and leaves the more stringent testing to a separate code tester. The original designer should not be the only one to test his or her own code, because he or she is less likely to detect specific design flaws such as:

1. Misinterpretation of requirements: if the designer misunderstood a requirement, he or she will test and evaluate the design based on that misunderstanding.

2. It is more difficult for a person to find his or her own errors. A third party generally tests the code more rigorously and is more eager to find bugs than the original designer.

Regardless of who performs the simulations, the process is the same.

Simulation is the act of verifying HDL or graphical digital designs before actual hardware validation. The circuit's input-signal characteristics are described in HDL or in graphical terms and then applied to the design, which lets the code tester observe the behavior of the outputs. It may be necessary to modify the source code during simulation to resolve any discrepancies, bugs or errors. Simulation inputs, or stimulus, mimic realistic circuit I/Os and force the circuit to operate under various conditions and states. The greatest benefit of stimulus is the ability to apply a wide range of both valid and invalid input-signal characteristics, test circuit limits, vary signal parameters (such as pulse width and frequency) and observe output behavior without damaging hardware. Stimulus written in HDL and applied to the design is generally referred to as a testbench.

Some popular simulators are Mentor Graphics' ModelSim, Aldec's Riviera and Altera's Quartus II.

There are three levels of simulation:

• Register transfer level (RTL)
• Functional
• Gate level

Each occurs at a specific point in the development process: RTL simulation follows the design stage, functional simulation follows synthesis, and gate-level simulation is executed once implementation is complete. Generally, the stimulus developed for RTL simulation can be reused without modification at each level of simulation.

3.2.3 Simulation

The initial simulation, performed immediately after the design stage, is the RTL simulation. It verifies only the correctness of the logic, and no realistic timing information is available to the simulator, so RTL simulation yields no meaningful timing results for the design. The only timing information available to the simulator is generated by the tester: like input stimulus, a tester can insert simulated delays into the original HDL design.

Applying test stimulus to the synthesized or optimized netlist produced by a synthesis tool is called functional simulation. Optimized netlists produced by third-party (non-vendor) synthesis tools include estimated delays, which produce more realistic simulation results. The main benefit of functional simulation is that it lets the tester verify that the synthesis process has not changed the design.

Gate-level simulation applies stimulus to the netlist created by the implementation process. This netlist includes all internal timing delays and therefore gives the tester the most accurate view of the design's outputs. Many, but not all, third-party simulation tools can perform gate-level simulation.

Each level of simulation is performed at the appropriate development stage and offers different benefits: RTL simulation uncovers logic errors, functional simulation verifies that the pre- and post-synthesis designs are equivalent, and gate-level simulation uncovers timing errors. Omitting simulation and testbenching will generally cost the project additional time and money. Simulation is valuable, and as a guideline, at least twice as many hours should be spent simulating and testing the code as were spent writing it.

3.2.4 Synthesis

Synthesis is the process that reduces and optimizes the HDL or graphical design logic. Some third-party synthesis tools are available as part of an FPGA vendor's complete development package. Synplicity's Synplify and Mentor Graphics' Leonardo Spectrum, Precision RTL and Precision Physical are examples of third-party synthesis tools. Xilinx offers ISE Project Foundation, a complete development application that includes a synthesis tool, and Altera offers Quartus II Integrated Synthesis (QIS). Although some FPGA vendors offer synthesis, they still recommend using third-party synthesis tools.

The synthesis tool must be set up before the design is synthesized. The synthesis process takes this set-up information and the user-defined constraints and produces the output netlist; a constraints file specifies information such as the critical signal paths and clock speeds. Synthesis can begin once set-up is complete. The general synthesis flow involves three steps: creating structural elements, optimizing, and mapping. Figure 3.2 shows a synthesis flow diagram.

Figure 3.2: Design synthesis flow diagram

The first step in the synthesis process compiles the HDL design into structural elements. The next step optimizes the design, making it smaller and faster by removing unnecessary logic and allowing signals to arrive at the inputs or outputs faster. The goal of optimization is to make the design perform better without changing the circuit's functions. The final step maps the design onto the vendor-specific architecture, so that the design connects to vendor-specific components such as look-up tables and registers. The optimized netlist is the output of the synthesis process and may be produced in one of several formats: EDIF is a general netlist format accepted by most implementation tools, while the '.xnf' format is specific to Xilinx and is recognized only by Xilinx's implementation tools. In addition to the optimized netlist, many synthesis tools, such as Synplify, produce a netlist for post-synthesis simulation and other report files. Applying stimulus to this netlist instead of the original HDL design produces the functional-level simulation, which lets the designer verify that the synthesis process has not changed the design's functions. At this point synthesis is complete, and the design is ready for the implementation process. Each FPGA vendor has its own implementation tool, such as Xilinx's Project Navigator and Altera's Quartus II.

3.2.5 Design implementation (Place and Route + Bit Stream Generation)

The final stage in the FPGA development process is design implementation, also known as place and route (PAR). Placement selects the optimal position for each block in a circuit, with the basic goal of locating functional blocks so that the interconnect required to route the signals between them is minimized. As described by Mak and Hao (2005), good placement is extremely important for FPGA designs because it directly affects the routability and performance of the design on the FPGA; a poor placement may lead to increased power consumption and lower operating speed. Broadly, FPGA placement algorithms are classified into two categories [Marquardt, Betz and Rose (2000)]:

• Routability-driven algorithms, which aim to create a placement that minimizes the total interconnect.
• Timing-driven algorithms, which use timing analysis to identify critical paths and/or connections and optimize the delay of these connections, in addition to optimizing for routability.
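A routability-driven placer needs a cost function that estimates the total interconnect from block positions. The following Python sketch assumes one widely used estimate, the half-perimeter of each net's bounding box (HPWL) summed over all nets. The grid coordinates, the netlist representation and the names `hpwl` and `total_wirelength` are hypothetical and are not taken from any particular placement tool or from the cited work.

```python
from typing import Dict, List, Tuple

Site = Tuple[int, int]  # (x, y) grid position of a logic block (hypothetical representation)


def hpwl(pins: List[Site]) -> int:
    """Half-perimeter of the bounding box enclosing all pins of one net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(placement: Dict[str, Site], nets: List[List[str]]) -> int:
    """Interconnect estimate for a placement: the sum of HPWL over every net.

    placement maps block name -> site; each net is the list of block names it connects.
    """
    return sum(hpwl([placement[block] for block in net]) for net in nets)


if __name__ == "__main__":
    nets = [["A", "B"], ["A", "B", "C"]]
    spread = {"A": (0, 0), "B": (3, 4), "C": (1, 1)}
    compact = {"A": (0, 0), "B": (1, 0), "C": (0, 1)}
    # A routability-driven placer would prefer the placement with the lower cost.
    print(total_wirelength(spread, nets), total_wirelength(compact, nets))
```

Moving blocks that share nets closer together lowers this cost, which is what a routability-driven placer searches for; a timing-driven placer would additionally weight the connections that lie on critical paths.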

Routing is the last basic step before the bitstream that programs the FPGA is generated. Because routing can use only the prefabricated routing resources, such as wire segments, programmable switches and multiplexers, it is a tedious process, and achieving 100% routability is a challenging task.

If the FPGA vendor has a complete development tool that can synthesize the design, little or no set-up is required for PAR. However, if a third-party synthesis tool is used, the implementation tool must be set up, which involves directing the PAR tool to the synthesized netlist and possibly a constraint file. The constraint file contains information such as maximum or minimum timing delays for selected signal(s) and I/O pin assignments. Pin assignments can be automatic (performed by the tool) or manual (dictated by the designer). Automatic pin assignment is generally the best option for new designs, as it lets the tool route the design more effectively without fixed pin assignments. It may be necessary to assign signals manually to specific pins to ease board routing, to provide the shortest signal route for timing-critical signals, or to remain compatible with legacy designs.

Regardless of the reason, the designer must make this information available to the PAR tool by creating a user constraint file that the PAR tool reads. After set-up is complete, the PAR process can begin. Xilinx's Foundation or Project Navigator performs design implementation in three steps: translate, fit, and generate programming file. Translate verifies that the synthesized netlist is consistent with the selected FPGA architecture and that there are no inconsistencies in the constraint file, such as two different signals assigned to the same pin, a signal assigned to a power or ground pin, or a non-existent design signal assigned to a pin. In such cases the translate step fails and the implementation process stops. Translate errors must be corrected, and the translate step must be error free, before advancing to the fit stage. The fit stage takes the constraints file and netlist and distributes the design logic in the selected FPGA. If the design is too large and requires more resources than the selected device offers, the fitter fails and halts the implementation process. To correct this type of error, the current FPGA must be replaced with a larger one, the design re-synthesized and PAR repeated. A successful fit stage is necessary to proceed to the generate programming file stage. At this point all timing information is available, and many PAR tools provide the files the simulator needs to perform a timing simulation.

To download the design to the FPGA, the bitstream is generated as the final step. This step, called bit stream generation, takes the mapped, placed and routed design as input and generates the configuration data that implements the intended logic and interconnect on the target device. The generated programming file can be stored in flash memory or PROMs, or loaded directly into the FPGA. Joint Test Action Group (JTAG) programming and third-party programmers such as Data I/O are the two programming methods used to store the programming file in memory. The appropriate file format depends on the FPGA vendor, the programming method and the device used to hold the programming. In addition to the implementation results and the programming file, several output report files are also created, such as a pad file, which contains information such as signal pin assignments, part number and part speed.

3.3 Floating Point Architecture

Floating point numbers are one way of representing real numbers in binary format. The IEEE 754 standard presents two different floating point formats: the binary interchange format and the decimal interchange format. This work focuses only on the single precision normalized binary interchange format. Figure 3.3 shows the IEEE 754 single precision binary format representation, which consists of a one-bit sign (S), an eight-bit exponent (E) and a twenty-three-bit fraction (M), or mantissa.

32-bit single precision floating point numbers in the IEEE standard are stored as:

S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM

S: Sign – 1 bit (bit 31)
E: Exponent – 8 bits (bits 30 to 23)
M: Mantissa (fraction) – 23 bits (bits 22 to 0)

Figure 3.3: IEEE 754 single precision binary format representation

The value V of a number, where F denotes the 23-bit fraction field M, is:

• If E=255 and F is nonzero, then V = NaN ("Not a Number")
• If E=255, F is zero and S is 1, then V = -Infinity
• If E=255, F is zero and S is 0, then V = +Infinity
• If 0<E<255, then V = (-1)**S * 2**(E-127) * (1.F) (exponent range E-127 = -126 to +127)
• If E=0 and F is nonzero, then V = (-1)**S * 2**(-126) * (0.F) (denormalized values)
• If E=0, F is zero and S is 1, then V = -0
• If E=0, F is zero and S is 0, then V = +0

An extra bit, the hidden bit, is added to the mantissa to form what is called the significand. If the exponent is greater than 0 and smaller than 255, and there is a 1 in the MSB of the significand, the number is said to be normalized; in this case the real number is represented by (i):

$$V = (-1)^{S} \times 2^{(E-\mathrm{Bias})} \times (1.M) \qquad \text{(i)}$$

where $1.M = 1 + M$, with

$$M = m_{22}2^{-1} + m_{21}2^{-2} + m_{20}2^{-3} + \cdots + m_{1}2^{-22} + m_{0}2^{-23}$$

and Bias = 127.
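The value rules and equation (i) can be checked with a short Python sketch that splits a 32-bit pattern into S, E and F and evaluates V case by case. The function names are illustrative only; the standard-library `struct` module is used purely as an independent cross-check of the float32 interpretation.

```python
import math
import struct

BIAS = 127  # exponent bias for single precision


def split_fields(word: int) -> tuple:
    """Split a 32-bit pattern into (S, E, F): 1-bit sign, 8-bit exponent, 23-bit fraction."""
    s = (word >> 31) & 0x1
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    return s, e, f


def decode(word: int) -> float:
    """Value V of a 32-bit pattern, following the cases listed above."""
    s, e, f = split_fields(word)
    sign = -1.0 if s else 1.0
    if e == 255:                      # all-ones exponent: infinity or NaN
        return math.nan if f else sign * math.inf
    if e == 0:                        # zero or denormalized: (-1)^S * 2^-126 * (0.F)
        return sign * (f / 2**23) * 2.0**-126
    # normalized, equation (i): (-1)^S * 2^(E - Bias) * (1.F), hidden bit restored
    return sign * (1 + f / 2**23) * 2.0**(e - BIAS)


def reference(word: int) -> float:
    """Cross-check: let the struct module reinterpret the same bits as a float32."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


if __name__ == "__main__":
    word = 0xC0D00000                 # S=1, E=129, F=0x500000
    print(split_fields(word), decode(word), reference(word))
```

For example, `decode(0xC0D00000)` returns -6.5: S = 1, E = 129 and F = 0x500000 (binary .101), so V = -1.625 × 2^(129-127).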


3.4 Algorithms for Floating Point Arithmetic Unit

The following sections describe, with flow charts, the algorithms for floating point addition/subtraction, multiplication and division that form the basis for the VHDL code implementing the 32-bit floating point arithmetic unit.

3.4.1 Floating Point Addition / Subtraction

The algorithm for floating point addition is explained through the flow chart in Figure 3.4. When two floating point numbers are added, two cases may arise. Case I: both numbers have the same sign, i.e. both are positive or both are negative; in this case the MSBs of the two numbers are both 1 or both 0. Case II: the numbers have different signs, i.e. one is positive and the other is negative; in this case the MSB of one number is 1 and that of the other is 0.

Case I: - When both numbers are of the same sign

Step 1:- Enter two numbers N1 and N2. E1, S1 and E2, S2 represent the exponent and significand of N1 and N2, respectively.

Step 2:- Check whether E1 or E2 = '0'. If yes, set the hidden bit of the corresponding number to zero. Then check whether E2 > E1. If yes, swap N1 and N2; otherwise (E1 ≥ E2) keep N1 and N2 as they are, with no need to swap.

Step 3:- Calculate the difference in exponents, d = E1 - E2. If d = '0', there is no need to shift the significand. If d is greater than '0', say 'y', shift S2 to the right by 'y' bit positions and fill the leftmost bits with zeros. The shift includes the hidden bit.

Step 4:- The shift amount 'y' is added to the exponent of N2, so the new exponent is E2 = (previous E2 + 'y'). Both operands are now aligned, because E1 = E2.

Step 5:- Check whether N1 and N2 have different signs; if 'No', continue with Step 6.

Step 6:- Add the 24-bit significands, including the hidden bits: S = S1 + S2.

Step 7:- Check whether there is a carry out of the significand addition. If yes, add '1' to the exponent (E1, or the new E2, which is equal). Then shift the significand sum right by one, so that the carry becomes the MSB of S and the LSB of the significand is dropped.

Figure 3.4: Flow chart for 32-bit floating point addition/subtraction

Step 8:- If there is no carry out of the addition in Step 6, the previous exponent is the real exponent.

Step 9:- The sign of the result, i.e. its MSB, equals the MSB of N1 (which is the same as the MSB of N2).

Step 10:- Assemble the result into 32-bit format, excluding the 24th bit of the significand, i.e. the hidden bit.
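Case I can be modeled in Python on raw 32-bit patterns, with comments keyed to the steps above. This is a behavioral sketch under stated assumptions, not the VHDL design of this chapter: the helper names are hypothetical, bits shifted out during alignment are simply dropped (no guard bits or rounding), and exponent overflow is not checked.

```python
def unpack(word: int) -> tuple:
    """Return (sign, exponent, 24-bit significand) of a 32-bit pattern."""
    sign = (word >> 31) & 1
    exp = (word >> 23) & 0xFF
    hidden = 0 if exp == 0 else 1            # Step 2: hidden bit is 0 when E = 0
    sig = (hidden << 23) | (word & 0x7FFFFF)
    return sign, exp, sig


def pack(sign: int, exp: int, sig: int) -> int:
    """Step 10: assemble S | E | M, dropping the hidden bit (bit 23 of sig)."""
    return (sign << 31) | ((exp & 0xFF) << 23) | (sig & 0x7FFFFF)


def add_same_sign(n1: int, n2: int) -> int:
    s1, e1, sig1 = unpack(n1)                # Step 1
    s2, e2, sig2 = unpack(n2)
    if s1 != s2:
        raise ValueError("Case I expects operands of the same sign")
    if e2 > e1:                              # Step 2: swap so that N1 has the larger exponent
        e1, sig1, e2, sig2 = e2, sig2, e1, sig1
    d = e1 - e2                              # Step 3: exponent difference
    sig2 >>= d                               # shift S2 right by d (shifted-out bits are lost)
    exp = e1                                 # Step 4: E2 + d == E1, operands aligned
    total = sig1 + sig2                      # Step 6: 24-bit + 24-bit addition
    if total >> 24:                          # Step 7: carry out of bit 23
        total >>= 1                          # carry becomes the new MSB, LSB dropped
        exp += 1
    # Step 8: without a carry the exponent is unchanged
    return pack(s1, exp, total)              # Step 9: result sign = common sign
```

For example, for 1.5 (0x3FC00000) + 2.25 (0x40100000), 1.5 is aligned to exponent 1, the significands 1.001 and 0.110 (binary) add to 1.111 with no carry, and the result is 0x40700000, i.e. 3.75.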

Case II: - When both numbers are of different sign

Steps 1, 2, 3 and 4 are the same as in Case I.

Step 5:- Check whether N1 and N2 have different signs; if 'Yes', continue with Step 6.

Step 6:- Take the 2's complement of S2 and add it to S1, i.e. S = S1 + (2's complement of S2).

Step 7:- Check whether there is a carry out of the significand addition. If yes, discard the carry, shift the result left until there is a '1' in the MSB, and count the shift amount, say 'z'.

Step 8:- Subtract 'z' from the exponent (E1, or E2, which is equal after alignment), so the exponent of the result is E1 - 'z'. The 'z' vacated positions at the LSB end are filled with zeros.

Step 9:- If there is no carry out in Step 6, the MSB of S must be '1'. In this case replace S by its 2's complement and then normalize it as in Steps 7 and 8.

Step 10:- The sign of the result, i.e. its MSB, is the sign of the larger number, which can be the MSB of either N1 or N2.

Step 11:- Assemble the result into 32-bit format, excluding the 24th bit of the significand, i.e. the hidden bit.
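Case II can be sketched the same way. In this Python model the 2's complement of S2 is formed in 24 bits, a carry out of the addition means |N1| ≥ |N2|, and a missing carry means the sum must be complemented back before the left normalization of Steps 7 and 8. As before, the names are hypothetical, shifted-out bits are truncated and exponent underflow is not handled; returning exact cancellation as +0 is an assumption of this sketch.

```python
def unpack(word: int) -> tuple:
    """Return (sign, exponent, 24-bit significand with hidden bit) of a 32-bit pattern."""
    sign = (word >> 31) & 1
    exp = (word >> 23) & 0xFF
    hidden = 0 if exp == 0 else 1
    return sign, exp, (hidden << 23) | (word & 0x7FFFFF)


def add_diff_sign(n1: int, n2: int) -> int:
    s1, e1, sig1 = unpack(n1)
    s2, e2, sig2 = unpack(n2)
    if s1 == s2:
        raise ValueError("Case II expects operands of different sign")
    if e2 > e1:                                   # Steps 1-4: swap (signs included) and align
        s1, e1, sig1, s2, e2, sig2 = s2, e2, sig2, s1, e1, sig1
    sig2 >>= e1 - e2
    exp = e1
    # Step 6: S1 + 2's complement of S2. Leaving 2**24 - S2 unmasked keeps the
    # carry detection correct even when alignment shifted S2 down to zero.
    total = sig1 + ((1 << 24) - sig2)
    if total >> 24:                               # Step 7: carry out -> |N1| >= |N2|
        mag = total & 0xFFFFFF                    # discard the carry
        sign = s1                                 # Step 10: sign of the larger number
    else:                                         # Step 9: no carry -> |N2| > |N1|
        mag = (1 << 24) - total                   # replace S by its 2's complement
        sign = s2
    if mag == 0:                                  # exact cancellation gives +0
        return 0
    z = 0
    while not (mag >> 23) & 1:                    # Steps 7-8: shift left until the MSB is 1,
        mag <<= 1                                 # filling the LSBs with zeros
        z += 1
    exp -= z                                      # exponent of result = E1 - z
    return (sign << 31) | ((exp & 0xFF) << 23) | (mag & 0x7FFFFF)   # Step 11
```

For example, 1.0 + (-1.5) produces no carry; complementing gives 0.1 (binary), one left shift (z = 1) normalizes it, and the result is -0.5 (0xBF000000).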

In this algorithm, three 8-bit comparators, one 24-bit adder and two 8-bit adders, two 8-bit subtractors, two shift units and one swap unit are required in the design.

• The first 8-bit comparator compares the exponents of the two numbers; if they are equal, no shifting is needed. The second 8-bit comparator compares each exponent with zero; if the exponent of a number is zero, the hidden bit of that number is set to zero. The third comparator checks whether the exponent of the second number is greater than that of the first; if so, the numbers are swapped.
• One subtractor computes the difference between the 8-bit exponents of the two numbers. The second subtractor is required when the numbers have different signs; it subtracts the left-shift amount 'z' from the exponent when a carry appears after the significands are added (Case II, Steps 7 and 8).
• One 24-bit adder adds the 24-bit significands of the two numbers. One 8-bit adder is required when both numbers have the same sign; it adds the carry to the exponent if a carry appears after the significands are added. The second 8-bit adder adds the shift amount to the exponent of the smaller number.
• One swap unit swaps the numbers if E2 is greater than E1. Swapping is normally done using a third (temporary) variable. Two shift units are required: one for shifting left and the other for shifting right.

3.4.2 Floating Point Multiplication

The algorithm for floating point multiplication is explained through the flow chart in Figure 3.5. Let N1 and N2 be normalized operands, with S1, M1, E1 and S2, M2, E2 as their respective sign bits, mantissas (significands) and exponents. The following four steps are used for floating point multiplication:

1. Multiply the significands, add the exponents and determine the sign:
M = M1 * M2
E = E1 + E2 - Bias
S = S1 XOR S2
2. Normalize the mantissa M (shift left or right by 1) and update the exponent E.
3. Round off the result to fit in the available bits.
4. Determine exception flags and special values for overflow and underflow.
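As a worked illustration of the four steps (the operands are chosen here for illustration and are not from the original design), take N1 = 3.0 = 1.1 (binary) × 2^1, so E1 = 128, and N2 = 5.0 = 1.01 (binary) × 2^2, so E2 = 129:

$$
\begin{aligned}
S &= 0 \oplus 0 = 0 \\
E &= E_1 + E_2 - \mathrm{Bias} = 128 + 129 - 127 = 130 \\
M &= 1.1_2 \times 1.01_2 = 1.111_2 \qquad (1.5 \times 1.25 = 1.875) \\
N_1 \times N_2 &= (-1)^{0} \times 2^{130-127} \times 1.875 = 15
\end{aligned}
$$

Since 1 ≤ M < 2, no normalization shift is needed, and M is exact, so rounding leaves it unchanged. The bias is subtracted once because each biased exponent already contains it: (1 + 127) + (2 + 127) - 127 = 3 + 127.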

Sign bit calculation: The product is negative if exactly one of the operands is negative, so the sign is obtained by XORing the signs of the two inputs.

Exponent addition is done with an unsigned adder that adds the exponent of the first input to that of the second, after which the bias (127) is subtracted from the sum (i.e. E1 + E2 - Bias), since each biased exponent already contains the bias once. The result of this stage is called the intermediate exponent. Significand multiplication multiplies the unsigned significands and places the binary point in the product; the result is called the intermediate product (IP). The unsigned significand multiplication is done on 24 bits (the 23 fraction bits plus the hidden bit), giving a 48-bit intermediate product.
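The four steps and the stage descriptions above can be summarized in a Python sketch operating on 32-bit patterns. It is a behavioral model under assumptions that are not from the original design: rounding is replaced by truncation, zero and denormal operands are treated as zero, NaN and infinity inputs are not handled, and the function name `fp_mul` and its flag strings are hypothetical.

```python
BIAS = 127


def fp_mul(n1: int, n2: int) -> tuple:
    """Multiply two single-precision bit patterns; returns (result_word, flag)."""
    s1, e1, f1 = (n1 >> 31) & 1, (n1 >> 23) & 0xFF, n1 & 0x7FFFFF
    s2, e2, f2 = (n2 >> 31) & 1, (n2 >> 23) & 0xFF, n2 & 0x7FFFFF
    s = s1 ^ s2                                   # sign: S = S1 XOR S2
    if e1 == 0 or e2 == 0:                        # zero (or denormal, flushed to zero) operand
        return s << 31, "ok"
    m1 = (1 << 23) | f1                           # 24-bit significands with hidden bit
    m2 = (1 << 23) | f2
    e = e1 + e2 - BIAS                            # intermediate exponent
    ip = m1 * m2                                  # 48-bit intermediate product, value = ip / 2**46
    if ip >> 47:                                  # product in [2, 4): shift right by 1, E = E + 1
        ip >>= 1
        e += 1
    m = (ip >> 23) & 0x7FFFFF                     # keep 23 fraction bits below the hidden bit (truncation)
    if e >= 255:                                  # exponent overflow -> signed infinity
        return (s << 31) | (0xFF << 23), "overflow"
    if e <= 0:                                    # exponent underflow -> signed zero
        return s << 31, "underflow"
    return (s << 31) | (e << 23) | m, "ok"
```

The 48-bit intermediate product has its binary point after bit 46, so a set bit 47 signals a product in [2, 4) that needs a right shift. For instance, 1.5 × 1.5 gives 10.01 (binary), which is normalized to 1.001 × 2^1, i.e. 2.25 (0x40100000).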

Figure 3.5: Flow chart for floating point multiplication

3.4.3 Floating Point Division

The algorithm for floating point division is explained through the flow chart in Figure 3.6. Let N1 and N2 be normalized operands, with S1, M1, E1 and S2, M2, E2 as their respective sign bits, mantissas (significands) and exponents. Taking x = N1 and d = N2, the final result is q = x/d. Again, the following four steps are used for floating point division:

Figure 3.6: Flow chart for floating point division (q = x/d; N1 = x and N2 = d)

1. Divide the significands, subtract the exponents and determine the sign:
M = M1 / M2
E = E1 - E2 + Bias
S = S1 XOR S2
2. Normalize the mantissa M (shift left or right by 1) and update the exponent E.
3. Round off the result to fit in the available bits.
4. Determine exception flags and special values.

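The sign bit calculation, mantissa division, exponent subtraction, rounding off of the result to fit in the available bits, and normalization are done in a similar way as described for multiplication. Unlike multiplication, the bias is not subtracted here: the difference E1 - E2 of two biased exponents is already unbiased, so the bias is added back to obtain the biased exponent of the result.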
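The four division steps can likewise be illustrated with a bit-level software reference model. The Python sketch below is not the VHDL design of Appendix 1. It assumes normalized single-precision inputs (subnormals flushed to zero; infinities and NaNs not handled), uses truncation as the rounding mode, computes the exponent as E1 - E2 + Bias because both inputs carry biased exponents, and signals an out-of-range exponent by raising OverflowError. The names fields, assemble and fp_div are hypothetical.

```python
import struct

BIAS = 127  # IEEE 754 single-precision exponent bias


def fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, 24-bit significand); subnormals flush to 0."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    e = (bits >> 23) & 0xFF
    m = 0 if e == 0 else (bits & 0x7FFFFF) | 0x800000  # implicit leading 1
    return bits >> 31, e, m


def assemble(s: int, e: int, m: int) -> float:
    """Pack final S, E and 24-bit M into a 32-bit word and decode it."""
    bits = (s << 31) | (e << 23) | (m & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_div(x: float, d: float) -> float:
    s1, e1, m1 = fields(x)
    s2, e2, m2 = fields(d)
    if m2 == 0:
        raise ZeroDivisionError("divisor is zero")
    s = s1 ^ s2                      # step 1: S = S1 XOR S2
    if m1 == 0:                      # M = 0: return a signed zero
        return assemble(s, 0, 0)
    e = e1 - e2 + BIAS               # step 1: E1 - E2 is unbiased, add the bias back
    q = (m1 << 24) // m2             # step 1: floor(M1/M2 * 2^24), M1/M2 in (0.5, 2)
    if q >> 24:                      # step 2: M >= 1, drop the extra low-order bit
        m = q >> 1
    else:                            # step 2: M < 1, keep the extra bit (left shift)
        m = q                        #         and set E = E - 1
        e -= 1
    # step 3: the floor division above truncates (assumption: round toward zero)
    if not 1 <= e <= 254:            # step 4: exponent overflow/underflow flag
        raise OverflowError(f"exponent {e} out of range")
    return assemble(s, e, m)
```

Because M1/M2 lies in (0.5, 2), the quotient is computed with one extra low-order bit; when M < 1 that bit is kept, which is equivalent to the left shift with E = E - 1, and otherwise it is dropped.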

3.5 Design of 32-bit FPAU using VHDL

VHDL is an acronym for VHSIC Hardware Description Language, where VHSIC stands for Very High Speed Integrated Circuit. It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level. It can be regarded as an amalgamation of the following languages: sequential, concurrent, net-list, timing specification and waveform generation. VHDL therefore has constructs that express the concurrent or sequential behavior of a digital system, with or without timing. It also allows the system to be modeled as an interconnection of components. All these constructs may be combined to provide a comprehensive description of the system in a single model.

It was developed by the Department of Defense (DoD) in 1981. It has the following capabilities and features that differentiate it from other hardware description languages:

It can be used as a communication medium between different Computer Aided Design (CAD) and Computer Aided Engineering (CAE) tools.

It can be used as an exchange medium between chip vendors and CAD tool users.

It supports flexible design methodologies, i.e. top-down, bottom-up or mixed.

It supports hierarchy, i.e. a digital system can be modeled as a set of interconnected components, and each component can further be modeled as a set of interconnected sub-components.

It supports both synchronous and asynchronous timing models.

Models developed using this language are portable, as it is an IEEE and ANSI standard.

It supports three basic description styles, i.e. structural, data flow and behavioral.

It is not technology specific.

It can be used to describe library components from different vendors.

It is capable of being synthesized to gate level descriptions.

In VHDL, an entity is an abstraction of an actual hardware device. To describe an entity, VHDL provides the following five types of constructs, called design units:

i) Entity declaration: It describes the external view of the entity. It specifies the name of the entity being modeled and lists the set of interface ports. An entity is modeled using an entity declaration and at least one architecture body.

ii) Architecture body: It specifies the internal details of the entity using any of the modeling styles: structural, data-flow, behavioral or any combination of these.

iii) Configuration declaration: It is used to select one of the possible architecture bodies that an entity may have and to bind the components used to represent structure in that architecture body.

iv) Package declaration: It is used to store a set of common declarations such as components, types, procedures and functions.

v) Package body: It is used to store the definitions of the functions and procedures declared in the corresponding package declaration, and also the complete constant declarations for any deferred constants that appear in the package declaration.

The complete VHDL design of the 32-bit FPAU, comprising 391 lines of VHDL code, is presented in Appendix 1 (pages 154 to 177).

3.6 Conclusions

Owing to their wide dynamic range, high precision and simple operation rules, floating point operations have found intensive application in various fields that require high-precision computation. With the growing use of floating point operations in high-speed signal processing and scientific computation, the requirements on high-speed hardware floating point arithmetic units have become increasingly demanding. Floating point arithmetic is easy and convenient to use in high level languages, but implementing it in hardware is considerably more difficult. Therefore, a comprehensive FPGA-based 32-bit Floating Point Arithmetic Unit (FPAU) has been designed using VHDL as the base digital system. A systematic approach/methodology that gives the best trade-off among the three prime parameters of delay, area and power will be developed and applied to this designed system. The next chapter is devoted to the synthesis, implementation and testing of this design on the FPGA platform/device, and to the analysis of the FPGA resources used, the timing summary and the estimated power of this digital system design.
