CHAPTER-III
FPGA-BASED DESIGN OF 32-BIT FPAU
The literature survey in the previous chapter shows that, since the introduction of the FPGA, research and development have produced dramatic improvements in FPGA speed and area efficiency, narrowing the gap between FPGAs and ASICs and making FPGAs a platform of choice for implementing all types of digital circuits. FPGAs hold significant promise as a fast-to-market replacement for ASICs in many present-day applications, since an FPGA can be configured in less than a second and can often be reconfigured. The flexibility and reconfigurability of FPGA-based design, however, come at the cost of three major parameters: delay, area and power. Although many techniques and methods have been proposed by researchers to reduce one or another of these parameters, very little consolidated work has been reported on an approach or methodology that improves the overall performance of FPGA-based digital system design with respect to all three parameters together.
This chapter covers the basic development stages of FPGA-based design, followed by the basics of 32-bit floating point representation. The algorithms, along with flow charts, for the design of an FPGA-based 32-bit FPAU in VHDL that carries out addition/subtraction, multiplication and division are then presented. This design is taken as the base digital system on which the further work is carried out.
3.1 Introduction
After the exhaustive literature survey and review of FPGA-based system design and of the work carried out to improve its overall performance, and in line with the stated objectives of the research work, a comprehensive FPGA-based digital system is to be designed that will serve as the base digital system for the further work. For this purpose an FPGA-based 32-bit FPAU is designed and presented in this chapter.
Floating point operations find intensive application in various fields that require high-precision computation, owing to their great dynamic range, high precision and simple operation rules, and much attention has been paid to the design and research of floating point processing units. The requirement for high-speed hardware floating point arithmetic units has become increasingly exigent with the growing demands of high-speed digital signal processing and scientific computation. Implementing floating point arithmetic in a floating point high level language is easy and convenient, but implementing the arithmetic in hardware is considerably more difficult. With the development of very large scale integration (VLSI) technology, devices such as FPGAs have become excellent options for implementing hardware floating point arithmetic units because of their high integration density, low price, high performance and flexibility.
Floating-point implementation on FPGAs has interested many researchers. The use of custom floating-point formats in FPGAs has been investigated in a long series of works by Belanovic and Leeser (2002), Dido et al (2002), Gaar et al (2002) and Liang, Tessier and Mencer (2003). The earliest work on IEEE floating-point by Fagin and Renard (1994) focused on single precision and, although found to be feasible, was extremely slow. Eventually, Ligon et al (1998) demonstrated that while FPGAs were uncompetitive with CPUs in terms of peak FLOPs, they could provide competitive sustained floating-point performance. Since then, Luo and Martonosi (2000), Belanovic and Leeser (2002), Tessier and Mencer (2003) and Wang and Nelson (2003) have demonstrated in a variety of works the growing feasibility of IEEE compliant, single precision floating point arithmetic and of other floating-point formats of approximately the same complexity.
Before the algorithms and flow charts for the design of the FPGA-based 32-bit FPAU in VHDL, covering addition/subtraction, multiplication and division, are presented, the basic development stages of FPGA design are described in the following section.
3.2 Development Stages of FPGA
Regardless of the final product, an FPGA designer has to follow the four basic FPGA development stages shown in Fig. 3.1:
i. Design
ii. Simulation
iii. Synthesis
iv. Design Implementation (Place and Route + Bit Stream Generation)
Figure 3.1: Development stages of FPGA
3.2.1 Design
The design process involves conversion of requirements into a format that represents the
desired digital function(s). Common design formats are schematic capture, hardware
description language (HDL), or a combination of the two. Each method has its
advantages and disadvantages but HDLs generally offer the greatest design flexibility.
Schematic capture: Schematic capture is a graphical depiction of a digital design and
shows the actual interconnection between each logic gate that produces the desired output
function(s). Many of these logic gate symbols involve proprietary information which is
available to the designer only through the specific vendor’s component library. It makes
the design unrecognizable by competitors’ FPGA development tools and makes it vendor
dependent. That means, the entire design process has to be repeated if a different vendor
is used. ViewDraw and EASE are examples of schematic capture tools, from Viewlogic and HDL Works respectively. The main advantage of schematic capture is that the graphical
representation is easy to understand. But an increase in cost and time to reproduce a
design for different vendors due to the design’s proprietary nature are its major
drawbacks.
HDL method: Hardware Description Languages (HDLs) use code to represent digital
functions. “Firmware” often refers to the resulting HDL code. Use of HDL codes is a
common and popular approach to FPGA design. One can create the source code with any
text editor. HDLs can be generic (supported by multiple simulation and synthesis tool
sets) like Verilog or VHDL (Very High Speed Integrated Circuit HDL), or vendor
specific like Altera’s Hardware Description Language (AHDL), which is only
recognizable by Altera’s design tool set. There are two writing styles for HDL designs:
structural or behavioral.
Structural HDL firmware is the software equivalent of a schematic capture design. Like
schematic capture, a structural design uses vendor specific components to construct the
desired digital functions. It is again vendor dependent and has the same disadvantages.
Behavioral HDL firmware describes digital functions in generic or abstract terms that are
generally vendor independent. This provides enough flexibility for code reuse in different
vendor’s FPGAs with little or no code modification, giving behavioral designs the advantages of flexibility and time and cost savings. Only components that require vendor specific resources, such as RAM, need to be changed between vendors. VHDL and
Verilog are the most popular HDL languages. VHDL files consist of three main parts:
Library declaration
Entity declaration and
Architecture section.
An optional heading section, containing pertinent information such as the designer’s name, the filename, a brief summary of the code, and a revision history, should also be included, although it is not required by VHDL.
Library declaration - The library declaration is the first section in the source file.
This is where one places the library and package call-out statements. Libraries
and packages define and store components, define signal types, functions,
procedures, and so forth. Packages and libraries may be standardized, such as the
IEEE library, or defined by a user (designer) or vendor. This section is
considered complete once all the required libraries and packages are visible.
Entity declaration - The entity declaration section immediately follows the library
declaration. Each entity has an assigned name. Just as the library declaration
section makes libraries and packages visible to the design, the entity section
makes the I/Os visible to other source files and the design and can represent the
I/Os as physical FPGA pins. VHDL designs can contain one source file or a
hierarchy of multiple files. Hierarchical file structures consist of several files
connected through the signals declared in their entities. All entities must be
associated with an architecture section.
Architecture section - The architecture section is the body of the VHDL source
code and contains the circuit description. The libraries, packages and signals work
together to develop the desired functions. Like the entity, each architecture
section must have an assigned name. The format for declaring the architecture is
the reserved word ‘Architecture’ followed by its name. Moreover, signals not
defined in the entity section are defined in this section.
After defining all the design’s signals, the designer is ready to develop the code that
describes the desired functions. The reserved word ‘Begin’ signifies the start of the next subsection, which combines the concurrent and sequential statements; concurrent statements can update or change a value at any time, and the first signal assignment immediately follows this reserved word. The architecture section closes with the reserved word ‘End’ followed by the architecture’s name.
3.2.2 Simulate or Synthesize
Once the design is complete, there are two options available:
a) simulate and then synthesize
b) synthesize and then simulate.
There is no hard and fast rule stating that one must simulate before synthesis. There are
advantages to each option, and designers must determine which step is most beneficial.
Simulating the design prior to synthesis allows logic errors and design flaws to be
resolved early in the development process. Synthesizing first lets the designer resolve synthesis errors before turning to logic errors and design flaws. Ideally, the designer would perform minimal simulation, leaving the more stringent testing to a code tester. The original code designer should not test his own code because he is less likely to detect specific design flaws, for two reasons:
1. Misinterpretation of requirements; if the designer misunderstood a requirement, he or
she will test and evaluate the design based on that misunderstanding.
2. It is more difficult for a person to find his own errors. A third party generally tests the
code more rigorously and is more eager to find bugs than the original designer.
Regardless of who performs the simulations, the process is the same.
Simulation is an act of verifying the HDL or graphical digital designs prior to actual
hardware validation. The circuit’s input-signal characteristics are described in HDL or in
graphical terms that are then applied to the design. This lets the code tester observe the
outputs’ behavior. It may be necessary to modify the source code during simulation to
resolve any discrepancies, bugs, or errors. Simulation inputs or stimulus are inputs that
mimic realistic circuit I/Os. Stimulus forces the circuit to operate under various
conditions and states. The greatest benefit of stimulus is the ability to apply a wide range
of both valid and invalid input-signal characteristics, test circuit limits, vary signal
parameters (such as pulse width and frequency), and observe output behavior without
damaging hardware. Generally, stimulus is applied to the design in the form of HDL.
Some popular simulators are Mentor Graphics’ ModelSim, Aldec’s Riviera, and Altera’s Quartus II.
There are three levels of simulation:
Register transfer level (RTL)
Functional, and
Gate level.
Each occurs at a specific place in the development process.
RTL follows the design stage; functional follows synthesis and gate level simulation is
executed once the implementation is completed.
Generally, the stimulus developed for the RTL simulation is reusable without
modification for each level of simulation.
3.2.3 Simulation
The initial simulation, performed immediately after the design stage, is the RTL simulation; it only verifies the correctness of the logic, and no realistic timing information is available to the simulator, so no serious timing analysis exists for the design at this stage. The only timing information that can be made available to the simulator is tester generated: like input stimulus, a tester can insert simulated or injected delays into the original HDL design.
Applying test stimulus to the synthesized, optimized netlist produced by a synthesis tool is a functional simulation. Optimized netlists produced by third-party (non-vendor) tools apply estimated delays that produce more realistic simulation output results. The main benefit of performing functional simulation is that it lets the tester verify that the synthesis process hasn’t changed the design.
Gate-level simulation involves applying stimulus to the netlist created by the implementation process. All internal timing delays are included in this netlist, which provides the tester with the most accurate design output. Many, but not all, third-party simulation tools can perform gate-level simulation.
Each level of simulation is performed at the appropriate development stage and offers
various benefits. RTL uncovers logic errors, the functional level verifies that the pre- and
post-synthesis designs are equivalent, and the gate level uncovers timing errors. Opting to
omit simulation and testbenching will generally cost the project additional time and
money. Simulation is valuable and as a guideline, at least 2X the number of hours spent
writing the code should be spent developing and testing the code.
3.2.4 Synthesis
Synthesis is the process that reduces and optimizes the HDL or graphical design logic.
Some third-party synthesis tools are available as a part of the FPGA vendor’s complete
development package. Synplicity’s Synplify and Mentor Graphics’ Leonardo Spectrum,
Precision RTL, and Precision Physical are some examples of third-party synthesis tools.
Xilinx offers ISE Project Foundation, which is a complete development application that
includes a synthesis tool. Altera has Quartus II Integrated Synthesis (QIS).
Although some FPGA vendors offer synthesis, they still recommend using a third-party’s
synthesis tools. The synthesis tool must be set up prior to actually synthesizing the
design. The synthesis process takes this information and the user-defined constraints and
produces the output netlist. A constraints file specifies information like the critical signal
paths and clock speeds. Synthesis can begin after completing set-up. General synthesis
flow for tools involves three steps: creating structural elements, optimizing, and mapping.
Figure 3.2 shows a synthesis flow diagram.
Figure 3.2 Design Synthesis Flow Diagram
The first step in the synthesis process takes the HDL design and compiles it into
structural elements. The next step involves optimizing the design, making it smaller and
faster by removing unnecessary logic and allowing signals to arrive at the inputs or
outputs faster. The goal of the optimizing process is to make the design perform better
without changing the circuit’s functions. The final step in the synthesis process maps or
associates the design to the vendor specific architecture. The mapping process takes the
design and maps or connects it using the architecture of the specific vendor. This means
that the design connects to vendor specific components such as look-up tables and
registers. The optimized netlist is the output of the synthesis process. This netlist may be
produced in one of several formats. EDIF is a general netlist format accepted by most
implementation tools, while ‘.xnf’ format is specific to Xilinx and is only recognized by
Xilinx’s implementation. In addition to the optimized netlist, many synthesis tools like
Synplify will produce a netlist for gate-level simulation and other report files. Stimulus
applied to this netlist instead of the original HDL design produces the functional-level
simulation, which lets the designer verify that the synthesis process hasn’t changed the
design’s functions. At this point, synthesis is complete and ready for the implementation
process. Each FPGA vendor has its own implementation tool, such as Xilinx’s Project Navigator and Altera’s Quartus II.
3.2.5 Design implementation (Place and Route + Bit Stream Generation)
The final stage in the FPGA development process is the design implementation, also
known as place and route (PAR). The placement is done by selecting the optimal position
for each block in a circuit with the basic goal of locating functional blocks such that the
interconnects required to route the signals between them is minimized. As described by
Mak and Hao (2005) it is extremely important to have good placement for FPGA designs
as it directly affects the routability and performance of the design on FPGA. A poor
placement may lead to increased power consumption and lower operating speed. Broadly, FPGA placement algorithms are classified into two categories [Marquardt, Betz and Rose (2000)]:
Routability-driven algorithms, which have the objective of creating a placement
that minimizes the total interconnect.
Timing-driven algorithms, which use timing analysis to identify critical paths
and/or connections and optimize the delay of these connections, in addition to
optimizing for routability.
Routing is the last basic step of the design methodology prior to generating the bitstream that programs the FPGA. It may use only the prefabricated routing resources, such as wire segments, programmable switches and multiplexers, which makes it a tedious process and a challenging task in which to achieve 100% routability.
If the FPGA vendor has a complete development tool that can synthesize the design, little
or no setup is required for PAR. However, if a third-party synthesis tool is used, the
implementation tool must be set up, which involves directing the PAR tool to the
synthesized netlist and possibly a constraint file. The constraint file contains information
such as maximum or minimum timing delays for selected signal(s) and I/O pin
assignments. Pin assignments can be automatic (performed by the tool) or manual
(dictated by the designer). Automatic pin assignment is generally the best option for new
designs, as it lets the tool more effectively route the design without having fixed pin
assignments. It may be necessary to manually assign signals to specific pins to achieve easy board routing, to provide the minimum signal route for timing-critical signals, or to remain compatible with legacy designs.
But regardless of the reason, the designer must make this information available to the
PAR tool, which is done by creating a user constraint file that is used by the PAR tool.
After completing setup, the PAR process can begin. Xilinx’s Foundation or Project
Navigator performs design implementation in three steps: translate, fit, and generate
programming file. Translate involves verifying that the synthesized netlist is consistent
with the selected FPGA architecture and there are no inconsistencies in the constraint file.
Inconsistencies include assigning two different signals to the same pin, assigning a signal to a power or ground pin, or trying to assign a non-existing design signal
to a pin. In such cases the translate step will fail and the implementation process will be
stopped. Translate errors must be corrected and the translation step must be error free before advancing to the fit stage. This step involves taking the constraints file and
netlist and distributing the design logic in the selected FPGA. If the design is too large
and requires more resources than the selected device offers, the fitter will fail and halt the
implementation process. To correct this type of error, replace the current FPGA with a
larger one, re-synthesize, and repeat the PAR for the design. A successful fit stage is
necessary to proceed to the generate-programming-file stage. At this point all timing information is available, and many PAR tools will provide the files necessary for the simulator to perform a timing simulation.
For downloading the design to FPGA, the bitstream is generated as the final step, which
takes the mapped, placed and routed design as input and generates the logic and
interconnects on the target device to implement the intended logic design and layout. The
finally generated programming file can be stored in flash memory, PROMs, or directly
into the FPGA. This process is also called Bit Stream Generation. Joint Test Action
Group (JTAG) and third-party programmers like Data I/O are the two programming
methods that are used to store the programming file in memory. The appropriate format
depends on the FPGA vendor, the programming method and the device used to hold the
programming. In addition to the implementation process and creating the programming
file, several output report files are also created, such as a pad file which contains
information such as signal pin assignment, part number, and part speed.
3.3 Floating Point Architecture
Floating point numbers are one possible way of representing real numbers in binary
format; the IEEE 754 standard presents two different floating point formats, the binary interchange format and the decimal interchange format. This chapter focuses only on the single precision normalized binary interchange format. Figure 3.3 shows the IEEE 754 single
precision binary format representation; it consists of a one bit sign (S), an eight bit
exponent (E), and a twenty three bit fraction (M) or Mantissa.
IEEE standard 32-bit single precision floating point numbers are stored as:
S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM
S: Sign – 1 bit
E: Exponent – 8 bits
M: Mantissa (fraction) – 23 bits
Figure 3.3: IEEE 754 single precision binary format representation
The value of the number V:
• If E=255 and F is nonzero, then V = NaN (“Not a Number”)
• If E=255 and F is zero and S is 1, then V = -Infinity
• If E=255 and F is zero and S is 0, then V = +Infinity
• If 0<E<255 then V = (-1)**S * 2**(E-127) * (1.F) (exponent range = -126 to +127)
• If E=0 and F is nonzero, then V = (-1)**S * 2**(-126) * (0.F) (“denormalized” values)
• If E=0 and F is zero and S is 1, then V = -0
• If E=0 and F is zero and S is 0, then V = 0
An extra bit is added to the mantissa to form what is called the significand. If the
exponent is greater than 0 and smaller than 255, and there is 1 in the MSB of the
significand then the number is said to be a normalized number; in this case the real
number is represented by equation (i):
V = (-1)**S * 2**(E - Bias) * (1.M) ------ (i)
where M = m22*2**(-1) + m21*2**(-2) + m20*2**(-3) + … + m1*2**(-22) + m0*2**(-23) and Bias = 127.
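As an illustration of the decoding rules above, the following Python sketch (a software model for checking the format, not part of the VHDL design; the function and variable names are illustrative) extracts S, E and F from a 32-bit pattern and applies each rule in turn:

```python
import math

def decode_ieee754_single(bits):
    """Decode a 32-bit IEEE 754 single precision pattern (Bias = 127)."""
    s = (bits >> 31) & 0x1           # 1-bit sign S
    e = (bits >> 23) & 0xFF          # 8-bit biased exponent E
    f = bits & 0x7FFFFF              # 23-bit fraction F
    if e == 255:
        if f != 0:
            return math.nan                      # NaN
        return -math.inf if s else math.inf      # +/- Infinity
    if e == 0:
        # denormalized: implicit leading 0, exponent fixed at -126
        return (-1) ** s * 2.0 ** -126 * (f / 2 ** 23)
    # normalized: implicit leading 1 (the hidden bit)
    return (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)

print(decode_ieee754_single(0x3F800000))   # 1.0
print(decode_ieee754_single(0xC0000000))   # -2.0
```

For example, the pattern 0x3F800000 has S=0, E=127 and F=0, so V = (-1)**0 * 2**(127-127) * 1.0 = 1.0.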
3.4 Algorithms for Floating Point Arithmetic Unit
The following sections describe, using flow charts, the algorithms for floating point addition/subtraction, multiplication and division that form the basis for the VHDL code implementing the 32-bit floating point arithmetic unit.
3.4.1 Floating Point Addition / Subtraction
The algorithm for floating point addition is explained through the flow chart in Figure 3.4. While adding two floating point numbers, two cases may arise. Case I: both numbers have the same sign, i.e. both are +ve or both are –ve; the sign bits (MSBs) of the two numbers are then both 1 or both 0. Case II: the numbers have different signs, i.e. one is +ve and the other is –ve; the sign bit of one number is 1 and that of the other is 0.
Case I: - When both numbers are of same sign
Step 1:- Enter two numbers N1 and N2. E1, S1 and E2, S2 represent the exponent and significand of N1 and N2 respectively.
Step 2:- Check if E1 or E2 = ‘0’. If yes, set the hidden bit of N1 or N2 to zero. If not, check if E2 > E1; if yes, swap N1 and N2, and if E1 > E2, keep N1 and N2 as they are since there is no need to swap.
Step 3:- Calculate difference in exponents d=E1-E2. If d = ‘0’ then there is no need of
shifting the significand. If d is more than ‘0’ say ‘y’ then shift S2 to the right by an
amount ‘y’ and fill the left most bits by zero. Shifting is done through hidden bit.
Step 4:- Amount of shifting i.e. ‘y’ is added to exponent of N2 value. New exponent
value of E2= (previous E2 + ‘y’). Now result is in normalize form because E1 = E2.
Step 5:- Check if N1 and N2 have different sign, if ‘no’;
Step 6:- Add the significands of 24 bits each including hidden bit S=S1+S2.
Step 7:- Check if there is a carry out in the significand addition. If yes, add ‘1’ to the exponent value of either E1 or the new E2. After the addition, shift the overall result of the significand addition to the right by one, making the MSB of S ‘1’ and dropping the LSB of the significand.
Figure 3.4: Flow Chart for 32-bit floating point Addition/Subtraction
Step 8:- If there is no carry out in step 6, then previous exponent is the real exponent.
Step 9:- Sign of the result i.e. MSB = MSB of either N1 or N2.
Step 10:- Assemble result into 32 bit format excluding 24th bit of significand i.e. hidden
bit.
Case II: - When both numbers are of different sign
Step 1, 2, 3 & 4 are same as done in case I.
Step 5:- Check if N1 and N2 have different sign, if ‘Yes’;
Step 6:- Take 2’s complement of S2 and then add it to S1 i.e. S=S1+(2’s complement of
S2).
Step 7:- Check if there is carry out in significand addition. If yes; then discard the carry
and also shift the result to left until there is ‘1’ in MSB and also count the amount of
shifting say ‘z’.
Step 8:- Subtract ‘z’ from exponent value either from E1 or E2. Now the original
exponent is E1-‘z’. Also append the ‘z’ amount of zeros at LSB.
Step 9:- If there is no carry out in step 6 then MSB must be ‘1’ and in this case simply
replace ‘S’ by 2’s complement.
Step 10:- Sign of the result, i.e. MSB = sign of the larger number, either the MSB of N1 or the MSB of N2.
Step 11:- Assemble result into 32 bit format excluding 24th bit of significand i.e. hidden
bit.
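The Case I and Case II steps above can be modeled in software as follows. This is a hedged Python sketch of the algorithm only, not the VHDL implementation: the triple representation (sign, biased exponent, 24-bit significand including the hidden bit), the function name, and the omission of zero and rounding handling are illustrative assumptions.

```python
def fp_add(n1, n2):
    """Model of the addition/subtraction algorithm on
    (sign, biased exponent, 24-bit significand incl. hidden bit) triples.
    Zero results and rounding are omitted for brevity."""
    (s1, e1, m1), (s2, e2, m2) = n1, n2
    # Step 2: swap so that N1 holds the larger exponent
    if e2 > e1:
        (s1, e1, m1), (s2, e2, m2) = (s2, e2, m2), (s1, e1, m1)
    # Steps 3-4: align by right-shifting the smaller significand d places
    d = e1 - e2
    m2 >>= d
    e = e1
    if s1 == s2:                          # Case I: same sign
        s = s1
        m = m1 + m2                       # Step 6: add significands
        if m >> 24:                       # Step 7: carry out
            m >>= 1                       # shift right, bump exponent
            e += 1
    else:                                 # Case II: different signs
        m = m1 + ((~m2 + 1) & 0xFFFFFF)   # Step 6: add 2's complement of S2
        if m >> 24:                       # Step 7: carry out -> |N1| >= |N2|
            m &= 0xFFFFFF                 # discard the carry
            s = s1                        # sign of the larger number
        else:                             # Step 9: no carry -> re-complement
            m = (~m + 1) & 0xFFFFFF
            s = s2
        while m and not (m >> 23):        # Step 8: normalize by left shifts
            m <<= 1
            e -= 1
    return (s, e, m & 0xFFFFFF)

# 1.5 + 1.0: equal exponents, same sign, carry out -> 1.25 * 2^1
print(fp_add((0, 127, 0xC00000), (0, 127, 0x800000)))
```

The Case I carry-out path corresponds to Step 7, and the Case II re-complement and left-shift paths correspond to Steps 8 and 9.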
In this algorithm three 8-bit comparators, one 24-bit and two 8-bit adders, two 8-bit
subtractors, two shift units and one swap unit are required in the design.
First 8-bit comparator is used to compare the exponent of two numbers. If
exponents of two numbers are equal then there is no need of shifting. Second 8-bit
comparator compares exponent with zero. If the exponent of any number is zero
set the hidden bit of that number zero. Third comparator is required to check
whether the exponent of second number is greater than first number. If the
exponent of second number is greater than first number then the numbers are
swapped.
One subtractor is required to compute the difference between the 8 bit exponents
of two numbers. Second subtractor is required in case both the numbers are of
different sign and is used to subtract the carry from exponent if carry appears after
addition of the significands of two numbers.
One 24-bit adder is required to add the 24-bit significands of two numbers. One 8-
bit adder is required, in case both the numbers are of the same sign, and is used to add
the carry to the exponent, if carry appears after addition of the significands of two
numbers. Second 8-bit adder is used to add the amount of shifting to the exponent
of smaller number.
One swap unit is required to swap the numbers if N2 is greater than N1. Swapping
is normally done by taking the third variable. Two shift units are required: one is
for shift left and other is for shift right.
3.4.2 Floating Point Multiplication
The algorithm for floating point multiplication is explained through the flow chart in Figure 3.5. Let N1 and N2 be normalized operands represented by S1, M1, E1 and S2, M2, E2 as their respective sign bit, mantissa (significand) and exponent. Basically the following four steps are used for floating point multiplication.
1. Multiply significands, add exponents, and determine sign
M = M1 * M2
E = E1 + E2 - Bias
S = S1 XOR S2
2. Normalize Mantissa M (Shift left or right by 1) and update exponent E
3. Round off the result to fit in the available bits
4. Determine exception flags and special values for overflow and underflow.
Sign Bit Calculation: The result of multiplication has a negative sign if exactly one of the multiplied numbers is negative, which is obtained by XORing the signs of the two inputs.
Exponent addition is done through an unsigned adder that adds the exponent of the first input to the exponent of the second input and then subtracts the Bias (127) from the addition result (i.e. E1 + E2 - Bias); the result of this stage can be called the intermediate exponent. Significand multiplication is done by multiplying the unsigned significands and placing the binary point in the multiplication product; the result can be called the intermediate product (IP). The unsigned significand multiplication is done on 24 bits.
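These multiplication stages can be modeled with the following Python sketch. It is an illustrative model of the algorithm, not the VHDL code; the triple format and the use of truncation in place of proper rounding are assumptions made for brevity.

```python
def fp_mul(n1, n2):
    """Model of the four multiplication steps on
    (sign, biased exponent, 24-bit significand) triples."""
    (s1, e1, m1), (s2, e2, m2) = n1, n2
    s = s1 ^ s2                  # sign = S1 XOR S2
    e = e1 + e2 - 127            # intermediate exponent E1 + E2 - Bias
    ip = m1 * m2                 # 48-bit intermediate product
    if ip >> 47:                 # product in [2, 4): normalize right, E = E + 1
        ip >>= 1
        e += 1
    m = ip >> 23                 # keep 24 bits (truncation in place of rounding)
    return (s, e, m & 0xFFFFFF)

# (1.5 * 2^1) * (1.5 * 2^0) = 2.25 * 2^1 = 1.125 * 2^2
print(fp_mul((0, 128, 0xC00000), (0, 127, 0xC00000)))
```

Because each 24-bit significand represents a value in [1, 2), the 48-bit intermediate product lies in [1, 4), so at most one right shift is needed to normalize it.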
Figure 3.5: Flow Chart for floating point Multiplication
3.4.3 Floating Point Division
The algorithm for floating point division is explained through the flow chart in Figure 3.6. Let N1 and N2 be normalized operands represented by S1, M1, E1 and S2, M2, E2 as their respective sign bit, mantissa (significand) and exponent, and let x = N1 and d = N2, with the final result q = x/d. Again the following four steps are used for floating point division.
Figure 3.6: Flow Chart for floating point Division (q = x/d; N1 = x and N2 = d)
1. Divide significands, subtract exponents, and determine sign
M = M1 / M2
E = E1 - E2
S = S1 XOR S2
2. Normalize Mantissa M (Shift left or right by 1) and update exponent E
3. Round off the result to fit in the available bits
4. Determine exception flags and special values
The sign bit calculation, mantissa division, exponent subtraction (no bias subtraction is needed here), rounding of the result to fit in the available bits, and normalization are done in a similar way to that described for multiplication.
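For completeness, a corresponding Python sketch of the division steps is given below (illustrative only, not the VHDL code). Note one assumption made explicit here: when the stored exponents are in biased form, the bias must be added back after the subtraction E1 - E2 so that the result stays in biased form, which the sketch does. Rounding and exception handling are omitted.

```python
def fp_div(n1, n2):
    """Model of the four division steps (q = x/d) on
    (sign, biased exponent, 24-bit significand) triples."""
    (s1, e1, m1), (s2, e2, m2) = n1, n2
    s = s1 ^ s2                  # sign = S1 XOR S2
    e = e1 - e2 + 127            # subtract exponents, re-apply the bias
    # pre-scale the dividend so integer division keeps 23 fraction bits
    m = (m1 << 23) // m2
    if not (m >> 23):            # quotient in (0.5, 1): normalize left, E = E - 1
        m <<= 1
        e -= 1
    return (s, e, m & 0xFFFFFF)

# (1.5 * 2^1) / (1.0 * 2^0) = 1.5 * 2^1
print(fp_div((0, 128, 0xC00000), (0, 127, 0x800000)))
```

Since both significands lie in [1, 2), the quotient lies in (0.5, 2), so at most one left shift is needed to normalize it.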
3.5 Design of 32-bit FPAU using VHDL
VHDL is an acronym for VHSIC Hardware Description Language, where VHSIC is itself
an abbreviation for Very High Speed Integrated Circuit. It is a hardware description
language that can be used to model a digital system at many levels of abstraction, ranging
from the algorithm level down to the gate level. It can be considered an integrated
amalgamation of several description styles: sequential, concurrent, net-list, timing
specification and waveform generation. VHDL therefore has constructs that enable the
designer to express the concurrent or sequential behavior of a digital system, with or
without timing. It also allows modeling a system as an interconnection of components.
All these constructs may be combined to provide a comprehensive description of the
system in a single model.
It was developed for the Department of Defense (DoD) in 1981. It has the following
capabilities and features that differentiate it from other hardware description languages:
- It can be used as a communication medium between different Computer Aided
Design (CAD) and Computer Aided Engineering (CAE) tools.
- It can be used as an exchange medium between chip vendors and CAD tool users.
- It supports flexible design methodologies, i.e. top-down, bottom-up or mixed.
- It supports hierarchy, i.e. a digital system can be modeled as a set of
interconnected components, and each component can further be modeled as a set of
interconnected sub-components.
- It supports both synchronous and asynchronous timing models.
- Models developed using this language are portable, as it is an IEEE and ANSI
standard.
- It supports three basic descriptive styles, i.e. structural, data-flow and
behavioral.
- It is not technology specific.
- It can be used to describe library components from different vendors.
- It is capable of being synthesized down to gate-level descriptions.
An entity in VHDL is a hardware abstraction of an actual hardware device. To describe
an entity, VHDL provides the following five types of constructs, which are called
design units:
i) Entity declaration: It describes the external view of the entity; an entity is
modeled using an entity declaration and at least one architecture body. It
specifies the name of the entity being modeled and lists the set of interface
ports.
ii) Architecture body: It specifies the internal details of the entity using any of
the modeling styles: structural, data-flow, behavioral or any combination of
these.
iii) Configuration declaration: It is used to select one of the possible architecture
bodies that an entity may have and to bind components used to represent
structure in that architecture body.
iv) Package declaration: It is used to store a set of common declarations such as
components, types, procedures and functions.
v) Package body: It is used to store the definitions of functions and procedures
that are declared in the corresponding package declaration, and also the
complete constant declarations for any deferred constants that appear in the
package declaration.
The complete design of the 32-bit FPAU using VHDL is presented in Appendix 1 (pages
154 to 177), covering 391 lines of VHDL code.
3.6 Conclusions
Due to their great dynamic range, high precision and simple operation rules, floating
point operations have found intensive application in various fields that require
high-precision computation. With the increasing demand for floating point operations in
high-speed digital signal processing and scientific computing, the requirements for
high-speed hardware floating point arithmetic units have become more and more
exigent. While implementing floating point arithmetic is easy and convenient in
high-level languages, implementing it in hardware is considerably more difficult.
Therefore, an FPGA-based digital system for a comprehensive 32-bit Floating Point
Arithmetic Unit (FPAU) has been designed using VHDL as the base digital system
design. On this base design, a systematic approach/methodology will be developed that
can give the best trade-off among the three prime parameters of delay, area and power.
The next chapter is devoted to the synthesis, implementation and testing of this design
on the FPGA platform/device, and to the analysis of the FPGA resources used, the
timing summary and the power estimated for this digital system design.