high bit rate_mul
CHAPTER-1
INTRODUCTION TO VLSI DOMAIN
1.1 VLSI DESIGN:
The complexity of the VLSI circuits being designed and used today makes a manual
approach to design impractical. Design automation is the order of the day. With the
rapid technological developments of the last two decades, the status of VLSI
technology is characterized by the following:
• A steady increase in the size and hence the functionality of the ICs.
• A steady reduction in feature size and hence increase in the speed of operation as well as gate or transistor density.
• A steady improvement in the predictability of circuit behavior.
• A steady increase in the variety and size of software tools for VLSI design.
The above developments have resulted in a proliferation of approaches to VLSI design.
1.2 HISTORY OF VLSI:
VLSI began in the 1970s when complex semiconductor and communication
technologies were being developed. The microprocessor is a VLSI device. The term is
no longer as common as it once was, as chips have increased in complexity into the
hundreds of millions of transistors.
This is the field which involves packing more and more logic devices into
smaller and smaller areas. VLSI circuits can now be put into a space only a few
millimeters across. VLSI circuits are everywhere: our computers, our cars, our brand-new
state-of-the-art digital cameras, our cell phones, and so on.
1.3 VARIOUS INTEGRATIONS:
In the early days of integrated circuits, only a few transistors could be placed on a
chip, because the scale used was large owing to the contemporary technology, and
manufacturing yields were low by today's standards. As the degree of integration was
small, the design was done easily. Over time, millions, and today billions, of
transistors could be placed on one chip, and making a good design became a task to
be planned thoroughly.
1.3.1 SSI TECHNOLOGY:
The first integrated circuits contained only a few transistors. Called "small-scale
integration" (SSI), digital circuits containing transistors numbering in the tens
provided, for example, a few logic gates, while early linear ICs such as the Plessey
SL201 or the Philips TAA320 had as few as two transistors. The term Large Scale
Integration was first used by IBM scientist Rolf Landauer when describing the
theoretical concept; from there came the terms SSI, MSI, VLSI, and ULSI.
1.3.2 MSI TECHNOLOGY:
The next step in the development of integrated circuits, taken in the late
1960s, introduced devices which contained hundreds of transistors on each chip,
called "medium-scale integration" (MSI).
They were attractive economically because while they cost little more to
produce than SSI devices, they allowed more complex systems to be produced using
smaller circuit boards, less assembly work (because of fewer separate components),
and a number of other advantages.
1.3.3 LARGE SCALE INTEGRATION:
Further development, driven by the same economic factors, led to "large-scale
integration" (LSI) in the mid 1970s, with tens of thousands of transistors per chip.
Integrated circuits such as 1K-bit RAMs, calculator chips, and the first
microprocessors, that began to be manufactured in moderate quantities in the early
1970s, had under 4000 transistors. True LSI circuits, approaching 10,000 transistors,
began to be produced around 1974, for computer main memories and second-
generation microprocessors.
1.3.4 VLSI:
The final step in the development process, starting in the 1980s and continuing
through the present, was very-large-scale integration (VLSI), which continues beyond
several billion transistors as of 2009. In 1986 the first one-megabit RAM chips were introduced,
which contained more than one million transistors. Microprocessor chips passed the
million-transistor mark in 1989 and the billion-transistor mark in 2005. The trend
continues largely unabated, with chips introduced in 2007 containing tens of billions
of memory transistors.
VLSI DESIGN FLOW:
Fig 2.1 VLSI design flow (flowchart blocks: Start, Design Entry, Pre-layout Simulation,
Logic Synthesis, System Partitioning, Post-layout Simulation, Floor Planning, Placement,
Circuit Extraction, Routing, Finish)
1.4 ULSI, WSI, SOC and 3D-IC:
To reflect further growth in complexity, the term ULSI, which stands for
"ultra-large-scale integration", was proposed for chips with a complexity of more than 1
million transistors. Wafer-scale integration (WSI) is a system of building very large
integrated circuits that uses an entire silicon wafer to produce a single "super-chip".
Through a combination of large size and reduced packaging, WSI could lead to
dramatically reduced costs for some systems.
A system-on-a-chip (SOC) is an integrated circuit in which all the
components needed for a computer or other system are included on a single chip. The
design of such a device can be complex and costly, and building disparate
components on a single piece of silicon may compromise the efficiency of some
elements. However, these drawbacks are offset by lower manufacturing and assembly
costs and by a greatly reduced power budget: because signals among the components
are kept on-die, much less power is required.
A three-dimensional integrated circuit (3D-IC) has two or more layers of active
electronic components that are integrated both vertically and horizontally into a single
circuit, with lower power consumption.
1.5 VLSI DESIGN FLOW AND ITS DESCRIPTION:
The design at the behavioral level is to be elaborated in terms of known and
acknowledged functional blocks. It forms the next detailed level of design description.
Once again the design is to be tested through simulation and iteratively corrected for
errors. The elaboration can be continued one or two steps further. It leads to a detailed
design description in terms of logic gates and transistor switches.
Optimization
The circuit at the gate level – in terms of the gates and flip-flops – can be
redundant in nature. The same can be minimized with the help of minimization tools.
The step is not shown separately in the figure. The minimized logical design is
converted to a circuit in terms of the switch level cells from standard libraries
provided by the foundries. The cell based design generated by the tool is the last step
in the logical design process; it forms the input to the first level of physical design.
Simulation
The design descriptions are tested for their functionality at every level:
behavioral, data flow, and gate. One has to check here whether all the functions are
carried out as expected and rectify any that are not. All such activities are carried out by the
simulation tool. The tool also has an editor to carry out any corrections to the source
code. Simulation involves testing the design for all its functions, functional sequences,
timing constraints, and specifications. Normally testing and simulation at all the levels,
behavioral to switch level, are carried out by a single tool; the same is identified as
“scope of simulation tool” in Figure 1.1.
Synthesis
With the availability of design at the gate (switch) level, the logical design is
complete. The corresponding circuit hardware realization is carried out by a synthesis
tool. Two common approaches are as follows:
• The circuit is realized through an FPGA. The gate level design description is the
starting point for the synthesis here. The FPGA vendors provide an interface to the
synthesis tool. Through the interface the gate level design is realized as a final circuit.
With many synthesis tools, one can directly use the design description at the data flow
level itself to realize the final circuit through an FPGA. The FPGA route is attractive
for limited volume production or a fast development cycle.
• The circuit is realized as an ASIC. A typical ASIC vendor will have its own library
of basic components like elementary gates and flip-flops. Eventually the circuit is to
be realized by selecting such components and interconnecting them conforming to the
required design. This constitutes the physical design. Being an elaborate and costly
process, a physical design may call for an intermediate functional verification through
the FPGA route. The circuit realized through the FPGA is tested as a prototype. It
provides another opportunity for testing the design closer to the final circuit.
Physical Design
A fully tested and error-free design at the switch level can be the starting point
for a physical design [Baker & Boyce, Wolf]. It is to be realized as the final circuit
using (typically) a million components in the foundry’s library. The step-by-step
activities in the process are described briefly as follows:
• System partitioning: The design is partitioned into convenient compartments or
functional blocks. Often it would have been done at an earlier stage itself and the
software design prepared in terms of such blocks. Interconnection of the blocks is part
of the partition process.
• Floor planning: The positions of the partitioned blocks are planned and the blocks
are arranged accordingly. The procedure is analogous to the planning and
arrangement of domestic furniture in a residence. Blocks with I/O pins are kept close
to the periphery; those which interact frequently or through a large number of
interconnections are kept close together, and so on. Partitioning and floor planning
may have to be carried out and refined iteratively to yield best results.
• Placement: The selected components from the ASIC library are placed in position
on the “Silicon floor.” It is done with each of the blocks above.
• Routing: The components placed as described above are to be interconnected to the
rest of the block. It is done with each of the blocks by suitably routing the
interconnects. Once the routing is complete, the physical design can be taken as
complete. The final mask for the design can be made at this stage and the ASIC
manufactured in the foundry.
Post Layout Simulation
Once the placement and routing are completed, performance specifications
such as silicon area, power consumed, and path delays can be computed. An equivalent
circuit can be extracted at the component level and performance analysis carried out.
This constitutes the final stage, called “verification.” One may have to go through the
placement and routing activity once again to improve performance.
Critical Subsystems
The design may have critical subsystems. Their performance may be crucial to
the overall performance; in other words, to improve the system performance
substantially, one may have to design such subsystems afresh. The design here may
imply redefinition of the basic feature size of the component, component design,
placement of components, or routing done separately and specifically for the
subsystem. A set of masks used in the foundry may have to be done afresh for the
purpose.
CHAPTER 2
INTRODUCTION TO THE PROJECT
2.1 Motivation:
The multiplication operation has a strong bearing on system
performance and has been widely used in Digital Signal Processing and in Digital
Communications.
The traditional array-based multiplier makes regular use of a large
number of addition and shifting operations, thus using more hardware
and involving more complex operations.
2.2 Overview of the Project:
Multiplication operation involves generation of partial products and their
accumulation. The speed of multiplication can be increased by reducing the number
of partial products and/or accelerating the accumulation of partial products. Among
the many methods of implementing high speed parallel multipliers, there are two
basic approaches namely Booth algorithm and Wallace Tree compressors.
This paper describes an efficient implementation of a high speed parallel
multiplier using both these approaches. Here two multipliers are proposed. The first
multiplier makes use of the Radix-4 Booth Algorithm with 3:2 compressors while the
second multiplier uses the Radix-8 Booth algorithm with 4:2 compressors. The design
is structured for m x n multiplication where m and n can reach up to 126 bits. The
number of partial products is n/2 in Radix-4 Booth algorithm while it gets reduced to
n/3 in Radix-8 Booth algorithm.
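The partial-product counts quoted above can be illustrated with a small software model. The sketch below is only an illustration, not the VHDL design of this report: the function name booth_recode and the test value are assumptions introduced here, and the multiplier is assumed to be a signed (2s-complement) n-bit number.

```python
def booth_recode(y, n, r):
    """Recode an n-bit two's-complement integer y into radix-2**r Booth digits.

    Digits are returned least significant first and lie in
    [-2**(r-1), 2**(r-1)]. Hypothetical helper, not taken from the report.
    """
    num_digits = -(-n // r)            # ceil(n / r)
    width = num_digits * r             # sign-extend so the bits split evenly
    u = y & ((1 << width) - 1)         # two's-complement bit pattern of y

    def bit(i):
        return (u >> i) & 1 if i >= 0 else 0   # y[-1] is defined as 0

    digits = []
    for j in range(num_digits):
        base = r * j
        # Top bit of the group has negative weight; the bit below the
        # group (overlap bit) has weight +1.
        d = bit(base - 1) - (bit(base + r - 1) << (r - 1))
        d += sum(bit(base + k) << k for k in range(r - 1))
        digits.append(d)
    return digits


n = 126
y = -(1 << 125) + 987654321            # any value in the signed 126-bit range
radix4 = booth_recode(y, n, 2)
radix8 = booth_recode(y, n, 3)
print(len(radix4), len(radix8))        # 63 42
```

Each digit selects one multiple of the multiplicand, so the digit count is the number of partial products: 63 for Radix-4 and 42 for Radix-8 when n = 126.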
The Wallace tree uses Carry Save Adders (CSA) to accumulate the partial
products. This reduces the time as well as the chip area. To further enhance the speed
of operation, a carry-look-ahead (CLA) adder is used as the final adder.
2.3 Organization of Thesis:
The first chapter of this project report is an introduction to Booth encoding.
The second chapter gives a brief idea of the different types of operations, such as addition and
shifting. The third chapter covers the different types of Wallace tree method. The fourth chapter
shows the operation of the Carry Look-ahead Adder scheme.
The synthesis and simulation results for the calculating processor (CP) are reported in
the fifth chapter. Conclusions and future scope are explained in the sixth chapter, and
references are given after the sixth chapter. The code for the calculating processor (CP) is put
in the Appendix.
The efficient implementation of the Radix-8 multiplication operation is
an important prerequisite in the Booth algorithm, because multiplication operations are
performed using Radix-8 representation operations in the underlying field.
The Wallace tree method provides an efficient way of adding the partial products.
Three kinds of Radix operations are especially amenable to the efficient
implementation of multiplication operations. Finally, a Carry Look-ahead Adder is
used in the addition of partial products.
CHAPTER 3
BASIC THEORY OF BOOTH ALGORITHM
3.1 Introduction to Booth Algorithm:
The multiplier consists of four major modules: Booth encoder, partial product generator,
Wallace tree and carry look-ahead adder. The Booth encoder performs Radix-2 or
Radix-4 encoding of the multiplier bits. Based on the multiplicand and the encoded
multiplier, partial products are generated by the generator. For large multipliers of 32
bits, the performance of the modified Booth algorithm is limited, so Booth recoding
together with Wallace tree structures has been used in the proposed fast multiplier.
The partial products are supplied to the Wallace tree and added appropriately. The
results are finally added using a Carry Look-ahead Adder (CLA) to get the final
product.
Fig 3.1 Block Diagram of Wallace Booth Multiplier
3.2 Radix – 8 Booth Algorithm
Multiplier bits                    Recoded operation
Yi+2   Yi+1   Yi     Yi-1          on multiplicand, X
0      0      0      0              0X
0      0      0      1             +1X
0      0      1      0             +1X
0      0      1      1             +2X
0      1      0      0             +2X
0      1      0      1             +3X
0      1      1      0             +3X
0      1      1      1             +4X
1      0      0      0             -4X
1      0      0      1             -3X
1      0      1      0             -3X
1      0      1      1             -2X
1      1      0      0             -2X
1      1      0      1             -1X
1      1      1      0             -1X
1      1      1      1              0X
Table 3.2 Radix-8 Multiplication
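The sixteen rows of Table 3.2 follow a single weighted sum of the four multiplier bits: the top bit of each group counts as -4, the next as +2, and the lowest bit and the overlap bit Yi-1 each count as +1. The short check below is a sketch for illustration; the function name radix8_digit is an assumption introduced here.

```python
def radix8_digit(y_ip2, y_ip1, y_i, y_im1):
    """Signed digit selected by the four multiplier bits Yi+2, Yi+1, Yi, Yi-1."""
    return -4 * y_ip2 + 2 * y_ip1 + y_i + y_im1


# Recoded multiples as listed in Table 3.2, rows 0000 through 1111.
TABLE_3_2 = [0, 1, 1, 2, 2, 3, 3, 4, -4, -3, -3, -2, -2, -1, -1, 0]

for row, expected in enumerate(TABLE_3_2):
    bits = [(row >> shift) & 1 for shift in (3, 2, 1, 0)]
    digit = radix8_digit(*bits)
    assert digit == expected            # formula and table agree on every row
    print("".join(map(str, bits)), f"{digit:+d}X")
```

The digit set runs from -4 to +4, which is why the partial product selector has to offer nine multiples of the multiplicand.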
Here we have an odd multiple of the multiplicand, 3Y, which is not
immediately available (the multiplicand, written X in Table 3.2, is denoted Y in the
discussion that follows). To generate it we need to perform a preliminary addition, the
"previous add": 2Y+Y=3Y. But we are designing a multiplier for a specific purpose, and the
multiplicand therefore belongs to a previously known set of numbers which are stored in a
memory chip. We have tried to take advantage of this fact to ease the bottleneck of
the radix-8 architecture, that is, the generation of 3Y.
In this manner we try to attain a better overall multiplication time, or at
least one comparable to the time we could obtain using a radix-4 architecture (with the
additional advantage of using fewer transistors). To generate 3Y with 21-bit
words we only have to add 2Y+Y, that is, to add the number to the same number
shifted one position to the left, obtaining in this way a new 23-bit word, as shown in
figure 3.2 below.
Fig. 3.2: 21-bit previous add.
In fact, only a 21-bit adder is needed to generate the bit positions from z1 to
z21. Bits z0 and z22 are directly known because z0=y0 and z22=y20 (the sign bit of the
2s-complement number; 3Y and Y have the same sign). If just two additional bits are
stored in the memory together with each value of the
set of numbers, we can decompose the previous add into three shorter adds that can be
done in parallel. In this way, the delay is the same as that of a 7-bit adder:
Fig. 3.3: Modified previous add
The bits to be stored are the two intermediate carry signals c8 and
c15. Before each word of the set of numbers is stored in the memory, the values of its
intermediate carries have to be obtained and stored beside it. In this way, they are
immediately available when the previous add has to be performed to get the
multiple 3Y of one of the numbers that belong to the set.
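The split previous add can be modelled in a few lines. The following is a behavioural sketch only, assuming that c8 and c15 denote the carries into bit positions 8 and 15 of the sum and that positions z1 to z21 are divided into three 7-bit blocks; the helper names are hypothetical and the hardware adder type is not modelled.

```python
N = 21          # multiplicand word length used in the text
BLOCK = 7       # three 7-bit blocks cover bit positions z1..z21
MASK7 = (1 << BLOCK) - 1


def _addends(y):
    """Return (u, sign, a, b): a and b hold bit positions 1..21 of Y and 2Y."""
    u = y & ((1 << N) - 1)                  # 21-bit two's-complement pattern
    sign = u >> (N - 1)                     # y20
    a = (u | (sign << N)) >> 1              # sign-extended Y, positions 1..21
    b = u                                   # 2Y, positions 1..21 are y0..y20
    return u, sign, a, b


def stored_carries(y):
    """Carries into positions 8 and 15, computed once before Y is stored."""
    _, _, a, b = _addends(y)
    c8 = ((a & MASK7) + (b & MASK7)) >> BLOCK
    low14 = (1 << 2 * BLOCK) - 1
    c15 = ((a & low14) + (b & low14)) >> (2 * BLOCK)
    return c8, c15


def triple(y, c8, c15):
    """Form the 23-bit word 3Y from three independent 7-bit additions."""
    u, sign, a, b = _addends(y)
    z_mid = 0
    for k, carry_in in enumerate((0, c8, c15)):     # no block waits for another
        shift = k * BLOCK
        block_sum = ((a >> shift) & MASK7) + ((b >> shift) & MASK7) + carry_in
        z_mid |= (block_sum & MASK7) << shift
    return (u & 1) | (z_mid << 1) | (sign << (N + 1))   # z0 = y0, z22 = y20


y = -123456
print(triple(y, *stored_carries(y)) == (3 * y) & ((1 << 23) - 1))   # True
```

Because each block receives its carry-in from memory rather than from the block below it, the three additions have no data dependency on one another, which is the property the text relies on.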
The increase in memory requirements is relatively small (9.5%, 23 bits
instead of 21 for every word), and the gain in time is obvious because we replace a
21-bit adder with three 7-bit adders which can operate in parallel. In order to get the
minimum delay in the previous add we use high-speed adders. The adders that best
fit our needs are the carry and sum select adders (CSSA), whose estimated delay is a
function of the word length n.
So, by reducing the word length to one third, the previous add
delay is reduced by approximately 42%. Despite this reduction, the previous add delay
remains dominant compared to the recodification time, which is the only
operation that can be done in parallel with the previous add.
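The 42% figure can be checked numerically under one assumption: that the adder delay grows in proportion to the square root of the word length n. That square-root model is an assumption made here for illustration (it is typical of select-type adders and it reproduces the quoted figure); it is not a formula taken from the text, and the function name is hypothetical.

```python
import math


def delay_reduction(n_full, n_block):
    """Fractional delay saving if adder delay is proportional to sqrt(n).

    ASSUMPTION: the square-root delay model is chosen for illustration only.
    """
    return 1.0 - math.sqrt(n_block / n_full)


saving = delay_reduction(21, 7)     # 21-bit adder replaced by 7-bit adders
print(f"{saving:.1%}")              # 42.3%
```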
3.3 Multiplier unit design
The multiplication of two 21-bit, 2s-complement binary numbers using the
algorithm with radix-8 recoding of the multiplier presents the following features:
a) Radix-8 recoding of the multiplier implies a reduction in the number of digits to 7:
Fig. 3.4: Multiplier recoding.
b) The partial products multiplexer must choose one out of nine possibilities depending
on the value of the corresponding signed-digit, as shown in figure 3.5:
Fig. 3.5: Partial products multiplexer.
c) The partial product length is two bits longer than the multiplicand length, giving
23-bit length partial products.
d) The number of partial products entering the Wallace tree structure is 8: 7 coming
from the multiplier recoded digits plus another partial product due to the compensation
bits of the 2s-complement multiplication algorithm, which cannot be included in any of
the other 7 words.
e) The best structure for the reduction of 8 partial products applies only 4-2
compressors [7] (instead of the conventional full adders).
The Wallace tree has the following scheme, with an equivalent delay of 6 logic gates:
Fig. 8: Wallace reduction tree.
f) The previous and the final add must be done as fast as possible, so they are
implemented with carry and sum select adders (CSSA). In order to have a better
understanding of the multiplier design we are going to show an example following the
radix-8 recoding algorithm.
Consider the multiplication of these 2s-complement binary numbers:
Multiplicand: 111100010010110111001
Multiplier: 100011010100110100111
The multiplier recoding has the result shown here (following table 1):
The generation of three times the multiplicand gives:
The partial products array and its summation, which gives the multiplication
result, are shown in figure 9. In the array, some bits are encircled (fixed 1’s); they
avoid the sign extension of the partial products. Some other bits are boxed; they will be
1’s when the corresponding partial product has to be complemented (if recodification
gives a negative digit).
The leading four partial products enter the first block of 4-2 compressors
while the other three partial products plus the compensation bits enter the second
block of 4-2 compressors, still in the first compression level. Moreover, the final adder
has been decomposed into three adders with lengths 3, 6 and 31 bits. The 31-bit adder is
the final adder proper, while the 3-bit and 6-bit adders are used to advance bits of the
final result without passing through all the compression blocks in the Wallace tree.
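The worked example can be reproduced with a short behavioural model. This sketch works on signed integers rather than on the 23-bit partial product words with compensation bits described above, so it checks the arithmetic only; the helper names to_signed and radix8_digits are assumptions introduced here.

```python
N = 21   # operand word length in the example


def to_signed(bits):
    """Interpret a bit string as a two's-complement integer."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


def radix8_digits(y, n):
    """Radix-8 Booth digits of an n-bit value, least significant first (n divisible by 3)."""
    padded = (y & ((1 << n) - 1)) << 1           # append y[-1] = 0 below bit 0
    digits = []
    for i in range(0, n, 3):
        w = (padded >> i) & 0b1111               # bits y[i+2], y[i+1], y[i], y[i-1]
        digits.append(-4 * (w >> 3) + 2 * ((w >> 2) & 1) + ((w >> 1) & 1) + (w & 1))
    return digits


multiplicand = to_signed("111100010010110111001")
multiplier = to_signed("100011010100110100111")

digits = radix8_digits(multiplier, N)
multiples = {0: 0, 1: multiplicand, 2: 2 * multiplicand,
             3: 3 * multiplicand,                # the multiple from the previous add
             4: 4 * multiplicand}
partial_products = [(1 if d >= 0 else -1) * multiples[abs(d)] * 8**i
                    for i, d in enumerate(digits)]

print(digits)                                    # [-1, -3, -1, -3, 3, 3, -4]
print(sum(partial_products) == multiplicand * multiplier)   # True
```

Four of the seven digits have magnitude 3, so in this example most partial products depend on the precomputed multiple 3Y.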
CHAPTER 4
Wallace Tree
The Wallace tree method is used in high speed designs in order to produce two
rows of partial products that can be added in the last stage. Also, the critical path and the
number of adders are reduced when compared to conventional parallel adders.
Here the Wallace tree has taken the role of accelerating the accumulation of the partial
products. Its advantage becomes more pronounced for multipliers of greater than 16
bits. The speed, area and power consumption of the multipliers will be in direct
proportion to the efficiency of the compressors.
The Wallace tree structure with 3:2 compressors and 4:2 compressors is
shown in Figure 3.2 and Figure 3.3 respectively. In this regard, we can expect a
significant reduction in computing multiplications.
Figure 4.2 Wallace Tree using 4:2 compressors
The 3:2 compressors make use of a carry save adder. The carry save adder
outputs two numbers of the same dimensions as the inputs: one is a sequence of
partial sum bits and the other is a sequence of carry bits. In a carry save adder, the carry
digit is taken from the right and passed to the left, just as in conventional addition, but
the carry digit passed to the left is the result of the previous calculation and not the
current one.
So in each clock cycle, carries only have to move one step along and the clock
can tick much faster. Also the carry-save adder produces all of its output values in
parallel, and thus has the same delay as a single full-adder. The 4:2 compressors have
been widely employed in the high speed multipliers to lower the latency of the partial
product accumulation stage.
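A carry save adder can be sketched with bitwise operations on whole words, since every bit position is handled independently. The model below is illustrative only and assumes non-negative operands; the function names are introduced here. The 4:2 function simply chains two 3:2 stages, which is one simple way to reduce four words to two and does not model the internal carry path of a dedicated 4:2 cell.

```python
def compress_3_2(a, b, c):
    """Carry-save add three non-negative integers into (sum_word, carry_word)."""
    sum_word = a ^ b ^ c                               # per-bit sum, no carry propagation
    carry_word = ((a & b) | (a & c) | (b & c)) << 1    # per-bit carry, moved one place left
    return sum_word, carry_word


def compress_4_2(a, b, c, d):
    """Reduce four words to two by chaining two 3:2 stages."""
    s1, c1 = compress_3_2(a, b, c)
    return compress_3_2(s1, c1, d)


s, c = compress_3_2(0b1011, 0b0110, 0b1101)
print(s + c == 0b1011 + 0b0110 + 0b1101)   # True
```

The two output words still have to be added by a carry-propagating adder, but only once, at the end of the tree.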
A 4:2 compressor can be built using two 3:2 compressors. Owing to its regular
interconnection, the 4:2 compressor is ideal for the construction of a regularly
structured Wallace tree with low complexity. The number of levels in the Wallace
tree using 3:2 compressors can be approximately given as
Number of Levels = log(k/2) / log(3/2)          (3.3)
where k is the number of partial products.
Table III shows the number of levels in the Wallace tree using 3:2 compressors
for different numbers of partial products.
Table III. NUMBER OF LEVELS IN THE WALLACE TREE
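The level count can also be obtained by simulating the reduction directly: at each level every full group of three rows becomes two rows, and leftover rows pass through unchanged. The sketch below is an illustration with a hypothetical function name; its output is computed here and is not a reproduction of Table III.

```python
def wallace_levels(k):
    """Count the 3:2 compressor levels needed to reduce k rows to 2 rows."""
    levels = 0
    while k > 2:
        k = 2 * (k // 3) + k % 3    # each group of 3 rows becomes 2; leftovers pass through
        levels += 1
    return levels


for k in (3, 4, 6, 9, 13, 19, 28, 42, 63):
    print(k, wallace_levels(k))     # 1, 2, 3, 4, 5, 6, 7, 8 and 9 levels respectively
```

For a 126-bit multiplier this gives 9 levels for the 63 partial products of Radix-4 recoding and 8 levels for the 42 partial products of Radix-8 recoding, when 3:2 compressors are used throughout.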
The final results obtained at the output of the Wallace tree are added using a
Carry Look-ahead Adder (CLA), which is independent of the number of bits of the
two operands. In a Carry Look-ahead Adder, the carry and sum outputs for every bit do not
depend on a carry rippling through the previous bit positions, and thus the rippling
effect is completely eliminated.
It works by creating two signals, propagate and generate, for each bit position,
based on whether a carry is propagated through from a less significant bit position, a
carry is generated in that bit position, or a carry is killed in that bit position.
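The propagate and generate signals can be illustrated with a small behavioural model. The sketch below assumes unsigned n-bit operands and evaluates the flattened look-ahead expression for every carry; it is an illustration with hypothetical names and does not reflect the grouping or gate structure of a real CLA.

```python
def cla_add(a, b, n, carry_in=0):
    """Add two n-bit unsigned integers using generate/propagate signals."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]      # carry generated at bit i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]    # carry propagated through bit i
    # A position with g = 0 and p = 0 kills any incoming carry.

    def carry_into(i):
        # Flattened look-ahead expression: depends only on g, p and carry_in,
        # never on another computed carry.
        carry = carry_in & int(all(p[:i]))
        for j in range(i):
            carry |= g[j] & int(all(p[j + 1:i]))
        return carry

    total = 0
    for i in range(n):
        total |= (p[i] ^ carry_into(i)) << i             # sum bit = p XOR carry-in
    return total, carry_into(n)


print(cla_add(0b1011, 0b0110, 4))   # (1, 1): 11 + 6 = 17 = carry-out 1, sum 0001
```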
The design entry of the 126×126 bit multipliers, using the Radix-4 Booth algorithm
with 3:2 compressors and the Radix-8 Booth algorithm with 4:2 compressors, is done
using VHDL and simulated using the ModelSim SE 6.4 design suite from Mentor
Graphics. The designs are then synthesized and implemented in a Xilinx XC3S5000 fg1156 -4
FPGA using the Xilinx ISE 9.2i design suite.
Figure 4 presents a snapshot of simulation waveforms for 126×126 bit
multiplier. Table IV summarizes the FPGA resource utilization of these two
multipliers.
Finally the performance improvement is validated by implementing a higher
order FIR filter using these multipliers. Table V summarizes the FPGA resource
utilization for FIR filters using these multipliers.
This shows that the multiplier using the Radix-8 Booth algorithm with 4:2
compressors gives better speed, while the number of occupied slices is lower for the
multiplier using the Radix-4 Booth algorithm with 3:2 compressors.
The FIR filters are implemented in Xilinx XC3S1500fg676-4 FPGA. The
specifications of the FIR filter chosen are as follows.
Sampling frequency : 24 kHz
Pass band frequency : 8 kHz
Stop band frequency : 9 kHz
Pass band ripple : 0.1 linear scale
Stop band attenuation : 0.001 linear scale
TABLE IV. DEVICE UTILIZATION SUMMARY OF MULTIPLIERS
CHAPTER 5
TOOLS AND HDL USED
5.1 ROLE OF HDL:
An HDL provides the framework for the complete logical design of the ASIC. All the
activities coming under the purview of an HDL are shown enclosed in bold dotted lines.
Verilog and VHDL are the two most commonly used HDLs today. Both have constructs with
which the design can be fully described at all the levels. There are additional constructs
available to facilitate setting up of the test bench, spelling out test vectors for them and
“observing” the outputs from the designed unit.
IEEE has brought out Standards for the HDLs, and the software tools conform to
them. Verilog as an HDL was introduced by Cadence Design Systems; they placed it into the
public domain in 1990. It was established as a formal IEEE Standard in 1995. The revised
version was brought out in 2001. However, most of the simulation tools available today
conform only to the 1995 version of the standard. VHDL, used by a substantial number of
VLSI designers today, is the HDL used in this project for modeling the design.
We have used Xilinx ISE 9.2i for simulation and synthesis purposes. We
implemented the prescribed design in VHDL, a well-known industry and IEEE standard HDL.
5.2 NEEDS OF (V)HDL:
o Interoperability.
o Technology independence.
o Design reuse.
o Several levels of abstraction.
o Readability.
o Standard language.
o Widely supported.
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Fig.5.1 Data Flow of VHDL
The VHDL language is described as:
o A design specification language.
o A design entry language.
o A design simulation language.
o A design documentation language.
o An alternative to schematics.
5.2.1 BRIEF HISTORY:
[Figure labels: Specify, Capture, Verify, Formalize, Implement]
o VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
o Funded by the U.S. Department of Defense.
o The first publicly available version was released in 1985.
o In 1986 IEEE (Institute of Electrical and Electronics Engineers) was presented
with a proposal to standardize VHDL.
o In 1987 standardization => IEEE 1076-1987.
o An improved version of the language was released in 1994 => IEEE standard
1076-1993.
Related Standards:
o IEEE 1076 doesn’t support simulation conditions such as unknown and high-impedance.
o Soon after IEEE 1076-1987 was released, simulator companies began using
their own, non-standard types => VHDL was becoming non-standard.
o The IEEE 1164 standard was developed by the IEEE. IEEE 1164 contains definitions
for a nine-valued data type, std_logic.
5.3 VHDL ENVIRONMENT:
Fig 5.2 VHDL Environment
Design Units:
Segments of VHDL code that can be compiled separately and stored in a library.
Fig.5.3 Design Units
5.4 LEVELS OF ABSTRACTION:
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
o Structural.
o Data flow.
o Behavioral.
Structural VHDL description:
• Circuit is described in terms of its components.
• From a low-level description (e.g., transistor-level description) to a high-level
description.
• For large circuits, a low-level description quickly becomes impractical.
Dataflow VHDL Description:
• Circuit is described in terms of how data moves through the system.
• In the dataflow style you describe how information flows between registers
in the system.
• The combinational logic is described at a relatively high level, while the placement
and operation of registers are specified quite precisely.
Fig 5.4.Data Flow Of VHDL Description
• The behavior of the system over time is defined by registers.
• There are no built-in registers in the VHDL language.
- Either a lower level description
- or a behavioral description of sequential elements is needed.
• The lower level descriptions must be created or obtained.
• If there are no 3rd party models for registers => you must write the behavioral
description of registers.
• The behavioral description can be provided in the form of
subprograms (functions or procedures).
Behavioral VHDL Description
• Circuit is described in terms of its operation over time.
• Representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
• The concept of time may be expressed precisely using delays (e.g., A <= B after
10 ns).
• If no actual delay is used, order of sequential operations is defined.
• In the lower level of abstraction (e.g., RTL) synthesis tools ignore detailed
timing specifications.
• The actual timing results depend on implementation technology and efficiency
of synthesis tools.
• There are few tools for behavioral synthesis.
General format:
    process [(sensitivity_list)]
        process_declarative_part
    begin
        process_statements
        [wait_statement]
    end process;
CHAPTER 6
SOFTWARE TOOLS
6.1 SOFTWARE TOOL-XILINX:
Xilinx ISE is a software tool produced by Xilinx for synthesis and analysis of
HDL designs, which enables the developer to synthesize ("compile") their designs,
perform timing analysis, examine RTL diagrams, simulate a design's reaction to
different stimuli, and configure the target device with the programmer.
Xilinx was founded in 1984 by two semiconductor engineers, Ross Freeman
and Bernard Vonderschmitt, who were both working for integrated circuit and solid-
state device manufacturer Zilog Corp.
While working for Zilog, Freeman wanted to create chips that acted like a
blank tape, allowing users to program the technology themselves. At the time, the
concept was paradigm-changing. "The concept required lots of transistors and, at that
time, transistors were considered extremely precious – people thought that Ross's idea
was pretty far out", said Xilinx Fellow Bill Carter, who when hired in 1984 as the first
IC designer was the company's eighth employee.
Xilinx ISE is a software tool which is used to run programs written in the VHDL
language. It has various versions, such as Xilinx 9.2i, Xilinx 10.1 and Xilinx 10.5. Xilinx
ISE has various pre-defined libraries and packages.
6.2 VERSION 9.2I:
New Device Support.
This release supports the new Spartan™- 3A DSP family.
New Software Features.
Following are the new features in this release.
Operating System Support:
• Support for Windows® Vista Business 32-bit operating system.
This operating system is supported, but has had limited testing.
• Support for Windows XP Professional 64-bit operating system.
• Support for Red Hat Enterprise WS 5.0 32-bit and 64-bit operating system.
This operating system is supported, but has had limited testing.
WHY XILINX ONLY?
We have many software tools to run VHDL programs, such as Cadence. But
compared to the other software tools, Xilinx is cost effective.
CHAPTER 7
TUTORIAL OF ISE8.2i
ISE 8.2i Quick Start Tutorial
The ISE 8.2i Quick Start Tutorial provides Xilinx PLD designers with
a quick overview of the basic design process using ISE 8.2i. After you have
completed the tutorial, you will have an understanding of how to create, verify, and
implement a design.
Note: This tutorial is designed for ISE 8.2i on Windows.
This tutorial contains the following sections:
• “Getting Started”
• “Create a New Project”
• “Create an HDL Source”
• “Design Simulation”
• “Create Timing Constraints”
• “Implement Design and Verify Constraints”
• “Reimplement Design and Verify Pin Locations”
• “Download Design to the Spartan™-3 Demo Board”
For an in-depth explanation of the ISE design tools, see the ISE In-Depth Tutorial on the
Xilinx® web site at: http://www.xilinx.com/support/techsup/tutorials/
Getting Started
Software Requirements:
To use this tutorial, you must install the following software:
• ISE 8.2i
For more information about installing Xilinx® software, see the ISE Release Notes and
Installation Guide at: http://www.xilinx.com/support/software_manuals.htm.
Hardware Requirements:
To use this tutorial, you must have the following hardware:
• Spartan-3 Startup Kit, containing the Spartan-3 Startup Kit Demo Board
Starting the ISE Software
To start ISE, double-click the desktop icon,
or start ISE from the Start menu by selecting:
Start → All Programs → Xilinx ISE 8.2i → Project Navigator
Note: Your start-up path is set during the installation process and may differ from the
one above.
Accessing Help
At any time during the tutorial, you can access online help for additional information
about the ISE software and related tools.
To open Help, do either of the following:
• Press F1 to view Help for the specific tool or function that you have selected or
highlighted.
• Launch the ISE Help Contents from the Help menu. It contains information about
creating and maintaining your complete design flow in ISE.
Figure 1: ISE Help Topics
Create a New Project
Create a new ISE project which will target the FPGA device on the Spartan-3 Startup
Kit demo board.
To create a new project:
1. Select File > New Project... The New Project Wizard appears.
2. Type tutorial in the Project Name field.
3. Enter or browse to a location (directory path) for the new project. A tutorial
subdirectory is created automatically.
4. Verify that HDL is selected from the Top-Level Source Type list.
5. Click Next to move to the device properties page.
6. Fill in the properties in the table as shown below:
♦ Product Category: All
♦ Family: Spartan3
♦ Device: XC3S200
♦ Package: FT256
♦ Speed Grade: -4
♦ Top-Level Source Type: HDL
♦ Synthesis Tool: XST (VHDL/Verilog)
♦ Simulator: ISE Simulator (VHDL/Verilog)
♦ Verify that Enable Enhanced Design Summary is selected.
Leave the default values in the remaining fields.
When the table is complete, your project properties will look like the following:
Figure 2: Project Device Properties
7. Click Next to proceed to the Create New Source window in the New Project
Wizard. At the end of the next section, your new project will be complete.
Create a Verilog HDL Source
In this section, you will create an example top-level Verilog HDL file.
Creating a Verilog Source
Create the top-level Verilog source file as follows:
1. Click New Source in the New Project dialog box.
2. Select Verilog Module as the source type in the New Source dialog box.
3. Type in the file name counter.
4. Verify that the Add to Project checkbox is selected.
5. Click Next.
6. Declare the ports for the counter design by filling in the port information as shown
below:
Figure 5: Define Module
7. Click Next, then Finish in the New Source Information dialog box to complete the
new source file template.
8. Click Next, then Next, then Finish.
The source file containing the counter module displays in the Workspace, and the
counter displays in the Sources tab, as shown below:
Figure 6: New Project in ISE
Using Language Templates (Verilog)
The next step in creating the new source is to add the behavioral description for
counter.
Use a simple counter code example from the ISE Language Templates and customize
it for the counter design.
1. Place the cursor on the line below the output [3:0] COUNT_OUT; statement.
2. Open the Language Templates by selecting Edit → Language Templates…
Note: You can tile the Language Templates and the counter file by selecting Window
→ Tile Vertically to make them both visible.
3. Using the “+” symbol, browse to the following code example:
Verilog → Synthesis Constructs → Coding Examples → Counter → Binary →
Up/Down Counters → Simple Counter
4. With Simple Counter selected, select Edit → Use in File, or select the Use
Template in File toolbar button. This step copies the template into the counter source
file.
5. Close the Language Templates.
Final Editing of the Verilog Source
1. To declare and initialize the register that stores the counter value, modify the
declaration statement in the first line of the template as follows:
replace: reg [<upper>:0] <reg_name>;
with: reg [3:0] count_int = 0;
2. Customize the template for the counter design by replacing the port and signal
name placeholders with the actual ones as follows:
♦ replace all occurrences of <clock> with CLOCK
♦ replace all occurrences of <up_down> with DIRECTION
♦ replace all occurrences of <reg_name> with count_int
3. Add the following line just above the endmodule statement to assign the register
value to the output port:
assign COUNT_OUT = count_int;
4. Save the file by selecting File → Save.
When you are finished, the code for the counter will look like the following:
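module counter(CLOCK, DIRECTION, COUNT_OUT);
    input CLOCK;
    input DIRECTION;
    output [3:0] COUNT_OUT;
    // 4-bit register that holds the current count, initialized to 0
    reg [3:0] count_int = 0;
    // Count up when DIRECTION is high, down when it is low
    always @(posedge CLOCK)
        if (DIRECTION)
            count_int <= count_int + 1;
        else
            count_int <= count_int - 1;
    // Drive the output port from the internal register
    assign COUNT_OUT = count_int;
endmodule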
You have now created the Verilog source for the tutorial project.
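As a quick cross-check of the behavior described above, the counter can be modeled in a few lines of Python. This is only an illustrative sketch and not part of the ISE flow: the function names counter_step and simulate are hypothetical, and the model assumes that the 4-bit count_int register wraps around on overflow and underflow, as a 4-bit Verilog register does.

```python
def counter_step(count, direction, width=4):
    """Return the next counter value after one rising CLOCK edge."""
    mask = (1 << width) - 1          # 0b1111 for the 4-bit count_int register
    delta = 1 if direction else -1   # DIRECTION high counts up, low counts down
    return (count + delta) & mask    # wrap around like a fixed-width register


def simulate(directions, start=0):
    """Apply one DIRECTION sample per clock edge and collect COUNT_OUT values."""
    count = start
    trace = []
    for direction in directions:
        count = counter_step(count, direction)
        trace.append(count)
    return trace
```

For example, simulate([1, 1, 1]) steps the count from 0 through 1, 2 and 3, while a single down step from 0 wraps to 15.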
Checking the Syntax of the New Counter Module
When the source files are complete, check the syntax of the design to find errors and
typos.
1. Verify that Synthesis/Implementation is selected from the drop-down list in the
Sources window.
2. Select the counter design source in the Sources window to display the related
processes in the Processes window.
3. Click the “+” next to the Synthesize-XST process to expand the process group.
4. Double-click the Check Syntax process.
Note: You must correct any errors found in your source files. You can check for
errors in the Console tab of the Transcript window. If you continue without valid
syntax, you will not be able to simulate or synthesize your design.
5. Close the HDL file.
Design Simulation
Verifying Functionality using Behavioral Simulation
Create a test bench waveform containing input stimulus you can use to verify the
functionality of the counter module. The test bench waveform is a graphical view of a
test bench.
Create the test bench waveform as follows:
1. Select the counter HDL file in the Sources window.
2. Create a new test bench source by selecting Project → New Source.
3. In the New Source Wizard, select Test Bench WaveForm as the source type, and
type counter_tbw in the File Name field.
4. Click Next.
5. The Associated Source page shows that you are associating the test bench
waveform with the source file counter. Click Next.
6. The Summary page shows that the source will be added to the project, and it
displays the source directory, type and name. Click Finish.
7. You need to set the clock frequency, setup time and output delay times in the
Initialize Timing dialog box before the test bench waveform editing window opens.
The requirements for this design are the following:
♦ The counter must operate correctly with an input clock frequency = 25 MHz.
♦ The DIRECTION input will be valid 10 ns before the rising edge of CLOCK.
♦ The output (COUNT_OUT) must be valid 10 ns after the rising edge of CLOCK.
The design requirements correspond with the values below.
Fill in the fields in the Initialize Timing dialog box with the following information:
♦ Clock Time High: 20 ns.
♦ Clock Time Low: 20 ns.
♦ Input Setup Time: 10 ns.
♦ Output Valid Delay: 10 ns.
♦ Offset: 0 ns.
♦ Global Signals: GSR (FPGA)
Note: When GSR(FPGA) is enabled, 100 ns. is added to the Offset value
automatically.
♦ Initial Length of Test Bench: 1500 ns.
Leave the default values in the remaining fields.
Figure 7: Initialize Timing
8. Click Finish to complete the timing initialization.
9. The blue shaded areas that precede the rising edge of the CLOCK correspond to the
Input Setup Time in the Initialize Timing dialog box. Toggle the DIRECTION port to
define the input stimulus for the counter design as follows:
♦ Click on the blue cell at approximately 300 ns to assert DIRECTION high so
that the counter will count up.
♦ Click on the blue cell at approximately 900 ns to assert DIRECTION low so
that the counter will count down.
Note: For more accurate alignment, you can use the Zoom In and Zoom Out toolbar
buttons.
Figure 8: Test Bench Waveform
10. Save the waveform.
11. In the Sources window, select the Behavioral Simulation view to see that the test
bench waveform file is automatically added to your project.
Figure 9: Behavior Simulation Selection
12. Close the test bench waveform.
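The Initialize Timing values entered in step 7 can be cross-checked with a little arithmetic. The Python sketch below is not part of the ISE flow, and its function names are hypothetical. It confirms that 20 ns high plus 20 ns low gives a 40 ns period, which is the required 25 MHz, and counts how many full clock periods fit into the 1500 ns test bench (ignoring any offset).

```python
def clock_period_ns(time_high_ns, time_low_ns):
    """Clock period is the sum of the high and low times."""
    return time_high_ns + time_low_ns


def clock_frequency_mhz(time_high_ns, time_low_ns):
    """Convert the period to a frequency in MHz."""
    # 1 / (period in ns) is a frequency in GHz, so scale by 1000 to get MHz.
    return 1000.0 / clock_period_ns(time_high_ns, time_low_ns)


def full_periods(length_ns, time_high_ns, time_low_ns):
    """Number of complete clock periods in a test bench of the given length."""
    return length_ns // clock_period_ns(time_high_ns, time_low_ns)
```

With the tutorial values, clock_frequency_mhz(20, 20) evaluates to 25.0 and full_periods(1500, 20, 20) to 37.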
Create a Self-Checking Test Bench Waveform
Add the expected output values to finish creating the test bench waveform.
This transforms the test bench waveform into a self-checking test bench waveform.
The key benefit to a self-checking test bench waveform is that it compares the desired
and actual output values and flags errors in your design as it goes through the various
transformations, from behavioral HDL to the device specific representation.
To create a self-checking test bench, edit output values manually, or run the
Generate Expected Results process to create them automatically. If you run the
Generate Expected Results process, visually inspect the output values to see if they
are the ones you expected for the given set of input values.
To create the self-checking test bench waveform automatically, do the following:
1. Verify that Behavioral Simulation is selected from the drop-down list in the
Sources window.
2. Select the counter_tbw file in the Sources window.
3. In the Processes tab, click the “+” to expand the Xilinx ISE Simulator process and
double-click the Generate Expected Simulation Results process. This process
simulates the design in a background process.
4. The Expected Results dialog box opens. Select Yes to annotate the results to the
test bench.
Figure 10: Expected Results Dialog Box
5. Click the “+” to expand the COUNT_OUT bus and view the transitions that
correspond to the Output Delay value (yellow cells) specified in the Initialize Timing
dialog box.
Figure 11: Test Bench Waveform with Results
6. Save the test bench waveform and close it.
You have now created a self-checking test bench waveform.
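The idea behind a self-checking test bench can also be sketched outside the tool. The Python fragment below is a hypothetical illustration (check_outputs is not an ISE function): it compares an expected output trace with the actual one, sample by sample, and reports every mismatch, which is in essence the kind of comparison the self-checking waveform performs on COUNT_OUT.

```python
def check_outputs(expected, actual):
    """Return (sample_index, expected, actual) for every mismatching sample."""
    if len(expected) != len(actual):
        raise ValueError("traces must have the same number of samples")
    return [
        (index, exp, act)
        for index, (exp, act) in enumerate(zip(expected, actual))
        if exp != act
    ]
```

An empty result means the design matched the expected values at every sample; any entry points to the first place to look in the waveform.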
Simulating Design Functionality
Verify that the counter design functions as you expect by performing behavioral
simulation as follows:
1. Verify that Behavioral Simulation and counter_tbw are selected in the Sources
window.
2. In the Processes tab, click the “+” to expand the Xilinx ISE Simulator process and
double-click the Simulate Behavioral Model process.
The ISE Simulator opens and runs the simulation to the end of the test bench.
3. To view your simulation results, select the Simulation tab and zoom in on the
transitions.
The simulation waveform results will look like the following:
Figure 12: Simulation Results
Note: You can ignore any rows that start with TX.
4. Verify that the counter is counting up and down as expected.
5. Close the simulation view. If you are prompted with the following message, “You
have an active simulation open. Are you sure you want to close it?”, click Yes to
continue.
You have now completed simulation of your design using the ISE Simulator.
CHAPTER-8
HARDWARE TOOLS
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing—hence the name
"field-programmable". FPGAs are programmed using a logic circuit diagram or a
source code in a hardware description language (HDL) to specify how the chip will
work.
They can be used to implement any logical function that an application-
specific integrated circuit (ASIC) could perform, but the ability to update the
functionality after shipping offers advantages for many applications. FPGAs contain
programmable logic components called "logic blocks", and a hierarchy of
reconfigurable interconnects that allow the blocks to be "wired together"—somewhat
like a one-chip programmable breadboard.
Logic blocks can be configured to perform complex combinational functions,
or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks
also include memory elements, which may be simple flip-flops or more complete
blocks of memory.
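To make the idea of a configurable logic block concrete, the sketch below models a lookup table (LUT), the element most FPGAs use to implement combinational logic. It is a simplified Python illustration with hypothetical names (make_lut, lut_read), not a description of any particular device: configuring the block amounts to filling a small truth table, and evaluating the function amounts to reading one entry.

```python
def make_lut(func, num_inputs):
    """Precompute the truth table of func, as a LUT would store it."""
    table = []
    for address in range(1 << num_inputs):
        # Bit i of the address is the value of input i.
        bits = [(address >> i) & 1 for i in range(num_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *inputs):
    """Evaluate the configured function by indexing the stored table."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]


# The same structure implements AND or XOR, depending only on its contents.
and_lut = make_lut(lambda a, b: a & b, 2)
xor_lut = make_lut(lambda a, b: a ^ b, 2)
```

The point of the sketch is that the hardware structure stays fixed while the stored table changes, which is what makes the block reconfigurable.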
7.1 HISTORY
The FPGA industry sprouted from programmable read only memory (PROM)
and programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field programmable);
however, programmable logic was hard-wired between logic gates.
Xilinx Co-Founders, Ross Freeman and Bernard Vonderschmitt, invented the
first commercially viable field programmable gate array in 1985 – the XC2064. The
XC2064 had programmable gates and programmable interconnects between gates, the
beginnings of a new technology and market. The XC2064 boasted a mere 64
configurable logic blocks (CLBs), with two 3-input lookup tables (LUTs). More than
20 years later, Freeman was inducted into the National Inventors Hall of Fame for his
invention.
7.2 ARCHITECTURE
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit into the
height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the design,
the number of routing tracks needed may vary considerably even among designs with
the same amount of logic.
Fig 7.1 Internal Structure of FPGA
7.3 APPLICATIONS
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer
vision, speech recognition, cryptography, bioinformatics, computer hardware
emulation, radio astronomy and a growing range of other areas.
7.4 A BRIEF TUTORIAL: SOURCE CODE IS DUMPED INTO FPGA.
1. Now let’s look at the flow for actually synthesizing and implementing the
design in the FPGA prototyping boards. Close ModelSim and go back to the
Xilinx ISE environment. In the Sources subwindow change the selection in
the dropdown box from “Behavioral Simulation” to
“Synthesis/Implementation”.
2. To properly synthesize the design we need to specify which pins on the chip
all the inputs and outputs should be assigned to. In general of course we could
assign the signals just about any way we want. Since we will be using specific
prototype boards, we need to make sure our pin assignments match the
switches, buttons, and LEDs so we can test our design. We will be starting
with Digilab 2E boards that are connected to Digilab DIO2 input/output
boards. The I/O board has already been programmed and configured to have
the following connections:
3. To assign specific pins, expand the User Constraints selection under the
Process subwindow and double-click on Assign Package Pins.
4. A new application called Xilinx PACE should be launched.
a. In the Design Object List subwindow you should see a listing of all the
input and output signals from our design.
Here is where we can specify which pin locations we want for each signal.
Simply enter the pin numbers from the tables shown in Step 2 above,
making sure to use a capital letter “P” in front of the pin specification.
Let’s assign our signals as follows:
A P163 (Switch 1)
I0 P164 (Switch 2)
I1 P166 (Switch 3)
Y P149 (LED 0)
Once all pins have been assigned, save your constraints by selecting File →
Save from the menu bar and exit Xilinx PACE.
5. Back in Xilinx ISE, in the Processes subwindow, double-click on the
Synthesize – XST selection and wait for the process to complete. Then
double-click on the Implement Design selection and wait for the process to
complete. Then double-click on the Generate Programming File selection and
wait for the process to complete. If all goes well, you should have green
check marks for the whole design.
6. There is a lot of information you can obtain through all of the objects listed in
the Processes subwindow, but let us proceed to downloading the design onto
the prototyping board for testing. First make sure the prototyping board is
connected to the PC and has power on. Also make sure the slide switch on the
FPGA board by the parallel port is set to JTAG (as opposed to “Port”). Then
select Configure Device (iMPACT) underneath the Generate Programming
File selection. You should see the following window:
7. Now you need to specify which bitstream file to use to configure the device.
For this tutorial we want to select the mux.bit file and click Open.
You will probably get the message below. Just click Yes.
You will also get a warning message saying the JTAG clock was updated in
the bitstream file (which is good) so just click OK. There is a way to correct
for that in the original design flow, but Xilinx automatically catches it here so
I don’t usually bother.
8. You should now see the Spartan XC2S200E chip in the main window. Right
click on the chip to prepare for downloading the bitstream file.
Select Program on the resulting window.
9. Click OK.
If all goes well, you should get the Programming Succeeded message.
10. Now just test and verify your design on the actual FPGA board!
CONCLUSION
The design, implementation and simulation of a 21×21-bit, radix-8 multiplier
unit for a specific purpose have been carried out. The number of transistors is 8224,
with an active area of 2.97 mm². The measured multiplication time is 9.4 ns and
the power dissipation is 60.7 mW at a frequency of 10 MHz. It has been shown that
a radix-8 architecture can be useful in high-speed multipliers for a specific
purpose because of the gain in time and number of transistors compared to the
conventional radix-4 recoding architecture.
This can be achieved with a slight modification to the previous adder. The
modification requires storing two additional bits (intermediate carries) for each
word in the set of numbers. Memory needs increase by 9.5%, while the time
decrease in the previous adder can be estimated at 42%. Because of this, the overall
multiplication time can be reduced with our radix-8 architecture for a specific purpose.
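The gain from a higher radix comes largely from generating fewer partial products. The Python sketch below illustrates this with standard Booth recoding. It is an illustration under that assumption, not the circuit described in this report, and the function name booth_digits is hypothetical. For a 21-bit multiplier it yields 11 digits in radix-4 (k = 2) and 7 digits in radix-8 (k = 3), with each digit selecting one partial product.

```python
def booth_digits(value, width, k):
    """Recode a signed width-bit multiplier into radix-2**k Booth digits.

    Digits are returned least significant first; digit j has weight 2**(k*j).
    """
    num_digits = -(-width // k)  # ceil(width / k)

    def bit(i):
        # Bit -1 is the implicit 0 to the right of the LSB. Python integers
        # behave as sign-extended two's complement, so high bits are correct.
        return 0 if i < 0 else (value >> i) & 1

    digits = []
    for j in range(num_digits):
        low = k * j
        digit = bit(low - 1)                     # overlap bit from the group below
        for i in range(k - 1):
            digit += bit(low + i) << i           # positively weighted bits
        digit -= bit(low + k - 1) << (k - 1)     # top bit of the group is negative
        digits.append(digit)
    return digits


def from_digits(digits, k):
    """Rebuild the multiplier value from its Booth digits (for checking)."""
    return sum(digit << (k * j) for j, digit in enumerate(digits))
```

The price of radix-8 is that digits of ±3 call for a multiple of the multiplicand that cannot be produced by shifting alone, so an extra addition is needed to prepare it.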
REFERENCES
[1] Dong-Wook Kim, Young-Ho Seo, “A New VLSI Architecture of Parallel
Multiplier-Accumulator based on Radix-2 Modified Booth Algorithm”, Very Large
Scale Integration (VLSI) Systems, IEEE Transactions, vol.18, pp.: 201-208, 04 Feb.
2010
[2] Prasanna Raj P, Rao, Ravi, “VLSI Design and Analysis of Multipliers for Low
Power”, Intelligent Information Hiding and Multimedia Signal Processing, Fifth
International Conference, pp.: 1354-1357, Sept. 2009
[3] Lakshmanan, Masuri Othman and Mohamad Alauddin Mohd.Ali, “High
Performance Parallel Multiplier using Wallace-Booth Algorithm”, Semiconductor
Electronics, IEEE International Conference , pp.: 433- 436, Dec. 2002.
[4] Jan M Rabaey, “Digital Integrated Circuits, A Design Perspective”, Prentice Hall,
Dec.1995
[5] Louis P. Rubinfield, “A Proof of the Modified Booth's Algorithm for
Multiplication”, Computers, IEEE Transactions,vol.24, pp.: 1014-1015, Oct. 1975
[6] Rajendra Katti, “A Modified Booth Algorithm for High Radix Fixedpoint
Multiplication”, Very Large Scale Integration (VLSI) Systems, IEEE Transactions,
vol. 2, pp.: 522-524, Dec. 1994.
[7] C. S. Wallace, “A Suggestion for a Fast Multiplier”, Electronic Computers, IEEE
Transactions, vol.13, Page(s): 14-17, Feb. 1964
[8] Hussin R et al , “An Efficient Modified Booth Multiplier Architecture”, IEEE
International Conference, pp.:1-4, 2008.