high bit rate_mul

CHAPTER-1

INTRODUCTION TO VLSI DOMAIN

1.1 VLSI DESIGN:

The complexity of the VLSI circuits being designed and used today makes a manual approach to design impractical. Design automation is the order of the day. With the rapid technological developments of the last two decades, the status of VLSI technology is characterized by the following:

• A steady increase in the size and hence the functionality of the ICs.

• A steady reduction in feature size and hence increase in the speed of operation as well as gate or transistor density.

• A steady improvement in the predictability of circuit behavior.

• A steady increase in the variety and size of software tools for VLSI design.

The above developments have resulted in a proliferation of approaches to VLSI design.

1.2 HISTORY OF VLSI:

VLSI began in the 1970s when complex semiconductor and communication

technologies were being developed. The microprocessor is a VLSI device. The term is

no longer as common as it once was, as chips have increased in complexity into the

hundreds of millions of transistors.

This is the field which involves packing more and more logic devices into smaller and smaller areas. VLSI circuits can now be put into a space only a few millimeters across. VLSI circuits are everywhere: our computers, our cars, our brand new state-of-the-art digital cameras, our cell phones, and so on.

1.3 VARIOUS INTEGRATIONS:

In the early days of integrated circuits, only a few transistors could be placed on a chip, as the scale used was large because of the contemporary technology, and manufacturing yields were low by today's standards. As the degree of integration was small, the design was done easily. Over time, millions, and today billions, of transistors could be placed on one chip, and making a good design became a task to be planned thoroughly.

1.3.1 SSI TECHNOLOGY:

The first integrated circuits contained only a few transistors. Called "small-scale integration" (SSI), digital circuits containing transistors numbering in the tens provided a few logic gates, for example, while early linear ICs such as the Plessey SL201 or the Philips TAA320 had as few as two transistors. The term Large Scale Integration was first used by IBM scientist Rolf Landauer when describing the theoretical concept; from there came the terms SSI, MSI, VLSI, and ULSI.

1.3.2 MSI TECHNOLOGY:

The next step in the development of integrated circuits, taken in the late

1960s, introduced devices which contained hundreds of transistors on each chip,

called "medium-scale integration" (MSI).

They were attractive economically because while they cost little more to

produce than SSI devices, they allowed more complex systems to be produced using

smaller circuit boards, less assembly work (because of fewer separate components),

and a number of other advantages.

1.3.3 LARGE SCALE INTEGRATION:

Further development, driven by the same economic factors, led to "large-scale

integration" (LSI) in the mid 1970s, with tens of thousands of transistors per chip.

Integrated circuits such as 1K-bit RAMs, calculator chips, and the first

microprocessors, that began to be manufactured in moderate quantities in the early

1970s, had under 4000 transistors. True LSI circuits, approaching 10,000 transistors,

began to be produced around 1974, for computer main memories and second-

generation microprocessors.

1.3.4 VLSI:

The final step in the development process, starting in the 1980s and continuing through the present, was "very-large-scale integration" (VLSI). The development started with hundreds of thousands of transistors in the early 1980s, and continues beyond several billion transistors as of 2009. In 1986 the first one megabit RAM chips were introduced, which contained more than one million transistors. Microprocessor chips passed the million transistor mark in 1989 and the billion transistor mark in 2005. The trend continues largely unabated, with chips introduced in 2007 containing tens of billions of memory transistors.

VLSI DESIGN FLOW:

Fig 2.1 VLSI design flow (flowchart blocks: Start, Design Entity, Logic Synthesis, Pre layout Simulation, System Partitioning, Floor Planning, Placement, Routing, Circuit Extraction, Finish)

1.4 ULSI, WSI, SOC and 3D-IC:

To reflect further growth of the complexity, the term ULSI, which stands for "ultra-large-scale integration", was proposed for chips with a complexity of more than 1 million transistors.

Wafer-scale integration (WSI) is a system of building very-large integrated circuits that uses an entire silicon wafer to produce a single "super-chip". Through a combination of large size and reduced packaging, WSI could lead to reduced costs for some systems.

A system-on-a-chip (SOC) is an integrated circuit in which all the

components needed for a computer or other system are included on a single chip. The

design of such a device can be complex and costly, and building disparate

components on a single piece of silicon may compromise the efficiency of some

elements. However, these drawbacks are offset by lower manufacturing and assembly

costs and by a greatly reduced power budget: because signals among the components

are kept on-die, much less power is required.

A three-dimensional integrated circuit (3D-IC) has two or more layers of active electronic components that are integrated both vertically and horizontally into a single circuit, which can lead to lower power consumption.

1.5 VLSI DESIGN FLOW AND ITS DESCRIPTION:

The design at the behavioral level is to be elaborated in terms of known and

acknowledged functional blocks. It forms the next detailed level of design description.

Once again the design is to be tested through simulation and iteratively corrected for

errors. The elaboration can be continued one or two steps further. It leads to a detailed

design description in terms of logic gates and transistor switches.

Optimization

The circuit at the gate level – in terms of the gates and flip-flops – can be

redundant in nature. The same can be minimized with the help of minimization tools.

The step is not shown separately in the figure. The minimized logical design is

converted to a circuit in terms of the switch level cells from standard libraries

provided by the foundries. The cell based design generated by the tool is the last step

in the logical design process; it forms the input to the first level of physical design.

Simulation

The design descriptions are tested for their functionality at every level –

behavioral, data flow, and gate. One has to check here whether all the functions are carried out as expected and rectify any that are not. All such activities are carried out by the simulation tool. The tool also has an editor to carry out any corrections to the source

code. Simulation involves testing the design for all its functions, functional sequences,

timing constraints, and specifications. Normally testing and simulation at all the levels

– behavioral to switch level – are carried out by a single tool; the same is identified as

“scope of simulation tool” in Figure 1.1.


Synthesis

With the availability of design at the gate (switch) level, the logical design is

complete. The corresponding circuit hardware realization is carried out by a synthesis

tool. Two common approaches are as follows:

• The circuit is realized through an FPGA. The gate level design description is the

starting point for the synthesis here. The FPGA vendors provide an interface to the

synthesis tool. Through the interface the gate level design is realized as a final circuit.

With many synthesis tools, one can directly use the design description at the data flow

level itself to realize the final circuit through an FPGA. The FPGA route is attractive

for limited volume production or a fast development cycle.

• The circuit is realized as an ASIC. A typical ASIC vendor will have his own library

of basic components like elementary gates and flip-flops. Eventually the circuit is to

be realized by selecting such components and interconnecting them conforming to the

required design. This constitutes the physical design. Being an elaborate and costly

process, a physical design may call for an intermediate functional verification through

the FPGA route. The circuit realized through the FPGA is tested as a prototype. It

provides another opportunity for testing the design closer to the final circuit.

Physical Design

A fully tested and error-free design at the switch level can be the starting point

for a physical design [Baker & Boyce, Wolf]. It is to be realized as the final circuit

using (typically) a million components in the foundry’s library. The step-by-step

activities in the process are described briefly as follows:

• System partitioning: The design is partitioned into convenient compartments or

functional blocks. Often it would have been done at an earlier stage itself and the

software design prepared in terms of such blocks. Interconnection of the blocks is part

of the partition process.

• Floor planning: The positions of the partitioned blocks are planned and the blocks

are arranged accordingly. The procedure is analogous to the planning and

arrangement of domestic furniture in a residence. Blocks with I/O pins are kept close

to the periphery; those which interact frequently or through a large number of

interconnections are kept close together, and so on. Partitioning and floor planning

may have to be carried out and refined iteratively to yield best results.


• Placement: The selected components from the ASIC library are placed in position

on the “Silicon floor.” It is done with each of the blocks above.

• Routing: The components placed as described above are to be interconnected to the rest of the block. This is done for each of the blocks by suitably routing the interconnects. Once the routing is complete, the physical design can be taken as complete. The final mask for the design can be made at this stage and the ASIC manufactured in the foundry.

Post Layout Simulation

Once the placement and routing are completed, the performance specifications

like silicon area, power consumed, path delays, etc., can be computed. Equivalent

circuit can be extracted at the component level and performance analysis carried out.

This constitutes the final stage called “verification.” One may have to go through the

placement and routing activity once again to improve performance.

Critical Subsystems

The design may have critical subsystems. Their performance may be crucial to

the overall performance; in other words, to improve the system performance

substantially, one may have to design such subsystems afresh. The design here may

imply redefinition of the basic feature size of the component, component design,

placement of components, or routing done separately and specifically for the

subsystem. A set of masks used in the foundry may have to be done afresh for the

purpose.


CHAPTER 2

INTRODUCTION TO THE PROJECT

2.1 Motivation:

The multiplication operation is central to system performance and is widely used in Digital Signal Processing and in Digital Communications.

The traditional array-based multiplier makes regular use of a large number of addition and shifting operations, thus requiring more hardware and more complex operations.

2.2 Overview of the Project:

Multiplication operation involves generation of partial products and their

accumulation. The speed of multiplication can be increased by reducing the number

of partial products and/or accelerating the accumulation of partial products. Among

the many methods of implementing high speed parallel multipliers, there are two

basic approaches namely Booth algorithm and Wallace Tree compressors.

This paper describes an efficient implementation of a high speed parallel

multiplier using both these approaches. Here two multipliers are proposed. The first

multiplier makes use of the Radix-4 Booth Algorithm with 3:2 compressors while the

second multiplier uses the Radix-8 Booth algorithm with 4:2 compressors. The design

is structured for m x n multiplication where m and n can reach up to 126 bits. The

number of partial products is n/2 in Radix-4 Booth algorithm while it gets reduced to

n/3 in Radix-8 Booth algorithm.
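The reduction to n/2 partial products can be illustrated with a small software model. The sketch below is a hedged Python illustration of textbook Radix-4 Booth recoding, not the VHDL used in this project; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and the multiplier is assumed to be a signed two's-complement value that fits in n bits.

```python
def booth_radix4_digits(y: int, n: int) -> list:
    """Recode a signed n-bit multiplier y into radix-4 Booth digits (LSB first)."""
    if n % 2:
        n += 1                      # sign-extend to an even number of bits

    def bit(i: int) -> int:
        # y[-1] is defined as 0; Python's >> sign-extends negative integers
        return 0 if i < 0 else (y >> i) & 1

    # Each digit examines three overlapping bits and lies in {-2, -1, 0, 1, 2}
    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1)
            for j in range(n // 2)]


def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum the n/2 partial products d_j * x * 4**j."""
    return sum(d * x * 4 ** j for j, d in enumerate(booth_radix4_digits(y, n)))


if __name__ == "__main__":
    print(booth_radix4_digits(-93, 8))   # four digits for an 8-bit multiplier
    print(booth_multiply(57, -93, 8))    # equals 57 * -93
```

For an 8-bit multiplier the model produces four digits, and for a 126-bit multiplier it produces 63, which is the n/2 figure quoted above.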

The Wallace tree uses Carry Save Adders (CSA) to accumulate the partial

products. This reduces the time as well as the chip area. To further enhance the speed

of operation, carry-look-ahead (CLA) adder is used as the final adder.

2.3 Organization of Thesis:

The first chapter of this project report is an introduction to Booth encoding. The second chapter gives a brief idea of the different types of operations, such as addition and shifting. The third chapter covers the different types of Wallace tree method. The fourth chapter shows the operation of the Carry Look-ahead Adder scheme.

The synthesis and simulation results for the calculating processor (CP) are reported in the fifth chapter. Conclusions and future scope are explained in the sixth chapter, and references are given after the sixth chapter. The code for the calculating processor (CP) is put in the Appendix.

The efficient implementation of the Radix-8 multiplication operation is an important prerequisite in the Booth algorithm because multiplication operations are performed using Radix-8 representation operations in the underlying field. The Wallace tree method provides an efficient way of adding the partial products. Three kinds of Radix operations are especially amenable to the efficient implementation of multiplication operations. Finally, a Carry Look-ahead Adder is used in the addition of partial products.


CHAPTER 3

BASIC THEORY OF BOOTH ALGORITHM

3.1 Introduction to Booth Algorithm:

The multiplier consists of four major modules: Booth encoder, partial product generator, Wallace tree and carry look-ahead adder. The Booth encoder performs Radix-2 or Radix-4 encoding of the multiplier bits. Based on the multiplicand and the encoded multiplier, partial products are generated by the generator. For large multipliers of 32 bits, the performance of the modified Booth algorithm is limited. So Booth recoding together with Wallace tree structures has been used in the proposed fast multiplier.

The partial products are supplied to Wallace Tree and added appropriately. The

results are finally added using a Carry Look-ahead Adder (CLA) to get the final

product.

Fig 3.1 Block Diagram of Wallace Booth Multiplier


3.2 Radix – 8 Booth Algorithm

Multiplier bits                  Recoded operation
Yi+2   Yi+1   Yi    Yi-1         on multiplicand, X

 0      0     0      0            0X
 0      0     0      1           +1X
 0      0     1      0           +1X
 0      0     1      1           +2X
 0      1     0      0           +2X
 0      1     0      1           +3X
 0      1     1      0           +3X
 0      1     1      1           +4X
 1      0     0      0           -4X
 1      0     0      1           -3X
 1      0     1      0           -3X
 1      0     1      1           -2X
 1      1     0      0           -2X
 1      1     0      1           -1X
 1      1     1      0           -1X
 1      1     1      1            0X

Table 3.2 Radix-8 Multiplication
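Every row of Table 3.2 follows from a single weighted sum of the four examined bits, -4·Yi+2 + 2·Yi+1 + Yi + Yi-1. The Python sketch below is a hedged illustration that regenerates the table from this rule and recodes a whole multiplier; the function names are hypothetical, and the multiplier is assumed to be a signed two's-complement value that fits in n bits, with Y-1 taken as 0.

```python
from itertools import product


def radix8_digit(y2: int, y1: int, y0: int, ym1: int) -> int:
    """Signed digit selected by the bits (Y[i+2], Y[i+1], Y[i], Y[i-1])."""
    return -4 * y2 + 2 * y1 + y0 + ym1


def radix8_table() -> dict:
    """All 16 rows of the recoding table, keyed by the four examined bits."""
    return {bits: radix8_digit(*bits) for bits in product((0, 1), repeat=4)}


def booth_radix8_digits(y: int, n: int) -> list:
    """Recode a signed n-bit multiplier into ceil(n/3) radix-8 digits, LSB first."""
    groups = -(-n // 3)             # ceiling division; sign extension fills the top group

    def bit(i: int) -> int:
        return 0 if i < 0 else (y >> i) & 1   # y[-1] = 0; >> sign-extends

    return [radix8_digit(bit(3 * j + 2), bit(3 * j + 1), bit(3 * j), bit(3 * j - 1))
            for j in range(groups)]


if __name__ == "__main__":
    for bits, digit in radix8_table().items():
        print(bits, f"{digit:+d}X")
```

The digits satisfy sum(d_j * 8**j) = Y, so a 21-bit multiplier needs only 7 digits, at the price of the odd multiples +3X and -3X.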

Here we have an odd multiple of the multiplicand, 3Y, which is not

immediately available. To generate it we need to perform this previous add:

2Y+Y=3Y. But we are designing a multiplier for specific purpose and thereby the

multiplicand belongs to a previously known set of numbers which are stored in a

memory chip. We have tried to take advantage of this fact, to ease the bottleneck of

the radix-8 architecture, that is, the generation of 3Y.

In this manner we try to attain a better overall multiplication time, or at least one comparable to the time we could obtain using a radix-4 architecture (with the additional advantage of using fewer transistors). To generate 3Y with 21-bit words we only have to add 2Y+Y, that is, to add the number to the same number shifted one position to the left, getting in this way a new 23-bit word, as shown in figure 3.2 below.

Fig. 3.2: 21-bit previous add.

In fact, only a 21-bit adder is needed to generate the bit positions from z1 to

z21. Bits z0 and z22 are directly known because z0=y0 and z22=y20 (sign bit of the

2s-complement number; 3Y and Y have the same sign). If in the memory from where

we take the numbers just two additional bits are stored together with each value of the

set of numbers, we can decompose the previous add into three shorter adds that can be

done in parallel. In this way, the delay is the same as that of a 7-bit adder:

Fig. 3.3: Modified previous add

Bits which are going to be stored are the two intermediate carry signals c8 and

c15. Before each word of the set of numbers is stored in the memory, the value of its

intermediate carries has to be obtained and stored beside it. In this way, they are

immediately available when it is required to perform the previous add to get the

multiple 3Y of one of the numbers that belongs to the set.
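The split previous add can be checked with a short software model. The Python sketch below is a hedged illustration only: it assumes that c8 and c15 are the carries entering bit positions 8 and 15 of the 21-bit addition that produces z1 to z21, and the helper names (`operands`, `stored_carries`, `three_y`) are hypothetical. In hardware the three 7-bit additions would run in parallel; the loop here only models their independence.

```python
N = 21        # word length of Y (from the text)
CHUNK = 7     # three 7-bit adders cover bit positions z1..z21


def operands(y: int):
    """Return the two 21-bit addends of the previous add, plus Y's bit pattern and sign."""
    u = y & ((1 << N) - 1)              # 21-bit two's-complement pattern of Y
    sign = u >> (N - 1)                 # y20
    a = (u >> 1) | (sign << (N - 1))    # y1..y20 followed by the sign extension y20
    b = u                               # y0..y20, i.e. 2Y seen from position 1
    return a, b, u, sign


def stored_carries(y: int):
    """Carries computed once, before the word is written to memory."""
    a, b, _, _ = operands(y)
    m7, m14 = (1 << CHUNK) - 1, (1 << 2 * CHUNK) - 1
    c8 = ((a & m7) + (b & m7)) >> CHUNK
    c15 = ((a & m14) + (b & m14)) >> (2 * CHUNK)
    return c8, c15


def three_y(y: int, c8: int, c15: int) -> int:
    """Build the 23-bit word 3Y from three independent 7-bit additions."""
    a, b, u, sign = operands(y)
    mask = (1 << CHUNK) - 1
    z_mid = 0
    for k, cin in enumerate((0, c8, c15)):          # no carry ripples between chunks
        s = ((a >> (CHUNK * k)) & mask) + ((b >> (CHUNK * k)) & mask) + cin
        z_mid |= (s & mask) << (CHUNK * k)
    z = (u & 1) | (z_mid << 1) | (sign << 22)       # z0 = y0, z22 = y20
    return z - (1 << 23) if sign else z             # read back as a signed value


if __name__ == "__main__":
    y = -654321
    print(three_y(y, *stored_carries(y)), 3 * y)
```

Each chunk receives its carry-in from memory rather than from the chunk below it, which is why the delay of the whole addition is that of one 7-bit adder.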

The increment in memory requirements is relatively small (9.5%, 23 bits instead of 21 for every word), and the gain in time is obvious because we substitute a 21-bit adder by three 7-bit adders which can operate in parallel. In order to get the minimum delay in the previous adder we use high-speed adders. The adders that best fit our needs are the carry and sum select adders (CSSA), whose estimated delay depends on the word length n.

So, reducing the word length to one third, the previous add delay diminishes by approximately 42% (the figure expected if the delay grows roughly as √n, since 1 − 1/√3 ≈ 0.42). Despite this reduction, the previous add delay remains dominant compared to the recodification time, which is the only operation that can be done in parallel with the previous add.

3.3 Multiplier unit design

The multiplication of two binary numbers, 21-bit length, 2s-complement and using the

algorithm with radix-8 recoding of the multiplier presents the following features:

a) Radix-8 recoding of the multiplier implies a reduction in the number of digits to 7:

Fig. 3.4: Multiplier recoding.

b) The partial products multiplexer must choose one out of nine possibilities depending

on the value of the corresponding signed-digit, as shown in figure 3.5:


Fig. 3.5: Partial products multiplexer.

c) The partial product length is two bits longer than the multiplicand length, giving

23-bit length partial products.

d) The number of partial products entering the Wallace tree structure is 8: 7 coming from the multiplier recoded digits plus another partial product due to the compensation bits of the 2s-complement multiplication algorithm, which cannot be included in any of the other 7 words.

e) The best structure for the reduction of 8 partial products applies only 4-2

compressors [7] (instead of the conventional full adders) .

The Wallace tree has the following scheme:

Fig. 8: Wallace reduction tree.

with an equivalent delay of 6 logic gates.


f) The previous and the final add must be done as fast as possible, so they are

implemented with carry and sum select adders (CSSA). In order to have a better

understanding of the multiplier design we are going to show an example following the

radix-8 recoding algorithm.

Consider the multiplication of these 2s-complement binary numbers:

Multiplicand: 111100010010110111001

Multiplier: 100011010100110100111

The multiplier recoding has the result shown here (following table 1):

The generation of three times the multiplicand gives:

The partial products array and its summation, which gives the multiplication

result, is shown in figure 9. In the array, some bits are encircled (fixed 1’s) and they

avoid the partial products sign extension. Some other bits are squared and they will be

1’s when the corresponding partial product has to be complemented (if recodification

gives a negative digit).

The leading four partial products will enter the first block of 4-2 compressors

while the other three partial products plus the compensation bits will enter the second

block of 4-2 compressors, still in the first compression level. Moreover, the final adder

has been decomposed in three adders with lengths 3, 6 and 31 bits. The 31-bit adder is

the proper final adder while the 3 and the 6-bit adders are used to advance bits of the

final result without passing through all the compression blocks in the Wallace tree.
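The example operands above can be run through a small software model to cross-check the arithmetic. The Python sketch below is a hedged illustration, not the hardware data path: it ignores the sign-extension and compensation bits of the partial products array and simply sums signed multiples. The helper names are hypothetical, and no recoded digits are quoted here because they are simply printed by the script.

```python
WIDTH = 21


def from_twos_complement(bits: str) -> int:
    """Interpret a bit string as a signed two's-complement integer."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


def radix8_digits(y: int, n: int) -> list:
    """Radix-8 Booth digits of a signed n-bit multiplier, LSB first."""
    def bit(i: int) -> int:
        return 0 if i < 0 else (y >> i) & 1
    return [-4 * bit(3 * j + 2) + 2 * bit(3 * j + 1) + bit(3 * j) + bit(3 * j - 1)
            for j in range(-(-n // 3))]


x = from_twos_complement("111100010010110111001")   # multiplicand from the text
y = from_twos_complement("100011010100110100111")   # multiplier from the text

digits = radix8_digits(y, WIDTH)
# The nine multiplexer inputs: 0, ±X, ±2X, ±3X, ±4X (3X needs the previous add)
multiples = {d: d * x for d in range(-4, 5)}
# Each partial product is shifted three bit positions further than the previous one
result = sum(multiples[d] << (3 * j) for j, d in enumerate(digits))

if __name__ == "__main__":
    print("recoded digits (LSB first):", digits)
    print("3X =", multiples[3])
    print("product =", result, "check:", result == x * y)
```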


CHAPTER 4

Wallace Tree

The Wallace tree method is used in high speed designs in order to produce two

rows of partial products that can be added in the last stage. Also critical path and the

number of adders get reduced when compared to the conventional parallel adders.

Here the Wallace tree has taken the role of accelerating the accumulation of the partial

products. Its advantage becomes more pronounced for multipliers of greater than 16

bits. The speed, area and power consumption of the multipliers will be in direct

proportion to the efficiency of the compressors.

The Wallace tree structure with 3:2 compressors and 4:2 compressors is

shown in Figure 3.2 and Figure 3.3 respectively. In this regard, we can expect a

significant reduction in computing multiplications.


Figure 4.2 Wallace Tree using 4:2 compressors

The 3:2 compressors make use of a carry save adder. The carry save adder

outputs two numbers of the same dimensions as the inputs, one is a sequence of

partial sum bits and other is a sequence of carry bits. In carry save adder, the carry

digit is taken from the right and passed to the left, just as in conventional addition; but

the carry digit passed to the left is the result of the previous calculation and not the

current one.

So in each clock cycle, carries only have to move one step along and the clock

can tick much faster. Also the carry-save adder produces all of its output values in

parallel, and thus has the same delay as a single full-adder. The 4:2 compressors have

been widely employed in the high speed multipliers to lower the latency of the partial

product accumulation stage.
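The behaviour of the compressors can be sketched at word level. The Python model below is a hedged illustration operating on non-negative integers: `csa` is a 3:2 compressor (one full adder per bit position), and `compressor_4_2` chains two of them, which is one common way of building a 4:2 compressor and an assumption here rather than the cell used in this design.

```python
def csa(a: int, b: int, c: int):
    """3:2 compressor: three addends in, a sum word and a carry word out."""
    s = a ^ b ^ c                                  # per-bit sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, weighted one place left
    return s, carry


def compressor_4_2(a: int, b: int, c: int, d: int):
    """4:2 compressor built from two cascaded 3:2 compressors."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


if __name__ == "__main__":
    s, c = csa(13, 7, 9)
    print(s, c, s + c)            # s + c equals 13 + 7 + 9
    s, c = compressor_4_2(13, 7, 9, 30)
    print(s, c, s + c)            # s + c equals 13 + 7 + 9 + 30
```

Neither function propagates a carry along the word; only the final addition of the two output words needs a carry-propagating adder.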

A 4:2 compressor can be built using two 3:2 compressors. Owing to its regular interconnection, the 4:2 compressor is ideal for the construction of a regularly structured Wallace tree with low complexity. The number of levels in the Wallace tree using 3:2 compressors can be approximately given as

Number of Levels ≈ log(k/2) / log(3/2)        (3.3)

where k is the number of partial products.

Table III shows the number of levels in the Wallace tree using 3:2 compressors

for different number of partial products.

Table III . NUMBER OF LEVELS IN THE WALLACE TREE
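The number of levels can also be counted directly by simulating the reduction. The Python sketch below is a hedged illustration, not a reproduction of Table III: it assumes that at every level each complete group of three rows is replaced by two rows and leftover rows pass through unchanged, and it prints a logarithmic estimate, log(k/2)/log(3/2), alongside the count. The function names are hypothetical.

```python
import math


def wallace_levels(k: int) -> int:
    """Count 3:2 compressor levels needed to reduce k rows to two rows."""
    levels = 0
    while k > 2:
        k = 2 * (k // 3) + k % 3     # every full group of three rows becomes two
        levels += 1
    return levels


def estimated_levels(k: int) -> float:
    """Continuous estimate: each level shrinks the row count by about 3/2."""
    return math.log(k / 2) / math.log(3 / 2)


if __name__ == "__main__":
    for k in (3, 4, 6, 9, 13, 19, 28, 42, 63):
        print(k, wallace_levels(k), round(estimated_levels(k), 2))
```

Under these assumptions, 63 partial products (Radix-4 recoding of a 126-bit multiplier) need 9 levels and 42 partial products (Radix-8) need 8.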

The final results obtained at the output of the Wallace tree are added using a

Carry Look-ahead Adder (CLA) which is independent of the number of bits of the

two operands. In Carry Look-ahead Adder, for every bit the carry and sum outputs are

independent of the previous bits and thus the rippling effect has completely been

eliminated.


It works by creating two signals, propagate and generate for each bit position,

based on whether a carry is propagated through from a less significant bit position, a

carry is generated in that bit position, or if a carry is killed in that bit position.
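The propagate and generate signals can be sketched in a short model. The Python function below is a hedged illustration for unsigned n-bit operands: each carry is written as a flat OR of AND terms over g, p and the carry-in, so no carry depends on another carry. The name `cla_add` is hypothetical, and a practical adder would group bits into blocks to limit gate fan-in, which this sketch does not model.

```python
def cla_add(a: int, b: int, n: int, cin: int = 0) -> int:
    """Add two n-bit unsigned words with every carry computed directly from g, p and cin."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]      # generate: both bits are 1
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]    # propagate: exactly one bit is 1
    # A position with g = 0 and p = 0 kills any incoming carry.

    def carry_into(i: int) -> int:
        # c_i = g[i-1] | p[i-1]g[i-2] | ... | p[i-1]...p[0]cin
        terms = [g[j] and all(p[j + 1:i]) for j in range(i)]
        terms.append(cin and all(p[:i]))
        return int(any(terms))

    carries = [carry_into(i) for i in range(n + 1)]
    total = sum((p[i] ^ carries[i]) << i for i in range(n))   # sum bit = p XOR carry-in
    return total | (carries[n] << n)                          # carry-out becomes bit n


if __name__ == "__main__":
    print(cla_add(0b1011, 0b0110, 4))   # equals 11 + 6
```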

The design entry of the 126×126 bit multipliers using the Radix-4 Booth algorithm with 3:2 compressors and the Radix-8 Booth algorithm with 4:2 compressors is done using VHDL and simulated using the ModelSim SE 6.4 design suite from Mentor Graphics. The designs are then synthesized and implemented in a Xilinx XC3S5000 fg1156 -4 FPGA using the Xilinx ISE 9.2i design suite.

Figure 4 presents a snapshot of simulation waveforms for 126×126 bit

multiplier. Table IV summarizes the FPGA resource utilization of these two

multipliers.

Finally the performance improvement is validated by implementing a higher

order FIR filter using these multipliers. Table V summarizes the FPGA resource

utilization for FIR filters using these multipliers.

This shows that the multiplier using the Radix-8 Booth algorithm with 4:2 compressors gives better speed, while the number of occupied slices is lower for the multiplier using the Radix-4 Booth algorithm with 3:2 compressors.

The FIR filters are implemented in Xilinx XC3S1500fg676-4 FPGA. The

specifications of the FIR filter chosen are as follows.

Sampling frequency    : 24 kHz

Pass band frequency   : 8 kHz

Stop band frequency   : 9 kHz

Pass band ripple      : 0.1 (linear scale)

Stop band attenuation : 0.001 (linear scale)


TABLE IV. DEVICE UTILIZATION SUMMARY OF MULTIPLIERS

CHAPTER 5

TOOLS AND HDL USED

5.1 ROLE OF HDL:

An HDL provides the framework for the complete logical design of the ASIC. All the

activities coming under the purview of an HDL are shown enclosed in bold dotted lines .

Verilog and VHDL are the two most commonly used HDLs today. Both have constructs with

which the design can be fully described at all the levels. There are additional constructs

available to facilitate setting up of the test bench, spelling out test vectors for them and

“observing” the outputs from the designed unit.


IEEE has brought out Standards for the HDLs, and the software tools conform to

them. Verilog as an HDL was introduced by Cadence Design Systems; they placed it into the

public domain in 1990. It was established as a formal IEEE Standard in 1995. The revised version was brought out in 2001. However, most of the simulation tools available today conform only to the 1995 version of the standard. VHDL, which is used by a substantial number of VLSI designers today, is the HDL used in this project for modeling the design.

We have used Xilinx ISE 9.2i for simulation and synthesis purposes. We implemented the prescribed design in VHDL, a well-known industry and IEEE standard HDL.

5.2 NEEDS OF (V)HDL:

o Interoperability.

o Technology independence.

o Design reuse.

o Several levels of abstraction.

o Readability.

o Standard language.

o Widely supported.

What is VHDL?

VHDL = VHSIC Hardware Description Language (VHSIC = Very High Speed Integrated Circuit)

Fig. 5.1 Data Flow of VHDL (stages shown: Specify, Capture, Verify, Formalize, Implement)

VHDL can be described as:

o A design specification language.

o A design entry language.

o A design simulation language.

o A design documentation language.

o An alternative to schematics.

5.2.1 BRIEF HISTORY:

o VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers.

o It was funded by the U.S. Department of Defense.

o The first publicly available version was released in 1985.

o In 1986 the IEEE (Institute of Electrical and Electronics Engineers) was presented with a proposal to standardize VHDL.

o In 1987 standardization resulted in IEEE 1076-1987.

o An improved version of the language was released in 1994 as IEEE standard 1076-1993.

Related Standards:

o IEEE 1076 doesn’t support simulation conditions such as unknown and high-impedance.

o Soon after IEEE 1076-1987 was released, simulator companies began using their own, non-standard types, so VHDL was becoming non-standard.

o The IEEE 1164 standard was developed by the IEEE. IEEE 1164 contains definitions for a nine-valued data type, std_logic.

5.3 VHDL ENVIRONMENT:


Fig 5.2 VHDL Environment

Design Units:

Segments of VHDL code that can be compiled separately and stored in a library.

Fig. 5.3 Design Units

5.4 LEVELS OF ABSTRACTION:

VHDL supports many possible styles of design description, which differ

primarily in how closely they relate to the HW.

It is possible to describe a circuit in a number of ways.

Structural.

Data flow.

Behavioral.

Structural VHDL description:

• Circuit is described in terms of its components.

• From a low-level description (e.g., transistor-level description)to a high level

description.

• For large circuits, a low-level description quickly becomes impractical.

Dataflow VHDL Description:

• Circuit is described in terms of how data moves through the system.

• In the dataflow style you describe how information flows between registers in the system.

• The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely.

Fig 5.4. Data Flow of VHDL Description

• The behavior of the system over time is defined by registers.

• There are no built-in registers in the VHDL language.

  - Either a lower-level description

  - or a behavioral description of sequential elements is needed.

• The lower-level descriptions must be created or obtained.

• If there are no third-party models for registers, you must write the behavioral description of the registers.

• The behavioral description can be provided in the form of subprograms (functions or procedures).

Behavioral VHDL Description

• Circuit is described in terms of its operation over time.

• Representation might include, e.g., state diagrams, timing diagrams and algorithmic descriptions.

• The concept of time may be expressed precisely using delays (e.g., A <= B after 10 ns).

• If no actual delay is used, order of sequential operations is defined.

• In the lower level of abstraction (e.g., RTL) synthesis tools ignore detailed

timing specifications.

• The actual timing results depend on implementation technology and efficiency

of synthesis tools.

• There are few tools for behavioral synthesis.

General format:

process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;


CHAPTER 6

SOFTWARE TOOLS

6.1 SOFTWARE TOOL-XILINX:

Xilinx ISE is a software tool produced by Xilinx for synthesis and analysis of

HDL designs, which enables the developer to synthesize ("compile") their designs,

perform timing analysis , examine RTL diagrams, simulate a design's reaction to

different stimuli, and configure the target device with the programmer.

Xilinx was founded in 1984 by two semiconductor engineers, Ross Freeman

and Bernard Vonderschmitt, who were both working for integrated circuit and solid-

state device manufacturer Zilog Corp.

While working for Zilog, Freeman wanted to create chips that acted like a

blank tape, allowing users to program the technology themselves. At the time, the

concept was paradigm-changing. "The concept required lots of transistors and, at that

time, transistors were considered extremely precious – people thought that Ross's idea

was pretty far out", said Xilinx Fellow Bill Carter, who when hired in 1984 as the first

IC designer was the company's eighth employee.

Xilinx ISE is a software tool used to synthesize and simulate designs written in the VHDL language. It has various versions, such as Xilinx 9.2i, Xilinx 10.1, Xilinx 10.5, etc. Xilinx ISE provides various pre-defined libraries and packages.

6.2 VERSION 9.2I:

New Device Support.

This release supports the new Spartan™- 3A DSP family.

New Software Features.

Following are the new features in this release.

Operating System Support:

• Support for Windows® Vista Business 32-bit operating system.

• This operating system is supported, but has had limited testing.

• Support for Windows XP Professional 64-bit operating system


• Support for Red Hat Enterprise WS 5.0 32-bit and 64-bit operating system.

This operating system is supported, but has had limited testing.

WHY XILINX ONLY?

Many software tools, such as those from Cadence, can be used to run VHDL programs.

Compared with these tools, however, Xilinx is cost-effective.

CHAPTER-7

TUTORIAL OF ISE 8.2i

ISE 8.2i Quick Start Tutorial

The ISE 8.2i Quick Start Tutorial provides Xilinx PLD designers with

a quick overview of the basic design process using ISE 8.2i. After you have

completed the tutorial, you will have an understanding of how to create, verify, and

implement a design.

Note: This tutorial is designed for ISE 8.2i on Windows.

This tutorial contains the following sections:

• “Getting Started”

• “Create a New Project”

• “Create an HDL Source”

• “Design Simulation”

• “Create Timing Constraints”

• “Implement Design and Verify Constraints”

• “Reimplement Design and Verify Pin Locations”

• “Download Design to the Spartan™-3 Demo Board”

For an in-depth explanation of the ISE design tools, see the ISE In-Depth Tutorial on

the Xilinx® web site at: http://www.xilinx.com/support/techsup/tutorials/

Getting Started

Software Requirements:

To use this tutorial, you must install the following software:

• ISE 8.2i

For more information about installing Xilinx® software, see the ISE Release Notes

and Installation Guide at: http://www.xilinx.com/support/software_manuals.htm.

Hardware Requirements:

To use this tutorial, you must have the following hardware:

• Spartan-3 Startup Kit, containing the Spartan-3 Startup Kit Demo Board

Starting the ISE Software

To start ISE, double-click the desktop icon,

or start ISE from the Start menu by selecting:

Start → All Programs → Xilinx ISE 8.2i → Project Navigator

Note: Your start-up path is set during the installation process and may differ from the

one above.

Accessing Help

At any time during the tutorial, you can access online help for additional information

about the ISE software and related tools.

To open Help, do either of the following:

• Press F1 to view Help for the specific tool or function that you have selected or

highlighted.

• Launch the ISE Help Contents from the Help menu. It contains information about

creating and maintaining your complete design flow in ISE.

Figure 1: ISE Help Topics

Create a New Project

Create a new ISE project which will target the FPGA device on the Spartan-3 Startup

Kit demo board.

To create a new project:

1. Select File > New Project... The New Project Wizard appears.

2. Type tutorial in the Project Name field.

3. Enter or browse to a location (directory path) for the new project. A tutorial

subdirectory is created automatically.

4. Verify that HDL is selected from the Top-Level Source Type list.

5. Click Next to move to the device properties page.

6. Fill in the properties in the table as shown below:

♦ Product Category: All

♦ Family: Spartan3

♦ Device: XC3S200

♦ Package: FT256

♦ Speed Grade: -4

♦ Top-Level Module Type: HDL

♦ Synthesis Tool: XST (VHDL/Verilog)

♦ Simulator: ISE Simulator (VHDL/Verilog)

♦ Verify that Enable Enhanced Design Summary is selected.

Leave the default values in the remaining fields.

When the table is complete, your project properties will look like the following:

Figure 2: Project Device Properties

7. Click Next to proceed to the Create New Source window in the New Project

Wizard. At the end of the next section, your new project will be complete.

Create a Verilog HDL Source

In this section, I will create an example top-level Verilog HDL file.

Creating a Verilog Source

Create the top-level Verilog source file as follows:

1. Click New Source in the New Project dialog box.

2. Select Verilog Module as the source type in the New Source dialog box.

3. Type in the file name counter.

4. Verify that the Add to Project checkbox is selected.

5. Click Next.

6. Declare the ports for the counter design by filling in the port information as shown

below:

Figure 5: Define Module

7. Click Next, then Finish in the New Source Information dialog box to complete the

new source file template.

8. Click Next, then Next, then Finish.

The source file containing the counter module displays in the Workspace, and the

counter displays in the Sources tab, as shown below:

Figure 6: New Project in ISE

Using Language Templates (Verilog)

The next step in creating the new source is to add the behavioral description for

counter.

Use a simple counter code example from the ISE Language Templates and customize

it for the counter design.

1. Place the cursor on the line below the output [3:0] COUNT_OUT; statement.

2. Open the Language Templates by selecting Edit → Language Templates…

Note: You can tile the Language Templates and the counter file by selecting Window

→ Tile Vertically to make them both visible.

3. Using the “+” symbol, browse to the following code example:

Verilog → Synthesis Constructs → Coding Examples → Counter → Binary →

Up/Down Counters → Simple Counter

4. With Simple Counter selected, select Edit → Use in File, or select the Use

Template in File toolbar button. This step copies the template into the counter source

file.

5. Close the Language Templates.

Final Editing of the Verilog Source

1. To declare and initialize the register that stores the counter value, modify the

declaration statement in the first line of the template as follows:

replace: reg [<upper>:0] <reg_name>;

with: reg [3:0] count_int = 0;

2. Customize the template for the counter design by replacing the port and signal

name

placeholders with the actual ones as follows:

♦ replace all occurrences of <clock> with CLOCK

♦ replace all occurrences of <up_down> with DIRECTION

♦ replace all occurrences of <reg_name> with count_int

3. Add the following line just above the endmodule statement to assign the register

value to the output port:

assign COUNT_OUT = count_int;

4. Save the file by selecting File → Save.

When you are finished, the code for the counter will look like the following:

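module counter(CLOCK, DIRECTION, COUNT_OUT);
    input CLOCK;
    input DIRECTION;
    output [3:0] COUNT_OUT;

    // Register that holds the current count, initialized to 0
    reg [3:0] count_int = 0;

    // Count up when DIRECTION is high, down when it is low
    always @(posedge CLOCK)
        if (DIRECTION)
            count_int <= count_int + 1;
        else
            count_int <= count_int - 1;

    // Drive the output port from the count register
    assign COUNT_OUT = count_int;
endmodule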

You have now created the Verilog source for the tutorial project.

Checking the Syntax of the New Counter Module

When the source files are complete, check the syntax of the design to find errors and

typos.

1. Verify that Synthesis/Implementation is selected from the drop-down list in the

Sources window.

2. Select the counter design source in the Sources window to display the related

processes in the Processes window.

3. Click the “+” next to the Synthesize-XST process to expand the process group.

4. Double-click the Check Syntax process.

Note: You must correct any errors found in your source files. You can check for

errors in the Console tab of the Transcript window. If you continue without valid

syntax, you will not be able to simulate or synthesize your design.

5. Close the HDL file.

Design Simulation

Verifying Functionality using Behavioral Simulation

Create a test bench waveform containing input stimulus you can use to verify the

functionality of the counter module. The test bench waveform is a graphical view of a

test bench.

Create the test bench waveform as follows:

1. Select the counter HDL file in the Sources window.

2. Create a new test bench source by selecting Project → New Source.

3. In the New Source Wizard, select Test Bench WaveForm as the source type, and

type counter_tbw in the File Name field.

4. Click Next.

5. The Associated Source page shows that you are associating the test bench

waveform with the source file counter. Click Next.

6. The Summary page shows that the source will be added to the project, and it

displays the source directory, type and name. Click Finish.

7. You need to set the clock frequency, setup time and output delay times in the

Initialize Timing dialog box before the test bench waveform editing window opens.

The requirements for this design are the following:

♦ The counter must operate correctly with an input clock frequency = 25 MHz.

♦ The DIRECTION input will be valid 10 ns before the rising edge of CLOCK.

♦ The output (COUNT_OUT) must be valid 10 ns after the rising edge of CLOCK.

The design requirements correspond with the values below.

Fill in the fields in the Initialize Timing dialog box with the following information:

♦ Clock Time High: 20 ns.

♦ Clock Time Low: 20 ns.

♦ Input Setup Time: 10 ns.

♦ Output Valid Delay: 10 ns.

♦ Offset: 0 ns.

♦ Global Signals: GSR (FPGA)

Note: When GSR(FPGA) is enabled, 100 ns. is added to the Offset value

automatically.

♦ Initial Length of Test Bench: 1500 ns.

Leave the default values in the remaining fields.
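
As a quick check that the Clock Time High and Clock Time Low values match the 25 MHz requirement, the clock period can be worked out as follows. The symbols are notation introduced here for illustration and do not appear in the ISE dialog box.

```latex
T_{\mathrm{clk}} = t_{\mathrm{high}} + t_{\mathrm{low}}
                 = 20\,\mathrm{ns} + 20\,\mathrm{ns} = 40\,\mathrm{ns},
\qquad
f_{\mathrm{clk}} = \frac{1}{T_{\mathrm{clk}}}
                 = \frac{1}{40\,\mathrm{ns}} = 25\,\mathrm{MHz}
```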

Figure 7: Initialize Timing

8. Click Finish to complete the timing initialization.

9. The blue shaded areas that precede the rising edge of the CLOCK correspond to the

Input Setup Time in the Initialize Timing dialog box. Toggle the DIRECTION port to

define the input stimulus for the counter design as follows:

♦ Click on the blue cell at approximately 300 ns to assert DIRECTION high so

that the counter will count up.

♦ Click on the blue cell at approximately 900 ns to assert DIRECTION low so

that the counter will count down.

Note: For more accurate alignment, you can use the Zoom In and Zoom Out toolbar

buttons.

Figure 8: Test Bench Waveform

10. Save the waveform.

11. In the Sources window, select the Behavioral Simulation view to see that the test

bench waveform file is automatically added to your project.

Figure 9: Behavior Simulation Selection

12. Close the test bench waveform.

Create a Self-Checking Test Bench Waveform

Add the expected output values to finish creating the test bench waveform.

This transforms the test bench waveform into a self-checking test bench waveform.

The key benefit to a self-checking test bench waveform is that it compares the desired

and actual output values and flags errors in your design as it goes through the various

transformations, from behavioral HDL to the device specific representation.

To create a self-checking test bench, edit output values manually, or run the

Generate Expected Results process to create them automatically. If you run the

Generate Expected Results process, visually inspect the output values to see if they

are the ones you expected for the given set of input values.

To create the self-checking test bench waveform automatically, do the following:

1. Verify that Behavioral Simulation is selected from the drop-down list in the

Sources window.

2. Select the counter_tbw file in the Sources window.

3. In the Processes tab, click the “+” to expand the Xilinx ISE Simulator process and

double-click the Generate Expected Simulation Results process. This process

simulates the design in a background process.

4. The Expected Results dialog box opens. Select Yes to annotate the results to the

test bench.

Figure 10: Expected Results Dialog Box

5. Click the “+” to expand the COUNT_OUT bus and view the transitions that

correspond to the Output Delay value (yellow cells) specified in the Initialize Timing

dialog box.

Figure 11: Test Bench Waveform with Results

6. Save the test bench waveform and close it.

You have now created a self-checking test bench waveform.

Simulating Design Functionality

Verify that the counter design functions as you expect by performing behavioral

simulation as follows:

1. Verify that Behavioral Simulation and counter_tbw are selected in the Sources

window.

2. In the Processes tab, click the “+” to expand the Xilinx ISE Simulator process and

double-click the Simulate Behavioral Model process.

The ISE Simulator opens and runs the simulation to the end of the test bench.

3. To view your simulation results, select the Simulation tab and zoom in on the

transitions.

The simulation waveform results will look like the following:

Figure 12: Simulation Results

Note: You can ignore any rows that start with TX.

4. Verify that the counter is counting up and down as expected.

5. Close the simulation view. If you are prompted with the following message, “You

have an active simulation open. Are you sure you want to close it?”, click Yes to

continue.

You have now completed simulation of your design using the ISE Simulator.
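
The same stimulus can also be written as a plain Verilog test bench instead of a test bench waveform. The sketch below is a minimal example under stated assumptions: the module name counter_tb, the clock phase, and the exact stimulus times (290 ns and 890 ns, chosen to fall 10 ns before a rising clock edge) are illustrative and are not taken from the ISE tutorial. It does not model the 100 ns GSR offset.

```verilog
`timescale 1ns / 1ns

// Hypothetical hand-written test bench for the tutorial's counter module.
// The name counter_tb and the stimulus times are assumptions.
module counter_tb;

    reg        CLOCK     = 1'b0;
    reg        DIRECTION = 1'b0;
    wire [3:0] COUNT_OUT;

    // Device under test: the counter created earlier in the tutorial.
    counter dut (
        .CLOCK(CLOCK),
        .DIRECTION(DIRECTION),
        .COUNT_OUT(COUNT_OUT)
    );

    // 25 MHz clock: 20 ns low, 20 ns high (40 ns period).
    // Rising edges fall at 20 ns, 60 ns, 100 ns, and so on.
    always #20 CLOCK = ~CLOCK;

    initial begin
        // Print a line whenever DIRECTION or COUNT_OUT changes.
        $monitor("t=%0d ns DIRECTION=%b COUNT_OUT=%d",
                 $time, DIRECTION, COUNT_OUT);

        #290 DIRECTION = 1'b1;   // 10 ns setup before the edge at 300 ns: count up
        #600 DIRECTION = 1'b0;   // 10 ns setup before the edge at 900 ns: count down
        #610 $finish;            // stop at 1500 ns, the test bench length used above
    end

endmodule
```

Compiling this file together with the counter source in any Verilog simulator should show COUNT_OUT decreasing while DIRECTION is low and increasing while it is high.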

CHAPTER-8

HARDWARE TOOLS

A field-programmable gate array (FPGA) is a semiconductor device that can

be configured by the customer or designer after manufacturing—hence the name

"field-programmable". FPGAs are programmed using a logic circuit diagram or a

source code in a hardware description language (HDL) to specify how the chip will

work.

They can be used to implement any logical function that an application-

specific integrated circuit (ASIC) could perform, but the ability to update the

functionality after shipping offers advantages for many applications. FPGAs contain

programmable logic components called "logic blocks", and a hierarchy of

reconfigurable interconnects that allow the blocks to be "wired together"—somewhat

like a one-chip programmable breadboard.

Logic blocks can be configured to perform complex combinational functions,

or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks

also include memory elements, which may be simple flip-flops or more complete

blocks of memory.
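
To make the idea of a configurable logic block more concrete, the following is a conceptual Verilog sketch of a 3-input lookup table followed by a flip-flop. It is not a vendor primitive: the module name, port names, and the INIT parameter are assumptions chosen for illustration.

```verilog
// Conceptual model only: names and the INIT parameter are illustrative
// assumptions, not a Xilinx library element.
module logic_block_sketch #(
    parameter [7:0] INIT = 8'h96      // truth table; 8'h96 gives a 3-input XOR
) (
    input  wire       clk,
    input  wire [2:0] lut_in,         // LUT inputs, used as a table index
    output wire       comb_out,       // combinational LUT output
    output reg        reg_out         // registered copy of the LUT output
);
    // The LUT behaves like a small memory indexed by its inputs.
    wire [7:0] table_bits = INIT;
    assign comb_out = table_bits[lut_in];

    // Memory element: a D flip-flop that captures the LUT output.
    always @(posedge clk)
        reg_out <= comb_out;
endmodule
```

Changing INIT changes the function without changing the structure. For example, 8'h80 would turn the same block into a 3-input AND gate, which is the sense in which a logic block is "configured".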

7.1 HISTORY

The FPGA industry sprouted from programmable read only memory (PROM)

and programmable logic devices (PLDs). PROMs and PLDs both had the option of

being programmed in batches in a factory or in the field (field programmable);

however, programmable logic was hard-wired between logic gates.

Xilinx Co-Founders, Ross Freeman and Bernard Vonderschmitt, invented the

first commercially viable field programmable gate array in 1985 – the XC2064. The

XC2064 had programmable gates and programmable interconnects between gates, the

beginnings of a new technology and market. The XC2064 boasted a mere 64

configurable logic blocks (CLBs), with two 3-input lookup tables (LUTs). More than

20 years later, Freeman was inducted into the National Inventors Hall of Fame for his

invention.

7.2 ARCHITECTURE

The most common FPGA architecture consists of an array of configurable

logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing

channels have the same width (number of wires). Multiple I/O pads may fit into the

height of one row or the width of one column in the array.

An application circuit must be mapped into an FPGA with adequate resources.

While the number of CLBs and I/Os required is easily determined from the design,

the number of routing tracks needed may vary considerably even among designs with

the same amount of logic.

Fig 7.1 Internal Structure of FPGA

7.3 APPLICATIONS

Applications of FPGAs include digital signal processing, software-defined

radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer

vision, speech recognition, cryptography, bioinformatics, computer hardware

emulation, radio astronomy and a growing range of other areas.

7.4 A BRIEF TUTORIAL: SOURCE CODE IS DUMPED INTO FPGA.

1. Now let’s look at the flow for actually synthesizing and implementing the

design in the FPGA prototyping boards. Close ModelSim and go back to the

Xilinx ISE environment. In the Sources subwindow change the selection in

the dropdown box from “Behavioral Simulation” to

“Synthesis/Implementation”.

2. To properly synthesize the design we need to specify which pins on the chip

all the inputs and outputs should be assigned to. In general of course we could

assign the signals just about any way we want. Since we will be using specific

prototype boards, we need to make sure our pin assignments match the

switches, buttons, and LEDs so we can test our design. We will be starting

with Digilab 2E boards that are connected to Digilab DIO2 input/output

boards. The I/O board has already been programmed and configured to have

the following connections:

3. To assign specific pins, expand the User Constraints selection under the

Process subwindow and double-click on Assign Package Pins.

4. A new application called Xilinx PACE should be launched.

a. In the Design Object List subwindow you should see a listing of all the

input and output signals from our design.

Here is where we can specify which pin locations we want for each signal.

Simply enter the pin numbers from the tables shown in Step 2 above,

making sure to use a capital letter “P” in front of the pin specification.

Let’s assign our signals as follows:

A P163 (Switch 1)

I0 P164 (Switch 2)

I1 P166 (Switch 3)

Y P149 (LED 0)

Once all pins have been assigned, save your constraints by selecting File →

Save from the menu bar and exit Xilinx PACE.
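
For reference, a design with these four signals might look like the sketch below. The source names only the signals (A, I0, I1, Y) and the bitstream file mux.bit, so treating the design as a 2-to-1 multiplexer with A as the select input is an assumption. The pin numbers appear as comments only. The actual assignment is still made in PACE.

```verilog
// Assumed top-level design for this walkthrough: a 2-to-1 multiplexer.
// Using A as the select input is an assumption, not stated in the source.
module mux (
    input  wire A,    // select input,  pin P163 (Switch 1)
    input  wire I0,   // data input 0,  pin P164 (Switch 2)
    input  wire I1,   // data input 1,  pin P166 (Switch 3)
    output wire Y     // output,        pin P149 (LED 0)
);
    // Pass I1 to the LED when A is high, otherwise pass I0.
    assign Y = A ? I1 : I0;
endmodule
```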

5. Back in Xilinx ISE, in the Processes subwindow, double-click on the

Synthesize – XST selection and wait for the process to complete. Then

double-click on the Implement Design selection and wait for the process to

complete. Then double-click on the Generate Programming File selection and

wait for the process to complete. If all goes well, you should have green

check marks for the whole design.

6. There is a lot of information you can obtain through all of the objects listed in

the Processes subwindow, but let us proceed to downloading the design onto

the prototyping board for testing. First make sure the prototyping board is

connected to the PC and has power on. Also make sure the slide switch on the

FPGA board by the parallel port is set to JTAG (as opposed to “Port”). Then

select Configure Device (iMPACT) underneath the Generate Programming

File selection. You should see the following window:

7. Now you need to specify which bitstream file to use to configure the device.

For this tutorial we want to select the mux.bit file and click Open.

You will probably get the message below. Just click Yes.

You will also get a warning message saying the JTAG clock was updated in

the bitstream file (which is good) so just click OK. There is a way to correct

for that in the original design flow, but Xilinx automatically catches it here so

I don’t usually bother.

8. You should now see the Spartan XC2S200E chip in the main window. Right

click on the chip to prepare for downloading the bitstream file.

Select Program on the resulting window.

9. Click OK.

If all goes well you should get the Programming Succeeded message

10. Now just test and verify your design on the actual FPGA board!

CONCLUSION

The design, implementation, and simulation of a 21×21-bit, radix-8 multiplier unit for

a specific purpose have been carried out. The number of transistors is 8224,

with an active area of 2.97 mm². The measured multiplication time is 9.4 ns and

the power dissipation is 60.7 mW at a frequency of 10 MHz. The results show that

a radix-8 architecture can be useful in high-speed multipliers for a specific

purpose because of the gain in time and number of transistors compared with the

conventional radix-4 recoding architecture.

This can be achieved with a slight modification to the previous adder. The

modification requires storing two additional bits (intermediate carries) for each

word in the set of numbers. Memory requirements increase by 9.5%, while the time

decrease in the previous adder is estimated at 42%. As a result, the overall

multiplication time can be reduced with our radix-8 architecture for a specific purpose.
