Unit III Case Studies: FPGA & CPLD Architectures and Applications

Dr.Y.Narasimmha Murthy Ph.D

UNIT III : CASE STUDIES

[CPLD & FPGA ARCHITECTURE & APPLICATIONS]

INTRODUCTION:

A Field Programmable Gate Array (FPGA) consists of an array of programmable logic blocks, including general logic, memory and multiplier blocks, surrounded by a programmable routing fabric that allows the blocks to be interconnected. The array is surrounded by programmable input/output blocks, labeled I/O in the figure, that connect the chip to the outside world. Here the term “programmable” indicates the ability to program a function into the chip after silicon fabrication is complete. This is made possible by the programming technology, a method for changing the behavior of the pre-fabricated chip in the “field,” where system users create designs. The first programmable logic devices used very small fuses as the programming technology. Every FPGA depends on a programming technology to control the programmable switches that give the FPGA its programmability.

Programming Technologies

A number of programming technologies have been used for reconfigurable architectures. Each of these technologies has different characteristics, which in turn have a significant effect on the programmable architecture. Some of the well-known technologies are:

(i) SRAM-based programming technology

(ii) Flash (EEPROM) programming technology

(iii) Anti-fuse based programming technology

SRAM-Based Programming Technology

Static memory cells are the basic cells used in SRAM-based FPGAs. Most commercial vendors, such as Xilinx, Lattice and Altera, use static memory (SRAM) based programming technology in their devices. These devices use static memory cells that are distributed throughout the FPGA to provide configurability. An example of such a memory cell is shown below. In an SRAM-based FPGA, SRAM cells are mainly used for the following purposes:

(i) To program the routing interconnect of the FPGA, which is generally steered by small multiplexers.

(ii) To program the Configurable Logic Blocks (CLBs) that are used to implement logic functions.

There are two primary uses for the SRAM cells. Most are used to set the select lines to

multiplexers that steer interconnect signals. The majority of the remaining SRAM cells are used

to store the data in the lookup-tables (LUTs) that are typically used in SRAM-based FPGAs to

implement logic functions. Historically, SRAM cells were used to control the tri-state buffers

and simple pass transistors that were also used for programmable interconnect.
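The role of the SRAM cells on multiplexer select lines can be illustrated with a minimal Python sketch. The function name, the MSB-first bit order and the example values are assumptions made for illustration only; they do not describe any particular device.

```python
def routing_mux(inputs, select_bits):
    """Return the input wire chosen by the SRAM select bits (MSB first)."""
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit
    return inputs[index]

# Four candidate wires; two configuration bits are written once at
# programming time and then steer the same wire until reconfiguration.
wires = [0, 1, 1, 0]
config = [1, 0]  # hypothetical bitstream fragment: selects wires[2]
selected = routing_mux(wires, config)
```

Changing only `config` reroutes the signal without touching the surrounding logic, which is the sense in which the interconnect is programmable.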

SRAM-based programming technology has become the dominant approach for FPGAs because of its re-programmability and its use of standard CMOS process technology. Because it uses a standard process, it benefits directly from the increased integration, higher speed and lower dynamic power consumption of each new process with smaller geometry.

There are however a number of drawbacks associated with SRAM-based programming

technology. For example an SRAM cell requires 6 transistors which makes this technology

costly in terms of area compared to other programming technologies.

Further SRAM cells are volatile in nature and external devices are required to permanently store

the configuration data. These external devices add to the cost and area overhead of SRAM-based

FPGAs.

There is also a security problem. Since the configuration information must be loaded into the device at power-up, it could be intercepted and stolen for use in a competing system. To overcome this problem, the configuration data can be encrypted.

The electrical properties of pass transistors are also not ideal. SRAM-based FPGAs typically rely on pass transistors to implement multiplexers. However, pass transistors are far from ideal switches, as they have significant on-resistance and present an appreciable capacitive load. As FPGAs migrate to smaller device geometries these issues may be exacerbated.

Flash Programming Technology

An important alternative to the SRAM-based programming technology is the use of flash or EEPROM based programming technology. This technology injects charge onto a gate that “floats” above the transistor. This approach is used in flash or EEPROM memory cells. These

cells are non-volatile; they do not lose information when the device is powered down. With

modern IC fabrication processes, it has become possible to use the floating gate cells directly as

switches. Flash memory cells, in particular, are now used because of their improved area

efficiency. The widespread use of flash memory cells for non-volatile memory chips ensures that

flash manufacturing processes will benefit from steady decreases in process geometries.

Flash-based programming technology offers several advantages. For example, this programming

technology is nonvolatile in nature. Flash-based programming technology is also more area

efficient than SRAM-based programming technology. Flash-based programming technology has

its own disadvantages also. Unlike SRAM-based programming technology, flash based devices

cannot be reconfigured/reprogrammed an infinite number of times. Also, flash-based technology

uses non-standard CMOS process.

This flash-based programming technology offers several unique advantages, most importantly

non-volatility. This feature eliminates the need for the external resources required to store and

load configuration data when SRAM-based programming technology is used. Additionally,

a flash-based device can function immediately upon power-up instead of having to wait for the

loading of configuration data. The flash approach is also more area efficient than SRAM-based

technology which requires up to six transistors to implement the programmable storage. The

programming circuitry, such as the high and low voltage buffers needed to program the cell,

contributes an area overhead not present in SRAM-based devices. However, this cost is

relatively modest as it is amortized across numerous programmable elements. In comparison to anti-fuses, an alternative non-volatile programming technology, flash-based FPGAs are reconfigurable and can be programmed without being removed from a printed circuit board. The use of a floating gate to control the switching transistor adds design complexity, because care must be taken to ensure that the source–drain voltage remains sufficiently low to prevent charge injection into the floating gate. Since newer processes require lower voltage levels, this issue may become less of a concern in the future.

One disadvantage of flash-based devices is that they cannot be reprogrammed an infinite number of times. Charge buildup in the oxide eventually prevents a flash-based device from being properly erased and programmed. Devices such as the Actel ProASIC3 are rated for only 500 programming cycles. For most uses of FPGAs this programming count is more than sufficient, and in many cases an FPGA is programmed for only one use. Another significant disadvantage of flash devices is the need for a non-standard CMOS process. Also, like the static memory-based technology, this programming technology suffers from relatively high resistance and capacitance due to the use of transistor-based switches.

One trend that has recently emerged is the use of flash storage in combination with SRAM programming technology.

In devices from Altera, Xilinx and Lattice, on-chip flash memory is used to provide non-

volatile storage while SRAM cells are still used to control the programmable elements in the

design. This addresses the problems associated with the volatility of pure-SRAM approaches,

such as the cost of additional storage devices or the possibility of configuration data interception,

while maintaining the infinite re-configurability of SRAM-based devices.

It is important to recognize that, since the programming technology is still based on SRAM cells,

the devices are no different than pure-SRAM based devices from an FPGA architecture

standpoint. However, the incorporation of flash memory generally means that the processing

technology will not be as advanced as pure-SRAM devices. Additionally, the devices incur more

area overhead than pure-SRAM devices since both flash and SRAM bits are required for every

programmable element.

Anti-fuse Programming Technology

An alternative to SRAM and floating-gate technologies is anti-fuse programming technology. This technology is based on structures that exhibit very high resistance under normal circumstances but can be programmably “blown” (in reality, connected) to create a low-resistance link.

An anti-fuse is a two-terminal device whose unprogrammed state presents a very high resistance between its terminals. When a high voltage (from 11 to 20 volts, depending on the type of anti-fuse) is applied across its terminals, the anti-fuse “blows” and creates a low-resistance link. This link is permanent. Anti-fuses in use today are built either using an Oxide-Nitride-Oxide (ONO) dielectric between N+ diffusion and polysilicon, or using amorphous silicon between metal layers or between polysilicon and the first layer of metal.

Programming an anti-fuse requires extra circuitry to deliver the high programming voltage and a relatively high current of 5 mA or more. This is done through fairly sizable pass transistors that provide addressing to each anti-fuse. Anti-fuse technology is used in the FPGAs from Actel, Quicklogic, and Crosspoint.

A major advantage of the anti-fuse is its small size, little more than the cross-section of two

metal wires. But this advantage is limited by the large size of the necessary programming

transistors, which handle large currents, and the inclusion of isolation transistors that are

sometimes needed to protect low voltage transistors from high programming voltages.

A second major advantage of an anti-fuse is its relatively low series resistance. The on-resistance of the ONO anti-fuse is 300 to 500 ohms, while that of the amorphous silicon anti-fuse is 50 to 100 ohms. Additionally, the parasitic capacitance of an unprogrammed amorphous silicon anti-fuse is significantly lower than for other programming technologies.

The limitations of this technology are that it does not use a standard CMOS process and that anti-fuse based devices cannot be reprogrammed. The ideal technology would be re-programmable, non-volatile, and built on a standard CMOS process. None of the above technologies satisfies all of these conditions.

Nevertheless, SRAM-based programming technology is the most widely used. The main reason is its use of a standard CMOS process, and for this reason it is expected to continue to dominate the other two programming technologies.

Comparison of Programming Technologies

Programming Technology   Re-Programmable   Volatile Storage   Series Resistance   Capacitance (pF)   Cell Area

Static RAM               In-circuit        Yes                1 KΩ                15                 5X

Anti-Fuse                No                No                 50-500 Ω            1.2 - 5.0          1X

EPROM                    Outside circuit   No                 2 KΩ                10                 1X

EEPROM                   In-circuit        No                 2 KΩ                10                 2X

XILINX XC3000 FPGA Device

Xilinx introduced the first FPGA family, called the XC2000 series, in 1985, and later offered further series including the XC3000, XC4000 and XC5000. The first modern-era FPGA was introduced with 64 logic blocks and 58 inputs and outputs. The XC3000 series followed in 1987 and was the most successful family of FPGAs.

The XC3000 architecture includes enhancements to the XC2000 architecture to improve performance, density and usability. The XC3000 architecture was developed with manual tools for design implementation, and the architecture shows a bias towards manual design. The XC3000 family covers a range of nominal device densities from 2,000 to 9,000 gates, with practically achievable densities from 1,000 to 6,000 gates, and offers up to 144 user-definable I/Os. Device speeds, described in terms of maximum guaranteed toggle frequencies, range from 70 to 125 MHz. The XC3000 Configurable Logic Block is substantially larger than that of the XC2000. It contains two lookup tables, each of which has four inputs and requires 16 bits of configuration memory.

The two lookup tables can be combined with a multiplexer to produce any function of five inputs and some functions of up to seven inputs. The XC3000 architecture allows faster logic implementation with a minimum of CLBs in series.
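The claim that two four-input lookup tables and a multiplexer can produce any function of five inputs follows from splitting the function on its fifth input. A minimal Python sketch of this idea is given below; the helper names and the choice of input E as the multiplexer select are illustrative assumptions and are not taken from the Xilinx data sheet.

```python
from itertools import product

def make_lut(func, k):
    """Fill a 2**k-entry table with func evaluated on every input combination."""
    return [func(*bits) for bits in product((0, 1), repeat=k)]

def read_lut(table, bits):
    """Use the input bits (MSB first) as the address into the table."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return table[index]

def five_input_function(func):
    # One 16-bit table holds the function with E = 0, the other with E = 1.
    lut_f = make_lut(lambda a, b, c, d: func(a, b, c, d, 0), 4)
    lut_g = make_lut(lambda a, b, c, d: func(a, b, c, d, 1), 4)

    def clb(a, b, c, d, e):
        f = read_lut(lut_f, (a, b, c, d))
        g = read_lut(lut_g, (a, b, c, d))
        return g if e else f  # the multiplexer, selected by E

    return clb, lut_f, lut_g

def parity5(a, b, c, d, e):
    return a ^ b ^ c ^ d ^ e

clb, lut_f, lut_g = five_input_function(parity5)
```

Each table holds 16 bits, matching the 16 bits of configuration memory per lookup table mentioned above.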

There are now four distinct families within the XC3000 Series of FPGA devices:

• XC3000A Family

• XC3000L Family

• XC3100A Family

• XC3100L Family

All four families share a common architecture, development software, design and programming

methodology, and also common package pin-outs.

• XC3000A Family : The XC3000A is an enhanced version of the basic XC3000 family,

featuring additional interconnect resources and other user-friendly enhancements.

• XC3000L Family : The XC3000L is identical in architecture and features to the XC3000A

family, but operates at a nominal supply voltage of 3.3 V. The XC3000L is the right solution for

battery-operated and low-power applications.

• XC3100A Family : The XC3100A is a performance-optimized relative of the XC3000A family. While both families are bit stream and footprint compatible, the XC3100A family extends toggle rates to 370 MHz and in-system performance to over 80 MHz. The XC3100A family also offers one additional array size, the XC3195A.

• XC3100L Family : The XC3100L is identical in architecture and features to the XC3100A family, but operates at a nominal supply voltage of 3.3 V.

The basic LCA (Logic Cell Array) of the XC3000 consists of three components: Programmable I/O Blocks, Configurable Logic Blocks and Programmable Interconnect. In addition, a small amount of configurable memory is also present.

Programmable I/O Block

Each user-configurable IOB as shown below, provides an interface between the external

package pin of the device and the internal user logic. Each IOB includes both registered and

direct input paths. Each IOB provides a programmable 3-state output buffer, which may be driven

by a registered or direct output signal. Configuration options allow each IOB an inversion, a

controlled slew rate and a high impedance pull-up. Each input circuit also provides input

clamping diodes to provide electrostatic protection, and circuits to inhibit latch-up produced by

input currents.

Each IOB includes input and output storage elements and I/O options selected by configuration

memory cells. A choice of two clocks is available on each die edge. The polarity of each clock

line (not each flip-flop or latch) is programmable. A clock line that triggers the flip-flop on the

rising edge is an active Low Latch Enable (Latch transparent) signal and vice versa. Passive pull-

up can only be enabled on inputs, not on outputs. All user inputs are programmed for TTL or

CMOS thresholds.

Configurable Logic Block

Each CLB includes a combinatorial logic section, two flip-flops and a program-memory-controlled multiplexer selection of function. It has the following inputs and outputs:

• Five logic variable inputs: A, B, C, D and E

• A direct data input: DI

• An enable clock: EC

• A clock (invertible): K

• An asynchronous direct reset: RD

• Two outputs: X and Y

XC3000 CLB

Each CLB has a combinatorial logic section, two flip-flops, and an internal control section. The CLB has five logic inputs (A, B, C, D and E), a common clock input (K), an asynchronous direct RESET input (RD) and an enable clock (EC), as shown in the block diagram. Each CLB also has two outputs (X and Y) which may drive interconnect networks. Data input for the flip-flops within a CLB is supplied from the function F or G outputs of the combinatorial logic, or from the block input, DI. Both flip-flops in each CLB share the asynchronous RD which, when enabled, is dominant over clocked inputs. All flip-flops are reset by the active-Low chip input, RESET, or during the configuration process. The flip-flops share the enable clock (EC) which, when Low, recirculates the flip-flops' present states and inhibits response to the data-in or combinatorial function inputs on a CLB. The user may enable these control inputs and select

their sources. The user may also select the clock net input (K), as well as its active sense within

each CLB. This programmable inversion eliminates the need to route both phases of a clock

signal throughout the device.
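The behaviour of one CLB flip-flop under the EC and RD controls described above can be summarised with a small behavioural model. This is only a sketch: the class and method names are hypothetical, each call represents one active clock edge, and the asynchronous reset is simply given priority rather than being modelled in continuous time.

```python
class ClbFlipFlop:
    """Behavioural sketch of one CLB flip-flop with enable clock and reset."""

    def __init__(self):
        self.q = 0  # flip-flops are reset during configuration

    def clock_edge(self, d, ec=1, rd=0):
        """Apply one active clock edge and return the resulting output."""
        if rd:
            self.q = 0   # RD dominates the clocked inputs
        elif ec:
            self.q = d   # EC high: capture the data input
        # EC low: the present state is recirculated (held)
        return self.q

ff = ClbFlipFlop()
trace = [
    ff.clock_edge(1),          # load a 1
    ff.clock_edge(0, ec=0),    # EC low, so the 1 is held
    ff.clock_edge(1, rd=1),    # reset wins over the data input
]
```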

Programmable Interconnect :

Programmable-interconnection resources in the Field Programmable Gate Array provide routing

paths to connect inputs and outputs of the IOBs and CLBs into logic networks. Interconnections

between blocks are composed of a two-layer grid of metal segments. Specially designed pass

transistors, each controlled by a configuration bit, form programmable interconnect points (PIPs)

and switching matrices used to implement the necessary connections between selected metal

segments and block pins.

Three types of metal resources are provided to accommodate various network interconnect

requirements.

• General Purpose Interconnect

• Direct Connection

• Long lines (multiplexed busses and wide AND gates)

XC3000 Interconnect

XILINX XC4000 FPGA Device

The XC4000 features a Configurable Logic Block (CLB) that is based on look-up tables (LUTs). A LUT is a small one-bit-wide memory array, where the address lines of the memory are the inputs of the logic block and the one-bit output of the memory is the LUT output. A LUT with K inputs therefore corresponds to a 2^K x 1 bit memory, and can realize any logic function of its K inputs by programming the function’s truth table directly into the memory. The XC4000 CLB contains three separate LUTs, in the configuration shown below. There are two 4-input LUTs that are fed by CLB inputs, and a third LUT that can be used in combination with the other two. This arrangement allows the CLB to implement a wide range of logic functions of up to nine inputs, two separate functions of four inputs, or other possibilities. Each CLB also contains two flip-flops.
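As an illustration of a LUT as a 2^K x 1 memory and of combining three LUTs, the sketch below builds a nine-input AND from two 4-input tables and a third table. Treating the third LUT as a 3-input table fed by the other two outputs plus one extra CLB input is an assumption for this example, as are all function names.

```python
from itertools import product

def make_lut(func, k):
    """Program a 2**k x 1 memory with the truth table of func."""
    return [func(*bits) for bits in product((0, 1), repeat=k)]

def read_lut(table, bits):
    """The input bits (MSB first) form the memory address."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return table[index]

and4 = make_lut(lambda a, b, c, d: a & b & c & d, 4)  # used for both 4-input LUTs
and3 = make_lut(lambda f, g, h1: f & g & h1, 3)       # assumed third LUT

def clb_and9(f_inputs, g_inputs, h1):
    """Nine-input AND: 4 + 4 inputs through the first two LUTs, 1 direct."""
    f = read_lut(and4, f_inputs)
    g = read_lut(and4, g_inputs)
    return read_lut(and3, (f, g, h1))
```

Reprogramming the three tables, without changing the wiring, yields a different function of the same nine inputs.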

Xilinx XC4000 Configurable Logic Block (CLB).

To provide high density devices that support the integration of entire systems, the XC4000

chips have “system oriented” features. For example, each CLB contains circuitry that allows it to

efficiently perform arithmetic (i.e., a circuit that can implement a fast carry operation for adder-

like circuits) and also the LUTs in a CLB can be configured as read/write RAM cells. A new

version of this family, the 4000E, has the additional feature that the RAM can be configured as a

dual port RAM with a single write and two read ports. In the 4000E, RAM blocks can be

synchronous RAM. Also, each XC4000 chip includes very wide AND-planes around the

periphery of the logic block array to facilitate implementing circuit blocks such as wide

decoders.

The other important feature of this FPGA is its interconnect structure. The XC4000

interconnect is arranged in horizontal and vertical channels. Each channel contains some number

of short wire segments that span a single CLB (the number of segments in each channel depends

on the specific part number), longer segments that span two CLBs, and very long segments that

span the entire length or width of the chip. Programmable switches are available to connect the

inputs and outputs of the CLBs to the wire segments, or to connect one wire segment to another.

The figure below shows only the wire segments in a horizontal channel, and does not show the

vertical routing channels, the CLB inputs and outputs, or the routing switches. The salient feature

about the Xilinx interconnect is that signals must pass through switches to reach one CLB from

another, and the total number of switches traversed depends on the particular set of wire

segments used. Thus, speed-performance of an implemented circuit depends in part on how the

wire segments are allocated to individual signals by CAD tools.

Actel FPGAs

In contrast to Xilinx FPGAs, the devices manufactured by Actel are based on anti-fuse technology. Actel offers three main families: Act 1, Act 2, and Act 3. Actel devices are based on a structure similar to traditional gate arrays; the logic blocks are arranged in rows and there are horizontal routing channels between adjacent rows. This architecture is shown in the figure below. The logic blocks in the Actel devices are relatively small in comparison to LUT-based ones, and are based on multiplexers. The figure illustrates the logic block in the Act 3 and shows that it comprises an AND gate and an OR gate connected to a multiplexer-based circuit block. The multiplexer circuit is arranged such that, in combination with the two logic gates, a very wide range of functions can be realized in a single logic block. About half of the logic blocks in an Act 3 device also contain a flip-flop.
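To see how a multiplexer with gate-driven select lines can realize many functions, consider the following sketch. The exact wiring of the Act 3 module is not given in the text, so the 4-to-1 multiplexer below, with one select line driven by an AND gate and the other by an OR gate, is an assumed simplification and all names are hypothetical.

```python
def mux_module(d, a0, a1, b0, b1):
    """4:1 multiplexer whose select lines come from an AND and an OR gate."""
    s1 = a0 & a1          # AND gate drives the high select line
    s0 = b0 | b1          # OR gate drives the low select line
    return d[2 * s1 + s0]

# Different functions are obtained only by tying inputs to constants or signals.
def and2(x, y):
    return mux_module((0, 0, 1, 1), x, y, 0, 0)

def or2(x, y):
    return mux_module((0, 1, 0, 1), 0, 0, x, y)

def xor2(x, y):
    return mux_module((0, 1, 1, 0), x, 1, y, 0)

def and_xor_or(w, x, y, z):
    """A four-input function, (w AND x) XOR (y OR z), in a single module."""
    return mux_module((0, 1, 1, 0), w, x, y, z)
```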

Actel FPGA structure.

Actel’s interconnect is organized in horizontal routing channels. The channels consist of wire

segments of various lengths with anti-fuses to connect logic blocks to wire segments or one wire

to another. Also, Actel chips have vertical wires that overlay the logic blocks, for signal paths

that span multiple rows. In terms of speed-performance, it is evident that Actel chips are not fully

predictable, because the number of anti-fuses traversed by a signal depends on how the wire

segments are allocated during circuit implementation by CAD tools. However, Actel provides a

rich selection of wire segments of different length in each channel and has developed algorithms

that guarantee strict limits on the number of anti-fuses traversed by any two-point connection in

a circuit which improves speed-performance significantly.

Quicklogic pASIC FPGAs :

Quicklogic is the main competitor of Actel in anti-fuse based FPGAs. It produces two families of devices, called pASIC and pASIC-2; the pASIC-2 is an enhanced version of pASIC. The pASIC consists of a regular two-dimensional array of blocks called pASIC Logic Blocks (pLBs). The logic capacity of the first generation of Quicklogic FPGAs is between 48 and 380 pLBs, or 500 to 4000 equivalent MPGA gates.

As shown in the figure below, pASIC has similarities to other FPGAs: the overall structure is array-based like Xilinx FPGAs, the logic blocks use multiplexers like Actel FPGAs, and the interconnect consists only of long lines, as in the Altera FLEX 8000. Note that the pASIC architecture is now also developed independently by Cypress.

Structure of Quicklogic pASIC FPGA.

Quicklogic’s anti-fuse, called ViaLink, consists of a top layer of metal, an insulating layer of amorphous silicon, and a bottom layer of metal. Compared to Actel’s PLICE anti-fuse, ViaLink offers a very low on-resistance of about 50 ohms (PLICE is about 300 ohms) and a low parasitic capacitance. ViaLink anti-fuses are present at every crossing of logic block pins and interconnect wires, providing generous connectivity.

Quicklogic (Cypress) Logic Cell

pASIC’s multiplexer-based logic block is shown in the figure above. It is more complex than Actel’s Logic Module, with more inputs and wide (6-input) AND gates on the multiplexer select lines. Every logic block also contains a flip-flop.

Altera FLEX 8000 and FLEX 10000 FPGAs :

The first FPGA chips from Altera were simple arrays of logic cells, which are relatively simple logic elements (LEs). Each element comprises a three-input look-up table (LUT) to generate logic functions, a single configurable flip-flop, and multiplexers for routing the signals and selecting clocks. The logic cells were connected by switch boxes instead of fixed interconnect. The general architecture of Altera’s FPGAs is shown in the diagram below.


There are two high performance FPGA series called FLEX series. Altera’s FLEX 8000 series

consists of a three-level hierarchy similar to CPLDs. However, the lowest level of the hierarchy

consists of a set of lookup tables, rather than an SPLD like block, and so the FLEX 8000 is

categorized here as an FPGA. It should be noted, however, that FLEX 8000 is a combination of FPGA and CPLD technologies. FLEX 8000 is SRAM-based and features a four-input LUT as its basic logic block. Logic capacity ranges from about 4,000 gates to more than 15,000 for the 8000 series.

The architecture of FLEX 8000 is shown in figure below. The basic logic block, called a Logic

Element (LE) contains a four-input LUT, a flip-flop, and special-purpose carry circuitry for

arithmetic circuits (similar to Xilinx XC 4000). The LE also includes cascade circuitry that

allows for efficient implementation of wide AND functions.
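The benefit of dedicated carry circuitry can be pictured with a ripple-carry adder in which each bit uses one logic element. The sketch below is a simplified assumption: the sum is treated as a LUT function, the carry as the output of the special-purpose carry logic, and the 8-bit default width and all names are chosen only for illustration.

```python
def logic_element(a, b, carry_in):
    """One adder bit: LUT produces the sum, carry circuitry the carry-out."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out

def add(x, y, width=8):
    """Chain `width` logic elements; the carry passes directly between them."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = logic_element((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

total, carry_out = add(100, 55)
```

Because the carry moves from one element to the next on a dedicated path, it does not consume general routing resources, which is the point of the special-purpose circuitry.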

In the FLEX 8000, LEs are grouped into sets of 8, called Logic Array Blocks (LABs, a term

borrowed from Altera’s CPLDs). As shown in Figure below each LAB contains local

interconnect and each local wire can connect any LE to any other LE within the same LAB.

Architecture of Altera FLEX 8000 FPGAs.

Altera FLEX 8000 Logic Element (LE).

Local interconnect also connects to the FLEX 8000’s global interconnect, called FastTrack. FastTrack is similar to Xilinx long lines in that each FastTrack wire extends the full width or height of the device. However, a major difference between FLEX 8000 and Xilinx chips is that FastTrack consists only of long lines. This makes the FLEX 8000 easy for CAD tools to configure automatically. All horizontal FastTrack wires are identical, and so interconnect delays in the FLEX 8000 are more predictable than in FPGAs that employ many shorter segments, because there are fewer programmable switches in the longer paths. Predictability is further aided by the fact that connections between horizontal and vertical lines pass through active buffers.

The FLEX 8000 architecture has been extended in the state-of-the-art FLEX 10000 family. FLEX 10000 offers all of the features of FLEX 8000, with the addition of variable-sized blocks of SRAM, called Embedded Array Blocks (EABs). Each row in a FLEX 10000 chip has an EAB on one end. Each EAB is configurable to serve as an SRAM block with a variable aspect ratio: 256 x 8, 512 x 4, 1K x 2, or 2K x 1. In addition, an EAB can alternatively be configured to implement a complex logic circuit, such as a multiplier, by employing it as a large multi-output lookup table. Altera provides, as part of their CAD tools, several macro-functions that implement useful logic circuits in EABs. Counting the EABs as logic gates, FLEX 10000 offers the highest logic capacity of any FPGA, although it is hard to provide an accurate number.
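Two points in the paragraph above can be checked with a short sketch: all four aspect ratios describe the same 2048 bits of storage, and a 256 x 8 block is large enough to act as a multi-output lookup table for a 4-bit by 4-bit multiplier. The address packing (operand a in the upper four address bits) is an assumption for illustration and does not describe Altera's macro-functions.

```python
# The four EAB organisations, as (words, bits per word).
ASPECT_RATIOS = [(256, 8), (512, 4), (1024, 2), (2048, 1)]
EAB_BITS = [words * width for words, width in ASPECT_RATIOS]

def build_multiplier_table():
    """256 x 8 table: address = (a << 4) | b, stored word = a * b."""
    return [(addr >> 4) * (addr & 0xF) for addr in range(256)]

def multiply(table, a, b):
    """Multiply two 4-bit numbers with a single memory read."""
    return table[(a << 4) | b]

table = build_multiplier_table()
product_7_9 = multiply(table, 7, 9)
```

The largest product, 15 x 15 = 225, fits in the 8-bit word, so no entry overflows.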

Concurrent Logic FPGA Device

The manufacturer Concurrent Logic offers the CFA6006 FPGA device, which is based on a two-dimensional array of identical blocks, where each block is symmetrical on its four sides. The array holds 3136 such blocks, providing a total logic capacity of about 5000 equivalent gates. Connections are formed using multiplexers that are configured by a static RAM programming technology.

The structure of the Concurrent Logic block is shown in the diagram below. It comprises user-configurable multiplexers, basic gates and a D-type flip-flop. The Concurrent FPGA is especially suitable for register-intensive and arithmetic applications, since the logic block can easily implement a half-adder and a register bit.

There are two direct connections, A and B, formed by routing signals through the multiplexers within the blocks. Long connections are implemented using a bussing network, in which wires of various lengths are superimposed on the array of logic blocks.

Crosspoint Solutions FPGAs:

Crosspoint FPGAs differ from other FPGAs in that they are configurable at the transistor level, as opposed to the logic block level. Basically, the architecture consists of rows of transistor pairs, where the rows are separated by horizontal wiring segments. Vertical wiring segments are also available, for connections among the rows.

Each transistor row comprises two lines of series-connected transistors, one line being NMOS and the other PMOS. The wiring resources allow individual transistor pairs to be interconnected to implement CMOS logic gates. The programming technology used for the programmable switches is similar to the ViaLink anti-fuse, which is based on amorphous silicon.

The structure of the transistor pair rows is shown in the diagram below. The diagram shows the implementation of a NOR gate and a NAND gate using the transistor lines. The transistor gates, drains and sources can be programmably interconnected to other transistors and also to power and ground. The series connection along a line is broken where necessary by permanently holding a transistor in its OFF state. A wide range of logic gates can be implemented by the transistor lines and the interconnection patterns.
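How NMOS and PMOS transistor pairs form CMOS gates can be shown with a simple switch-level sketch. It models only the standard CMOS NAND and NOR structures (series or parallel pull-down and pull-up networks); it is not a description of the Crosspoint wiring itself, and the function names are hypothetical.

```python
def nand2(a, b):
    """NAND: two NMOS in series to ground, two PMOS in parallel to power."""
    pull_down = bool(a and b)            # both NMOS must conduct
    pull_up = bool((not a) or (not b))   # either PMOS conducts
    assert pull_up != pull_down          # exactly one network drives the output
    return 1 if pull_up else 0

def nor2(a, b):
    """NOR: two NMOS in parallel to ground, two PMOS in series to power."""
    pull_down = bool(a or b)
    pull_up = bool((not a) and (not b))
    assert pull_up != pull_down
    return 1 if pull_up else 0

nand_table = [nand2(a, b) for a in (0, 1) for b in (0, 1)]
nor_table = [nor2(a, b) for a in (0, 1) for b in (0, 1)]
```

The two gates use the same four transistors and differ only in which network is wired in series and which in parallel, which is the kind of choice the programmable interconnect makes.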

The FPGA currently offered by Crosspoint Solutions has a total logic capacity of 4200 gates. The chip has 256 rows of transistor pairs, and an additional 64 rows of multiplexer-like structures are provided. With its row-based architecture, anti-fuse programming technology and multiplexers, the Crosspoint FPGA is most similar to the Actel FPGAs.

ALGOTRONIX CAL-1024

This design has a two-dimensional mesh array structure which resembles the gate-array “sea of gates” architecture previously identified in Figure . Like the Xilinx architecture, Algotronix uses static RAM programming technology to specify the function performed by each logic cell and to control the switching of connections between cells. The CAL1024 design contains 1024 identical logic cells arranged in a 32 x 32 matrix. The design is considered to be a mesh-connected architecture since each cell is directly connected to its nearest north, south, east, and west neighbors. In addition to these direct connections, two global interconnect signals are routed to each cell to distribute clock and other “low skew requirement” control signals. Figure 19 shows the basic array architecture, indicating both nearest neighbor and global connections to the logic cells. In addition to these logical connections, row select lines and bit select lines, which are not shown on the figure, are connected to program each cell’s SRAM bits.

ALGOTRONIX Array Architecture

The basic building block of the Algotronix design is a configurable cell containing multiplexers

and a function unit. As indicated in the figure , the function unit is preceded by multiplexers

which select the source for the X1 and X2 inputs. The function unit is capable of generating any

logic function of the two inputs, or of operating as a D-type latch. Not shown in the figure are

four additional multiplexers which select the function output or one of the external inputs for

routing to each of the four outputs (north, south, east, and west).
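A function unit that can generate any logic function of two inputs must be able to select among 16 truth tables. The sketch below encodes each function as a 4-bit configuration value; this encoding is an assumption for illustration and is not the actual CAL1024 configuration format. The D-type latch mode is not modelled.

```python
def function_unit(config):
    """Return the two-input function selected by a 4-bit truth table.

    Bit (2 * x1 + x2) of `config` is the output for inputs (x1, x2).
    """
    def f(x1, x2):
        return (config >> (2 * x1 + x2)) & 1
    return f

AND = function_unit(0b1000)   # output is 1 only for x1 = x2 = 1
OR = function_unit(0b1110)    # output is 0 only for x1 = x2 = 0
XOR = function_unit(0b0110)   # output is 1 when the inputs differ

# Every configuration value gives a different truth table.
truth_tables = {
    tuple(function_unit(c)(a, b) for a in (0, 1) for b in (0, 1))
    for c in range(16)
}
```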

A unique feature in the Algotronix I/O pad design is its capability to provide simultaneous input

and output on the same pin when communicating with another Algotronix chip. This is done

through a 3-level (ternary) logic signaling scheme in which I/O pads sense whenever two outputs

are driving each other via a contention scheme. Even during contention, the pad can deduce the

correct input value and pass it along to the internal circuitry. This makes it easier to partition a

single design across multiple FPGAs because the increased connectivity reduces pin limitations

on communications bandwidth.
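The idea behind the three-level signaling can be sketched with a deliberately simplified model, in which the shared pin settles at a low, middle or high level depending on what the two chips drive, and each pad subtracts its own output to recover the other chip's value. The numeric levels and function names are assumptions; the real pad circuit is analogue and is not described in the text.

```python
def line_level(drive_a, drive_b):
    """Level on the shared pin: 0 = low, 1 = middle (contention), 2 = high."""
    return drive_a + drive_b

def recover_input(level, own_drive):
    """A pad knows what it is driving, so it can deduce the other value."""
    return level - own_drive

# Both chips transmit at the same time on one wire.
exchanges = []
for a in (0, 1):
    for b in (0, 1):
        level = line_level(a, b)
        exchanges.append((recover_input(level, a), recover_input(level, b)))
```

In this model each wire carries one bit in each direction at once, which is why pin limitations on inter-chip bandwidth are eased.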

AMD Mach

AMD offers a CPLD family comprising five subfamilies, called Mach. Each Mach device consists of multiple PAL-like blocks (or optimized PALs). Mach 1 and 2 consist of optimized 22V16 PALs, Mach 3 and 4 consist of several optimized 34V16 PALs, and Mach 5 is similar to Mach 3 and 4 but offers enhanced speed performance. All Mach chips use EEPROM technology, and together the five subfamilies provide a wide range of selection, from small, inexpensive chips to larger, state-of-the-art ones. We will focus on Mach 4 because it represents the most advanced currently available parts in the family.

Figure (a) below depicts a Mach 4 chip, showing the multiple 34V16 PAL-like blocks and the
interconnect, called the central switch matrix. The in-circuit programmable chips range in size
from 6 to 16 PAL-like blocks, corresponding roughly to 2,000 to 5,000 equivalent gates. All
connections between PAL-like blocks (even from a PAL-like block to itself) pass through the
central switch matrix. Thus, the device is not merely a collection of PAL-like blocks but a
single, large device. Since all connections travel through the same path, circuit timing delays are
predictable.

Figure (b) illustrates a Mach 4 PAL-like block. It has 16 outputs and a total of 34 inputs
(16 of which are the fed-back outputs), so it corresponds to a 34V16 PAL. However, there are
two key differences between this block and a normal PAL: (1) a product term (PT) allocator
between the AND plane and the macrocells (each macrocell comprises an OR gate, an EXOR gate,
and a flip-flop), and (2) an output switch matrix between the OR gates and the I/O pins. These
features make a Mach 4 chip easier to use because they decouple sections of the PAL-like block.
More specifically, the product term allocator distributes and shares product terms from the AND
plane to the OR gates that require them, allowing much more flexibility than the fixed-size OR
gates in regular PALs. The output switch matrix enables any macrocell output (OR gate or
flip-flop) to drive any I/O pin connected to the PAL-like block, again providing greater
flexibility than a PAL, in which each macrocell can drive only one specific I/O pin. Mach 4's
combination of in-system programmability and high flexibility allows easy hardware design
changes.
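The benefit of a product term allocator over fixed-size OR gates can be sketched as follows. This is a deliberately simplified model: it treats the AND plane as one shared pool, whereas a real allocator restricts which product terms each macrocell may borrow. The function names and the numbers 8 and 32 are illustrative assumptions, not Mach 4 parameters.

```python
def fits_fixed_pal(demands, terms_per_or_gate):
    """Regular PAL: every OR gate has the same fixed number of product terms."""
    return all(demand <= terms_per_or_gate for demand in demands)


def allocate_product_terms(demands, total_terms):
    """Hand out product terms from a shared pool, one contiguous range per macrocell.

    Returns a list of (first_term, end_term_exclusive) ranges, or None when the
    pool is too small. Simplification: any macrocell may use any product term.
    """
    if sum(demands) > total_terms:
        return None
    allocation, next_free = [], 0
    for demand in demands:
        allocation.append((next_free, next_free + demand))
        next_free += demand
    return allocation


# Product terms needed by four hypothetical logic functions.
demands = [12, 2, 1, 9]
fixed_ok = fits_fixed_pal(demands, terms_per_or_gate=8)       # wide functions do not fit
shared = allocate_product_terms(demands, total_terms=32)      # same 4 x 8 terms, shared
```

With the same total of 32 product terms, the fixed arrangement rejects the 12-term and 9-term functions, while the shared pool accommodates all four because the narrow functions leave terms unused.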

AMD Mach 4 structure

FPGA Design Flow:

Early PLD and FPGA designs were performed largely by hand, but today's complex programmable
logic devices require the use of an integrated Computer-Aided Design (CAD) system. Both
commercial CAD tool vendors and FPGA companies offer appropriate tools. For example,
traditional Electronic Design Automation (EDA) vendors such as Cadence, Mentor Graphics,
Synopsys, and View Logic offer tools to support FPGA design. These tools are typically used
for front-end design entry and simulation, and they provide the necessary interfaces to
vendor-specific back-end tools for chip placement and routing.

Examples of vendor-specific tools are the Xilinx XACT system and the Altera MAX+PLUS II
software. Altera's MAX+PLUS II software supports the entire design flow on either PC or
workstation platforms.

The first step in the design process is the description of the logic circuit, which can be done
either with a schematic capture tool (Cadence, View Logic, OrCAD, etc.) or with Boolean
expressions. This is followed by a translation that converts the original circuit description into
a standard format used by the relevant CAD tools (for example, the Xilinx CAD tools). The
circuit is then passed through CAD programs that partition it into appropriate logic blocks,
select a specific location in the FPGA for each logic block, and form the required
interconnections.

The performance of the implemented circuit can then be checked and its functionality
verified. Finally, a bitmap is generated and downloaded serially to configure the FPGA.

Initial Design Entry: The detailed description of the logic circuit is entered using a schematic
capture program. In the design entry phase, RTL or schematic entry is used to create the logic
to be implemented in the device. Pin assignments can also be made, including pin placement
information, along with any timing constraints that might be necessary for building a
functioning design. In the design entry step, a schematic or Block Design File (.bdf) is created
as the top-level design. Library of parameterized modules (LPM) functions are then added, and
Verilog HDL code can be used to add a logic block.

The library may be supplied either by the vendor of the schematic capture program or by an
FPGA vendor (such as Xilinx or Altera). An alternative way to specify the logic circuit is to use
Boolean expressions or a state machine language, which is done without the graphical interface.
Sometimes it is possible to use a mixture of schematic and Boolean expressions.

Translation to XNF Format: After the logic circuit is successfully designed and merged into
one circuit, it is translated into a special format that is understood by the CAD tools. For Xilinx,
this format is called the Xilinx netlist format, or XNF. The translation utility is supplied either
by Xilinx or by the vendor of the logic entry tool. The translation process may also involve
automatic optimization of the circuit.

Partition: The XNF circuit is partitioned into logic cells (this partition is also known as
Technology Mapping). Technology mapping converts the XNF circuit, which is a netlist of
basic logic gates, into a netlist of Xilinx logic cells. The logic cell used depends on which Xilinx
product the circuit is to be implemented in. The XACT tools also attempt to optimize the circuit
during this step. For example, circuitry associated with unused logic block inputs or outputs is
eliminated from the design. In addition, the mapping procedure attempts to minimize either the
total number of logic cells (CLBs) used or the number of stages of logic cells in the critical
delay path.
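The elimination of unused circuitry mentioned above can be sketched as a backward traversal from the primary outputs: any gate that cannot reach an output is dropped. This is a minimal illustration, not the XACT implementation; the netlist representation (a dict from a gate's output signal to its input signals) and the signal names are assumptions made for the example.

```python
def prune_unused_logic(netlist, primary_outputs):
    """Keep only the gates that contribute to at least one primary output.

    netlist maps each gate's output signal to the list of its input signals.
    Primary inputs do not appear as keys because no gate drives them.
    """
    keep = set()
    stack = list(primary_outputs)
    while stack:
        signal = stack.pop()
        if signal in keep or signal not in netlist:
            continue  # already visited, or a primary input
        keep.add(signal)
        stack.extend(netlist[signal])
    return {signal: inputs for signal, inputs in netlist.items() if signal in keep}


netlist = {
    "n1": ["a", "b"],
    "n2": ["n1", "c"],
    "n3": ["c", "d"],    # drives nothing that reaches an output
    "out": ["n2", "a"],
}
pruned = prune_unused_logic(netlist, ["out"])
```

Here the gate driving `n3` is removed because no path leads from it to `out`, so it would otherwise occupy a logic cell for no benefit.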

Place and Route: This step is performed by the CAD tools, manually by the user, or by a
mixture of the two. The first step is placement, in which each logic cell generated during the
partition step is assigned to a specific location in the FPGA. Automatic placement can be done
using the simulated annealing algorithm.
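A minimal sketch of simulated-annealing placement is given below, under several assumptions that are not from the source: nets are two-pin, cost is total Manhattan wirelength, moves are pairwise swaps of cells (so sites left empty by the initial placement stay empty), and the temperature schedule parameters are arbitrary. Production placers use far richer cost functions and move sets.

```python
import math
import random


def wirelength(placement, nets):
    """Total Manhattan distance over two-pin nets."""
    total = 0
    for a, b in nets:
        (ra, ca), (rb, cb) = placement[a], placement[b]
        total += abs(ra - rb) + abs(ca - cb)
    return total


def anneal_placement(cells, nets, rows, cols, seed=0,
                     start_temp=5.0, cooling=0.95, moves_per_temp=200):
    """Place cells on a rows x cols grid; requires len(cells) <= rows * cols."""
    rng = random.Random(seed)
    sites = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(sites)
    placement = dict(zip(cells, sites))          # random initial placement
    cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost
    temp = start_temp
    while temp > 0.01:
        for _ in range(moves_per_temp):
            a, b = rng.sample(cells, 2)          # propose swapping two cells
            placement[a], placement[b] = placement[b], placement[a]
            new_cost = wirelength(placement, nets)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / temp), which shrinks as temp falls.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(placement), cost
            else:
                placement[a], placement[b] = placement[b], placement[a]  # undo
        temp *= cooling
    return best, best_cost


# Nine cells connected in a chain, placed on a 3 x 3 grid.
cells = [f"c{i}" for i in range(9)]
nets = [(cells[i], cells[i + 1]) for i in range(8)]
best, best_cost = anneal_placement(cells, nets, rows=3, cols=3)
```

Accepting some cost-increasing swaps at high temperature is what lets the method escape poor local arrangements that a purely greedy swap strategy would get stuck in.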

After placement, the required interconnections among the logic cells must be realized by
selecting wire segments and routing switches within the FPGA interconnection resources. An
automatic routing algorithm, based on the maze routing algorithm, is used for this task.
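The classic maze routing idea is a breadth-first wavefront expansion from the source until the target is reached, followed by a backtrace. The sketch below applies it to a uniform grid with blocked cells, which is an assumption made for illustration: an FPGA router works on the device's actual wire segments and switches rather than a plain grid, and the function name `maze_route` is hypothetical.

```python
from collections import deque


def maze_route(grid, source, target):
    """Find a shortest path on a grid by breadth-first wavefront expansion.

    grid is a list of equal-length strings: '.' is free, '#' is blocked.
    source and target are (row, col) tuples on free cells.
    Returns the path as a list of (row, col), or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])
    previous = {source: None}            # also serves as the visited set
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        if cell == target:
            path = []
            while cell is not None:      # backtrace from target to source
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "." and (nr, nc) not in previous):
                previous[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None


grid = [
    "..#.",
    "..#.",
    "....",
]
path = maze_route(grid, (0, 0), (0, 3))
```

Because the wavefront expands one step at a time, the first time the target is reached the path found is a shortest one, and a `None` result means the connection cannot be routed with the resources left free.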

Placement and routing are generally done automatically, but sometimes they are done manually
by the user. With the physical placement and routing completed, exact timing values can be
used to determine chip performance. The XACT tools provide a critical path timing analyzer
which reports delay information for the paths through the chip, ranked from longest to
shortest. In addition, the physical layout timing information can be back-annotated to the
schematics to get more accurate functional simulation results. The final step in the Xilinx design
flow is the creation of the BIT file, which contains the binary programming data needed to
configure the SRAM bits of the target chip. This file is then downloaded to configure the chip
for final functional and timing tests of the programmed chip.
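The critical path analysis mentioned above amounts to finding the longest-delay path through the combinational logic once real delays are known. The sketch below is an assumption-based illustration, not the XACT analyzer: the circuit is modelled as an acyclic graph, each node carries a single lumped delay (logic plus routing), and the node names and delay values are invented for the example.

```python
from functools import lru_cache


def critical_path(delays, fanin):
    """Return (total_delay, path) for the slowest path in an acyclic circuit.

    delays maps each node to its lumped delay.
    fanin maps a node to the list of nodes driving it (absent for inputs).
    """
    @lru_cache(maxsize=None)
    def arrival(node):
        # Latest arrival time at the node's output, and the path causing it.
        worst_delay, worst_path = 0.0, ()
        for driver in fanin.get(node, ()):
            delay, path = arrival(driver)
            if delay > worst_delay:
                worst_delay, worst_path = delay, path
        return worst_delay + delays[node], worst_path + (node,)

    return max((arrival(node) for node in delays), key=lambda item: item[0])


# Hypothetical post-routing delays in nanoseconds.
delays = {"a": 1.0, "b": 2.0, "c": 1.5, "out": 0.5}
fanin = {"c": ["a", "b"], "out": ["c"]}
total_delay, path = critical_path(delays, fanin)
```

Sorting the arrival times of all endpoints, rather than taking only the maximum, would give the longest-to-shortest report that a timing analyzer presents.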

In the Altera flow, the design must be compiled after it is created. Compilation converts the
design into a bitstream that can be downloaded into the FPGA. The most important output of
compilation is an SRAM Object File (.sof), which is used to program the device. The software
also generates report files that provide information about the design as it compiles.

Simulation is a very important part of the design flow, and entire applications are devoted to
simulating hardware designs. There are two types of simulation: RTL and timing. RTL (or
functional) simulation allows you to verify that your code behaves as intended. Timing (or post
place-and-route) simulation verifies that the design meets timing and functions appropriately in
the device.

After completion of the design, its performance is checked either by downloading the
configuration bits into the FPGA or by using an interface to a timing simulation program. If the
performance is not satisfactory, suitable modifications are made at the appropriate point in the
design flow. Once the timing and functionality are verified, the implementation is complete.

---------------------xxxxxx------------------

References:

1. Field Programmable Gate Arrays, S. D. Brown, R. J. Francis et al.

2. FPGA and CPLD Architectures: A Tutorial, Stephen Brown and Jonathan Rose.

3. FPGA Architecture: Survey and Challenges, Ian Kuon, Russell Tessier and Jonathan Rose.
