Post on 06-Feb-2018

Reconfigurable computing

Eduardo Sanchez

EPFL

Reconfigurable computing

• Methods for execution of algorithms:

• hardwired technology: high performance

• software-programmed microprocessors: high flexibility

• Why are hardwired solutions faster than software solutions?

• Reconfigurable computing is intended to fill the gap between hard and soft, achieving potentially much higher performance than software, while maintaining a higher level of flexibility than hardware (Compton and Hauck, “Reconfigurable computing”, ACM Computing Surveys, June 2002)

• Reconfigurable computing:

• systems incorporating some form of hardware programmability

• when we talk about reconfigurable computing we are usually talking about FPGA-based systems design

• Main motivations:

• accelerators for compute-intensive applications

• tools for system validation: prototyping, emulation

Moore's law

Transistors per chip in 2004:

• predicted by Moore's law (2x every 24 months): 37 million

• predicted by Moore's law (2x every 18 months): 3.3 billion

• predicted by Moore's law (2x every 12 months): 27.4 trillion

• actual, Pentium 4: 125 million

• actual, Itanium 2: 410 million
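
The three predicted figures for 2004 can be reproduced with a few lines of arithmetic. The sketch below assumes a starting point of 50 components per chip in 1965; this baseline is not stated on the slide, but it is a value that yields the three predictions. The function name `predicted_transistors` is illustrative.

```python
def predicted_transistors(year, doubling_months, base_year=1965, base_count=50):
    """Transistor count predicted by doubling every `doubling_months` months.

    base_year and base_count are assumptions (they are not given in the slides).
    """
    doublings = (year - base_year) * 12 / doubling_months
    return base_count * 2 ** doublings


if __name__ == "__main__":
    for months in (24, 18, 12):
        count = predicted_transistors(2004, months)
        print(f"2x every {months} months: {count:.3g} transistors in 2004")
```

The spread between the three results (millions, billions, trillions) shows how sensitive the prediction is to the assumed doubling period.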

• Intel introduces a new chip-fabrication process every two years:

• 2001: 0.13 micron

• 2003: 90 nm

• 2005: 65 nm

• 2007: 45 nm

• 2009: 32 nm

• ....
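
A quick calculation makes this cadence concrete: each new process shrinks the feature size by roughly 0.7x, which about halves the area of a given circuit. The sketch uses only the node sizes listed above (0.13 micron written as 130 nm); the variable and function names are illustrative.

```python
# Feature sizes in nanometres, taken from the list above (0.13 micron = 130 nm).
nodes_nm = {2001: 130, 2003: 90, 2005: 65, 2007: 45, 2009: 32}


def scaling_steps(nodes):
    """Return (year, linear shrink, area shrink) for each node after the first."""
    years = sorted(nodes)
    steps = []
    for prev, cur in zip(years, years[1:]):
        linear = nodes[cur] / nodes[prev]
        steps.append((cur, linear, linear ** 2))
    return steps


if __name__ == "__main__":
    for year, linear, area in scaling_steps(nodes_nm):
        print(f"{year}: feature size x{linear:.2f}, area x{area:.2f}")
```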

Problem: Wirth's law

• Software is slowing faster than hardware is accelerating

• Expressed in Biblical cadences: "Grove giveth and Gates taketh away"

Andy Grove, Intel Chairman

Bill Gates, Microsoft President

• Computing requirements increase even faster than Moore's law

Problem: time to market

Problem: power consumption

[Figure: power consumption, with series labelled 50 MHz, 100 MHz, 200 MHz and 400 MHz]

Embedded systems

• Embedded systems, which are hidden from the user and cannot usually be manipulated or reprogrammed, are found in virtually all electronic equipment used today, from wireless telephones and DVD players to cars and airplanes

• A fifth of the value of each car produced in the EU is due to embedded electronics, a value that is expected to rise to about 40 percent by 2015

Pervasive computing

• "The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it" (Mark Weiser, "The Computer for the 21st Century", Scientific American, September 1991)

• In the near future, the computer will disappear by being everywhere: it will be ubiquitous, pervasive

• Pervasive systems will be so integrated with their users that they will be invisible; they will disappear

• Evolution of computer systems:

• mainframes: one computer, many users

• PCs: one computer, one user

• pervasive systems: many computers, one user

• distributed systems: remote communication, fault tolerance, high availability, remote information access, distributed security

• mobile computing: mobile networking, adaptive applications, energy-aware systems, mobile information access, location sensitivity

• pervasive systems: smart spaces, invisibility, localized scalability, uneven conditioning

Smart Dust project

• A 20 MIPS CPU embedded in a shoe

• Four times more powerful than early Silicon Graphics workstations (Motorola 68000)

• A lot of new devices to design

Integrated circuits

[Figure: classification of integrated circuits. Categories: full-custom (ASIC), hand-made, libraries, semi-custom, mask-programmable, field-programmable, standard circuits. Devices: gate array, ROM, PROM, PAL, PLA, CPLD, FPGA]

Field Programmable Gate Arrays

• Array of logic cells

• Each cell is able to implement a logic function, chosen among several possible functions: the choice is made by programming

• Interconnections between cells are also programmable

• Two types, depending on the cell’s complexity:

• fine grain

• coarse grain

• Two types, depending on the programming mode:

• RAM: every logic cell contains a LUT (look-up table), accompanied by a flip-flop, and all interconnected with programmable routing pathways

• anti-fuses

[Figure: FPGA structure, showing logic cells and I/O cells with programmable functions, programmable interconnections, and the configuration that sets them]

[Figure: programmable logic blocks embedded in programmable interconnect]

Required function: y = (a & b) | !c

Truth table:

a b c | y
0 0 0 | 1
0 0 1 | 0
0 1 0 | 1
0 1 1 | 0
1 0 0 | 1
1 0 1 | 0
1 1 0 | 1
1 1 1 | 1

Programmed LUT: eight SRAM cells hold the y column of the truth table, and an 8:1 multiplexer driven by a, b and c selects the cell that appears on y
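
To illustrate how a LUT implements the required function, the following sketch fills eight one-bit cells (standing in for the SRAM cells) directly from y = (a & b) | !c and then reads them back through a multiplexer addressed by a, b and c. It is a behavioural model only, with a taken as the most significant select bit (an assumption about the figure's ordering); `program_lut` and `lut_read` are illustrative names.

```python
from itertools import product


def required_function(a, b, c):
    """y = (a & b) | !c, with all signals as 0/1 integers."""
    return (a & b) | (1 - c)


def program_lut(func, n_inputs=3):
    """Compute the 2**n configuration bits ("SRAM cells") for func."""
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]


def lut_read(cells, *inputs):
    """Model the multiplexer: the inputs form the address of one cell."""
    address = 0
    for bit in inputs:  # the first input is the most significant address bit
        address = (address << 1) | bit
    return cells[address]


if __name__ == "__main__":
    cells = program_lut(required_function)
    print("a b c | y")
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "|", lut_read(cells, a, b, c))
```

Changing the logic function only changes the contents of `cells`; the multiplexer model stays the same, which is the point of a LUT-based cell.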

• An example of a logic cell:

[Figure: example logic cell. Blocks: LUT, Carry & Control. Signals: G1 to G4, BY, Cin, Cout, D, EC, SP, RC, Q, Y, YB, YQ]

• Functional frequencies are design-dependent

[Figure: logic cell built around a 4-input LUT (inputs a, b, c, d), which can also act as a 16x1 RAM or a 16-bit shift register (SR), with a multiplexer (input e, output y) and a flip-flop (output q) with clock, clock enable and set/reset inputs]

[Figure: a slice containing two logic cells (LC); each logic cell has a 4-input LUT (also usable as a 16x1 RAM or a 16-bit SR), a multiplexer and a register]

[Figure: a configurable logic block (CLB) containing four slices, each with two logic cells; CLBs are arranged in an array]

[Figure: arrays of programmable logic blocks with columns of embedded RAM blocks]

[Figure: FPGA with RAM blocks, multipliers and logic blocks]

[Figure: multiply-and-accumulate (MAC) unit: a multiplier with inputs A[n:0] and B[n:0], followed by an adder and an accumulator; output labelled Y[(2n - 1):0]]
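
The multiplier, adder and accumulator combine into a multiply-and-accumulate (MAC) unit. Below is a minimal behavioural sketch, assuming unsigned operands and an accumulator of hypothetical width `acc_bits` that wraps on overflow (neither is specified in the slides); the class and function names are illustrative.

```python
class Mac:
    """Behavioural model of a multiply-accumulate unit (unsigned, wrapping)."""

    def __init__(self, acc_bits=48):
        # acc_bits is an assumed accumulator width, not taken from the slides.
        self.mask = (1 << acc_bits) - 1
        self.acc = 0

    def step(self, a, b):
        """One clock cycle: multiply a by b and add the product to the accumulator."""
        self.acc = (self.acc + a * b) & self.mask
        return self.acc


def dot_product(xs, ys):
    """Typical use: a dot product needs one MAC step per pair of samples."""
    mac = Mac()
    for x, y in zip(xs, ys):
        mac.step(x, y)
    return mac.acc
```

For example, `dot_product([1, 2, 3], [4, 5, 6])` performs three MAC steps, which is the pattern used by FIR filters and similar DSP kernels.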

[Figure: main FPGA fabric with "the Stripe" alongside it, containing a microprocessor core (uP), special RAM, peripherals and I/O, etc.]

[Figure: (a) one embedded core, (b) four embedded cores (uP)]

[Figure: configuration data in and configuration data out; legend: I/O pin/pad, SRAM cell]

Mode pins   Mode
0 0         Serial load with FPGA as master
0 1         Serial load with FPGA as slave
1 0         Parallel load with FPGA as master
1 1         Parallel load with FPGA as slave
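
The mode pins act as a two-bit selector for the configuration mode. A small sketch of the decoding, using the encoding listed above; the function names, the ordering of the two pins and the tuple return format are illustrative assumptions, and real devices define their own pin names and encodings.

```python
# Encoding taken from the mode-pin table: (first pin, second pin) -> (bus, role).
CONFIG_MODES = {
    (0, 0): ("serial", "master"),
    (0, 1): ("serial", "slave"),
    (1, 0): ("parallel", "master"),
    (1, 1): ("parallel", "slave"),
}


def decode_mode(pin1, pin0):
    """Return (bus, role) for the two mode pins, each 0 or 1."""
    try:
        return CONFIG_MODES[(pin1, pin0)]
    except KeyError:
        raise ValueError(f"mode pins must be 0 or 1, got {pin1!r}, {pin0!r}") from None


def describe(pin1, pin0):
    """Human-readable description matching the wording of the table."""
    bus, role = decode_mode(pin1, pin0)
    return f"{bus.capitalize()} load with FPGA as {role}"
```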

[Figure: serial configuration from a memory device: control signals, configuration data in to the FPGA (Cdata In), configuration data out (Cdata Out)]

[Figure: parallel configuration from a memory device: control and address signals, configuration data [7:0] into the FPGA (Cdata In[7:0])]

[Figure: parallel configuration from a memory device: control signals, configuration data [7:0] into the FPGA (Cdata In[7:0])]

[Figure: configuration through a microprocessor: memory device, microprocessor (address, data and control), peripheral port, etc., FPGA (Cdata In[7:0])]

• Total area = active logic + configuration memory + interconnect

• Advantages over PLDs:

• enhanced flexibility

• reduced board space, power and cost

• increased performance

• Advantages over ASICs:

• reprogrammability

• off-the-shelf availability

• zero NRE (non-recurring engineering) costs

• reduced time-to-market

• ease-of-use

[Figure: cumulative NRE + unit cost versus cumulative volume (K units), with curves ASIC .25, ASIC .15, FPGA .25 and FPGA .15]

• ASIC costs start higher, but the slope is flatter

• For each technology advance, FPGAs become more cost effective
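
The shape of these cost curves follows from a simple linear model: cumulative cost is the one-off NRE plus the unit cost times the volume. The sketch below computes the volume at which the ASIC line crosses the FPGA line. All figures in the example are purely hypothetical and chosen only to illustrate the calculation; the function names are illustrative too.

```python
def total_cost(nre, unit_cost, volume):
    """Cumulative cost: one-off NRE plus unit cost times number of units."""
    return nre + unit_cost * volume


def break_even_volume(asic_nre, asic_unit, fpga_nre, fpga_unit):
    """Volume above which the ASIC becomes cheaper than the FPGA.

    Assumes asic_nre >= fpga_nre. Returns None if the FPGA unit cost is not
    higher than the ASIC unit cost, in which case the lines never cross.
    """
    if fpga_unit <= asic_unit:
        return None
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


if __name__ == "__main__":
    # Hypothetical figures, not taken from the slides.
    v = break_even_volume(asic_nre=500_000, asic_unit=5.0, fpga_nre=0, fpga_unit=30.0)
    print(f"Break-even at about {v:,.0f} units")
```

Lowering the FPGA unit cost with each technology advance moves the break-even point to higher volumes, which is the trend the slide describes.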

• As performance requirements increase, the implementation of control elements in embedded applications is moving from 8 bits to 32 bits

• At the same time, the implementation vehicle of choice for embedded applications is moving from ASICs to FPGAs due to cost and time-to-market pressures

Synthesis methodology

[Figure: synthesis flow from a schematic (graphic editor) or VHDL description, through partition, placement and routing, to the configuration bit-string]

[Figure: design flow: register transfer level (RTL) description, logic synthesis, gate-level netlist, place-and-route; a logic simulator is used for RTL functional verification and for gate-level functional verification]

[Figure: design entry methods: graphical state diagram, graphical flowchart, textual HDL and block-level schematics, gathered under a top-level block-level schematic]

Textual HDL example:

When clock rises
  If (s == 0) then y = (a & b) | c;
  else y = c & !(d ^ e);
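
The textual HDL fragment on this slide describes a registered output: y only changes on a rising clock edge, and s selects which expression is stored. A behavioural sketch in Python (not an HDL); the class and method names are illustrative, and the reset value of 0 is an assumption.

```python
class RegisteredLogic:
    """Models: on a rising clock edge, y = (a & b) | c if s == 0 else c & !(d ^ e)."""

    def __init__(self):
        self.y = 0          # register contents (reset value assumed to be 0)
        self._last_clk = 0  # previous clock level, used to detect rising edges

    def tick(self, clk, s, a, b, c, d, e):
        """Apply one set of input levels (0/1) and return the current y."""
        if clk == 1 and self._last_clk == 0:  # rising edge
            if s == 0:
                self.y = (a & b) | c
            else:
                self.y = c & (1 - (d ^ e))
        self._last_clk = clk
        return self.y
```

Between rising edges the inputs can change freely without affecting y, which is what distinguishes this description from purely combinational logic.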

Intellectual property (IP)

• A semiconductor IP block is a predesigned function to be implemented in a semiconductor device. In some cases, the functions are parametrisable, allowing a degree of customization. These functions include physical library functions (analog or digital), basic blocks (such as counters and muxes) and system-level macros (also known as cores or virtual components), including memory blocks

• Market:

• 1999: 442 million dollars (semiconductor total: 196,136 M$)

• 2000: 620 million dollars (semiconductor total: 231,601 M$)

• 2004: 2,940 million dollars (semiconductor total: 339,545 M$)
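
These figures are easier to compare as shares and growth factors. The sketch below uses only the numbers from the list above; the dictionary and function names are illustrative.

```python
# Market figures from the list above, in millions of dollars: year -> (IP, total).
market = {1999: (442, 196_136), 2000: (620, 231_601), 2004: (2_940, 339_545)}


def ip_share_percent(year):
    """IP revenue as a percentage of total semiconductor revenue."""
    ip, total = market[year]
    return 100 * ip / total


def growth_factor(first, last, index):
    """Ratio of the index-th figure (0 = IP, 1 = total) between two years."""
    return market[last][index] / market[first][index]


if __name__ == "__main__":
    for year in sorted(market):
        print(f"{year}: IP is {ip_share_percent(year):.2f}% of the semiconductor market")
    print(f"IP growth 1999-2004: x{growth_factor(1999, 2004, 0):.2f}")
    print(f"Total growth 1999-2004: x{growth_factor(1999, 2004, 1):.2f}")
```

On these numbers the IP market stays below 1% of the total, but grows several times faster than the semiconductor market as a whole.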

System-on-a-chip (SOC)

• An SOC can be implemented as an ASIC or on an FPGA

• FPGA compared with ASIC:

• expensive circuit

• lower performance

• higher consumption

• lower development cost

• faster adaptation to change

ASIC SOC

                 2000      2004      2011
Technology (µm)  0.18      0.09      0.05
Gates/cm2        5x10^6    30x10^6   200x10^6
MIPS/watt        1000      2200      4000
SRAM Mb/cm2      20        120       240

MicroBlaze soft processor

• Thirty-two 32-bit general purpose registers

• 32-bit instruction word with three operands and two addressing modes

• Separate 32-bit instruction and data buses that conform to IBM’s OPB (On-chip Peripheral Bus) specification

• Separate 32-bit instruction and data buses with direct connection to on-chip block RAM through an LMB (Local Memory Bus)

• 32-bit address bus

• Single issue, 3-stage pipeline (instruction fetch, operand fetch, execution)

• Hardware multiplier

• Big-endian

Virtex-II Pro family

• Virtex-II Pro FPGAs provide up to four embedded 32-bit IBM PowerPC 405 RISC processors, each delivering over 420 Dhrystone MIPS at 300 MHz

• 16KB data / 16KB instruction caches

• memory management unit

• variable page size (1KB-16MB)

• five-stage datapath pipeline

• integer multiply/divide unit

• thirty-two 32-bit general purpose registers

• dedicated on-chip memory interface

• it takes up as little as 2% of the total die area of XC2VP50

• it does not have a hardware floating point unit

• Up to twenty-four on-chip 3.125 Gbps Rocket I/O transceivers

• Based on a 0.13 µm, 9-layer copper/low-K dielectric technology

Virtex-4 family from Xilinx

• Columnar architecture

• A Configurable Logic Block (CLB) contains 4 interconnected slices

[Figure: simplified view of a slice]

• BRAM

• Multipliers (DSP) blocks

• Integrated PowerPC 405

• Fully integrated Ethernet Media Access Controller (EMAC)

• Bitstreams encrypted with 256-bit AES algorithm

• 90-nm, 11-layer technology

• 500 MHz for memory and multipliers

• Lowest power

• Three platforms:

• LX platform: optimized for high-performance logic (logic, memory, DCMs, DSP)

• SX platform: optimized for high-performance signal processing (logic, memory, DCMs, DSP)

• FX platform: optimized for embedded processing and high-speed serial connectivity (logic, memory, DCMs, DSP, RocketIO, PowerPC)

Device     Logic cells  Block RAM [Kb]  DCM  SelectIO  XtremeDSP slices  PowerPC  10/100/1000 EMAC  RocketIO transceivers
XC4VLX15   13,824       864             4    320       32                -        -                 -
XC4VLX25   24,192       1,296           8    448       48                -        -                 -
XC4VLX40   41,472       1,728           8    640       64                -        -                 -
XC4VLX60   59,904       2,880           8    640       64                -        -                 -
XC4VLX80   80,640       3,600           12   768       80                -        -                 -
XC4VLX100  110,592      4,320           12   960       96                -        -                 -
XC4VLX160  152,064      5,184           12   960       96                -        -                 -
XC4VLX200  200,448      6,048           12   960       96                -        -                 -
XC4VSX25   23,040       2,304           4    320       128               -        -                 -
XC4VSX35   34,560       3,456           8    448       192               -        -                 -
XC4VSX55   55,296       5,760           8    640       512               -        -                 -
XC4VFX12   12,312       648             4    320       32                1        2                 -
XC4VFX20   19,224       1,224           4    320       32                1        2                 8
XC4VFX40   41,904       2,592           8    448       48                2        4                 12
XC4VFX60   56,880       4,176           12   576       128               2        4                 16
XC4VFX100  94,896       6,768           12   768       160               2        4                 20
XC4VFX140  142,128      9,936           20   896       192               2        4                 24

Virtex-5 family from Xilinx

• 32-Kb block RAM running at 550 MHz

• Compared to Virtex-4, Virtex-5 devices offer 30% higher average speed and 65% higher capacity in the largest device. Dynamic power consumption is reduced by 35% and chip area is 45% smaller

Stratix II family from Altera

• Each Logic Array Block (LAB) contains 8 Adaptive Logic Modules (ALM)

Stratix III family from Altera

• Adaptive Logic Module (ALM)

Cyclone II family from Altera

Nios soft processor

• RISC-like processor

• Full 32-bit instruction set, data path and address space

• 32 general-purpose registers

• 32 external interrupt sources

• Single-instruction 32x32 multiply and divide producing a 32-bit result

• Single-instruction barrel shifter

• 6-stage pipeline

• Branch prediction

Fusion family from Actel

• This FPGA family integrates the standard programmable logic with configurable analog and Flash memory

• Configurable analog-to-digital converter (ADC), supporting resolutions up to 12 bits and sample rates up to 600 ksamples per second

• A 32-bit ARM7 soft-core is available
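
The ADC figures translate into a few derived quantities. The sketch below treats the converter as ideal; the reference voltage used in the example is a hypothetical value (the slide gives none), and the function names are illustrative.

```python
def adc_levels(bits):
    """Number of distinct output codes of an ideal ADC."""
    return 2 ** bits


def lsb_volts(bits, v_ref):
    """Size of one quantisation step for a full-scale range of v_ref volts."""
    return v_ref / adc_levels(bits)


def raw_data_rate(bits, samples_per_second):
    """Bits per second produced at the given resolution and sample rate."""
    return bits * samples_per_second


if __name__ == "__main__":
    bits, rate = 12, 600_000  # maximum resolution and sample rate from the slide
    v_ref = 3.3               # hypothetical full-scale voltage, not from the slide
    print(f"{adc_levels(bits)} levels, LSB = {lsb_volts(bits, v_ref) * 1e3:.3f} mV")
    print(f"Raw data rate: {raw_data_rate(bits, rate) / 1e6:.1f} Mbit/s")
```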

• VersaTile configurations:
