Reconfigurable computing
Eduardo Sanchez
EPFL
Reconfigurable computing
• Methods for execution of algorithms:
• hardwired technology: high performance
• software-programmed microprocessors: high flexibility
• Why are hardwired solutions faster than software solutions?
• Reconfigurable computing is intended to fill the gap between hard and soft, achieving potentially much higher performance than software, while maintaining a higher level of flexibility than hardware (Compton and Hauck, “Reconfigurable computing”, ACM Computing Surveys, June 2002)
• Reconfigurable computing:
• systems incorporating some form of hardware programmability
• when we talk about reconfigurable computing we are usually talking about FPGA-based system design
• Main motivations:
• accelerators for compute-intensive applications
• tools for system validation: prototyping, emulation
Moore's law
Transistors predicted for 2004:
• Moore's law (2x every 12 months): 27.4 trillion
• Moore's law (2x every 18 months): 3.3 billion
• Moore's law (2x every 24 months): 37 million
Transistors actual in 2004:
• Pentium 4: 125 million
• Itanium 2: 410 million
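The three predictions for 2004 differ only in the assumed doubling period. The sketch below shows how strongly the result depends on that period. The starting point (about 50 transistors in 1965) is an assumption inferred from the slide's numbers, not something the slide states, and `projected_transistors` is a hypothetical helper name.

```python
def projected_transistors(start_count, start_year, target_year, doubling_months):
    """Project a transistor count assuming one doubling every `doubling_months`."""
    doublings = (target_year - start_year) * 12 / doubling_months
    return start_count * 2 ** doublings


# Assumed starting point (inferred from the slide's figures, not stated there).
START_COUNT = 50
START_YEAR = 1965

if __name__ == "__main__":
    for months in (12, 18, 24):
        count = projected_transistors(START_COUNT, START_YEAR, 2004, months)
        print(f"2x every {months} months: about {count:.3g} transistors in 2004")
```

Under this assumption the 24-month rule lands closest to the actual 2004 devices, while the 12-month rule overshoots by several orders of magnitude.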
• Intel introduces a new chip-fabrication process every two years:
• 2001: 0.13 micron
• 2003: 90 nm
• 2005: 65 nm
• 2007: 45 nm
• 2009: 32 nm
• ....
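A short calculation makes the pattern in this list visible: each generation shrinks the feature size by a roughly constant factor. The sketch below uses only the node sizes listed above (0.13 micron written as 130 nm); `scaling_steps` is a hypothetical helper name.

```python
# Process nodes from the list above, in nanometres (0.13 micron = 130 nm).
NODES_NM = {2001: 130, 2003: 90, 2005: 65, 2007: 45, 2009: 32}


def scaling_steps(nodes):
    """Return (from_year, to_year, linear_ratio, area_ratio) for each generation."""
    years = sorted(nodes)
    steps = []
    for prev, curr in zip(years, years[1:]):
        linear = nodes[curr] / nodes[prev]
        # Area scales with the square of the linear dimension.
        steps.append((prev, curr, linear, linear ** 2))
    return steps


if __name__ == "__main__":
    for prev, curr, linear, area in scaling_steps(NODES_NM):
        print(f"{prev} to {curr}: linear x{linear:.2f}, area x{area:.2f}")
```

Each step comes out at roughly 0.7x in linear dimension, which is about half the area per device and is consistent with a doubling of density every two years.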
Problem: Wirth's law
• Software is slowing faster than hardware is accelerating
• Expressed in Biblical cadences: Grove giveth and Gates taketh away
(Andy Grove, Intel Chairman; Bill Gates, Microsoft President)
• Computing requirements increase even faster than Moore's law
Problem: time to market
Problem: power consumption
[Figure: power consumption at clock frequencies of 50 MHz, 100 MHz, 200 MHz and 400 MHz]
Embedded systems
• Embedded systems, which are hidden from the user and cannot usually be manipulated or reprogrammed, are found in virtually all electronic equipment used today, from wireless telephones and DVD players to cars and airplanes
• A fifth of the value of each car produced in the EU is due to embedded electronics, a value that is expected to rise to about 40 percent by 2015
Pervasive computing
• "The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it" (Mark Weiser, "The Computer for the 21st Century", Scientific American, September 1991)
• In the near future, the computer will disappear by being everywhere: it will be ubiquitous, pervasive
• Pervasive systems will be so integrated with their users that they will be invisible: they will disappear
• Evolution of computer systems:
• mainframes: one computer, many users
• PCs: one computer, one user
• pervasive systems: many computers, one user
• distributed systems: remote communication, fault tolerance, high availability, remote information access, distributed security
• mobile computing: mobile networking, mobile information access, adaptive applications, energy-aware systems, location sensitivity
• pervasive systems: smart spaces, invisibility, localized scalability, uneven conditioning
Smart Dust project
• A 20 MIPS CPU embedded in a shoe
• Four times more powerful than early Silicon Graphics workstations (Motorola 68000)
• A lot of new devices to design
Integrated circuits
• Standard circuits
• Full-custom (ASIC): hand-made, libraries
• Semi-custom:
• mask programmable: gate array, ROM
• field programmable: PROM, PAL, PLA, CPLD, FPGA
Field Programmable Gate Arrays
• Array of logic cells
• Each cell is able to implement a logic function, chosen among several possible functions: the choice is made by programming
• Interconnections between cells are also programmable
• Two types, depending on the cell’s complexity:
• fine grain
• coarse grain
• Two types, depending on the programming mode:
• RAM: every logic cell contains a LUT (look-up table), accompanied by a flip-flop, and all interconnected with programmable routing pathways
• anti-fuses
[Figure: FPGA structure, showing I/O cells, logic cells with programmable functions, programmable interconnections and the configuration]
[Figure: programmable logic blocks and programmable interconnect]
Required function: y = (a & b) | !c
Truth table:
a b c | y
0 0 0 | 1
0 0 1 | 0
0 1 0 | 1
0 1 1 | 0
1 0 0 | 1
1 0 1 | 0
1 1 0 | 1
1 1 1 | 1
Programmed LUT: eight SRAM cells hold the y column (1, 0, 1, 0, 1, 0, 1, 1), and an 8:1 multiplexer driven by a, b and c selects one of them
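The LUT mechanism above can be sketched in a few lines: programming fills the SRAM cells with the truth table of the required function, and evaluation is just a multiplexer read addressed by the inputs. This is a minimal illustrative model, not vendor code; `program_lut` and `lut_read` are hypothetical names. The cell contents are computed directly from the equation y = (a & b) | !c, which gives y = 0 for a b c = 0 1 1.

```python
def required_function(a, b, c):
    """y = (a & b) | !c, for single-bit inputs (0 or 1)."""
    return (a & b) | (1 - c)


def program_lut(func, n_inputs=3):
    """Fill the SRAM cells with the truth table of `func` (address 0 first)."""
    cells = []
    for address in range(2 ** n_inputs):
        # Most significant address bit is the first input (a).
        bits = [(address >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        cells.append(func(*bits))
    return cells


def lut_read(cells, a, b, c):
    """8:1 multiplexer: the inputs select one SRAM cell."""
    return cells[(a << 2) | (b << 1) | c]


sram_cells = program_lut(required_function)

if __name__ == "__main__":
    print("SRAM cells:", sram_cells)
    print("y for a=1, b=1, c=1:", lut_read(sram_cells, 1, 1, 1))
```

Changing the implemented function only requires reloading the eight cells; the multiplexer and wiring stay the same, which is what makes the cell programmable.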
• An example of logic cell:
[Figure: logic cell with a LUT (inputs G1 to G4), carry and control logic (Cin, Cout, BY), and a flip-flop (D, EC, SP, RC, Q); outputs Y, YB and YQ]
• Functional frequencies are design-dependent
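[Figure: logic cell in which the same 16 SRAM cells can act as a 4-input LUT, a 16x1 RAM or a 16-bit shift register (SR); inputs a, b, c, d and e; a mux feeds a flip-flop with clock, clock enable and set/reset; outputs y and q]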
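The three roles named in this figure (4-input LUT, 16x1 RAM, 16-bit shift register) all reuse the same 16 storage cells. The toy model below illustrates that idea only; it is a sketch under simplifying assumptions (no clocking, one role at a time), and `LutCell` with its methods is a hypothetical name, not a vendor primitive.

```python
class LutCell:
    """Toy model: 16 SRAM cells used as a 4-input LUT, a 16x1 RAM or a 16-bit SR."""

    def __init__(self, contents=None):
        self.cells = list(contents) if contents is not None else [0] * 16
        if len(self.cells) != 16:
            raise ValueError("a 4-input LUT has exactly 16 cells")

    def lookup(self, a, b, c, d):
        """LUT mode: the four inputs address one cell."""
        return self.cells[(a << 3) | (b << 2) | (c << 1) | d]

    def ram_write(self, address, bit):
        """16x1 RAM mode: write one bit at a 4-bit address."""
        self.cells[address] = bit & 1

    def ram_read(self, address):
        """16x1 RAM mode: read one bit at a 4-bit address."""
        return self.cells[address]

    def shift_in(self, bit):
        """Shift-register mode: push one bit in, return the bit that falls out."""
        shifted_out = self.cells[-1]
        self.cells = [bit & 1] + self.cells[:-1]
        return shifted_out


if __name__ == "__main__":
    and4 = LutCell([0] * 15 + [1])  # 4-input AND: only address 1111 holds a 1
    print("AND(1,1,1,1) =", and4.lookup(1, 1, 1, 1))
```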
[Figure: a slice contains two logic cells (LC), each with a LUT (usable as a 4-input LUT, 16x1 RAM or 16-bit SR), a MUX and a REG]
[Figure: a configurable logic block (CLB) contains four slices, each with two logic cells]
[Figure: columns of embedded RAM blocks among arrays of programmable logic blocks]
[Figure: RAM blocks, multipliers and logic blocks]
[Figure: multiply-and-accumulate (MAC): a multiplier with inputs A[n:0] and B[n:0] feeds an adder and an accumulator, giving output Y[(2n - 1):0]]
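A multiply-and-accumulate unit chains the three blocks in this figure: each cycle, the multiplier forms a product and the adder adds it to the running value held in the accumulator. The behavioural sketch below assumes unsigned operands and an unbounded accumulator; the `width` parameter and the function name `mac` are illustrative assumptions.

```python
def mac(pairs, width=8):
    """Accumulate the products of unsigned `width`-bit operand pairs."""
    mask = (1 << width) - 1
    acc = 0  # accumulator register, cleared at start
    for a, b in pairs:
        product = (a & mask) * (b & mask)  # multiplier: needs up to 2*width bits
        acc += product                     # adder feeding the accumulator
    return acc


if __name__ == "__main__":
    # Dot product of (1, 3, 5) and (2, 4, 6), the typical DSP use of a MAC.
    print(mac([(1, 2), (3, 4), (5, 6)]))
```

In a real device the accumulator has a fixed width, so a hardware design must also decide how to handle overflow; this sketch leaves that out.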
[Figure: a microprocessor core, special RAM, peripherals and I/O, etc., placed in a “stripe” beside the main FPGA fabric]
[Figure: (a) one embedded microprocessor core, (b) four embedded cores]
[Figure: configuration chain of SRAM cells and I/O pins/pads, from "configuration data in" to "configuration data out"]
Mode pins   Mode
0 0         Serial load with FPGA as master
0 1         Serial load with FPGA as slave
1 0         Parallel load with FPGA as master
1 1         Parallel load with FPGA as slave
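The mode table can be read as a two-bit decode: one pin selects serial or parallel loading, the other selects whether the FPGA is master or slave. The lookup below is a minimal sketch of that decode; the pin order (first pin, second pin) follows the table as written, and `decode_mode` is a hypothetical name.

```python
# (first mode pin, second mode pin) -> (load type, FPGA role), as in the table.
CONFIG_MODES = {
    (0, 0): ("serial", "master"),
    (0, 1): ("serial", "slave"),
    (1, 0): ("parallel", "master"),
    (1, 1): ("parallel", "slave"),
}


def decode_mode(pin_a, pin_b):
    """Return (load type, FPGA role) for the two mode pins."""
    return CONFIG_MODES[(pin_a, pin_b)]


if __name__ == "__main__":
    for pins in sorted(CONFIG_MODES):
        load, role = decode_mode(*pins)
        print(pins, "->", f"{load} load with FPGA as {role}")
```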
[Figure: serial load: a memory device and the FPGA exchange control signals, configuration data enters on Cdata In, and Cdata Out passes configuration data on]
[Figure: parallel load with address lines: control and address signals between the FPGA and a memory device, configuration data [7:0] on Cdata In[7:0]]
[Figure: parallel load without address lines: control signals between the FPGA and a memory device, configuration data [7:0] on Cdata In[7:0]]
[Figure: a microprocessor with its memory device and peripheral ports uses address, data and control lines to load the FPGA through Cdata In[7:0]]
• Total area = active logic + configuration memory + interconnect
• Advantages over PLDs:
• enhanced flexibility
• reduced board space, power and cost
• increased performance
• Advantages over ASICs:
• reprogrammability
• off-the-shelf availability
• zero NRE (non-recurring engineering) costs
• reduced time-to-market
• ease-of-use
[Figure: cumulative cost (NRE + unit cost) versus cumulative volume (K units) for ASIC .25, ASIC .15, FPGA .25 and FPGA .15]
• ASIC costs start higher, but the slope is flatter
• For each technology advance, FPGAs become more cost-effective
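The cost comparison reduces to two straight lines: cumulative cost = NRE + unit cost x volume. The ASIC line starts higher (large NRE) but rises more slowly (lower unit cost), so the two lines cross at a break-even volume. The sketch below computes that crossover; all the numbers in it are hypothetical, chosen only to illustrate the shape of the curves.

```python
def cumulative_cost(nre, unit_cost, volume):
    """Total cost of producing `volume` units: one-time NRE plus per-unit cost."""
    return nre + unit_cost * volume


def break_even_volume(asic_nre, asic_unit, fpga_nre, fpga_unit):
    """Volume above which the ASIC is cheaper, or None if the lines never cross
    in the ASIC's favour (FPGA unit cost not higher than ASIC unit cost)."""
    if fpga_unit <= asic_unit:
        return None
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    asic = {"nre": 1_000_000, "unit": 5}
    fpga = {"nre": 0, "unit": 25}
    volume = break_even_volume(asic["nre"], asic["unit"], fpga["nre"], fpga["unit"])
    print(f"Break-even at {volume:,.0f} units")
```

Below the break-even volume the FPGA is the cheaper choice. If ASIC NRE grows with each process generation, the crossover moves to higher volumes, which matches the trend described in the bullets.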
• As performance requirements increase, the implementation of control elements in embedded applications is moving from 8 bits to 32 bits
• At the same time, the implementation vehicle of choice for embedded applications is moving from ASICs to FPGAs due to cost and time-to-market pressures
Synthesis methodology
[Figure: synthesis flow: a schematic (graphic editor) or a VHDL description goes through partition, placement and routing to produce the configuration bit-string]
[Figure: design flow: register transfer level (RTL) description, RTL functional verification with a logic simulator, logic synthesis, gate-level netlist, gate-level functional verification with a logic simulator, place-and-route]
[Figure: design entry methods: graphical state diagram, graphical flowchart, textual HDL, top-level block-level schematic, block-level schematic]
Textual HDL example:
    When clock rises
        If (s == 0) then y = (a & b) | c;
        else y = c & !(d ^ e);
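The textual HDL fragment describes a registered output: y changes only on a rising clock edge, and s selects which expression is loaded. The Python model below mimics that behaviour for single-bit signals. It is a behavioural sketch, not HDL, and `ClockedBlock` and `tick` are hypothetical names.

```python
class ClockedBlock:
    """Behavioural model of: when clock rises, load y from one of two expressions."""

    def __init__(self):
        self.y = 0             # registered output, assumed to reset to 0
        self._prev_clock = 0   # used to detect the rising edge

    def tick(self, clock, s, a, b, c, d, e):
        """Apply one set of input values; y updates only on a 0 -> 1 clock edge."""
        if clock == 1 and self._prev_clock == 0:
            if s == 0:
                self.y = (a & b) | c
            else:
                self.y = c & (1 - (d ^ e))  # !(d ^ e) for single-bit values
        self._prev_clock = clock
        return self.y


if __name__ == "__main__":
    block = ClockedBlock()
    block.tick(0, 0, 1, 1, 0, 0, 0)
    print("after rising edge:", block.tick(1, 0, 1, 1, 0, 0, 0))
```

Between rising edges the inputs may change freely without affecting y, which is the property that distinguishes this registered logic from a purely combinational LUT.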
Intellectual property (IP)
• A semiconductor IP block is a predesigned function to be implemented in a semiconductor device. In some cases, the functions are parametrisable, allowing a degree of customization. These functions include physical library functions (analog or digital), basic blocks (such as counters and muxes) and system-level macros (also known as cores or virtual components), including memory blocks
• Market:
• 1999: 442 million dollars (semiconductor total: 196,136 M$)
• 2000: 620 million dollars (semiconductor total: 231,601 M$)
• 2004: 2,940 million dollars (semiconductor total: 339,545 M$)
System-on-a-chip (SOC)
• An SOC can be implemented as an ASIC or on an FPGA
• Compared with the ASIC, the FPGA gives:
• a more expensive circuit
• lower performance
• higher power consumption
• lower development cost
• faster adaptation to change
ASIC SOC
              2000      2004      2011
Technology    0.18      0.09      0.05
Gates/cm2     5x10^6    30x10^6   200x10^6
MIPS/watt     1000      2200      4000
SRAM Mb/cm2   20        120       240
MicroBlaze soft processor
• Thirty-two 32-bit general-purpose registers
• 32-bit instruction word with three operands and two addressing modes
• Separate 32-bit instruction and data buses that conform to IBM’s OPB (On-chip Peripheral Bus) specification
• Separate 32-bit instruction and data buses with direct connection to on-chip block RAM through an LMB (Local Memory Bus)
• 32-bit address bus
• Single-issue, 3-stage pipeline (instruction fetch, operand fetch, execution)
• Hardware multiplier
• Big-endian
Virtex-II Pro family
• Virtex-II Pro FPGAs provide up to four embedded 32-bit IBM PowerPC 405 RISC processors, each delivering over 420 Dhrystone MIPS at 300 MHz
• 16KB data / 16KB instruction caches
• memory management unit
• variable page size (1KB-16MB)
• five-stage datapath pipeline
• integer multiply/divide unit
• thirty-two 32-bit general-purpose registers
• dedicated on-chip memory interface
• it takes up as little as 2% of the total die area of the XC2VP50
• it does not have a hardware floating-point unit
• Up to twenty-four on-chip 3.125 Gbps Rocket I/O transceivers
• Based on a 0.13 μm, 9-layer copper/low-k dielectric technology
Virtex-4 family from Xilinx
• Columnar architecture
• A Configurable Logic Block (CLB) contains 4 interconnected slices
[Figure: simplified view of the slice]
• BRAM
• Multiplier (DSP) blocks
• Integrated PowerPC 405
• Fully integrated Ethernet Media Access Controller (EMAC)
• Bitstreams encrypted with a 256-bit AES algorithm
• 90-nm, 11-layer technology
• 500 MHz for memory and multipliers
• Lowest power
• Three platforms:
• LX platform: optimized for high-performance logic (logic, memory, DCMs, DSP)
• SX platform: optimized for high-performance signal processing (logic, memory, DCMs, DSP)
• FX platform: optimized for embedded processing and high-speed serial connectivity (logic, memory, DCMs, DSP, RocketIO, PowerPC)
Device     Logic cells  Block RAM [Kb]  DCM  SelectIO  XtremeDSP slices  PowerPC  10/100/1000 EMAC  RocketIO transceivers
XC4VLX15   13,824       864             4    320       32                -        -                 -
XC4VLX25   24,192       1,296           8    448       48                -        -                 -
XC4VLX40   41,472       1,728           8    640       64                -        -                 -
XC4VLX60   59,904       2,880           8    640       64                -        -                 -
XC4VLX80   80,640       3,600           12   768       80                -        -                 -
XC4VLX100  110,592      4,320           12   960       96                -        -                 -
XC4VLX160  152,064      5,184           12   960       96                -        -                 -
XC4VLX200  200,448      6,048           12   960       96                -        -                 -
XC4VSX25   23,040       2,304           4    320       128               -        -                 -
XC4VSX35   34,560       3,456           8    448       192               -        -                 -
XC4VSX55   55,296       5,760           8    640       512               -        -                 -
XC4VFX12   12,312       648             4    320       32                1        2                 -
XC4VFX20   19,224       1,224           4    320       32                1        2                 8
XC4VFX40   41,904       2,592           8    448       48                2        4                 12
XC4VFX60   56,880       4,176           12   576       128               2        4                 16
XC4VFX100  94,896       6,768           12   768       160               2        4                 20
XC4VFX140  142,128      9,936           20   896       192               2        4                 24
Virtex-5 family from Xilinx
• 32-Kb block RAM running at 550 MHz
• Compared to Virtex-4, Virtex-5 devices offer 30% higher average speed and 65% higher capacity in the largest device. Dynamic power consumption is reduced by 35% and chip area is 45% smaller
Stratix II family from Altera
• Each Logic Array Block (LAB) contains 8 Adaptive Logic Modules (ALM)
Stratix III family from Altera
• Adaptive Logic Module (ALM)
Cyclone II family from Altera
Nios soft processor
• RISC-like processor
• Full 32-bit instruction set, data path and address space
• 32 general-purpose registers
• 32 external interrupt sources
• Single-instruction 32x32 multiply and divide producing a 32-bit result
• Single-instruction barrel shifter
• 6-level pipeline
• Branch prediction
Fusion family from Actel
• This FPGA family integrates standard programmable logic with configurable analog and Flash memory
• Configurable analog-to-digital converter (ADC), supporting resolutions up to 12 bits and sample rates up to 600 ksamples per second
• A 32-bit ARM7 soft core is available
• VersaTile configurations: