FPGA architecture and reconfigurable computing topics · 2015-06-04
FPGA Architecture and Reconfigurable Computing: An introduction
João Canas Ferreira
Universidade do Porto, Faculdade de Engenharia
May 2015
Topics
1 Introduction
2 General Architecture of Island-style FPGAs
3 Implementation Aspects
4 Computing with FPGAs
João Canas Ferreira (FEUP) FPGA Architecture and Reconfigurable Computing May 2015 2 / 42
What are FPGAs?
- Field-Programmable Gate Array
- COTS (commercial off-the-shelf) IC that is configured by the customer/user after manufacturing
- The configuration can be done once (anti-fuse) or multiple times (SRAM or Flash configuration memory)
- FPGAs contain configurable logic modules and a configurable interconnection network
- Modern FPGA devices are systems-on-chip (SoCs) that may include embedded processors, signal-processing blocks, memory blocks, clock managers, transceivers, …
- FPGAs have a short design cycle and, for smaller production volumes, are more cost-effective than ASICs
- The cost-effectiveness of FPGAs improves as technology becomes more complex (higher design effort) and manufacturing costs increase.
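The volume argument can be made concrete with a linear cost model, total cost = NRE + volume × unit cost. The sketch below is illustrative only: the NRE and unit-cost figures are hypothetical, not vendor or foundry data.

```python
import math


def total_cost(nre: float, unit_cost: float, volume: int) -> float:
    """Total cost of producing `volume` devices (linear model)."""
    return nre + unit_cost * volume


def break_even_volume(nre_fpga: float, unit_fpga: float,
                      nre_asic: float, unit_asic: float):
    """Smallest volume at which the ASIC becomes strictly cheaper than the FPGA."""
    if unit_asic >= unit_fpga:
        return None  # under this model the ASIC never wins on cost
    # Solve nre_fpga + unit_fpga*n > nre_asic + unit_asic*n for integer n.
    n = (nre_asic - nre_fpga) / (unit_fpga - unit_asic)
    return max(math.floor(n) + 1, 1)


# Hypothetical figures: low NRE / high unit price (FPGA) vs. the opposite (ASIC).
n_star = break_even_volume(50_000.0, 100.0, 2_000_000.0, 10.0)
print(n_star)
```

Raising the ASIC NRE (higher design effort, more expensive masks) moves the break-even volume upwards, which is the effect described in the last point above.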
FPGAs are heterogeneous computing platforms
| Family | Spartan-6 | Artix-7 | Kintex-7 | Virtex-7 | Kintex UltraScale | Kintex UltraScale+ | Virtex UltraScale | Virtex UltraScale+ |
|---|---|---|---|---|---|---|---|---|
| Logic cells (K) | 147 | 215 | 478 | 1,955 | 1,161 | 915 | 4,433 | 2,863 |
| Block RAM (Mb) | 4.8 | 13 | 34 | 68 | 76 | 34.5 | 132.9 | 94.5 |
| DSP slices | 180 | 740 | 1,920 | 3,600 | 5,520 | 3,528 | 2,880 | 11,904 |
| Transceiver count | 8 | 16 | 32 | 96 | 64 | 76 | 120 | 128 |
| Max. transceiver speed (Gb/s) | 3.2 | 6.6 | 12.5 | 28.05 | 16.3 | 32.75 | 30.5 | 32.75 |
| Memory interface, DDR3 (Mb/s) | 800 | 1,066 | 1,866 | 1,866 | 2,133 | 2,133 | 2,133 | 2,133 |
| I/O pins | 576 | 500 | 500 | 1,200 | 832 | 572 | 1,456 | 832 |
| PCI Express | x1 Gen1 | x4 Gen2 | x8 Gen2 | x8 Gen3 | x8 Gen3 | x16 Gen3 / x8 Gen4 | x8 Gen3 | x16 Gen3 / x8 Gen4 |
- Embedded CPU(s)
- Clock distribution tree and clock manager(s)
- Analog-to-digital converter(s)
- Ethernet and DRAM memory controllers
- …
Advanced process nodes
- Xilinx: Spartan-6 (45 nm), Virtex-7 (28 nm), Virtex UltraScale (20 nm), Virtex UltraScale+ (16 nm)
- Altera: Arria 10 (TSMC 20 nm), Stratix V (28 nm), Stratix 10 (Intel 14 nm)
Source: [Altera, Expect a Breakthrough Advantage in Next-Generation FPGAs (White paper WP-01199-1.0, June 2013)]
Advanced interconnect technology
Source: [Xilinx, Xilinx Stacked Silicon Interconnect Technology . . . (White paper WP380, 2012)]
2 General Architecture of Island-style FPGAs
Main features of island-style FPGAs
→ "Islands" of logic in a "sea" of interconnections
→ Regular layout, but not necessarily uniform
- Logic blocks
  - Configurable combinational function generators
  - Flip-flops
  - Clusters: groups of function generators
  - Local interconnect
- Routing infrastructure
  - Segmented routing
  - Switch boxes (between segments)
  - Connection points (segments to logic blocks)
- Configurable I/O cells
  - Input, output, bidirectional
  - I/O standard (LVCMOS, LVTTL, LVDS, etc.)
Reconfigurable fabric
[Figure: array of logic blocks surrounded by routing channels with short and long wire segments; connection blocks (programmable connection switches) attach logic blocks to the channels, and switch blocks (programmable routing switches) join segments]
Configurable logic element
→ Conceptual basic logic element with a single output and one flip-flop
→ LUT = look-up table: table of 2^N bits (N = number of inputs)
→ Configuration defines:
  1. the contents of the LUT
  2. the value of Sel
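[Figure: N-input LUT driving a D flip-flop (clock CLK); a multiplexer controlled by Sel drives Out from either the LUT output or the flip-flop output Q]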
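The behaviour of this logic element can be captured in a few lines. A minimal sketch, assuming Sel = 0 selects the combinational LUT output and Sel = 1 the registered output (the slide does not fix this polarity), and that input 0 is the least-significant LUT address bit:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LogicElement:
    n_inputs: int
    lut_bits: List[int]   # 2**n_inputs configuration bits (LUT contents)
    sel: int              # configuration bit driving the output multiplexer
    q: int = 0            # flip-flop state

    def __post_init__(self):
        if len(self.lut_bits) != 2 ** self.n_inputs:
            raise ValueError("LUT needs exactly 2**N configuration bits")

    def lut(self, inputs: Sequence[int]) -> int:
        # The inputs form the address; inputs[0] is the LSB (assumed ordering).
        addr = sum(bit << i for i, bit in enumerate(inputs))
        return self.lut_bits[addr]

    def out(self, inputs: Sequence[int]) -> int:
        """Current value of Out (between clock edges)."""
        return self.q if self.sel else self.lut(inputs)

    def clock(self, inputs: Sequence[int]) -> None:
        """Rising edge of CLK: the flip-flop captures the LUT output."""
        self.q = self.lut(inputs)


# Same 2-input XOR configuration, once combinational and once registered.
xor_comb = LogicElement(2, [0, 1, 1, 0], sel=0)
xor_reg = LogicElement(2, [0, 1, 1, 0], sel=1)
print(xor_comb.out([1, 0]))   # 1: the LUT result appears immediately
print(xor_reg.out([1, 0]))    # 0: the flip-flop has not been clocked yet
xor_reg.clock([1, 0])
print(xor_reg.out([0, 0]))    # 1: value captured at the clock edge
```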
Clustered logic block
→ Granularity: size of the configurable logic block (CLB)
→ Local interconnect (simplified routing)
[Figure: cluster of N configurable logic elements (CLE #1 … CLE #N) sharing the clock CLK, with K inputs and N outputs]
→ Inputs may be shared between CLEs
→ Clusters may contain other elements (e.g., carry chain)
Look-up table
→ Conceptually, just a set of 2^N memory cells and a multiplexer
→ Configuration defines the contents of the SRAM cells
→ The multiplexer control signals are the function inputs
[Figure: 2^N SRAM cells feeding a multiplexer whose N select lines are the LUT inputs; the multiplexer drives Output]
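Because the LUT is just 2^N memory cells addressed by its inputs, any N-input Boolean function can be stored by evaluating it once per input combination. A minimal sketch that packs a function into a configuration word, assuming bit i of the word holds the output for input combination i with input 0 as the least-significant address bit (vendor tools define their own bit ordering):

```python
from typing import Callable, Sequence


def lut_contents(f: Callable[..., int], n: int) -> int:
    """Pack the truth table of the n-input function f into a 2**n-bit integer."""
    word = 0
    for addr in range(2 ** n):
        inputs = [(addr >> i) & 1 for i in range(n)]
        if f(*inputs) & 1:
            word |= 1 << addr
    return word


def lut_read(word: int, inputs: Sequence[int]) -> int:
    """Read the LUT: the inputs act as the multiplexer select lines."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return (word >> addr) & 1


def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


word = lut_contents(majority, 3)
print(f"{word:08b}")   # 11101000
```

Under the same convention a 4-input AND is stored as 0x8000: only the cell addressed by 1111 holds a 1.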
Alternative CLB
→ Microsemi (formerly Actel) anti-fuse FPGA
Source: [Microsemi, Axcelerator Family FPGAs, datasheet, 2012]
Wire segments
→ Segments of length 1, 2 and 4
[Figure: row of CLBs and switch boxes connected by wire segments of length 1, 2 and 4]
→ Overlapping segments of length 3
[Figure: row of CLBs with staggered, overlapping segments of length 3]
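Longer segments let a connection cross several CLBs while passing through fewer programmable switches, each of which adds delay. A rough sketch, assuming a purely horizontal connection spanning d CLBs, segments that may start at any CLB boundary, and one switch-box hop wherever two segments meet (real architectures restrict where segments start, which is what staggered layouts such as the overlapping length-3 segments address):

```python
from typing import List, Sequence


def segments_needed(distance: int, lengths: Sequence[int] = (4, 2, 1)) -> List[int]:
    """Greedy choice of wire segments covering `distance` CLBs.

    With lengths 1, 2 and 4 (powers of two) the greedy choice is also
    the one with the fewest segments.
    """
    chosen = []
    remaining = distance
    for length in sorted(lengths, reverse=True):
        while remaining >= length:
            chosen.append(length)
            remaining -= length
    if remaining:
        raise ValueError("distance cannot be covered with the given lengths")
    return chosen


for d in (1, 3, 7, 12):
    segs = segments_needed(d)
    # Consecutive segments meet in a switch box: len(segs) - 1 switch hops.
    print(d, segs, "switch hops:", len(segs) - 1)
```

For d = 12, three length-4 segments need 2 switch hops, whereas length-1 segments alone would need 11.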
Connection points and switch boxes
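[Figure: two CLBs, each with inputs In0, In1 and outputs Out0, Out1, attached to the routing tracks through connection points, with switch boxes between track segments]
→ Patterns of connection points and the flexibility of the switch boxes vary with FPGA family.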
3 Implementation Aspects
Static memory cells
→ Standard six-transistor (6T) SRAM cell
[Figure: 6T SRAM cell holding data and its complement, written through the write signal from write_data and its complement]
Implementing configurable multiplexers
[Figure: two implementations of a 4-input multiplexer (inputs I0–I3, output Y) whose select lines are driven by 2 SRAM cells (data and its complement)]
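In a routing multiplexer the roles are the reverse of those in a LUT: the select lines come from configuration memory and stay fixed after configuration, while the data inputs I0–I3 carry user signals. A minimal sketch, assuming the 2 SRAM cells hold the index of the selected input in binary (the implementations in the figure may decode them differently):

```python
from typing import Sequence


class RoutingMux:
    """4-input routing multiplexer whose select value is held in 2 SRAM cells."""

    def __init__(self, sram_bits: Sequence[int]):
        if len(sram_bits) != 2:
            raise ValueError("a 4:1 multiplexer needs 2 configuration bits")
        # Written once, at configuration time (sram_bits[0] is the LSB).
        self.select = sram_bits[0] | (sram_bits[1] << 1)

    def route(self, i0: int, i1: int, i2: int, i3: int) -> int:
        # During operation the multiplexer just forwards one user signal to Y.
        return (i0, i1, i2, i3)[self.select]


mux = RoutingMux([0, 1])          # select = 2, so Y follows I2
for i2 in (0, 1):
    print(mux.route(0, 0, i2, 1))
```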
Implementing look-up tables
→ Example: 3-input LUT
[Figure: 3-input LUT in which SRAM cells feed a multiplexer controlled by inputs I0, I1, I2, producing output F]
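One common way to build the LUT multiplexer is a tree of 2:1 multiplexers with one level per input. A behavioural sketch (not a transistor-level model), assuming I0 controls the level nearest the SRAM cells and therefore acts as the least-significant address bit:

```python
from typing import List, Sequence


def mux2(sel: int, a: int, b: int) -> int:
    """2:1 multiplexer: returns a when sel == 0, b when sel == 1."""
    return b if sel else a


def lut_mux_tree(sram: List[int], inputs: Sequence[int]) -> int:
    """Evaluate a LUT as a tree of 2:1 multiplexers.

    Level k is controlled by inputs[k]; a k-input tree contains
    2**k - 1 two-input multiplexers.
    """
    if len(sram) != 2 ** len(inputs):
        raise ValueError("need exactly 2**N SRAM cells for N inputs")
    level = list(sram)
    for sel in inputs:
        # Each pair of adjacent signals is reduced by one 2:1 multiplexer.
        level = [mux2(sel, level[j], level[j + 1]) for j in range(0, len(level), 2)]
    return level[0]


# F = I0 XOR I1 XOR I2 (odd parity) stored in 8 SRAM cells.
parity = [bin(addr).count("1") % 2 for addr in range(8)]
print(lut_mux_tree(parity, [1, 1, 0]))   # 0: two inputs are high
```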
Implementing input connection points
[Figure: alternative implementations of input connection points joining signals inA and inB to the routing tracks]
Implementing switch boxes
→ Example: pass transistors and tri-state gates
Implementing output buffering
[Figure: CLB output Out0 connected to several routing tracks through buffers, each controlled by an SRAM cell]
Gate boosting
→ Increase the gate voltage of the pass transistor
→ Example for 0.25 µm technology with 2.5 V supply:
[Figure: with 2.5 V on its gate, a pass transistor passes a 2.5 V input as 1.80 V; with the gate boosted to 3 V, the same input is passed as 2.23 V]
→ Reduces static power dissipation and increases noise immunity
→ Without gate boosting: 20 µW; with gate boosting: ≈ 10 µW
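The voltage levels in this example follow from the threshold drop of an NMOS pass transistor: the output can only rise until the gate-source voltage falls to the threshold voltage. Reading the example's figures with this first-order model (leakage ignored):

```latex
V_{\text{out,max}} \approx V_G - V_{th}
\qquad\Rightarrow\qquad
\begin{aligned}
\text{no boosting: } & V_{th} \approx 2.5\,\text{V} - 1.80\,\text{V} = 0.70\,\text{V}\\
\text{boosting: }    & V_{th} \approx 3.0\,\text{V} - 2.23\,\text{V} = 0.77\,\text{V}
\end{aligned}
```

The effective threshold is higher in the boosted case because the source sits at a higher voltage (body effect). A degraded high level of 1.80 V leaves the PMOS transistor of the following 2.5 V inverter with |V_GS| = 0.70 V, enough to conduct, which plausibly explains the static current; at 2.23 V, |V_GS| drops to 0.27 V and that current shrinks.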
4 Computing with FPGAs
Reconfigurable computing
Reconfigurable computing (RC) vs. conventional (CPU) computing
- Reconfigurable hardware infrastructure instead of a CPU
- Computation performed by a circuit rather than by executing instructions

    lw   $t1, 0($s0)      # load a
    lw   $t2, 4($s0)      # load b
    lw   $t3, 8($s0)      # load c
    add  $t6, $t2, $t3    # b + c
    mult $t6, $t1         # a * (b + c)
    mflo $t1              # keep the 32 least-significant bits
    mult $t2, $t3         # b * c
    mflo $t2
    add  $t1, $t1, $t2    # final addition
    sw   $t1, 0($s1)      # store res
res = a * (b + c) + b*c
[Figure: dataflow graph of the same computation: inputs a, b, c feeding two adders (+) and two multipliers (*)]
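The contrast with the 10-instruction sequence becomes visible when the dataflow graph is scheduled as a circuit: operations whose operands are ready can run at the same time. A minimal as-soon-as-possible (ASAP) levelling of res = a * (b + c) + b*c, assuming each operator takes one level and as many operators as needed are available (the "no resource constraints" case); node names s1, p1 and p2 are illustrative:

```python
from typing import Dict, List, Tuple

# Dataflow graph: node -> (operator, operand names). Primary inputs have no entry.
DFG: Dict[str, Tuple[str, List[str]]] = {
    "s1": ("+", ["b", "c"]),     # b + c
    "p1": ("*", ["b", "c"]),     # b * c
    "p2": ("*", ["a", "s1"]),    # a * (b + c)
    "res": ("+", ["p2", "p1"]),  # final addition
}
INPUTS = {"a", "b", "c"}


def asap_levels(dfg):
    """Level of an operation = 1 + max level of its operands (inputs are level 0)."""
    level = {name: 0 for name in INPUTS}

    def visit(node):
        if node not in level:
            _, operands = dfg[node]
            level[node] = 1 + max(visit(op) for op in operands)
        return level[node]

    for node in dfg:
        visit(node)
    return {n: lvl for n, lvl in level.items() if n in dfg}


def evaluate(dfg, env):
    """Compute the graph level by level, as the circuit would."""
    ops = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}
    values = dict(env)
    for node, _ in sorted(asap_levels(dfg).items(), key=lambda kv: kv[1]):
        op, (x, y) = dfg[node]
        values[node] = ops[op](values[x], values[y])
    return values["res"]


print(asap_levels(DFG))                          # {'s1': 1, 'p1': 1, 'p2': 2, 'res': 3}
print(evaluate(DFG, {"a": 2, "b": 3, "c": 4}))   # 26
```

The circuit's critical path is three operators long, with b + c and b * c computed in parallel, while the processor executes its ten instructions one after another.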
Advantages of reconfigurable computing
Key advantage 1: Naturally concurrent computation
- "Natural" functioning mode of hardware
- In the absence of resource constraints, only data dependencies restrict operation
Key advantage 2: Circuit can be tailored precisely to the requirements of the application
- Bit-width optimization
- Partial evaluation
  - Example: embedding constants in the circuit
  - May improve latency and power consumption
- Memory organization
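Embedding a constant is a form of partial evaluation: once one multiplier operand is fixed, the general multiplier can be replaced by a few adders, and the shifts cost no logic in hardware because they are just wiring. A minimal sketch that derives such a shift-add network from the binary representation of the constant (a plain binary decomposition; canonical signed-digit recoding usually needs fewer adders):

```python
from typing import List


def shift_add_terms(constant: int) -> List[int]:
    """Shift amounts k such that constant == sum(2**k for k in terms)."""
    if constant < 0:
        raise ValueError("only non-negative constants in this sketch")
    return [k for k in range(constant.bit_length()) if (constant >> k) & 1]


def multiply_by_constant(x: int, constant: int) -> int:
    """x * constant computed with shifts and additions only."""
    return sum(x << k for k in shift_add_terms(constant))


def adders_needed(constant: int) -> int:
    """Two-input adders in the resulting network (number of terms - 1)."""
    return max(len(shift_add_terms(constant)) - 1, 0)


print(shift_add_terms(10))            # [1, 3]: x*10 = (x << 1) + (x << 3)
print(multiply_by_constant(7, 10))    # 70
print(adders_needed(10))              # 1
```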
Implementation strategies for reconfigurable systems
Reconfigurable systems can follow two main strategies:
- Configure-once (ASIC-like operation):
  - Single, system-wide configuration
  - FPGAs configured prior to operation
  Variant: for some applications, input data remains constant for hours or days, so the bitstream is regenerated only occasionally.
  Example: acceleration of the SNORT packet filter by translating regular expressions into hardware [Hutchings et al., 2002].
- Run-time reconfiguration (RTR): the application consists of multiple configurations for each device.
  During normal execution, the FPGA is potentially reconfigured many times (configuration steps).
General implementation strategies for RTR
Classification of RTR-based systems according to the scope of reconfiguration:
- Global RTR: all resources are reconfigured in each configuration step.
  Example: back-propagation training of an artificial neural network divided into 3 mutually exclusive phases; the circuitry that would sit idle in each phase is eliminated [Eldredge and Hutchings, 1996].
- Local RTR (partial reconfiguration): a subset of the resources is reconfigured in each step.
  → Part of a single FPGA (or a whole FPGA in a multi-FPGA system)
  → Ideally, the operation of the remainder of the system is not affected.
  → Use of hardware resources adapts to the run-time profile of the application.
  → Several tasks may be independently supported in hardware at the same time (multiple hardware modules).
  → Shorter reconfiguration time (individual hardware modules).
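Reconfiguration time scales roughly with the amount of configuration data written, which is why reconfiguring individual modules is faster than reconfiguring the whole device. A back-of-the-envelope sketch; the 10 MB full bitstream and the 32-bit, 100 MHz configuration port (400 MB/s) are assumed figures, not data from the slides:

```python
def reconfig_time_ms(bitstream_bytes: float, port_bits: int = 32,
                     clock_hz: float = 100e6) -> float:
    """Time to write a bitstream through a configuration port (first-order model)."""
    bytes_per_second = port_bits / 8 * clock_hz
    return bitstream_bytes / bytes_per_second * 1e3


FULL_BITSTREAM_BYTES = 10e6   # hypothetical full-device bitstream size

for fraction in (1.0, 0.25, 0.05):
    t = reconfig_time_ms(FULL_BITSTREAM_BYTES * fraction)
    print(f"{fraction:>5.0%} of the device: {t:.2f} ms")
```

Under these assumptions the full device takes 25 ms, while a module occupying 5% of it is reconfigured in 1.25 ms; configuration overheads such as headers and commands are ignored.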
Creation of configuration data
Default scenario: local RTR for one FPGA.
→ In the simplest development scenarios, configuration data is created at design time, together with the rest of the system.
[Figure: sequence of configurations A, B, …, Z, returning to A]
If the sequence of configuration steps is known, partial difference bitstreams can be used to take the hardware from one configuration to the next.
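When the configuration sequence is known, only the configuration frames that actually change between two consecutive configurations have to be written. A minimal sketch, modelling a configuration as a list of equal-sized frames; the frame model and the configurations A and B are hypothetical, and real bitstreams also carry headers, frame addresses and commands:

```python
from typing import Dict, List

Frame = bytes
Config = List[Frame]


def difference_bitstream(current: Config, target: Config) -> Dict[int, Frame]:
    """Frames (by address) that must be rewritten to go from current to target."""
    if len(current) != len(target):
        raise ValueError("configurations must cover the same frames")
    return {addr: new
            for addr, (old, new) in enumerate(zip(current, target))
            if old != new}


def apply_difference(current: Config, diff: Dict[int, Frame]) -> Config:
    """Write the changed frames over a copy of the current configuration."""
    result = list(current)
    for addr, frame in diff.items():
        result[addr] = frame
    return result


A = [b"\x00\x00", b"\x11\x11", b"\x22\x22", b"\x33\x33"]
B = [b"\x00\x00", b"\xaa\xaa", b"\x22\x22", b"\xbb\xbb"]
diff_ab = difference_bitstream(A, B)
print(sorted(diff_ab))                     # [1, 3]: only two frames change
print(apply_difference(A, diff_ab) == B)   # True
```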
The basic design flow
Main characteristics:
- Regular development flow and tools
  Only the bitstream generation procedure is modified, to create partial bitstreams.
- Full design for each configuration
  Advantage: safe; each configuration can be validated independently, including timing restrictions.
  Disadvantage: time-consuming, with a great amount of redundant work.
  Disadvantage: any change to the common sections requires recreating all the designs.
Expanding hardware capacity
→ RTR can be used to provide more hardware support than would fit in a static configuration.
→ Example: image processing for driver assistance [Claus et al., 2007]
RTR advantage: increased flexibility
The increased flexibility afforded by RTR can be used to:
- develop a versatile framework for field updates [Fong et al., 2003]
- develop sophisticated adaptive systems
→ SAFES (Secure Architecture For Embedded Systems):
  Support for security standards and defence against hardware attacks by using reconfigurable hardware [Gogniat et al., 2008]
→ Autonomous System-on-a-Chip Adaptation:
  Uses a Bayesian network to choose and activate an appropriate filter to mitigate changing RF interference [French et al., 2008].
  Interference identification: 96%; correct filter selection: 65% (plus 16% partial mitigation). Virtex-4 FX100 FPGA.
  Reaction time is 112 ms (against 3–5 s for a human operator).
SAFES
RTR security primitives: (i) speed up computation; (ii) allow switching between different primitives; (iii) provide trade-offs.
The security primitive controller selects the bitstream corresponding to the chosen algorithm and parameters (in the "configuration" state).
Source: [Gogniat et al., 2008]
Example: Autonomous interference mitigation
Source: [French et al., 2008]
Autonomous system feedback loop
Source: [French et al., 2008]
Assembling bitstreams at run time
- Objective: flexible bitstream generation
- Approach inspired by traditional software development:
  Partial configurations are produced by assembling components from a previously created library.
  Rationale: reduce the effort involved in creating many partial bitstreams for RTR.
- Additionally: support for generating bitstreams at run time
→ Prototype implementation for the target platform: Virtex-II Pro + external memory
Bitstream generation by component assembly
Problem: How to generate many similar configurations efficiently?
→ Assemble configurations from partial bitstreams of smaller components.
Example: creation of pipelines where each stage may have several variants.
Analogy: linking several procedures to create one executable.
- Library of basic components (medium granularity) with:
  - bitstream format (black box)
  - interface information
- Bitstream manipulation to create new assemblies:
  - relocation of component bitstreams + merging
  - interconnection: the simplest (but least flexible) method is abutment
  - no additional restrictions on the internal organization of the dynamic area
- Fast process
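The relocation-and-merging step can be pictured on a simplified model of the dynamic area as a row of columns. A minimal sketch under stated assumptions: the component records (width in columns, per-column configuration data, interface widths), the component names and the column model are all hypothetical; stages are placed by abutment from left to right and neighbouring interfaces are checked, mirroring the linker analogy rather than the actual tool of [Silva and Ferreira, 2006]:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    """Library entry: black-box bitstream plus interface information."""
    name: str
    columns: List[bytes]   # configuration data, one entry per column
    in_width: int          # bits on the left-hand interface
    out_width: int         # bits on the right-hand interface


def assemble(pipeline: List[Component], area_columns: int,
             origin: int = 0) -> Dict[int, bytes]:
    """Relocate each stage next to the previous one and merge the results."""
    merged: Dict[int, bytes] = {}
    col = origin
    for prev, comp in zip([None] + pipeline[:-1], pipeline):
        if prev is not None and prev.out_width != comp.in_width:
            raise ValueError(f"{prev.name} -> {comp.name}: interface mismatch")
        if col + len(comp.columns) > area_columns:
            raise ValueError(f"{comp.name} does not fit in the dynamic area")
        for offset, data in enumerate(comp.columns):
            merged[col + offset] = data   # relocation: shift the column address
        col += len(comp.columns)          # abutment: next stage starts here
    return merged


fir = Component("fir", [b"F0", b"F1"], in_width=16, out_width=16)
gain = Component("gain", [b"G0"], in_width=16, out_width=16)
compressor = Component("compressor", [b"C0", b"C1"], in_width=16, out_width=8)

area = assemble([fir, gain, compressor], area_columns=8, origin=2)
print(area)
```

Swapping a stage for another variant with the same interface only requires re-running the assembly, which is the kind of reuse the library approach aims at.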
Example of bitstream assembly
Source: [Silva and Ferreira, 2006]
Example: cores for sound processing
Research issues in dynamically reconfigurable systems
- Including hardware change (temporal dimension) in design, implementation and validation.
- High-level, general-purpose models for specification and verification are not yet available.
- Debugging adaptive systems is difficult.
- Design space exploration is more complex (but can achieve better results).
- Benefits of RTR are application-dependent, but RTR may enable new classes of systems.
→ The trend towards complex, autonomous, adaptive embedded systems makes RTR more attractive:
- Increased use of heterogeneous, many-core SoCs (including fine- and coarse-grained RTR fabrics).
- Areas: domestic robots, smart camera networks, cars, mobile broadband wireless access (IEEE 802.16j), cognitive radio, …
→ And wouldn't hardware JIT compilation be nice?
References I
Christopher Claus, Johannes Zeppenfeld, Florian Müller, and Walter Stechele. Using partial-run-time reconfigurable hardware to accelerate video processing in driver assistance system. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 498–503, Nice, France, 2007. EDA Consortium.
James G. Eldredge and Brad L. Hutchings. Run-time reconfiguration: A method for enhancing the functional density of SRAM-based FPGAs. The Journal of VLSI Signal Processing, 12(1):67–86, 1996.
R. J. Fong, S. J. Harper, and Peter M. Athanas. A versatile framework for FPGA field updates: an application of partial self-reconfiguration. In Proc. 14th IEEE International Workshop on Rapid Systems Prototyping, pages 117–123, June 2003.
Matthew French, Erik Anderson, and Dong-In Kang. Autonomous system on a chip adaptation through partial runtime reconfiguration. In 16th International Symposium on Field-Programmable Custom Computing Machines (FCCM '08), pages 77–86, 2008.
G. Gogniat, T. Wolf, W. Burleson, J.-P. Diguet, L. Bossuet, and R. Vaslin. Reconfigurable hardware for high-security/high-performance embedded systems: The SAFES perspective. IEEE Trans. Very Large Scale Integration (VLSI) Systems, 16(2):144–155, February 2008. ISSN 1063-8210.
References II
B. L. Hutchings, R. Franklin, and D. Carver. Assisting network intrusion detection with reconfigurable hardware. In Proc. 10th Annual IEEE Symp. Field-Programmable Custom Computing Machines, pages 111–120, 2002.
Miguel L. Silva and João C. Ferreira. Support for partial run-time reconfiguration of platform FPGAs. Journal of Systems Architecture, 52(12):709–726, 2006.