Design Framework for Partial Run-Time FPGA Reconfiguration
ERSA 2008, Las Vegas, NV, July 14–17, 2008. Chris Conger, Ann Gordon-Ross, and Alan D. George. Presented by: Abelardo Jara-Berrocal, HCS Research Laboratory, College of Engineering, University of Florida.
Outline
- Introduction
- Partial Reconfiguration (PR) Overview
- Proposed Design Methodologies
- Framework Analysis
- Conclusions
Introduction – Fully reconfigurable systems
[Figure: a fully reconfigurable system. An external system controller drives the FPGA configuration lines and loads full configurations (Config 1 to Config 3) from bitstream storage on request, enabling and disabling modules A, B and C; shared memory, external and general-purpose I/O, a battery, and a design station producing the required design complete the system]
Limitations:
1. Device too small for complex designs (e.g., Module C does not fit)
2. Large full bitstreams (long reconfiguration time)
3. Complete system operation must be halted before reconfiguration
Introduction – The Virtex-4 PR architecture
- Newer Xilinx FPGA families offer a partial reconfiguration feature
- A rectangular region of the FPGA can be reconfigured without affecting the remaining FPGA area
- The system can continue operating without interruption
[Figure: Virtex-4 fabric with reconfigurable region 1 and reconfigurable region 2]
Introduction – A sample PR architecture
[Figure: sample PR system. The FPGA holds a static area (base system configuration: MicroBlaze controller, ICAP, flash controller) and a reconfigurable area into which modules A, B and C are loaded on request; bitstream storage, JTAG, a battery and external I/O are attached]
1. The system controller does not need to be placed in an external device
2. Access to the fast Internal Configuration Access Port (ICAP: 32 bits, 100 MHz)
3. Smaller partial bitstreams
4. No need to halt the complete system when reconfiguring a module
5. Time multiplexing of FPGA resources: hardware modules are loaded and unloaded on demand
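As a rough illustration of why the ICAP figures matter, the sketch below computes the theoretical peak ICAP bandwidth (32 bits × 100 MHz = 400 MB/s) and the resulting lower bound on the time to load a bitstream. It assumes ideal streaming (one 32-bit word per clock, no stalls, no controller or DMA overhead) and treats 1 kB as 1024 bytes; the helper names are illustrative and not part of any Xilinx API.

```python
# Back-of-the-envelope ICAP reconfiguration time.
# Assumes ideal streaming: one 32-bit word per clock, no stalls or overhead.
ICAP_WIDTH_BITS = 32          # ICAP data width quoted on the slide
ICAP_CLOCK_HZ = 100_000_000   # ICAP clock quoted on the slide (100 MHz)


def icap_peak_bandwidth(width_bits: int = ICAP_WIDTH_BITS,
                        clock_hz: int = ICAP_CLOCK_HZ) -> float:
    """Theoretical peak ICAP throughput in bytes per second."""
    return width_bits / 8 * clock_hz


def reconfig_time_ms(bitstream_kb: float, bytes_per_kb: int = 1024) -> float:
    """Lower bound (ms) on streaming a bitstream of the given size through ICAP."""
    size_bytes = bitstream_kb * bytes_per_kb
    return size_bytes / icap_peak_bandwidth() * 1e3


if __name__ == "__main__":
    print(f"peak ICAP bandwidth: {icap_peak_bandwidth() / 1e6:.0f} MB/s")
    # Hypothetical 500 kB partial bitstream
    print(f"500 kB partial bitstream: {reconfig_time_ms(500):.2f} ms")
```

Because the bound scales linearly with bitstream size, a smaller partial bitstream shortens reconfiguration time proportionally.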
Introduction – Current PR Design Flow Steps
- Partition the system into modules
- Define static modules and reconfigurable modules
- Decide the number of PR regions (PRRs)
- Decide PRR sizes, shapes and locations
- Map modules to PRRs
- Define PRR interfaces and instantiate slice macros for them
Optimization problems:
- Design partitioning
- Number of PRRs
- PRR sizes, shapes and locations
- Mapping of PRMs to PRRs
- Type and placement of PRR interfaces
[Figure: design partitioning splits modules A, B and C into static modules and reconfigurable modules (PRMs); design floorplanning and budgeting then places the static modules in a static region and assigns PRMs to PRR 1 and PRR 2 (one hosting modules A and B, the other module C), leaving open how many PRRs to use]
Introduction – Early Access PR Design Flow, introduced by Xilinx at FPL'06
Major improvements:
- Automatic implementation scripts
- Rectangular regions (no full-column reconfiguration)
- Static nets can cross reconfigurable regions
- Slice macros replace bus macros
However, partitioning and floorplanning steps are still executed manually, and no design guidelines are provided for them.
[Figure: reconfigurable design specifications → design partitioning → design floorplanning and budgeting (manual) → placement and PRR constraints → Xilinx PR implementation flow (automatic) → full initial bitstream and PRM bitstreams; the manual steps show potential for automatic CAD tools]
Introduction – Current PR design tool limitations
- PR design is a very specialized task: only physical-level support is provided
- Architectural knowledge of the target device is a must; the flow is inflexible, with many design constraints
- Partitioning and floorplanning steps are executed manually; no performance-sensitive design guidelines are provided, nor is an automatic, heuristics-based design flow available
- The lack of abstraction from low-level details discourages designers from using PR and makes it difficult for many end users
In this work, we propose a taxonomy of PR system design flows and an efficient methodology for each type.
PR Overview – Taxonomy of PR system design flows
PR system design flows are either special-purpose or multipurpose.
Special-purpose (highly specialized system design):
- All PRMs that will exist on the system are known at design time
- Each PRR is independently optimized (size, shape, location, interface) based on the PRMs that will be mapped to it
- Output is:
  1) A floorplan defining a static region and a set of optimized PRRs
  2) The set of PRMs that can be placed in each PRR (PRMs-to-PRRs mapping)
Multipurpose:
- Not optimized for a specific application
- The PRMs required by the application are not known when designing the base system
- The goal is a flexible, reusable base design that can serve several different PR systems
- The base system designer defines a set of PRRs with fixed shapes, sizes, locations and interfaces
- The generated floorplan is used as an input template for PRM implementation
Proposed Design Methodology: Special-Purpose
- Partition the system into several hardware modules
- Synthesize the hardware modules
- Use a control flow graph (CFG) and a states table to represent:
  - Application states and the transitions between them (execution path coverage)
  - The set of modules required in each application state
Let's see an example.
Proposed Design Methodology: Special-Purpose
[Figure: control flow graph over application states S1 to S5]
STATE  MODULES
S1     A, B, C
S2     A, B, C, F
S3     A, B, C, G
S4     A, B, D
S5     A, B, E
Establishing constraints (region partitioning):
1. A and B are present in all states (static modules)
2. C, D, E, F and G are reconfigurable modules (PRMs)
3. F and G are mutually exclusive with C: they cannot be placed in the same PRR as C
4. F, G, D and E can be placed in the same PRR
5. C, D and E can be placed in the same PRR
Static: A, B. Reconfigurable: C, D, E, F, G.
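The constraints above follow mechanically from the states table. The minimal sketch below is our own illustration, not the authors' tool: a module present in every state is static, every other module is a PRM, and two PRMs active in the same state cannot share a PRR. This reproduces rules 1 to 3; every PRM pair not reported as conflicting may share a PRR, consistent with rules 4 and 5.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# States table from the example: application state -> modules required in it.
STATES: Dict[str, Set[str]] = {
    "S1": {"A", "B", "C"},
    "S2": {"A", "B", "C", "F"},
    "S3": {"A", "B", "C", "G"},
    "S4": {"A", "B", "D"},
    "S5": {"A", "B", "E"},
}


def classify_modules(states: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split modules into static (present in every state) and reconfigurable (PRMs)."""
    all_modules = set().union(*states.values())
    static = set.intersection(*states.values())
    return static, all_modules - static


def conflicting_pairs(states: Dict[str, Set[str]], prms: Set[str]) -> Set[Tuple[str, str]]:
    """PRM pairs active in the same state; such pairs cannot share a PRR."""
    conflicts: Set[Tuple[str, str]] = set()
    for modules in states.values():
        active = sorted(modules & prms)
        conflicts.update(combinations(active, 2))
    return conflicts


if __name__ == "__main__":
    static, prms = classify_modules(STATES)
    print("static:", sorted(static))
    print("PRMs:", sorted(prms))
    print("cannot share a PRR:", sorted(conflicting_pairs(STATES, prms)))
```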
Proposed Design Methodology: Special-Purpose
- Define the number of PRRs to be used
  - An optimization variable; the number is computed from the CFG and states table
  - # PRRs = [formula lost in extraction]
- Define a PRMs-to-PRRs mapping
  - An optimization problem with a combinatorial design space
  - The design space is reduced using the design constraints
Possible solution (not necessarily the optimal one):
Static region: A, B
PRR 1: C, D, E
PRR 2: F, G
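The slide's formula for the number of PRRs did not survive extraction, so the sketch below uses an assumption of our own: a lower bound equal to the largest number of PRMs active in any single state, since co-active PRMs each need their own PRR. Starting from that bound, it brute-forces a PRMs-to-PRRs mapping (effectively a graph-coloring problem) that respects the conflict constraints, adding PRRs only if no mapping exists. Exhaustive search is practical only for small examples; it is not the paper's algorithm.

```python
from itertools import combinations, product
from typing import Dict, List, Optional, Set, Tuple

# States table from the example: application state -> modules required in it.
STATES: Dict[str, Set[str]] = {
    "S1": {"A", "B", "C"},
    "S2": {"A", "B", "C", "F"},
    "S3": {"A", "B", "C", "G"},
    "S4": {"A", "B", "D"},
    "S5": {"A", "B", "E"},
}


def static_modules(states: Dict[str, Set[str]]) -> Set[str]:
    """Modules required in every state."""
    return set.intersection(*states.values())


def min_prrs_lower_bound(states: Dict[str, Set[str]]) -> int:
    """Assumed bound: co-active PRMs need distinct PRRs, so take the largest per-state PRM count."""
    static = static_modules(states)
    return max(len(modules - static) for modules in states.values())


def find_mapping(states: Dict[str, Set[str]], num_prrs: int) -> Optional[Dict[str, int]]:
    """Brute-force a PRM -> PRR index mapping in which no two co-active PRMs share a PRR."""
    static = static_modules(states)
    prms = sorted(set().union(*states.values()) - static)
    conflicts: Set[Tuple[str, str]] = set()
    for modules in states.values():
        conflicts.update(combinations(sorted(modules - static), 2))
    for assignment in product(range(num_prrs), repeat=len(prms)):
        mapping = dict(zip(prms, assignment))
        if all(mapping[a] != mapping[b] for a, b in conflicts):
            return mapping
    return None


def fewest_prrs(states: Dict[str, Set[str]]) -> Tuple[int, Dict[str, int]]:
    """Start at the lower bound and add PRRs until a feasible mapping exists."""
    n = max(1, min_prrs_lower_bound(states))
    while True:
        mapping = find_mapping(states, n)
        if mapping is not None:
            return n, mapping
        n += 1  # the bound was not achievable; try one more PRR


def group_by_prr(mapping: Dict[str, int], num_prrs: int) -> Dict[str, List[str]]:
    """Turn {prm: index} into {"PRR 1": [...], ...} for display."""
    return {f"PRR {i + 1}": sorted(p for p, idx in mapping.items() if idx == i)
            for i in range(num_prrs)}


if __name__ == "__main__":
    n, mapping = fewest_prrs(STATES)
    print(n, group_by_prr(mapping, n))
```

With two PRRs, the first feasible mapping found places C, D and E in PRR 1 and F and G in PRR 2, the same as the possible solution listed above; a single PRR is infeasible because C is active alongside F and G.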
Proposed Design Methodology: Special-Purpose
And when do we size our PRRs? Don't worry, that is our next step.
[Figure: module profiles (slices, BRAMs, DSP48s) for modules A to G]
- Required static region resources: the resources of all static modules are added
- Required PRR 1 resources: the maximum of each resource type over the PRMs mapped to PRR 1
- Required PRR 2 resources: the maximum of each resource type over the PRMs mapped to PRR 2
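The two sizing rules can be written directly. In the sketch below, the module profiles are made-up numbers for illustration (real values would come from synthesis reports): static modules coexist, so their resources are summed, while PRMs in the same PRR are loaded one at a time, so each resource type takes its maximum.

```python
from typing import Dict, Iterable

Resources = Dict[str, int]
RESOURCE_TYPES = ("slices", "brams", "dsp48s")

# Hypothetical module profiles; real values come from synthesis reports.
PROFILES: Dict[str, Resources] = {
    "A": {"slices": 800, "brams": 4, "dsp48s": 0},
    "B": {"slices": 300, "brams": 2, "dsp48s": 1},
    "C": {"slices": 1200, "brams": 2, "dsp48s": 8},
    "D": {"slices": 900, "brams": 6, "dsp48s": 0},
    "E": {"slices": 1500, "brams": 1, "dsp48s": 4},
    "F": {"slices": 400, "brams": 8, "dsp48s": 2},
    "G": {"slices": 700, "brams": 3, "dsp48s": 6},
}


def static_requirements(modules: Iterable[str]) -> Resources:
    """Static modules are all present at once, so their resources are added."""
    modules = list(modules)
    return {r: sum(PROFILES[m][r] for m in modules) for r in RESOURCE_TYPES}


def prr_requirements(prms: Iterable[str]) -> Resources:
    """PRMs sharing a PRR are loaded one at a time, so take the per-type maximum."""
    prms = list(prms)
    return {r: max(PROFILES[m][r] for m in prms) for r in RESOURCE_TYPES}


if __name__ == "__main__":
    print("static:", static_requirements(["A", "B"]))
    print("PRR 1:", prr_requirements(["C", "D", "E"]))
    print("PRR 2:", prr_requirements(["F", "G"]))
```

With these numbers, PRR 1 must hold E's 1500 slices, D's 6 BRAMs and C's 8 DSP48s at the same time, even though no single PRM needs all three; this gap is one source of the area wastage discussed later.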
Proposed Design Methodology: Special-Purpose
- Define the PRR sizes, shapes and locations inside the FPGA fabric
  - A floorplanning optimization problem
  - Proper metrics for PRR performance analysis are required
  - Design guidelines for efficient PRR floorplanning are also needed
- Define PRR interfaces and place slice macros
[Figure: FPGA with a static region; a reconfigurable region with enough resources for PRR 1 is allocated, and the same is done for PRR 2, yielding the final optimized custom base system floorplan]
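Finding "a reconfigurable region with enough resources" depends on which columns a rectangle spans, because BRAM and DSP48 resources sit in dedicated columns. The toy model below is our own simplification, with made-up per-row capacities and a made-up column layout rather than real Virtex-4 geometry. For a given region height, it returns the narrowest window of adjacent columns that meets a PRR's requirements.

```python
from typing import Dict, Optional, Tuple

# Toy model: per-row capacity of each column type (hypothetical numbers).
# "S" = slice column, "B" = BRAM column, "D" = DSP48 column.
COLUMN_CAPACITY: Dict[str, Dict[str, int]] = {
    "S": {"slices": 4, "brams": 0, "dsp48s": 0},
    "B": {"slices": 0, "brams": 1, "dsp48s": 0},
    "D": {"slices": 0, "brams": 0, "dsp48s": 2},
}


def window_resources(columns: str, start: int, width: int, height: int) -> Dict[str, int]:
    """Total resources in a rectangle spanning `width` columns and `height` rows."""
    total = {"slices": 0, "brams": 0, "dsp48s": 0}
    for col in columns[start:start + width]:
        for res, per_row in COLUMN_CAPACITY[col].items():
            total[res] += per_row * height
    return total


def narrowest_region(columns: str, need: Dict[str, int],
                     height: int) -> Optional[Tuple[int, int]]:
    """Return (start_column, width) of the narrowest window meeting `need`, or None."""
    for width in range(1, len(columns) + 1):
        for start in range(len(columns) - width + 1):
            have = window_resources(columns, start, width, height)
            if all(have[res] >= amount for res, amount in need.items()):
                return start, width
    return None


if __name__ == "__main__":
    fabric = "SSBSSDSS"  # hypothetical left-to-right column layout
    need = {"slices": 16, "brams": 2}
    for height in (2, 4):
        print(height, narrowest_region(fabric, need, height))
```

In this toy fabric, doubling the region height cuts the required width from three columns to two, which hints at why region shape matters when BRAM and DSP48 resources are distributed in columns.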
Proposed Design Methodology: Special-Purpose – Methodology outputs
- Custom base system
- PRMs-to-PRRs mapping
These are used as input files for the automatic Xilinx PR design flow.
Proposed Design Methodology: Special-Purpose
Opportunity to automate this flow through design tools
- Optimization variables: number of PRRs; PRR sizes, shapes and locations; PRMs-to-PRRs mapping; other optimization variables can be defined
- Several possible cost functions: area wastage, power usage, application latency, throughput, …
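Area wastage is one of the cost functions listed above. The sketch assumes one possible definition, which is ours and not necessarily the authors': for each PRM, count the resources of its PRR that it leaves unused, then take a weighted sum over PRMs and resource types. The weights and numbers are hypothetical. A design tool could evaluate such a function for every candidate mapping and PRR sizing and keep the cheapest.

```python
from typing import Dict, Mapping

Resources = Dict[str, int]


def area_wastage(prr_capacity: Mapping[str, Resources],
                 mapping: Mapping[str, str],
                 profiles: Mapping[str, Resources],
                 weights: Mapping[str, float]) -> float:
    """Weighted sum, over PRMs and resource types, of PRR resources a PRM leaves unused.

    Assumed definition for illustration; a real tool might also weight each PRM
    by how long it stays loaded.
    """
    total = 0.0
    for prm, prr in mapping.items():
        for res, weight in weights.items():
            unused = prr_capacity[prr][res] - profiles[prm][res]
            if unused < 0:
                raise ValueError(f"{prm} needs more {res} than {prr} provides")
            total += weight * unused
    return total


if __name__ == "__main__":
    # Hypothetical numbers; BRAMs weighted 10x as an example of a scarcer resource.
    profiles = {"C": {"slices": 1200, "brams": 2},
                "D": {"slices": 900, "brams": 6},
                "F": {"slices": 400, "brams": 8}}
    capacity = {"PRR 1": {"slices": 1200, "brams": 6},
                "PRR 2": {"slices": 400, "brams": 8}}
    mapping = {"C": "PRR 1", "D": "PRR 1", "F": "PRR 2"}
    print(area_wastage(capacity, mapping, profiles, {"slices": 1.0, "brams": 10.0}))
```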
Framework analysis – PRR Geometries
PR system design flows require:
- Proper metrics for PRR performance analysis
- Design guidelines for efficient PRR floorplanning
Study of the effects of varying PRR shape on maximum clock frequency and partial bitstream size:
- Five separate test cores: Beamforming (DSP/slice), CFAR (slice/memory), AES (register), ARM7 softcore (hybrid), Sine/Cosine LUT (memory)
- Performed on the V4SX55 so far
Aspect ratio = PRR height / PRR width
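A shape sweep of this kind can be set up by enumerating rectangles of a fixed size and recording each one's aspect ratio (height / width). The sketch below is a simplification of our own. It treats region size as a count of tiles in an abstract grid, considers exact factorizations only, and bounds height and width by hypothetical device dimensions.

```python
from fractions import Fraction
from typing import List, Tuple


def candidate_shapes(area: int, max_height: int,
                     max_width: int) -> List[Tuple[int, int, Fraction]]:
    """All (height, width, aspect_ratio) with height * width == area.

    Aspect ratio follows the slide's definition: PRR height / PRR width.
    Results are ordered from wide-and-short to tall-and-narrow.
    """
    shapes = []
    for height in range(1, min(area, max_height) + 1):
        if area % height == 0:
            width = area // height
            if width <= max_width:
                shapes.append((height, width, Fraction(height, width)))
    return shapes


if __name__ == "__main__":
    for h, w, ratio in candidate_shapes(12, max_height=6, max_width=12):
        print(f"{h} x {w}: aspect ratio {float(ratio):.2f}")
```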
Framework analysis – Beamforming (~125 MHz, 40%)
5022 slices, 16 DSP48s, 17 RAMB16s. Baseline, non-PR performance = 1614 kB, 127.845 MHz
[Plots: clock frequency (MHz) vs. aspect ratio; bitstream size (kB) vs. aspect ratio]
Framework analysis – CFAR (~100 MHz, 16%)
2610 slices, 2 DSP48s, 34 RAMB16s. Baseline, non-PR performance = 1001 kB, 103.616 MHz
[Plots: clock frequency (MHz) vs. aspect ratio; bitstream size (kB) vs. aspect ratio]
Framework analysis – AES (~80 MHz, 13.75%)
3634 slices, 3943 registers, 4 RAMB16s. Baseline, non-PR performance = 1393 kB, 80.483 MHz
[Plots: clock frequency (MHz) vs. aspect ratio; bitstream size (kB) vs. aspect ratio]
Framework analysis – ARM7 (~40 MHz, 6.8%)
1826 slices, 16 DSP48s, 10 RAMB16s. Baseline, non-PR performance = 872 kB, 40.985 MHz
[Plots: clock frequency (MHz) vs. aspect ratio; bitstream size (kB) vs. aspect ratio]
Framework analysis – Sine/Cosine LUT
107 slices, 27 RAMB16s. Baseline, non-PR performance = 571 kB, 204.918 MHz
[Plots: clock frequency (MHz) vs. aspect ratio; bitstream size (kB) vs. aspect ratio]
Framework analysis – PRR Geometries
- Slice-intensive designs show the best bitstream size and clock frequency performance with an aspect ratio of around 2 to 4, roughly the aspect ratio of the FPGA as a whole
- Non-slice-intensive designs show the best bitstream performance with an aspect ratio >> 4
  - Due to the columnar distribution of RAMB16/DSP48 resources on the chip
  - Clock frequency is relatively insensitive to aspect ratio
  - Not shown in graph: resource wastage is also improved
- Results are more pronounced for high-frequency designs
- However, aspect ratio is not the only design consideration: placement on the chip relative to other regions, pins or resources may affect (restrict) the choice of PRR shape
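The guidelines above can be encoded as a simple rule of thumb. In the sketch below, "slice-intensive" is an assumed criterion of our own: a core's fraction of the device's slices is at least its fraction of RAMB16s and of DSP48s. The device totals are the XC4VSX55 figures as we recall them from the Virtex-4 data sheet, so verify them before relying on the result. The suggested ranges come from the slide. Placement restrictions are ignored.

```python
from dataclasses import dataclass
from typing import Tuple

# Device totals assumed for the XC4VSX55 (verify against the Virtex-4 data sheet).
DEVICE = {"slices": 24_576, "ramb16s": 320, "dsp48s": 512}


@dataclass
class CoreProfile:
    name: str
    slices: int
    ramb16s: int = 0
    dsp48s: int = 0


def is_slice_intensive(core: CoreProfile) -> bool:
    """Assumed criterion: the slice utilization fraction dominates RAMB16 and DSP48 fractions."""
    slice_frac = core.slices / DEVICE["slices"]
    bram_frac = core.ramb16s / DEVICE["ramb16s"]
    dsp_frac = core.dsp48s / DEVICE["dsp48s"]
    return slice_frac >= max(bram_frac, dsp_frac)


def suggested_aspect_ratio(core: CoreProfile) -> Tuple[float, float]:
    """Aspect-ratio range (height / width) suggested by the slide's guidelines."""
    if is_slice_intensive(core):
        return (2.0, 4.0)  # "around 2-4" for slice-intensive designs
    # ">> 4" on the slide: only a loose lower bound is encoded here
    return (4.0, float("inf"))


if __name__ == "__main__":
    cores = [CoreProfile("Beamforming", 5022, ramb16s=17, dsp48s=16),
             CoreProfile("Sine/Cosine LUT", 107, ramb16s=27)]
    for core in cores:
        print(core.name, suggested_aspect_ratio(core))
```

Under this criterion, CFAR sits almost exactly on the boundary (about 10.6% of the slices and 10.6% of the RAMB16s), which shows that the slice-intensive label is a judgment call rather than a crisp property.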
Conclusions – Contributions of this work
- A taxonomy for PR system design flows, and a design methodology for efficient development of each type
- Identification of relevant optimization variables and constraints
  - Number of PRRs, optimal mapping of PRMs to PRRs, system floorplanning
  - Proposed for incorporation in a future automatic design tool
- Study of the effects of varying PRR shape
  - On maximum clock frequency and partial bitstream size
  - Across multiple classes of cores/designs: memory-intensive, DSP-intensive, combinational-logic-intensive, register-intensive, etc.
- PRR floorplanning guideline definitions and delivery
Questions