Design Framework for Partial Run-Time FPGA Reconfiguration

Post on 31-Dec-2015


DESCRIPTION

ERSA 2008, Las Vegas, NV, July 14–17, 2008. Design Framework for Partial Run-Time FPGA Reconfiguration. Chris Conger, Ann Gordon-Ross, and Alan D. George. Presented by: Abelardo Jara-Berrocal, HCS Research Laboratory, College of Engineering, University of Florida.

TRANSCRIPT

Design Framework for Partial Run-Time FPGA Reconfiguration

Chris Conger, Ann Gordon-Ross, and Alan D. George

Presented by: Abelardo Jara-Berrocal

HCS Research Laboratory, College of Engineering

University of Florida

ERSA 2008, Las Vegas, NV, July 14–17, 2008

2

Outline

Introduction
Partial Reconfiguration (PR) Overview
Proposed Design Methodologies
Framework analysis
Conclusions

3

Introduction – Fully reconfigurable systems

[Figure: an external system controller drives the FPGA over configuration lines. Other labeled blocks: general purpose I/O, shared memory, battery, bitstreams storage, external I/O, and a design station holding the required design (Modules A, B and C). Configurations 1 to 3 are loaded on request, and modules are shown as enabled or disabled.]

1. Device too small for complex designs (Module C doesn't fit)
2. Big full bitstreams (long reconfiguration time)
3. Complete system operation is halted prior to reconfiguration

4

Introduction – The Virtex 4 PR architecture

Newer Xilinx FPGA families offer a partial reconfiguration feature
  A rectangular region of the FPGA can be reconfigured without affecting the remaining FPGA area
  System can continue operating without interruption

[Figure: FPGA floorplan with Reconfigurable region 1 and Reconfigurable region 2]

5

Introduction – A sample PR architecture

1. System controller does not need to be placed in an external device
2. Access to fast Internal Configuration Access Port (ICAP – 32 bits, 100 MHz)
3. Smaller partial bitstreams
4. No need to halt complete system when reconfiguring a module
5. Time multiplexing of FPGA resources, load and unload HW modules on demand

[Figure: a single FPGA with a static area holding the base system configuration (MicroBlaze controller, ICAP, flash controller) and a reconfigurable area in which Modules A, B and C are enabled or disabled on request (for example, a Module A request). External blocks: bitstreams storage, battery, external I/O, JTAG.]
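To make the benefit of smaller partial bitstreams concrete, the following sketch estimates a lower bound on reconfiguration time from the ICAP figures quoted on the slide (32 bits, 100 MHz). It assumes one 32-bit word is written every clock cycle, treats 1 kB as 1000 bytes, and ignores flash read latency and controller overhead, so real times will be longer. The function names and the two example sizes are illustrative and do not come from the presentation.

```python
# Theoretical lower bound on reconfiguration time through the ICAP.
# Assumptions (not from the slides): one word is written every clock cycle,
# 1 kB = 1000 bytes, and storage/controller overhead is ignored.

ICAP_WIDTH_BITS = 32          # from the slide
ICAP_CLOCK_HZ = 100_000_000   # from the slide (100 MHz)


def icap_peak_bytes_per_second(width_bits=ICAP_WIDTH_BITS, clock_hz=ICAP_CLOCK_HZ):
    """Peak ICAP throughput in bytes per second."""
    return width_bits // 8 * clock_hz


def min_reconfig_time_ms(bitstream_kb):
    """Lower bound, in milliseconds, for loading a bitstream of the given size."""
    seconds = bitstream_kb * 1000 / icap_peak_bytes_per_second()
    return seconds * 1000


if __name__ == "__main__":
    # Hypothetical sizes: a 1600 kB full bitstream and a 200 kB partial one.
    for size_kb in (1600, 200):
        print(f"{size_kb} kB -> at least {min_reconfig_time_ms(size_kb):.2f} ms")
```

Under these assumptions the load time scales linearly with bitstream size, which is why a partial bitstream covering only one region reconfigures proportionally faster than a full one.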

6

Introduction – Current PR Design Flow Steps

Partition the system into modules
Define static modules and reconfigurable modules (PRMs)
Decide the number of PR regions (PRRs)
Decide PRR sizes, shapes and locations
Map modules to PRRs
Define PRR interfaces, instantiate slice macros for PRR interfaces

Optimization problems
  Design partitioning
  Number of PRRs
  PRR sizes, shapes and locations
  Mapping PRMs to PRRs
  Type and placement of PRR interfaces

[Figure: the design (Modules A, B and C plus static modules) is split into static modules and reconfigurable modules (PRMs) in the design partitioning step. In the design floorplanning and budgeting step, the FPGA is divided into a static region holding the static modules, PRR 1 (Modules A and B) and PRR 2 (Module C), with the open question "# of PRRs?".]

7

Introduction – Early Access PR Design Flow

Introduced by Xilinx in FPL’06
Major improvements:
  Automatic implementation scripts
  Rectangular regions (not full column reconfiguration)
  Static nets can cross reconfigurable regions
  Slice macros replace bus macros
Partitioning and floorplanning steps are manually executed
  Design guidelines for these steps are not provided

[Figure: flow diagram. Manual steps: reconfigurable design specifications feed design partitioning and then design floorplanning and budgeting, which produce the placement and PRR constraints. Automatic step: the Xilinx PR implementation flow produces the full initial bitstream and the PRM bitstreams. Annotation on the manual steps: potential for development of automatic CAD tools.]

8

Introduction – Current PR design tools limitations

PR design is a very specialized task
  Only a physical level of support is provided
  Architectural knowledge of the target device is a must
  Not very flexible, many design constraints
Partitioning and floorplanning steps are manually executed
  No performance-sensitive design guidelines are provided
  No automatic, heuristic-based design flow is available either
Lack of abstraction from low-level details discourages designers from using PR
  Difficult for many end users

In this work, we propose a taxonomy of PR system design flows and an efficient methodology for each type.

9

PR Overview – Taxonomy of PR systems design flows

PR System Design Flow: Special purpose or Multipurpose

Special purpose
  Highly specialized systems design
  All PRMs that will exist on the system are known at design time
  Each PRR is independently optimized (size, shape, location, interface) based on the PRMs that will be mapped to it
  Output is:
    1) Floorplan defining a static region and a set of optimized PRRs
    2) The set of PRMs that can be placed in each PRR (PRMs to PRRs mapping)

Multipurpose
  Not optimized for a specific application
  PRMs required by the application are not known when designing the base system
  Goal is to design a flexible and reusable base design that can be used for several different PR systems
  Base system designer defines a set of PRRs with fixed shapes, sizes, locations and interfaces
  Generated floorplan is used as input template for the PRMs implementation

10

Proposed Design Methodology: Special-Purpose

Partition the system into several hardware modules
Synthesize the hardware modules
Use a control flow graph (CFG) and a states table to represent:
  Application states and the transitions between them (execution path coverage)
  Set of modules required in each application state

Let's see an example

11

Proposed Design Methodology: Special-Purpose

Establishing constraints

[Figure: control flow graph with states S1 to S5; modules are marked as static (A, B) or reconfigurable (C, F, G, D, E)]

STATE  MODULES
S1     A, B, C
S2     A, B, C, F
S3     A, B, C, G
S4     A, B, D
S5     A, B, E

1. A and B are present in all states (static modules)
2. C, D, E, F and G are reconfigurable modules (PRMs)
3. F and G are each required together with C in some state, so neither can be placed in the same PRR as C
4. F, G, D and E can be placed in the same PRR
5. C, D and E can be placed in the same PRR

Define region partitioning constraints
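The constraints listed on this slide can be derived mechanically from the states table. The sketch below is one possible way to do it, not the authors' tool: it treats a module as static when it appears in every state, and it assumes that two PRMs conflict (cannot share a PRR) whenever some state requires both at once. The function names are illustrative.

```python
# Derive static modules, PRMs and PRR-sharing conflicts from a states table.
from itertools import combinations

# States table from the slide: state -> modules required in that state.
STATES = {
    "S1": {"A", "B", "C"},
    "S2": {"A", "B", "C", "F"},
    "S3": {"A", "B", "C", "G"},
    "S4": {"A", "B", "D"},
    "S5": {"A", "B", "E"},
}


def static_modules(states):
    """Modules required in every state."""
    return set.intersection(*states.values())


def reconfigurable_modules(states):
    """Modules required in only some states (the PRMs)."""
    return set.union(*states.values()) - static_modules(states)


def conflicts(states):
    """Pairs of PRMs needed in the same state; assumed unable to share a PRR."""
    prms = reconfigurable_modules(states)
    pairs = set()
    for modules in states.values():
        for a, b in combinations(sorted(modules & prms), 2):
            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    print("static:", sorted(static_modules(STATES)))
    print("PRMs:", sorted(reconfigurable_modules(STATES)))
    print("conflicts:", sorted(conflicts(STATES)))
```

For the table above, the only conflicting pairs are C with F and C with G, which agrees with constraints 3 to 5: any grouping that keeps F and G away from C is allowed.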

12

Proposed Design Methodology: Special-Purpose

Define the number of PRRs to be used
  Optimization variable
  Number is computed based on CFG and states table
  # PRRs = ? (values shown on the slide: 1, 2, 4)
Define a PRMs to PRRs mapping
  Optimization problem
  Combinatorial design space
  Design space is reduced using design constraints

Possible solution (not necessarily the optimal):
  Static Region: A, B
  PRR 1: C, D, E
  PRR 2: F, G
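The slide does not preserve how the number of PRRs is computed, so the following is only a sketch under stated assumptions. It uses a first-fit heuristic (a simple form of greedy graph coloring) to place each PRM in the first PRR that contains no conflicting PRM, and it takes the largest number of PRMs active in any single state as a lower bound on the PRR count. Neither the heuristic nor the bound is claimed to be the authors' method, and first-fit does not guarantee an optimal mapping.

```python
# First-fit assignment of PRMs to PRRs under pairwise conflicts.
# Illustration only: a greedy heuristic, not the method from the slides.


def first_fit_mapping(prms, conflict_pairs):
    """Place each PRM in the first PRR that holds none of its conflicting PRMs."""
    conflict = {frozenset(pair) for pair in conflict_pairs}
    prrs = []  # each PRR is a list of PRM names
    for prm in prms:
        for prr in prrs:
            if all(frozenset((prm, other)) not in conflict for other in prr):
                prr.append(prm)
                break
        else:
            prrs.append([prm])  # no compatible PRR exists yet: open a new one
    return prrs


def min_prrs_lower_bound(states, prms):
    """At least as many PRRs as PRMs active at once in the busiest state."""
    return max(len(modules & set(prms)) for modules in states.values())


if __name__ == "__main__":
    states = {
        "S1": {"A", "B", "C"},
        "S2": {"A", "B", "C", "F"},
        "S3": {"A", "B", "C", "G"},
        "S4": {"A", "B", "D"},
        "S5": {"A", "B", "E"},
    }
    prms = ["C", "D", "E", "F", "G"]
    conflict_pairs = [("C", "F"), ("C", "G")]
    print("lower bound:", min_prrs_lower_bound(states, prms))
    for i, prr in enumerate(first_fit_mapping(prms, conflict_pairs), start=1):
        print(f"PRR {i}:", ", ".join(prr))
```

The result depends on the order in which PRMs are visited, which is one reason the mapping is better treated as an optimization problem than as a fixed procedure.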

13

Proposed Design Methodology: Special-Purpose

And when do we size our PRRs? Don’t worry, it is our next step

[Figure: module profiles for Modules A to G, one bar group per module showing slices, BRAMs and DSP48s]

Required static region resources (resources are added)
Required PRR 1 resources (maximum of each resource type)
Required PRR 2 resources (maximum of each resource type)
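The sizing rule on this slide is simple arithmetic: static modules are always present, so their resources add up, while the PRMs in one PRR are loaded one at a time, so the PRR only needs the maximum of each resource type. A minimal sketch follows; the module profiles are made-up numbers chosen for illustration, since the actual profiles are not preserved in the transcript.

```python
# Resource budgeting: static region = sum over modules, PRR = per-type maximum.
# The module profiles below are hypothetical values, not data from the slides.

RESOURCE_TYPES = ("slices", "brams", "dsp48s")

PROFILES = {
    "A": {"slices": 900, "brams": 4, "dsp48s": 0},
    "B": {"slices": 600, "brams": 2, "dsp48s": 2},
    "C": {"slices": 1200, "brams": 8, "dsp48s": 4},
    "D": {"slices": 800, "brams": 12, "dsp48s": 0},
    "E": {"slices": 1500, "brams": 2, "dsp48s": 8},
    "F": {"slices": 400, "brams": 6, "dsp48s": 16},
    "G": {"slices": 700, "brams": 10, "dsp48s": 2},
}


def static_region_resources(modules, profiles=PROFILES):
    """Static modules coexist, so each resource type is summed."""
    return {r: sum(profiles[m][r] for m in modules) for r in RESOURCE_TYPES}


def prr_resources(prms, profiles=PROFILES):
    """Only one PRM occupies a PRR at a time, so take the per-type maximum."""
    return {r: max(profiles[m][r] for m in prms) for r in RESOURCE_TYPES}


if __name__ == "__main__":
    print("static:", static_region_resources(["A", "B"]))
    print("PRR 1:", prr_resources(["C", "D", "E"]))
    print("PRR 2:", prr_resources(["F", "G"]))
```

Note that the per-type maxima can come from different PRMs, so a PRR may need to be larger than any single module it hosts.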

14

Proposed Design Methodology: Special-Purpose

Define the PRR sizes, shapes, locations inside the FPGA fabric
  Floorplanning optimization problem
  Proper metrics for PRR performance analysis are required
  Design guidelines for efficient PRR floorplanning are also a necessity
Define PRR interfaces
Place slice macros

[Figure: final optimized custom base system floorplan. Starting from the PRR 1 and PRR 2 resource requirements, a reconfigurable region with enough resources for PRR 1 is placed in the FPGA next to the static region, and the same is done for PRR 2.]

15

Proposed Design Methodology: Special-Purpose

Methodology outputs
  Custom base system
  PRMs to PRRs mapping
They are used as input files for the automatic Xilinx PR Design Flow

16

Proposed Design Methodology: Special-Purpose

Opportunity to automate this flow through design tools
Optimization variables
  Number of PRRs
  PRRs sizes, shapes, and locations
  PRMs to PRRs mapping
  Other additional optimization variables can be defined
Several possible cost functions:
  Area wastage
  Power usage
  Application latency
  Throughput
  …
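To show how an optimization variable and a cost function fit together, the sketch below searches every assignment of PRMs to a fixed number of PRRs and keeps the cheapest one that respects the conflicts. The cost used here, the total slices reserved when each PRR is sized for its largest PRM, is only a stand-in for an area-related cost such as area wastage. The slice counts are hypothetical, and exhaustive search is practical only for small designs.

```python
# Exhaustive search over PRM-to-PRR mappings with an area-style cost.
# Slice counts and the cost definition are assumptions made for illustration.
from itertools import product

SLICES = {"C": 1200, "D": 800, "E": 1500, "F": 400, "G": 700}  # hypothetical
CONFLICTS = {frozenset("CF"), frozenset("CG")}  # pairs that cannot share a PRR


def is_valid(groups):
    """No PRR may hold two PRMs that are needed at the same time."""
    return all(
        frozenset((a, b)) not in CONFLICTS
        for group in groups
        for i, a in enumerate(group)
        for b in group[i + 1:]
    )


def reserved_slices(groups):
    """Cost: each PRR must be as large as its biggest PRM."""
    return sum(max(SLICES[m] for m in group) for group in groups)


def best_mapping(prms, n_prrs):
    """Return (cost, groups) for the cheapest valid mapping, or None if none exists."""
    best = None
    for assignment in product(range(n_prrs), repeat=len(prms)):
        groups = [[m for m, k in zip(prms, assignment) if k == i] for i in range(n_prrs)]
        groups = [g for g in groups if g]  # drop PRRs left unused
        if not is_valid(groups):
            continue
        cost = reserved_slices(groups)
        if best is None or cost < best[0]:
            best = (cost, groups)
    return best


if __name__ == "__main__":
    print(best_mapping(["C", "D", "E", "F", "G"], n_prrs=2))
```

Swapping `reserved_slices` for a different function (power, latency, throughput) changes the objective without touching the search, which is the separation the slide's list of cost functions suggests.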

17

Framework analysis – PRR Geometries

PR system design flows require:
  Proper metrics for PRR performance analysis
  Design guidelines for efficient PRR floorplanning
Study of the effects of varying PRR shape over
  Maximum Clock Frequency
  Partial Bitstream Size
Five separate test cores:
  Beamforming (DSP/slice)
  CFAR (slice/memory)
  AES (register)
  ARM7 softcore (hybrid)
  Sine/Cosine LUT (memory)
Performed on V4SX55 thus far

Aspect ratio = PRR Height / PRR Width
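The aspect ratio definition above can be explored with a small enumeration of rectangular PRR shapes. This sketch assumes that dimensions are counted in CLBs, that each CLB provides 4 slices, that the device offers a 128 x 48 CLB grid, and that the region contains only CLB columns (RAMB16 and DSP48 columns are ignored). These are simplifying assumptions that should be checked against the device data sheet, and the function name is illustrative.

```python
# Enumerate rectangular PRR shapes that provide enough slices and report
# their aspect ratio (height / width, as defined on the slide).
# Assumptions: dimensions in CLBs, 4 slices per CLB, CLB-only region.
import math

SLICES_PER_CLB = 4


def candidate_shapes(required_slices, max_height, max_width):
    """Yield (height, width, aspect_ratio, spare_slices) for the narrowest fit at each height."""
    clbs_needed = math.ceil(required_slices / SLICES_PER_CLB)
    for height in range(1, max_height + 1):
        width = math.ceil(clbs_needed / height)
        if width > max_width:
            continue  # region would be wider than the device
        spare = height * width * SLICES_PER_CLB - required_slices
        yield height, width, height / width, spare


if __name__ == "__main__":
    # 1826 slices is the ARM7 test core size quoted later in the deck;
    # the 128 x 48 grid is an assumption about the target device.
    for height, width, ratio, spare in candidate_shapes(1826, 128, 48):
        if height % 16 == 0:  # print a few representative heights only
            print(f"{height:3d} x {width:2d} CLBs  ratio {ratio:6.2f}  spare slices {spare}")
```

Each candidate trades aspect ratio against spare (unused) slices, which is the kind of trade-off the measurements on the following slides evaluate against clock frequency and bitstream size.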

18

Framework analysis – Beamforming (~125 MHz, 40%)

5022 slices, 16 DSP48s, 17 RAMB16s
Baseline, non-PR performance = 1614 kB, 127.845 MHz

[Figure: two plots, clock frequency (MHz) vs. aspect ratio and bitstream size (kB) vs. aspect ratio]

19

Framework analysis – CFAR (~100 MHz, 16%)

2610 slices, 2 DSP48s, 34 RAMB16s
Baseline, non-PR performance = 1001 kB, 103.616 MHz

[Figure: two plots, clock frequency (MHz) vs. aspect ratio and bitstream size (kB) vs. aspect ratio]

20

Framework analysis – AES (~80 MHz, 13.75%)

3634 slices, 3943 registers, 4 RAMB16s
Baseline, non-PR performance = 1393 kB, 80.483 MHz

[Figure: two plots, clock frequency (MHz) vs. aspect ratio and bitstream size (kB) vs. aspect ratio]

21

Framework analysis – ARM7 (~40 MHz, 6.8%)

1826 slices, 16 DSP48s, 10 RAMB16s
Baseline, non-PR performance = 872 kB, 40.985 MHz

[Figure: two plots, clock frequency (MHz) vs. aspect ratio and bitstream size (kB) vs. aspect ratio]

22

Framework analysis – Sine/Cosine LUT

107 slices, 27 RAMB16s
Baseline, non-PR performance = 571 kB, 204.918 MHz

[Figure: two plots, clock frequency (MHz) vs. aspect ratio and bitstream size (kB) vs. aspect ratio]

23

Framework analysis – PRR Geometries

Slice-intensive designs show best bitstream size/clock frequency performance with aspect ratio around 2-4
  Roughly equivalent to aspect ratio of the FPGA as a whole
Non-slice-intensive designs show best bitstream performance with aspect ratio >> 4
  Due to columnar distribution of RAMB16/DSP48 resources on chip
  Clock frequency relatively insensitive to aspect ratio
  Not shown in graph: resource wastage also improved
Results are more pronounced for high-frequency designs
However, aspect ratio is not the only design consideration
  Placement on a chip relative to other regions, pins, or resources may affect (restrict) choice of PRR shape
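The two aspect-ratio guidelines above could be encoded as a simple check in a floorplanning script. The sketch below is a literal reading of the slide, not a validated rule: "around 2-4" is taken as the closed range 2 to 4, ">> 4" is simplified to "greater than 4", and the class names and function name are assumptions. As the slide notes, placement constraints may override such a check.

```python
# Check a PRR shape against the aspect-ratio guidelines from the slide.
# The numeric cut-offs are a simplified reading of "around 2-4" and ">> 4".


def within_guideline(design_class, height, width):
    """Return True if height / width falls in the suggested range for the class."""
    ratio = height / width
    if design_class == "slice-intensive":
        return 2.0 <= ratio <= 4.0
    if design_class == "non-slice-intensive":
        return ratio > 4.0
    raise ValueError(f"unknown design class: {design_class!r}")


if __name__ == "__main__":
    print(within_guideline("slice-intensive", 48, 16))      # ratio 3.0
    print(within_guideline("non-slice-intensive", 96, 8))   # ratio 12.0
```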

24

Conclusions – Contributions of this work

Taxonomy for PR systems design flows and a design methodology for efficient development of each type
Identification of relevant optimization variables and constraints
  Number of PRRs, optimal mapping of PRMs to PRRs, system floorplanning
  Propose their incorporation in a future automatic design tool
Study of the effects of varying PRR shape
  Maximum Clock Frequency
  Partial Bitstream Size
  Multiple classes of cores/designs
    Memory-intensive
    DSP-intensive
    Combinational Logic-intensive
    Register-intensive
    Etc.
PRR floorplanning guidelines definitions and delivery

25

Questions
