Automating Shift-Register-LUT Based Run-Time Reconfiguration

Karel Heyse, Brahim Al Farisi, Karel Bruneel, Dirk Stroobandt

[email protected]

ARC 2012 2

Run-Time Reconfiguration (RTR)

• Changing (part of) circuit at run-time
• To save area, power, time … money

[Figure: functions f(i0-3), g(i0-3) and h(i0-3) of the inputs i0-3]

Run-Time Reconfiguration – cont.

• Reconfiguration time
  – Time during which (part of) circuit is disabled
  – Can nullify gains of RTR
• Defines when RTR is feasible
  – Faster reconfiguration is important

SRL reconfiguration

Reconfiguration methods

• ICAP
  – How: Similar to configuration interface, frame based
• SRL reconfiguration
  – How: Shift-register functionality of LUT’s truth table

[Figure: SRL with ports inputs, shift-in, shift-enable, shift-clock, shift-out and output]
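The shift-register behaviour of a LUT's truth table can be illustrated with a small software model. This is only a sketch under assumptions not stated on the slides: a 4-input LUT (16 truth-table bits), bits entering at address 0 and leaving at the highest address, and hypothetical names `SRL`, `shift`, `output` and `load_truth_table`.

```python
class SRL:
    """Toy model of a LUT whose truth table doubles as a shift register."""

    def __init__(self, n_inputs=4):  # assumption: 4-input LUT
        self.bits = [0] * (2 ** n_inputs)  # one truth-table bit per input combination

    def shift(self, shift_in):
        """One shift-clock cycle with shift-enable high; returns shift-out."""
        shift_out = self.bits[-1]
        self.bits = [shift_in] + self.bits[:-1]
        return shift_out

    def output(self, inputs):
        """Normal LUT operation: the inputs address one truth-table bit."""
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.bits[address]


def load_truth_table(srl, table):
    """Shift in a full truth table; table[a] is the output for address a."""
    for bit in reversed(table):  # the first bit shifted in ends at the highest address
        srl.shift(bit)


srl = SRL()
and4 = [0] * 15 + [1]  # 4-input AND
load_truth_table(srl, and4)
and_results = (srl.output([1, 1, 1, 1]), srl.output([1, 0, 1, 1]))

parity4 = [bin(a).count("1") % 2 for a in range(16)]  # 4-input XOR
load_truth_table(srl, parity4)  # "reconfigure" the same LUT
parity_results = (srl.output([1, 0, 0, 0]), srl.output([1, 1, 0, 0]))
```

In this model, changing the function of the LUT costs one shift clock per truth-table bit, and nothing but the LUT itself is touched.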

Reconfiguration methods – cont.

• SRL reconfiguration
  + Very fine grained
  + Lower overhead
  + Bandwidth adjustable
  FAST
  − Only LUTs
  TLUTMAP
• ICAP
  − Coarse grained
  − Higher overhead
  − Fixed bandwidth
  + Full reconfiguration

TLUTMAP - Technology mapper

• Takes an HDL design with some slow inputs
• Creates configuration:
  – Dynamically specialisable for the slow inputs
  – By reconfiguring only part of the LUTs
• Smaller & faster specialised design
  – FIR filter: -39% LUTs, +38% max clock freq.
  – TCAM: -66% LUTs, +30% max clock freq.
• Fast RTR

You are here (★)

• Run-time reconfiguration
• Reconfiguration methods
• TLUTMAP – Technology mapper
• Generating reconfiguration chains
• Modelling as mTSP
• Solution method
• Results

Generating reconfiguration chains

• SRLs have to be chained, connected to configuration manager

[Figure: SRL chains connected to the configuration manager]
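Chaining means that the shift-out of one SRL feeds the shift-in of the next, so a whole chain behaves like one long shift register driven by the configuration manager. The sketch below illustrates this under assumptions that are not from the slides: 16 bits per SRL, each SRL modelled as a plain list of bits, and hypothetical helper names `shift_chain` and `reconfigure`.

```python
BITS_PER_SRL = 16  # assumption: 4-input LUTs


def shift_chain(chain, bit_in):
    """One shift clock for a whole chain; returns the bit leaving the last SRL."""
    for srl in chain:  # each srl is a list of truth-table bits
        srl.insert(0, bit_in)  # new bit enters at address 0
        bit_in = srl.pop()     # old last bit moves on to the next SRL
    return bit_in


def reconfigure(chain, tables):
    """Shift new truth tables into a chain; returns the clock cycles used."""
    # Bits for the last SRL in the chain have to be sent first.
    stream = [bit for table in reversed(tables) for bit in reversed(table)]
    for bit in stream:
        shift_chain(chain, bit)
    return len(stream)


chain = [[0] * BITS_PER_SRL for _ in range(2)]
tables = [
    [0] * 15 + [1],                                # SRL 0: 4-input AND
    [bin(a).count("1") % 2 for a in range(16)],    # SRL 1: 4-input XOR
]
cycles = reconfigure(chain, tables)
```

In this model a chain of n SRLs needs 16·n shift clocks, which is why the number of SRLs per chain matters for the reconfiguration time.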

Optimising reconfiguration chains

• Influence on the design
  – Shares routing resources: routability, clock speed
    Minimise combined length of reconfiguration chains
• Influence on the reconfiguration time
  – Clock cycles
    Minimise #SRLs in longest chain
  – Clock speed reconfiguration
    Minimise longest connection
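The three objectives above can be computed directly from the placed SRL positions. The following is a minimal sketch, assuming Manhattan distance as a stand-in for connection length and a single configuration manager position; both assumptions, and the names `manhattan` and `chain_metrics`, are illustrative and not taken from the slides.

```python
def manhattan(a, b):
    """Assumed wire-length estimate between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chain_metrics(manager, chains):
    """Return (combined length, #SRLs in longest chain, longest connection).

    chains is a list of chains; each chain is a list of (x, y) SRL positions
    visited in order, starting from the configuration manager.
    """
    total_length = 0
    longest_link = 0
    for chain in chains:
        previous = manager
        for srl in chain:
            link = manhattan(previous, srl)
            total_length += link
            longest_link = max(longest_link, link)
            previous = srl
    max_srls = max(len(chain) for chain in chains)
    return total_length, max_srls, longest_link


manager = (0, 0)
chains = [
    [(1, 0), (1, 2)],
    [(0, 3), (4, 3), (4, 4)],
]
metrics = chain_metrics(manager, chains)
```

For this toy placement the combined length is 11, the longest chain holds 3 SRLs and the longest connection is 4.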

Modelling as mTSP

• We chose: Generating chains after placement
  – Position of SRLs fixed & known
• Model: multiple Travelling Salesman Problem
  – Minimise influence on design

Constrained mTSP

• Extra constraints to optimise reconfiguration time
  – Minimise # cities per salesman
  – Minimise longest link

Solution method: Simulated Annealing

• Summary
  – Iterative heuristic:
    • Repetitive small, random alterations to a solution
  – Temperature (T):
    • Starts high: exploration
    • Ends low: converge to minimum
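The summary above corresponds to the textbook simulated annealing loop, sketched here on a toy problem. The geometric cooling schedule, the Metropolis acceptance rule and all parameter values are generic assumptions; the slides do not say which schedule or parameters the authors used.

```python
import math
import random


def simulated_annealing(initial, cost, neighbour, t_start=10.0, t_end=0.01,
                        cooling=0.95, moves_per_t=100, seed=0):
    """Generic simulated annealing; returns the best solution found and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            candidate = neighbour(current, rng)  # small random alteration
            delta = cost(candidate) - current_cost
            # Always accept improvements; accept a worse solution with
            # probability exp(-delta / T), which shrinks as T drops.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling  # high T: exploration, low T: convergence
    return best, best_cost


# Toy problem: find the integer x that minimises (x - 3)^2.
def toy_cost(x):
    return (x - 3) ** 2


def toy_neighbour(x, rng):
    return x + rng.choice((-1, 1))


best, best_cost = simulated_annealing(20, toy_cost, toy_neighbour)
```

For the chain problem, `cost` and `neighbour` would be replaced by a chain cost function and by alterations of the chains.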

Solution method: Simulated Annealing – cont.

• Solution space
  – Common starting point for salesmen
  – No fixed end point for salesmen
  – Every salesman visits same number of cities (±1)

Minimise # cities per salesman
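One way to stay inside this solution space is to start from a balanced assignment and use only alterations that keep the tour sizes unchanged. The sketch below assumes a random initial assignment and a simple swap move; neither is confirmed by the slides, and `initial_solution` and `swap_cities` are hypothetical names.

```python
import random


def initial_solution(cities, n_salesmen, seed=0):
    """Shuffle the cities and deal them out so tour sizes differ by at most one."""
    rng = random.Random(seed)
    order = list(cities)
    rng.shuffle(order)
    # Each tour implicitly starts at the common starting point and does not return.
    return [order[i::n_salesmen] for i in range(n_salesmen)]


def swap_cities(tours, rng):
    """Exchange two randomly chosen cities; tour sizes stay unchanged.

    Assumes every tour holds at least one city.
    """
    new_tours = [list(tour) for tour in tours]
    a, b = rng.randrange(len(new_tours)), rng.randrange(len(new_tours))
    i, j = rng.randrange(len(new_tours[a])), rng.randrange(len(new_tours[b]))
    new_tours[a][i], new_tours[b][j] = new_tours[b][j], new_tours[a][i]
    return new_tours


tours = initial_solution(range(10), 3)
sizes = sorted(len(tour) for tour in tours)

altered = swap_cities(tours, random.Random(1))
altered_sizes = sorted(len(tour) for tour in altered)
```

With 10 cities and 3 salesmen the tours hold 3, 3 and 4 cities, both before and after the swap.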

Solution method: Simulated Annealing – cont.

• Random alterations

[Figure: two types of random alteration, 1) and 2)]

Solution method: Simulated Annealing – cont.

• Cost function

[Plot: C(l) as a function of l/lmax]

Experimental results

• Designs
  – TCAM: 60% reconfigurable LUTs
  – FIR filter: 37% reconfigurable LUTs
• Evaluated using Xilinx Tools
  – Flow:
    • Place
    • Insert reconfiguration chains
    • Place & route

Results: Clock of the design

Relative to design without reconfiguration chains
Averaged over experiments with 1, 4, 16 and 32 chains

[Charts: TCAM (128, 256, 512 and 1024 elem.) and FIR (32, 64, 128 and 256 taps); axis from -20% to 50%; series: SRL only, Random, mTSP, Manual]

Results

• Second placement step
  – VPR: -4% to -28% clock speed
  – Xilinx: +130% longer reconfiguration chain
• Number of reconfiguration chains
  – No influence on clock speed of the design
  – Small influence on clock speed of the reconfiguration
• Max clock speed of the reconfiguration
  – 1x to 2x the clock speed of the design

Conclusion

• Automated method to generate reconfiguration chains

• Takes into account routability of design and reconfiguration speed

• Better than random, almost as good as manual
• Could be improved by avoiding second placement step
