Reducing the Pressure on Routing Resources of FPGAs with Generic Logic Chains
Hadi P. Afshar
Joint work with: Grace Zgheib, Philip Brisk and Paolo Ienne
2
FPGAs and ASICs Gaps*
• Performance ratio: 3-4
• Area ratio: 20-35
• Power ratio: 7-15
* I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, February 2007, pp. 203-215.
Routing resources consume ≈60-80% of the chip area and are significant contributors to circuit delay.
How to narrow the gap?
• Specialized (DSP) blocks
• Coarser-grained logic blocks
• Hard-wired connections
Concerns:
✘ Lack of generality and flexibility
✘ Underutilization
✘ Change in routing structure
3
Fracturable LUTs
[Figure: fracturable LUT with configuration bits S0-S7 and inputs i0, i1, i2; labels: 2-LUT, 2-LUT, 3-LUT]
4
Motivation
[Figure: array of CLBs; one CLB shown with four 4-LUTs]
Fracturable LUT structure and extra CLB outputs reduce the problem of large LUT under-utilization.
[Figure: logic cell with 8 inputs; configuration labels: 6-LUT; 5-LUT, 5-LUT; 5-LUT, 3-LUT; 4-LUT, 4-LUT]
5
What is the solution?
[Figure: array of CLBs; logic cell with 8 inputs, four 4-LUTs and two "+" blocks; connections between cells marked "?"]
• More input bandwidth
• Improved logic density
• Dedicated and faster connections
6
Vertical Look-Up Tables
[Figure: logic cell with four 4-LUTs and two "+" blocks, grouped into two 5-LUTs; label: Fanout]
A 5-LUT can be built from two 4-LUTs with shared inputs and a multiplexer that selects between the two sub-LUTs and is controlled by the 5th input.
• Two 5-LUTs in the logic cell with disjoint inputs
• No routing wire is needed for the interconnection
• No change in the routing network interface
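The 5-LUT construction described on this slide can be sketched in a few lines. This is only an illustration of the two-sub-LUTs-plus-multiplexer idea, not the cell's actual implementation; the function names (`lut_eval`, `lut5_from_two_lut4`) and the bit ordering (i0 as the least significant index bit, i4 as the multiplexer select) are assumptions made for the example.

```python
def lut_eval(config_bits, inputs):
    """Evaluate a k-input LUT: the inputs form an index into 2**k configuration bits.

    Assumption: inputs[0] is the least significant bit of the index.
    """
    index = 0
    for position, bit in enumerate(inputs):
        index |= (bit & 1) << position
    return config_bits[index]


def lut5_from_two_lut4(low_bits, high_bits, inputs):
    """5-LUT built from two 4-LUTs that share i0..i3, with a 2:1 mux driven by i4."""
    shared, select = inputs[:4], inputs[4]
    out_low = lut_eval(low_bits, shared)    # sub-LUT selected when i4 == 0
    out_high = lut_eval(high_bits, shared)  # sub-LUT selected when i4 == 1
    return out_high if select else out_low


# Example function: 5-input parity, given as a 32-entry truth table.
parity5 = [bin(i).count("1") & 1 for i in range(32)]

# Splitting on i4 (the most significant index bit) yields the two 4-LUT configurations.
low, high = parity5[:16], parity5[16:]

# The composed 5-LUT reproduces the original truth table on all 32 input patterns.
matches = all(
    lut5_from_two_lut4(low, high, [(i >> k) & 1 for k in range(5)]) == parity5[i]
    for i in range(32)
)
```

Splitting the truth table on the 5th input is the usual Shannon expansion, so any 5-input function fits this structure; only the two 16-bit configurations change.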
7
Example
[Figure legend: routing wire; hard-wired logic chain. Functions shown: F(i0, i1, ..., i12) and F(i0, i1, ..., i15)]
8
Chaining Heuristic
[Figure: example graphs between Input and Output with numeric node labels]
We need to find chains of functions with five or fewer inputs to map onto the logic chains (vertical 5-LUTs).
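The slides do not spell out the heuristic itself, so the following is only a sketch of one plausible approach: greedily pulling the longest remaining path of eligible functions out of the netlist DAG. The netlist representation (a dict from each function to its fanin list, with primary inputs appearing only as fanins), the function names, and the greedy longest-path strategy are all assumptions for illustration. The 5-input limit comes from the slide, and the default minimum chain length of 4 matches the threshold quoted with the chaining results.

```python
def longest_chain(fanins, candidates):
    """Return the longest path that only visits `candidates` (the graph is assumed acyclic)."""
    best_len, best_pred = {}, {}

    def visit(node):
        if node in best_len:
            return best_len[node]
        best_len[node], best_pred[node] = 1, None
        for src in fanins.get(node, ()):
            if src in candidates:
                length = visit(src) + 1
                if length > best_len[node]:
                    best_len[node], best_pred[node] = length, src
        return best_len[node]

    for node in sorted(candidates):  # sorted only to make the result deterministic
        visit(node)
    if not best_len:
        return []

    # Walk back from the end of the longest path to its start.
    node = max(best_len, key=best_len.get)
    chain = []
    while node is not None:
        chain.append(node)
        node = best_pred[node]
    return chain[::-1]


def extract_chains(fanins, max_inputs=5, min_length=4):
    """Greedily extract disjoint chains of functions that have at most `max_inputs` inputs."""
    candidates = {n for n, srcs in fanins.items() if len(srcs) <= max_inputs}
    chains = []
    while True:
        chain = longest_chain(fanins, candidates)
        if len(chain) < min_length:
            return chains
        chains.append(chain)
        candidates -= set(chain)


# Hypothetical netlist: x0..x5 are primary inputs, a..f are LUT-mapped functions.
netlist = {
    "a": ["x0", "x1"],
    "b": ["a", "x2"],
    "c": ["b", "x3", "x4"],
    "d": ["c", "x5"],
    "e": ["a", "x0", "x1", "x2", "x3", "x4"],  # 6 inputs: not eligible
    "f": ["e", "d"],
}
chains = extract_chains(netlist)
```

In this example `e` is excluded because it has six inputs, and the remaining functions form a single chain a, b, c, d, f. Functions left out of every chain would stay as they are in the netlist. The sketch ignores constraints a real cell would impose, such as which LUT input the hard-wired link occupies.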
9
Synthesis and Chaining Results
Benchmark   Chainable   Chained   Max Chain Length   Average Chain Length
alu4        94%         39%       12                 5.2
pdc         89%         53%       9                  5.8
misex3      93%         42%       9                  5.1
ex1010      60%         47%       8                  5.3
ex5p        88%         46%       7                  5.2
des*        77%         20%       4                  3.1
apex2       84%         39%       8                  4.9
apex4       82%         59%       8                  4.3
spla        91%         46%       11                 5.3
seq         88%         43%       6                  4.9
Average     85%         44%       8.2                4.9
* The minimum threshold for the chain length is 4, except for "des", for which it is 3.
10
Experimental Methodology
[Figure: proposed logic cell; labels: 4-LUT (×4), + (×2), 5-LUT (×2), Similar Interface]
Goal: Extract chains of eligible functions from the synthesized netlist and place them on the logic chains; the non-chained functions remain unchanged.
Flow:
• Synthesis and LUT Mapping (Quartus-II)
• DAG Generation (VQM Parser)
• Chain Heuristic
• Netlist Generation
• Place and Route (Quartus-II)
• Critical Path Finding and Power Estimation
ABC? VPR?
11
Logic Cell Utilization
[Figure: bar chart of logic cells (ALMs) for alu4, pdc, misex3, ex1010, ex5p, des, apex2, apex4, spla, seq and the average; series: Stratix-III, New Cell]
4% saving in ALM count
12
Local Routing Wires
[Figure: bar chart of local interconnection wires for alu4, pdc, misex3, ex1010, ex5p, des, apex2, apex4, spla, seq and the average; series: Stratix-III, New Cell]
37% saving in the number of local wires
13
Total Wire Lengths
[Figure: bar chart of total interconnection wires for alu4, pdc, misex3, ex1010, ex5p, des, apex2, apex4, spla, seq and the average; series: Stratix-III, New Cell]
12% saving in total wire length
14
Delay
[Figure: bar chart of delay (ns) for alu4, pdc, misex3, ex1010, ex5p, des, apex2, apex4, spla, seq and the average; series: Stratix-III, New Cell]
No delay penalty on average from the placement restriction
15
Did I say something new?!
• Local connections in Altera Stratix and Cyclone
  – Use available logic cell bandwidth
  – No fracturable LUT structure
• Local connections in Xilinx FPGAs go through multiplexers
  – Carry look-ahead
  – Wide AND functions
• Cascading LUTs to build bigger LUTs in Xilinx Virtex-5
  – Routing wire
  – Few large functions
16
Conclusion
Narrow the FPGA and ASIC Gaps
Lighten the stress on routing resources
Hardwired connections + Dedicated logic
More logic density
Less power
More LC bandwidth
Fewer routing wires
Less circuit delay
Improved Routability with a Lighter Network
17
Future Work
• Logic chain aware synthesis
• Guided chaining heuristic
• Multiple logic chains
• 2-D logic chains