

(SYNTHESIS): EXAMINING WHERE ACADEMIC FPGA TOOLS LAG BEHIND INDUSTRY

Photo credit: Leandro Amato – https://www.flickr.com/photos/grunge/14302660967/

Department of Computing, Department of Electrical and Electronic Engineering

Executive Summary: By extending the open-source academic CAD suite Verilog-To-Routing (VTR) to target commercial Xilinx FPGAs, a comparison can be made with industrial tools, which indicates that the academic delay gap is +31% at synthesis, +10% at packing & placement, and +15% at routing.

Introduction

FPGA research into architecture design, and on the corresponding computer-aided design (CAD) algorithms, is typically conducted on the academic VTR toolflow. Unfortunately, since these tools do not currently target commercial architectures, any CAD innovations that are made cannot be easily evaluated on physical silicon.

Key Challenge: Adding VTR support for the complex and irregular Xilinx Virtex-6 architecture.

[Figure: FPGA tile. Routing wires of length L=1, L=2, L=4 and L=16 (bi-dir) leave the tile in the N, E, S and W directions, in groups of x2 to x7. Each tile holds two SLICEs (SLICE x2); each SLICE holds four BLEs. Each BLE pairs a fracturable 6-LUT (inputs A6:A1, two 5-LUTs, outputs O6 and O5) with flip-flops (AQ) and XADDER carry logic (XORCY, MUXCY, CIN, COUT), plus the AX input and the A and AMUX outputs.]

Industrial Comparison Results

[Charts: area, runtime and delay per benchmark; legend Flow 1 ... Flow 5, plus Geomean; two entries are annotated "ODIN II failed".]

● Area: Area Gap: +82%

● Runtime: Runtime Gap: 3.5X. Synthesis proportion of total runtime: (13%), (6%), (7%), (4%); the synthesis stage consumes a diminishing proportion of total runtime ...

● Delay (from Xilinx TRCE; chart annotation: "Difference between flows 1 & 2"): Synthesis Gap: 31% (... yet contributes largest gap!), Pack&Place Gap: 10%, Routing Gap: 15%

Target FPGA: mid-range Xilinx Virtex-6 xc6vlx240t, with 150K LUTs. All results geometrically averaged over 10 different placement seeds.
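As a rough illustration of the averaging described above, the sketch below computes a geometric mean over per-seed results and turns two such means into a percentage gap. The function names `geomean` and `gap_percent` and the sample delays are hypothetical; they are not taken from VTB or from the poster's data.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed in the log domain."""
    values = list(values)
    if not values:
        raise ValueError("geomean needs at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geomean is only defined here for positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gap_percent(academic, industrial):
    """Relative gap between two sets of per-seed results, in percent."""
    return (geomean(academic) / geomean(industrial) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical critical-path delays in ns for three seeds (NOT poster data).
    academic_ns = [6.1, 6.4, 6.2]
    industrial_ns = [5.0, 5.1, 4.9]
    print(f"delay gap: {gap_percent(academic_ns, industrial_ns):+.1f}%")
```

For paired lists of equal length, the ratio of geometric means equals the geometric mean of the per-seed ratios, which is one reason this average suits ratio-style comparisons.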

Logic cluster: composed of four 6-input LUTs, fracturable into two 5-input LUTs followed by two flip-flops, with partially-hardened adder resources (XADDER) as well as a vertical carry-chain. RAMs and DSPs are also supported.
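To make the cluster description more concrete, here is a minimal behavioural sketch of one BLE used as an adder bit. It assumes the usual Xilinx convention that O6 is the full 6-input function and O5 is the lower 5-input half of the same 64-bit truth table, and that MUXCY and XORCY form the carry and sum. All function names are hypothetical, and this is not how VTB's architecture description encodes XADDER.

```python
def lut6_2(init, inputs):
    """Fracturable 6-LUT: 64-bit truth table `init`, 6-bit `inputs` (bit 0 = A1)."""
    o6 = (init >> (inputs & 0x3F)) & 1  # full 6-input function
    o5 = (init >> (inputs & 0x1F)) & 1  # lower 5-input half only
    return o6, o5


def make_init(f_upper, f_lower):
    """Build a 64-bit truth table from two 5-input functions of index i."""
    init = 0
    for i in range(32):
        init |= (f_lower(i) & 1) << i
        init |= (f_upper(i) & 1) << (32 + i)
    return init


# Upper 5-LUT: propagate = a XOR b. Lower 5-LUT: generate = a.
ADDER_INIT = make_init(lambda i: (i ^ (i >> 1)) & 1, lambda i: i & 1)


def adder_bit(a, b, cin):
    """One adder bit: fractured LUT feeding MUXCY (carry) and XORCY (sum)."""
    inputs = (1 << 5) | (b << 1) | a   # A6 tied high selects the upper half on O6
    o6, o5 = lut6_2(ADDER_INIT, inputs)
    cout = cin if o6 else o5           # MUXCY: propagate cin, else generate
    total = o6 ^ cin                   # XORCY
    return total, cout


def ripple_add(x, y, width):
    """Chain adder bits through the carry, as a vertical carry-chain would."""
    carry, result = 0, 0
    for k in range(width):
        s, carry = adder_bit((x >> k) & 1, (y >> k) & 1, carry)
        result |= s << k
    return result, carry


if __name__ == "__main__":
    print(ripple_add(5, 3, 4))
```

Tying A6 high is what lets the two 5-LUT halves act as independent functions of the same five inputs, which is why a single LUT can supply both the propagate and generate signals.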

Routing architecture: On top of VTR's default routing model, Virtex-6 devices also contain a mix of bi- and unidirectional, diagonal and bent wires, amongst other features. This exact architecture is extracted from Torc and stitched into VTB.

Yosys is a Verilog synthesis tool gaining traction in the ASIC community, with extensive support for Verilog-2005 as well as for the BLIF and EDIF netlist formats.

Current Limitations

● Support for xc6vlx240t only; contributions are welcome for other Xilinx architectures (though, because ISE is deprecated, no path beyond Virtex-7 currently exists)

● Incomplete support for all architectural features, e.g. register control (clock enable, set-reset), distributed memory (SLICEM), wide multiplexers (MUXF7/F8), etc.

Future Work

● Investigate outliers (e.g. area utilisation of the 'stereo2' benchmark)

● Improve routing runtime (in particular, two high fanout nets in the 'mcml' benchmark were observed to consume 70% of the total routing runtime)

● Examine the effect of increasing synthesis effort, for example with more aggressive technology-mapping algorithms.

Conclusion

The synthesis stage of the academic VTR CAD suite consumes the least amount of total runtime (on average, 4%) yet contributes the largest delay gap (31%) across the three stages. This finding leads us to believe that research should not focus only on back-end tools such as VPR: opportunities also exist at the front-end.

Eddie Hung

Imperial College London, England

VTR-to-Bitstream v2.0 available from http://eddiehung.github.io

Acknowledgements: Grateful for support from the UK EPSRC (grants EP/I012036/1 & EP/I020357/1) and for equipment and license donations from Xilinx.

The long term goal of this work is to bridge this divide, allowing researchers to use real parts in weird (but also wonderful!) ways.

One such way that is now accessible is to make a robust comparison between academic and industrial tools by targeting the same FPGA.

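[Figure: the divide between academia and industry. Academic side: theoretical FPGA architectures and VTR/VPR. Industry side: physical FPGAs and proprietary CAD tools. A "?" marks the gap between them.]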

Main Contributions

i) VTR-to-Bitstream (VTB) v2.0 update – an open-source extension that improves front-end Verilog support by leveraging Yosys, as well as back-end support for timing-driven routing on Xilinx architectures. Available from: http://eddiehung.github.io

ii) Applied this extended toolflow to make fair and rigorous comparisons between the quality of results gained by academic and industrial offerings, by targeting an identical commercial device and analysing the outputs using an industrial static timing analysis tool.

[Figure: experimental flow, Flow 1 to Flow 5. Input: Verilog HDL. Intermediate files: .edif, .blif, .ncd. Outputs: .bit (bitgen) and .twr (trce – STA). Labels recoverable from the diagram:

● Vanilla ISE (Xilinx ISE): xst – Synthesis, ngdbuild – Merge, map – Pack & Place, par – Route

● Vanilla VTR: Odin II – Synthesis, ABC – Tech. map, VPR – Pack & Place, VPR – Route

● Prior work: VTB v1, xdl2ncd, par

● Flows proposed in this work: Yosys – Synthesis (new), ABC – T.map, VPR – Pack & Place, VPR – Route, VTR-to-Bitstream v2, xdl2ncd, DRC, mixed with the ISE ngdbuild, map and par stages in the intermediate flows

● Architecture Description (new): e.g. cluster model, placement sites, routing model, wire delays, etc.]