Literature Review
Measuring the Gap Between FPGAs and ASICs
Ian Kuon, Jonathan Rose, University of Toronto
IEEE TCAD/ICAS, February 2007
Henry Chen, February 26, 2010
Introduction
Trade-offs between FPGAs and standard-cell ASICs
– Decreased NRE, design time
– Increased silicon area, power; decreased performance
FPGA inefficiencies known and accepted, but largely unquantified
Previous Comparisons
Jones et al. (1986): MPGAs to standard cells
– 1.5‒2.6x area, ~1.1x delay
– Estimates based on only 5 circuits
Brown et al. (1992): FPGAs to MPGAs
– 8‒12x area, ~3x delay
– Optimistic FPGA gate counting?
– Anecdotal evidence
– Doesn’t consider “hard” macros (multipliers, memories)
Combine for FPGAs to standard cells
– 12‒38x area, ~3.4x delay
– Dated; based on (questionable?) extractions
Previous Comparisons (2000s)
Zuchowski et al. (2002): LUT to ASIC gate (0.25μm‒90nm)
– ~1/45 gate density, 12‒14x delay, ~500x dynamic power
– Unexplained process-dependent density/power variation
– Dependent on gates implemented per LUT
Wilton et al. (2005): Partial programmable replacement
– 88x area, 2x delay
– Single logic module
Compton & Hauck (2007): FPGA apps. to standard-cell
– Avg 7.2x area
– Scaled FPGA 0.15μm to 0.18μm standard-cell
Methodology
Implement in both FPGA and standard-cell
– Altera Stratix II FPGA: TSMC 90nm multi-Vt, 1.2V
– Standard-cell: ST CMOS090 90nm, dual-Vt, 1.2V
Empirical results from 23 benchmarks
– Rejected if different synthesis tools resulted in >5% register count deviation
– Mix of logic, memory, DSP
Analyze gains from FPGA’s DSP and memory blocks
Exclude I/Os
Have device data from Altera
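The register-count screen above can be sketched as a relative-difference check. This is only an illustration: the slides do not say which count serves as the baseline, so the sketch assumes the deviation is measured against the larger of the two counts. The function name and the sample counts are hypothetical.

```python
def accept_benchmark(regs_a: int, regs_b: int, max_deviation: float = 0.05) -> bool:
    """Return True if two register counts agree within max_deviation.

    Assumption (not from the source): deviation is measured relative to
    the larger of the two counts.
    """
    if regs_a < 0 or regs_b < 0:
        raise ValueError("register counts must be non-negative")
    larger = max(regs_a, regs_b)
    if larger == 0:
        return True  # neither netlist has registers, so nothing deviates
    return abs(regs_a - regs_b) / larger <= max_deviation


# Hypothetical counts, not taken from the paper.
print(accept_benchmark(1000, 1030))  # about 3% apart: kept
print(accept_benchmark(1000, 1100))  # about 9% apart: rejected
```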
Implementations
FPGA
– Altera-provided CAD flow
– Speed/area balanced optimization; optimize critical paths for performance, otherwise optimize area
– Automatic DSP, memory block inference
– Set to mimic effects of high resource utilization
ASIC
– Synopsys/Cadence synthesis/PAR flow
– Free to choose from high/standard-Vt cells
– Timing-driven placement; target 75‒85% utilization
– Emphasized performance in compiled memories
Area Comparison
ASIC
– Post-PAR core area
– Include memory macros
FPGA
– Count only silicon area for used resources
– Include surrounding routing resources
– Count full block area even if only partially used
– Area data from Altera
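The FPGA accounting rule above (charge the full area of every block the design touches, routing included) can be sketched as follows. The resource names, per-block areas, block counts, and ASIC core area are all hypothetical placeholders in arbitrary units; the real per-block figures came from Altera and are not reproduced in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    area_per_block: float  # full block area incl. surrounding routing (arbitrary units)
    blocks_touched: int    # blocks used at all, even if only partially


def fpga_used_area(resources) -> float:
    """Sum full block area for every touched block (no partial credit)."""
    return sum(r.area_per_block * r.blocks_touched for r in resources)


def area_gap(fpga_area: float, asic_core_area: float) -> float:
    """Ratio of FPGA area to post-PAR ASIC core area."""
    return fpga_area / asic_core_area


# Hypothetical design, not data from the paper.
design = [
    Resource("logic", 1.0, 120),
    Resource("memory", 4.0, 3),
    Resource("dsp", 10.0, 2),
]
fpga_area = fpga_used_area(design)
print(fpga_area, area_gap(fpga_area, asic_core_area=8.0))
```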
Area Comparison Results
Logic only: 35x avg (17‒54x)
Logic + DSP: 25x avg (12‒58x)
Logic + Memory: 33x avg (19‒70x)
Logic + Memory + DSP: 18x avg (9.5‒26x)
Impact of Hard Macros on Area
Smaller area penalty for designs using hard macros
– Hard macro close to ASIC implementation (plus programmable interface & routing)
Area Comparison Caveats
Pessimistic FPGA area estimation; count full resource area even if only partially used (~5‒10% reduction)
ASIC density may decrease for larger designs, while FPGAs are designed to handle large designs
Delay Comparison
Altera Quartus II / Synopsys PrimeTime SI
Static timing analysis to extract max. clock frequency
Compare for different FPGA speed grades
– FPGAs are binned for performance
– ASICs tend to be designed for worst-case
Delay Comparison Results (Fastest Speed Grade)
Logic only: 3.4x avg (1.9‒5.0x)
Logic + DSP: 3.5x avg (2.4‒4.7x)
Logic + Memory: 3.5x avg (2.8‒4.3x)
Logic + Memory + DSP: 3.0x avg (2.6‒3.5x)
Delay Comparison Results (Slowest Speed Grade)
Logic only: 4.6x avg (2.5‒6.7x)
Logic + DSP: 4.6x avg (3.0‒6.3x)
Logic + Memory: 4.8x avg (3.8‒5.7x)
Logic + Memory + DSP: 4.1x avg (3.8‒4.7x)
Impact of Hard Macros on Delay
Almost no benefit; sometimes a penalty!
– Fixed positions in FPGA; extra routing to use
– Fixed architecture; some apps. may not use efficiently
Power Comparison
Altera Quartus II Power Analyzer / Synopsys PrimePower
Compare power, not energy consumption
– FPGAs slower; need more time or parallelism
– Implement for highest speed possible
– Simulate at same operating frequency, voltage
Measure only core power
Assume constant toggle rates for all nets in design
– Meaningful test vectors not available for all designs
FPGA static power consumption scaled by used fraction
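The static-power scaling in the last bullet can be sketched as a proportional share of the whole-device figure. The slide says only "used fraction"; this sketch assumes the fraction is used area over total core area. The function name and all numbers are hypothetical.

```python
def scaled_static_power(device_static_w: float, used_area: float, device_area: float) -> float:
    """Attribute a share of whole-device static power to the used portion.

    Assumption (not from the source): the "used fraction" is an area fraction.
    """
    if device_area <= 0:
        raise ValueError("device_area must be positive")
    if not 0 <= used_area <= device_area:
        raise ValueError("used_area must lie between 0 and device_area")
    return device_static_w * (used_area / device_area)


# Hypothetical numbers: a design occupying a quarter of a 0.5 W device.
print(scaled_static_power(0.5, used_area=30.0, device_area=120.0))
```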
Power Comparison Results
Logic only: 14x avg (5.7‒52x)
Logic + DSP: 12x avg (7.5‒16x)
Logic + Memory: 14x avg (12‒16x)
Logic + Memory + DSP: 7.1x avg (5.3‒8.3x)
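One way to see how application-dependent the gaps are is to reduce each reported range to a max/min ratio. The sketch below does this for the area, delay, and power ranges listed in the results slides. Using the fastest-speed-grade delay figures and expressing spread as max divided by min are choices made here for illustration, not something the slides prescribe.

```python
from statistics import mean

# (min, max) gap ranges copied from the results slides.
RANGES = {
    "area": [(17, 54), (12, 58), (19, 70), (9.5, 26)],
    "delay (fastest grade)": [(1.9, 5.0), (2.4, 4.7), (2.8, 4.3), (2.6, 3.5)],
    "power": [(5.7, 52), (7.5, 16), (12, 16), (5.3, 8.3)],
}

# Spread of each range, expressed as largest gap over smallest gap.
ratios = [hi / lo for ranges in RANGES.values() for lo, hi in ranges]

print(f"smallest spread: {min(ratios):.1f}x")
print(f"largest spread:  {max(ratios):.1f}x")
print(f"average spread:  {mean(ratios):.1f}x")
```

Under these assumptions the spreads run from about 1.3x to about 9.1x, with an average near 3x.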
Impact of Hard Macros on Power
Slight benefit, primarily from area savings?
– Less area and interconnect
Power Consumption Caveats
May be disproportionate power in FPGA clock network
– “Overdesigned” for tested circuits
– Could have small incremental power increase
ASIC clock network would have to grow with designs
Static Power Comparison
Unable to draw useful conclusions about static power
– 87x for typical silicon, typical temp. (25°C)
– 5.4x for worst-case silicon, worst-case temp. (85°C)
Had to scale worst-case silicon temp. characterization
Subthreshold leakage is process-dependent
– Little information on leakage estimate factors
– Different processes from different foundries
Some correlation between static power and area gap (correlation coefficient ~0.8)
– Hard macros likely reduced static power penalty
Conclusions
Disparity hard to quantify; very application dependent
– Avg. max/min gap ratio 3x; max/min gap ratio range 1.3‒9.1x
All-LUT designs avg. 35x area, 3.4‒4.6x delay, 14x power
– 119x area, 47.6x power gap for equal performance (assuming ideal parallelization)
Hard macros reduce area and power, but have little performance benefit
– Avg. 18x area, 3‒4.1x delay, 7.1x power
– 54x area, 21.3x power for equal performance
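The equal-performance figures above are consistent with multiplying each average gap by the fastest-speed-grade delay gap, on the stated assumption that the FPGA recovers its speed deficit through ideal parallelization. A minimal sketch of that arithmetic follows; the function and variable names are illustrative only.

```python
def equal_performance_gap(gap: float, delay_gap: float) -> float:
    """Scale an area or power gap by the delay gap.

    Models ideal parallelization: an FPGA that is delay_gap times slower
    needs delay_gap parallel copies to match ASIC throughput.
    """
    return gap * delay_gap


# Average gaps from the slides (delay at the fastest speed grade).
all_lut = {"area": 35, "power": 14, "delay": 3.4}
hard_macro = {"area": 18, "power": 7.1, "delay": 3.0}

for label, g in (("all-LUT", all_lut), ("hard macros", hard_macro)):
    area = equal_performance_gap(g["area"], g["delay"])
    power = equal_performance_gap(g["power"], g["delay"])
    print(f"{label}: {area:.1f}x area, {power:.1f}x power")
```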
References
Jones, Jr., H. S., Nagle, P. R., Nguyen, H. T., “A Comparison of Standard Cell and Gate Array Implementations in a Common CAD System,” Proc. IEEE CICC, 1986, pp. 228‒232
Brown, S. D., Francis, R., Rose, J., Vranesic, Z., Field-Programmable Gate Arrays, Norwell, MA: Kluwer, 1992
Zuchowski, P. S., Reynolds, C. B., Grupp, R. J., Davis, S. G., Cremen, B., Troxel, B., “A Hybrid ASIC and FPGA Architecture,” Proc. ICCAD, Nov. 2002, pp. 187‒194
Wilton, S. J., Kafafi, N., Wu, J. C. H., Bozman, K. A., Aken’Ova, V., Saleh, R., “Design Considerations for Soft Embedded Programmable Logic Cores,” IEEE JSSC, vol. 40, no. 2, pp. 485‒497, Feb. 2005
Compton, K., Hauck, S., “Automatic Design of Area-Efficient Configurable ASIC Cores,” IEEE Trans. Comp., vol. 56, no. 5, pp. 662‒672, May 2007