Literature Review: Measuring the Gap Between FPGAs and ASICs. Ian Kuon, Jonathan Rose, University of Toronto. IEEE TCAD/ICAS, February 2007. Henry Chen, February 26, 2010.

Page 1: Literature Review

Literature Review

Measuring the Gap Between FPGAs and ASICs

Ian Kuon, Jonathan Rose, University of Toronto

IEEE TCAD/ICAS, February 2007

Henry Chen, February 26, 2010

Page 2: Literature Review

Introduction

Trade-offs between FPGAs and standard-cell ASICs
– Decreased NRE, design time
– Increased silicon area, power; decreased performance

FPGA inefficiencies known and accepted, but largely unquantified

Page 3: Literature Review

Previous Comparisons

Jones et al. (1986): MPGAs to standard cells
– 1.5‒2.6x area, ~1.1x delay
– Estimates based on only 5 circuits

Brown et al. (1992): FPGAs to MPGAs
– 8‒12x area, ~3x delay
– Optimistic FPGA gate counting?
– Anecdotal evidence
– Doesn’t consider “hard” macros (multipliers, memories)

Combine for FPGAs to standard cells
– 12‒38x area, ~3.4x delay
– Dated; based on (questionable?) extractions

Page 4: Literature Review

Previous Comparisons (2000’s)

Zuchowski et al. (2002): LUT to ASIC gate (0.25μm‒90nm)
– ~1/45 gate density, 12‒14x delay, ~500x dynamic power
– Unexplained process-dependent density/power variation
– Dependent on gates implemented per LUT

Wilton et al. (2005): Partial programmable replacement
– 88x area, 2x delay
– Single logic module

Compton & Hauck (2007): FPGA apps. to standard-cell
– Avg 7.2x area
– Scaled FPGA 0.15μm to 0.18μm standard-cell

Page 5: Literature Review

Methodology

Implement in both FPGA and standard-cell
– Altera Stratix II FPGA: TSMC 90nm multi-Vt, 1.2V
– Standard-cell: ST CMOS090 90nm, dual-Vt, 1.2V

Empirical results from 23 benchmarks
– Rejected if different synthesis tools resulted in >5% register count deviation
– Mix of logic, memory, DSP

Analyze gains from FPGA’s DSP and memory blocks

Exclude I/Os

Have device data from Altera
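The benchmark rejection rule can be sketched as a simple filter. This is a minimal illustration, assuming the deviation is measured relative to the ASIC register count (the slides do not say which count is the reference); the benchmark names and register counts are made up.

```python
def register_deviation(fpga_regs: int, asic_regs: int) -> float:
    """Relative register-count difference between the two synthesis results.

    Assumption: the ASIC count is used as the reference.
    """
    return abs(fpga_regs - asic_regs) / asic_regs


def keep_benchmark(fpga_regs: int, asic_regs: int, threshold: float = 0.05) -> bool:
    """Keep a benchmark only if the register counts deviate by at most 5%."""
    return register_deviation(fpga_regs, asic_regs) <= threshold


# Hypothetical (FPGA registers, ASIC registers) pairs, for illustration only.
candidates = {
    "design_a": (1000, 1020),  # about 2% deviation, kept
    "design_b": (1000, 1200),  # about 17% deviation, rejected
}

kept = [name for name, (f, a) in candidates.items() if keep_benchmark(f, a)]
print(kept)
```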

Page 6: Literature Review

Implementations

FPGA
– Altera-provided CAD flow
– Speed/area balanced optimization; optimize critical paths for performance, otherwise optimize area
– Automatic DSP, memory block inference
– Set to mimic effects of high resource utilization

ASIC
– Synopsys/Cadence synthesis/PAR flow
– Free to choose from high/standard-Vt cells
– Timing-driven placement; target 75‒85% utilization
– Emphasized performance in compiled memories

Page 7: Literature Review

Area Comparison

ASIC
– Post-PAR core area
– Include memory macros

FPGA
– Count only silicon area for used resources
– Include surrounding routing resources
– Count full block area even if only partially used
– Area data from Altera
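The FPGA area accounting above can be sketched as follows. This is only an illustration: the resource names, per-block areas (taken here to already include surrounding routing), and usage numbers are placeholders, not Altera data.

```python
import math


def fpga_used_area(blocks_used: dict, block_area: dict) -> float:
    """Sum the silicon area of the used resources.

    A partially used block (fractional count) is charged its full area,
    which is why the count is rounded up.
    """
    return sum(math.ceil(count) * block_area[kind] for kind, count in blocks_used.items())


# Placeholder per-block areas in arbitrary units (assumed to include routing).
block_area = {"logic_block": 1.0, "dsp_block": 10.0, "memory_block": 4.0}

# Placeholder usage: 120.5 logic blocks means the 121st is only half used.
blocks_used = {"logic_block": 120.5, "dsp_block": 2.25, "memory_block": 3.0}

fpga_area = fpga_used_area(blocks_used, block_area)
asic_core_area = 8.0  # placeholder post-PAR core area, same units

area_gap = fpga_area / asic_core_area
print(f"FPGA area {fpga_area}, area gap {area_gap:.1f}x")
```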

Page 8: Literature Review

Area Comparison Results

Logic only: 35x avg (17‒54x)

Logic + DSP: 25x avg (12‒58x)

Logic + Memory: 33x avg (19‒70x)

Logic + Memory + DSP: 18x avg (9.5‒26x)

Page 9: Literature Review

Impact of Hard Macros on Area

Smaller area penalty for designs using hard macros
– Hard macro close to ASIC implementation (plus programmable interface & routing)

Page 10: Literature Review

Area Comparison Caveats

Pessimistic FPGA area estimation; count full resource area even if only partially used (~5‒10% reduction)

ASIC density may decrease for larger designs, while FPGAs are designed to handle large designs

Page 11: Literature Review

Delay Comparison

Altera Quartus II / Synopsys PrimeTime SI

Static timing analysis to extract max. clock frequency

Compare for different FPGA speed grades
– FPGAs are binned for performance
– ASICs tend to be designed for worst-case

Page 12: Literature Review

Delay Comparison Results (Fastest Speed Grade)

Logic only: 3.4x avg (1.9‒5.0x)

Logic + DSP: 3.5x avg (2.4‒4.7x)

Logic + Memory: 3.5x avg (2.8‒4.3x)

Logic + Memory + DSP: 3.0x avg (2.6‒3.5x)

Page 13: Literature Review

Delay Comparison Results (Slowest Speed Grade)

Logic only: 4.6x avg (2.5‒6.7x)

Logic + DSP: 4.6x avg (3.0‒6.3x)

Logic + Memory: 4.8x avg (3.8‒5.7x)

Logic + Memory + DSP: 4.1x avg (3.8‒4.7x)

Page 14: Literature Review

Impact of Hard Macros on Delay

Almost no benefit; sometimes a penalty!
– Fixed positions in FPGA; extra routing to use
– Fixed architecture; some apps. may not use efficiently

Page 15: Literature Review

Power Comparison

Altera Quartus II Power Analyzer / Synopsys PrimePower

Compare power, not energy consumption
– FPGAs slower; need more time or parallelism
– Implement for highest speed possible
– Simulate at same operating frequency, voltage

Measure only core power

Assume constant toggle rates for all nets in design
– Meaningful test vectors not available for all designs

FPGA static power consumption scaled by used fraction
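A minimal sketch of the static power scaling, assuming the "used fraction" is the fraction of the device's core area occupied by the design (the slide does not spell out what the fraction is taken over). All numbers are illustrative.

```python
def scaled_static_power(device_static_w: float, used_area: float, total_area: float) -> float:
    """Attribute to the design only its share of the whole-device static power.

    Assumption: the share is proportional to the used fraction of core area.
    """
    if total_area <= 0 or not 0 <= used_area <= total_area:
        raise ValueError("used_area must lie between 0 and total_area")
    return device_static_w * (used_area / total_area)


# Illustrative values: a design using a quarter of a device that leaks 2.0 W.
design_static_w = scaled_static_power(device_static_w=2.0, used_area=25.0, total_area=100.0)
print(design_static_w)
```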

Page 16: Literature Review

Power Comparison Results

Logic only: 14x avg (5.7‒52x)

Logic + DSP: 12x avg (7.5‒16x)

Logic + Memory: 14x avg (12‒16x)

Logic + Memory + DSP: 7.1x avg (5.3‒8.3x)

Page 17: Literature Review

Impact of Hard Macros on Power

Slight benefit, primarily from area savings?
– Less area and interconnect

Page 18: Literature Review

Power Consumption Caveats

May be disproportionate power in FPGA clock network
– “Overdesigned” for tested circuits
– Could have small incremental power increase

ASIC clock network would have to grow with designs

Page 19: Literature Review

Static Power Comparison

Unable to draw useful conclusions about static power
– 87x for typical silicon, typical temp. (25°C)
– 5.4x for worst-case silicon, worst-case temp. (85°C)

Had to scale worst-case silicon temp. characterization

Subthreshold leakage is process-dependent
– Little information on leakage estimate factors
– Different processes from different foundries

Some correlation between static power and area gap (correlation coefficient ~0.8)
– Hard macros likely reduced static power penalty
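For reference, a correlation coefficient between per-benchmark area gaps and static power gaps could be computed as below. This is a plain Pearson correlation written out by hand; the slides do not say which estimator was used, and the sample gaps are invented for illustration rather than taken from the paper.

```python
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented per-benchmark gaps, for illustration only.
area_gaps = [17.0, 25.0, 33.0, 41.0, 54.0]
static_power_gaps = [40.0, 95.0, 70.0, 120.0, 110.0]

r = pearson(area_gaps, static_power_gaps)
print(f"correlation coefficient: {r:.2f}")
```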

Page 20: Literature Review

Conclusions

Disparity hard to quantify; very application dependent
– Avg. max/min gap ratio 3x; max/min gap ratio range 1.3‒9.1x

All-LUT designs avg. 35x area, 3.4‒4.6x delay, 14x power
– 119x area, 47.6x power gap for equal performance (assuming ideal parallelization)

Hard macros reduce area and power, but have little performance benefit
– Avg. 18x area, 3‒4.1x delay, 7.1x power
– 54x area, 21.3x power for equal performance
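The equal-performance figures follow from multiplying the area or power gap by the fastest-speed-grade delay gap, i.e. replicating the FPGA design enough times to match ASIC throughput under ideal parallelization. The sketch below reproduces that arithmetic. It also checks one reading of the spread figures in the first bullet, namely the ratio of the largest to the smallest gap within each category on the results slides (fastest speed grade for delay); that reading is an assumption, not something the slides state.

```python
def equal_performance_gap(gap: float, delay_gap: float) -> float:
    """Scale an area or power gap by the delay gap (ideal parallelization)."""
    return gap * delay_gap


# Average gaps from the slides; delay is the fastest-speed-grade figure.
all_lut = {"area": 35.0, "power": 14.0, "delay": 3.4}
hard_macros = {"area": 18.0, "power": 7.1, "delay": 3.0}

results = {}
for label, g in (("all-LUT", all_lut), ("hard macros", hard_macros)):
    results[label] = (
        equal_performance_gap(g["area"], g["delay"]),
        equal_performance_gap(g["power"], g["delay"]),
    )
    print(f"{label}: {results[label][0]:.1f}x area, {results[label][1]:.1f}x power")

# (min, max) gap per category, copied from the results slides.
ranges = {
    "area/logic": (17, 54), "area/dsp": (12, 58),
    "area/mem": (19, 70), "area/mem+dsp": (9.5, 26),
    "delay/logic": (1.9, 5.0), "delay/dsp": (2.4, 4.7),
    "delay/mem": (2.8, 4.3), "delay/mem+dsp": (2.6, 3.5),
    "power/logic": (5.7, 52), "power/dsp": (7.5, 16),
    "power/mem": (12, 16), "power/mem+dsp": (5.3, 8.3),
}

spread = [hi / lo for lo, hi in ranges.values()]
mean_spread = sum(spread) / len(spread)
print(f"spread: avg {mean_spread:.1f}x, range {min(spread):.1f}x to {max(spread):.1f}x")
```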

Page 21: Literature Review

References

Jones, Jr., H. S., Nagle, P. R., Nguyen, H. T., “A Comparison of Standard Cell and Gate Array Implementations in a Common CAD System,” Proc. IEEE CICC, 1986, pp. 228‒232

Brown, S. D., Francis, R., Rose, J., Vranesic, Z., Field-Programmable Gate Arrays, Norwell, MA: Kluwer, 1992

Zuchowski, P. S., Reynolds, C. B., Grupp, R. J., Davis, S. G., Cremen, B., Troxel, B., “A Hybrid ASIC and FPGA Architecture,” Proc. ICCAD, Nov. 2002, pp. 187‒194

Wilton, S. J., Kafafi, N., Wu, J. C. H., Bozman, K. A., Aken’Ova, V., Saleh, R., “Design Considerations for Soft Embedded Programmable Logic Cores,” IEEE JSSC, vol. 40, no. 2, pp. 485‒497, Feb. 2005

Compton, K., Hauck, S., “Automatic Design of Area-Efficient Configurable ASIC Cores,” IEEE Trans. Comp., vol. 56, no. 5, pp. 662‒672, May 2007