Explaining The Gap Between ASIC and Custom Power: A Custom Perspective
Andrew Chang, Cadence Design Systems*
William J. Dally, Computer Systems Laboratory, Stanford University
* Work done while author was at Stanford

Design Tradeoffs: Power vs. Performance
1. Move to More Energy Efficient Operating Point: More Energy Efficient w/ Custom
2. Trade Performance for Power: Larger Range w/ Custom
3. Move to Different Power vs. Performance Curve: More Architectural Choice with Custom
[Figure: Power (y-axis) vs. Performance (x-axis), with the three moves marked 1, 2, and 3]

Dynamic Power Dissipation
$P_{dyn} = \alpha C V_{dd}^2 f = E_{circuit} f$
Reduce $V_{dd}$: static, dynamic, voltage islands, power gating
Reduce $\alpha$ and/or $f$: clock gating, block enables, bus encoding, glitch identification and elimination
Reduce $E_{circuit}$: engineer interconnects, increase circuit efficiency, subthreshold circuit techniques
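
As a rough illustration of why supply voltage is the first lever listed, the sketch below evaluates the switching-power relation at a hypothetical operating point. It assumes the standard form with an activity factor (called `alpha` here, a symbol that appears to have been lost from the extracted slide text); every numeric value is made up for the example.

```python
def dynamic_power(alpha: float, c_farads: float, vdd: float, f_hz: float) -> float:
    """Switching power in watts: alpha * C * Vdd^2 * f."""
    return alpha * c_farads * vdd ** 2 * f_hz

# Hypothetical operating point (not taken from the talk).
base = dynamic_power(alpha=0.1, c_farads=1e-9, vdd=1.2, f_hz=500e6)

# Halving Vdd cuts power quadratically; halving f cuts it linearly.
half_vdd = dynamic_power(alpha=0.1, c_farads=1e-9, vdd=0.6, f_hz=500e6)
half_f = dynamic_power(alpha=0.1, c_farads=1e-9, vdd=1.2, f_hz=250e6)

vdd_gain = base / half_vdd
f_gain = base / half_f
print(f"base = {base:.3f} W, Vdd/2 gain = {vdd_gain:.1f}x, f/2 gain = {f_gain:.1f}x")
```

Note that lowering f alone reduces power but not energy per operation, which is why the talk later compares designs by energy rather than by power.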

Static Power Dissipation
$P_{static} = V_{dd} (I_{sub} + I_{ox})$
$I_{sub} = K_1 W e^{-V_t / (n V_\theta)} (1 - e^{-V_{gs} / V_\theta})$
$I_{ox} = K_2 W (V_{gs} / t_{ox})^2 e^{-\alpha t_{ox} / V_{gs}}$
With $K_1$, $K_2$, $n$, and $\alpha$ experimentally determined ($V_\theta$ is the thermal voltage)
Reduce $V_{dd}$: static, dynamic, voltage islands, power gating
Increase effective $V_t$: substituting high-threshold devices, transistor stacking, static and active body bias
Reduce effective $W$: reduce number and size of devices in design

Which Design Is More Efficient?
0.7um CMOS 173MHz chip w/ 460K T’s: Vdd (typ) = 3.3V, Vdd (min) = 1.1V, Power = 845mW
0.18um CMOS 10kHz chip w/ 640K T’s: Vdd (max) = 1.8V, Vdd (min) = 0.18V, Power = 1.6mW

Talk Outline
Normalized Metric: Ebit
Effect of Architecture
ASIC vs. Custom
Building Blocks
Achievable Energy Efficiency
16b 1024 FFT Example
Answer to “Which Design is More Efficient”

Defining Ebit
$E_{bit} = C_{bit} V_{dd}^2$
$C_{bit} = 4 \times 2\,\mathrm{fF}/\mu\mathrm{m} \times W_{min}$
Energy needed to write a 1-bit SRAM cell
Approximates minimum useful capacitance
The ratio of Ebit to the energy for a range of circuits remains largely constant with technology scaling
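
The definition can be sketched as a small helper. The device width and supply voltage used below are hypothetical placeholders chosen for round numbers, not values from the talk, and the function names are illustrative only.

```python
def c_bit_ff(w_min_um: float, cap_per_um_ff: float = 2.0, factor: int = 4) -> float:
    """Cbit in femtofarads: factor * (fF per um of width) * minimum width in um."""
    return factor * cap_per_um_ff * w_min_um

def e_bit_fj(w_min_um: float, vdd: float) -> float:
    """Ebit in femtojoules: Cbit * Vdd^2 (fF * V^2 = fJ)."""
    return c_bit_ff(w_min_um) * vdd ** 2

# Hypothetical process: 0.25 um minimum width at a 1.0 V supply.
example = e_bit_fj(w_min_um=0.25, vdd=1.0)
print(f"Cbit = {c_bit_ff(0.25):.1f} fF, Ebit = {example:.1f} fJ")
```

Because both the capacitance and the supply shrink with each process generation, expressing other circuit energies as multiples of Ebit factors most of the technology dependence out.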

Technology Scaling for Ebit
λ is a normalized unit of distance equal to the M1 pitch
[Table residue, column headers not recoverable: 0.5 µm technology: 58, 18; 0.18 µm technology: 5.7, 18; unit label µm²]

Technology Scaling for Nand2
λ is a normalized unit of distance equal to the M1 pitch
4λ = 2.24 µm
8λ = 4.48 µm
[Figure: NAND2 gate with inputs A and B and output YN]

Applying Ebit

| Energy | 180nm | 130nm | 90nm | 65nm |
|---|---|---|---|---|
| Ebit (fJ) | 3.3 | 1.4 | 0.5 | 0.36 |

| Relative | 180nm | 130nm | 90nm | 65nm |
|---|---|---|---|---|
| Ebit | 1 | 1 | 1 | 1 |
| 1b FO4 | ~10 | ~10 | ~10 | ~10 |
| 1b SP-SRAM | 0.3-7 | 0.3-7 | 0.3-7 | 0.3-7 |
| 1b RF | 4-20+ | 4-20+ | 4-20+ | 4-20+ |
| 1b DFF | 20-30+ | 15-30+ | 10-30+ | 10-30+ |
| 1b Nand2 | 11-30 (typ 19) | 5-30 (typ 14) | 5-30 (typ 14) | 5-30 (typ 14) |
| Move 1b 1000λ | ~100 | ~100 | ~100 | ~100 |
| Move 1b 1.5mm | 268 | 367 | 467 | 714 |

Effect of Architecture
ASIC Architecture: 6x Efficiency

| | NVIDIA GeForceFX | Intel Pentium-4 |
|---|---|---|
| Design Style | ASIC | Custom |
| Clock | 400MHz | 2600MHz |
| Transistors | 125M | 55M |
| Power | ~20 Watts | ~60 Watts |
| Throughput | 10 GFlops & 13 GB/s | 5 GFlops & 5 GB/s |

Custom Circuits: 9x (7x) Efficiency

| | NVIDIA GeForceFX | Intel Pentium-4 |
|---|---|---|
| Design Style | Custom | Custom |
| Clock | 400MHz | 2600MHz |
| Transistors | 125M | 55M |
| Power | ~3 Watts | ~60 Watts |
| Throughput | 10 GFlops & 13 GB/s | 5 GFlops & 5 GB/s |
| Vdd | 0.65V | 1.3V |

Combined Architecture and Circuits: 40x+ Improvement, but 1.5 Years vs. 3+ Years
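
The headline ratios on these slides can be checked with a few lines of arithmetic. The sketch below uses only the GFlops and wattage figures quoted for the GeForceFX and the Pentium-4, and treats GFlops per watt as the efficiency measure; that choice of measure is an assumption, since the slides do not state how the 6x and 40x figures were derived.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Throughput efficiency in GFlops per watt."""
    return gflops / watts

pentium4 = gflops_per_watt(5, 60)          # custom circuits, general-purpose architecture
geforce_asic = gflops_per_watt(10, 20)     # ASIC flow, specialized architecture
geforce_custom = gflops_per_watt(10, 3)    # the slide's projected custom version at 0.65 V

architecture_gain = geforce_asic / pentium4     # effect of architecture alone
circuit_gain = geforce_custom / geforce_asic    # effect of custom circuits alone
combined_gain = geforce_custom / pentium4       # both together

print(f"architecture: {architecture_gain:.1f}x")
print(f"circuits:     {circuit_gain:.1f}x")
print(f"combined:     {combined_gain:.1f}x")
```

Under this measure the architecture gain comes out at 6x, the circuit gain at roughly 6.7x (in line with the parenthesized 7x), and the combined gain at 40x.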

ASIC vs. Custom
ASIC Methods: provide only coarse-grain control (100K+ gates), but require much less effort and historically scale with complexity
Custom Methods: offer fine-grain control (individual transistors & gates), but require large effort and scale poorly with complexity
Exploits Design Structure
Exploits Circuit Techniques

Custom Methods Emphasize Fine-Grain Manual Control + Custom Library

| Design Style | Gate Library | Floorplanning/Partitioning | Coarse Placement | Detailed Placement | Coarse Routing | Detailed Routing |
|---|---|---|---|---|---|---|
| Custom | Complex, Specific | Manual | Manual | Manual | Manual | Manual |
| ASIC | Simple, Generic | Manual/Automated (Automated w/ Hints) | Automated | Automated | Automated | Automated |

Operation and Performance Characterized for the Specific Case

ASIC Methods Substitute Coarse-Grain Control: Automation + Generic Library
Operation and Performance Characterized for the Typical/Generic Case

ASIC Focus on 100K+ Gates: Lost Opportunities to Exploit Structure
Designs reuse similar basic building blocks
Building blocks: 1-10K gates, not 100K+ gates
64-bit adder: 1K gates
64x64 rf: 2K gates
64x64 multiplier: 20K gates
Opportunities to exploit these structures are lost when the design is viewed in large chunks

Different Architectures, Similar Building Blocks
Significant Structure Exists Within 100K-gates
[Figure: annotated die plots of the 1998 “MAP” 64b Microprocessor, 5M T’s (MIT/Stanford), and the 2002 “Imagine” 32b Stream Processor, 22M T’s (Stanford). Legend for both: EX, RF, SRAM, XCVRS, Bus, LC. MAP blocks: Bank 0, Bank 1, CLST 0 to CLST 2, NIF/ROUTER, MEMORY SWITCH, CLUSTER SWITCH, EMI, LTLB. Imagine blocks: Cluster0 to Cluster7, Microcontroller.]
Energy of 100K-gate Equivalent
ASIC (N2) = 1400K Ebits (typ)
Custom Logic = 424K Ebits*
SRAM (small) = 1085K Ebits
SRAM (med) = 155K Ebits
SRAM (large) = 50K Ebits
*Based on data extracted from Intel McKinley
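
The ASIC figure is consistent with multiplying 100K gates by the typical NAND2 energy of 14 Ebits from the Applying Ebit table. That derivation is inferred here rather than stated on the slide, so the sketch below should be read as a plausibility check; the variable names are illustrative.

```python
GATES = 100_000
NAND2_TYP_EBITS = 14   # "typ 14" for 1b Nand2 at 130nm and below (Applying Ebit table)

asic_ebits = GATES * NAND2_TYP_EBITS   # assumed derivation of the ASIC (N2) figure

# Remaining values are copied from the slide, in Ebits.
slide_ebits = {
    "Custom Logic": 424_000,
    "SRAM (small)": 1_085_000,
    "SRAM (med)": 155_000,
    "SRAM (large)": 50_000,
}

# How many times less energy than the ASIC NAND2 equivalent each option uses.
advantage = {name: asic_ebits / ebits for name, ebits in slide_ebits.items()}

print(f"ASIC (N2): {asic_ebits // 1000}K Ebits")
for name, ratio in advantage.items():
    print(f"{name}: {ratio:.1f}x less energy than ASIC")
```

The custom-logic ratio of about 3.3x lines up with the 3x energy advantage claimed in the summary.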

Exploiting Circuit Techniques
Custom circuits more efficient
Reduced parasitics
1.7x circuit techniques and flops
1.4x libraries
1.4x due to engineering interconnects
Subthreshold Circuits
Low performance but ultra-low power
Requires architecture, gates, memories, CAD tools

Relating Power to Performance: CV/I, Idsat, tFO4
Relating Vdd and Vt to tFO4
$I_{dsat} = K_3 L_{eff}^{-0.5} t_{ox}^{-0.8} (V_{gs} - V_t)^{1.25}$
$t_{FO4} = K_4 [C_{eff} V_{dd} / I_{dsat}]$, with $K_4 \approx 13.5$

Relating Power to Performance: Correlation to Reported Foundry Data

| Technology | Node | CV/I est (ps) | CV/I reported (ps) | tFO4 est (ps) |
|---|---|---|---|---|
| Foundry A | 180-nm | 3.94 | 3.70 | 53 |
| Foundry A | 130-nm | 2.55 | 2.17 | 34 |
| Foundry A | 90-nm | 1.85 | 2.04 | 25 |
| Foundry A | 65-nm | 1.45 | 1.00 | 20 |
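
The tFO4 column follows directly from the estimated CV/I column and the constant K4 of about 13.5. A short sketch that reproduces it from the tabulated estimates (the dictionary name is illustrative):

```python
K4 = 13.5  # fitting constant quoted on the slide (K4 ~ 13.5)

# Estimated CV/I in picoseconds, per technology node, from the foundry table.
cv_over_i_est_ps = {"180-nm": 3.94, "130-nm": 2.55, "90-nm": 1.85, "65-nm": 1.45}

# tFO4 = K4 * (Ceff * Vdd / Idsat), where CV/I is the bracketed term.
t_fo4_ps = {node: K4 * cv_i for node, cv_i in cv_over_i_est_ps.items()}

for node, t in t_fo4_ps.items():
    print(f"{node}: tFO4 = {t:.1f} ps")
```

Rounded to whole picoseconds, the results match the tabulated 53, 34, 25, and 20 ps.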

Achievable Power Improvement (Assuming 50/50 Split of Logic and Memory)

| Technique | Type | Custom vs. ASIC | Energy | Type |
|---|---|---|---|---|
| Circuit Styles and Flops | Dynamic | 1.7 | 0.815 | Logic |
| Libraries + Vdd Scaling | Dynamic | 1.4 | 0.855 | Logic |
| SRAM Circuits | Dynamic | 2 | 0.95 | SRAM |
| Interconnect + Vdd Scaling | Dynamic | 1.4 | 0.855 | Interconnect |

Achievable Power Improvement (Assuming 50/50 Split of Logic and Memory)

| Technique | Type | Custom vs. ASIC | Energy | Type |
|---|---|---|---|---|
| Bit Encoding | Dynamic | 1 | 0.84 | Interconnect |
| Clock Gating | Dynamic | 1 | 0.84 | Chip |
| Frequency Scaling | Dynamic | 1 | 0.5 | Chip |
| Subthreshold Circuits | Dynamic | N/A | 0.062 | Chip |
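
Achievable Power Improvement (Assuming 50/50 Split of Logic and Memory)

| Technique | Type | Custom vs. ASIC | Energy | Type |
|---|---|---|---|---|
| Vdd Scaling | Static | 1 | 0.79 | Chip |
| MT-CMOS | Static | 1 | 0.5 | Chip |
| Stacking and input state vector | Static | 1.4 | 0.7 | Chip |
| Body Bias | Static | 2 | 0.5 | Chip |
| Supply Gating | Static | 10 | 0.1 | Chip |

Stacking, body bias, and supply gating: typically only one of these three is applied.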

Achievable Power Improvement (Assuming 50/50 Split of Logic and Memory)

| Type | 130-nm ASIC (Custom) | 90-nm ASIC (Custom) |
|---|---|---|
| Net Dynamic | 45% (32%) | 28% (20%) |
| Net Static | 8% (4%) | 20% (10%) |
| Total | 53% (36%) | 48% (30%) |

130nm uP assumes 80% Dynamic and 20% Static
90nm uP assumes 50% Dynamic and 50% Static
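
The net figures can be reproduced by scaling each node's dynamic/static split by a residual fraction per design style. The residual fractions below are not given in the talk; they are back-calculated from the table (for example, 45% / 80% = 0.5625) and are an assumption of this sketch.

```python
# Dynamic/static share of total power per node, as stated on the slide.
SHARES = {"130-nm": (0.80, 0.20), "90-nm": (0.50, 0.50)}

# ASSUMED residual fractions (dynamic, static) after optimization,
# back-calculated from the table rather than quoted from the source.
RESIDUAL = {"ASIC": (0.5625, 0.40), "Custom": (0.40, 0.20)}

def net_power(node: str, style: str) -> tuple[float, float, float]:
    """Return (net dynamic, net static, total) as fractions of original power."""
    dyn_share, stat_share = SHARES[node]
    dyn_res, stat_res = RESIDUAL[style]
    net_dyn = dyn_share * dyn_res
    net_stat = stat_share * stat_res
    return net_dyn, net_stat, net_dyn + net_stat

percent = {
    (node, style): tuple(round(100 * x) for x in net_power(node, style))
    for node in SHARES
    for style in RESIDUAL
}

for (node, style), (dyn, stat, total) in percent.items():
    print(f"{node} {style}: dynamic {dyn}%, static {stat}%, total {total}%")
```

With those fractions the same two multipliers explain both nodes, and the shift from 80/20 to 50/50 is what moves the savings from the dynamic to the static column.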

16b 1024 point FFT
Generally, k N log N operations (complex multiplies) with pre-computation
Radix-2, Radix-4, etc. implementations
Decimation in time and/or decimation in frequency
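
For a sense of scale, the sketch below counts twiddle-factor multiplies for a 1024-point transform using the textbook radix-2 and radix-4 butterfly counts. Trivial twiddles (multiplies by 1 or -j) are included in the counts, so these are upper bounds rather than figures from the talk.

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for a power of two, else raise ValueError."""
    stages = n.bit_length() - 1
    if n <= 0 or (1 << stages) != n:
        raise ValueError("n must be a power of two")
    return stages

def radix2_complex_multiplies(n: int) -> int:
    """(N/2) butterflies per stage, log2(N) stages, one twiddle multiply each."""
    return (n // 2) * log2_exact(n)

def radix4_complex_multiplies(n: int) -> int:
    """(N/4) butterflies per stage, log4(N) stages, three twiddle multiplies each."""
    stages2 = log2_exact(n)
    if stages2 % 2:
        raise ValueError("n must be a power of four")
    return 3 * (n // 4) * (stages2 // 2)

N = 1024
print(radix2_complex_multiplies(N))
print(radix4_complex_multiplies(N))
```

The drop from radix-2 to radix-4 is one concrete instance of the constant k changing with the implementation choice.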

Range of Implementations
MIT FFT (2005): 0.18um CMOS, 628K T’s, 10KHz: Architecture and subthreshold circuits, 180mV operation
Spiffee (1999): 0.7um CMOS, 460K T’s, 173MHz: Cached FFT architecture and algorithm, 1.1V operation
SA-1100 (1999): 0.35um CMOS, 2.6M T’s, 74MHz: Commercial embedded processor, custom circuits, 1.5V operation
Imagine (2003): 0.15um CMOS, 22M T’s, 232MHz: Streaming media processor, tiled standard cells, 1.2V operation
Stratix IS25F627C8 (2005): 0.13um CMOS, 3.9K logic elements, 123K memory bits, 24 DSP blocks, 272MHz: Commercial FPGA co-processor
Intel P4 (2003): 0.13um CMOS, 3GHz, SSE: Commercial general purpose processor, custom circuits, 1.5V operation
TI 'C6416 (2003): 0.13um CMOS, 720MHz: Commercial digital signal processor
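
Ebit Energy 16b 1024 point FFT

| Design | Fab (nm) | Vdd (V) | MHz | mW | Cycles |
|---|---|---|---|---|---|
| MIT FFT | 180 | 1.8 | 0.01 | 1.6 | 95 |
| Spiffee | 700 | 3.3 | 173 | 845 | 5190 |
| SA-1100 | 350 | 2 | 74 | 39 | 31500 |
| Imagine | 150 | 1.5 | 232 | 4000 | 3708 |
| Stratix | 130 | 1.3 | 275 | 884 | 1291 |
| Intel P4 | 130 | 1.2 | 3000 | 51200 | 71680 |
| TI 'C6416 | 130 | 1.2 | 720 | 1200 | 6526 |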

Ebit Energy 16b 1024 point FFT

| Design | EDP (rel norm) | Ebit (fJ) | Efft (nJ) | Normalized to Ebit (1e6) | Energy Ratio |
|---|---|---|---|---|---|
| MIT FFT | 143 | 3.3 | 154 | 47 | 1 |
| Spiffee | 1 | 91 | 25350 | 277 | 6 |
| SA-1100 | 283 | 4.2 | 16601 | 3953 | 85 |
| Imagine | 148 | 2.2 | 63931 | 29726 | 637 |
| Stratix | 24 | 1.4 | 4149 | 2964 | 64 |
| Intel P4 | 12548 | 1.4 | 1E+06 | 873813 | 18591 |
| TI 'C6416 | 27 | 1.4 | 10877 | 7769 | 166 |
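
For most rows, the Efft column is consistent with power times cycle count divided by clock frequency, and the normalized column with Efft divided by Ebit. The sketch below checks that reading against five rows; the formula is inferred from the numbers rather than stated in the talk. The MIT FFT row is left out because its values as transcribed (1.6 mW, 95 cycles, 0.01 MHz) do not reproduce the listed 154 nJ under this formula.

```python
# (design, MHz, mW, cycles, Efft in nJ as listed, Ebit in fJ as listed)
ROWS = [
    ("Spiffee", 173, 845, 5190, 25350, 91),
    ("SA-1100", 74, 39, 31500, 16601, 4.2),
    ("Imagine", 232, 4000, 3708, 63931, 2.2),
    ("Stratix", 275, 884, 1291, 4149, 1.4),
    ("TI 'C6416", 720, 1200, 6526, 10877, 1.4),
]

def fft_energy_nj(mw: float, cycles: int, mhz: float) -> float:
    """Energy per FFT in nJ: (mW * cycles) / MHz = 1e-3 W * cycles / 1e6 Hz = 1e-9 J."""
    return mw * cycles / mhz

def normalized_to_ebit_millions(efft_nj: float, ebit_fj: float) -> float:
    """Efft in millions of Ebits: nJ / fJ is already a factor of 1e6."""
    return efft_nj / ebit_fj

computed = {name: fft_energy_nj(mw, cyc, mhz) for name, mhz, mw, cyc, _, _ in ROWS}
listed = {name: efft for name, _, _, _, efft, _ in ROWS}

for name, _, _, _, efft, ebit in ROWS:
    norm = normalized_to_ebit_millions(computed[name], ebit)
    print(f"{name}: Efft = {computed[name]:.0f} nJ (listed {efft}), {norm:.0f}e6 Ebits")
```

Small differences in the normalized column (for example Spiffee and Imagine) are to be expected, since the Ebit values in the table are rounded.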

Which Design Is More Efficient? Depends on the Metric!
0.7um CMOS 173MHz chip w/ 460K T’s: Vdd (typ) = 3.3V, Vdd (min) = 1.1V, Power = 845mW, EDP 143x better
0.18um CMOS 10kHz chip w/ 640K T’s: Vdd (max) = 1.8V, Vdd (min) = 0.18V, Power = 1.6mW, Absolute energy 6x better
Summary
A normalized metric, Ebit, enables meaningful comparisons across designs and technologies
Custom designers can exploit a wide range of optimizations: enabling architecture with circuits and circuits with architecture
Custom designs can readily achieve a 3x advantage in energy, with the potential for over 10x
Selective application of custom techniques, together with automated support for performance characterization at specific instead of generic operating points, can enable ASIC designers to begin to bridge this power gap.
Back-Up Slides

ASICs Rely on General Optimization Techniques
Focus: Improve the Average Case
Partitioning: hypergraph min-cut, ratio cut
Solutions: move-based, geometric & combinatorial forms, clustering
Hypergraph H(V,E), E = {e1, e2, …} nets
[Figure: a circuit and its hypergraph, with vertices V1 to V5 and nets e1 to e8; vertex & edge weights used to encode costs]
Designs with Structure Do Not Exhibit Average Characteristics
64b Multiplier (half-array)
Clear Disparity in Resource Usage
[Figure labels: Routing, Density]