power-optimal pipelining in deep submicron technology
DESCRIPTION
ISLPED 2004 8/10/2004. Power-Optimal Pipelining in Deep Submicron Technology. Seongmoo Heo and Krste Asanovi ć Computer Architecture Group, MIT CSAIL. Traditional Pipelining. Goal: Maximum performance. Vdd. Setup. Clk-Q. Propagation Delay. Clk. Clk. Clk. - PowerPoint PPT PresentationTRANSCRIPT
Power-Optimal Pipelining in Deep Submicron Technology
Seongmoo Heo and Krste Asanović
Computer Architecture Group, MIT CSAIL
ISLPED 20048/10/2004
Traditional Pipelining• Goal: Maximum performance
Clk
Clk-Q SetupPropagation Delay
Clk
Clk
Vdd
Pipelining as a Low-Power Tool
Clk
Clk-Q SetupPropagation Delay
Time Slack
• Goal: Low-Power, Fixed Throughput
Time Slack
Vdd
Clk
Clk
Pipelining as a Low-Power Tool
Clk
Clk-Q SetupPropagation Delay
Time Slack
• Goal: Low-Power, Fixed Throughput
Time Slack
Vdd
Clk
Clk
Traded for Power(supply voltage scaling)
Pipelining as a Low-Power Tool
Delay
Power
Pipelining
Time slack
Flip-flopPowerOverhead
* Clock frequency fixed
Pipelining as a Low-Power Tool
Delay
Power
Supply voltage scaling
Power Saving
* Clock frequency fixed
Power-Optimal Pipelining• Power reduction from pipelining limited by power
overhead of increased number of flip-flops Power-Optimal Pipelining
Power-Optimal Pipelining
Delay
Power
Too shallow pipelining
• Power reduction from pipelining limited by power overhead of increased number of flip-flops Power-Optimal Pipelining
Power-Optimal Pipelining
Delay
Power
Too deep pipelining
Too shallow pipelining
• Power reduction from pipelining limited by power overhead of increased number of flip-flops Power-Optimal Pipelining
Power-Optimal Pipelining
Delay
Power
Optimal Power Saving
Too deep pipelining
Too shallow pipeliningOptimal pipelining
• Power reduction from pipelining limited by power overhead of increased number of flip-flops Power-Optimal Pipelining
Contribution• Pipelining is an old idea.• Research focus has been on performance impact of
pipelining.• Idea of using pipelining [Chandrakasan ’92] to lower
power has not been fully explored in deep submicron technology.
• Analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, clock gating
1. Impact of pipelining on power component
2. Impact of pipelining on total power (with/without clock-gating)
Bottom-to-Top Approach
TotalPower
(clock-gated)
Power
Timeactive activeinactive
SwitchingPower
Component
LeakagePower
Component
Idle Power
Component
1. Impact of pipelining on power component
2. Impact of pipelining on total power (with/without clock-gating)
Bottom-to-Top Approach
TotalPower
(not clock-gated)
SwitchingPower
Component
LeakagePower
Component
Idle Power
Component
Power
Timeactive inactive active
*Idle power = power consumed when circuit is idle and not clock-gated
• Target digital system: Fixed throughput, Highly parallel computation, Logic-dominant
• Test bench– BPTM (Berkeley Predictive Technology Model)
70nm process: – LVT(0.17/-0.2), MVT(0.19/-0.22), HVT(0.21/-0.24)– Hspice simulation at 100°C, Clock = 2 GHz
Methodology
BaselineN FO4 inverters (N = 2 ~ 24)
One Pipeline Stage
TG flip-flops TG flip-flops
Pipelining and Switching Power:Analytical Trend
O(N2)
O(1/N)
Number of FO4 per stage, N
Sw
itch
ing
Po
wer
OptimalSaving
Optimal FO4
Quadratic reductionof logic switching power
Flip-flop overhead
Vdd2 N2
O(1/N)
Lea
kag
e P
ow
er
O(N ) (1<< 2)
OptimalSaving
Optimal FO4
Superlinear reductionof logic leakage power
Flip-flop overhead
Pipelining and Leakage Power:Analytical Trend
Number of FO4 per stage, N
Vdd * e(Vdd) N
DIBL effect
Pipelining and Idle Power:Analytical Trend
• Clock-gating is not always possible– Increased control complexity – insufficient setup time of clock enable signal
• Leakage Power + Flip-flop Switching Power– Between leakage power scaling and flip-flop
switching power scaling depending on leakage level
Pipelining and Idle Power:Analytical Trend
O(1/N)
Number of FO4 per stage, N
Re
lati
ve
Po
we
r
O(N ) (1<< 2)
OptimalSaving
Optimal FO4
O(1/N)
Number of FO4 per stage, N
O(N)
OptimalSaving
Optimal FO4Linear reduction of Flip-flop switching power 1/N * Vdd
2 N
Leakage Power Scale
Flip-flop Switching Power Scale
Idle Power
Simulation Results:Power Components
Power Components
Switching Power
Leakage Power
Idle Power
Right hand side curve
O(N2) O(N ) (1<< 2)
O(N) or O(N ) (1<< 2)
Saving* 79(HVT)~ 82(LVT)%
70(LVT)~ 75(HVT)%
55(HVT)~ 70(LVT)%
N* 6 6 8
N = Number of FO4 inverters per stage(Not including flip-flop delay)
N* = Optimal N Saving* = Optimal power saving by pipelining
Fixed Throughput @ 2 GHz
Optimal Power Saving
Optimal FO4 = 6
ClockGating
NoClockGating
Optimal FO4 = 6~8
activity factor activity factor
rela
tive
po
we
r
rela
tive
po
we
r
*2 GHz*Flip-flop delaynot included in optimal FO4
IdlePower
SwitchingPower
SwitchingPower
LeakagePower
activity factor activity factor
rela
tive
po
we
r
rela
tive
po
we
r
Optimal Power Saving
Optimal FO4 = 6 Optimal FO4 = 6~8
ClockGating
NoClockGating
activity factor activity factor
rela
tive
po
we
r
rela
tive
po
we
r
Optimal Power Saving
Optimal FO4 = 6 Optimal FO4 = 6~8
LVT
ClockGating
NoClockGating
Discussion• LVT can be fast and power-efficient
– enables lower Vdd
• Flip-flop delay more important than flip-flop power for power-optimal pipelining
Limitation of This Work
Effect on optimal logic depth
Effect on optimal power saving
Super-linear growth of flip-flops
Additional memory Reduced glitches Parasitic wire capacitance
Conclusion• Pipelining is an effective low-power tool
when used to support voltage scaling in digital system implementing highly parallel computation.
• Optimal Logic Depth: 6-8 FO4– ~ 8-10 FO4 including flip-flop delay
• Optimal Power Saving: 55 – 80% – It depends on Vth, AF, Clock-Gating
• Insights:– Pipelining is more effective with High AF
• Pipelining is most effective at saving switching power
– Pipelining is more effective with lower Vth • Except for when leakage power is dominant.
– Pipelining is more effective with clock-gating • reduced flip-flop overhead.
Acknowledgments• Thanks to SCALE group members and
anonymous reviewers
• Funded by NSF CAREER award CCR-0093354, NSF ITR award CCR-0219545, and a donation from Intel Corporation.
BACKUP SLIDES