1
Modeling and Optimization of VLSI Interconnect (049031)
Lecture 6: Interconnect power
Avinoam Kolodny, Konstantin Moiseev
2
Outline
• Interconnect power modeling
  o Definition
  o Activity factor (AF) and signal probability (SP) and relations between them
  o Cross-coupling power. Miller Coupling Factor for timing and power
  o Relation between MCF and AF
  o AF and SP generation
• Interconnect power breakdown
  o Interconnect length distribution
  o Local and global interconnects and their power
  o Clock power
  o Interconnect power as a fraction of total power
• Interconnect power prediction
  o Interconnect length prediction: Rent’s rule, Donath’s model
  o Fanout prediction
3
1 Google search = ?
• Same energy as an 11-watt light bulb running for 1 hour
• Emits 7 g of CO2
• There are 0.4B Google searches daily
Adapted from Muhammad Abozaed, Intel
4
So why is power important?
• Mobile: battery life
• Reliability: power density
• User experience: skin temperature
• Servers: cooling costs, environmental heating
[Figure: processor power in Watts (log scale, 0.1 to 1000) vs. year (1970 to 2020) for the 8080, 8086, 386, Pentium® and Pentium® 4 processors, ending with the open question "1000's of Watts?"]
5
Electrical Energy
• Energy is defined as the ability to do work
• Electrical energy is energy stored in an electric field or transported by an electric current
• Electrical energy can be:
  o Dissipated as heat by an electric current flowing through a resistor
  o Stored in a capacitor
  o Transformed to magnetic field energy
• The work performed by a current $I$ on a section with voltage difference $V$ during time $T$ is:
  $E = \int_0^T I(t)\,V(t)\,dt$
6
Power
• Power is work performed per unit time
  o Measured in Watts
• In VLSI, the power is usually either consumed or dissipated
  o Consumed from the source
  o Dissipated by resistors (converted to heat)
• The average power dissipated by a current $I$ with voltage difference $V$ during time $T$ is:
  $P = \frac{1}{T}\int_0^T I(t)\,V(t)\,dt$
7
Power dissipation sources
Power dissipation:
• Dynamic power
• Static power
• Short-circuit power
8
Energy dissipation in RC circuit
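First stage – charging capacitor:
[Circuit: source VDD charges capacitor C through resistor R; I is the current, Vc the capacitor voltage, VR the resistor voltage]
• Capacitor current: $I_C = C\,\frac{dV_C}{dt}$
• Energy stored in the capacitor: $E_C = \int_0^T V_C(t)\,I_C(t)\,dt = C\int_0^T V_C(t)\,\frac{dV_C}{dt}\,dt = C\int_0^{V_{DD}} V_C\,dV_C = \frac{C V_{DD}^2}{2}$
• Energy consumed from the source: $E_S = V_{DD}\int_0^T I(t)\,dt = C V_{DD}\int_0^{V_{DD}} dV_C = C V_{DD}^2$
• Energy dissipated by the resistor (converted to heat): $E_R = E_S - E_C = \frac{C V_{DD}^2}{2}$
Assumption: $V_C(t = T) = V_{DD}$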
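The charging-stage result can be checked numerically. The sketch below is only an illustration: it assumes the ideal charging current I(t) = (VDD/R)·exp(-t/RC) and arbitrary component values (1 kOhm, 1 pF, 1 V) that do not come from the slides, and the function name is hypothetical.

```python
import math

def rc_charging_energies(R, C, vdd, t_end, steps=100_000):
    """Integrate source and resistor energy while C charges through R (midpoint rule)."""
    tau = R * C
    dt = t_end / steps
    e_source = 0.0
    e_resistor = 0.0
    for n in range(steps):
        t = (n + 0.5) * dt
        i = (vdd / R) * math.exp(-t / tau)   # assumed ideal RC charging current
        e_source += vdd * i * dt             # V_DD * I(t) dt
        e_resistor += i * i * R * dt         # I(t)^2 * R dt
    e_cap = e_source - e_resistor            # what is left is stored on the capacitor
    return e_source, e_resistor, e_cap

# Assumed example values, not taken from the lecture.
R, C, VDD = 1e3, 1e-12, 1.0
e_s, e_r, e_c = rc_charging_energies(R, C, VDD, t_end=20 * R * C)
print(e_s / (C * VDD**2))   # close to 1.0
print(e_r / (C * VDD**2))   # close to 0.5
print(e_c / (C * VDD**2))   # close to 0.5
```

The resistor value only sets how fast the capacitor charges; the dissipated energy stays at half of C·VDD² regardless of R.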
9
Energy dissipation in RC circuit
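Second stage – discharging capacitor:
[Circuit: same elements as in the first stage; capacitor C now discharges through resistor R]
• Capacitor current: $I_C = C\,\frac{dV_C}{dt}$
• Energy freed by the capacitor: $E_C = \int_0^T V_C(t)\,I_C(t)\,dt = C\int_0^T V_C(t)\,\frac{dV_C}{dt}\,dt = C\int_{V_{DD}}^{0} V_C\,dV_C = -\frac{C V_{DD}^2}{2}$ (negative: the energy leaves the capacitor)
• Energy consumed from the source: $E_S = V_{DD}\int_0^T I(t)\,dt = 0$ (no current is drawn from the source)
• Energy dissipated by the resistor (converted to heat): $E_R = -E_C = \frac{C V_{DD}^2}{2}$
Assumption: $V_C(t = T) = 0$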
10
Dynamic power dissipation in VLSI
• So, for two capacitor switches (charge and discharge), the energy dissipated is $C V_{DD}^2$
• For two switches of the signal during time $T$ (clock period), the average power dissipation is
  $P = \frac{C V_{DD}^2}{T} = C V_{DD}^2 f$
• If the signal switches $2\alpha$ times on average during time $T$, then the average power dissipation is
  $P = \alpha\, C V_{DD}^2 f$
• $\alpha$ is called the activity factor
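A minimal sketch of the dynamic power formula P = α·C·VDD²·f. The function name and the example values (1 pF, 1 V, 1 GHz) are assumptions chosen only to make the arithmetic easy to follow.

```python
def dynamic_power(alpha, c_farads, vdd, f_hz):
    """Average dynamic power P = alpha * C * V_DD^2 * f, in Watts."""
    return alpha * c_farads * vdd**2 * f_hz

# Assumed example: a 1 pF net at V_DD = 1 V and f = 1 GHz.
p_clock = dynamic_power(1.0, 1e-12, 1.0, 1e9)   # clock-like net, alpha = 1
p_data = dynamic_power(0.1, 1e-12, 1.0, 1e9)    # data net with alpha = 0.1
print(p_clock, p_data)
```

With the same capacitance, voltage and frequency, power scales linearly with the activity factor, which is why clock nets weigh so heavily in the breakdowns later in the lecture.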
11
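Dynamic power contributors
• Dynamic power dissipation: $P = \alpha\, C V_{DD}^2 f$
• The capacitance is contributed by three elements: $C_{upper}$, $C_{lower}$ and $C_{side}$
  [Figure: wire cross-section across Layer 1, Layer 2 and Layer 3, showing Cupper, Clower and the two Cside capacitances]
• Self-capacitance and cross-coupling capacitance:
  $C = C_{lower} + C_{upper} + C_{side,1} + C_{side,2}$
  where $C_{lower} + C_{upper}$ is the area and fringe capacitance and $C_{side,1} + C_{side,2}$ is the coupling capacitance
  $P = \alpha\,(C_{area+fringe} + C_{coupling})\,V_{DD}^2 f$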
12
Coupling capacitance calculation
• Coupling capacitance value depends on neighbor wires
• For quiet neighbors (tied to VDD or ground): $C_{side} = \varepsilon\,\frac{T L}{S}$
  [Figure: two parallel wires of length L and thickness T separated by spacing S]
• For switching neighbors the capacitance will depend on switching direction
  o Power calculation by equivalent circuit method
  o Power calculation by application of Miller’s theorem
13
Equivalent circuit method
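• Equivalent circuit for two coupled lines:
  [Circuit: sources V1 and V2 drive resistors R1 and R2, which are connected through the coupling capacitor Cc]
• Simplest case: wire is switched from 0 to VDD; neighbor is quiet and tied to ground, R1 = R2
  [Circuit: VDD, R1, Cc and R2 in series]
• Energy dissipated by each resistor (wire) in this case is $E = \frac{C_c V_{DD}^2}{4}$
• Total energy dissipated is $E = \frac{C_c V_{DD}^2}{2}$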
14
Equivalent circuit method
• For all cases of one quiet wire and one switched wire, the same results as in the previous slide are obtained
• Second case: both wires are switched simultaneously from 0 to VDD
  [Circuit, before and after switching: both lines (R1 and R2) are driven from 0 to VDD, with Cc between them]
• The current through the resistors is $I = \frac{V_{C_c}}{R_1 + R_2} = 0$ ($V_{C_c}$ is the voltage on the capacitor)
• No power dissipation in this case!
15
Equivalent circuit method
• Third case: both wires are switched simultaneously in opposite directions
  [Circuit, before and after switching: VDD moves from one line to the other, with Cc between R1 and R2]
• Current in the circuit: $I = \frac{V_{DD} - V_{C_c}}{R_1 + R_2}$ ($V_{C_c}$ is the capacitor voltage)
• Energy consumed by the second source is zero (voltage of source is zero)
• Energy consumed by the first source: $E = 2 V_{DD}^2 C_C$
• No energy change of the capacitor. It means all the energy is dissipated by the resistors
• Each resistor dissipates $C_C V_{DD}^2$, totally $2 C_C V_{DD}^2$
16
Miller’s theorem
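[Figure: an impedance Z connected between nodes Vx and Vy is replaced by Z1 from Vx to ground and Z2 from Vy to ground]
Z is impedance
$Z_1 = \frac{Z}{1 - A_V}$
$Z_2 = \frac{Z}{1 - 1/A_V}$
$A_V = \frac{V_y}{V_x}$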
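For a purely capacitive Z = 1/(jωC), the two Miller impedances correspond to grounded capacitances C1 = C·(1 - A_V) and C2 = C·(1 - 1/A_V). The sketch below is an illustration under that assumption; the function name is hypothetical and it only handles finite, non-zero A_V (the limiting cases A_V = 0 and A_V = ∞ need separate treatment).

```python
def miller_capacitances(c_c, a_v):
    """Split a floating capacitor C_C between nodes x and y into grounded C1 (at x) and C2 (at y).

    Follows Z1 = Z / (1 - A_V) and Z2 = Z / (1 - 1/A_V) with Z = 1 / (j*w*C_C).
    """
    if a_v == 0:
        raise ValueError("A_V = 0 makes Z2 = 0; handle this limiting case separately")
    c1 = c_c * (1 - a_v)
    c2 = c_c * (1 - 1 / a_v)
    return c1, c2

opposite = miller_capacitances(1.0, -1.0)   # wires switch in opposite directions
same = miller_capacitances(1.0, 1.0)        # wires switch in the same direction
print(opposite, same)
```

With A_V = -1 each node sees twice the nominal coupling capacitance, and with A_V = 1 the coupling capacitor effectively disappears.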
17
Usage of Miller’s theorem for coupling capacitance and power calculations
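[Figures: coupling capacitor CC between nodes Vx and Vy, and its Miller equivalent, for each switching case]
• Vx switches from 0 to VDD, Vy stays at 0: $A_V = 0$, $Z_1 = Z$, $Z_2 = 0$, $P_{total} = \frac{C_C V_{DD}^2}{2}$
• Vx and Vy both switch from 0 to VDD: $A_V = 1$, $Z_1 = Z_2 = \infty$ (disconnected), $P_{total} = 0$
• Vy switches from 0 to VDD, Vx stays at 0: $A_V = \infty$, $Z_1 = 0$, $Z_2 = Z$, $P_{total} = \frac{C_C V_{DD}^2}{2}$
• Vx and Vy switch in opposite directions: $A_V = -1$, $Z_1 = Z_2 = \frac{Z}{2}$ (a capacitance of $2C_C$ on each node), $P_{total} = 2 C_C V_{DD}^2$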
18
Observations
• Miller’s theorem gives the same results for total power dissipation as the equivalent circuit method; however, the results for the power dissipation of each individual wire are inaccurate
• Total power dissipation calculated by both methods is as follows:
  o For one-wire switch: $P_{total} = \frac{C_C V_{DD}^2}{2}$
  o For simultaneous switch in the same direction: there is no power dissipation
  o For simultaneous switch in opposite directions: $P_{total} = 2 C_C V_{DD}^2$
19
Miller factor for power
• Miller factor is used to account for the change in effective coupling capacitance due to switching
• Nominal coupling capacitance is multiplied by the Miller Coupling Factor (MCF) to obtain the effective capacitance:
  o For one-wire switching, MCF = 1: $P = \frac{(C_{area+fringe} + C_{coupling})\,V_{DD}^2}{2}$
  o For switching in the same direction, MCF = 0: $P = \frac{C_{area+fringe}\,V_{DD}^2}{2}$
  o For switching in opposite directions, MCF = 4: $P = \frac{C_{area+fringe}\,V_{DD}^2}{2} + 2\,C_{coupling}\,V_{DD}^2$
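The three MCF cases collapse into one expression, (C_area+fringe + MCF·C_coupling)·VDD²/2. The sketch below illustrates this with normalized, made-up capacitance values; the function and dictionary names are hypothetical.

```python
# MCF values for power, as listed on this slide.
MCF_FOR_POWER = {"one_wire": 1, "same_direction": 0, "opposite_direction": 4}

def switching_energy(c_area_fringe, c_coupling, vdd, mcf):
    """Dissipation per transition: (C_area+fringe + MCF * C_coupling) * V_DD^2 / 2."""
    return (c_area_fringe + mcf * c_coupling) * vdd**2 / 2

# Normalized example: C_area+fringe = C_coupling = 1, V_DD = 1 (assumed values).
results = {case: switching_energy(1.0, 1.0, 1.0, mcf)
           for case, mcf in MCF_FOR_POWER.items()}
print(results)
```

For MCF = 4 the coupling term becomes 4·C_coupling·VDD²/2 = 2·C_coupling·VDD², matching the opposite-direction result of the equivalent circuit method.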
20
Recall: MCF for delay
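[Figure: impedance Z between nodes Vx and Vy, replaced by Zx from Vx to ground and Zy from Vy to ground]
$Z_x = \frac{Z}{1 - k}$
$Z_y = \frac{k Z}{k - 1}$
$k = \frac{V_y}{V_x}$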
21
Activity factor
• Activity Factor (AF), a.k.a. toggle rate, is the average fraction of cycles in which a signal changes from 0 to 1 or from 1 to 0, as compared to the clock signal:
  $AF = \frac{\#\text{signal\_toggles\_in\_N\_cycles}}{2N}$
• Clock toggles twice a cycle, so its AF = 1
• Combinational logic data signal normally will have maximum AF = 0.5
• Domino signal can have AF = 1
• Is it possible for a signal to have AF > 1? Yes, because of glitches
[Figures: clk, data and out waveforms; a Domino gate with inputs d1 and d2, clock clk and output out]
22
Signal probability
• Signal probability (SP) is the average fraction of cycles in which a signal has the logic value “1”
[Waveforms between logic 0 and 1: CLK with SP = 0.5, a signal with SP = 1, and a signal with SP ≈ 1]
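AF and SP can both be extracted from a sampled waveform. The sketch below assumes a hypothetical representation that is not from the slides: each waveform is a list holding the initial value followed by the value after every half-cycle, so N cycles give 2N + 1 samples.

```python
def activity_factor(samples):
    """AF = toggles / (2N); samples = initial value + one value per half-cycle."""
    half_cycles = len(samples) - 1                       # equals 2N
    toggles = sum(a != b for a, b in zip(samples, samples[1:]))
    return toggles / half_cycles

def signal_probability(samples):
    """SP = fraction of sampled half-cycles in which the signal is 1."""
    return sum(samples[1:]) / (len(samples) - 1)

clk = [0, 1] * 4 + [0]                  # 4 cycles, toggles every half-cycle
data = [0, 0, 1, 1, 0, 0, 1, 1, 0]      # 4 cycles, toggles once per cycle
print(activity_factor(clk), signal_probability(clk))
print(activity_factor(data), signal_probability(data))
```

A glitchy signal would need finer sampling than two points per cycle to show AF > 1; this coarse sampling is a simplification.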
23
Relation between MCF and AF
• Assume two neighboring uncorrelated signals make $N_1$ and $N_2$ transitions during $N$ clock cycles
• It can be shown that the number of simultaneous transitions of the signals is negligible, no more than 4
• Therefore, the energy dissipated by the cross capacitance between the signals is
  $E = \frac{1}{2}(N_1 + N_2)\,C_x V_{dd}^2$
• The power dissipated during $N$ cycles is:
  $P = \frac{E}{N t_{cycle}} = \frac{(N_1 + N_2)\,C_x V_{dd}^2}{2 N t_{cycle}} = \left(\frac{N_1}{2N} + \frac{N_2}{2N}\right) C_x V_{dd}^2 f$
• For the same reason, it is usually assumed that MCF = 1 for uncorrelated signals
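A short sketch of the cross-capacitance power for two uncorrelated neighbors, P = (N1 + N2)/(2N)·Cx·Vdd²·f. Each N_i/(2N) term is the activity factor of one signal, so the result is the sum of both activity factors times Cx·Vdd²·f. The function name and the numbers are illustrative assumptions.

```python
def coupling_power(n1, n2, n_cycles, c_x, vdd, f_hz):
    """Power dissipated on the cross capacitance C_x by two uncorrelated neighbors (MCF = 1)."""
    af_sum = n1 / (2 * n_cycles) + n2 / (2 * n_cycles)   # AF of signal 1 + AF of signal 2
    return af_sum * c_x * vdd**2 * f_hz

# Assumed example: 100 and 50 transitions in 1000 cycles, C_x = 1 fF, 1 V, 1 GHz.
p_x = coupling_power(100, 50, 1000, 1e-15, 1.0, 1e9)
print(p_x)
```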
24
Activity Factors Generation
• Power test vectors generation (worst case for high power, unit stressing)
• RTL full-chip simulation (results in blocks primary inputs: activity, probability)
• Monte-Carlo based block inputs generation (based on the RTL statistics)
• Transistor level simulation, per block (unit delay, tuning for glitches)
• Per node activity factor
Source: “Intel® Pentium® M Processor Power Estimation, Budgeting, Optimization, and Validation”, ITJ 2003
25
Interconnect power breakdown
case study
26
Case study
• Low-power, state-of-the-art μ-processor
• Dynamic switching power analysis
• Interconnect attributes:
  o Length
  o Capacitance
  o Fan Out (FO)
  o Hierarchy data
  o Net type
  o Activity factors (AF)
  o Miscellaneous
27
Power Estimation accuracy
[Figure: simulated activity density compared with IREM measurement]
Source: “Intel® Pentium® M Processor Power Estimation, Budgeting, Optimization, and Validation”, ITJ 2003
28
Interconnect Length Distribution
Source: Shekhar Y. Borkar, CRL - Intel
[Figure: number of nets (log scale, 0.001 to 10000) vs. net length [um] (log scale, 1 to 100000) for Pentium® 0.5 um, Pentium® MMX 0.35 um, Pentium® Pro 0.5 um, Pentium® II 0.35 um, Pentium® II 0.25 um, Pentium® III 0.18 um and Low Power Processor 0.13 um]
29
Interconnect Length Distribution
• Log-log scale
• Exponential decrease with length
• Global clock: not included
[Figure: Nets vs. Net Length. Number of nets (log scale, 0.001 to 1000) vs. length [um] (log scale, 1 to 100000), with Local, Global and Total curves]
30
Total Dynamic Power
• Global clock: not included
• Local nets = 66%
• Global nets = 34%
[Figure: Total Power vs. Net Length. Normalized dynamic power (0 to 100) vs. length [um] (log scale, 1 to 100000), with Local, Global, Total and Interconnect curves]
• Peak 1: Nets: 390k, Cap: 10 [nF], FO: 2, AF: 0.0485
• Peak 2: Nets: 75k, Cap: 20 [nF], FO: 20, AF: 0.055
31
Local and Global Interconnect
• Local and Global IC are different:
  o Number by length breakdown
  o IC breakdown: cap and power
  o Fan out
  o Metal usage
• AF is similar
[Figure: Local Power breakdown vs. Net Length. Power [uW] share (0% to 100%) split into IC, Diff and Gate vs. length [um] (4.16 to 83850)]
[Figure: Global Power breakdown vs. Net Length. Same axes and series]
32
Power Breakdown by Net Types
Global clock included

Net type         Interconnect power (interconnect only)   Total power (gate, diffusion and interconnect)
Local clock      20%                                      29%
Local signals    27%                                      37%
Global clock     19%                                      13%
Global signals   34%                                      21%
33
Interconnect Length Prediction
• Technology projections: ITRS
• Interconnect length predictions:
  o ITRS model: 1/3 of the routing space
  o Davis model:
    - Rent’s rule based
    - Predicts number of nets as a function of the number of gates and complexity factors
• Models calibrated based on the case study
34
Future of Interconnect Power
(using optimistic interconnect scaling)
[Figure: Dynamic power breakdown. Share of dynamic power (0% to 100%) split into Gate, Diffusion and Interconnect vs. technology generation [μm] (0.15, 0.13, 0.1, 0.09, 0.08, 0.07, 0.065, 0.045, 0.032, 0.022)]
Source: ITRS 2001 Edition adapted data
Interconnect power grows to 65%-80% within 5 years!
35
Interconnect Power Prediction
• The number of nets vs. unit length: Modified Davis model
  [Figure: Interconnect length projection. Number of nets (normalized, log scale) vs. length [um] (log scale, 1 to 100000), measured vs. model, with an upper local bound and a lower global bound separating Local, Intermediate and Global nets]
• The dynamic power average breakdown
  [Figure: Dynamic power breakdown. Share of power (0% to 100%) split into Gate, Diffusion and Interconnect for Local, Intermediate and Global nets]
36
Interconnect Power Model
• Multiplication of the number of interconnects with power breakdowns gives:
  [Figure: Projected dynamic power vs. net length. Power (normalized, 0 to 6) vs. length [μm] (log scale, 1 to 100000), measured power vs. projection]
• The power model matches processor power distribution!
37
Experiment - Power-Aware Router
• Routing experiment optimizing processor’s blocks
• Local nodes (clock and signals) consume 66% of dynamic power
• 10% of nets consume 90% of power
• Min. spanning trees can save over 20% of interconnect power
• Routing with spacing can save up to 40% of interconnect power
[Figure: small block’s local clock network]
38
Power-Aware Router Flow
1. Power grid routing
2. Clock tree routing, with spacing
3. Top n% power consuming signal nets routing
4. Global and detailed routing of the un-routed nets (timing and congestion driven)
5. All nets routed?
   o No: power-aware rip-up and re-route
   o Yes: finish
Notes on the flow:
• Clock tree: high FO, long lines, very active
• Rip-up: not high power nets
• Avoiding congestion
• Followed by downsizing
39
Results - Power Saving
[Figure: dynamic power saving (0% to 60%) for Block A to Block E and their average, split into driver downsizing saving and router power saving]
Average saving results: 14.3% for ASIC blocks (1)
(1) Estimated based on clock interconnect power
40
Backup
41
Rent’s Rule
• Empirical rule: terminals versus number of gates
• Published by: B. S. Landman and R. L. Russo, “On a pin versus block relationship for partitions of logic graphs,” IEEE Trans. on Comput., vol. C-20, pp. 1469-1479, 1971.
Taken from Krishna Saraswat in SLIP 2000
42
Rent’s parameters
[Figure: a block of N gates with T terminals]
Rent’s rule: $T = k N^r$
• T = # of I/O terminals (pins)
• N = # of gates
• k = avg. I/O’s per gate
• r = Rent’s exponent; can be 0 < r < 1, but commonly (simple) 0.5 < r < 0.75 (complex)
43
Rent’s Rule Example
• Let’s assume Rent’s parameters: r = 0.79 and k = 2.
• For a single gate: N = 1, $T = k N^r = 2 \cdot 1^{0.79} = 2$
• For a block of four gates: N = 4, $T = k N^r = 2 \cdot 4^{0.79} \approx 6$
• Fan out is implied by Rent.
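The example above can be reproduced with a few lines of code. This is only a sketch of Rent’s rule T = k·N^r with the slide’s parameters; the function name is hypothetical.

```python
def rent_terminals(n_gates, k, r):
    """Rent's rule: expected number of I/O terminals T = k * N^r."""
    return k * n_gates**r

t_single = rent_terminals(1, k=2, r=0.79)   # single gate
t_four = rent_terminals(4, k=2, r=0.79)     # block of four gates
print(t_single, round(t_four))
```

Four separate gates would expose 4 × 2 = 8 terminals, while the block of four exposes only about 6; the difference is the connectivity absorbed inside the block.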
44
Is Rent’s rule a coincidence ?
Random circuits do not obey Rent.
Rent’s parameters are correlated with Place and Route algorithms.
P. Verplaetse, J. Dambre, D. Stroobandt, J. Van Campenhout, “On Partitioning vs. Placement Rent Properties,” in Proc. of Intl. Workshop on System-Level Interconnect Prediction, March 2001.
Self similarity within circuits – Obeys Rent.
Assumption: the complexity of the interconnection topology is equal at all levels.
Conclusion – Rent’s rule is a result of the design and synthesis.
45
Donath’s Hierarchical Placement Model
1. Partition the circuit into 4 equal sized modules, with a minimal cut.
2. Partition the Manhattan grid into 4 equal sized modules, with a minimal cut.
3. Map the modules to the grid (arbitrary mapping).
4. Repeat recursively until each block is assigned to one cell.
W. E. Donath. Placement and Average Interconnection Lengths of Computer Logic. IEEE Trans. on Circuits & Syst., vol. CAS-26, pp. 272-277, 1979.
Result – Rent’s parameters
46
Donath’s length estimation model
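For the i-th level:
• There are $4^i$ blocks
• For each block there are $k\left(\frac{N}{4^i}\right)^r$ terminals
• Assuming two-terminal nets: $\frac{k}{2}\left(\frac{N}{4^i}\right)^r$ nets
• The nets of the i-1 level must be subtracted.
• Nets for level i:
  $n_i = 4^i\,\frac{k}{2}\left(\frac{N}{4^i}\right)^r - 4^{i-1}\,\frac{k}{2}\left(\frac{N}{4^{i-1}}\right)^r = 4^i\,\frac{k}{2}\left(\frac{N}{4^i}\right)^r\left(1 - 4^{r-1}\right)$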
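The subtraction of the previous level and its factored form can be cross-checked numerically. The sketch below uses hypothetical function names and arbitrary example parameters (N = 4096, k = 2, r = 0.7); it assumes two-terminal nets, as the slide does.

```python
def nets_per_block(level, n_gates, k, r):
    """Two-terminal nets leaving one block at a given level: (k/2) * (N / 4^level)^r."""
    return (k / 2) * (n_gates / 4**level) ** r

def nets_at_level(i, n_gates, k, r):
    """n_i as a difference: all nets seen at level i minus those already counted at level i-1."""
    return (4**i * nets_per_block(i, n_gates, k, r)
            - 4**(i - 1) * nets_per_block(i - 1, n_gates, k, r))

def nets_at_level_factored(i, n_gates, k, r):
    """n_i in factored form: 4^i * (k/2) * (N / 4^i)^r * (1 - 4^(r-1))."""
    return 4**i * nets_per_block(i, n_gates, k, r) * (1 - 4 ** (r - 1))

# Assumed example parameters.
N_GATES, K, R = 4**6, 2.0, 0.7
for level in range(1, 7):
    print(level, nets_at_level(level, N_GATES, K, R),
          nets_at_level_factored(level, N_GATES, K, R))
```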
47
Average interconnection length
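Taken from a SLIP 2001 tutorial by Dirk Stroobandt
• The wires can be of two types, A and D (between adjacent or diagonal blocks).
• Averaging the distance over all pairs of cells $(i_A, j_A)$ and $(i_B, j_B)$ in two blocks of side $\lambda = \sqrt{N/4^i}$ gives:
  $L_A = \frac{4\lambda}{3} - \frac{1}{3\lambda}$
  $L_D = 2\lambda$
• The average: $r_i = \frac{14\lambda}{9} - \frac{2}{9\lambda}$
• Overall: $R = \frac{\sum_{i=1}^{I} n_i r_i}{\sum_{i=1}^{I} n_i}$ equals
  $R = \frac{2}{9}\left(7\,\frac{N^{r-0.5} - 1}{4^{r-0.5} - 1} - \frac{1 - N^{r-1.5}}{1 - 4^{r-1.5}}\right)\frac{1 - 4^{r-1}}{1 - N^{r-1}}$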
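The weighted average over levels can be compared against the closed form. The sketch below rests on assumptions that are not spelled out in the transcript: levels run from i = 1 to I = log4(N), the block side at level i is λ = sqrt(N/4^i) cells, and r ≠ 0.5 so that the closed form has no zero denominator. Function names are hypothetical.

```python
import math

def donath_average_length_sum(n_gates, r, k=1.0):
    """R = sum(n_i * r_i) / sum(n_i) over levels i = 1 .. log4(N)."""
    levels = round(math.log(n_gates, 4))
    num = den = 0.0
    for i in range(1, levels + 1):
        n_i = 4**i * (k / 2) * (n_gates / 4**i) ** r * (1 - 4 ** (r - 1))
        lam = math.sqrt(n_gates / 4**i)          # assumed block side, in cells
        r_i = 14 * lam / 9 - 2 / (9 * lam)       # average length at level i
        num += n_i * r_i
        den += n_i
    return num / den

def donath_average_length_closed(n_gates, r):
    """Closed form for the same average (valid for r != 0.5)."""
    a = (n_gates ** (r - 0.5) - 1) / (4 ** (r - 0.5) - 1)
    b = (1 - n_gates ** (r - 1.5)) / (1 - 4 ** (r - 1.5))
    c = (1 - 4 ** (r - 1)) / (1 - n_gates ** (r - 1))
    return (2 / 9) * (7 * a - b) * c

r_sum = donath_average_length_sum(4**6, 0.7)
r_closed = donath_average_length_closed(4**6, 0.7)
print(r_sum, r_closed)
```

The factor k cancels in the ratio, which is why the closed form depends only on N and r.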
48
Results Donath
Scaling of the average length L as a function of the number of logic blocks N:
• $L \propto N^{r-0.5}$ for $r > 0.5$
• $L \propto \log(N)$ for $r = 0.5$
• $L \propto f(r)$ for $r < 0.5$
Similar to measurements on placed designs.
Taken from a SLIP 1999 tutorial by Dirk Stroobandt
[Figure: average length L (0 to 30) vs. N (log scale, 1 to 10^7) for r = 0.7, r = 0.5 and r = 0.3]
49
Donath’s Model - overview
• Provides average net length based on the circuit’s size and Rent parameters.
• Can provide a rough net length distribution.
• Obvious limitations:
  o Uniform distribution.
  o Partitioning algorithm.
  o Two-terminal nets only.
  o Assumes perfect similarity.