An Efficient Surface-Based Low-Power Buffer Insertion Algorithm

ISPD’05: Surface-Based Buffer Insertion, 04-Apr-05

Rajeev R. Rao, David Blaauw, Dennis Sylvester, Charles Alpert*, Sani Nassif*
Department of EECS, University of Michigan, Ann Arbor, MI
IBM Austin Research Laboratory, Austin, TX*
{rrrao, blaauw, dennis}@eecs.umich.edu, {alpert, nassif}@us.ibm.com*

Interconnect Trends

• Interconnect power is a major issue
  – Huge power consumption in both global and local signal nets
  – Repeater counts increasing drastically
  – IBM: 50% of leakage in inverters/buffers
• Assuming continuation of current design styles, dramatic projections for the 32nm technology node:
  – 70% of cell count = repeaters
  – 65-80% of dynamic power due to interconnects
  – Leakage increasing exponentially
• Requirement: optimal repeater usage with the objective of total power minimization

[Figure: % repeater cells in block-level nets (series clk-rep, rep, tot-rep) at the 90nm, 65nm, 45nm and 32nm nodes. Source: P. Saxena, ISPD’04]

[Figure: Total Dynamic Power Breakdown. Source: N. Magen, SLIP’04]

Outline

• Introduction
  – Delay and buffer models
• Previous Work
• Proposed Algorithm
  – Library characterization
  – Generation of different types of candidates
  – Merging, propagation, snapping
• Results
• Conclusion

Introduction

• Wire RC delay is a quadratic function of wire length
• Segmenting wires decreases delay
• The same idea applies to interconnect tree structures
  – Buffers are inserted for delay management
  – Additional benefit: buffers/inverters decouple large output loads

[Figure: Driver → Receiver over one wire of length 2: wire delay (2)^2 = 4. Driver → Repeater → Receiver over two wires of length 1: wire delay (1)^2 + (1)^2 = 2.]
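The arithmetic in the figure can be reproduced in a few lines. This is a minimal sketch, assuming wire delay is exactly proportional to the square of length with a unit constant `rc` (a hypothetical parameter) and, like the figure, counting only wire delay:

```python
def segmented_wire_delay(length: float, segments: int, rc: float = 1.0) -> float:
    """Wire delay when a wire of `length` is split into `segments` equal pieces.

    Each piece contributes rc * (length / segments) ** 2, i.e. delay is
    quadratic in piece length. Repeater delays are ignored, as in the
    slide's illustration; `rc` is a hypothetical unit constant.
    """
    if segments < 1:
        raise ValueError("segments must be >= 1")
    piece = length / segments
    return segments * rc * piece ** 2


if __name__ == "__main__":
    print(segmented_wire_delay(2, 1))  # unbuffered: 2^2 = 4.0
    print(segmented_wire_delay(2, 2))  # one repeater: 1^2 + 1^2 = 2.0
```

Splitting a wire into k equal pieces divides the wire term by k; each real repeater adds its own delay, which is why the number of buffers has to be optimized rather than maximized.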

Elmore Delay Model

• Represent the interconnect tree with a lumped RC model
• Assume the binary tree topology is fixed, with an initial Steiner tree estimate
  – n vertices (branch points) and (n−1) edges (i.e., wires)
• For a wire e connecting vertices (u, v), the Elmore delay is:

$$\mathrm{Delay}(e) = R_e\left(\tfrac{1}{2}C_e + C_{T(v)}\right)$$

  where T(v) is the maximal subtree rooted at v that does not contain buffers, and C_{T(v)} is its total capacitance
• The total delay from a vertex v to a sink node s_i is:

$$\mathrm{Delay}(v, s_i) = \sum_{e=(u,w)\,\in\,\mathrm{path}(v, s_i)} \big(\mathrm{Delay}(e) + \mathrm{Delay}(w)\big)$$

  where Delay(w) is the delay of a buffer placed at w (zero if there is none)

Source: Digital Integrated Circuits, J. Rabaey
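The two Elmore formulas can be checked on a small buffer-free tree. The sketch below is illustrative only: `Node`, `subtree_cap`, `edge_delay` and `path_delay` are hypothetical names, each node stores the R and C of the wire from its parent, and since no buffers are modelled the vertex term Delay(w) is zero.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Vertex of an unbuffered RC tree (hypothetical representation).

    wire_r / wire_c describe the wire from this node's parent to this node;
    load is any pin capacitance attached at the node (e.g. a sink).
    """
    name: str
    wire_r: float = 0.0
    wire_c: float = 0.0
    load: float = 0.0
    children: List["Node"] = field(default_factory=list)


def subtree_cap(v: Node) -> float:
    """C_T(v): total capacitance of the buffer-free subtree rooted at v."""
    return v.load + sum(ch.wire_c + subtree_cap(ch) for ch in v.children)


def edge_delay(v: Node) -> float:
    """Elmore delay of the wire e = (parent, v): R_e * (C_e / 2 + C_T(v))."""
    return v.wire_r * (v.wire_c / 2 + subtree_cap(v))


def path_delay(path: List[Node]) -> float:
    """Delay(path[0], path[-1]); path is ordered from v down to the sink.

    No buffers are modelled, so the per-vertex term Delay(w) is zero.
    """
    return sum(edge_delay(w) for w in path[1:])


s1 = Node("s1", wire_r=1.0, wire_c=1.0, load=1.0)
s2 = Node("s2", wire_r=2.0, wire_c=2.0, load=3.0)
u = Node("u", wire_r=1.0, wire_c=2.0, children=[s1, s2])
o = Node("o", children=[u])
print(path_delay([o, u, s1]), path_delay([o, u, s2]))  # 9.5 16.0
```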

Buffer Model

• Linear gate delay model used for the buffers
  – Assumption: delay is a linear function of output capacitance
• Isolation property: buffer devices decouple "downstream" output loads from the parent tree
  – Assumption: the Miller effect ("bootstrapping") due to C_gd is negligible

[Figure: node v drives a buffer (input capacitance C_buf, gate-drain capacitance C_gd) that drives C_load. Node v "sees" a downstream load of C_buf; C_load is "invisible" to v.]

D_buffer = D_intrinsic-delay + R_intrinsic-resistance × C_output-load
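Both assumptions fit in two small functions. A minimal sketch; the function names and the numeric values (20 ps, 0.5 kΩ, 40 fF) are hypothetical:

```python
def buffer_delay(d_intrinsic: float, r_intrinsic: float, c_output_load: float) -> float:
    """Linear gate delay model: D_buffer = D_intrinsic + R_intrinsic * C_output_load."""
    return d_intrinsic + r_intrinsic * c_output_load


def load_seen_upstream(c_load: float, c_buf: float, buffered: bool) -> float:
    """Capacitance that node v sees downstream.

    With a buffer in place, v sees only the buffer input capacitance C_buf
    (isolation property, Miller coupling through C_gd neglected); without
    one, it sees the full load.
    """
    return c_buf if buffered else c_load


# Hypothetical values: 20 ps intrinsic delay, 0.5 kOhm, 40 fF load.
print(buffer_delay(20.0, 0.5, 40.0))          # 20 + 0.5 * 40 = 40.0
print(load_seen_upstream(100.0, 5.0, True))   # 5.0: C_load is hidden
```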

Buffer Insertion Problem

• Timing metrics
  – Required arrival time (RAT)
    • Each sink s_i is assigned a RAT(s_i) value, and the source is fixed at RAT(s_0) = 0
    • Delay minimization → maximize the slack at the source, q(s_0)
  – Subtree delay (SD)
    • SD(s_i) = RAT_max(s_i) − RAT(s_i)
    • Delay minimization → minimize SD(s_0)
    • Advantage: unlike RAT, equations using SD are additive
• Our approach
  – Trade-off surfaces in the 3D space of delay, capacitance and power
  – Continuously sized buffer libraries

[Figure: a source-to-sink net with legal buffer positions and a buffer library BufLib = {b1, b2, b3}]
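Why minimizing SD(s_0) matches maximizing q(s_0) can be checked numerically. A sketch, assuming the signal leaves the source at t = 0 and taking RAT_max as the largest sink RAT (an assumption about the slide's notation); the delays and RATs are made-up values:

```python
from typing import Sequence


def source_slack(path_delays: Sequence[float], rats: Sequence[float]) -> float:
    """q(s_0): worst slack over all sinks, signal launched at t = 0."""
    return min(r - d for d, r in zip(path_delays, rats))


def source_subtree_delay(path_delays: Sequence[float], rats: Sequence[float]) -> float:
    """SD(s_0) = max_i [Delay(s_0, s_i) + SD(s_i)], SD(s_i) = RAT_max - RAT(s_i).

    Assumption: RAT_max is taken as the largest sink RAT.
    """
    rat_max = max(rats)
    sink_sd = [rat_max - r for r in rats]
    return max(d + sd for d, sd in zip(path_delays, sink_sd))


delays = [100.0, 150.0, 80.0]   # hypothetical source-to-sink delays (ps)
rats = [300.0, 400.0, 200.0]    # hypothetical sink RATs (ps)
q = source_slack(delays, rats)
sd = source_subtree_delay(delays, rats)
print(q, sd, max(rats) - q)     # 120.0 280.0 280.0
```

In general SD(s_0) = max_i [Delay(s_0, s_i) + RAT_max − RAT(s_i)] = RAT_max − q(s_0), so the two objectives differ only by a constant, while SD values simply add up along the tree.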

Previous Work

• L. P. P. P. van Ginneken (VG), ISCAS’90
  – Two-phase dynamic programming algorithm
    • Backward traversal up the interconnect tree to compute load and delay values
    • Forward solution pass to reconstruct the "best" candidate

[Figure: the three VG operations]
  – Merge operation: C_parent = C_left + C_right, SD_parent = max(SD_left, SD_right)
  – Buffer candidate creation
  – Pruning of provably inferior candidates

Post-order DFS traversal:

Function BOTTOM_UP(v)
  if v ∈ sinks: return (C_v, SD_v)
  else:
    /* compute options for subtrees */
    BOTTOM_UP(left(v))
    BOTTOM_UP(right(v))
    join pairs of subtree candidates by a merge operation
    find the best candidate among the merged candidates to add a buffer
    add the parent wire to both types of candidates
    prune inferior candidates from the candidate set
    store the candidate list for node v and return
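The backward pass of BOTTOM_UP can be turned into a small runnable sketch. It assumes a single hypothetical buffer type (`C_BUF`, `D_INT`, `R_BUF`), a hypothetical `TreeNode` that stores the wire from its parent, and Elmore wire delay. Unlike the pseudocode, it also gives sinks a buffer option and their parent wire so every edge is accounted for, its pruning also drops exact ties, and the forward reconstruction pass is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Cand = Tuple[float, float]  # (c, s): load seen at subtree root, subtree delay

# Hypothetical single buffer type: input cap, intrinsic delay, output resistance.
C_BUF, D_INT, R_BUF = 1.0, 5.0, 0.5


@dataclass
class TreeNode:
    wire_r: float                  # resistance of the wire from the parent
    wire_c: float                  # capacitance of that wire
    sink_cap: float = 0.0          # pin load if this node is a sink
    sink_sd: float = 0.0           # subtree delay SD at the sink
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def prune(cands: List[Cand]) -> List[Cand]:
    """Keep candidates not matched or beaten in both load and delay."""
    out: List[Cand] = []
    for c, s in sorted(cands):            # increasing load
        if not out or s < out[-1][1]:     # must strictly improve delay
            out.append((c, s))
    return out


def merge(left: List[Cand], right: List[Cand]) -> List[Cand]:
    """C_parent = C_left + C_right, SD_parent = max(SD_left, SD_right)."""
    return [(cl + cr, max(sl, sr)) for cl, sl in left for cr, sr in right]


def add_buffer_option(cands: List[Cand]) -> List[Cand]:
    """Buffer the candidate that gives the smallest buffered delay."""
    best = min(s + D_INT + R_BUF * c for c, s in cands)
    return cands + [(C_BUF, best)]        # parent now sees only C_BUF


def add_wire(cands: List[Cand], r: float, c: float) -> List[Cand]:
    """Add the parent wire using its Elmore delay r * (c/2 + load)."""
    return [(load + c, s + r * (c / 2 + load)) for load, s in cands]


def bottom_up(v: TreeNode) -> List[Cand]:
    """Post-order DFS returning the pruned (c, s) candidates above v's parent wire."""
    if v.left is None and v.right is None:
        cands = [(v.sink_cap, v.sink_sd)]
    else:
        subs = [bottom_up(ch) for ch in (v.left, v.right) if ch is not None]
        cands = subs[0] if len(subs) == 1 else merge(subs[0], subs[1])
    cands = add_buffer_option(prune(cands))
    return prune(add_wire(cands, v.wire_r, v.wire_c))


root = TreeNode(0.0, 0.0,
                left=TreeNode(1.0, 2.0, sink_cap=3.0),
                right=TreeNode(2.0, 1.0, sink_cap=4.0))
print(bottom_up(root))  # [(1.0, 17.5), (5.0, 10.0), (8.0, 9.0)]
```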

VG Algorithm

• Candidate format: 2-tuple (load, subtree delay) = (c, s)
• Recursive formulas for the two possible cases, each mapping (c0, s0) to (c1, s1):
  – Only a wire is added at the root of the subtree:
    c1 = c0 + c_wire
    s1 = s0 + d_wire
  – A buffer and a wire are added at the root of the subtree:
    c1 = c_buf + c_wire (isolation property)
    s1 = s0 + d_int + r_buf·c0 + d_wire
• Pruning criterion: (c1, s1) is "better" than (c2, s2) if both its load and subtree delay are lower, i.e., c1 < c2 and s1 < s2
  – Merge operation is linear
• Complexity = O(n^2), where n = number of buffer locations
  – With the additional objective of minimizing buffer count, the complexity becomes non-polynomial
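The two recursive cases and the pruning test translate directly into code. A minimal sketch with hypothetical numbers; `d_wire` is passed in as a given value, although under the Elmore model it depends on the load the wire drives:

```python
from typing import Tuple

Cand = Tuple[float, float]  # (c, s): load seen at the subtree root, subtree delay


def wire_only(cand: Cand, c_wire: float, d_wire: float) -> Cand:
    """Case 1: only a wire is added at the root of the subtree."""
    c0, s0 = cand
    return c0 + c_wire, s0 + d_wire


def buffer_and_wire(cand: Cand, c_buf: float, d_int: float, r_buf: float,
                    c_wire: float, d_wire: float) -> Cand:
    """Case 2: a buffer and a wire are added; the buffer hides c0 (isolation)."""
    c0, s0 = cand
    return c_buf + c_wire, s0 + d_int + r_buf * c0 + d_wire


def better(a: Cand, b: Cand) -> bool:
    """VG pruning criterion: a beats b if both load and delay are lower."""
    return a[0] < b[0] and a[1] < b[1]


base = (10.0, 50.0)                                  # hypothetical (c0, s0)
w = wire_only(base, c_wire=2.0, d_wire=3.0)          # (12.0, 53.0)
b = buffer_and_wire(base, 1.0, 20.0, 0.5, 2.0, 3.0)  # (3.0, 78.0)
print(w, b, better(w, b), better(b, w))              # neither prunes the other
```

Here buffering trades 25 units of delay for 9 units less load, so both candidates survive and the choice is deferred to the parent.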

Previous Work

• Extensions to VG by Lillis et al., ICCAD’95, JSSC’96
  – A buffer library B can be used during buffer insertion → complexity = O(n^2 |B|^2)
  – Simultaneous wire sizing and buffer insertion
  – Signal slew incorporated into the buffer delay model
• Dynamic power minimization subject to timing constraints
  – Candidate format: 3-tuple (load, subtree delay, power) = (c, s, p)
  – Power equated with effective "total" capacitance
  – Assumption: all capacitance values can be linearly mapped onto a polynomially bounded integer domain (c_max = maximum capacitance value)
  – Sophisticated pruning mechanism using orthogonal range queries
  – Complexity = O(n^3 |B| c_max^2 log(n c_max)) under this assumption
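With 3-tuple candidates, pruning keeps only candidates that no other candidate matches or beats in all of load, delay and power. The sketch below uses a naive pairwise check instead of the orthogonal range queries mentioned above, so it illustrates the criterion, not Lillis et al.'s complexity; the names are hypothetical:

```python
from typing import List, Tuple

Cand3 = Tuple[float, float, float]  # (c, s, p): load, subtree delay, power


def dominates(a: Cand3, b: Cand3) -> bool:
    """True if a is no worse than b in load, delay and power, and not identical."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def prune3(cands: List[Cand3]) -> List[Cand3]:
    """Keep the non-dominated candidates (naive O(k^2) pairwise check)."""
    unique = list(dict.fromkeys(cands))  # drop exact duplicates, keep order
    return [b for b in unique if not any(dominates(a, b) for a in unique)]


if __name__ == "__main__":
    cands = [(1.0, 5.0, 3.0), (2.0, 4.0, 3.0), (2.0, 6.0, 4.0), (1.0, 5.0, 3.0)]
    print(prune3(cands))  # [(1.0, 5.0, 3.0), (2.0, 4.0, 3.0)]
```

A third dimension makes pruning much weaker than in the 2-tuple case, since a candidate survives as soon as it wins on any one of the three values.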

Previous Work

• Several approaches in the literature target power minimization in conjunction with buffer insertion. Examples:
  – Quadratic programming: Chu et al., TCAD’99
  – Lagrangian relaxation: C.-P. Chen et al., TCAD’99
  – ClockTune: J.-L. Tsai et al., TCAD’04
• These associate total power with the effective capacitive area of wires + devices
  – Area minimization → power minimization
  – Ignores the contribution of static leakage power
  – Including this component results in non-polynomial complexity
  – Adding extra components to candidates generally leads to exponential complexity for dynamic programming

Contributions of This Paper

• Novel "continuous" buffer insertion algorithm with total power minimization
  – Inclusive of both dynamic and leakage power
• Generates trade-off surfaces in the 3D DCP (delay, capacitance, power) space
  – User can pick any desired point on this 3D surface
  – Easy to explore trade-offs among the three variables
• Ability to handle arbitrarily large buffer libraries
  – Continuously sized cell libraries with numerous buffer sizes
  – Capable of snapping to discrete buffer sizes if necessary
• Worst-case polynomial complexity O(n^2)
  – Similar to the "basic" VG algorithm

Library Characterization

• Buffer library with a set of continuously sized buffers
• Let S = sizing factor of the library; express delay (d_b), capacitance (c_b) and leakage (l_b) in terms of S:
  – c_b (∝ buffer area): c_b = c0 + c1·S
  – l_b (∝ device width): l_b = l0 + l1·S
  – d_b (linear gate delay model): d_b = d0 + d1·(C_out/S)
• Determine c0, c1, l0, l1, d0, d1 as empirical fitting constants
• Fitted to the discrete buffer sizes, these equations approximate the ideal of continuous buffer sizing
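The characterization step amounts to three straight-line fits. A minimal sketch, assuming per-size measurements are already available (the data in the check is synthetic) and using ordinary single-variable least squares in place of the multilinear GSL fitting mentioned in the results; `fit_line` and `ContinuousBufferModel` are hypothetical names:

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


class ContinuousBufferModel:
    """c_b = c0 + c1*S, l_b = l0 + l1*S, d_b = d0 + d1*(C_out/S)."""

    def __init__(self, sizes, caps, leaks, delay_samples):
        # delay_samples: (S, C_out, measured delay) triples for the discrete cells
        self.c0, self.c1 = fit_line(sizes, caps)
        self.l0, self.l1 = fit_line(sizes, leaks)
        xs = [c_out / s for s, c_out, _ in delay_samples]   # regress on C_out / S
        self.d0, self.d1 = fit_line(xs, [d for _, _, d in delay_samples])

    def cap(self, S: float) -> float:
        return self.c0 + self.c1 * S

    def leak(self, S: float) -> float:
        return self.l0 + self.l1 * S

    def delay(self, S: float, c_out: float) -> float:
        return self.d0 + self.d1 * c_out / S
```

Once fitted, the model can be evaluated at any real S, which is what lets the later steps treat buffer size as a continuous variable.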

Generation of Candidates

• Point candidate
  – Candidate format: 3-tuple (D_o, C_o, P_o)
  – A node has a point candidate ⇔ there are no buffers in the subtree rooted at that node
  – All sinks have point candidates
• Write equations to determine the candidate at u

[Figure: path o – u – v – t with wire segments l_w1, l_w2, l_w3 and buffer positions b1–b4; the point candidate (D_o, C_o, P_o) is at o]

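Generation of Candidates

• Curve candidate
  – Candidate format: {[D_u^min, D_u^max], (g_i, k_i), i = 0..2}
  – A node has a curve candidate ⇔ there is exactly one buffer in the subtree rooted at that node

[Figure: same path o – u – v – t; a buffer of variable size S maps (D_o, C_o, P_o) to (D_u, C_u, P_u)]

$$D_u = D_o + d_b + d_{wire} = D_o + d_0 + d_1\frac{C_o}{S} + \tfrac{1}{2} r_w c_w l_{w1}^2 + r_w l_{w1} (c_0 + c_1 S)$$

$$C_u = c_b + c_{wire} = c_0 + c_1 S + c_w l_{w1}$$

$$P_u = P_o + k_L (l_0 + l_1 S) + k_D C_u$$

where k_L and k_D weight the leakage and dynamic power contributions. With S variable, (D_u, C_u, P_u) traces a curve, stored as quadratics in D_u:

$$C_u = g_0 + g_1 D_u + g_2 D_u^2, \qquad P_u = k_0 + k_1 D_u + k_2 D_u^2$$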
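A curve candidate can be generated by sweeping S through the three equations and fitting the resulting points. This sketch uses hypothetical parameter values and names, fits by least squares through the 3×3 normal equations, and restricts the sweep to sizes up to the delay-optimal one so that D_u decreases monotonically with S (beyond it, the same D_u is reached with more capacitance and power, so that branch is dominated):

```python
# Hypothetical parameter values (not from the slides), arbitrary consistent units.
LIB = {"c0": 0.5, "c1": 1.0, "d0": 10.0, "d1": 20.0, "l0": 0.1, "l1": 0.3}
R_W, C_W = 0.08, 0.2   # wire resistance / capacitance per unit length
K_L, K_D = 1.0, 0.5    # leakage and dynamic power weights


def through_buffer(D_o, C_o, P_o, S, l_w1):
    """(D_u, C_u, P_u) for a buffer of size S driving C_o, plus a wire of length l_w1."""
    c_b = LIB["c0"] + LIB["c1"] * S
    D_u = (D_o + LIB["d0"] + LIB["d1"] * C_o / S
           + 0.5 * R_W * C_W * l_w1 ** 2 + R_W * l_w1 * c_b)
    C_u = c_b + C_W * l_w1
    P_u = P_o + K_L * (LIB["l0"] + LIB["l1"] * S) + K_D * C_u
    return D_u, C_u, P_u


def fit_quadratic(xs, ys):
    """Least-squares (a0, a1, a2) for y = a0 + a1*x + a2*x^2 (normal equations)."""
    m = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    v = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    for col in range(3):                       # Gaussian elimination, partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv], v[col], v[piv] = m[piv], m[col], v[piv], v[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[col])]
            v[r] -= f * v[col]
    a = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                        # back substitution
        a[r] = (v[r] - sum(m[r][c] * a[c] for c in range(r + 1, 3))) / m[r][r]
    return tuple(a)


def curve_candidate(D_o, C_o, P_o, l_w1, sizes):
    """{[D_min, D_max], (g_i), (k_i)}: C_u and P_u fitted as quadratics in D_u."""
    pts = [through_buffer(D_o, C_o, P_o, S, l_w1) for S in sizes]
    ds = [p[0] for p in pts]
    g = fit_quadratic(ds, [p[1] for p in pts])
    k = fit_quadratic(ds, [p[2] for p in pts])
    return (min(ds), max(ds)), g, k


if __name__ == "__main__":
    # With these values the delay-optimal size is S = 5, so sweep S in [1, 5].
    print(curve_candidate(0.0, 10.0, 0.0, 100.0, [1.0 + 0.5 * i for i in range(9)]))
```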

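Generation of Candidates

• Surface candidate
  – C-plane format: {C_v, [D_min, D_max], (k_i), i = 0..2}
  – Candidate format: vector<CPlane>

[Figure: same path o – u – v – t; a second buffer of variable size S maps (D_u, C_u, P_u) to (D_v, C_v, P_v), with both S and D_u varying]

$$D_v = D_u + d_b + d_{wire} = D_u + d_0 + d_1\frac{C_u}{S} + \tfrac{1}{2} r_w c_w l_{w2}^2 + r_w l_{w2} (c_0 + c_1 S)$$

$$C_v = c_b + c_{wire} = c_0 + c_1 S + c_w l_{w2}$$

$$P_v = P_u + k_L (l_0 + l_1 S) + k_D C_v$$

$$P_v\big|_{C_v} = k_0 + k_1 D_v + k_2 D_v^2$$

• For a given S, C_v is fixed, while D_v and P_v vary with D_u
  – Each such set forms a C-plane with a "discrete" C_v

[Figure: C-planes in the (D_v, C_v, P_v) space]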
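A surface candidate can be assembled by fixing S (which fixes C_v) and sweeping D_u along the curve candidate from the previous stage, giving one C-plane per size. The sketch assumes hypothetical parameters and names, a quadratic curve candidate `(g, k)` as input, uniform sampling of D_u, and keeps only each plane's (D_v, P_v) lower envelope:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical parameter values (not from the slides).
LIB = {"c0": 0.5, "c1": 1.0, "d0": 10.0, "d1": 20.0, "l0": 0.1, "l1": 0.3}
R_W, C_W = 0.08, 0.2
K_L, K_D = 1.0, 0.5


@dataclass
class CPlane:
    C_v: float                          # capacitance shared by every point in the plane
    curve: List[Tuple[float, float]]    # (D_v, P_v) lower envelope


def quad(coeffs, x):
    a0, a1, a2 = coeffs
    return a0 + a1 * x + a2 * x * x


def lower_envelope(points):
    """Keep (D, P) points that no other point beats in both delay and power."""
    env, best_p = [], float("inf")
    for d, p in sorted(points):
        if p < best_p:
            env.append((d, p))
            best_p = p
    return env


def surface_candidate(d_range, g, k, sizes, l_w2, samples=20):
    """One C-plane per buffer size S: C_v is fixed by S, D_v and P_v vary with D_u."""
    lo, hi = d_range
    planes = []
    for S in sizes:
        c_b = LIB["c0"] + LIB["c1"] * S
        C_v = c_b + C_W * l_w2
        d_wire = 0.5 * R_W * C_W * l_w2 ** 2 + R_W * l_w2 * c_b
        pts = []
        for i in range(samples):
            D_u = lo + (hi - lo) * i / (samples - 1)
            C_u, P_u = quad(g, D_u), quad(k, D_u)      # point on the curve candidate
            D_v = D_u + LIB["d0"] + LIB["d1"] * C_u / S + d_wire
            P_v = P_u + K_L * (LIB["l0"] + LIB["l1"] * S) + K_D * C_v
            pts.append((D_v, P_v))
        planes.append(CPlane(C_v, lower_envelope(pts)))
    return planes
```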

Generation of Candidates

• Similar equations can be written to determine the candidate at t
  – C_t depends only on S, but D_t and P_t depend on C_v, D_v and S
  – This yields a new set of C-planes; for each C-plane, the lower envelope gives the power-optimal solution
• Surface candidate → surface candidate

[Figure: path o – u – v – t, with candidates propagating (D_o, C_o, P_o) → (D_u, C_u, P_u) → (D_v, C_v, P_v) → (D_t, C_t, P_t)]
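Because C_t depends only on S, propagating a surface candidate through one more buffer regroups all points by size: every input C-plane contributes points to the same output C-plane, and the lower envelope of that union is the power-optimal curve. A sketch under the same kind of hypothetical parameters, with C-planes represented as `(C_v, [(D_v, P_v), ...])` pairs:

```python
from typing import Dict, List, Tuple

# Hypothetical parameter values (not from the slides).
LIB = {"c0": 0.5, "c1": 1.0, "d0": 10.0, "d1": 20.0, "l0": 0.1, "l1": 0.3}
R_W, C_W = 0.08, 0.2
K_L, K_D = 1.0, 0.5

Curve = List[Tuple[float, float]]  # (D, P) points of one C-plane


def lower_envelope(points: Curve) -> Curve:
    """Keep (D, P) points that no other point beats in both delay and power."""
    env: Curve = []
    best_p = float("inf")
    for d, p in sorted(points):
        if p < best_p:
            env.append((d, p))
            best_p = p
    return env


def propagate_surface(planes: List[Tuple[float, Curve]], sizes, l_w3) -> Dict[float, Curve]:
    """Push a surface candidate at v through a buffer of size S and wire l_w3 to t.

    C_t depends only on S, so all points produced with the same S land in the
    same new C-plane; its lower envelope is the power-optimal curve.
    """
    out: Dict[float, Curve] = {}
    for S in sizes:
        c_b = LIB["c0"] + LIB["c1"] * S
        C_t = c_b + C_W * l_w3
        d_wire = 0.5 * R_W * C_W * l_w3 ** 2 + R_W * l_w3 * c_b
        leak = K_L * (LIB["l0"] + LIB["l1"] * S)
        pts: Curve = []
        for C_v, curve in planes:
            for D_v, P_v in curve:
                D_t = D_v + LIB["d0"] + LIB["d1"] * C_v / S + d_wire
                pts.append((D_t, P_v + leak + K_D * C_t))
        out[C_t] = lower_envelope(pts)
    return out
```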

Design Choices

• The wire network is a binary tree
  – Zero-length wires, dummy nodes
• Signal polarity on buffers is ignored
  – Pair of solution sets (similar to Lillis)
• Number of surface candidates per node = 2 (buffered/non-buffered)
  – Trade-off between finer-grained solutions and efficiency
  – No impact on optimality or complexity
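The slide only states that zero-length wires and dummy nodes keep the network binary. One common way to do this, assumed here rather than taken from the slides, is to split a high-fanout vertex into a chain of dummy vertices joined by zero-length wires; the `binarize` helper and its dict-based tree format are hypothetical:

```python
from itertools import count
from typing import Dict, List, Tuple

Tree = Dict[str, List[Tuple[str, float]]]  # node -> [(child, wire_length), ...]


def binarize(tree: Tree, root: str) -> Tree:
    """Return an equivalent tree in which every node has at most two children.

    Extra children are pushed down a chain of dummy nodes connected by
    zero-length wires. Assumes no existing node is named "dummy<k>".
    """
    out: Tree = {}
    ids = count()

    def visit(v: str) -> None:
        kids = list(tree.get(v, []))
        for child, _ in kids:
            visit(child)
        cur = v
        while len(kids) > 2:
            dummy = f"dummy{next(ids)}"
            out[cur] = [kids[0], (dummy, 0.0)]   # keep one child, defer the rest
            kids = kids[1:]
            cur = dummy
        out[cur] = kids

    visit(root)
    return out


if __name__ == "__main__":
    fanout4 = {"r": [("a", 1.0), ("b", 2.0), ("c", 3.0), ("d", 4.0)]}
    for node, children in binarize(fanout4, "r").items():
        print(node, children)
```

Zero-length wires carry no resistance or capacitance under the Elmore model, so the binarized tree has the same delays as the original.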

Merging and Implicit Pruning

• First, merge the left and right candidates
  – Compare equal-delay points by checking the 4 combinations of left and right candidates (buffered/non-buffered on each side)
  – Create P/C curves and extract the lower envelope → pruning
  – Translate P/C curves with fixed D values into P/D curves with fixed C values → creation of C-planes for 4 different surface candidates
• Next, recombine these 4 surfaces into a single candidate
  – Map P/D curves from one C-plane to another using linear interpolation; for each (D, C) value, pick the lowest power value → pruning
• Use the composite surface to create the buffered/non-buffered candidates
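Two ingredients of this step can be sketched: combining left and right curves at equal delay (loads and powers add, as in the VG merge), and comparing surfaces by interpolating between C-planes and keeping the lowest power. The data layout (parallel `ds`, `cs`, `ps` lists and `(C_plane, ds, ps)` tuples) and all function names are hypothetical, and values outside a sampled range are clamped:

```python
from bisect import bisect_left
from typing import Sequence


def interp(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Piecewise-linear interpolation over strictly ascending xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def merge_equal_delay(left, right, d_grid):
    """Merge two curves (ds, cs, ps) at equal delay: capacitance and power add."""
    out = []
    for d in d_grid:
        c = interp(left[0], left[1], d) + interp(right[0], right[1], d)
        p = interp(left[0], left[2], d) + interp(right[0], right[2], d)
        out.append((d, c, p))
    return out


def power_at(surface, C, D):
    """Estimate P(D, C) from C-planes [(C_plane, ds, ps), ...] sorted by C_plane:
    interpolate in D inside each plane, then linearly between planes in C."""
    cs = [c for c, _, _ in surface]
    per_plane = [interp(ds, ps, D) for _, ds, ps in surface]
    return interp(cs, per_plane, C)


def composite_power(surfaces, C, D):
    """Recombine several surface candidates: lowest power at each (D, C)."""
    return min(power_at(s, C, D) for s in surfaces)
```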

Reconstruction and Snapping

• A pair of candidate solutions is created for the source
• Any trade-off point on the DCP surface can be picked
  – Forward solution pass to reconstruct the tree structure with buffer locations
• Snapping: if the required size is unavailable, the buffer with the nearest size is chosen
  – Problem: discrepancies in D, C, P values → Solution: local refinements in the C-planes
  – Single pass through the RC tree
• Complexity = O(n^2), where n = number of possible buffer locations
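The snapping rule itself is a nearest-neighbour lookup in a sorted list of available sizes. A minimal sketch using Python's `bisect` module; the library sizes are hypothetical, and the local C-plane refinement that corrects the resulting D, C, P discrepancies is not shown:

```python
from bisect import bisect_left
from typing import List, Sequence


def snap(size: float, library: Sequence[float]) -> float:
    """Return the available size closest to `size` (library sorted ascending).

    Ties go to the smaller size.
    """
    i = bisect_left(library, size)
    if i == 0:
        return library[0]
    if i == len(library):
        return library[-1]
    lo, hi = library[i - 1], library[i]
    return lo if size - lo <= hi - size else hi


def snap_all(sizes: Sequence[float], library: Sequence[float]) -> List[float]:
    """Snap every continuous size chosen in the forward pass."""
    return [snap(s, library) for s in sizes]


if __name__ == "__main__":
    lib = [1.0, 2.0, 4.0, 8.0]                   # hypothetical discrete sizes
    print(snap_all([0.5, 2.0, 3.3, 10.0], lib))  # [1.0, 2.0, 4.0, 8.0]
```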

Results

• Benchmarks: C-tree nets
• TSMC 0.13µm buffer library
  – Number of discrete buffer choices = 9
• Multilinear fitting models using the GNU Scientific Library
• Example 3D surface

Results: Snapping

Net | #Sinks | #Edges | Pre-Snap Cap (fF) | Pre-Snap Del (ps) | Pre-Snap Pow (µW) | Post-Snap Cap (fF) | Post-Snap Del (ps) | Post-Snap Pow (µW) | #Buffers
mcu1s9 | 24 | 30 | 14.8 | 276.1 | 9.9 | 13.9 | 276.2 | 10.0 | 9.0
n8692 | 28 | 29 | 14.2 | 315.5 | 11.9 | 13.9 | 315.9 | 11.9 | 11.0
pointer3 | 25 | 33 | 16.7 | 311.6 | 8.9 | 15.9 | 311.9 | 8.8 | 9.0
n313 | 23 | 31 | 9.7 | 442.5 | 7.8 | 9.8 | 441.5 | 7.7 | 8.0
n18905 | 33 | 43 | 7.4 | 206.8 | 16.4 | 7.5 | 207.1 | 16.1 | 16.0
n7866 | 34 | 52 | 10.8 | 182.5 | 16.9 | 9.8 | 182.1 | 17.3 | 17.0
n8702 | 46 | 60 | 8.5 | 196.7 | 16.9 | 7.5 | 196.3 | 17.1 | 22.0
netbig4 | 79 | 81 | 17.3 | 462.8 | 26.8 | 18.0 | 463.8 | 26.6 | 37.0
netbig3 | 64 | 67 | 17.2 | 534.0 | 25.7 | 18.0 | 533.2 | 25.8 | 30.0
netbig2 | 74 | 79 | 14.2 | 531.5 | 28.9 | 13.9 | 533.5 | 28.8 | 34.0
netbig1 | 91 | 102 | 11.7 | 445.6 | 22.2 | 11.9 | 448.5 | 22.3 | 43.0

Results: Comparison

Net | Lillis Cap (fF) | Lillis Del (ps) | Lillis Pow (µW) | Lillis Runtime (s) | Our Cap (fF) | Our Del (ps) | Our Pow (µW) | Our Runtime (s)
mcu1s9 | 15.2 | 128.2 | 8.5 | 0.34 | 16.0 | 127.3 | 8.6 | 0.13
n8692 | 16.9 | 169.1 | 8.7 | 0.37 | 16.0 | 168.0 | 8.6 | 0.13
pointer3 | 15.2 | 154.8 | 10.6 | 0.42 | 16.0 | 154.9 | 10.5 | 0.14
n313 | 14.9 | 163.7 | 10.9 | 0.43 | 13.9 | 162.6 | 10.8 | 0.15
n18905 | 7.6 | 204.4 | 16.8 | 0.69 | 7.8 | 204.5 | 17.0 | 0.18
n7866 | 10.2 | 185.3 | 17.1 | 0.82 | 9.8 | 185.7 | 17.3 | 0.20
n8702 | 9.0 | 177.8 | 17.2 | 1.01 | 9.8 | 178.1 | 17.3 | 0.22
netbig4 | 8.2 | 212.9 | 29.6 | 2.06 | 7.8 | 212.7 | 29.8 | 0.46
netbig3 | 11.2 | 223.8 | 28.2 | 2.44 | 11.9 | 224.9 | 28.3 | 0.43
netbig2 | 19.0 | 188.2 | 35.6 | 2.80 | 18.0 | 189.1 | 35.6 | 0.52
netbig1 | 13.2 | 215.7 | 32.7 | 3.03 | 13.9 | 216.8 | 32.7 | 0.53

• Lillis algorithm implemented with leakage included
  – Pruning is less effective

Conclusion

• Buffer insertion algorithm with total power (P_dyn + P_stat) minimization as the objective
• Generates 3D surfaces in delay, capacitance and power space
  – Ability to explore different types of trade-offs
• Able to handle large buffer libraries with continuous sizes
• Worst-case polynomial complexity