1 1 avinoam kolodny technion – israel institute of technology intel pvpd symposium july 2006...

Post on 15-Jan-2016

214 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

11

Avinoam Kolodny

Technion – Israel Institute of Technology

Intel PVPD SymposiumJuly 2006

Issues in the Design of Wires

22

Thanks to my students and collaborators

Anastasia Barger

Shay Michaely

Konstantin Moiseev

Nir Magen

Michael Moreinis

Arkadiy morgenshtein

David Goren

Shmuel Wimer

Uri Weiser

Nachum Shamir

Israel Wagner

Ran Ginosar

Eby Friedman

33

Connectivity and Complexity

44

What are the issues with wires?

Delay Power Noise Reliability Cost

55

Scope of this talk

Interconnect delay Interconnect Power

66

Sizing and spacing of uniform bus wires

77

RC wire delay model

0.378( )( )delay R l C l ),,,(

),(

SWHTCC

TWRR

N wiresW S

A

l

Wires do not scale well!

88

Coupling capacitance typically dominates

0.378( )( )delay R l C l ),,,(

),(

SWHTCC

TWRR

W S

A

l

99

Data Rate Optimizationin an Interconnect Channel

Increase N by: making the wires narrow (small W), and dense (small S)

What will happen to the delay?

1Data Rate N f N

delay

N wiresW S

A

* A. Barger, D. Goren, A. Kolodny, “Simple Design Criterion for Maximizing Data Rate in NoC links”, SPI 2006.

l

1010

Data Rate vs. Wire Width (W) and Spacing (S)

T=1m, H=1m, A=60m, l=2mm.

S [m] W [m]

opt optW S TH

Rough Approximation:

1 1_

( ) ( , )

AData Rate N

delay S W delay S W

1111

What about the speed of light? Assume S=W

RLC

RC

W=S [u]

_ _

_

_ _

of

r

T time of flight

wire length

speed of light

RC model is unrealistic here!

1212

Evolution of Wire Delay Models

*Source: E. G. Friedman, U. or Rochester

1313

RLC delay model Approximation

Inductive effects: Longer delay Steeper slope overshoot

0

0.5

1

1.5

2

2.5

0 50 100 150 200 250

time [psec]

[V]

RCdelay

RLCdelay

2.9 1.35 20.74asyml

asymdelay LC e l l

* Eby G. Friedman, Yehea E. Ismail, On-chip inductance in high speed integrated circuits, 2001

2asym

R

LC

L,C and R are per unit lengthl denotes wire length

RCmodel

RLCmodel

1414

Choosing Wire Width and Spacing for maximal Data Rate

T=1m H=1m, A=60m,l=2mm.

RC model

RLC model

1 1_

( ) ( , )

AData Rate N

delay S W delay S W

1515

A simple criterion to choose wire width for peak data rate

_ ofRC delay T Assume S=W

time_of_flight

RC_delay=0.37RCl2

RC region RLC region

* A. Barger, D. Goren, A. Kolodny, “Simple Design Criterion for Maximizing Data Rate in NoC links”, SPI 2006.

1616

Peak data rate is near RC/RLC boundary

RLC

RC

RC model is O.K.RC modelis unrealistichere

Simple Criterion for Maximal Data Rate

2 _0.37

_ _

r

wire lengthRCl

speed of light

RC region RLC region

W*

1717

We know how to extract C, but what about L?

0

_ _ * ( )( )total total

r

ltime of flight l LC L C

c

2( _ _ )total

total

time of flightL

C

l is the wire length

c0 is the speed of light in vacuum

r is the dielectric coefficient of the insulator

L and C are per unit length

propagation speed in the wire is

Ltotal and Ctotal are for the whole wire

0 1

r

c

LC

1818

Fast wires must use transmission line layout

Ground plane and/or wires provide current return path

w

tgGROUND

wg

t

h

S S

ws

t

wg

tgGROUND

h

S S

d

h

wg

w

SIGNAL

GROUND

t

tg

h

wg

w

SIGNAL

GROUND

t

tg

ws

t

d

1919

Slower propagation because of Crossing Lines

Crossing lines increase the

capacitance, but….

They do not provide a

return path for the

signal current

They do not reduce

inductance

Time of flight (TOF)

becomes longer !

Extract capacitance CRETURN

(by ignoring the crossing

lines in the layout) to

estimate the longer TOF from

this expression:

2 2( ) ( _ )OF OFtotal

RETURN total

T longer TL

C C

Cro

ssin

g Li

nes

Ground Plane

* A. Barger, D. Goren, A. Kolodny, “Simple Design Criterion for Maximizing Data Rate in NoC links”, SPI 2006.

2020

conclusions on uniform buses

Most wires operate at the RC region Simple criterion for peak data rate ensures this

Most wires are laid out at higher density, and operate more slowly

RLC model is necessary only for a few wires When propagation speed is important

Make them wide and thick to reduce R

Use Transmission line layout for these wires!

2121

Sizing and spacing of individual wires

in interconnect channels

2222

Should all wires be the same?How about optimizing individual widths and spaces?

N wiresW S

l

A

ii-1

WiWi-1

Si

Vcc Vcc

A

Si-1

i+1

Wi+1

1 0

n n

i ij j

W S A

Strong driver

Weak driver

Weak driver

A is a fixed constraint

2323

Interconnect channel structure

* S. Wimer, S. Michaely, K. Moiseev and A. Kolodny, "Optimal Bus Sizing in Migration of Processor Design", IEEE Transactions on Circuits and Systems – I, vol. 53. no. 5, May 2006.

2424

Delay Model for wire i

iiii

i

iiiii SSW

ed

W

cbWaSW

11,

1

2525

Timing Objectives for interconnect channel Optimization

n

i iiii

i

iiiii

n

iii SSW

ed

W

cbWaTSWTSWf

1 111

11,,

21

, ,n

ii

f W S W S

SWSWf ini

,, max1

4

iini

TSWSWf

,, max1

3

all Objective Functions are convex

total sumof delays

total sumof slacks

max delay

worst negativeslack

2626

Minimizing Total Sum of Delays (or Slacks) (equivalent to minimizing the average wire delay)

Objective: minimize

Total channel width constraint:

At optimum :

This leads to algebraic equations in 2n+2 variables:

Unique global optimum!

ASWSWgn

ji

n

ji

01

,

2f g

nn SSWW 01 ,,

21

, ,n

ii

f W S W S

2727

Minimizing the Maximal Delay (MinMax problem)(Optimizing the worst-case wire)

This objective function is not differentiable: no analytic solution

Theorem:

In the optimal MinMax solution, the delays of all the

wires are equal

SWSWf ini

,, max1

4

ASWSWgn

ji

n

ji

01

,

Objective: minimize

Same constraint:

2828

Why all wires become equally criticalin MinMax solution?

Iteratively allocate more and more area resources to the slowest wire The critical wire will improve Its neighbors will lose these resources … Until the neighbors become critical too

L

A

L

A

Criticalwire

L

A

2929

Iterative Algorithm(Wire sizing and spacing for MinMax Delay)

1. Set initial solution

2. Equalize all delays (iteratively)

3. Apply ‘area preserving local modification’

4. Go to 2 if 3 yielded max delay reduction

3030

Example: Minimizing the worst slack

Cross section of bus wires after MinMax slack optimization, assuming a critical signal (required early) in the middle.

Distance from sidewall [m]

Required delay [ps]

Obtained delay [ps]

3131

Interleaved bus example(odd-numbered drivers are strong, even-numbered drivers are weak)

Distance from sidewall [m]

Widths [m]

Spaces [m]

Delays [ps]

Widths [m]

Spaces [m]

Delays [ps]

After total sum of delays minimization:

After MinMax (worst case wire delay) minimization:

3232

Relation Between Minimal Total Sum and MinMax

Which kind of optimization is more useful in practice?

ii-1

WiWi-1

Si

Vcc Vcc

A

Si-1

i+1

Wi+1

3333

Effect of the constraint A

0

20

40

60

80

100

120

140

160

180

200

2 4 6 8 10 12 14 16 18 20

Bus width (A), um

Del

ay, p

s

Delay after MinMaxopt.

Avrg. delay aftertotal sum opt.

Min delay of wire intotal sum opt.Max delay of wire intotal sum opt.

*

'

"

3434

Migration of a bus in 65nm technology

3535

Conclusions on optimizing individual wire widths and spaces

Some performance improvement by wire sizing/spacing,

according to individual signal slack, driver resistance, etc.

Sum-of-delays is a useful objective for minimization Very appropriate for automated migration of layout to a new process

3636

What else can we do with wires in an interconnect channel?

3737

1234

1324

Wire reordering (permutation)

3838

Reordering of wires to improve

the optimal delay

3939

Which order is better?

Weak driver

W

Strong driver

Best order! Worst order

4040

The idea behind net reordering

Arrange nets by driver resistance such that cross-capacitances are shared optimally

4141

Symmetric Hill Order

Take wires sorted in descending order of driver resistance and put alternately to the left and right sides of the bus channel

Obtained permutation of wires is called Symmetric Hill order

7 6 5 4 3 2 1

Symmetric Hill order provides best sharing of inter-wire spaces

Rd

riv

er

4242

Optimal order theorem

given an interconnect channel whose wires are of uniform width W, ‘Symmetric Hill’ order of signals yields minimum total sum of delays (after spacing optimization).

Rd

riv

er

* K. Moiseev, S. Wimer and A. Kolodny, “Timing Optimization of Interconnect by Simultaneous Net-Ordering, Wire Sizing and Spacing,” ISCAS 2006.

4343

Optimal order in more general cases?

Symmetric Hill was proven optimal for most practical cases (total sum of delay minimization)

Example:20 sets of 5 wires Rdr: [0.1 ÷ 2] KΩ

(random) Cl: [10 ÷ 200] fF

(random) Bus length: 600 μm Bus width: 12 μm Technology: 90 nm

A good heuristic: Don’t try all permutations! 1) Put the signals in symmetric hill of their drivers 2) Perform optimization of widths and spaces

4444

Symmetric Hill for MinMax delay?

Not necessarily optimal

BUT: Was found optimal for most practical cases It is a good heuristic In fact, the MinMax delay is very sensitive to wire reordering

Symmetric hill is also good for reducing delay uncertainty because of crosstalk noise

4545

Delay improvement by reordering(65nm technology, examples of 5 wires)

4646

Wire reordering is most effective when there is a mix of driver strengths

65 nanometer technology, A=3m, L=500 m

0

5

10

15

20

1 2 3 4 5 6

Number of weak drivers in a channel of 7 wires

Del

ay i

mp

rove

men

t,

%

Average delayminimizationCritical delayminimization

Delay improvement: from worst ordering to best ordering

4747

0

5

10

151

1.9

2.8

3.7

4.6

5.5

6.4

Relative range of drivers

Del

ay im

pro

vem

ent,

%

Critical delayoptimization

Average delayoptimization

Impact of the range of driver strengths

4848

Conclusions on wire reordering

It is yet another degree of freedom! Can help if there is a mix of driver strengths Don’t try permutations…. Use Symmetric Hill

4949

Can we use wire sizing and spacing

to reduce power?

5050

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0.15 0.13 0.1 0.09 0.08 0.07 0.065 0.045 0.032 0.022

Generation

% G POW

% D POW

% IC POW

Dynamic Power breakdown

Interconnect

Diffusion

Gate

Technology generation [μm]Source: Nir Magen, SLIP04 ITRS 2001 Edition adapted data

The interconnect power problem

5151

Interconnect power

Interconnect switching power:

Coupling capacitance:

allocate large spaces to wires with high activity!

2area+fringe coupling DDP= C C V f

couplingC 1/S

5353

0%

10%

20%

30%

40%

50%

60%

Block A Block B Block C Block D Block E

Dyn

amic

pow

er s

avin

g

Driver Downsizing

Router Power Saving

Results of spacing experiment

Average saving results: 14.3% for ASIC blocks 1

Downsize saving

Router saving

Average

1 - Estimated based on clock interconnect power

* N. Magen, A. Kolodny, U. Weiser and N. Shamir,  “ Interconnect-power dissipation in a   Microprocessor,”  SLIP 2004.

5454

Wire reordering for power?

Use Symmetric Hill odrer according to activity factors of the signals?

LSi Si+1

Ci-1 Ci Ci+1

Vcc Vcc

1i 1i i

A

5555

Conclusions on interconnect power

5656

Optimization of drivers and wires

together?

5757

Optimizing the drivers and wires together?

Lw

Lw/n Lw/n Lw/n Lw/n1 2 n-1

Input Logic Output Logic

5858

“Logic Gates as Repeaters” (LGR) Idea

Distribute gates over the wire – each gate drives a segment

InterconnectLogical Circuit

* M. Moreinis, A. Morgenshtein, I. Wagner and A. Kolodny, “Logic gates as  Repeaters,” IEEE Transactions on VLSI,  2006.

5959

Gate Delay – Logical Effort

1 ii wgate i i

i

C CD g p

C

1i iinterconnect w w iD R C x C

211

1

Ni i int

tot i i i int int i int ii i

C L CD g p x L R C L R C

C

int int,i iw i w iC L C R L R gi gi+1

Rwi

Interconnect Segment Interconnect Segment

Cwi

Rwi+1

Cwi+1

Total Delay

Wire Delay – Elmore

LGR Delay Modeling

6060

1

2 2opt

av i av ii

w w

L R R L C CLL

N x R x C

12 3

4L1 L2L3 L4

Optimal segmenting

6161

Length/n

Length

INOUT

Cload

INOUT

Cload

Length/n Length/n1 n-1

Length1 Length2Length3

OUT

Cload

IN

Linv n-11 Linv

Wire length -1200 µm

Un-optimized 1.6 nsec

Repeater Insertion 0.42 nsec K=5, sinv=43

LGR+Repeters 0.2 nsec

s1=×16, s2=×12, s3=×25,sinv=×48, K=2L1=0µm, L2=120µm, L3=160µm, Linv=480µm,

Example: LGR + Repeater Insertion

6262

More work to do

Treat gate sizing and wire optimizations together!

Interesting issues for future research: Signal correlations Minimizing impact of crosstalk Timing / power / noise interactions

6363

Summary •Sizing and spacing of uniform bus wires•Sizing and spacing individual wires•Wire reordering•Speed and Power improvements•Optimizing drivers and wires together•Future convergence of these techniques?

top related