revamping electronic design process to embrace interconnect dominance

Post on 08-Jan-2016

27 Views

Category:

Documents

3 Downloads

Preview:

Click to see full reader

DESCRIPTION

Revamping Electronic Design Process to Embrace Interconnect Dominance. Chung-Kuan Cheng CSE Department UC San Diego La Jolla, CA 92093-0404 ckcheng@ucsd.edu. Interconnect Dominance. Goals: Speed, Power, Cost Constraints: Area, Current, Skew Challenges: PVT Variations, Signal Integrity. - PowerPoint PPT Presentation

TRANSCRIPT

1

Revamping Electronic Design Process to

Embrace Interconnect Dominance

Chung-Kuan ChengCSE DepartmentUC San DiegoLa Jolla, CA 92093-0404ckcheng@ucsd.edu

2

Interconnect Dominance

Goals: Speed, Power, Cost

Constraints: Area, Current, Skew

Challenges: PVT Variations, Signal Integrity

0

40

80

120

160

200

180 150 130 100 90 80 70 65 57 50Process Technology Node (nm)

Dela

y (p

s)

1mm Global I nterconnect with Scattering(source: I TRS Roadmap 2004)

FO4 I nverter Delay (Estimated by0.36*Ldraw)

1mm distortionless Transmission Line (Speedof Light)

3

Outlines

• Interconnect Technologies• Buffers, Pitches, Circuit Styles

• Geometrical Planning• Wire Orientations, Chip Shapes

• Interconnect Networks• Topologies, Wire Styles

• Power and Clock Distributions• Functional Modules

• Adders, Shifters

• Conclusion

4

Interconnect Technologies

• RC Wires• Wire Pitch, Width, Separation• Buffer Size, Buffer Interval

• Transmission Line• RLCG

5

Interconnect Technologies

2.5 0.31 0.08 1.31 1.5 1.5 0.13t h h ta

c s s s sC t

e e e es

0.2

0.35 0.651.53 0.98 0.01w sfr

c h hC h

e es

asC w

h

0.05 0.25

1.21.05 0.63 0.0632

t sfrs s h

C s te e

s h h

Coupling capacitance:

Self capacitance:

a fr a frc c s s

w

C C C CC

Total wire capacitance:

t

h

h

w

s

d

p/g plane

wire

w: wire widtht: wire thickness

s: wire spacingh: distance to p/g plane

d: wire pitch

Cc

Cs

Cc: coupling capacitance Cs: self capacitance

Cs

6

Interconnect Technologies

Year (On-Chip) 2005 2010 2015

rncn (ps)T40 0.870 0.400 0.180

rwcw (ps/mm)T80

metal 1

440 1792 5951

interval (um) 136 45.7 16.8

delay (ps/um) 0.120 0.164 0.200

,)1(2

ww

gn

cr

crfInt

SRC Roadmap 2005

wgwntrtr ccrrfllDelay ))1(22(/)(

7

Interconnect Technologies

ndelay bandwidth /bandwidth powerDesign metrics:

ndelay n ndelay power 2n ndelay powerObjective

functions:

For each pitch, For each objective function

Find wire width w, buffer size sinv and buffer interval linv

Calculate the metrics

8

Experimental Results – normalized delay

50100

150200

0

0.5

1

1.5x 10

-6

0

0.5

1

1.5

2x 10

-7

de

lay n(s

/m)

min-d

min-ddp

min-dp

1 2 3 4 5 6 7 8 9 10

x 10-7

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8x 10

-7

pitch(m)

dela

y n(s/m

)

min-dmin-ddpmin-dp180nm130nm100nm70nm

0 0.2 0.4 0.6 0.8 1 1.2

x 10-6

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2x 10

-7

pitch(m)

dela

y n(s/m

)

min-dmin-ddpmin-dp

Top left: OverviewTop right: at different pitches (solid lines: min-pitch; dash lines: saturating pitch)

Bottom right: at 70nm technology

9

Interconnect Tech. -- bandwidth

50100

150200

0

0.51

1.5x 10-6

1

2

3

4

5x 10

13

band

wid

th(b

its/s

)

min-d

min-ddp

min-dp

1.5 2 2.5 3 3.5 4 4.5

x 10-7

2.5

3

3.5

4

4.5

5x 10

13

pitch(m)

band

wid

th(b

its/s

)

min-dmin-ddpmin-dp180nm130nm100nm70nm

0 0.2 0.4 0.6 0.8 1 1.2

x 10-6

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

13

pitch(m)

band

wid

th(b

its/s

)min-dmin-ddpmin-dp

Top left: OverviewTop right: at min-pitchesBottom right: at 70nm technology

10

Interconnect Tech. – bandwidth/power

50

100

150

200

00.5

11.5x 10

-6

0

1

2

3

4

5x 10

23

band

wid

th/p

ower

(m/J

s)

min-d

min-ddp

min-dp

1 2 3 4 5 6 7 8 9

x 10-7

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

23

pitch(m)

ba

nd

wid

th/p

ow

er(

bits

*m/J

s)

min-dmin-ddpmin-dp180nm130nm100nm70nm

0 0.2 0.4 0.6 0.8 1 1.2

x 10-6

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

23

pitch(m)

band

wid

th/p

ower

(bits

*m/J

s)min-dmin-ddpmin-dp

Top left: OverviewTop right: at different pitches (solid lines: min-pitch; dash lines: optimal pitch)

Bottom right: at 70nm technology

11

Interconnect Tech.: Transmission Line• Speed-of-the-light on-chip communication

• < 1/5 Delay of Traditional Wires

• Low Power Consumption• < 1/5 Power Consumption

• Robust against process variations• Short Latency• Insensitive to Feature Size

12

Interconnect Tech.: Transmission Line

RΔl LΔlGΔl CΔl

RΔl LΔl

GΔl CΔl …

i(z,t)

RΔl LΔl RΔl LΔl

Differential Transmission Line Surfliner

RΔl LΔlCΔl

RΔl LΔl

CΔl …

i(z,t)

RΔl LΔl RΔl LΔl

Serial resistance causes voltage loss.

Speed and attenuation are frequency dependent.

Shunt conductance compensates voltage loss: R/G = L/C.

Flat from DC Mode to Giga Hz

Telegraph Cable: O. Heaviside in 1887.

13

Theory (Telegrapher’s Equation)• Telegrapher’s equation:

),(),(),(

),(),(

),(

tzGVdt

tzdVC

dz

tzdIdt

tzdILtzRI

dz

tzdV

• Propagation Constant:

jCjGLjR ))((

• Wave Propagation: zjzeVzV 0)(

• Alpha and Beta corresponds to speed and phase velocity. Both are frequency dependant

14

Theory (Distortionless Line)

• Set G=RC/L• Frequency Independent speed and attenuation:

LCCLR ,//

• Characteristic impedance: (pure resistive)

CLZ /0 • Phase Velocity (Speed of light in the media)

cLCv /1• Attenuation:

zZ

R

ezA 0)(

15

Digital Signal Response

16

Interconnect Tech.: Transmission Line• Add shunt conductance between differential

wires

• Resistors realized by serpentine unsilicided poly, diffusion resistors, or high resistive metal

17

Geometrical Planning

• Wire Orientations• Manhattan, Hexagonal, Octagonal, Euclidean

• Die Shapes• Rectangle, Diamond, Hexagon, Octagon, Circle

18

Average Radius of Unit-Circle Area

lambda geo.

Shape

Man. Y-Arch X-Arch Euclid.

Square 1.329 1.122 1.070 1.017

Diamond 1.253 1.121 1.070 1.017

Hexagon 1.276 1.100 1.058 1.003

Octagon 1.272 1.104 1.054 1.001

Circle 1.273 1.103 1.055 1.000

19

Throughput : concurrent flow demand

lambda geo.

Shape

Man. Y-Arch X-Arch*

M: Square 1.000 1.225 1.346

M: Diamond 1.195

Y: Hexagon 1.315

X: Octagon* 1.420

*ratio of 0-90 planes and 45-135 planes is not fixed

20

Flow congestion map for uniform 90 Degree meshes

21

12 by 12 13 by 13

Congestion map of square chip using X-architecture

22

12 by 12 13 by 13

Y-architecture + Square Chip

23

Y Architecture + Hexagonal Chip

24

X-Architecture + Octagonal Chip

25

Manhattan Architecture + Diamond Chip

26

Routing Grids

(http://www.xinitiative.org/img/062102forum.pdf)

X-Architecture Y-Architecture

27

Interconnect Networks

• Optimized Interconnect Architecture• Data Bus, Control Signals

• Shared Interconnect• Packet Switching• Circuit Switching• RTL Level Partition

28

Interconnect Networks

Physical Implementation

On-chip transmission line for long distance communication

Well spaced RC wire with buffer insertion for local connections

29

Interconnect Networks

• Obj: Power, Latency

• Constraints:• Routing Area, Bandwidth

• Design Space:• Topology• Wire Styles, Switches

• Model:• Traffic Demand• Data Bus, Control Signals

30

Interconnect Networks: Design Flow

Power & Delay Lib

Topology Lib

Multi-commodity network flow (MCF)

formulation

Power Evaluation(MCF solver)

Latency Aware low power NoC topology

with wire style optimization

31

Power and Latency Tradeoffs for Optimal 8x8 Topologies

48

53

58

63

68

73

2. 3 2. 5 2. 7 2. 9 3. 1 3. 3Average Latency (ns)

Powe

r Co

nsum

ptio

n (W

)

area=3000um area=4000um area=5000umarea=6000um area=7000um area=8000umarea=9000um area=10000um area=11000um

32

Topology Selection (Latency, Power, BW)

48

58

68

78

88

98

2. 3 2. 8 3. 3 3. 8 4. 3

Average Latency (ns)

Powe

r Co

nsum

ptio

n (W

)

topo=opt i mal topo=mesh topo=torus topo=hypercube

33

(a) Optimal topology when area = 3000um

(b) Optimal topology when area = 7000um

(c) Optimal topology when area = 11000um

Topology Selection (Latency, Power, BW)

Optimal 8-node topologies vs area resources

34

Clock Distributions

35

Clock: Linear Variations Model

• Process variation model• Transistor length• Wire width• Linear variation model

• Power variation model• Supply voltage varies randomly (10%)

ykxkdd yx 0

36

Clock: RC Model

Input: an n level meshes and h-treesConstraint: routing area, parameter variationsObjective: skew

37

Simplified Circuit Model

Vs1Rs

C

Vs2Rs

C

R

1

2

u(t)

u(t-T)

38

Skew Expression

110 2ln CRt

)(2

1.

2

21.

1

21

V

VV

V

VVT

Assumptions:

1. T<<RsC

2. Rs /R <<RsC/T

Using first order

Taylor expansion ex=1+x,

)2ln2exp(

:function Skew

R

RTT s

39

Optimal Routing Resources Allocation

level-1

level-2

level-3

level-4

total area

40

Inductance Diminishes Shunt Effects

Vs1 Rs

C

Vs2 Rs

C

R

1

2

u(t)

u(t-T)

L

f(GHz) 0.5 1 1.5 2 3 3.5 4 5

skew(ps) 3.9 4.2 5.8 7.5 9.9 13 17 26

• 0.5um wide 1.2 cm long copper wire

• Input skew 20ps

41

Clock Distributions

Surfliner

42

Functional Modules (Data Path)

• Arithmetic

• Algorithms

• Logic

• Logic Styles

• Placement

43

Cyclic Shifter

S2

S1

S0

D7 D6 D5 D4 D3 D2 D1 D0

Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)

(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)

(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)

(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)

Delay: nlogn, Power: ¼ n2

44

Fan-out Splitting

Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0

D7 D6 D5 D4 D3 D2 D1 D0

S2

S1

S0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)

(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)

(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)

(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)

Delay Comparison

50.7

102.7

218.7

484

45.376

128

224

0

100

200

300

400

500

600

8-bit 16-bit 32-bit 64-bit

Del

ay

MuxShifterDemuxShifter

τ

Impr. Ratio8-bit 10.7%16-bit 26.0%32-bit 41.5%64-bit 53.7%

Power Comparison

150.7484

1561.3

5220

146.9453.7

1397.6

4458.9

0

1000

2000

3000

4000

5000

6000

8-bit 16-bit 32-bit 64-bit

Tot

al S

witc

hed

Cap

acita

nce

MuxShifterDemuxShifter

Cg

Impr. Ratio8-bit 2.5%16-bit 6.3%32-bit 10.5%64-bit 14.6%

Delay: nPower: 3/16 n2

45

Cell Permutation• Fix the input/output stage, permute cells of intermediate

stages to further improve delay.• Formulate as an ILP problem and solve by CPLEX.

• Optimal solution in terms of delay• Delay/Power tradeoff

Additional Delay Reductionby Cell Permutation

13

29

61

125

818

38

78

0

20

40

60

80

100

120

140

8-bit 16-bit 32-bit 64-bit

Del

ay (

in u

nit

of c

olum

ns s

pann

ed)

w/o permuatation

w/ permutation

7 6 5 4 3 2 1 0

6 5 4 3 7 2 1 0

3 4 2 6 7 5 1 0

7 6 5 4 3 2 1 0

>> 1-bit

>> 2-bit

>> 4-bit

0 0 00 0 0 0 0

1 1 01 1 4 0 0 11130000

11141111

4 2 2 2 4 1 4 5 4 3 5 5 1 4 1 3

4 2 4 5 4 5 4 3

8 4 7 5 8 8 4 7 8 4 7 7 4 6 3 8

8 7 8 7 8 7 6 8

Optimal Timing solution for 8-bit

46

Conclusion

• Interconnect Technologies

• Geometrical Planning

• Interconnect Networks

• Power and Clock Distributions

• Functional Modules

47

Interconnect Technologies

mcr

crfl

ww

gn 242)1(2

41gw

wn

cr

crs

Example: w= 85nm, t= 145nm

Optimal interval

mmpsmfsccrrfllDelay wgwntrtr /194/194))1(22(/)(

Optimal buffer size

Optimal delay

rn= 10Kohm,cn=0.25fF,cg=2.34xcn=0.585fFrw=2ohm/um, cw=0.2fF/um

48

Experiments—Optimized Skew

s-mesh(s) m-mesh(s) ratio0.00 2.92E-11 2.92E-11 100.0%0.25 2.79E-11 2.60E-11 93.2%0.40 2.71E-11 2.45E-11 90.4%1.00 2.42E-11 1.98E-11 81.8%3.00 1.70E-11 1.24E-11 73.2%5.00 1.24E-11 8.72E-12 70.5%

skewtotal area

49

Robustness Against Supply Voltage Variations

ave worst ave worst0.00 2.10E-11 2.91E-11 2.10E-11 2.91E-111.00 8.38E-12 1.14E-11 8.26E-12 1.43E-112.00 2.71E-12 4.42E-12 6.18E-12 1.11E-113.00 1.89E-12 3.33E-12 4.83E-12 8.73E-124.00 1.45E-12 2.48E-12 3.88E-12 6.96E-125.00 1.16E-12 2.02E-12 3.18E-12 5.64E-12

total areamutli-level mesh single-level mesh

top related