clock tree synthesis with data-path sensitivity matching

29
Clock Tree Synthesis with Data-Path Sensitivity Matching Prof. Matthew R. Guthaus, UC Santa Cruz Prof. Dennis Sylvester, University of Michigan Prof. Richard B. Brown, University of Utah

Upload: others

Post on 25-Nov-2021

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Clock Tree Synthesis with Data-Path Sensitivity Matching

Clock Tree Synthesis with Data-PathSensitivity Matching

Prof. Matthew R. Guthaus, UC Santa CruzProf. Dennis Sylvester, University of Michigan

Prof. Richard B. Brown, University of Utah

Page 2: Clock Tree Synthesis with Data-Path Sensitivity Matching

2

Clock Distribution Networks• Uniform (H-Tree)

– Moderate power consumption– Fairly robust– Sinks are not usually uniform

• Balanced Tree [Tsay ICCADʻ91, Chao et al.DACʻ92, Boese et al ASICʻ92]– Minimum wire length– Sensitive to process parameters

• Spines [Tam et al ISSCCʼ06]– Used by Intel (P6, Xeon MP)– Variations within and between spines still exists

• Grids [Anderson et al ISSCCʼ06, Golden et alISSCCʼ06]– Used by IBM (Power4) and AMD (Hammer)– Low variation, but huge power overhead

Page 3: Clock Tree Synthesis with Data-Path Sensitivity Matching

3

Types of Variation

Lot-to-lot Die-to-die Within dieWafer-to-wafer

Inter-die Intra-die

• Environmental (temperature, voltage, etc.)• Physical (lithography, materials, etc.)• Fatigue (NBTI, metal migration, etc.)

Page 4: Clock Tree Synthesis with Data-Path Sensitivity Matching

4

Variation Source Assumptions

W

HS

T

ρL

Leff= 53nm±16.7%Vthp= 0.232±30%Vthn= -0.273±30%

W =175nm±32nmS = 175nm±32nmH = 280nm±15%T = 280nm±10%ρ = 2.2e-8±30%

Vth

Indep.

25%Corr. Corr.

Page 5: Clock Tree Synthesis with Data-Path Sensitivity Matching

5

Improving Robustness

• Variation is a major concernin clock distribution

• Current Options– Corner-based optimization

• Process-Voltage-Temp(PVT)

• Risky, Pessimistic, etc.– Direct statistical

optimization• Many simplifications or

expensive to compute• Can heuristics still help

clock tree optimization?

P2

P1

Slow

Typical

Fast

P

50% 99.8%

Skew

Y%

Page 6: Clock Tree Synthesis with Data-Path Sensitivity Matching

6

Expected Skew of “Zero Skew” Trees

Increasing number of buffers+wires

Page 7: Clock Tree Synthesis with Data-Path Sensitivity Matching

7

Skew Calculation

• Elmore delay is not accurate,but high fidelity.

• Fast for optimization.• S2M for slew calculation.• Operations Required

– Add/Subtract– Mult– Maximum/Minimum

• Delay(s,3)=0.69*(R1(C1+C2+C3+C4+C5)+R2(C2+C3+C5)+R3C3)

• Skew– Maximum Difference (global

skew)– Maximum Path-Connected

Difference (local skew)

R1

C1

sC4

C3

C2

R2

C5

R5

1

4

2

3

5

R3

R4

[Elmore, J. App. Physics 1948][Agarwal et al., TCAD 2004]

Page 8: Clock Tree Synthesis with Data-Path Sensitivity Matching

8

Parameterized FormNominal

GaussianRandomVariables

IndependentCorrelated

Sensitivity

!

N(µ," ) = N(0,1)

Page 9: Clock Tree Synthesis with Data-Path Sensitivity Matching

9

Parametric Operations

• Addition and Subtraction– Add/Sub the means– Correlated: Add/sub std. dev.– Independent: Root-sum-square std. dev.

• Multiplication– Many non-linear cross terms– Showed that approximating cross-terms as random variation

works well• Maximum and Minimum

– First and second moments calculated analytically [Clark1961,Cain 1994]

– Sensitivities approximated by proportional weight[Visweswariah et al., DAC 2004]

Page 10: Clock Tree Synthesis with Data-Path Sensitivity Matching

10

Top-Down Statistical Analysis

• Parameterized R, C,and D values.

• First bottom-uppropagate total sub-treecapacitances, Ci

• Top-down propagateparameterized delays,Di

• Skew is Max(Di -Dj) forsinks i and j

!

Dx

= Di+ R

ix(Cix

2+ C

x)

Rix,Cixi

x y

Riy,Ciy

Cx Cy

Page 11: Clock Tree Synthesis with Data-Path Sensitivity Matching

11

Clock Tree Tuning• Start with DME + Buffered tree• “In Place” Optimization• Select Buffer Sizes and Wire Widths to Minimize Skew while

Increasing Robustness• Buffer/Wire Sizes

– Two stage buffer with fixed internal gain– Continuous range of buffer output sizes– Continuous range of wire widths– Minimum and maximum limits for both sizes

M2

M3

Size wire segmentto change wireresistance andcapacitance

Size buffer to change loadand drive strength

Page 12: Clock Tree Synthesis with Data-Path Sensitivity Matching

12

Sequential LP for Clock Skew

Linear Delay Constraints

Minimum Skew Objective

Similar to Wang and Marek-Sadowska, DAC 2004, but for skew rather than power minimization.

Power BoundSimple Bounds

Page 13: Clock Tree Synthesis with Data-Path Sensitivity Matching

13

Buffer/Wire Size Changes

y x

S1 S2 S2 S3

Linear Delay Constraints

• Perturb & Difference canbe used with anyanalysis

• More buffers providesbetter incrementalanalysis

NotTraversed

Page 14: Clock Tree Synthesis with Data-Path Sensitivity Matching

14

Sequential Quadratic Formulation

• We are NOTapproximating skew orconstraints with asecond-order function– Indirect optimization– Convex cost function

• Minimize total energy– Force = k*x– Energy = k*x*x !

"(S) = wij (si + bij # s j )2

i> j

$

Useful SkewWeight

Sink Delays

[Guthaus et al., DAC 2006.]

Page 15: Clock Tree Synthesis with Data-Path Sensitivity Matching

15

• Power Bound– Dominated by dynamic power so capacitance rather than

true power is bounded– Constraint ensures total size changes are still below power

limit

• Simple Bounds– Linearity of sink delay is only valid in a small range so we

restrict the size changes by epsilon– Technology places hard upper/lower limits on buffer and

wire sizes

Additional Constraints

Page 16: Clock Tree Synthesis with Data-Path Sensitivity Matching

16

R1 Linear vs Quadratic “Push Out”

• Maximum Skew– SLP: 7ps– SQP: 15ps

• Pairs within 1psof crititical– SLP: >7,000

pairs– SQP: 12 pairs

• Mean push out– Almost 8ps

[Guthaus et al., DAC 2006.]

Page 17: Clock Tree Synthesis with Data-Path Sensitivity Matching

17

SLP vs. SQP Skew (50% Cap. Increase)

Page 18: Clock Tree Synthesis with Data-Path Sensitivity Matching

18

Why preserve sensitivities?

• Sensitivities attributevariability to a particularsource

• Underlying sources ofvariation are defined as“correlated”

• Correlated sensitivities can“cancel out” whereasindependent sensitivitiesaccumulate as root-sum-of-squares !

N(µ1"µ

2,#

1"#

2)

!

N(µ1"µ

2, #

1

2 +#2

2)

Correlated

Independent

Page 19: Clock Tree Synthesis with Data-Path Sensitivity Matching

19

Correlation Definitions

• Defines tendency for eventsto track

• Formalized with the Pearsoncorrelation coefficient

• Can also be definedgeometrically as cosine ofangle between two eventvectors

Page 20: Clock Tree Synthesis with Data-Path Sensitivity Matching

20

Geometric Interpretation of Correlation

• Parameterized form isalready centered

• Sensitivity coefficients arelinear

• Define the sensitivity vector,R:

Parameter 1

Para

met

er 2

Page 21: Clock Tree Synthesis with Data-Path Sensitivity Matching

21

Heuristic for Increasing Correlation

• Include Nominal as aparameter

• Angle is approximatelyproportional to distancesquared

• Maximizing correlation,cos(theta), is same asminimizing angle

• SQP can be used forsquared objectives

Parameter 1 Sensitivity

Nom

inal

Del

ay

Page 22: Clock Tree Synthesis with Data-Path Sensitivity Matching

22

Improvement of s1423

Very Low Nominal Skew

Very Low Nominal Skew

Statistical Tuning

Deterministic Tuning

Pre-Tuning~5%improvement in99.8% quantile

Page 23: Clock Tree Synthesis with Data-Path Sensitivity Matching

23

Infeasible Improvement

• Sometimes improvement is infeasible– Wire assignment is fixed– Contradiction of forces can result in zero improvement– Mutually exclusive sensitivities can result in zero

improvement• No improvement for other benchmarks

– Same results as deterministic SQP heuristic– But still better than SLP

• Does this mean the idea is bad? No.– Consider local, not global, skew with data-path sensitivities.

Page 24: Clock Tree Synthesis with Data-Path Sensitivity Matching

24

Timing Constraints Revisited

Tdmax

Setup ConstraintTcq+Tdmax+Tsetup+Tc1-Tc2< P

-Tc2

TsetupTcq

Tc1

Hold ConstraintTcq+ Tdmin+Thold+Tc1-Tc2>0

TholdTdmin

-Tc2

Tcq

Tc1

Page 25: Clock Tree Synthesis with Data-Path Sensitivity Matching

25

Beyond Useful Skew: Useful Variation

Tdmax

Setup ConstraintTcq+(Tdmax-Tc2 +Tc1)+Tsetup< P

-Tc2

TsetupTcq

Tc1

ImproveCorrelation

Cancellationof max delayand clockskew variationmakes thedesign morerobust. Improve

Correlation

Page 26: Clock Tree Synthesis with Data-Path Sensitivity Matching

26

Deterministic SQP vs. Statistical SQP

Increasing number of buffers+wires

Page 27: Clock Tree Synthesis with Data-Path Sensitivity Matching

27

Run-Time Costs

• Up to 50x the run-time due to naïve gradientcomputation

• Evaluation of 12 random variables• Performed all optimization using new method• Can be used for “fine tuning” after deterministic

optimization instead

Page 28: Clock Tree Synthesis with Data-Path Sensitivity Matching

28

Conclusions

• New technique for improved correlation– Uses distance between canonical vector delay

representations– Matches nominal delay– Matches first order sensitivities– Minimizes uncorrelated sensitivity

• Data-path variation awareness• Average of 16.3% better expected skew• Average of 11.9% improved mean + 3-sigma

Page 29: Clock Tree Synthesis with Data-Path Sensitivity Matching

29