minimal skew clock synthesis considering time-variant temperature gradient hao yu, yu hu, chun-chen...

22
Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu Hu Presented by Yu Hu Partially supported by SRC task 1116. Partially supported by SRC task 1116.

Upload: cori-hudson

Post on 18-Dec-2015

228 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient

Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient

Hao Yu, Yu Hu, Chun-Chen Liu and Lei He

EE Department, UCLA

Presented by Yu HuPresented by Yu Hu

Partially supported by SRC task 1116.Partially supported by SRC task 1116.

Page 2: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

IntroductionIntroduction

Both process and operation variations cause uncertainties and may lead to design failure or over-design.

Process variations have been actively studied. Statistical timing analysis Stochastic optimization Post-silicon configuration

Stochastic optimization for operation variations below has been largely ignored Fluctuation of crosstalk noise and P/G network noise due to different

input vectors Time-variant on-chip temperature map over different workloads

This work is the first in-depth study on clock synthesis considering time-variant temperature variations

Page 3: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Limitation of Existing WorkLimitation of Existing Work

The existing work [Cho:ICCAD05] ignores the time-variant temperature variations and assumes a fixed temperature map

Different work loads lead to different temperature maps (e.g., two SPEC2000 applications: Ammp and Gzip)

Optimizing skew for one application hurts the skew for another application, this conflict is solved in this work

DSA=7ns

DSB =7ns

DSA=2ns

DSB =6ns

A A

B B

S S

Skew = 0ns Skew = 4ns

Page 4: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

OutlineOutline

Modeling and Problem Formulation

Algorithms

Experimental Results

Conclusions

Page 5: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Stochastic Temperature ModelStochastic Temperature Model The temperature map is unique for each application or program phase

can be obtained by uArch-level simulation

For each region of the chip, temperature is characterized by its mean and variance over a number of maps Primary component analysis (PCA) to decide # of maps

Temperature correlation measured as covariance between regions is high over SPEC2000 benchmark set

Considering temperature

correlations during optimization can

compress searching space!

(i,j) Correlation between region i and j

Page 6: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Problem FormulationProblem Formulation Given:

The source, sinks and an initial tree embedding A set of temperature maps for a benchmark set

Design freedoms: Re-embedding of clock tree Cross link insertion

To minimize the worst case skew among given temperature maps

Page 7: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

OutlineOutline

Modeling and Problem Formulation

Algorithms

Experimental Results

Conclusions

Page 8: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Bottom-up Greedy-based Re-embeddingBottom-up Greedy-based Re-embedding

d

x y

a b c

v

a

b

vd

c

x

y

Sink

Original merging point

Re-embedding option

Page 9: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Bottom-up Greedy-based Re-embeddingBottom-up Greedy-based Re-embedding

a

b

vd

c

x

y

d

x y

a b c

v

New merging point

Page 10: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Perturbed Modified Nodal Analysis (MNA) x is for source, sinks and merging point L selects sink responses Defining a new state variable with both nominal (x) and

sensitivity (Δx) [key to triangulate the system]

Structured and parameterized state matrix

Delay and Skew with Re-embeddingDelay and Skew with Re-embedding

The number of re-embedding options I=5N is huge!

(N is number of merging points)

Page 11: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Compressing Solution Space by Temperature CorrelationCompressing Solution Space by Temperature Correlation

Motivation Highly correlated merging points should be re-embedded in the

same fashion

Solution Calculate correlation between two merging points based on

temperature correlations Cluster merging points based on correlation strength Perform the same re-embedding for all points within one cluster

Page 12: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Temperature Correlation Driven ClusteringTemperature Correlation Driven Clustering

Correlation matrix C of merging points is low-ranked, and Singular Value Decomposition (SVD) reveals the rank K

Partition the merging points into K clusters (K-Means) Maximize the correlation strength within each of K clusters

Low-Rank Approx.C KC

K = 4, N = 70

Reduced from 570 to 54

Page 13: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Recap of Skew Calculation with Re-embeddingRecap of Skew Calculation with Re-embedding

G0

(MxM)

DG1

(MxM)

G0

(MxM)

DGN

(MxM)

G0

(MxM)

0(MxM)

DG2

(MxM)

G0

(MxM)

0(MxM)

0(MxM)

G0

(MxM)

DG1

(MxM)

G0

(MxM)

DGK

(MxM)

G0

(MxM)

0(MxM)

G0

(mxm)

DG1

(mxm)

G0

(mxm)

DGK

(mxm)

G0

(mxm)

0(mxm)

Cluster based reduction

(SVD + K-Means)

Block

-wis

e

MOR

[Yu e

t al,

DAC’06]

(Bes

t pap

er a

ward n

omin

ee)

Transient time analysis

(Back-Euler)

Time domain

Vol

tage

resp

onse

K << N

Delay and Skew

Page 14: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Simultaneous Re-embedding and Cross Link InsertionSimultaneous Re-embedding and Cross Link Insertion

1. Decide crosslink candidates according to [Rajaram, DAC04]

2. Cluster crosslink candidates again based on the temperature correlation

3. Calculate skew sensitivities w.r.t. crosslink and re-embedding candidates In a fashion similar to the previous triangular block-wise MOR

4. Bottom-up select the best crosslink or re-embedding

Page 15: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

OutlineOutline

Modeling and Problem Formulation

Algorithms

Experimental Results

Conclusions

Page 16: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Experimental SettingsExperimental Settings

Temperature maps are obtained by micro-architecture level power-temperature transient simulator [Liao,TCAD’05] with 6 SPEC2000 applications

100 temperature maps, one for each 10 million clock cycles

Compare four algorithms (two categories) Traditional optimization under nominal temperature and Elmore

delay DME: deferred merging-point embedding to minimize wire-length

for zero-skew xlink: cross-link insertion [Rajaram, ICCAD'04]

The proposed algorithms with temperature variation and high-order delay model

re-embed: re-embedding xlink+ Re-embed: simultaneously re-embedding and cross-link

insertion

Page 17: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Skew Distribution Over 100 Temperature MapsSkew Distribution Over 100 Temperature Maps

X+R = cross link insertion + re-embedding DME = Deferred Merging points Embedding

Page 18: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Worst-case SkewWorst-case Skew

For tree structure, re-embed reduces the worst-case skew by 3x on average (up to 20x) compared to DME.

For non-tree structure, xlink+re-embed reduces the worst-case skew by 30% on average (up to 7x) compared to xlink.

1

10

100

1000

10000

r1 r2 r3 r4 r5

DME xlink re-embed xlink+re-embed

ps

Page 19: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

For tree structure, re-embed has less than 1% wire length overhead compared to DME

For non-tree structure, xlink+re-embed has 5% LESS wire length compared to xlink.

0.00E+00

2.00E+06

4.00E+06

6.00E+06

8.00E+06

1.00E+07

1.20E+07

1.40E+07

r1 r2 r3 r4 r5

DME re-embed xlink xlink+re-embed

Wire LengthWire Length

Page 20: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Temperature-aware optimizations (re-embed and xlink+re-embed) are about 10x slower compared to DME and xlink, respectively, but Our work uses high-order delay model DME and xlink use Elmore delay

RuntimeRuntime

Input(node) DME xlink re-embed xlink+re-embed r1(267) 0.5 1.1 1.1 1.4 r2(598) 1.0 3.2 4.5 5.3 r3(862) 1.4 4.7 6.1 13.2

r4(1903) 2.1 5.5 33.6 59.0 r5(3101) 6.2 11.4 86.6 191.4

GeoMean 1.6 4.0 9.7 16.2

ratio 1 2x 6x 10x

Page 21: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Conclusions Conclusions

Studied the clock optimization for workload dependent temperature variation Reduced the worst-case skew by up to 7X with

LESS wire-length compared to best existing method

Correlation-aware modeling and optimization paradigm can be extended to handle PVT variations, and more design freedoms “Temperature Aware Microprocessor Floorplanning

Considering Application Dependent Power Load” [Chu et al, ICCAD07]

“Efficient Decoupling Capacitance Budgeting Considering Operation and Processing Variations” [Shi et al, finalist for Best Paper, ICCAD07]

Page 22: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient Hao Yu, Yu Hu, Chun-Chen Liu and Lei He EE Department, UCLA Presented by Yu

Thank you!Thank you!

SRC TechCon 2007

Hao Yu (graduated), Yu Hu (presenter),

Chun-Chen Liu and Lei He (PI)

Minimal Skew Clock Embedding Considering Time Variant Temperature Gradient