minimal skew clock synthesis considering time-variant temperature gradient hao yu, yu hu, chun-chen...
TRANSCRIPT
Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient
Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient
Hao Yu, Yu Hu, Chun-Chen Liu and Lei He
EE Department, UCLA
Presented by Yu HuPresented by Yu Hu
Partially supported by SRC task 1116.Partially supported by SRC task 1116.
IntroductionIntroduction
Both process and operation variations cause uncertainties and may lead to design failure or over-design.
Process variations have been actively studied. Statistical timing analysis Stochastic optimization Post-silicon configuration
Stochastic optimization for operation variations below has been largely ignored Fluctuation of crosstalk noise and P/G network noise due to different
input vectors Time-variant on-chip temperature map over different workloads
This work is the first in-depth study on clock synthesis considering time-variant temperature variations
Limitation of Existing WorkLimitation of Existing Work
The existing work [Cho:ICCAD05] ignores the time-variant temperature variations and assumes a fixed temperature map
Different work loads lead to different temperature maps (e.g., two SPEC2000 applications: Ammp and Gzip)
Optimizing skew for one application hurts the skew for another application, this conflict is solved in this work
DSA=7ns
DSB =7ns
DSA=2ns
DSB =6ns
A A
B B
S S
Skew = 0ns Skew = 4ns
OutlineOutline
Modeling and Problem Formulation
Algorithms
Experimental Results
Conclusions
Stochastic Temperature ModelStochastic Temperature Model The temperature map is unique for each application or program phase
can be obtained by uArch-level simulation
For each region of the chip, temperature is characterized by its mean and variance over a number of maps Primary component analysis (PCA) to decide # of maps
Temperature correlation measured as covariance between regions is high over SPEC2000 benchmark set
Considering temperature
correlations during optimization can
compress searching space!
(i,j) Correlation between region i and j
Problem FormulationProblem Formulation Given:
The source, sinks and an initial tree embedding A set of temperature maps for a benchmark set
Design freedoms: Re-embedding of clock tree Cross link insertion
To minimize the worst case skew among given temperature maps
OutlineOutline
Modeling and Problem Formulation
Algorithms
Experimental Results
Conclusions
Bottom-up Greedy-based Re-embeddingBottom-up Greedy-based Re-embedding
d
x y
a b c
v
a
b
vd
c
x
y
Sink
Original merging point
Re-embedding option
Bottom-up Greedy-based Re-embeddingBottom-up Greedy-based Re-embedding
a
b
vd
c
x
y
d
x y
a b c
v
New merging point
Perturbed Modified Nodal Analysis (MNA) x is for source, sinks and merging point L selects sink responses Defining a new state variable with both nominal (x) and
sensitivity (Δx) [key to triangulate the system]
Structured and parameterized state matrix
Delay and Skew with Re-embeddingDelay and Skew with Re-embedding
The number of re-embedding options I=5N is huge!
(N is number of merging points)
Compressing Solution Space by Temperature CorrelationCompressing Solution Space by Temperature Correlation
Motivation Highly correlated merging points should be re-embedded in the
same fashion
Solution Calculate correlation between two merging points based on
temperature correlations Cluster merging points based on correlation strength Perform the same re-embedding for all points within one cluster
Temperature Correlation Driven ClusteringTemperature Correlation Driven Clustering
Correlation matrix C of merging points is low-ranked, and Singular Value Decomposition (SVD) reveals the rank K
Partition the merging points into K clusters (K-Means) Maximize the correlation strength within each of K clusters
Low-Rank Approx.C KC
K = 4, N = 70
Reduced from 570 to 54
Recap of Skew Calculation with Re-embeddingRecap of Skew Calculation with Re-embedding
G0
(MxM)
DG1
(MxM)
G0
(MxM)
DGN
(MxM)
G0
(MxM)
0(MxM)
DG2
(MxM)
G0
(MxM)
0(MxM)
0(MxM)
G0
(MxM)
DG1
(MxM)
G0
(MxM)
DGK
(MxM)
G0
(MxM)
0(MxM)
G0
(mxm)
DG1
(mxm)
G0
(mxm)
DGK
(mxm)
G0
(mxm)
0(mxm)
Cluster based reduction
(SVD + K-Means)
Block
-wis
e
MOR
[Yu e
t al,
DAC’06]
(Bes
t pap
er a
ward n
omin
ee)
Transient time analysis
(Back-Euler)
Time domain
Vol
tage
resp
onse
K << N
Delay and Skew
Simultaneous Re-embedding and Cross Link InsertionSimultaneous Re-embedding and Cross Link Insertion
1. Decide crosslink candidates according to [Rajaram, DAC04]
2. Cluster crosslink candidates again based on the temperature correlation
3. Calculate skew sensitivities w.r.t. crosslink and re-embedding candidates In a fashion similar to the previous triangular block-wise MOR
4. Bottom-up select the best crosslink or re-embedding
OutlineOutline
Modeling and Problem Formulation
Algorithms
Experimental Results
Conclusions
Experimental SettingsExperimental Settings
Temperature maps are obtained by micro-architecture level power-temperature transient simulator [Liao,TCAD’05] with 6 SPEC2000 applications
100 temperature maps, one for each 10 million clock cycles
Compare four algorithms (two categories) Traditional optimization under nominal temperature and Elmore
delay DME: deferred merging-point embedding to minimize wire-length
for zero-skew xlink: cross-link insertion [Rajaram, ICCAD'04]
The proposed algorithms with temperature variation and high-order delay model
re-embed: re-embedding xlink+ Re-embed: simultaneously re-embedding and cross-link
insertion
Skew Distribution Over 100 Temperature MapsSkew Distribution Over 100 Temperature Maps
X+R = cross link insertion + re-embedding DME = Deferred Merging points Embedding
Worst-case SkewWorst-case Skew
For tree structure, re-embed reduces the worst-case skew by 3x on average (up to 20x) compared to DME.
For non-tree structure, xlink+re-embed reduces the worst-case skew by 30% on average (up to 7x) compared to xlink.
1
10
100
1000
10000
r1 r2 r3 r4 r5
DME xlink re-embed xlink+re-embed
ps
For tree structure, re-embed has less than 1% wire length overhead compared to DME
For non-tree structure, xlink+re-embed has 5% LESS wire length compared to xlink.
0.00E+00
2.00E+06
4.00E+06
6.00E+06
8.00E+06
1.00E+07
1.20E+07
1.40E+07
r1 r2 r3 r4 r5
DME re-embed xlink xlink+re-embed
Wire LengthWire Length
Temperature-aware optimizations (re-embed and xlink+re-embed) are about 10x slower compared to DME and xlink, respectively, but Our work uses high-order delay model DME and xlink use Elmore delay
RuntimeRuntime
Input(node) DME xlink re-embed xlink+re-embed r1(267) 0.5 1.1 1.1 1.4 r2(598) 1.0 3.2 4.5 5.3 r3(862) 1.4 4.7 6.1 13.2
r4(1903) 2.1 5.5 33.6 59.0 r5(3101) 6.2 11.4 86.6 191.4
GeoMean 1.6 4.0 9.7 16.2
ratio 1 2x 6x 10x
Conclusions Conclusions
Studied the clock optimization for workload dependent temperature variation Reduced the worst-case skew by up to 7X with
LESS wire-length compared to best existing method
Correlation-aware modeling and optimization paradigm can be extended to handle PVT variations, and more design freedoms “Temperature Aware Microprocessor Floorplanning
Considering Application Dependent Power Load” [Chu et al, ICCAD07]
“Efficient Decoupling Capacitance Budgeting Considering Operation and Processing Variations” [Shi et al, finalist for Best Paper, ICCAD07]
Thank you!Thank you!
SRC TechCon 2007
Hao Yu (graduated), Yu Hu (presenter),
Chun-Chen Liu and Lei He (PI)
Minimal Skew Clock Embedding Considering Time Variant Temperature Gradient