Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
ISVLSI-2014 invited talk, 140710
BEOL Corner Optimization
• At 20nm and below, interconnect R and C cause increased timing variation
  • Design closure becomes much more difficult
• Costs of BEOL variations
  • More design effort (e.g., the “last month” of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners, without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths that can be safely signed off using tightened BEOL corners (TBC)
  • Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan
Proposed Timing Signoff Flow
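[Flowchart]
Conventional signoff:
  Routed design → timing analysis using conventional BEOL corners (CBC) → violation = 0? If no, ECO using CBC and re-analyze; if yes, done.
This work:
  Routed design → classify timing-critical paths into G_TBC and G_CBC
  G_TBC: timing analysis using TBC → violation = 0? If no, ECO using TBC and re-analyze
  G_CBC: timing analysis using CBC → violation = 0? If no, ECO using CBC and re-analyze
  Done when both groups have zero violations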
Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
  • BEOL RC variations are modeled in the interconnect technology file (itf)
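[Figure: cross-section of layers M1-M3 showing wire width W, thickness T, spacing S, and dielectric height H for each layer, with inter-layer and inter-metal dielectrics]
Corner            ΔW       ΔT       ΔH
Y_typ             typical  typical  typical
Y_cb (C-best)     min      min      max
Y_cw (C-worst)    max      max      min
Y_rcb (RC-best)   max      max      max
Y_rcw (RC-worst)  min      min      min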
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• A 9-layer metal stack has 27 variation sources z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• z_u and z_v are correlated if and only if
  • z_u and z_v are of the same type (ΔW, ΔT, or ΔH), and
  • z_u and z_v are in the same process module
Layer  ΔW   ΔT   ΔH   Process module
M1     z1   z2   z3   1
M2     z4   z5   z6   1
M3     z7   z8   z9   1
M4     z10  z11  z12  2
M5     z13  z14  z15  2
M6     z16  z17  z18  2
M7     z19  z20  z21  2
M8     z22  z23  z24  3
M9     z25  z26  z27  3
Examples:
• ΔW in layer M4 is positively correlated with ΔW in layers M5, M6, and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
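A minimal sketch of how the correlation rule above could be turned into the 27 × 27 matrix Σ. It assumes the z-index order z1 = M1 ΔW, z2 = M1 ΔT, z3 = M1 ΔH, and so on. The process-module assignment (M1-M3, M4-M7, M8-M9) is inferred from the examples and the backup correlation matrix, not stated as a rule. `correlation_matrix`, `source`, and `MODULE_OF_LAYER` are hypothetical names.

```python
from itertools import product

# Hypothetical module assignment inferred from the slides:
# M1-M3 -> module 1, M4-M7 -> module 2, M8-M9 -> module 3.
MODULE_OF_LAYER = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}
TYPES = ("dW", "dT", "dH")  # per-layer order: z1 = M1 dW, z2 = M1 dT, z3 = M1 dH, ...


def source(index):
    """Map a 0-based source index to (layer, variation type)."""
    return index // 3 + 1, TYPES[index % 3]


def correlation_matrix(gamma):
    """Return the 27 x 27 correlation matrix (list of lists) for a 9-layer stack."""
    n = 3 * len(MODULE_OF_LAYER)
    sigma = [[0.0] * n for _ in range(n)]
    for u, v in product(range(n), repeat=2):
        (layer_u, type_u), (layer_v, type_v) = source(u), source(v)
        if u == v:
            sigma[u][v] = 1.0
        elif type_u == type_v and MODULE_OF_LAYER[layer_u] == MODULE_OF_LAYER[layer_v]:
            # Correlated iff same variation type AND same process module.
            sigma[u][v] = gamma
        # Otherwise the entry stays 0.0 (independent sources).
    return sigma
```

With γ = 0.5, entry `[9][12]` (ΔW of M4 vs. ΔW of M5) is 0.5. Entry `[6][9]` (ΔW of M3 vs. ΔW of M4, in different modules) is 0, and so is `[9][10]` (ΔW vs. ΔT of M4).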
Pessimism of Conventional BEOL Corners (CBC)
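• Assumption: a max (setup) path p_j is “safe” when its delay evaluated at a given CBC is at least its nominal delay + 3σ_j:
  $d_j(Y_{CBC}) \ge d_j(Y_{typ}) + 3\sigma_j$
• For a given path, we can compare the statistical delay variation with the delay variation obtained from a given CBC:
  $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$, where $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$
  so the CBC covers the path's 3σ variation exactly when $\alpha_j \le 1$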
[Figure: α vs. Δdelay (vs. typical) at C-worst, (d(Y_cw) − d(Y_typ))/d(Y_typ)]
Intuition on Delay Variability Across Cw/RCw
[Figure: α vs. Δdelay at C-worst, (d(Y_cw) − d(Y_typ))/d(Y_typ), and at RC-worst, (d(Y_rcw) − d(Y_typ))/d(Y_typ). A path is dominated by C-worst when its Δdelay at C-worst > its Δdelay at RC-worst, and by RC-worst otherwise.]
• Some paths have α > 1.0: a CBC can underestimate their delay variation
  • But these paths often have smaller α values at the other corner. For example, the C-worst corner underestimates the delay variation of some paths, but those paths are dominated by the RC-worst corner, where α < 1.0 and their delay variation is covered.
• Paths are more sensitive either to R or to C
  • Using only RC-worst or only C-worst will underestimate delay variation
  • Both RC-worst and C-worst corners are needed to cover process variation
In the following, α is defined at the dominant corner.
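The α definition and the dominant-corner rule can be sketched as follows. This is an illustrative helper, not the authors' code. `alpha_at_dominant_corner` is a hypothetical name, σ_j is assumed to come from statistical timing analysis, and all delays must share one unit.

```python
def alpha_at_dominant_corner(sigma, d_typ, d_cw, d_rcw):
    """Return (corner, alpha) with alpha = 3*sigma / (d(Y_dom) - d(Y_typ)).

    The dominant corner is the one with the larger delay increase vs. typical.
    alpha <= 1 means that corner covers the path's 3-sigma delay variation.
    """
    delta_cw, delta_rcw = d_cw - d_typ, d_rcw - d_typ
    if delta_cw >= delta_rcw:
        corner, delta = "C-worst", delta_cw
    else:
        corner, delta = "RC-worst", delta_rcw
    if delta <= 0:
        raise ValueError("expected a positive delay increase at the dominant corner")
    return corner, 3.0 * sigma / delta
```

Take a hypothetical path with d_typ = 1000ps, d_cw = 1020ps, d_rcw = 1050ps, and σ_j = 8ps. RC-worst is dominant and α_j = 24/50 = 0.48, so the CBC is pessimistic for this path.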
Scaling Factor α and Delay Variation
• Paths with small Δd_rcw and Δd_cw have large α
• E.g., here we see α_j > 0.6 when (Δd_rcw < 3%) AND (Δd_cw < 3%)
• Identify paths for tightened BEOL corners based on Δd_rcw and Δd_cw
[Figure: α vs. Δd(Y_cw)/d(Y_typ) and Δd(Y_rcw)/d(Y_typ)]
Find Paths for Which TBCs Can Be Used
[Figure: Δd(Y_cw)/d(Y_typ) vs. Δd(Y_rcw)/d(Y_typ), colored by α, with thresholds A_cw and A_rcw marked]
G_TBC = set of paths that can be safely signed off using TBC = (paths with Δd_cw > A_cw) OR (paths with Δd_rcw > A_rcw)
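The G_TBC rule above can be written as a small partition function. This is a sketch only. `PathDelta` and `split_paths` are hypothetical names, and Δd values are assumed to be given in percent of the typical delay.

```python
from dataclasses import dataclass


@dataclass
class PathDelta:
    name: str
    dd_cw: float   # (d(Y_cw) - d(Y_typ)) / d(Y_typ), in percent
    dd_rcw: float  # (d(Y_rcw) - d(Y_typ)) / d(Y_typ), in percent


def split_paths(paths, a_cw, a_rcw):
    """Partition paths into (G_TBC, G_CBC) using thresholds A_cw and A_rcw."""
    g_tbc, g_cbc = [], []
    for p in paths:
        if p.dd_cw > a_cw or p.dd_rcw > a_rcw:
            g_tbc.append(p)  # large delay shift -> small alpha -> TBC is safe
        else:
            g_cbc.append(p)  # keep conventional BEOL corners for these paths
    return g_tbc, g_cbc
```

The OR makes a path eligible for TBC as soon as either corner shows a large delay shift. This matches the observation that α is evaluated at the dominant corner.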
Determining α, A_rcw, and A_cw
• Assumption: critical paths in different designs have similar trends
• Extract A_rcw and A_cw from a set of representative paths
• Plot α vs. Δdelay; find A_rcw and A_cw for a given α
• Add a +1% margin to A_rcw and A_cw to account for sampling error
• Smaller α → larger thresholds (A_rcw and A_cw) → fewer paths in G_TBC
[Figure: α vs. Δd at the C-worst corner (%) and vs. Δd at the RC-worst corner (%), with thresholds A_cw and A_rcw marked]
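One possible reading of “find A_rcw and A_cw for a given α” is this: take the smallest Δdelay threshold above which no representative path exceeds the target α, then add the +1% margin. The sketch implements that reading for one corner. The rule and the name `extract_threshold` are assumptions, not the authors' procedure.

```python
def extract_threshold(samples, alpha_target, margin=1.0):
    """Hypothetical threshold extraction for one corner.

    samples: (delta_delay_percent, alpha) pairs from representative paths.
    Returns the smallest A (plus `margin` percentage points) such that every
    sample with delta_delay > A has alpha <= alpha_target.
    """
    # Any sample that violates the target alpha must lie at or below the threshold.
    violating = [dd for dd, a in samples if a > alpha_target]
    base = max(violating, default=0.0)  # no violating sample: fall back to 0%
    return base + margin
```

With samples [(1.0, 0.9), (2.0, 0.7), (2.5, 0.55), (3.0, 0.65), (4.0, 0.5), (6.0, 0.3)], a target of α = 0.6 gives a threshold of 4.0% and α = 0.7 gives 2.0%. This reproduces the trend that a smaller α needs a larger threshold.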
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns, respectively
• Timing violations are reduced by 24% to 100%
• TBC-0.6 gives more benefits
  • Tradeoff between reduced margin and the number of paths that can use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD, signed off with CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
                 Razor        Razor-Lite   TIMBER
Power penalty    30 [Das08]   ~0 [Kim13]   100 [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart]
  Initial placement (all FFs = error-tolerant FFs)
  → SEOpt: margin insertion on K paths based on a sensitivity function; replace error-tolerant FFs with normal FFs
  → SkewOpt: activity-aware clock skew optimization
  → Energy < min energy? If yes, save current solution and iterate again
  → OR-tree insertion
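The flow above can be sketched as a loop. Everything here is a placeholder. `se_opt`, `skew_opt`, and `energy` stand in for the real SEOpt, SkewOpt, and energy evaluation. The stopping rule (stop at the first iteration that does not lower energy) is one reading of the “Energy < min energy” decision.

```python
from typing import Callable, TypeVar

S = TypeVar("S")  # a design state (hypothetical)


def optimize(initial: S,
             se_opt: Callable[[S], S],
             skew_opt: Callable[[S], S],
             energy: Callable[[S], float],
             max_iters: int = 100) -> S:
    """Iterate SEOpt then SkewOpt while energy keeps decreasing.

    Returns the lowest-energy solution seen; OR-tree insertion would then
    be applied to the returned solution.
    """
    best, best_energy = initial, energy(initial)
    current = initial
    for _ in range(max_iters):
        current = skew_opt(se_opt(current))
        e = energy(current)
        if e < best_energy:
            best, best_energy = current, e  # "Save current solution"
        else:
            break  # no further improvement: leave the loop
    return best
```

In a toy run, the state is the number of error-tolerant FFs. `se_opt` replaces 10 of them per iteration, `skew_opt` is the identity, and the hypothetical energy is (n − 30)² + 100. Starting from 100, the loop returns 30, the energy minimum.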
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to the reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at large, medium, and small margin, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for the SS corner. Technology: foundry 28nm.
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling (AVS) allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt, for MUL and EXU, with the minimum achievable energy marked]
Technology: foundry 28nm
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor senses circuit performance and adjusts the voltage regulator that supplies Vdd to the circuit; circuit frequency and Vdd vs. time, without and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: estimate BTI degradation (|ΔVt|) at VBTI; Step 2: characterize a derated library at Vlib; Step 3: circuit implementation and signoff. BTI degradation and AVS on the resulting circuit then determine Vfinal.]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: BTI degradation (|ΔVt|) at VBTI; Step 2: derated library at Vlib; Step 3: circuit implementation and signoff; BTI degradation and AVS then determine Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design/lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, and Siddhartha Nath
Power vs. Area Across Different Signoffs
[Figure: power vs. area across different signoff corners, with a “knee” point for a balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Also: Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time for nominal (NOM), overdrive (OD), and combined OD/NOM signoff modes, with mode durations tnom and tOD; power of circuits with different overdrive modes. Different overdrive modes span a 26% power range; even with fOD fixed, there is still a 14% power range. fnom = 800MHz, Vnom = 0.8V.]
Also: Tunable Monitors → Less Margin
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for the difference between model and silicon
• Recover margin when variation is reduced due to an improved process
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, and across lifetime
• Benefit of the “next node” is increasingly hard to find
  • An entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9, ordered from highest to least invested guardband; annotated 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner
[Figure: capacitance C of layers M1 and M2, each skewed to its 3σ point; the homogeneous Cw corner of the stack with M1 and M2 places both layers at 3σ, which is pessimistic relative to the 3σ point of the stack]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers.
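The growth in pessimism can be checked under a simplified model, which is an assumption and not taken from the slides. Suppose each of n layers contributes an equal, normally distributed term with the same σ and pairwise correlation ρ. The homogeneous corner shifts the total by 3nσ. The true 3σ point of the sum is 3σ√(n + n(n−1)ρ). The ratio n/√(n + n(n−1)ρ) equals 1 only when ρ = 1, and it grows with n whenever ρ < 1. `homogeneous_pessimism` is a hypothetical name.

```python
import math


def homogeneous_pessimism(n_layers, rho):
    """Ratio of the homogeneous-corner shift to the true 3-sigma shift of a sum.

    Assumes each layer contributes an equal, normally distributed term with
    the same sigma and pairwise correlation rho between layers.
    """
    corner_shift = 3.0 * n_layers                         # every layer at +3 sigma
    var_sum = n_layers + n_layers * (n_layers - 1) * rho  # in units of sigma^2
    return corner_shift / (3.0 * math.sqrt(var_sum))
```

For example, with four independent layers (ρ = 0), the homogeneous corner overstates the 3σ shift by a factor of 2.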
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources. For layers M1-M4, with the sources of each layer ordered (ΔW, ΔT, ΔH):
$$\Sigma = \begin{pmatrix} I_3 & \gamma I_3 & \gamma I_3 & 0 \\ \gamma I_3 & I_3 & \gamma I_3 & 0 \\ \gamma I_3 & \gamma I_3 & I_3 & 0 \\ 0 & 0 & 0 & I_3 \end{pmatrix}$$
  where $I_3$ is the 3 × 3 identity matrix and $0$ is the 3 × 3 zero matrix
• Variation sources with the same variation type and in the same process module have correlation γ (γ = 0.5)
• Variation sources in different process modules are independent
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%), with the point (60%, 0.92) marked]
• Variations in different layers are not fully correlated
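The “max wirelength ratio across all layers” statistic can be computed per path as shown below. This is a sketch assuming each path is given as a mapping from layer name to routed wirelength. `max_layer_ratio` and `fraction_below` are hypothetical names.

```python
def max_layer_ratio(wirelength_by_layer):
    """Largest fraction of a path's total wirelength that sits on one layer."""
    total = sum(wirelength_by_layer.values())
    return max(wirelength_by_layer.values()) / total


def fraction_below(paths, limit=0.60):
    """Fraction of paths whose max single-layer wirelength ratio is < limit."""
    ratios = [max_layer_ratio(p) for p in paths]
    return sum(r < limit for r in ratios) / len(ratios)
```

Applied to a design's critical paths, `fraction_below(paths)` yields the point on the cumulative curve at 60%. The slide reports 0.92 for this point.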
Delay Variation
[Figure: α vs. Δdelay at C-worst, (d(Y_cw) − d(Y_typ))/d(Y_typ), and at RC-worst, (d(Y_rcw) − d(Y_typ))/d(Y_typ). A path is dominated by C-worst when its Δdelay at C-worst > its Δdelay at RC-worst, and by RC-worst otherwise.]
• Some paths have α > 1.0: a CBC can underestimate their delay variation
  • But these paths have larger delays at the other corner. For example, the C-worst corner underestimates their delay variation, but they are dominated by the RC-worst corner, where α < 1.0 and their delay variation is covered.
• Paths are more sensitive either to R or to C
  • Using only RC-worst or only C-worst will underestimate delay variation
  • Both RC-worst and C-worst corners are needed to cover process variation
• In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: 3σ_j/d(Y_typ) × 100 vs. Δd_j(Y_rcw)/d_j(Y_typ) × 100]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
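If path delay is assumed linear in the BEOL variation (the assumption used for statistical timing analysis in this talk), scaling the itf variation by α scales a path's corner delay shift by α. A tightened corner then still covers the path exactly when α_j ≤ α. The sketch below encodes that reasoning. `tbc_delay` and `tbc_is_safe` are hypothetical names, and the linear scaling is an approximation, not an extraction result.

```python
def tbc_delay(d_typ, d_cbc, alpha):
    """Approximate delay at a tightened BEOL corner.

    Assumes delay is linear in the BEOL variation, so scaling the itf
    variation by alpha scales the delay shift (d_cbc - d_typ) by alpha.
    """
    return d_typ + alpha * (d_cbc - d_typ)


def tbc_is_safe(sigma, d_typ, d_cbc, alpha):
    """True if the tightened corner still covers nominal delay + 3 sigma."""
    return tbc_delay(d_typ, d_cbc, alpha) >= d_typ + 3.0 * sigma
```

Take a hypothetical path with d_typ = 1000ps and d_CBC = 1050ps. At TBC-0.5 its delay shift is 25ps. The corner is therefore safe for σ_j = 8ps (3σ_j = 24ps) and unsafe for σ_j = 9ps (3σ_j = 27ps).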
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in group G_TBC if its Δdelay is larger than a threshold
• Fix only the paths in G_TBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Y_typ, Y_cw, Y_rcw → split into G_TBC and G_CBC. G_TBC: timing analysis using TBC, with ECO using TBC until violation = 0. G_CBC: timing analysis using CBC, with ECO using CBC until violation = 0. Then done.]
Experiment Setup
Design              LEON3MP  NETCARD  SUPERBLUE12
Clock period (ns)   1.8      2.0      3.1
• Model BTI degradation with Vfinal throughout the lifetime
  • Aging at a flat Vfinal ≈ aging with an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time for NBTI and PBTI, with VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented yet)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
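The reference flow's time-stepping loop is sketched below. `delay_fn` and `aging_increment` are hypothetical stand-ins for STA with a re-characterized library and for a BTI aging model. The toy models in the usage note are illustrative only, and so is `reference_signoff`.

```python
def reference_signoff(v_init, v_step, v_max, target_delay, t_final, n_steps,
                      delay_fn, aging_increment):
    """Step through the lifetime, re-timing and raising Vdd by v_step as needed.

    delay_fn(vdd, dvth) and aging_increment(vdd, t0, t1) are hypothetical
    models. Returns the converged Vfinal, or None if Vdd would exceed v_max
    (the circuit must then be modified and the flow restarted).
    """
    vdd, dvth = v_init, 0.0
    dt = t_final / n_steps
    for k in range(n_steps):
        dvth += aging_increment(vdd, k * dt, (k + 1) * dt)  # aging at current Vdd
        while delay_fn(vdd, dvth) > target_delay:           # timing check (STA)
            vdd += v_step                                   # AVS raises Vdd
            if vdd > v_max:
                return None
    return vdd
```

A toy run uses delay = 1/(V − 0.3 − ΔVth), aging increments of 0.02·V·(t₁^0.3 − t₀^0.3), Vinit = 0.90V, Vstep = 10mV, and Vmax = 1.05V. It converges to a Vfinal a few steps above 0.90V. Each iteration of the inner loop corresponds to one more STA run, which is why the flow is slow.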
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1      2      3     4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax  Vinit   N/A  Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax  Vfinal  N/A  Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation: timing constraints, critical paths, etc.
  • The circuit is affected by library characterization
[Figure: loop of derated libraries (Vlib, VBTI) → circuit → Vfinal → back to the choice of Vlib and VBTI]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI affects PMOS; PBTI affects NMOS
[Figure: |Vgs| vs. time, alternating ON and OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
  • ≈70% of the Vdd increment occurs in the 1st year (the remaining 30% over 9 years)
[Figure: Vdd vs. time, approaching Vfinal]
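Observation 1 can be made concrete by assuming the widely used power-law aging model ΔVth ∝ tⁿ. This model is an assumption here and is not stated on the slide. Requiring 50% of the 10-year degradation at 1 year gives (1/10)ⁿ = 0.5, so n = ln 0.5 / ln 0.1 ≈ 0.30. The function names are hypothetical.

```python
import math


def power_law_exponent(fraction, t_early, t_life):
    """Solve (t_early / t_life) ** n == fraction for n."""
    return math.log(fraction) / math.log(t_early / t_life)


def aged_fraction(t, t_life, n):
    """Fraction of end-of-life |dVth| reached at time t, assuming dVth ~ t**n."""
    return (t / t_life) ** n
```

`power_law_exponent(0.5, 1.0, 10.0)` returns about 0.301. Any exponent this small makes the aging curve steep at first and flat afterwards, which is the front-loading behavior described above.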
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
  • Consumes more power
  • May fail to meet timing if desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year (0-12) for AVS schedules S1-S5; MTTF (years) under S1-S5 (DMA)]
12 years MTTF penalty
What Is “Signoff”?
• Foundation of the contract between design house and foundry
  • “Chip should work”: a stack of models, margins, and analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff, relating the operating voltage and the signoff Vdd to nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, and HCI/NBTI margins]
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path p_j to variation source z_v:
$$\Delta d_{j,v} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
  where Y_v is the itf file for variation source z_v
• Assumptions
  • Δd_{j,v} is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δd_{j,v} using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Y_typ) → RC extraction → STA → Δd_{j,v}]
Note: path delay includes gate and wire delays
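The sensitivity formula and the linearity assumption can be sketched as follows. The sketch assumes each of the 27 corner delays comes from the itf file in which only z_v is skewed, by 3σ as the division by 3 implies. `sensitivities` and `linear_delay` are hypothetical names.

```python
def sensitivities(d_typ, d_corner):
    """Per-sigma delay sensitivity of one path to each variation source.

    d_typ: path delay at Y_typ; d_corner[v]: path delay at Y_v.
    Returns [dd_j1, ..., dd_jV] with dd_jv = (d_j(Y_v) - d_j(Y_typ)) / 3.
    """
    return [(d_v - d_typ) / 3.0 for d_v in d_corner]


def linear_delay(d_typ, sens, z):
    """First-order path delay for standardized source values z (in units of sigma)."""
    return d_typ + sum(s * z_v for s, z_v in zip(sens, z))
```

Setting one z_v to 3 and the rest to 0 reproduces the corresponding corner delay. This is a quick sanity check that the linear model is consistent with the 28 extraction/STA runs.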
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^{T}$ (λ is obtained by Cholesky decomposition)
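The talk models each path delay as linear in normally distributed, correlated sources. Under that model, the path's delay standard deviation follows from its per-sigma sensitivity vector s_j and Σ as σ_j = √(s_jᵀ Σ s_j) = ‖λᵀ s_j‖, using the Cholesky factor λ. The sketch computes σ_j that way in pure Python. `cholesky` and `path_sigma` are hypothetical helper names, and Σ must be positive definite (e.g., γ < 1).

```python
import math


def cholesky(a):
    """Lower-triangular L with a == L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def path_sigma(sens, corr):
    """sigma_j = ||lambda^T s||, equal to sqrt(s^T Sigma s) when Sigma = lambda lambda^T."""
    lam = cholesky(corr)
    n = len(sens)
    proj = [sum(lam[i][k] * sens[i] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(p * p for p in proj))
```

For two sources with unit sensitivities and correlation 0.5, σ_j = √3. Three times this value is the 3σ_j used in the definition of α_j.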
[Equation: throughput as a function of clock period T, error rate ER, and number of recovery cycles r [Kahng10]]
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacing a Razor FF with a normal FF
• Trade off the cost of resilience against data path optimization
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
6ISVLSI-2014 invited talk 140710
BEOL Corner Optimization
bull 20nm and below increased timing variation due to interconnect R Cbull Design closure becomes much more difficult
bull Costs of BEOL variationsbull More design effort (eg ldquolast monthrdquo of manual ECO iteration)
bull Compromised circuit performance at high Vdd
bull Recent work reduce signoff margin by using tightened BEOL corners without sacrificing parametric yieldbull Signoff at conventional BEOL corners is pessimistic for most timing-
critical pathsbull We identify paths which can be safely signed off using tightened
BEOL corners (TBC)bull Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan
7ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
Routed design
Timing analysis using conventional BEOL corners (CBC)
ECOusing CBC
violation = 0
done
Conventional Signoff
No
Routed design
Classify timing critical paths
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
This work
NoNo
8ISVLSI-2014 invited talk 140710
Conventional BEOL Corners
bull Three major variation sources per layer ΔW ΔT ΔHbull Conventional BEOL corners (CBC)
bull Homogeneous corners all variation sources are skewed in the same direction
bull BEOL RC variations are modeled in interconnect technology file (itf)
M2
M3
M1
S2 W2T2
H2 Inter-layer dielectric
Inter-metal dielectric
H3
H1
T1
T3
ΔW ΔT ΔH
Ytyp typical typical Typical
Ycb min min max
Ycw max max min
Yrcb max max max
Yrcw min min min
9ISVLSI-2014 invited talk 140710
Statistical RC Modelbull 3 variation sources in each layer ΔW ΔT ΔH
bull 9-layer metal stack has 27 variation sources z1 z2 hellip z27
bull BEOL layers in the same process module use the same manufacturing equipment and process steps
bull zu and zv are correlated if and only ifbull zu and zv are the same type (ΔW ΔT or ΔH)
bull zu and zv are in the same process module
M2 z4 z5 z6
M4 z10 z11 z12
M3 z7 z8 z9
M5 z13 z14 z15
M6 z16 z17 z18
M7 z19 z20 z21
M8 z22 z23 z24
M9 z25 z26 z27
M1 z1 z2 z3
Process module 3
Process module 2
Process module 1
Examples bull ΔW in layer M4 has a
positive correlation with ΔW in layers M5 M6 and M7
bull But ΔW in layer M4 is not correlated with ΔT in M4
ΔW ΔT ΔH
10ISVLSI-2014 invited talk 140710
Pessimism of Conventional BEOL Corners (CBC)
bull Assumption a max (setup) path pj is ldquosaferdquo when delay evaluated at a given CBC is larger than nominal delay + 3σj
dj(YCBC) ge 3σj + dj(Ytyp)
bull For a given path we can compare the statistical delay variation and the delay obtained from a given CBC αj = 3σj Δdj(YCBC)
Δdelay (vs typ) at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths often have smaller α values at the other corner ()
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 here delay variations covered by RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay (vs typ) at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
12ISVLSI-2014 invited talk 140710
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths often have smaller α values at the other corner ()
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variations
In the following α is defined at the dominant corner
Intuition on Delay Variability Across Cw RCw
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variationbull Paths with small Δdrcw and Δdcw have large α
bull Eg here we see αj gt 06 when ((Δdrcw lt 3) AND (Δdcw lt 3))
bull Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
Δd(Ycw)d(Ytyp)
Δd(Yrcw)d(Ytyp) α
14ISVLSI-2014 invited talk 140710
bull Paths with small Δdrcw and Δdcw have large α
bull Eg there are αj gt 06 when ((Δdrcw lt 3) AND (Δdcw lt 3))
bull Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
Find Paths for Which TBCs Can Be Used
Δd(Ycw)d(Ytyp)
Δd(Yrcw)d(Ytyp)
Acw
Arcw
Gtbc = Set of paths that can be safely signed off using TBC ( (Path with Δdcw larger than Acw) OR (Path with Δdrcw larger than Arcw) )
α
15ISVLSI-2014 invited talk 140710
Determining α Arcw and Acw
Δd at C-worst corner ()Δd at RC-worst corner ()
bull Assumption critical paths in different designs have similar trends
bull Extract Arcw and Acw from a set of representative paths
bull Plot α vs Δdelay find Arcw and Acw for a given α
bull Add +1 margin on Arcw and Acw to account for sampling error
bull Smaller α larger thresholds (Arcw and Acw) fewer paths in GTBC
Δd at C-worst corner ()
Arcw Acw
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
bull WNS and TNS are reduced by up to 100ps and 53nsbull Timing violations reduced by
24 to 100
bull TBC-06 more benefits bull Tradeoff between reduced margin
vs paths which use TBC
Correlation factor γ = 05
LEON SUPERBLUE12 NETCARD
-018-016-014-012
-01-008-006-004-002
0
CBC TBC-05 TBC-06 TBC-07
WN
S (n
s)
LEON SUPERBLUE12 NETCARD
-90-80-70-60-50-40-30-20-10
0
CBC TBC-05 TBC-06 TBC-07
TNS
(ns)
LEON SUPERBLUE12 NETCARD0
200400600800
1000120014001600
CBC TBC-05 TBC-06 TBC-07
Tim
ing
viol
ation
s
17ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience bull Additional circuits area and power penaltiesbull Recovery from errors throughput degradationbull Large hold margin short-path padding costbull Want benefits (eg energy) to maximally outweigh costs
Razor Razor-Lite TIMBER
Razor Razor-Lite TIMBER
Power penalty 30 [Das08] ~0 [Kim13] 100 [Choudhury09]
Overall Optimization Flowbull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity aware clock skew optimization
SkewOpt
OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resiliencebull Reference flows
bull Pure-margin (PM) conventional method w only margin insertionbull Brute-force (BF) use error-tolerant FFs for timing-critical endpoints
bull Proposed method (CO) achieves up to 21 energy reduction compared to reference methods
bull Resilience benefits increase with larger process variation
PM BF CO PM BF CO PM BF CO27
29
31
33
35
37
En
erg
y (
mJ
)
PM BF CO PM BF CO PM BF CO22
26
30
34
38Energy penalty of throughput degradation
Energy penalty of additional circuits
Energy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin 1σ2σ3σ for SS corner Technology foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVSbull Adaptive voltage scaling allows a lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 17 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
086 09 094 098 10225
30
35
40
45
50pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
070 072 074 076 078 08024
26
28
30
32
34
36 pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
Technology foundry 28nm
25ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-and-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for the performance degradation
[Figure: closed-loop AVS with the circuit, an on-chip aging monitor sensing circuit performance, and a voltage regulator setting Vdd; Vdd and circuit frequency vs. time, without AVS and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: VBTI → |Vt|; Step 2: Vlib → derated library; Step 3: circuit implementation and signoff → circuit; BTI degradation and AVS → Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: VBTI → |Vt|; Step 2: Vlib → derated library; Step 3: circuit implementation and signoff → circuit; BTI degradation and AVS → Vfinal. Annotations: no obvious guideline to define VBTI and Vlib; inconsistency among Vfinal, Vlib, and VBTI]
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, and Siddhartha Nath
Power vs. Area Across Different Signoffs
[Figure: power vs. area across different signoff corners; a “knee” point gives a balanced area and power tradeoff]
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if the desired supply voltage > Vmax
Also: Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time for NOM, OD/NOM, and OD modes, with time spent in nominal (tnom) and overdrive (tOD) operation]
[Figure: power of circuits with different overdrive modes; different overdrive modes span a 26% power range, and with fOD fixed there is still a 14% power range; fnom = 800MHz, Vnom = 0.8V]
Also: Tunable Monitors, Less Margin
• Default config
  • Low-resistance passgates
  • Guardband for the worst case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Aggressive config: Vmin_est < Vmin_chip; some chips will fail
• Optimized config
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
Benefits of tunability:
• Compensate for the difference between model and silicon
• Recover margin when variation is reduced due to an improved process
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, and across the lifetime
• The benefit of the “next node” is increasingly hard to find
  • An entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk:
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-and-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1 to 9, ranging from the highest to the least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff in the guardband invested in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; per-layer capacitance distributions for M1 and M2, each with its C-3σ corner; the homogeneous Cw corner for the M1 + M2 interconnect stack lies beyond the stack's 3σ point (pessimism)]
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
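Why pessimism grows with the number of layers can be seen in a simplified model (an assumption for illustration, not the talk's exact setup): let a stack quantity such as total capacitance be the sum of n per-layer contributions, each with standard deviation σ and the same pairwise correlation ρ. A homogeneous corner shifts every layer to its own 3σ point, a total of 3nσ, while the 3σ shift of the sum is $3\sigma\sqrt{n + n(n-1)\rho}$. Their ratio is 1 when ρ = 1 and grows with n when ρ < 1:

```python
import math


def homogeneous_pessimism(n_layers: int, rho: float, sigma: float = 1.0) -> float:
    """Ratio of the homogeneous-corner shift to the statistical 3-sigma shift.

    Simplified, hypothetical model: the stack quantity is the sum of
    `n_layers` per-layer contributions, each with standard deviation `sigma`,
    and every pair of layers has the same correlation `rho`.
    """
    # Homogeneous corner: every layer is skewed to its own 3-sigma point.
    homogeneous_shift = 3.0 * sigma * n_layers
    # Variance of the sum: n diagonal terms plus n(n-1) correlated cross terms.
    variance = n_layers * sigma**2 + n_layers * (n_layers - 1) * rho * sigma**2
    statistical_shift = 3.0 * math.sqrt(variance)
    return homogeneous_shift / statistical_shift


if __name__ == "__main__":
    for n in (1, 2, 4, 9):
        print(n, round(homogeneous_pessimism(n, rho=0.5), 3))
```

With ρ = 0.5 the ratios are 1.0, 1.155, 1.265, and 1.342 for 1, 2, 4, and 9 layers.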
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources (four-layer example: M1 to M3 in one process module, M4 in another)
Columns: M1 (ΔW ΔT ΔH) | M2 (ΔW ΔT ΔH) | M3 (ΔW ΔT ΔH) | M4 (ΔW ΔT ΔH)
M1 ΔW:  1 0 0 | γ 0 0 | γ 0 0 | 0 0 0
M1 ΔT:  0 1 0 | 0 γ 0 | 0 γ 0 | 0 0 0
M1 ΔH:  0 0 1 | 0 0 γ | 0 0 γ | 0 0 0
M2 ΔW:  γ 0 0 | 1 0 0 | γ 0 0 | 0 0 0
M2 ΔT:  0 γ 0 | 0 1 0 | 0 γ 0 | 0 0 0
M2 ΔH:  0 0 γ | 0 0 1 | 0 0 γ | 0 0 0
M3 ΔW:  γ 0 0 | γ 0 0 | 1 0 0 | 0 0 0
M3 ΔT:  0 γ 0 | 0 γ 0 | 0 1 0 | 0 0 0
M3 ΔH:  0 0 γ | 0 0 γ | 0 0 1 | 0 0 0
M4 ΔW:  0 0 0 | 0 0 0 | 0 0 0 | 1 0 0
M4 ΔT:  0 0 0 | 0 0 0 | 0 0 0 | 0 1 0
M4 ΔH:  0 0 0 | 0 0 0 | 0 0 0 | 0 0 1
• Correlation between variation sources of the same variation type in the same process module: γ = 0.5
• Variation sources in different process modules are independent
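The rule that generates Σ (same variation type and same process module gives γ, otherwise 0) can be coded directly. A minimal sketch, assuming the layer-to-module assignment is passed in as a list of hypothetical module ids and γ = 0.5; with modules [1, 1, 1, 2] it reproduces the four-layer matrix on this slide, and a 9-entry list gives the 27 × 27 case:

```python
from itertools import product

TYPES = ("dW", "dT", "dH")  # per-layer variation sources, in table order


def build_correlation_matrix(layer_modules, gamma=0.5):
    """Build the correlation matrix Sigma for BEOL variation sources.

    layer_modules[i] is the (hypothetical) process-module id of metal layer i,
    listed bottom-up. Each layer contributes three sources (dW, dT, dH).
    Two distinct sources are correlated with value `gamma` iff they have the
    same variation type and lie in the same process module; otherwise 0.
    """
    # Layer-major ordering: (M1,dW), (M1,dT), (M1,dH), (M2,dW), ...
    sources = list(product(range(len(layer_modules)), range(len(TYPES))))
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, type_i) in enumerate(sources):
        for j, (layer_j, type_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif type_i == type_j and layer_modules[layer_i] == layer_modules[layer_j]:
                sigma[i][j] = gamma
    return sigma


# Four-layer example: M1-M3 share a process module, M4 is in another one.
example = build_correlation_matrix([1, 1, 1, 2])
for row in example:
    print(" ".join(f"{x:.1f}" for x in row))
```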
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability 0.92 at 60%]
• Variations in different layers are not fully correlated
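The 92% figure is a cumulative distribution of one per-path number: the largest share of the path's wirelength on any single layer. A minimal sketch, assuming per-layer wirelengths have already been extracted for each path (the input format and values below are hypothetical):

```python
def max_layer_ratio(wirelength_by_layer):
    """Largest share of a path's total wirelength on any single metal layer."""
    total = sum(wirelength_by_layer.values())
    return max(wirelength_by_layer.values()) / total


def fraction_spread_across_layers(paths, threshold=0.60):
    """Fraction of paths whose max single-layer wirelength share is below `threshold`."""
    ratios = [max_layer_ratio(p) for p in paths]
    return sum(1 for r in ratios if r < threshold) / len(ratios)


# Hypothetical per-path wirelength (um) by layer.
paths = [
    {"M2": 40.0, "M3": 35.0, "M4": 25.0},  # max share 0.40
    {"M2": 70.0, "M3": 30.0},              # max share 0.70
    {"M3": 20.0, "M4": 30.0, "M5": 50.0},  # max share 0.50
]
print(fraction_spread_across_layers(paths))
```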
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) – d(Ytyp)]/d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) – d(Ytyp)]/d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
• But these paths have larger delays at the other corner
  • Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
  • Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
  • The C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner, where α < 1.0 and their delay variations are covered
• Paths are more sensitive either to R or to C
• Using only RC-worst or only C-worst will underestimate delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussion, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges:
  • Many feasible combinations
  • A corner can cover only certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: 3σj/dj(Ytyp) × 100 vs. Δdj(Yrcw)/dj(Ytyp) × 100]
• Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
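One simple reading of “scale BEOL variation in the itf with α = 0.5” is to move each skewed parameter α of the way from its typical value toward its conventional-corner value. A sketch under that assumption; the parameter names and values are hypothetical, and real itf files carry many more fields:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerGeometry:
    """Hypothetical per-layer BEOL parameters (same units as the itf)."""
    width: float      # W
    thickness: float  # T
    height: float     # H, dielectric height


def tighten(typ: float, corner: float, alpha: float) -> float:
    """alpha = 1 keeps the conventional corner value, alpha = 0 gives typical."""
    return typ + alpha * (corner - typ)


def tightened_corner(typ: LayerGeometry, cbc: LayerGeometry, alpha: float) -> LayerGeometry:
    """Scale every parameter's deviation from typical by alpha (assumed linear scaling)."""
    return LayerGeometry(
        width=tighten(typ.width, cbc.width, alpha),
        thickness=tighten(typ.thickness, cbc.thickness, alpha),
        height=tighten(typ.height, cbc.height, alpha),
    )


typ = LayerGeometry(width=0.100, thickness=0.200, height=0.150)
cw = LayerGeometry(width=0.110, thickness=0.220, height=0.135)  # C-worst: W, T max; H min
print(tightened_corner(typ, cw, alpha=0.5))
```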
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• The structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard-cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer on critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of the critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC; GTBC: timing analysis using TBC and ECO using TBC until violations = 0; GCBC: timing analysis using CBC and ECO using CBC until violations = 0 → done]
Experiment Setup
Design:             LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns):  1.8       2.0       3.1
• Model BTI degradation with Vfinal throughout the lifetime
  • Aging under a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI, with VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate type
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure: Vfinal for different gate types; annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup-time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > the maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
[Figure: Vdd vs. time; Vstep = AVS voltage step, Vfinal = converged voltage]
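The reference flow is a fixed-point iteration over the lifetime. The sketch below assumes the delay and aging models are supplied as callables; the toy models in the usage (a crude delay model and a constant aging rate) are hypothetical placeholders, not the BTI or library models from the talk. Each `delay_at` call stands in for one STA run on a re-characterized library, which is why the real flow is slow:

```python
from typing import Callable, Tuple


def reference_signoff(
    v_init: float,
    v_max: float,
    v_step: float,
    t_final: float,
    dt: float,
    target_delay: float,
    delay_at: Callable[[float, float], float],
    aging_increment: Callable[[float, float, float], float],
) -> Tuple[float, bool]:
    """Iterate aging and AVS over the lifetime; return (Vfinal, fits_under_vmax).

    delay_at(vdd, dvth): critical-path delay at supply vdd with |dVth| = dvth.
    aging_increment(vdd, dvth, dt): extra |dVth| accumulated over dt at vdd.
    """
    vdd, dvth, t = v_init, 0.0, 0.0
    while t < t_final:
        # Estimate aging over this time step at the current supply voltage.
        dvth += aging_increment(vdd, dvth, dt)
        # AVS: raise Vdd in steps of v_step until the aged circuit meets timing.
        while delay_at(vdd, dvth) > target_delay:
            vdd += v_step
            if vdd > v_max:
                return vdd, False  # Vfinal > Vmax: modify the circuit and start over
        t += dt
    return vdd, True


# Toy, hypothetical models (not from the talk).
VTH0 = 0.3  # nominal threshold voltage (V)


def toy_delay(vdd: float, dvth: float) -> float:
    return 1.0 / (vdd - VTH0 - dvth)


def toy_aging(vdd: float, dvth: float, dt: float) -> float:
    return 0.01 * dt  # V per year, independent of vdd in this toy


target = toy_delay(0.90, 0.0)  # timing met exactly at Vinit before aging
print(reference_signoff(0.90, 1.05, 0.01, 10.0, 1.0, target, toy_delay, toy_aging))
```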
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff:
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by the circuit implementation (timing constraints, critical paths, etc.)
  • The circuit is affected by library characterization
[Figure: loop among the circuit, Vfinal, Vlib and VBTI, and the derated libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON and OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS’14]]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
  • ≈70% of the Vdd increment happens in the first year (the remaining 30% over 9 years)
[Figure: Vdd vs. time, approaching Vfinal]
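A quick way to see what “front-loaded” means: if aging follows a power law |ΔVth| ∝ t^n (an assumed model; the slide does not state the exponent), the share of end-of-life aging reached by year 1 of a 10-year lifetime is (1/10)^n. Solving for 50% gives n = ln 2 / ln 10 ≈ 0.30; a smaller exponent front-loads the aging even more:

```python
import math


def fraction_of_aging_by(t: float, t_life: float, n: float) -> float:
    """Share of end-of-life |dVth| reached at time t, assuming |dVth| ~ t**n."""
    return (t / t_life) ** n


def exponent_for_fraction(fraction: float, t: float, t_life: float) -> float:
    """Power-law exponent n for which `fraction` of the aging occurs by time t."""
    return math.log(fraction) / math.log(t / t_life)


n = exponent_for_fraction(0.5, t=1.0, t_life=10.0)
print(round(n, 3))                                       # exponent implied by "50% in year 1 of 10"
print(round(fraction_of_aging_by(1.0, 10.0, 1 / 6), 3))  # share by year 1 for a smaller exponent
```

The two printed values are 0.301 and 0.681.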
Results for DC Scenario
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as …
[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD (V, 0.90 to 1.04) vs. year (0 to 12) for scenarios S1 to S5, and MTTF (years) of DMA for S1 to S5; annotation: “12 years MTTF penalty”]
What Is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, and analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd; axis: operating voltage]
• Problem: margins = pessimism, overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions:
  • Δdjv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
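Under these linear and Gaussian assumptions, the 28 STA runs are enough to obtain each path's σj: with sensitivities Δdjv and the correlation matrix Σ of the sources, $\sigma_j^2 = \sum_u \sum_v \Delta d_{ju}\,\Sigma_{uv}\,\Delta d_{jv}$. A minimal sketch, assuming each Yv skews only source v by 3σ (our reading of the slide); function names and delay values are hypothetical:

```python
import math
from typing import List, Sequence


def delay_sensitivities(d_typ: float, d_skewed: Sequence[float]) -> List[float]:
    """Delta d_jv = [d_j(Y_v) - d_j(Y_typ)] / 3 for each variation source v.

    d_skewed[v] is the path delay from the STA run whose itf skews source v
    to its 3-sigma value; d_typ is the delay from the Ytyp run.
    """
    return [(d_v - d_typ) / 3.0 for d_v in d_skewed]


def path_sigma(sens: Sequence[float], corr: Sequence[Sequence[float]]) -> float:
    """sigma_j under the linear model: sqrt(s^T Sigma s)."""
    n = len(sens)
    variance = sum(sens[u] * corr[u][v] * sens[v] for u in range(n) for v in range(n))
    return math.sqrt(variance)


# Hypothetical two-source example (delays in ps).
sens = delay_sensitivities(100.0, [106.0, 103.0])
corr = [[1.0, 0.5], [0.5, 1.0]]
print(sens, path_sigma(sens, corr))
```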
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^T$ (note: λ is obtained by Cholesky decomposition)
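The Cholesky factor λ turns independent normals into correlated ones: if x is i.i.d. N(0, 1), then z = λx has correlation Σ. A minimal pure-Python Monte Carlo sketch using that fact, assuming the linear delay model of Statistical Timing Analysis (1); it is a cross-check on the analytical σj, not necessarily how λ is used in the authors' flow:

```python
import math
import random
from typing import List, Sequence


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular lam with a = lam @ lam^T (a must be symmetric positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][i] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def sample_sources(lam: Sequence[Sequence[float]], rng: random.Random) -> List[float]:
    """Correlated standard-normal variation sources z = lam @ x, with x i.i.d. N(0, 1)."""
    x = [rng.gauss(0.0, 1.0) for _ in lam]
    return [sum(row[k] * x[k] for k in range(len(x))) for row in lam]


def monte_carlo_sigma(sens, corr, n_samples=20000, seed=0) -> float:
    """Estimate sigma_j of a path whose delay shift is sum_v sens[v] * z_v."""
    lam = cholesky(corr)
    rng = random.Random(seed)
    shifts = [sum(s * z for s, z in zip(sens, sample_sources(lam, rng))) for _ in range(n_samples)]
    mean = sum(shifts) / n_samples
    return math.sqrt(sum((d - mean) ** 2 for d in shifts) / (n_samples - 1))


corr = [[1.0, 0.5], [0.5, 1.0]]
print(monte_carlo_sigma([2.0, 1.0], corr))
```

For these inputs the estimate should land close to the analytical value √7 ≈ 2.646.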
[Equation: throughput as a function of the clock period T, the error rate ER [Kahng10], and the number of recovery cycles r]
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off the cost of resilience vs. data-path optimization
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• A 9-layer metal stack has 27 variation sources z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are of the same type (ΔW, ΔT, or ΔH), and
  • zu and zv are in the same process module
[Figure: layers M1 to M9 grouped into process modules 1 to 3, with sources (ΔW, ΔT, ΔH) M1: z1, z2, z3; M2: z4, z5, z6; M3: z7, z8, z9; M4: z10, z11, z12; M5: z13, z14, z15; M6: z16, z17, z18; M7: z19, z20, z21; M8: z22, z23, z24; M9: z25, z26, z27]
• Examples:
  • ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6, and M7
  • But ΔW in layer M4 is not correlated with ΔT in M4
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is “safe” when its delay evaluated at a given CBC is at least its nominal delay + 3σj:
  $d_j(Y_{CBC}) \ge d_j(Y_{typ}) + 3\sigma_j$
• For a given path, we can compare the statistical delay variation with the delay shift obtained from a given CBC:
  $\alpha_j = \frac{3\sigma_j}{\Delta d_j(Y_{CBC})}$, where $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$, so the path is safe exactly when $\alpha_j \le 1$
[Figure: α vs. Δdelay (vs. typ) at C-worst, [d(Ycw) – d(Ytyp)]/d(Ytyp), and at RC-worst, [d(Yrcw) – d(Ytyp)]/d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
• But these paths often have smaller α values at the other corner
  • Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
  • Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
  • The C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner, where α < 1.0 and their delay variations are covered
Intuition on Delay Variability Across Cw, RCw
• Paths are more sensitive either to R or to C
• Using only RC-worst or only C-worst will underestimate delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following, α is defined at the dominant corner
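The α bookkeeping on these slides can be written compactly. A minimal sketch, assuming σj is already known (e.g., from statistical timing analysis) and all delays share one unit; the function names and numbers are hypothetical:

```python
def alpha_at(sigma_j: float, d_typ: float, d_corner: float) -> float:
    """alpha_j = 3*sigma_j / (d_j(Y_corner) - d_j(Y_typ))."""
    shift = d_corner - d_typ
    if shift <= 0:
        raise ValueError("corner delay must exceed typical delay")
    return 3.0 * sigma_j / shift


def alpha_dominant(sigma_j: float, d_typ: float, d_cw: float, d_rcw: float) -> float:
    """alpha at the dominant corner, i.e. the corner with the larger delay shift."""
    return alpha_at(sigma_j, d_typ, max(d_cw, d_rcw))


def covered_by_cbc(sigma_j: float, d_typ: float, d_cw: float, d_rcw: float) -> bool:
    """Safe iff d(Y_dominant) >= d(Y_typ) + 3*sigma_j, i.e. alpha <= 1."""
    return alpha_dominant(sigma_j, d_typ, d_cw, d_rcw) <= 1.0


# Hypothetical path (ps): small shift at C-worst, larger shift at RC-worst.
print(alpha_at(2.0, 100.0, 103.0), alpha_dominant(2.0, 100.0, 103.0, 108.0))
```

In this example α = 2.0 at C-worst alone but 0.75 at the dominant RC-worst corner, so the path is covered.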
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
Find Paths for Which TBCs Can Be Used
[Figure: α over the plane of Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
• GTBC = set of paths that can be safely signed off using TBC = (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
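The GTBC rule is a simple per-path test on the two Δdelay values. A sketch with hypothetical path names and threshold values (in the flow, Acw and Arcw are calibrated from representative paths):

```python
from typing import Iterable, List, Tuple


def classify_paths(
    paths: Iterable[Tuple[str, float, float]], a_cw: float, a_rcw: float
) -> Tuple[List[str], List[str]]:
    """Split paths into G_TBC and G_CBC.

    Each path is (name, dd_cw, dd_rcw), with dd_* = 100 * [d(Y_corner) - d(Y_typ)] / d(Y_typ).
    A path goes to G_TBC if dd_cw > a_cw OR dd_rcw > a_rcw; all others stay in G_CBC.
    """
    g_tbc: List[str] = []
    g_cbc: List[str] = []
    for name, dd_cw, dd_rcw in paths:
        (g_tbc if dd_cw > a_cw or dd_rcw > a_rcw else g_cbc).append(name)
    return g_tbc, g_cbc


paths = [("p1", 2.0, 6.5), ("p2", 1.5, 2.0), ("p3", 5.2, 1.0)]
print(classify_paths(paths, a_cw=4.0, a_rcw=4.0))
```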
Determining α, Arcw, and Acw
[Figure: α vs. Δd at the C-worst corner (%) and at the RC-worst corner (%), with thresholds Acw and Arcw]
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add a +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
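One way to turn “find Arcw and Acw for a given α” into a procedure (our reading, not necessarily the authors' exact rule): at one corner, take the largest Δd among representative paths whose α exceeds the target, then add the +1% margin. A smaller target α leaves more violating paths and hence a larger threshold, consistent with the last bullet. Sample values are hypothetical:

```python
from typing import Iterable, Tuple


def find_threshold(
    samples: Iterable[Tuple[float, float]], alpha_target: float, margin: float = 1.0
) -> float:
    """Delta-d threshold (%) above which every sampled path has alpha <= alpha_target.

    samples: (delta_d_percent, alpha) pairs for representative paths at one corner.
    The threshold is the largest delta_d among paths that violate alpha_target,
    plus `margin` percentage points; if no path violates, 0.0 + margin is returned
    (an assumption of this sketch).
    """
    violating = [dd for dd, a in samples if a > alpha_target]
    return max(violating, default=0.0) + margin


samples = [(1.0, 0.9), (2.0, 0.7), (3.0, 0.65), (4.0, 0.4), (6.0, 0.3)]
print(find_threshold(samples, 0.5), find_threshold(samples, 0.7))
```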
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns, respectively
• Timing violations are reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. the paths that use TBC
• Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want the benefits (e.g., energy) to maximally outweigh the costs
Power penalty: Razor 30% [Das08], Razor-Lite ~0% [Kim13], TIMBER 100% [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on a sensitivity function, and replacement of error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion; if energy < min energy, save the current solution]
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• A 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT or ΔH), and
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M1      z1    z2    z3
M2      z4    z5    z6
M3      z7    z8    z9
M4      z10   z11   z12
M5      z13   z14   z15
M6      z16   z17   z18
M7      z19   z20   z21
M8      z22   z23   z24
M9      z25   z26   z27
(The figure groups the layers into process modules 1, 2 and 3.)

• Examples
  • ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
  • But ΔW in layer M4 is not correlated with ΔT in M4
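The correlation rule above can be made concrete with a short Python sketch that builds the 27 × 27 correlation matrix. The grouping of layers into process modules ({M1-M3}, {M4-M7}, {M8-M9}) is an assumption inferred from the M4 example on this slide and from the backup correlation-matrix slide, as is γ = 0.5; the names `PROCESS_MODULE`, `source_index` and `correlation_matrix` are hypothetical.

```python
# Hypothetical sketch: 27 x 27 correlation matrix for a 9-layer BEOL stack.
VARIATION_TYPES = ("dW", "dT", "dH")          # ΔW, ΔT, ΔH
PROCESS_MODULE = {1: 1, 2: 1, 3: 1,           # assumed module 1: M1-M3
                  4: 2, 5: 2, 6: 2, 7: 2,     # assumed module 2: M4-M7
                  8: 3, 9: 3}                 # assumed module 3: M8-M9


def source_index(layer: int, vtype: str) -> int:
    """0-based index of z_v: M1 -> z1..z3, M2 -> z4..z6, and so on."""
    return 3 * (layer - 1) + VARIATION_TYPES.index(vtype)


def correlation_matrix(gamma: float = 0.5):
    sources = [(layer, vt) for layer in range(1, 10) for vt in VARIATION_TYPES]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for u, (lu, tu) in enumerate(sources):
        for v, (lv, tv) in enumerate(sources):
            if u == v:
                sigma[u][v] = 1.0
            elif tu == tv and PROCESS_MODULE[lu] == PROCESS_MODULE[lv]:
                sigma[u][v] = gamma  # same variation type, same process module
            # otherwise: independent sources, entry stays 0.0
    return sigma


S = correlation_matrix()
print(S[source_index(4, "dW")][source_index(5, "dW")])  # ΔW(M4) vs ΔW(M5): 0.5
print(S[source_index(4, "dW")][source_index(4, "dT")])  # ΔW(M4) vs ΔT(M4): 0.0
```

The two printed entries reproduce the slide's examples; changing `PROCESS_MODULE` is all that is needed if the actual module boundaries differ.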
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is “safe” when its delay evaluated at a given CBC is at least its nominal delay + 3σj:
  $d_j(Y_{CBC}) \ge d_j(Y_{typ}) + 3\sigma_j$
• For a given path, we can compare the statistical delay variation with the delay variation obtained from a given CBC:
  $\alpha_j = \frac{3\sigma_j}{\Delta d_j(Y_{CBC})}, \qquad \Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$
  so the corner covers the path's 3σ delay variation when αj ≤ 1.0
[Figure: α vs. Δdelay (vs. typ) at C-worst, [d(Ycw) − d(Ytyp)]/d(Ytyp), and at RC-worst, [d(Yrcw) − d(Ytyp)]/d(Ytyp); paths are split into those dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and those dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate their delay variation
• But these paths often have smaller α values at the other corner: where the C-worst corner underestimates delay variation, the paths are dominated by the RC-worst corner, at which α < 1.0 and the delay variation is covered

Intuition on Delay Variability Across Cw, RCw
• Paths are more sensitive either to R or to C
  • Using only RC-worst or only C-worst will underestimate delay variation
  • Both RC-worst and C-worst corners are needed to cover process variation
• In the following, α is defined at the dominant corner
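The definition of αj and the "dominant corner" rule can be checked on a single path with a few lines of Python. This is a sketch: the helper names `alpha` and `dominant_alpha` and the example delays and σ are hypothetical, chosen to mimic a path that C-worst alone would not cover.

```python
def alpha(d_corner: float, d_typ: float, sigma: float) -> float:
    """alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ)); requires corner delay > typical."""
    delta = d_corner - d_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay")
    return 3.0 * sigma / delta


def dominant_alpha(d_typ: float, d_cw: float, d_rcw: float, sigma: float):
    """Evaluate alpha at the dominant corner, i.e. the one with the larger Δdelay."""
    corner, d_dom = max((("C-worst", d_cw), ("RC-worst", d_rcw)), key=lambda kv: kv[1])
    return corner, alpha(d_dom, d_typ, sigma)


# Hypothetical path (ps): typical 1000, C-worst 1030, RC-worst 1060, sigma_j = 12
print(round(alpha(1030.0, 1000.0, 12.0), 2))   # 1.2: C-worst alone underestimates
corner, a = dominant_alpha(1000.0, 1030.0, 1060.0, 12.0)
print(corner, round(a, 2), "covered" if a <= 1.0 else "underestimated")  # RC-worst 0.6 covered
```

αj ≤ 1.0 is algebraically the same test as dj(YCBC) ≥ dj(Ytyp) + 3σj, which is why the check reduces to comparing α with 1.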
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

Find Paths for Which TBCs Can Be Used
[Figure: the same α plot with thresholds Acw on the C-worst axis and Arcw on the RC-worst axis]
Gtbc = set of paths that can be safely signed off using TBC = (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
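The Gtbc membership rule is a simple OR of two threshold tests on the percentage delay change. A minimal Python sketch follows; the `PathDelays` record, the helper names and the threshold values Acw = 5% and Arcw = 6% are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathDelays:
    name: str
    d_typ: float  # delay at Ytyp
    d_cw: float   # delay at Ycw
    d_rcw: float  # delay at Yrcw


def pct_delta(d_corner: float, d_typ: float) -> float:
    """Δd at a corner, in percent of the typical delay."""
    return 100.0 * (d_corner - d_typ) / d_typ


def classify(paths, a_cw: float, a_rcw: float):
    """Split paths into Gtbc (TBC signoff allowed) and Gcbc (keep CBC)."""
    g_tbc, g_cbc = [], []
    for p in paths:
        if pct_delta(p.d_cw, p.d_typ) > a_cw or pct_delta(p.d_rcw, p.d_typ) > a_rcw:
            g_tbc.append(p.name)
        else:
            g_cbc.append(p.name)
    return g_tbc, g_cbc


paths = [PathDelays("p1", 1000, 1010, 1020),   # Δd = 1% / 2%: small, keep CBC
         PathDelays("p2", 1000, 1020, 1080)]   # Δdrcw = 8% > Arcw: use TBC
print(classify(paths, a_cw=5.0, a_rcw=6.0))    # (['p2'], ['p1'])
```

Large-Δd paths go to Gtbc because, per the previous slide, they are the ones with small α and therefore the most CBC pessimism to recover.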
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add a +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in Gtbc
[Figure: α vs. Δd at C-worst corner (%) and vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw]
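One way to read "find Arcw and Acw for a given α" is: pick the smallest Δdelay threshold above which every representative path has α no larger than the target, then add the +1% margin. The Python sketch below implements that reading for one corner; the procedure details, the name `find_threshold` and the sample data are assumptions, not the authors' exact method.

```python
def find_threshold(samples, alpha_target: float, margin_pct: float = 1.0) -> float:
    """samples: (delta_d_pct, alpha) pairs of representative paths at one corner.

    Returns the smallest threshold A such that every sample with delta_d > A
    has alpha <= alpha_target, plus a margin for sampling error.
    """
    candidates = sorted({0.0, *(d for d, _ in samples)})
    for a in candidates:
        if all(al <= alpha_target for d, al in samples if d > a):
            return a + margin_pct
    return candidates[-1] + margin_pct  # not reached: no sample exceeds the max


# Hypothetical representative paths: α falls as Δd (%) grows
samples = [(1.0, 0.90), (2.0, 0.75), (3.0, 0.62), (4.0, 0.55), (5.0, 0.48), (6.0, 0.45)]
print(find_threshold(samples, 0.6))  # 4.0
print(find_threshold(samples, 0.5))  # 5.0
```

The second call shows the slide's last bullet: a smaller target α pushes the threshold up, so fewer paths qualify for Gtbc.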
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations are reduced by 24% to 100%
• TBC-0.6 gives more benefits
• Tradeoff between reduced margin vs. number of paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD, signed off with CBC, TBC-0.5, TBC-0.6 and TBC-0.7]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                Razor          Razor-Lite     TIMBER
Power penalty   30% [Das08]    ~0% [Kim13]    100% [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flowchart blocks:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on a sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy: save current solution
• OR-tree insertion
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at large, medium and small margin, broken down into energy w/o resilience, energy penalty of additional circuits and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt, for MUL and EXU, with the minimum achievable energy marked]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor tracks circuit performance and drives the voltage regulator that sets the circuit's Vdd; circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: BTI aging (|Vt|) estimated at VBTI; Step 2: derated library characterized at Vlib; Step 3: circuit implementation and signoff; BTI degradation and AVS on the resulting circuit determine Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: BTI degradation (|Vt|) at VBTI; Step 2: derated library characterized at Vlib; Step 3: circuit implementation and signoff; BTI degradation and AVS in the circuit determine Vfinal]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan and Siddhartha Nath
Power vs Area Across Different Signoffs
• “Knee” point for balanced area and power tradeoff
• Optimistic signoff corner
  • AVS increases supply voltage aggressively to compensate aging
  • Large lifetime energy overhead
  • May fail to meet timing if desired supply voltage > Vmax

Also: Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time for nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD; power of circuits signed off with different overdrive modes]
• Different overdrive modes: 26% power range
• Fixed fOD: still a 14% power range
fnom = 800MHz, Vnom = 0.8V
Also: Tunable Monitors → Less Margin
• Default config
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Aggressive config: Vmin_est < Vmin_chip; some chips will fail
• Optimized config
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Benefits of tunability
  • Compensate for the difference between model and silicon
  • Recover margin when variation is reduced due to an improved process
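Choosing the "optimized config" amounts to picking, among the available tunings, the one whose estimated Vmin is still at or above the chip's Vmin with the least guardband. The Python sketch below assumes three 1-bit control pins and calibration data in which each high-resistance stage lowers Vmin_est by 4 mV from a default 13 mV above Vmin_chip; those numbers and the name `select_config` are hypothetical.

```python
from itertools import product


def select_config(vmin_est: dict, vmin_chip: float):
    """Pick the tuning with the smallest non-negative guardband Vmin_est - Vmin_chip."""
    safe = {cfg: v - vmin_chip for cfg, v in vmin_est.items() if v >= vmin_chip}
    if not safe:
        raise ValueError("every configuration underestimates Vmin (aggressive config)")
    best = min(safe, key=safe.get)
    return best, safe[best]


# Hypothetical calibration: bit = 1 selects high resistance for that stage,
# assumed to lower Vmin_est by 4 mV; all-zero is the default (13 mV guardband).
vmin_est = {bits: 0.813 - 0.004 * sum(bits) for bits in product((0, 1), repeat=3)}
cfg, margin = select_config(vmin_est, vmin_chip=0.800)
print(cfg, round(margin * 1000, 1), "mV")
```

Configurations that would put Vmin_est below Vmin_chip (the "aggressive" case) are filtered out before minimizing, so the chosen tuning never trades correctness for margin.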
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, across lifetime
• Benefit of the “next node” is increasingly hard to find
  • The entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law
Thank You

Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9, ranging from highest to least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. The per-layer C-3σ corners of M1 and M2 are combined into a homogeneous Cw corner for an interconnect stack with M1 and M2; the combined corner lies beyond the stack's own 3σ point, i.e., it is pessimistic]
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
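The last bullet can be quantified in an idealized case: a path whose wire-delay sensitivity is split equally over n layers, one variation source per layer, with the same pairwise correlation ρ between layers. The homogeneous corner puts every layer at +3σ, so its Δd is 3·n·s, while the statistical 3σ is 3·s·sqrt(n + n(n−1)ρ); hence α = sqrt(n + n(n−1)ρ)/n. The setup and the name `alpha_equal_split` are assumptions for illustration.

```python
import math


def alpha_equal_split(n_layers: int, rho: float) -> float:
    """alpha = 3*sigma / Δd_homogeneous for an idealized path.

    Assumes: equal 1σ sensitivity s on each of n_layers, one variation source per
    layer, and pairwise inter-layer correlation rho.
    Homogeneous corner: Δd = 3 * n * s.  Statistical: σ^2 = s^2 * (n + n*(n-1)*rho).
    The common factor 3*s cancels in the ratio.
    """
    n = n_layers
    return math.sqrt(n + n * (n - 1) * rho) / n


for n in (1, 2, 4, 8):
    print(n, round(alpha_equal_split(n, 0.5), 3))  # 1.0, 0.866, 0.791, 0.75
```

α falls from 1.0 toward sqrt(ρ) as layers are added, which is the growing pessimism of homogeneous corners; with ρ = 1 (fully correlated layers) α stays at 1.0 for any n.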
Correlation Matrix
• Let Σ be the correlation matrix for variation sources (M1-M4 shown)

Σ:           M1           M2           M3           M4
             ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1   ΔW      1  0  0      γ  0  0      γ  0  0      0  0  0
     ΔT      0  1  0      0  γ  0      0  γ  0      0  0  0
     ΔH      0  0  1      0  0  γ      0  0  γ      0  0  0
M2   ΔW      γ  0  0      1  0  0      γ  0  0      0  0  0
     ΔT      0  γ  0      0  1  0      0  γ  0      0  0  0
     ΔH      0  0  γ      0  0  1      0  0  γ      0  0  0
M3   ΔW      γ  0  0      γ  0  0      1  0  0      0  0  0
     ΔT      0  γ  0      0  γ  0      0  1  0      0  0  0
     ΔH      0  0  γ      0  0  γ      0  0  1      0  0  0
M4   ΔW      0  0  0      0  0  0      0  0  0      1  0  0
     ΔT      0  0  0      0  0  0      0  0  0      0  1  0
     ΔH      0  0  0      0  0  0      0  0  0      0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point at 60%, 0.92]
• Variations in different layers are not fully correlated
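The statistic on this slide (fraction of paths whose largest single-layer share of wirelength is below 60%) is easy to compute from per-path, per-layer wirelength. A minimal Python sketch with hypothetical path data and helper names:

```python
def max_layer_ratio(wirelength_by_layer: dict) -> float:
    """Largest share of a path's total wirelength found on any single layer."""
    total = sum(wirelength_by_layer.values())
    return max(wirelength_by_layer.values()) / total


def fraction_below(paths, threshold: float = 0.6) -> float:
    """Fraction of paths whose max single-layer ratio is below threshold."""
    ratios = [max_layer_ratio(p) for p in paths]
    return sum(r < threshold for r in ratios) / len(ratios)


# Hypothetical per-path wirelength (um) by layer
paths = [
    {"M2": 40, "M3": 35, "M4": 25},   # max share 0.40
    {"M2": 10, "M3": 70, "M5": 20},   # max share 0.70
    {"M4": 30, "M5": 30, "M6": 40},   # max share 0.40
]
print(fraction_below(paths))  # 2 of 3 paths, about 0.667
```

Evaluating `fraction_below` over a sweep of thresholds gives the cumulative-probability curve shown in the figure.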
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)]/d(Ytyp), and at RC-worst, [d(Yrcw) − d(Ytyp)]/d(Ytyp); regions dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate their delay variation
• But these paths have larger delays at the other corner: the C-worst corner underestimates their delay variation, but they are dominated by the RC-worst corner, where α < 1.0 and the delay variation is covered
• Paths are more sensitive to R or to C
  • Using only RC-worst or only C-worst will underestimate delay variation
  • Both RC- and C-worst corners are needed to cover process variation
  • In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: 3σj/d(Ytyp) × 100 vs. Δdj(Yrcw)/dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
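"Scaling the BEOL variation in the itf with α" can be pictured as moving each skewed parameter back toward its typical value by the factor α. The Python sketch below does this on a plain dictionary; it does not read or write real itf syntax, and the M2 parameter values (in nm) are hypothetical, with the RC-worst corner taking W, T and H at their minimum as in the conventional-corner table.

```python
def tighten(typ: dict, cbc: dict, alpha: float) -> dict:
    """Tightened corner: typ + alpha * (cbc - typ) for every parameter (0 < alpha <= 1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return {k: typ[k] + alpha * (cbc[k] - typ[k]) for k in typ}


# Hypothetical M2 geometry (nm) at Ytyp and at Yrcw (W, T, H all at min)
m2_typ = {"W": 50.0, "T": 90.0, "H": 80.0}
m2_rcw = {"W": 45.0, "T": 84.0, "H": 74.0}
print(tighten(m2_typ, m2_rcw, 0.5))  # {'W': 47.5, 'T': 87.0, 'H': 77.0}
```

Under the linear delay-sensitivity assumption used in the statistical timing slides, halving every parameter deviation also halves each path's corner-minus-typical delay, which is what makes α a direct knob on signoff margin.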
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in group Gtbc if its Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into Gtbc and Gcbc; Gtbc: timing analysis using TBC, ECO using TBC until violation = 0; Gcbc: timing analysis using CBC, ECO using CBC until violation = 0 → done]
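The per-path check at the heart of the flow above can be sketched in Python. This is an illustration under assumptions, not the authors' implementation: it assumes path delay is linear in BEOL variation, so a tightened corner scales each path's corner-minus-typical delay by α, and the names `signoff_delay` and `violations` and all delay values (ps) are hypothetical.

```python
def signoff_delay(d_typ: float, d_cw: float, d_rcw: float,
                  in_gtbc: bool, alpha: float) -> float:
    """Worst delay over C-worst and RC-worst; Gtbc paths use corners tightened by alpha."""
    scale = alpha if in_gtbc else 1.0
    return max(d_typ + scale * (d_cw - d_typ), d_typ + scale * (d_rcw - d_typ))


def violations(paths, clock_period: float, alpha: float = 0.5):
    """paths: iterable of (name, d_typ, d_cw, d_rcw, in_gtbc); returns names needing ECO."""
    return [name for name, dt, dcw, drcw, g in paths
            if signoff_delay(dt, dcw, drcw, g, alpha) > clock_period]


paths = [("p1", 1700.0, 1760.0, 1840.0, True),    # TBC: 1700 + 0.5 * 140 = 1770
         ("p2", 1700.0, 1760.0, 1840.0, False),   # CBC: 1840
         ("p3", 1750.0, 1770.0, 1790.0, False)]   # CBC: 1790
print(violations(paths, clock_period=1800.0))      # ['p2']
```

p1 and p2 have identical delays; only p1's membership in Gtbc lets it pass, which is the ECO effort the flow aims to save.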
Experiment Setup
Design              LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   1.8       2.0       3.1

• Model BTI degradation with Vfinal throughout lifetime
  • Aging at a flat Vfinal ≈ aging with an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time for NBTI and PBTI aging; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not yet been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
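A toy version of the reference flow shows why it is slow: every time step re-evaluates aging and timing, and Vdd is raised in Vstep increments until timing is met again. Everything model-related here is a hypothetical assumption: an alpha-power-law path delay, a power-law aging model whose rate grows with Vdd, and all constants; only the loop structure follows the slide. Timing slack is taken as 0 at t = 0.

```python
VTH0 = 0.30  # hypothetical fresh threshold voltage (V)


def path_delay(vdd: float, vth: float, a: float = 1.3) -> float:
    """Alpha-power-law delay model (arbitrary units)."""
    return vdd / (vdd - vth) ** a


def delta_vth(t_years: float, vdd: float) -> float:
    """Hypothetical BTI shift: power law in time, faster at higher Vdd."""
    return 0.02 * (vdd / 0.9) ** 2 * t_years ** 0.2


def reference_signoff(vinit=0.90, vmax=1.05, vstep=0.01, years=10.0, steps=40):
    target = path_delay(vinit, VTH0)   # timing slack = 0 at t = 0
    vdd = vinit
    for i in range(1, steps + 1):
        t = years * i / steps
        # Raise Vdd in Vstep increments until the aged path meets timing again
        while path_delay(vdd, VTH0 + delta_vth(t, vdd)) > target:
            vdd = round(vdd + vstep, 6)
            if vdd > vmax:
                return None            # Vfinal > Vmax: modify circuit, start over
    return vdd                         # Vfinal


vfinal = reference_signoff()
print(vfinal)
```

In a real flow each `path_delay` call would be an STA run on a library derated for the current aging, which is where the runtime goes.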
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop from derated libraries (Vlib, VBTI) → circuit → Vfinal → back to derated libraries]
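One generic way to reason about this loop is as a fixed-point problem: choose a voltage for Vlib and VBTI, implement the circuit, observe its Vfinal, and repeat until the two agree. The sketch below does exactly that, with a made-up contraction `toy_vfinal` standing in for "implement, sign off and simulate AVS". It illustrates the loop only; it is not the method proposed in the talk, which instead estimates Vfinal up front with heuristics.

```python
def fixed_point(g, v0: float, tol: float = 1e-4, max_iter: int = 50):
    """Iterate v <- g(v) until successive values differ by less than tol."""
    v = v0
    for i in range(max_iter):
        v_next = g(v)
        if abs(v_next - v) < tol:
            return v_next, i + 1
        v = v_next
    raise RuntimeError("loop did not converge")


def toy_vfinal(v_lib: float) -> float:
    """Hypothetical stand-in: Vfinal of a circuit signed off with Vlib = VBTI = v_lib."""
    return 0.93 + 0.3 * (v_lib - 0.90)


v, iters = fixed_point(toy_vfinal, 0.90)
print(round(v, 4), iters)
```

Each iteration of the real loop costs a full implementation and signoff, so breaking it with a good initial estimate of Vfinal is what saves time.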
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS'14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in lifetime
• The gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in the first year (remaining 30% over 9 years)]
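To see what "50% of aging in the first year" implies, assume a power-law aging model |ΔVth| ∝ t^n (a common modeling assumption, not stated on the slide). The fraction of 10-year aging reached after one year is then (1/10)^n, and the exponent that makes it 50% follows directly, as the short Python check below shows.

```python
import math


def aging_fraction(t: float, lifetime: float, n: float) -> float:
    """Fraction of end-of-life |ΔVth| reached at time t, assuming |ΔVth| ∝ t**n."""
    return (t / lifetime) ** n


# Exponent for which half of the 10-year aging occurs in year 1
n_implied = math.log(0.5) / math.log(1 / 10)
print(round(n_implied, 3))                          # 0.301
print(round(aging_fraction(1, 10, n_implied), 3))   # 0.5
```

Any time exponent well below 1 makes aging front-loaded in this sense, which is why most of the AVS voltage increase also happens early.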
Results for DC Scenario
• Optimistic signoff corner
  • AVS increases supply voltage aggressively to compensate aging
  • Consumes more power
  • May fail to meet timing if desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure callouts: 30% MTTF penalty; 200mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD (V, 0.90-1.04) vs. year (0-12) for AVS schedules S1-S5, and MTTF (years) of DMA 3 under S1-S5; callout: 12 years MTTF penalty]
What is “Signoff”?
• Foundation of the contract between design house and foundry
  • “Chip should work”: stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
• Problem: margins = pessimism → overdesign, schedule delay
[Figure: “margin stack” for voltage signoff; between nominal Vdd and signoff Vdd sit margins for static IR drop, power grid IR gradient, dynamic IR, HCI and NBTI; the operating voltage is also marked]
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]
• Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^T$ (λ is obtained by Cholesky decomposition)
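Putting the two statistical-timing slides together: with sensitivities Δdjv from the 28 STA runs and Σ = λλᵀ, the path's standard deviation is σj = ‖λᵀΔdj‖, i.e. σj² = ΔdjᵀΣΔdj. The pure-Python sketch below uses a 3-source example instead of 27; the helper names and delay numbers are hypothetical. (The Cholesky factor is not strictly needed for σj, but it is the same factor one would use to sample correlated sources.)

```python
import math


def cholesky(S):
    """Lower-triangular L with S = L L^T (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def path_sigma(d_typ: float, d_skewed, S) -> float:
    """d_skewed[v] = d_j(Y_v); sensitivities Δd_jv = (d_j(Y_v) - d_j(Y_typ)) / 3."""
    dd = [(dv - d_typ) / 3.0 for dv in d_skewed]
    L = cholesky(S)
    # z = L^T dd, so |z|^2 = dd^T L L^T dd = dd^T S dd = sigma_j^2
    z = [sum(L[k][i] * dd[k] for k in range(len(dd))) for i in range(len(dd))]
    return math.sqrt(sum(x * x for x in z))


# Hypothetical 3-source example: z1, z2 correlated (0.5), z3 independent
S = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
sigma_j = path_sigma(1000.0, [1006.0, 1009.0, 1003.0], S)
print(round(sigma_j, 3), round(3 * sigma_j, 3))  # 4.472 13.416
```

Here Δd = (2, 3, 1) ps, so σj² = 4 + 9 + 1 + 2·0.5·2·3 = 20; 3σj is the quantity compared against Δdj(YCBC) when computing αj.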
[Equation: throughput as a function of error rate ER, number of recovery cycles r and clock period T [Kahng10]]

Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off cost of resilience vs. data path optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for MUL and EXU, with the minimum achievable energy marked]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly by threshold voltages)
• If one mode is outside the design cone of the other: failed design or overdesign
• If mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • The modes are in each other's design cone
[Figure: frequency vs. voltage, showing the design cone of mode A with modes A, B and C; negative slacks = failed design, positive slacks = overdesign]
• Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
• Guideline: search for signoff modes within the design cone to reduce overdesign
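Mode dominance can be illustrated with a simple voltage-frequency model. The sketch assumes an alpha-power-law maximum frequency, fmax(V) ∝ (V − Vth)^a / V, with hypothetical Vth = 0.35V and a = 1.3, and treats the design cone of mode A as all (V, f) with f ≤ fmax_A(V), where fmax_A is scaled to pass through mode A; the function names are illustrative.

```python
VTH, A = 0.35, 1.3  # hypothetical alpha-power-law parameters


def g(v: float) -> float:
    """Relative max frequency at voltage v: (V - Vth)^a / V."""
    return (v - VTH) ** A / v


def slack_ratio(mode_a, mode_b) -> float:
    """> 1 means a circuit signed off at mode_a meets mode_b with positive slack."""
    (va, fa), (vb, fb) = mode_a, mode_b
    return fa * g(vb) / g(va) / fb


def relation(mode_a, mode_b, tol: float = 1e-9) -> str:
    r = slack_ratio(mode_a, mode_b)
    if abs(r - 1.0) <= tol:
        return "equivalent"
    return "A dominates B" if r > 1.0 else "B dominates A"


nom = (0.8, 800.0)                                   # (V, MHz)
print(relation(nom, (1.0, 1000.0)))                  # A dominates B
print(relation(nom, (1.0, 1100.0)))                  # B dominates A
print(relation(nom, (1.0, 800.0 * g(1.0) / g(0.8)))) # equivalent
```

With this model, A = (0.8V, 800MHz) maps to about 1032MHz at 1.0V, so a 1.0V/1000MHz mode lies inside A's cone and a 1.0V/1100MHz mode lies outside it; only modes on the cone boundary exhibit equivalent dominance.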
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough: scalable
[Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V). Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iterations
• real: power from real implemented circuits
Classes of Closed-Loop AVS
[Figure: closed-loop AVS monitor classes: design-dependent replica, in-situ monitor, generic monitor]
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
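A minimal sketch of how per-stage 1-bit controls give a family of RO configurations to choose from. The stage delays, the three-stage INV/NAND/NOR ring, and the selection rule (the fastest configuration that is still no faster than a target critical-path delay) are hypothetical assumptions for illustration. The talk tunes Vmin, which this single-voltage delay model does not capture.

```python
from itertools import product

# Hypothetical stage delays (ps) for (low, high) series resistance.
STAGES = [
    ("INV", 12.0, 19.0),
    ("NAND", 15.0, 24.0),
    ("NOR", 18.0, 29.0),
]


def ro_period(bits):
    """RO period (ps): each stage switches twice per oscillation period."""
    return 2.0 * sum(high if b else low for (_, low, high), b in zip(STAGES, bits))


def tune(target_ps):
    """Return the control word (one bit per stage, 1 = high resistance) whose
    period is the smallest one that is still >= target_ps, or None."""
    words = [w for w in product((0, 1), repeat=len(STAGES)) if ro_period(w) >= target_ps]
    return min(words, key=ro_period) if words else None


print(ro_period((0, 0, 0)), ro_period((1, 1, 1)))  # 90.0 144.0
print(tune(100.0))                                  # (1, 0, 0)
```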
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: Energy (mJ) of PM, BF, and CO at large, medium, and small margins for MUL and EXU, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: Energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, for MUL and EXU, marking the minimum achievable energy]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: Flow. Initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]
Toward Holistic Modeling, Margining and Tolerance of IC Variability
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
BEOL Corner Optimization
• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult
• Costs of BEOL variations:
  • More design effort (e.g., “last month” of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths that can be safely signed off using tightened BEOL corners (TBC)
  • Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan
Proposed Timing Signoff Flow
Conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → if violation = 0, done; if not, ECO using CBC and repeat timing analysis
This work: routed design → classify timing-critical paths into GTBC and GCBC
• GTBC: timing analysis using TBC → if violation = 0, done; if not, ECO using TBC and repeat
• GCBC: timing analysis using CBC → if violation = 0, done; if not, ECO using CBC and repeat
Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC):
  • Homogeneous corners: all variation sources are skewed in the same direction
  • BEOL RC variations are modeled in the interconnect technology file (itf)
[Figure: Cross-section of layers M1, M2, M3 with wire width W, thickness T, spacing S, and dielectric height H per layer; inter-layer and inter-metal dielectrics]

Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• A 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if:
  • zu and zv are the same type (ΔW, ΔT, or ΔH), and
  • zu and zv are in the same process module
[Figure: Variation sources (ΔW, ΔT, ΔH) per layer: M1: z1, z2, z3; M2: z4, z5, z6; M3: z7, z8, z9; M4: z10, z11, z12; M5: z13, z14, z15; M6: z16, z17, z18; M7: z19, z20, z21; M8: z22, z23, z24; M9: z25, z26, z27; layers grouped into process modules 1, 2, and 3]
• Examples:
  • ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6, and M7
  • But ΔW in layer M4 is not correlated with ΔT in M4
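The correlation rule above fully determines the 27 × 27 correlation matrix once the layer-to-module grouping is fixed. The sketch below assumes M1-M3, M4-M7, and M8-M9 form the three process modules (inferred from the M4-M7 example; the slide does not list the grouping) and uses γ = 0.5, the value quoted for the experiments. The function name is illustrative.

```python
import numpy as np

# Assumed layer-to-module grouping (M1-M3, M4-M7, M8-M9), inferred from the
# M4-M7 example above; the exact grouping is not stated on the slide.
MODULE = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}


def beol_correlation(gamma=0.5):
    """27 x 27 correlation matrix for z1..z27.

    Source index k (0-based) belongs to layer k // 3 + 1 and has type k % 3
    (0 = ΔW, 1 = ΔT, 2 = ΔH), so M1 owns z1..z3 and M9 owns z25..z27.
    Sources are correlated (coefficient gamma) iff they share a type and
    their layers are in the same process module.
    """
    n = 3 * len(MODULE)
    sigma = np.eye(n)
    for u in range(n):
        for v in range(n):
            same_type = u % 3 == v % 3
            same_module = MODULE[u // 3 + 1] == MODULE[v // 3 + 1]
            if u != v and same_type and same_module:
                sigma[u, v] = gamma
    return sigma


S = beol_correlation()
print(S.shape)     # (27, 27)
print(S[9, 12])    # ΔW of M4 vs ΔW of M5: 0.5
print(S[9, 10])    # ΔW of M4 vs ΔT of M4: 0.0
```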
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is “safe” when its delay evaluated at a given CBC is larger than the nominal delay + 3σj:
  $d_j(Y_{CBC}) \ge d_j(Y_{typ}) + 3\sigma_j$
• For a given path, we can compare the statistical delay variation with the delay shift obtained from a given CBC:
  $\alpha_j = \frac{3\sigma_j}{\Delta d_j(Y_{CBC})}$, where $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$
  • Equivalently (for $\Delta d_j(Y_{CBC}) > 0$), the corner covers path pj exactly when $\alpha_j \le 1$
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  • But these paths often have smaller α values at the other corner
[Figure: α vs. Δdelay (vs. typ) at C-worst, [d(Ycw) − d(Ytyp)]/d(Ytyp), and at RC-worst, [d(Yrcw) − d(Ytyp)]/d(Ytyp); paths dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
The C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner, where α < 1.0: their delay variations are covered by the RC-worst corner.
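The definition of α and the idea of a dominant corner can be checked on a single path. This is a minimal sketch assuming the path delays at Ytyp, Ycw, and Yrcw and the statistical σj are already known; the function name and the example numbers are hypothetical.

```python
def alpha_at_dominant_corner(d_typ, d_cw, d_rcw, sigma):
    """Return (dominant corner, alpha) for one setup path.

    d_typ, d_cw, d_rcw: path delay at Ytyp, Ycw, Yrcw; sigma: statistical
    std. dev. of the path delay (same time unit). The dominant corner is the
    one with the larger delay shift Δd = d(Y_CBC) - d(Ytyp); alpha = 3σ / Δd,
    and alpha <= 1 means that corner already covers d(Ytyp) + 3σ.
    """
    shifts = {"C-worst": d_cw - d_typ, "RC-worst": d_rcw - d_typ}
    corner = max(shifts, key=shifts.get)
    if shifts[corner] <= 0:
        raise ValueError("a worst corner should slow the path down")
    return corner, 3.0 * sigma / shifts[corner]


# Hypothetical path (ps): C-worst alone gives alpha = 24/20 = 1.2 (> 1),
# but the path is dominated by RC-worst, where alpha = 24/50 = 0.48.
print(alpha_at_dominant_corner(1000.0, 1020.0, 1050.0, 8.0))  # ('RC-worst', 0.48)
```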
Intuition on Delay Variability Across Cw, RCw
[Figure: α vs. Δdelay at C-worst and at RC-worst, with paths grouped by dominant corner; paths with α > 1.0 at C-worst are dominated by the RC-worst corner, where α < 1.0]
• Paths are more sensitive to either R or C
• Using only RC-worst or only C-worst will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
In the following, α is defined at the dominant corner.
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
Find Paths for Which TBCs Can Be Used
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
Gtbc = set of paths that can be safely signed off using TBC = (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
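The Gtbc rule is a simple per-path test. A minimal sketch, assuming each path's delay shifts at C-worst and RC-worst (in percent of typical delay) are already available; the path names, shift values, and 4% thresholds below are hypothetical.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (G_tbc, G_cbc) using the rule on this slide.

    paths: dict name -> (dd_cw, dd_rcw), the delay shifts (%) of each path at
    C-worst and RC-worst relative to the typical corner.
    a_cw, a_rcw: the thresholds Acw and Arcw (%).
    """
    g_tbc, g_cbc = [], []
    for name, (dd_cw, dd_rcw) in paths.items():
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)  # large shift -> small alpha -> TBC is safe
        else:
            g_cbc.append(name)  # keep conventional BEOL corners
    return g_tbc, g_cbc


paths = {"p1": (1.5, 2.0), "p2": (6.0, 1.0), "p3": (2.0, 7.5)}
print(classify_paths(paths, a_cw=4.0, a_rcw=4.0))  # (['p2', 'p3'], ['p1'])
```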
Determining α, Arcw, and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay; find Arcw and Acw for a given α
  • Add a +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and at RC-worst corner (%), with thresholds Acw and Arcw]
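One way to read "find Arcw and Acw for a given α" is: pick the smallest delay-shift threshold above which every representative path has α at or below the target, then add the sampling-error margin. The sketch below implements that reading; it is an assumption, not necessarily the authors' exact procedure, and the sample data are hypothetical. It also shows the slide's trend that a smaller target α yields a larger threshold.

```python
def extract_threshold(samples, alpha_target, margin=1.0):
    """Threshold A (%) such that every sample with Δd > A has alpha <= alpha_target.

    samples: (dd_pct, alpha) pairs from representative paths at one corner
    (C-worst for Acw, RC-worst for Arcw). The largest Δd among samples that
    violate alpha_target sets the boundary (0.0 if none violate, assuming
    Δd >= 0); `margin` (%) is added on top to account for sampling error.
    """
    violating = [dd for dd, alpha in samples if alpha > alpha_target]
    base = max(violating) if violating else 0.0
    return base + margin


# Hypothetical representative paths: α falls as the corner delay shift grows.
samples = [(1.0, 0.90), (2.0, 0.75), (3.0, 0.62), (4.0, 0.50), (5.0, 0.41)]
print(extract_threshold(samples, alpha_target=0.6))   # 4.0
print(extract_threshold(samples, alpha_target=0.45))  # 5.0: smaller α, larger threshold
```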
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations are reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin and the number of paths that use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

Design       Power penalty
Razor        30% [Das08]
Razor-Lite   ~0% [Kim13]
TIMBER       100% [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: Flow. Initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion; if energy < min energy, save current solution]
Benefit of Low-Cost Resilience
• Reference flows:
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: Energy (mJ) of PM, BF, and CO at large, medium, and small margins for MUL and EXU, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: Energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt, for MUL and EXU, marking the minimum achievable energy]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
[Figure: Closed-loop AVS: an on-chip aging monitor senses circuit performance and a voltage regulator sets Vdd for the circuit. Circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: BTI degradation (|Vt|) at VBTI; Step 2: derated library at Vlib; Step 3: circuit implementation and signoff; BTI degradation and AVS then determine Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: BTI degradation (|Vt|) at VBTI; Step 2: derated library at Vlib; Step 3: circuit implementation and signoff; BTI degradation and AVS determine Vfinal]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, and Siddhartha Nath
Power vs Area Across Different Signoffs
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Also Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time for nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD]
[Figure: Power of circuits with different overdrive modes]
• Different overdrive modes: 26% power range
• Fixed fOD: still 14% power range
fnom = 800MHz, Vnom = 0.8V
Also Tunable Monitors: Less Margin
• Default config:
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Aggressive config: Vmin_est < Vmin_chip, so some chips will fail
• Optimized config:
  • Increase the number of high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Benefits of tunability:
  • Compensate for differences between model and silicon
  • Recover margin when variation is reduced due to an improved process
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk:
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: Core power (mW) and PG power (mW) vs. implementation (1 to 9), annotated with highest and least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: Example: worst-case capacitance corner. Per-layer C distributions of M1 and M2, each taken at its 3σ point, combine into a homogeneous Cw corner for the M1+M2 stack; the gap between this corner and the stack's own 3σ point is pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers.
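The last statement can be quantified with a short calculation. A minimal sketch, assuming per-layer contributions add linearly over the stack, each layer has the same σ, and every pair of layers shares one correlation ρ (0.5 here, mirroring γ in the backup correlation matrix). A homogeneous corner shifts every layer by 3σ at once, whereas the stack's true 3σ shift grows only with the square root of the summed covariance.

```python
import math


def homogeneous_pessimism(sigmas, rho):
    """Ratio of the homogeneous-corner shift to the stack's true 3σ shift.

    sigmas: per-layer std. dev. of a wire's capacitance contribution, assumed
    to add linearly over the stack; rho: assumed uniform correlation between
    layers. Homogeneous corner: 3 * sum(sigmas). True 3σ: 3 * sqrt(s^T R s).
    """
    n = len(sigmas)
    var = sum(
        sigmas[i] * sigmas[j] * (1.0 if i == j else rho)
        for i in range(n)
        for j in range(n)
    )
    return 3.0 * sum(sigmas) / (3.0 * math.sqrt(var))


for n_layers in (1, 2, 4, 8):
    print(n_layers, round(homogeneous_pessimism([1.0] * n_layers, rho=0.5), 3))
```

With ρ = 0.5 the ratio rises from 1.0 (one layer) to about 1.155, 1.265, and 1.333 for 2, 4, and 8 layers; with ρ = 1 it stays at 1.0, i.e., no pessimism when layers are fully correlated.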
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources (M1 to M4 shown):

              M1           M2           M3           M4
              ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1   ΔW       1  0  0      γ  0  0      γ  0  0      0  0  0
     ΔT       0  1  0      0  γ  0      0  γ  0      0  0  0
     ΔH       0  0  1      0  0  γ      0  0  γ      0  0  0
M2   ΔW       γ  0  0      1  0  0      γ  0  0      0  0  0
     ΔT       0  γ  0      0  1  0      0  γ  0      0  0  0
     ΔH       0  0  γ      0  0  1      0  0  γ      0  0  0
M3   ΔW       γ  0  0      γ  0  0      1  0  0      0  0  0
     ΔT       0  γ  0      0  γ  0      0  1  0      0  0  0
     ΔH       0  0  γ      0  0  γ      0  0  1      0  0  0
M4   ΔW       0  0  0      0  0  0      0  0  0      1  0  0
     ΔT       0  0  0      0  0  0      0  0  0      0  1  0
     ΔH       0  0  0      0  0  0      0  0  0      0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: Cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability 0.92 at 60%]
• Variations in different layers are not fully correlated
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)]/d(Ytyp), and vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)]/d(Ytyp); paths dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  • But these paths have larger delays at the other corner
• The C-worst corner underestimates delay variations for these paths, but they are dominated by the RC-worst corner, where α < 1.0: their delay variations are covered by the RC-worst corner
• Paths are more sensitive to either R or C
  • Using only RC-worst or only C-worst will underestimate delay variations
  • Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: Interconnect stack with M1 and M2; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5
[Figure: 3σj/d(Ytyp) × 100 vs. Δdj(Yrcw)/dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: Wirelength ratio (%) per layer]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if its Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Figure: Routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split into GTBC and GCBC; GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0; done]
Experiment Setup

Design          LEON3MP   NETCARD   SUPERBLUE12
Clock period    1.8ns     2.0ns     3.1ns

• Model BTI degradation with Vfinal throughout lifetime
  • Aging at a flat Vfinal ≈ aging at an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time, NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
C7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, VLIB, and Vdd consistent throughout circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
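The reference flow's time-stepped loop can be sketched with toy models. Every model and constant here is a hypothetical assumption: an alpha-power-law path delay, a power-law BTI shift `k * vdd * t**n`, and the helper names `path_delay`, `bti_shift`, and `reference_signoff`. Aging is recomputed at each step as if the current Vdd had been applied since t = 0, which errs on the pessimistic side.

```python
# Toy models; every constant here is hypothetical.
VTH0 = 0.35   # fresh threshold voltage (V)
ALPHA = 1.3   # alpha-power-law exponent


def path_delay(vdd, dvth):
    """Relative critical-path delay with a BTI threshold shift dvth (alpha-power law)."""
    return vdd / (vdd - VTH0 - dvth) ** ALPHA


def bti_shift(vdd, t_years, k=0.02, n=0.2):
    """Hypothetical BTI model dVth = k * vdd * t^n, as if stressed at vdd since t = 0."""
    return k * vdd * t_years ** n


def reference_signoff(v_init, v_max, t_final=10.0, dt=0.5, v_step=0.01):
    """Time-stepped aging/AVS loop; returns converged Vfinal or None if Vmax is exceeded."""
    target = path_delay(v_init, 0.0)  # timing is met exactly at t = 0
    vdd, t = v_init, 0.0
    while t < t_final:
        t += dt
        dvth = bti_shift(vdd, t)                # estimate aging at this time step
        while path_delay(vdd, dvth) > target:   # update timing; AVS raises Vdd
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None                     # modify circuit and start over
    return vdd


print(reference_signoff(0.90, 1.05))
print(reference_signoff(0.90, 0.92))  # None: needs more than 0.92 V
```

Each pass of the outer loop stands in for one aging estimate plus one STA update, which is why the real flow needs many STA runs.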
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff:
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: Loop among circuit, Vfinal, and derated libraries (Vlib, VBTI)]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON and OFF periods [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
  • ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)
[Figure: Vdd vs. time, converging to Vfinal]
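The “front-loaded” observation can be related to a power-law aging curve. The slide gives only one data point (50% of 10-year aging within the 1st year); the sketch below assumes ΔVth ∝ t^n, a form commonly used for BTI but not stated here, and solves for the exponent that matches that point. Function names are illustrative.

```python
import math


def power_law_exponent(frac_at_1yr, lifetime_years=10.0):
    """Exponent n of dVth ∝ t^n for which dVth(1 yr) / dVth(lifetime) = frac_at_1yr."""
    return math.log(frac_at_1yr) / math.log(1.0 / lifetime_years)


def fraction_of_final_shift(t_years, n, lifetime_years=10.0):
    """Fraction of the end-of-life BTI shift reached at time t under dVth ∝ t^n."""
    return (t_years / lifetime_years) ** n


n = power_law_exponent(0.5)  # 50% of 10-year aging within the 1st year
print(round(n, 3))           # 0.301
for t in (1, 2, 5, 10):
    print(t, round(fraction_of_final_shift(t, n), 2))
```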
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
30% MTTF penalty; 200mV voltage compensation
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year (0 to 12) for scenarios S1 to S5; MTTF (years) of DMA 3 for S1 to S5]
12 years MTTF penalty
What is “Signoff”
• Foundation of contract between design house and foundry
  • “Chip should work”: stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “Margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd, operating voltage]
• Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv: Δdjv
• Assumptions:
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Figure: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → Δdjv]
$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
Note: path delay includes gate and wire delays.
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^T$ (note: λ is obtained by Cholesky decomposition)
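The two statistical-timing slides combine into a standard linear-Gaussian σ computation. This is a minimal sketch assuming each itf file Y_v shifts source z_v by +3σ (our reading of the division by 3) and that the sources have unit variance with correlation Σ. The function names and the two-source example values are hypothetical. The resulting σj is the quantity used in αj = 3σj / Δdj(Y_CBC).

```python
import numpy as np


def sensitivities(d_v, d_typ):
    """Δd_jv = (d_j(Y_v) - d_j(Y_typ)) / 3 for each of the itf runs Y_v."""
    return (np.asarray(d_v, dtype=float) - d_typ) / 3.0


def path_sigma(delta_d, corr):
    """σ_j of a path delay under the linear, Gaussian model of the slides.

    With Σ = λ λ^T (Cholesky), the sources are z = λ u with u ~ N(0, I), so
    the delay change Δd^T z = (λ^T Δd)^T u has std. dev. ||λ^T Δd||,
    which equals sqrt(Δd^T Σ Δd).
    """
    lam = np.linalg.cholesky(corr)  # lower-triangular λ
    return float(np.linalg.norm(lam.T @ delta_d))


# Two correlated sources (γ = 0.5); hypothetical delays in ps.
corr = np.array([[1.0, 0.5], [0.5, 1.0]])
dd = sensitivities([103.0, 106.0], 100.0)  # -> [1.0, 2.0] ps per σ
sigma = path_sigma(dd, corr)
print(round(sigma, 4), round(float(np.sqrt(dd @ corr @ dd)), 4))  # both sqrt(7) ≈ 2.6458
```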
$\text{Throughput} = \frac{1 - ER}{T} + \frac{1 - ER}{r \times T}$
where r = recovery cycles, T = clock period, ER = error rate [Kahng10]
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off cost of resilience vs. data path optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: Energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, for MUL and EXU, marking the minimum achievable energy]
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]
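The SEOpt half of this loop can be sketched in Python with a toy cost model. Each endpoint has a hypothetical energy cost for margin insertion on its fanin paths, and every error-tolerant FF kept costs a fixed `resilience_cost`. The loop converts the K cheapest endpoints per iteration and saves the solution whenever energy drops below the best seen so far. SkewOpt and OR-tree insertion are not modeled. This is an illustration, not the authors' implementation.

```python
from typing import List, Tuple

def optimize_endpoints(endpoints: List[Tuple[str, float]], resilience_cost: float,
                       k: int = 2) -> Tuple[List[str], float]:
    """Toy SEOpt-style loop (hypothetical cost model).

    endpoints: (name, energy cost of inserting margin on the endpoint's fanin paths).
    resilience_cost: energy cost of keeping one error-tolerant FF.
    Returns the endpoints that keep error-tolerant FFs, and the total energy.
    """
    order = sorted(endpoints, key=lambda e: e[1])  # cheapest margin insertion first

    def total_energy(n_normal: int) -> float:
        margin = sum(cost for _, cost in order[:n_normal])       # datapath cost
        resilience = resilience_cost * (len(order) - n_normal)   # remaining error-tolerant FFs
        return margin + resilience

    best_n, best_e = 0, total_energy(0)  # initial solution: all FFs error-tolerant
    n_normal = 0
    while n_normal < len(order):
        n_normal = min(n_normal + k, len(order))  # replace K more FFs w/ normal FFs
        e = total_energy(n_normal)
        if e < best_e:                            # energy < min energy -> save solution
            best_n, best_e = n_normal, e
    return [name for name, _ in order[best_n:]], best_e
```

For example, with endpoint costs a = 0.5, b = 1.0, c = 3.0, d = 4.0, `resilience_cost=2.0`, and `k=1`, the cheapest mix converts a and b and keeps error-tolerant FFs on c and d.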
BEOL Corner Optimization
• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult
• Costs of BEOL variations
  • More design effort (e.g., “last month” of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths which can be safely signed off using tightened BEOL corners (TBC)
  • Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan
Proposed Timing Signoff Flow
Conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → violation = 0? If no, ECO using CBC and repeat; if yes, done.
This work: routed design → classify timing-critical paths into GTBC and GCBC → GTBC: timing analysis and ECO using TBC until violation = 0; GCBC: timing analysis and ECO using CBC until violation = 0 → done.
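The two-branch flow can be sketched in Python with pluggable callbacks. `classify`, `sta`, and `eco` are hypothetical stand-ins for path classification, timing analysis, and an ECO step. The toy slacks, thresholds, and 20 ps ECO gain below are invented for illustration and do not reflect any real tool interface.

```python
from typing import Any, Callable, Dict, List, Tuple

Path = Dict[str, Any]  # hypothetical record: name, delay shifts (%), slack per corner set (ps)

def run_signoff(paths: List[Path],
                classify: Callable[[List[Path]], Tuple[List[Path], List[Path]]],
                sta: Callable[[List[Path], str], List[Path]],
                eco: Callable[[List[Path], str], None],
                max_iters: int = 20) -> Tuple[List[Path], List[Path]]:
    """Close timing on GTBC with tightened corners and on GCBC with conventional ones."""
    g_tbc, g_cbc = classify(paths)
    for group, corners in ((g_tbc, "TBC"), (g_cbc, "CBC")):
        for _ in range(max_iters):
            violating = sta(group, corners)  # paths with negative slack at these corners
            if not violating:                # violation = 0 -> this group is done
                break
            eco(violating, corners)          # fix only the violating paths
        else:
            raise RuntimeError(f"{corners} group did not close in {max_iters} ECO rounds")
    return g_tbc, g_cbc


# Toy usage: thresholds (3%), slacks, and the 20 ps ECO gain are made up.
def classify(ps: List[Path]) -> Tuple[List[Path], List[Path]]:
    in_tbc = [p for p in ps if p["dd_cw"] > 3.0 or p["dd_rcw"] > 3.0]
    return in_tbc, [p for p in ps if p not in in_tbc]

def sta(group: List[Path], corners: str) -> List[Path]:
    return [p for p in group if p["slack"][corners] < 0]

def eco(violating: List[Path], corners: str) -> None:
    for p in violating:
        for c in p["slack"]:
            p["slack"][c] += 20  # pretend each ECO round recovers 20 ps everywhere

paths = [
    {"name": "p1", "dd_cw": 5.0, "dd_rcw": 1.0, "slack": {"TBC": -30, "CBC": -70}},
    {"name": "p2", "dd_cw": 1.0, "dd_rcw": 2.0, "slack": {"TBC": -10, "CBC": -25}},
]
g_tbc, g_cbc = run_signoff(paths, classify, sta, eco)
```

The sketch shows that p1 only needs to close at the tightened corners, even though its CBC slack would still be negative.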
Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
  • BEOL RC variations are modeled in the interconnect technology file (itf)
[Figure: cross-section of layers M1-M3 showing wire width W, thickness T, spacing S, and dielectric height H, with inter-layer and inter-metal dielectrics]
Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• A 9-layer metal stack has 27 variation sources z1, z2, …, z27 (M1: z1-z3, M2: z4-z6, …, M9: z25-z27, in the order ΔW, ΔT, ΔH)
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are of the same type (ΔW, ΔT, or ΔH), and
  • zu and zv are in the same process module
[Figure: layers grouped into process module 1 (M1-M3), process module 2 (M4-M7), and process module 3 (M8-M9)]
Examples:
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6, and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
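The "same type and same process module" rule can be turned directly into a 27 × 27 correlation matrix. Below is a Python sketch. It assumes the module grouping shown in the figure (M1-M3, M4-M7, M8-M9), a source order of ΔW, ΔT, ΔH within each layer, and a single correlation value γ = 0.5 (the value used in the experiments) for every correlated pair.

```python
import numpy as np

# Assumed grouping of the 9 layers into process modules, read off the slide:
# module 1 = M1-M3, module 2 = M4-M7, module 3 = M8-M9.
MODULE = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}
TYPES = ("dW", "dT", "dH")  # order of the three sources within each layer

def source(idx: int):
    """Map a 0-based source index (z1 -> 0, ..., z27 -> 26) to (layer, type)."""
    return idx // 3 + 1, TYPES[idx % 3]

def correlated(u: int, v: int) -> bool:
    """zu and zv correlate iff they share the variation type and the process module."""
    (lu, tu), (lv, tv) = source(u), source(v)
    return tu == tv and MODULE[lu] == MODULE[lv]

def correlation_matrix(gamma: float = 0.5, n_layers: int = 9) -> np.ndarray:
    """Correlation matrix: 1 on the diagonal, gamma for correlated pairs, else 0."""
    n = 3 * n_layers
    sigma = np.eye(n)
    for u in range(n):
        for v in range(n):
            if u != v and correlated(u, v):
                sigma[u, v] = gamma
    return sigma

S = correlation_matrix()
```

In this matrix, `S[9, 12]` (ΔW of M4 vs ΔW of M5) is γ, while `S[9, 10]` (ΔW vs ΔT of M4) is 0, matching the two examples above.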
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is “safe” when its delay evaluated at a given CBC is larger than the nominal delay + 3σj:
  $d_j(Y_{CBC}) \ge d_j(Y_{typ}) + 3\sigma_j$
• For a given path, we compare the statistical delay variation with the delay shift obtained at a given CBC:
  $\alpha_j = \frac{3\sigma_j}{\Delta d_j(Y_{CBC})}$, where $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$
  For $\Delta d_j(Y_{CBC}) > 0$, the safety condition above is equivalent to $\alpha_j \le 1$
[Figure: α vs Δdelay (vs typ) at C-worst, [d(Ycw) − d(Ytyp)]/d(Ytyp), and at RC-worst, [d(Yrcw) − d(Ytyp)]/d(Ytyp); paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  • But these paths often have smaller α values at the other corner
  • E.g., the C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner, where α < 1.0 and their delay variations are covered
Intuition on Delay Variability Across Cw, RCw
• Paths are more sensitive either to R or to C
• Using only RC-worst or only C-worst will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
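The definition of α at the dominant corner can be written as a small Python helper. The sketch assumes σj is already known from the statistical timing analysis. The path delays in the example are hypothetical values in picoseconds.

```python
def alpha_at_dominant_corner(d_typ: float, d_cw: float, d_rcw: float, sigma: float):
    """Return (alpha_j, corner) at the corner with the larger delay shift vs typical.

    alpha_j = 3*sigma_j / delta_d_j; alpha_j <= 1 means the corner delay already
    covers d_typ + 3*sigma_j, i.e., the path is "safe" at that corner.
    """
    dd_cw, dd_rcw = d_cw - d_typ, d_rcw - d_typ
    corner, dd = ("Cw", dd_cw) if dd_cw >= dd_rcw else ("RCw", dd_rcw)
    if dd <= 0:
        raise ValueError("the dominant corner should increase the path delay")
    return 3.0 * sigma / dd, corner

# Hypothetical path: RC-worst shifts delay by 50 ps, C-worst by 30 ps, sigma = 10 ps.
alpha, corner = alpha_at_dominant_corner(d_typ=1000, d_cw=1030, d_rcw=1050, sigma=10)
print(corner, alpha, alpha <= 1.0)
```

Evaluating at the dominant corner reflects the observation above: a path that looks unsafe at C-worst can still be covered by RC-worst.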
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
Find Paths for Which TBCs Can Be Used
[Figure: α vs Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
GTBC = set of paths that can be safely signed off using TBC = (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
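The GTBC membership rule is a simple OR over two thresholds. Below is a Python sketch. It assumes each path is summarized by its relative delay shifts (in % of d(Ytyp)) at C-worst and RC-worst. Path names and values are hypothetical.

```python
from typing import Dict, Set, Tuple

def split_paths(paths: Dict[str, Tuple[float, float]], a_cw: float, a_rcw: float
                ) -> Tuple[Set[str], Set[str]]:
    """paths: name -> (dd_cw, dd_rcw), delay shifts vs typical in % of d(Ytyp).

    A path goes to GTBC if either shift exceeds its threshold; all others stay in GCBC.
    """
    g_tbc = {name for name, (dd_cw, dd_rcw) in paths.items()
             if dd_cw > a_cw or dd_rcw > a_rcw}
    return g_tbc, set(paths) - g_tbc

g_tbc, g_cbc = split_paths({"p1": (5.0, 1.0), "p2": (1.0, 2.0), "p3": (2.0, 4.0)},
                           a_cw=3.0, a_rcw=3.0)
```

Path p2 has small shifts at both corners, so it likely has large α and stays with the conventional corners.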
Determining α Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs Δdelay; find Arcw and Acw for a given α
• Add a +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs Δd at C-worst corner (%) and at RC-worst corner (%), with thresholds Acw and Arcw]
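One way to read "find Arcw and Acw for a given α" is to look for the smallest Δdelay threshold beyond which no representative path exceeds the target α, then add the +1% margin. The Python sketch below implements that reading. The sample points are hypothetical. It also shows that a smaller target α yields a larger threshold.

```python
from typing import Iterable, Tuple

def extract_threshold(samples: Iterable[Tuple[float, float]], alpha_target: float,
                      margin_pct: float = 1.0) -> float:
    """samples: (delta_delay_pct, alpha) for representative paths at one corner.

    Returns the smallest threshold A such that every sampled path with
    delta_delay > A has alpha <= alpha_target, plus a sampling-error margin.
    """
    # Paths whose alpha exceeds the target must all fall at or below the threshold.
    violators = [dd for dd, a in samples if a > alpha_target]
    return max(violators, default=0.0) + margin_pct

samples = [(1.0, 0.9), (2.0, 0.7), (3.0, 0.65), (4.0, 0.4), (5.0, 0.3)]
print(extract_threshold(samples, 0.6), extract_threshold(samples, 0.35))
```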
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations are reduced by 24% to 100%
• TBC-0.6 gives more benefits
  • Tradeoff between reduced margin vs. number of paths which use TBC
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7; correlation factor γ = 0.5]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
Power penalty: Razor 30% [Das08], Razor-Lite ~0% [Kim13], TIMBER 100% [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion; if energy < min energy, save current solution]
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at large, medium, and small margin, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation; small/medium/large margin = 1σ/2σ/3σ for SS corner; technology: foundry 28nm]
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs supply voltage (V) for pure-margin, brute-force, and CombOpt, for MUL and EXU; minimum achievable energy marked; technology: foundry 28nm]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor tracks circuit performance and drives the voltage regulator that sets Vdd; circuit frequency and Vdd vs time, without and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Diagram: Step 1: BTI degradation (|Vt| shift estimated at VBTI) and AVS (Vfinal); Step 2: derated library characterized at Vlib; Step 3: circuit implementation and signoff]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Diagram: Step 1: BTI degradation and AVS (VBTI, |Vt|, Vfinal); Step 2: derated library (Vlib); Step 3: circuit implementation and signoff]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs Area Across Different Signoffs
[Figure: power vs area across different signoff corners; “knee” point for balanced area and power tradeoff]
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Also Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs time in nominal (NOM) and overdrive (OD) modes, for durations tnom and tOD; power of circuits w/ different overdrive modes: different overdrive modes span a 26% power range, and with fOD fixed there is still a 14% power range; fnom = 800MHz, Vnom = 0.8V]
Also Tunable Monitors: Less Margin
Default config:
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Optimized config:
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability:
• Compensate for the difference between model and silicon
• Recover margin when variation is reduced due to an improved process
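Choosing among default, aggressive, and optimized configurations amounts to finding the smallest margin that never underestimates any chip's Vmin. The Python sketch below shows that selection rule. It assumes hypothetical per-chip margins (Vmin_est − Vmin_chip, in mV) measured for each configuration. The numbers are invented, apart from echoing the 13mV default margin.

```python
from typing import Dict, List

def pick_config(margins_mv: Dict[str, List[float]]) -> str:
    """margins_mv: config -> (Vmin_est - Vmin_chip) in mV over characterized chips.

    Keep configs that never underestimate any chip's Vmin (no negative margin),
    then pick the one whose smallest margin is lowest, i.e., the least guardband.
    """
    safe = {cfg: min(m) for cfg, m in margins_mv.items() if min(m) >= 0}
    if not safe:
        raise ValueError("every configuration underestimates Vmin on some chip")
    return min(safe, key=safe.get)

choice = pick_config({
    "default": [13.0, 15.0, 18.0],     # worst-case guardband, about 13mV margin
    "optimized": [2.0, 4.0, 6.0],      # more high-resistance passgates
    "aggressive": [-3.0, 1.0, 2.0],    # underestimates one chip -> would fail
})
```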
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1-9, annotated from highest to least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner: the per-layer C-3σ corners of M1 and M2 combine into a homogeneous Cw corner for an interconnect stack with M1 and M2, showing pessimism relative to the 3σ of the stack]
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
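A short calculation makes the last point concrete. A homogeneous corner pushes every layer to +3σ at once, while the true 3σ of the summed effect shrinks when layers are only partially correlated. The Python sketch below computes the ratio of the two. It assumes a linear delay model, per-layer sensitivities split equally across layers, and a hypothetical uniform inter-layer correlation ρ. This is a simplification for illustration, not the talk's analysis.

```python
import numpy as np

def homogeneous_pessimism(weights, corr) -> float:
    """Delay shift at a homogeneous 3-sigma corner divided by the true 3-sigma shift.

    weights: per-layer delay sensitivity to a 1-sigma capacitance change (all >= 0);
    corr: layer-to-layer correlation matrix of those capacitance variations.
    """
    w = np.asarray(weights, dtype=float)
    corner_shift = 3.0 * w.sum()                                       # all layers at +3 sigma
    true_3sigma = 3.0 * np.sqrt(w @ np.asarray(corr, dtype=float) @ w)  # 3 sigma of the sum
    return float(corner_shift / true_3sigma)

def uniform_corr(n: int, rho: float) -> np.ndarray:
    """n x n correlation matrix with 1 on the diagonal and rho elsewhere."""
    return np.full((n, n), rho) + (1.0 - rho) * np.eye(n)

for n in (1, 2, 4, 8):
    print(n, round(homogeneous_pessimism([1.0 / n] * n, uniform_corr(n, 0.5)), 3))
```

With full correlation the ratio is 1 (no pessimism). With independent layers and an equal split, it grows as the square root of the number of layers.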
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
            M1          M2          M3          M4
            ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW      1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT      0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH      0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW      γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT      0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH      0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW      γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT      0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH      0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW      0  0  0     0  0  0     0  0  0     1  0  0
    ΔT      0  0  0     0  0  0     0  0  0     0  1  0
    ΔH      0  0  0     0  0  0     0  0  0     0  0  1
• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs max wirelength ratio across all layers (%); cumulative probability 0.92 at 60%]
• Variations in different layers are not fully correlated
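The statistic behind the CDF is the largest share of a path's wirelength on any single layer, evaluated across paths. Below is a minimal Python sketch. It assumes per-path wirelength by layer is available, for example from a routed-design report. The three example paths are hypothetical.

```python
from typing import Dict, List

def max_layer_ratio(wirelength_by_layer: Dict[str, float]) -> float:
    """Largest fraction of a path's wirelength on any single layer."""
    total = sum(wirelength_by_layer.values())
    return max(wirelength_by_layer.values()) / total

def fraction_below(paths: List[Dict[str, float]], threshold: float = 0.6) -> float:
    """Empirical CDF of the max-layer ratio, evaluated at `threshold`."""
    ratios = [max_layer_ratio(p) for p in paths]
    return sum(r < threshold for r in ratios) / len(ratios)

paths = [
    {"M2": 30.0, "M3": 30.0, "M4": 40.0},  # max ratio 0.4
    {"M2": 80.0, "M3": 20.0},              # max ratio 0.8
    {"M3": 50.0, "M5": 50.0},              # max ratio 0.5
]
print(fraction_below(paths))
```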
Delay Variation
[Figure: α vs Δdelay at C-worst, [d(Ycw) − d(Ytyp)]/d(Ytyp), and at RC-worst, [d(Yrcw) − d(Ytyp)]/d(Ytyp); paths marked as dominated by C-worst or by RC-worst]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  • But these paths have larger delays at the other corner: e.g., the C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner, where α < 1.0 and their delay variations are covered
• Paths are more sensitive either to R or to C
• Using only RC-worst or only C-worst will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5
[Figure: Δdj(Yrcw)/dj(Ytyp) × 100 vs 3σj/d(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
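"Scale BEOL variation in itf with α" can be read as moving each corner parameter from its typical value toward the conventional-corner value by the factor α. The Python sketch below implements that reading. It assumes linear scaling of each geometric deviation and represents parameters as a plain dict. The layer values (nm) are hypothetical, and the real itf file syntax is not modeled.

```python
from typing import Dict

Params = Dict[str, Dict[str, float]]  # layer -> {"W": ..., "T": ..., "H": ...}

def tighten_corner(typ: Params, cbc: Params, alpha: float) -> Params:
    """Move each BEOL parameter from typical toward the CBC value by a factor alpha.

    alpha = 1 reproduces the conventional corner; alpha = 0 gives the typical corner.
    """
    return {layer: {p: typ[layer][p] + alpha * (cbc[layer][p] - typ[layer][p])
                    for p in typ[layer]}
            for layer in typ}

typ = {"M2": {"W": 50.0, "T": 100.0, "H": 80.0}}
cw = {"M2": {"W": 56.0, "T": 110.0, "H": 72.0}}   # C-worst: max W, max T, min H
tbc = tighten_corner(typ, cw, alpha=0.5)
print(tbc)
```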
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) by layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split into GTBC and GCBC; GTBC: timing analysis and ECO using TBC until violation = 0; GCBC: timing analysis and ECO using CBC until violation = 0 → done]
Experiment Setup
Design              LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   1.8       2.0       3.1
• Model BTI degradation with Vfinal throughout lifetime
• Aging with a flat Vfinal ≈ aging with an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs time under NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented yet)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure: Vfinal across gate types; annotated 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep consistent VBTI, Vlib, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
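The reference loop (age, re-time, raise Vdd, repeat) can be mimicked in Python with closed-form stand-ins for STA and library characterization. The sketch below makes several assumptions: an alpha-power-law gate delay, a hypothetical power-law BTI model whose shift depends only on the current Vdd (no stress history), zero slack at t = 0, and made-up constants. It shows the structure of the flow, not its accuracy.

```python
def gate_delay(vdd: float, vth: float, k: float = 1.0, a: float = 1.3) -> float:
    """Alpha-power-law delay model (illustrative constants)."""
    return k * vdd / (vdd - vth) ** a

def bti_shift(t_years: float, vdd: float, a0: float = 0.02, n: float = 0.2,
              m: float = 3.0, v_ref: float = 0.9) -> float:
    """Hypothetical power-law BTI model: |dVth| grows as t^n and with Vdd."""
    return a0 * (vdd / v_ref) ** m * t_years ** n

def avs_schedule(v_init: float = 0.90, v_max: float = 1.05, v_step: float = 0.01,
                 vth0: float = 0.35, t_final: float = 10.0, dt: float = 0.5):
    """At each time step, estimate aging, update timing, and raise Vdd in
    v_step increments until timing is met again. Returns [(t, Vdd), ...]."""
    target = gate_delay(v_init, vth0)  # zero slack at t = 0
    vdd, t, trace = v_init, 0.0, []
    while t < t_final:
        t += dt
        while gate_delay(vdd, vth0 + bti_shift(t, vdd)) > target:
            vdd = round(vdd + v_step, 3)
            if vdd > v_max:
                raise RuntimeError("Vfinal > Vmax: modify the circuit and start over")
        trace.append((t, vdd))
    return trace

trace = avs_schedule()
print("Vfinal =", trace[-1][1])
```

Each outer iteration here stands for one STA and library re-characterization round in the real flow. That is why the slide calls the reference flow very slow.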
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  • The circuit is affected by library characterization
[Diagram: loop among the circuit, Vfinal, Vlib and VBTI, and the derated libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs time with alternating ON/OFF stress and recovery phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in lifetime
• The gap between Vdd and Vfinal shrinks rapidly
[Figure: Vdd vs time converging to Vfinal: ≈70% of the Vdd increment occurs in the first year (the remaining 30% over 9 years)]
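A quick worked calculation shows why "front-loaded" follows from a sublinear time dependence. It assumes a power-law aging model, |ΔVth| ∝ t^n, which is a common BTI form but is not stated on the slide. Under that assumption, reaching 50% of 10-year aging within year 1 implies n = log(0.5)/log(0.1) ≈ 0.30.

```python
import math

def aging_fraction(t: float, t_life: float, n: float) -> float:
    """Fraction of end-of-life |dVth| reached at time t under a t^n power law."""
    return (t / t_life) ** n

def exponent_for(fraction: float, t: float, t_life: float) -> float:
    """Power-law exponent n that gives `fraction` of total aging by time t."""
    return math.log(fraction) / math.log(t / t_life)

n = exponent_for(0.5, t=1.0, t_life=10.0)  # exponent implied by "50% in year 1 of 10"
print(round(n, 3), round(aging_fraction(1.0, 10.0, n), 3))
```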
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure: 30% MTTF penalty with 200mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD (V, 0.90-1.04) vs year (0-12) for AVS schedules S1-S5, and MTTF (years) of DMA 3 for S1-S5; 12 years MTTF penalty]
What is “Signoff”?
• Foundation of the contract between design house and foundry
  • “Chip should work”: stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: static IR drop, power grid IR gradient, dynamic IR, HCI, and NBTI margins stacked between nominal Vdd and signoff Vdd, relative to operating voltage]
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
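Given the 28 STA results for one path, the sensitivities follow directly from the formula above. Below is a minimal Python sketch. It assumes run "zv" uses an itf in which only source v is moved to its 3σ value and all other sources stay typical. The run names and delays (ps) are hypothetical.

```python
from typing import Dict

def sensitivities(delays: Dict[str, float]) -> Dict[str, float]:
    """delays: STA path delay for the typical run ("typ") and for runs "z1".."z27",
    where run zv moves only source v to its 3-sigma value.

    Returns delta_d_jv = [d_j(Y_v) - d_j(Y_typ)] / 3, i.e., delay change per sigma.
    """
    d_typ = delays["typ"]
    return {v: (d - d_typ) / 3.0 for v, d in delays.items() if v != "typ"}

sens = sensitivities({"typ": 1000.0, "z1": 1006.0, "z2": 997.0})
```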
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^T$ (λ is obtained by Cholesky decomposition)
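With the linear model, σj follows from the per-source sensitivities and Σ. Since Σ = λλ^T, the quantity ||λ^T Δdj|| equals sqrt(Δdj^T Σ Δdj). The same λ can also map independent normal samples to correlated ones for a Monte Carlo check. The Python sketch below shows both. It assumes NumPy, and the 3-source example values are hypothetical.

```python
import numpy as np

def path_sigma(sens: np.ndarray, corr: np.ndarray) -> float:
    """sigma_j for d_j = d_typ + sum_v sens_v * z_v with z ~ N(0, corr)."""
    lam = np.linalg.cholesky(corr)  # corr = lam @ lam.T, lam lower-triangular
    return float(np.linalg.norm(lam.T @ sens))

def path_sigma_mc(sens: np.ndarray, corr: np.ndarray, n: int = 200_000,
                  seed: int = 0) -> float:
    """Monte Carlo estimate: correlate i.i.d. normals with lam, then measure the spread."""
    rng = np.random.default_rng(seed)
    lam = np.linalg.cholesky(corr)
    z = rng.standard_normal((n, len(sens))) @ lam.T  # each row is one correlated sample
    return float((z @ sens).std())

s = np.array([2.0, 1.0, -1.0])       # hypothetical sensitivities (ps per sigma)
C = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(path_sigma(s, C), path_sigma_mc(s, C))
```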
Throughput model: throughput as a function of the clock period T, the error rate ER [Kahng10], and the number of recovery cycles r per error
Selective-Endpoint Optimization
• Optimize the fanin cone w/ tighter constraints: allows replacement of a Razor FF w/ a normal FF
• Trade off cost of resilience vs. data path optimization
Δdelay (vs typ) at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths often have smaller α values at the other corner ()
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 here delay variations covered by RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay (vs typ) at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
12ISVLSI-2014 invited talk 140710
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths often have smaller α values at the other corner ()
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variations
In the following α is defined at the dominant corner
Intuition on Delay Variability Across Cw RCw
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variationbull Paths with small Δdrcw and Δdcw have large α
bull Eg here we see αj gt 06 when ((Δdrcw lt 3) AND (Δdcw lt 3))
bull Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
Δd(Ycw)d(Ytyp)
Δd(Yrcw)d(Ytyp) α
14ISVLSI-2014 invited talk 140710
bull Paths with small Δdrcw and Δdcw have large α
bull Eg there are αj gt 06 when ((Δdrcw lt 3) AND (Δdcw lt 3))
bull Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
Find Paths for Which TBCs Can Be Used
Δd(Ycw)d(Ytyp)
Δd(Yrcw)d(Ytyp)
Acw
Arcw
Gtbc = Set of paths that can be safely signed off using TBC ( (Path with Δdcw larger than Acw) OR (Path with Δdrcw larger than Arcw) )
α
15ISVLSI-2014 invited talk 140710
Determining α Arcw and Acw
Δd at C-worst corner ()Δd at RC-worst corner ()
bull Assumption critical paths in different designs have similar trends
bull Extract Arcw and Acw from a set of representative paths
bull Plot α vs Δdelay find Arcw and Acw for a given α
bull Add +1 margin on Arcw and Acw to account for sampling error
bull Smaller α larger thresholds (Arcw and Acw) fewer paths in GTBC
Δd at C-worst corner ()
Arcw Acw
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
bull WNS and TNS are reduced by up to 100ps and 53nsbull Timing violations reduced by
24 to 100
bull TBC-06 more benefits bull Tradeoff between reduced margin
vs paths which use TBC
Correlation factor γ = 05
LEON SUPERBLUE12 NETCARD
-018-016-014-012
-01-008-006-004-002
0
CBC TBC-05 TBC-06 TBC-07
WN
S (n
s)
LEON SUPERBLUE12 NETCARD
-90-80-70-60-50-40-30-20-10
0
CBC TBC-05 TBC-06 TBC-07
TNS
(ns)
LEON SUPERBLUE12 NETCARD0
200400600800
1000120014001600
CBC TBC-05 TBC-06 TBC-07
Tim
ing
viol
ation
s
17ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience bull Additional circuits area and power penaltiesbull Recovery from errors throughput degradationbull Large hold margin short-path padding costbull Want benefits (eg energy) to maximally outweigh costs
Razor Razor-Lite TIMBER
Razor Razor-Lite TIMBER
Power penalty 30 [Das08] ~0 [Kim13] 100 [Choudhury09]
Overall Optimization Flowbull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity aware clock skew optimization
SkewOpt
OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resiliencebull Reference flows
bull Pure-margin (PM) conventional method w only margin insertionbull Brute-force (BF) use error-tolerant FFs for timing-critical endpoints
bull Proposed method (CO) achieves up to 21 energy reduction compared to reference methods
bull Resilience benefits increase with larger process variation
PM BF CO PM BF CO PM BF CO27
29
31
33
35
37
En
erg
y (
mJ
)
PM BF CO PM BF CO PM BF CO22
26
30
34
38Energy penalty of throughput degradation
Energy penalty of additional circuits
Energy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin 1σ2σ3σ for SS corner Technology foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVSbull Adaptive voltage scaling allows a lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 17 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
086 09 094 098 10225
30
35
40
45
50pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
070 072 074 076 078 08024
26
28
30
32
34
36 pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
Technology foundry 28nm
25ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops Less Marginbull Example Interaction between reliability margin and AVS designs
bull Bias temperature instability (BTI) aging higher |ΔVth| lower fmax
bull AVS can be used to compensate for performance degradation
Circuit
Closed-loop AVS
On-chip aging
monitor
Circuit performanc
e
Voltage regulato
r
Circuit frequency
Vdd
time
time
Without AVSWith AVS
target
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: three-step flow (Step 1, Step 2, Step 3) linking VBTI and |ΔVt|, Vlib and the derated library, circuit implementation and signoff, and BTI degradation with AVS, which determines Vfinal]

Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: three-step flow (Step 1, Step 2, Step 3) linking VBTI and |ΔVt|, Vlib and the derated library, circuit implementation and signoff, and BTI degradation with AVS (Vfinal); annotations: "No obvious guideline to define VBTI and Vlib" and "Inconsistency among Vfinal, Vlib, VBTI"]
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

Power vs. Area Across Different Signoffs
[Figure: power vs. area across different signoffs; a “knee” point gives a balanced area and power tradeoff]
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if the desired supply voltage > Vmax

Also: Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time for the NOM, OD and OD/NOM signoff modes, with intervals tnom and tOD]
[Figure: power of circuits with different overdrive modes: different overdrive modes span a 26% power range; with fOD fixed, still a 14% power range (fnom = 800MHz, Vnom = 0.8V)]

Also: Tunable Monitors → Less Margin
Aggressive config: Vmin_est < Vmin_chip → some chips will fail
Default config:
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config:
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability:
• Compensate for differences between model and silicon
• Recover margin when variation is reduced due to an improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, across lifetime
• Benefit of the “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law

Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW, 1200–1700) and PG power (mW, 0.30–0.36) for implementations 1–9, spanning the highest to the least invested guardband; annotation: 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• There is a tradeoff in how much guardband to invest in signoff

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; per-layer C distributions of M1 and M2 (−3σ to 3σ); the homogeneous Cw corner puts both layers at 3σ, which is pessimistic relative to the 3σ of the stack]
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

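To see why the pessimism grows with the number of layers, here is a small Python sketch under simplifying assumptions that are not on the slide: a path's wire delay is split evenly over n layers, each layer has the same per-layer σ, and every pair of layers has the same correlation ρ. A homogeneous corner pushes all layers to +3σ at once (a shift of n·3σ), whereas the statistical 3σ of the sum is 3σ·sqrt(n + n(n−1)ρ). The function name is hypothetical.

```python
import math


def homogeneous_pessimism(num_layers: int, rho: float) -> float:
    """Ratio of the homogeneous-corner delay shift to the statistical 3-sigma shift.

    Simplified model (an assumption for illustration): the path's wire delay is
    split evenly over num_layers layers, every layer has the same sigma, and
    every pair of layers has correlation rho.
    """
    n = num_layers
    corner_shift = float(n)  # all n layers at +3 sigma, in units of one layer's 3 sigma
    # Standard deviation of a sum of n unit-variance terms with pairwise correlation rho.
    statistical_shift = math.sqrt(n + n * (n - 1) * rho)
    return corner_shift / statistical_shift


for n in range(1, 5):
    print(n, round(homogeneous_pessimism(n, rho=0.5), 3))
```

With ρ = 0.5 the printed ratios are 1.0, 1.155, 1.225 and 1.265 for one to four layers; with ρ = 1 the ratio stays at 1, matching the slide's point that the pessimism comes from imperfect inter-layer correlation.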
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources

          M1          M2          M3          M4
        ΔW  ΔT  ΔH  ΔW  ΔT  ΔH  ΔW  ΔT  ΔH  ΔW  ΔT  ΔH
M1  ΔW   1   0   0   γ   0   0   γ   0   0   0   0   0
    ΔT   0   1   0   0   γ   0   0   γ   0   0   0   0
    ΔH   0   0   1   0   0   γ   0   0   γ   0   0   0
M2  ΔW   γ   0   0   1   0   0   γ   0   0   0   0   0
    ΔT   0   γ   0   0   1   0   0   γ   0   0   0   0
    ΔH   0   0   γ   0   0   1   0   0   γ   0   0   0
M3  ΔW   γ   0   0   γ   0   0   1   0   0   0   0   0
    ΔT   0   γ   0   0   γ   0   0   1   0   0   0   0
    ΔH   0   0   γ   0   0   γ   0   0   1   0   0   0
M4  ΔW   0   0   0   0   0   0   0   0   0   1   0   0
    ΔT   0   0   0   0   0   0   0   0   0   0   1   0
    ΔH   0   0   0   0   0   0   0   0   0   0   0   1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent

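The correlation rule on this slide can be sketched in Python. This is a minimal illustration, not the authors' code: the function name, the (layer, module) input format and the module labels "A" and "B" are hypothetical, chosen so that M1–M3 share a process module and M4 does not, which reproduces the 12 × 12 matrix above.

```python
# Variation types per layer, following the slide's notation (ΔW, ΔT, ΔH).
VARIATION_TYPES = ("dW", "dT", "dH")


def build_correlation_matrix(layer_modules, gamma=0.5):
    """Return (sources, sigma) for the per-layer BEOL variation sources.

    layer_modules: list of (layer_name, process_module) pairs in stack order.
    Two distinct sources are correlated with coefficient gamma if and only if
    they have the same variation type and their layers share a process module;
    otherwise they are independent (0). Diagonal entries are 1.
    """
    sources = [(layer, module, vtype)
               for layer, module in layer_modules
               for vtype in VARIATION_TYPES]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (_, mod_i, type_i) in enumerate(sources):
        for j, (_, mod_j, type_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif type_i == type_j and mod_i == mod_j:
                sigma[i][j] = gamma
    return sources, sigma


# Hypothetical module labels: M1-M3 share module "A", M4 sits alone in module "B".
stack = [("M1", "A"), ("M2", "A"), ("M3", "A"), ("M4", "B")]
sources, sigma = build_correlation_matrix(stack)
```

For the 9-layer stack in this talk, the same call with nine (layer, module) pairs yields the 27 × 27 Σ used in statistical timing analysis.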
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%]
• Variations in different layers are not fully correlated
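The statistic on this slide (the share of paths whose largest single-layer wirelength fraction is below 60%) can be computed as below. This is a sketch assuming each path is given as a hypothetical mapping from layer name to routed wirelength; the 0.6 limit comes from the slide, while the function names and toy data are made up.

```python
def max_layer_ratio(wirelength_by_layer):
    """Largest fraction of a path's total wirelength that sits on a single layer."""
    total = sum(wirelength_by_layer.values())
    if total <= 0:
        raise ValueError("path has no wirelength")
    return max(wirelength_by_layer.values()) / total


def fraction_spread_over_layers(paths, limit=0.6):
    """Share of paths whose max single-layer wirelength ratio is below `limit`."""
    return sum(1 for p in paths if max_layer_ratio(p) < limit) / len(paths)


toy_paths = [{"M2": 30, "M3": 40, "M4": 30},
             {"M2": 80, "M3": 20},
             {"M5": 50, "M6": 50},
             {"M3": 100}]
share = fraction_spread_over_layers(toy_paths)
```

Here two of the four toy paths (ratios 0.4 and 0.5) fall below the limit, so `share` is 0.5.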
Delay Variation
[Figure: paths shaded by α, plotted against Δdelay at C-worst, $[d(Y_{cw}) - d(Y_{typ})]/d(Y_{typ})$, and Δdelay at RC-worst, $[d(Y_{rcw}) - d(Y_{typ})]/d(Y_{typ})$. Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst; paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst. Annotations: "C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner"; "α < 1.0: delay variations are covered by the RC-worst corner"]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  • But these paths have larger delays at the other corner
• Each path is more sensitive either to R or to C
  • Using only the RC-worst or only the C-worst corner will underestimate delay variations
  • Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners (TBC), e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: per-path $3\sigma_j/d_j(Y_{typ}) \times 100$ plotted against $\Delta d_j(Y_{rcw})/d_j(Y_{typ}) \times 100$]
• Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield

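Reading the two plot axes on this slide, α_j appears to compare a path's statistical spread 3σ_j with its delay shift at the conventional corner, both normalized by d_j(Y_typ), so the normalization cancels: α_j = 3σ_j / [d_j(Y_dom) − d_j(Y_typ)], taken at the dominant corner. The sketch below assumes that interpretation; the function name and example numbers are hypothetical. Under a linear-delay assumption, a tightened corner that scales the itf variation by α_TBC still covers path j when α_j ≤ α_TBC.

```python
def scaling_factor_alpha(d_typ, d_cw, d_rcw, sigma_j):
    """alpha_j = 3*sigma_j / (delay shift at the dominant BEOL corner).

    Assumed reading of the slide: the dominant corner is whichever of C-worst
    and RC-worst shifts the path delay more relative to the typical corner.
    """
    shift_cw = d_cw - d_typ
    shift_rcw = d_rcw - d_typ
    dominant_shift = max(shift_cw, shift_rcw)
    if dominant_shift <= 0:
        raise ValueError("expected the worst corners to increase path delay")
    return 3.0 * sigma_j / dominant_shift


# RC-worst dominates (+10 vs. +6); 3*sigma = 4.5, so alpha = 0.45 < 0.5.
alpha = scaling_factor_alpha(d_typ=100.0, d_cw=106.0, d_rcw=110.0, sigma_j=1.5)
```

A path like this one, whose α is below 0.5, is the kind the slide says a CBC signs off pessimistically.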
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard-cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer for the critical paths]

Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of the critical paths
• Put path j in the group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC; GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0; done]

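A minimal sketch of the classification step, assuming per-path delays at Ytyp, Ycw and Yrcw are already available from STA. It uses the grouping rule from the "Find Paths for Which TBCs Can Be Used" slide (a path goes to GTBC if Δd_cw > A_cw or Δd_rcw > A_rcw, with Δd in percent of the typical delay); the class and function names and the threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathDelays:
    """Hypothetical container for one path's STA delays at three BEOL corners."""
    name: str
    d_typ: float  # delay at the typical corner Ytyp
    d_cw: float   # delay at the C-worst corner Ycw
    d_rcw: float  # delay at the RC-worst corner Yrcw


def delta_delay_pct(d_corner, d_typ):
    """Δdelay in percent: [d(Y) - d(Ytyp)] / d(Ytyp) * 100."""
    return (d_corner - d_typ) / d_typ * 100.0


def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (G_TBC, G_CBC) using the OR rule on both corners."""
    g_tbc, g_cbc = [], []
    for p in paths:
        d_cw = delta_delay_pct(p.d_cw, p.d_typ)
        d_rcw = delta_delay_pct(p.d_rcw, p.d_typ)
        if d_cw > a_cw or d_rcw > a_rcw:
            g_tbc.append(p.name)  # large corner shift: sign off with TBC
        else:
            g_cbc.append(p.name)  # small shift (large alpha): keep conventional corners
    return g_tbc, g_cbc


paths = [PathDelays("p1", 100.0, 101.0, 102.0),
         PathDelays("p2", 100.0, 106.0, 104.0),
         PathDelays("p3", 100.0, 102.0, 108.0)]
g_tbc, g_cbc = classify_paths(paths, a_cw=4.0, a_rcw=4.0)
```

With 4% thresholds, p2 (6% at C-worst) and p3 (8% at RC-worst) land in GTBC, while p1 stays in GCBC and keeps conventional corners.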
Experiment Setup
Clock period (ns): LEON3MP 1.8, NETCARD 2.0, SUPERBLUE12 3.1

• Model BTI degradation with Vfinal throughout the lifetime
  • Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time for NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented yet)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient

Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]

Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit frequency (GHz): C5315 1.38, c7552 1.25, AES 0.89, MPEG2 1.05
Supply voltages: Vmax = 1.05V; Vinit = 0.90V; Vheur1 (DC) = 0.97V; Vheur1 (AC) = 0.95V; Vheur2 (DC) = 0.95V; Vheur2 (AC) = 0.93V

A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
[Figure legend: Vstep = AVS voltage step; Vfinal = converged voltage]

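The reference flow's loop structure can be sketched as follows. Everything quantitative here is a hypothetical stand-in, labeled as such in the code: an alpha-power-law delay model, a power-law BTI increment scaled by the current Vdd, and all parameter values. Only the control flow (age, re-time, raise Vdd by Vstep, repeat until tfinal, fail if Vdd exceeds Vmax) follows the slide.

```python
def gate_delay(vdd, vth, k=1.0, a=1.3):
    """Hypothetical alpha-power-law delay model: k * Vdd / (Vdd - Vth)**a."""
    return k * vdd / (vdd - vth) ** a


def reference_avs_flow(v_init, v_max, v_step, vth0, delay_target,
                       t_final=10.0, dt=0.5, aging_coeff=0.03, n=0.2):
    """Iterate aging -> timing -> Vdd update until t_final (loop structure per the slide).

    Returns (v_final, history); v_final is None when the required Vdd exceeds
    v_max, i.e., the circuit must be modified and the flow restarted.
    """
    vdd, dvth, t = v_init, 0.0, 0.0
    history = [(t, vdd)]
    while t < t_final:
        t_next = t + dt
        # Hypothetical BTI model: power law in time, scaled by the current Vdd.
        dvth += aging_coeff * vdd * (t_next ** n - t ** n)
        t = t_next
        # AVS: raise Vdd in Vstep increments until timing is met again.
        while gate_delay(vdd, vth0 + dvth) > delay_target:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None, history
        history.append((t, vdd))
    return vdd, history


# Zero slack at t = 0: the target equals the fresh-circuit delay at Vinit.
target = gate_delay(0.90, 0.30)
v_final, history = reference_avs_flow(0.90, 1.05, 0.01, 0.30, target)
```

Each pass through the inner `while` loop stands in for an STA run against a re-derated library, which is why the slide calls the real flow very slow.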
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignores AVS (Vlib = Vinit, VBTI = Vinit)
  2. Most pessimistic derated library (Vlib = Vinit, VBTI = Vmax)
  3. VBTI = Vlib = Vmax: extreme corner for AVS (Vlib = Vmax, VBTI = Vmax)
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS (Vlib = Vinit, VBTI = Vfinal)
  5. No derated library, reference (Vlib, VBTI: N/A)
  6. Proposed method with α = 0 (Vlib = VBTI = Vheur1)
  7. Proposed method with α = 0.03 (Vlib = VBTI = Vheur2)

“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  • Circuit is affected by library characterization
[Figure: loop linking Circuit, Vfinal, Vlib/VBTI and Derated Libraries]

Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS’14]]

Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
  • ≈70% of the Vdd increment occurs in the 1st year (the remaining 30% over 9 years)

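A quick check of the "front-loaded" claim, assuming (as is common in BTI modeling, though not stated on this slide) that |ΔVth| grows as a power law t^n. The exponent that places 50% of 10-year aging in the first year is n = ln 0.5 / ln 0.1 ≈ 0.30; with n = 1/6, the first year would already account for about 68%. The helper names are hypothetical.

```python
import math


def aging_fraction(t, t_life=10.0, n=0.3):
    """Share of end-of-life |dVth| reached at time t, assuming |dVth| ~ t**n."""
    return (t / t_life) ** n


def exponent_for_fraction(fraction, t, t_life=10.0):
    """Power-law exponent n for which `fraction` of lifetime aging occurs by time t."""
    return math.log(fraction) / math.log(t / t_life)


n_half = exponent_for_fraction(0.5, t=1.0)  # ~0.30
share_n6 = aging_fraction(1.0, n=1 / 6)     # ~0.68
```

Any exponent in this range makes aging, and hence the AVS voltage increments, strongly concentrated in the first year.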
Results for DC Scenario
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]

EM Impact on AVS Scheduling
[Figure: VDD (0.90–1.04 V) vs. year (0–12) for AVS schedules S1–S5, and MTTF (year) of DMA under S1–S5; annotation: 12 years MTTF penalty]

What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd; operating voltage]
Problem: margins = pessimism, overdesign, schedule delay

Statistical Timing Analysis (1)
• Delay sensitivity of path $p_j$ to variation source $z_v$:
  $\Delta d_{j,v} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions:
  • $\Delta d_{j,v}$ is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain $\Delta d_{j,v}$ using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → $\Delta d_{j,v}$]
Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^T$ (λ is obtained by Cholesky decomposition)
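The two statistical-timing slides can be combined into a short sketch: per-source sensitivities Δd_{j,v} = [d_j(Y_v) − d_j(Y_typ)]/3 (reading Y_v as the itf with z_v skewed to +3σ, which is an assumption), then Σ = λλ^T and σ_j = ||λ^T Δd_j||, which equals sqrt(Δd_j^T Σ Δd_j). A hand-written textbook Cholesky keeps the example dependency-free; the function names are hypothetical.

```python
import math


def cholesky(matrix):
    """Textbook Cholesky factorization: lower-triangular L with L L^T = matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sensitivities(corner_delays, d_typ):
    """Delta d_{j,v} = [d_j(Y_v) - d_j(Y_typ)] / 3 for every variation source v."""
    return [(d_v - d_typ) / 3.0 for d_v in corner_delays]


def path_sigma(delta, corr):
    """sigma_j = ||lambda^T delta||, where corr = lambda lambda^T (Cholesky)."""
    lam = cholesky(corr)
    n = len(delta)
    # Component k of lambda^T delta is the sum over sources v of lambda[v][k] * delta[v].
    projected = [sum(lam[v][k] * delta[v] for v in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in projected))


# Two correlated sources (gamma = 0.5): sigma_j^2 = 1 + 4 + 2*0.5*1*2 = 7.
delta = sensitivities([103.0, 106.0], d_typ=100.0)     # [1.0, 2.0]
sigma_j = path_sigma(delta, [[1.0, 0.5], [0.5, 1.0]])  # sqrt(7) ~ 2.646
```

For the 9-layer stack, `corner_delays` would hold the 27 per-source STA results and `corr` the 27 × 27 Σ; 3σ_j is then the statistical spread that the α comparison uses.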
[Equation: throughput as a function of error rate ER [Kahng10], clock period T and recovery cycles r]

Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off cost of resilience vs. data path optimization
Intuition on Delay Variability Across Cw, RCw
[Figure: paths shaded by α, plotted against Δdelay at C-worst, $[d(Y_{cw}) - d(Y_{typ})]/d(Y_{typ})$, and Δdelay at RC-worst, $[d(Y_{rcw}) - d(Y_{typ})]/d(Y_{typ})$. Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst; paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst. Annotations: "C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner"; "α < 1.0: delay variations are covered by the RC-worst corner"]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  • But these paths often have smaller α values at the other corner
• Each path is more sensitive either to R or to C
  • Using only the RC-worst or only the C-worst corner will underestimate delay variations
  • Both RC-worst and C-worst corners are needed to cover process variations
In the following, α is defined at the dominant corner

Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  • E.g., there are paths with αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
GTBC = set of paths that can be safely signed off using TBC = (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)

Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay; find Arcw and Acw for a given α
  • Add a +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and Δd at RC-worst corner (%), with thresholds Arcw and Acw]

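One conservative way to turn "find Arcw and Acw for a given α" into code, offered as a sketch rather than the authors' procedure: because GTBC uses an OR rule, every representative path whose α exceeds the target must fall at or below both thresholds, so each threshold is set to the largest shift among such paths, plus the slide's +1% margin. The function name and sample data are hypothetical.

```python
def derive_thresholds(samples, alpha_target, margin_pct=1.0):
    """Conservative (A_cw, A_rcw), in percent, from representative paths.

    samples: iterable of (delta_cw_pct, delta_rcw_pct, alpha) per path.
    Every path with alpha > alpha_target must stay out of G_TBC. Because G_TBC
    admits a path if delta_cw > A_cw OR delta_rcw > A_rcw, each threshold must
    be at least the largest such path's shift at that corner.
    """
    risky = [(d_cw, d_rcw) for d_cw, d_rcw, alpha in samples if alpha > alpha_target]
    a_cw = max((d_cw for d_cw, _ in risky), default=0.0) + margin_pct
    a_rcw = max((d_rcw for _, d_rcw in risky), default=0.0) + margin_pct
    return a_cw, a_rcw


samples = [(1.0, 2.0, 0.8), (2.5, 1.5, 0.7), (5.0, 6.0, 0.3), (4.0, 7.0, 0.45)]
print(derive_thresholds(samples, alpha_target=0.5))   # (3.5, 3.0)
print(derive_thresholds(samples, alpha_target=0.75))  # (2.0, 3.0)
```

The two calls illustrate the slide's trend: the smaller target α excludes more paths, raising Acw from 2.0% to 3.5% and leaving fewer paths in GTBC.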
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations are reduced by 24% to 100%
• TBC-0.6 gives more benefits
  • Tradeoff between reduced margin and the number of paths that use TBC
[Figure: WNS (ns, −0.18 to 0), TNS (ns, −90 to 0) and number of timing violations (0 to 1600) for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7; correlation factor γ = 0.5]

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
Power penalty: Razor 30% [Das08]; Razor-Lite ~0% [Kim13]; TIMBER 100% [Choudhury09]

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on a sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; energy < min energy → save current solution; OR-tree insertion]

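The overall flow can be sketched as a greedy loop. The flowchart does not spell out the iteration order or the stopping rule, so both are assumptions here: each iteration applies SEOpt and then SkewOpt, keeps the result if its energy beats the best so far, stops at the first non-improving iteration, and finally inserts the OR-tree. The step functions are hypothetical callables supplied by the caller.

```python
def comb_opt(initial_design, energy, se_opt, skew_opt, or_tree_insert, max_iters=20):
    """Greedy SEOpt + SkewOpt loop (iteration order and stopping rule assumed).

    initial_design: design with all FFs error-tolerant.
    energy, se_opt, skew_opt, or_tree_insert: hypothetical caller-supplied steps.
    """
    best = initial_design
    best_energy = energy(best)
    for _ in range(max_iters):
        # SEOpt (margin insertion on K paths, swap error-tolerant FFs for normal FFs),
        # then SkewOpt (activity-aware clock skew optimization).
        candidate = skew_opt(se_opt(best))
        candidate_energy = energy(candidate)
        if candidate_energy < best_energy:
            best, best_energy = candidate, candidate_energy  # save current solution
        else:
            break  # assumed: stop at the first non-improving iteration
    return or_tree_insert(best), best_energy


# Toy usage: a "design" is just its number of error-tolerant FFs.
result, e = comb_opt(8, lambda d: (d - 3) ** 2 + 10, lambda d: d - 1,
                     lambda d: d, lambda d: ("or_tree", d))
```

In the toy run, energy falls as error-tolerant FFs are replaced until three remain, after which the next step costs energy and the loop stops.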
Benefit of Low-Cost Resilience
• Reference flows
bull Pure-margin (PM) conventional method w only margin insertionbull Brute-force (BF) use error-tolerant FFs for timing-critical endpoints
bull Proposed method (CO) achieves up to 21 energy reduction compared to reference methods
bull Resilience benefits increase with larger process variation
PM BF CO PM BF CO PM BF CO27
29
31
33
35
37
En
erg
y (
mJ
)
PM BF CO PM BF CO PM BF CO22
26
30
34
38Energy penalty of throughput degradation
Energy penalty of additional circuits
Energy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin 1σ2σ3σ for SS corner Technology foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVSbull Adaptive voltage scaling allows a lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 17 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
086 09 094 098 10225
30
35
40
45
50pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
070 072 074 076 078 08024
26
28
30
32
34
36 pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
Technology foundry 28nm
25ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops Less Marginbull Example Interaction between reliability margin and AVS designs
bull Bias temperature instability (BTI) aging higher |ΔVth| lower fmax
bull AVS can be used to compensate for performance degradation
Circuit
Closed-loop AVS
On-chip aging
monitor
Circuit performanc
e
Voltage regulato
r
Circuit frequency
Vdd
time
time
Without AVSWith AVS
target
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib selection should consider BTI + AVS interaction
bull Aging and Vfinal are unknowns before circuit implementation
BTI degradation and AVS
Vfinal
VBTI |Vt|
Step 1
Vlib
Derated library
Step 2
Circuit implementation and
signoff
circuit
Step 3
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib depend on aging during AVS
bull Aging and Vfinal are unknowns before circuit implementation
Vlib
VBTI Derated library
|Vt| Circuit implementation and
signoff
circuitBTI degradation and AVS
Vfinal
Step 1 Step 2 Step 3
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal Vlib VBTI
bull What is the design overhead when timing libraries are not properly characterized
bull Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads Joint work with Wei-Ting Jonas Chan Tuck-Boon Chan Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Large lifetime energy overhead
bull May fail to meet timing if desired supply voltage gt Vmax
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop linking Circuit, Vfinal, Vlib/VBTI and Derated Libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
[Figure: |Vgs| vs. time with alternating ON/OFF stress phases [VattikondaWC06]]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in the 1st year (remaining 30% over 9 years)]
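One way to see why BTI aging is front-loaded is a power-law time dependence |ΔVth| = A·t^n with n < 1. The sketch below assumes that model; the exponent n and the helper name `aged_fraction` are assumptions, not values from the talk (reaction-diffusion models of NBTI give n ≈ 1/6).

```python
def aged_fraction(t_years: float, lifetime_years: float = 10.0, n: float = 1 / 6) -> float:
    """Fraction of end-of-life |dVth| reached at t, assuming |dVth| = A * t**n."""
    if not 0.0 <= t_years <= lifetime_years:
        raise ValueError("t_years must lie in [0, lifetime_years]")
    # The prefactor A cancels in the ratio t**n / lifetime**n.
    return (t_years / lifetime_years) ** n


for n in (1 / 6, 0.3):
    print(f"n = {n:.3f}: {aged_fraction(1.0, n=n):.0%} of 10-year aging after year 1")
```

With n = 1/6 the sketch gives about 68% of the 10-year shift after the first year; an exponent of about 0.3 reproduces the 50% figure quoted above.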
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
  • Consumes more power
  • May fail to meet timing if desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
30% MTTF penalty
200mV voltage compensation
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year (0 to 12) under AVS schedules S1 to S5; MTTF (years) of design DMA for schedules S1 to S5]
12 years MTTF penalty
What Is “Signoff”?
• Foundation of contract between design house and foundry
  • “Chip should work”: stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
• Problem: margins = pessimism → overdesign, schedule delay
[Figure: “margin stack” for voltage signoff (nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI), relating operating voltage and signoff Vdd]
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv: Δd_jv
• Assumptions
  • Δd_jv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δd_jv]
$\Delta d_{j,v} = [d_j(Y_v) - d_j(Y_{typ})] / 3$, where Y_v is the itf file for variation source z_v
Note: path delay includes gate and wire delays
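The sensitivity formula above is a per-path, per-source finite difference over the 28 STA runs. The sketch assumes the STA path delays have already been collected into arrays (the array layout and the name `delay_sensitivities` are hypothetical) and, as the division by 3 suggests, that each Y_v skews its source z_v by 3σ, so Δd_jv is a per-σ sensitivity.

```python
import numpy as np


def delay_sensitivities(d_typ: np.ndarray, d_skewed: np.ndarray) -> np.ndarray:
    """Per-source delay sensitivity Δd[j, v] = (d_j(Y_v) - d_j(Y_typ)) / 3.

    d_typ:    shape (P,)   path delays from the Ytyp run
    d_skewed: shape (P, V) path delays from the V single-source itf runs
              (V = 27 for a 9-layer stack), one column per variation source z_v
    """
    d_typ = np.asarray(d_typ, dtype=float)
    d_skewed = np.asarray(d_skewed, dtype=float)
    if d_skewed.shape[0] != d_typ.shape[0]:
        raise ValueError("d_skewed must have one row per path")
    # Divide by 3, assuming each Y_v is a 3-sigma skew, to get a per-sigma sensitivity.
    return (d_skewed - d_typ[:, None]) / 3.0
```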
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (λ is obtained by Cholesky decomposition)
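Under the linear, normal assumptions of the previous slide, the Cholesky factor turns correlated sources into independent ones: z = λx with x ~ N(0, I), so the delay standard deviation of path j is ‖λ^T Δd_j‖. The slide does not show this last step, so treat it as our reading; `path_sigma` is a hypothetical helper.

```python
import numpy as np


def path_sigma(dd: np.ndarray, corr: np.ndarray) -> np.ndarray:
    """Std. dev. of each path delay under correlated, unit-variance normal sources.

    dd:   shape (P, V) per-sigma sensitivities Δd[j, v]
    corr: shape (V, V) correlation matrix Σ (must be positive definite)
    var(d_j) = Δd_j^T Σ Δd_j = ||λ^T Δd_j||^2 with Σ = λ λ^T.
    """
    lam = np.linalg.cholesky(corr)           # lower-triangular λ with Σ = λ λ^T
    return np.linalg.norm(dd @ lam, axis=1)  # row j of dd @ λ is (λ^T Δd_j)^T
```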
$\text{Throughput} = \frac{(1 - ER)\,T}{(1 - ER)\,T + ER \cdot r \times T}$
• ER: error rate [Kahng10]
• r: recovery cycles
• T: clock period
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
Intuition on Delay Variability Across Cw, RCw
[Figure: α of each path vs. Δdelay at C-worst = [d(Ycw) − d(Ytyp)]/d(Ytyp) and Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)]/d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths often have smaller α values at the other corner
• Plot annotations:
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
  • Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
  • Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
  • Using RC-worst or C-worst only will underestimate delay variations
  • Need both RC- and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
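Reading α as the ratio of a path's statistical 3σ delay variation to its delay shift at the dominant corner (our interpretation, consistent with "α > 1.0 means a CBC underestimates delay variation"), the dominant-corner rule can be sketched as follows; the function name is hypothetical.

```python
from typing import Tuple


def alpha_at_dominant_corner(sigma_j: float, d_typ: float, d_cw: float,
                             d_rcw: float) -> Tuple[str, float]:
    """Return (dominant corner, alpha_j), with alpha_j = 3 * sigma_j / Δd at that corner."""
    dd_cw = d_cw - d_typ
    dd_rcw = d_rcw - d_typ
    # The dominant corner is the one with the larger delay shift versus typical.
    corner, dd = ("C-worst", dd_cw) if dd_cw >= dd_rcw else ("RC-worst", dd_rcw)
    if dd <= 0:
        raise ValueError("dominant-corner delay must exceed the typical delay")
    return corner, 3.0 * sigma_j / dd
```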
Scaling Factor α and Delay Variation
• Paths with small Δd_rcw and Δd_cw have large α
  • E.g., here we see α_j > 0.6 when (Δd_rcw < 3%) AND (Δd_cw < 3%)
• Identify paths for tightened BEOL corners based on Δd_rcw and Δd_cw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
Find Paths for Which TBCs Can Be Used
• Paths with small Δd_rcw and Δd_cw have large α
  • E.g., there are paths with α_j > 0.6 when (Δd_rcw < 3%) AND (Δd_cw < 3%)
• Identify paths for tightened BEOL corners based on Δd_rcw and Δd_cw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
GTBC = set of paths that can be safely signed off using TBC = (paths with Δd_cw > Acw) OR (paths with Δd_rcw > Arcw)
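The GTBC rule above is a disjunction of two threshold tests. A minimal sketch, assuming Δd_cw and Δd_rcw are already available per path in percent of d(Ytyp); the dictionary layout and the name `split_paths` are hypothetical.

```python
from typing import Dict, List, Tuple


def split_paths(paths: Dict[str, Tuple[float, float]], a_cw: float,
                a_rcw: float) -> Tuple[List[str], List[str]]:
    """Partition paths into (G_TBC, G_CBC).

    paths maps a path name to (Δd_cw, Δd_rcw), both in percent of d(Ytyp);
    a_cw and a_rcw are the thresholds Acw and Arcw, also in percent.
    """
    g_tbc: List[str] = []
    g_cbc: List[str] = []
    for name, (dd_cw, dd_rcw) in paths.items():
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)  # large corner shift -> small alpha -> sign off with TBC
        else:
            g_cbc.append(name)  # small shifts may hide a large alpha -> keep CBC
    return g_tbc, g_cbc
```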
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
  • Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and vs. Δd at RC-worst corner (%), marking Acw and Arcw]
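The slide does not spell out how a threshold is read off the α vs. Δdelay plot. One plausible rule, shown here only as an assumption: take the largest Δdelay among representative paths whose α exceeds the target, then add the +1% sampling margin, so that no sampled path above the threshold has α above the target. Running it once per corner gives Acw and Arcw; `threshold_for_alpha` is a hypothetical name.

```python
from typing import Iterable, Tuple


def threshold_for_alpha(samples: Iterable[Tuple[float, float]], alpha_target: float,
                        margin: float = 1.0) -> float:
    """Delay-shift threshold (in %) for one corner.

    samples holds (Δd in % of d(Ytyp), alpha_j) for representative paths at that corner.
    Every sampled path whose Δd exceeds (returned value - margin) has
    alpha_j <= alpha_target; `margin` is the +1% sampling-error guard.
    """
    violators = [dd for dd, a in samples if a > alpha_target]
    base = max(violators) if violators else 0.0
    return base + margin
```

A smaller `alpha_target` admits more violators, so the threshold grows, matching the "smaller α → larger thresholds → fewer paths in GTBC" trend above.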
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6 gives more benefits
  • Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]
How to Minimize Cost of Resilience?
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
                 Razor          Razor-Lite     TIMBER
Power penalty    30% [Das08]    ~0% [Kim13]    100% [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion; if energy < min energy, save current solution]
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at large, medium and small margin, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt; MUL (0.86 to 1.02 V) and EXU (0.70 to 0.80 V), with minimum achievable energy marked]
Technology: foundry 28nm
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS (circuit, on-chip aging monitor, voltage regulator); circuit frequency and Vdd vs. time, without AVS and with AVS, relative to target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
  • VBTI and Vlib selection should consider BTI + AVS interaction
  • Aging and Vfinal are unknown before circuit implementation
[Figure: signoff steps 1 to 3 linking VBTI, |Vt|, Vlib, derated library, circuit implementation and signoff, circuit, BTI degradation and AVS, and Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
  • VBTI and Vlib depend on aging during AVS
  • Aging and Vfinal are unknown before circuit implementation
[Figure: signoff steps 1 to 3 linking VBTI, |Vt|, Vlib, derated library, circuit implementation and signoff, circuit, BTI degradation and AVS, and Vfinal]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs. Area Across Different Signoffs
[Figure annotation: “knee” point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
  • Large lifetime energy overhead
  • May fail to meet timing if desired supply voltage > Vmax
Also: Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time for NOM, OD/NOM and OD modes (tnom, tOD); power of circuits w/ different overdrive modes]
• Different overdrive modes: 26% power range
• Fixed fOD: still 14% power range
• fnom = 800MHz, Vnom = 0.8V
Also: Tunable Monitors → Less Margin
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Aggressive config: Vmin_est < Vmin_chip → some chips will fail
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, annotated from highest to least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner: the 3σ capacitance corners of layers M1 and M2 are combined into the homogeneous Cw corner for an interconnect stack with M1 and M2, which is pessimistic for the stack]
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
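The last point can be made concrete with a toy linear model (an assumption for illustration, not the talk's model): let a path's delay shift be the sum of n per-layer terms with equal unit sensitivity, each a unit-variance normal source, with pairwise correlation γ. The homogeneous corner skews every layer by 3σ at once, a shift of 3n, whereas the true 3σ of the sum is 3√(n + n(n − 1)γ). Their ratio plays the role of α for this corner; `homogeneous_alpha` is a hypothetical name.

```python
import math


def homogeneous_alpha(n_layers: int, gamma: float) -> float:
    """Ratio of the true 3-sigma delay shift to the homogeneous-corner shift.

    Toy model: delay shift = sum over n_layers of unit-sensitivity, unit-variance
    normal terms with pairwise correlation gamma.
    """
    if n_layers < 1:
        raise ValueError("need at least one layer")
    corner_shift = 3.0 * n_layers  # every layer skewed by 3 sigma at once
    variance = n_layers + n_layers * (n_layers - 1) * gamma
    return 3.0 * math.sqrt(variance) / corner_shift


for n in (1, 2, 4, 9):
    print(n, round(homogeneous_alpha(n, 0.5), 3))
```

For γ = 0.5 the ratio is 1 for a single layer and drops to about 0.87, 0.79 and 0.75 for 2, 4 and 9 layers, so the homogeneous corner grows more pessimistic as the stack grows; with γ = 1 it stays at 1.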
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
Σ =
        M1          M2          M3          M4
        ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1 ΔW   1  0  0     γ  0  0     γ  0  0     0  0  0
   ΔT   0  1  0     0  γ  0     0  γ  0     0  0  0
   ΔH   0  0  1     0  0  γ     0  0  γ     0  0  0
M2 ΔW   γ  0  0     1  0  0     γ  0  0     0  0  0
   ΔT   0  γ  0     0  1  0     0  γ  0     0  0  0
   ΔH   0  0  γ     0  0  1     0  0  γ     0  0  0
M3 ΔW   γ  0  0     γ  0  0     1  0  0     0  0  0
   ΔT   0  γ  0     0  γ  0     0  1  0     0  0  0
   ΔH   0  0  γ     0  0  γ     0  0  1     0  0  0
M4 ΔW   0  0  0     0  0  0     0  0  0     1  0  0
   ΔT   0  0  0     0  0  0     0  0  0     0  1  0
   ΔH   0  0  0     0  0  0     0  0  0     0  0  1
• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
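The block structure above generalizes directly: two sources are correlated with γ only if they have the same type and their layers share a process module. A sketch that builds Σ from a per-layer module assignment; the module labels and the function name are hypothetical, and the 4-layer call reproduces the matrix above (M1 to M3 in one module, M4 in another).

```python
from typing import Hashable, Sequence

import numpy as np


def beol_correlation(modules: Sequence[Hashable], gamma: float = 0.5) -> np.ndarray:
    """Correlation matrix for three variation sources (ΔW, ΔT, ΔH) per metal layer.

    modules[i] is the process-module label of layer i (order M1, M2, ...);
    source (layer i, type k) sits at index 3 * i + k, with k = 0, 1, 2 for ΔW, ΔT, ΔH.
    """
    n = 3 * len(modules)
    corr = np.eye(n)
    for i, mi in enumerate(modules):
        for j, mj in enumerate(modules):
            if i != j and mi == mj:
                for k in range(3):
                    corr[3 * i + k, 3 * j + k] = gamma  # same type, same module
    return corr


sigma4 = beol_correlation(["A", "A", "A", "B"])  # M1-M3 share a module, M4 does not
```

For a 9-layer stack the same call yields the 27 × 27 Σ, which can be passed to `np.linalg.cholesky` to obtain λ.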
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 at 60%]
• Variations in different layers are not fully correlated
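The x-axis metric is, per path, the largest share of its wirelength found on any one layer. A minimal sketch, assuming per-layer wirelengths are available for each path; the dictionary layout and both function names are hypothetical.

```python
from typing import Dict, Iterable


def max_layer_ratio(wl_by_layer: Dict[str, float]) -> float:
    """Largest fraction of a path's total wirelength that lies on a single layer."""
    total = sum(wl_by_layer.values())
    if total <= 0:
        raise ValueError("path must have positive wirelength")
    return max(wl_by_layer.values()) / total


def fraction_below(paths: Iterable[Dict[str, float]], limit: float = 0.6) -> float:
    """Share of paths whose max single-layer wirelength ratio is below `limit`."""
    ratios = [max_layer_ratio(p) for p in paths]
    return sum(r < limit for r in ratios) / len(ratios)
```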
Delay Variation
[Figure: α of each path vs. Δdelay at C-worst = [d(Ycw) − d(Ytyp)]/d(Ytyp) and Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)]/d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• Plot annotations:
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
  • Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
  • Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
  • Using RC-worst or C-worst only will underestimate delay variations
  • Need both RC- and C-worst corners to cover process variations
  • In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σ_j/d(Ytyp) × 100 vs. Δd_j(Yrcw)/d_j(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation to preserve parametric yield
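One simple reading of "scale BEOL variation in itf with α" is to shrink each parameter's corner skew linearly toward its typical value. The sketch assumes exactly that, for a single layer; the parameter names, the nm values and the function name are hypothetical, and the example corner skews W, T and H all below typical, as an RC-worst corner does.

```python
from typing import Dict


def tightened_corner(typ: Dict[str, float], corner: Dict[str, float],
                     alpha: float) -> Dict[str, float]:
    """Move each BEOL parameter from typical toward its corner value by a fraction alpha.

    alpha = 1 reproduces the conventional corner; alpha < 1 gives a tightened corner.
    Linear scaling of the skew is an illustrative assumption, not an itf rule.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return {k: typ[k] + alpha * (corner[k] - typ[k]) for k in typ}


# Hypothetical single-layer values in nm: width W, thickness T, dielectric height H.
typ = {"W": 50.0, "T": 90.0, "H": 100.0}
rc_worst = {"W": 45.0, "T": 81.0, "H": 90.0}  # all three skewed to min
print(tightened_corner(typ, rc_worst, 0.5))
```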
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
  • Wires on critical paths are routed on many layers
  • Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in group GTBC if Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
  • Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split into GTBC and GCBC; GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0; done]
Experiment Setup
Design               LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)    1.8       2.0       3.1
• Model BTI degradation with Vfinal throughout lifetime
  • Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time; NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Δdelay (vs typ) at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths often have smaller α values at the other corner ()
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 here delay variations covered by RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay (vs typ) at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
12ISVLSI-2014 invited talk 140710
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths often have smaller α values at the other corner ()
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variations
In the following α is defined at the dominant corner
Intuition on Delay Variability Across Cw RCw
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variationbull Paths with small Δdrcw and Δdcw have large α
bull Eg here we see αj gt 06 when ((Δdrcw lt 3) AND (Δdcw lt 3))
bull Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
Δd(Ycw)d(Ytyp)
Δd(Yrcw)d(Ytyp) α
14ISVLSI-2014 invited talk 140710
bull Paths with small Δdrcw and Δdcw have large α
bull Eg there are αj gt 06 when ((Δdrcw lt 3) AND (Δdcw lt 3))
bull Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
Find Paths for Which TBCs Can Be Used
Δd(Ycw)d(Ytyp)
Δd(Yrcw)d(Ytyp)
Acw
Arcw
Gtbc = Set of paths that can be safely signed off using TBC ( (Path with Δdcw larger than Acw) OR (Path with Δdrcw larger than Arcw) )
α
15ISVLSI-2014 invited talk 140710
Determining α Arcw and Acw
Δd at C-worst corner ()Δd at RC-worst corner ()
bull Assumption critical paths in different designs have similar trends
bull Extract Arcw and Acw from a set of representative paths
bull Plot α vs Δdelay find Arcw and Acw for a given α
bull Add +1 margin on Arcw and Acw to account for sampling error
bull Smaller α larger thresholds (Arcw and Acw) fewer paths in GTBC
Δd at C-worst corner ()
Arcw Acw
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
bull WNS and TNS are reduced by up to 100ps and 53nsbull Timing violations reduced by
24 to 100
bull TBC-06 more benefits bull Tradeoff between reduced margin
vs paths which use TBC
Correlation factor γ = 05
LEON SUPERBLUE12 NETCARD
-018-016-014-012
-01-008-006-004-002
0
CBC TBC-05 TBC-06 TBC-07
WN
S (n
s)
LEON SUPERBLUE12 NETCARD
-90-80-70-60-50-40-30-20-10
0
CBC TBC-05 TBC-06 TBC-07
TNS
(ns)
LEON SUPERBLUE12 NETCARD0
200400600800
1000120014001600
CBC TBC-05 TBC-06 TBC-07
Tim
ing
viol
ation
s
17ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience bull Additional circuits area and power penaltiesbull Recovery from errors throughput degradationbull Large hold margin short-path padding costbull Want benefits (eg energy) to maximally outweigh costs
Razor Razor-Lite TIMBER
Razor Razor-Lite TIMBER
Power penalty 30 [Das08] ~0 [Kim13] 100 [Choudhury09]
Overall Optimization Flowbull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity aware clock skew optimization
SkewOpt
OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resiliencebull Reference flows
bull Pure-margin (PM) conventional method w only margin insertionbull Brute-force (BF) use error-tolerant FFs for timing-critical endpoints
bull Proposed method (CO) achieves up to 21 energy reduction compared to reference methods
bull Resilience benefits increase with larger process variation
PM BF CO PM BF CO PM BF CO27
29
31
33
35
37
En
erg
y (
mJ
)
PM BF CO PM BF CO PM BF CO22
26
30
34
38Energy penalty of throughput degradation
Energy penalty of additional circuits
Energy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin 1σ2σ3σ for SS corner Technology foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVSbull Adaptive voltage scaling allows a lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 17 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
086 09 094 098 10225
30
35
40
45
50pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
070 072 074 076 078 08024
26
28
30
32
34
36 pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
Technology foundry 28nm
25ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops Less Marginbull Example Interaction between reliability margin and AVS designs
bull Bias temperature instability (BTI) aging higher |ΔVth| lower fmax
bull AVS can be used to compensate for performance degradation
Circuit
Closed-loop AVS
On-chip aging
monitor
Circuit performanc
e
Voltage regulato
r
Circuit frequency
Vdd
time
time
Without AVSWith AVS
target
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib selection should consider BTI + AVS interaction
bull Aging and Vfinal are unknowns before circuit implementation
BTI degradation and AVS
Vfinal
VBTI |Vt|
Step 1
Vlib
Derated library
Step 2
Circuit implementation and
signoff
circuit
Step 3
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib depend on aging during AVS
bull Aging and Vfinal are unknowns before circuit implementation
Vlib
VBTI Derated library
|Vt| Circuit implementation and
signoff
circuitBTI degradation and AVS
Vfinal
Step 1 Step 2 Step 3
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal Vlib VBTI
bull What is the design overhead when timing libraries are not properly characterized
bull Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads Joint work with Wei-Ting Jonas Chan Tuck-Boon Chan Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Large lifetime energy overhead
bull May fail to meet timing if desired supply voltage gt Vmax
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Benefits of tunability:
• Compensate for differences between model and silicon
• Recover margin when variation is reduced by an improved process
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  - In the manufacturing process, during operation, and across the lifetime
• The benefit of the "next node" is increasingly hard to find
  - An entire node is a "20/20/20" value proposition
  - 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins and tolerances used in signoff
• Some takeaways from this talk
  - Substantial benefit from tightening BEOL corners (= signoff)
  - "Minimum cost of resilience" is a rich optimization challenge
  - Chicken-egg loops in signoff definition can be broken
  - Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9, ordered from highest to least invested guardband in signoff]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• There is a tradeoff with the guardband invested in signoff: 14% power penalty
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; per-layer C distributions of M1 and M2 with their 3σ points, and the C distribution of the M1 + M2 interconnect stack, where the homogeneous Cw corner lies beyond the stack's 3σ point (pessimism)]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers.
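The growth of homogeneous-corner pessimism with the number of layers can be checked numerically. The sketch below assumes a path's wire capacitance is the sum of per-layer contributions, each Gaussian with its own σ, and a single uniform layer-to-layer correlation `rho`; the function name `homogeneous_pessimism` and these modeling choices are illustrative assumptions, not the talk's model. It compares the homogeneous shift (every layer at its own +3σ) with the +3σ point of the summed capacitance:

```python
import numpy as np


def homogeneous_pessimism(sigmas, rho):
    """Ratio of the homogeneous-corner shift to the true 3-sigma shift of a stack.

    sigmas: per-layer capacitance sigma of a path's wires (any consistent unit)
    rho:    assumed uniform correlation between layers (1.0 = fully correlated)
    """
    s = np.asarray(sigmas, dtype=float)
    n = s.size
    corr = np.full((n, n), float(rho))
    np.fill_diagonal(corr, 1.0)
    homogeneous_shift = 3.0 * s.sum()                # every layer at its own +3 sigma
    statistical_shift = 3.0 * np.sqrt(s @ corr @ s)  # +3 sigma of the summed capacitance
    return homogeneous_shift / statistical_shift


for n_layers in (1, 2, 4, 9):
    print(n_layers, round(homogeneous_pessimism([1.0] * n_layers, rho=0.5), 3))
```

With equal per-layer σ the ratio reduces to sqrt(n / (1 + (n - 1)ρ)): it stays at 1 when layers are fully correlated (ρ = 1) and approaches sqrt(n) as ρ goes to 0.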
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources:
           M1         M2         M3         M4
           ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1  ΔW     1  0  0    γ  0  0    γ  0  0    0  0  0
    ΔT     0  1  0    0  γ  0    0  γ  0    0  0  0
    ΔH     0  0  1    0  0  γ    0  0  γ    0  0  0
M2  ΔW     γ  0  0    1  0  0    γ  0  0    0  0  0
    ΔT     0  γ  0    0  1  0    0  γ  0    0  0  0
    ΔH     0  0  γ    0  0  1    0  0  γ    0  0  0
M3  ΔW     γ  0  0    γ  0  0    1  0  0    0  0  0
    ΔT     0  γ  0    0  γ  0    0  1  0    0  0  0
    ΔH     0  0  γ    0  0  γ    0  0  1    0  0  0
M4  ΔW     0  0  0    0  0  0    0  0  0    1  0  0
    ΔT     0  0  0    0  0  0    0  0  0    0  1  0
    ΔH     0  0  0    0  0  0    0  0  0    0  0  1
• Correlation between variation sources of the same variation type in the same process module: γ = 0.5
• Variation sources in different process modules are independent
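A minimal NumPy sketch of how Σ could be assembled from the two rules above and then factored as Σ = λλᵀ, as in the statistical timing analysis backup slides. The helper `beol_correlation` and the module list `[0, 0, 0, 1]` (M1-M3 in one module, M4 in another, mirroring the matrix shown) are assumptions for illustration; the talk does not give the module assignment of a full 9-layer stack.

```python
import numpy as np

TYPES = ("dW", "dT", "dH")  # per-layer variation types, in the slide's order


def beol_correlation(layer_modules, gamma=0.5):
    """Correlation matrix Sigma for 3 variation sources per metal layer.

    layer_modules: process-module id of each layer, bottom to top (e.g. [0, 0, 0, 1]).
    Sources are correlated (with gamma) iff they have the same type and their layers
    share a process module. Index of (layer, type) is layer * 3 + type.
    """
    n_types = len(TYPES)
    size = len(layer_modules) * n_types
    sigma = np.eye(size)
    for a, mod_a in enumerate(layer_modules):
        for b, mod_b in enumerate(layer_modules):
            if a != b and mod_a == mod_b:
                for t in range(n_types):
                    sigma[a * n_types + t, b * n_types + t] = gamma
    return sigma


S = beol_correlation([0, 0, 0, 1])   # M1-M3 share a module, M4 is separate
lam = np.linalg.cholesky(S)          # lower triangular, S = lam @ lam.T
print(S.shape, np.allclose(lam @ lam.T, S))   # -> (12, 12) True
```

The Cholesky call also acts as a sanity check: it raises `LinAlgError` if the chosen γ makes Σ non-positive-definite.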
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. maximum wirelength ratio across all layers (%); the curve reaches 0.92 at 60%]
• Variations in different layers are not fully correlated
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) – d(Ytyp)]/d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) – d(Ytyp)]/d(Ytyp); paths are split into those dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and those dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate their delay variation
• But these paths have larger delays at the other corner
  - Paths whose delay variation is underestimated by the C-worst corner are dominated by the RC-worst corner, where α < 1.0 and their delay variation is covered
• Paths are more sensitive either to R or to C
• Using only RC-worst or only C-worst will underestimate delay variation
• Both RC-worst and C-worst corners are needed to cover process variation
• In the following discussion, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges:
  - Many feasible combinations
  - A corner can only cover certain paths
  - How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variations in the itf by α = 0.5
[Figure: 3σj/d(Ytyp) × 100 vs. Δdj(Yrcw)/dj(Ytyp) × 100 for timing-critical paths]
Challenge: how to avoid underestimating delay variation, so that parametric yield is preserved
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• This structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard-cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) per metal layer for the critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC; GTBC: timing analysis using TBC and ECO using TBC until violation = 0; GCBC: timing analysis using CBC and ECO using CBC until violation = 0 → done]
Experiment Setup
Design              LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   18        20        31
Heuristics 1
• Model BTI degradation with Vfinal throughout the lifetime
  - Aging under a flat Vfinal ≈ aging under an adaptive Vdd
  - But slightly pessimistic
[Figure: Vdd vs. time under AVS, with NBTI and PBTI aging]
VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented yet)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on:
  - Gates along the critical path
  - Timing slack at t = 0
• Circuit activity is not an issue
  - Because the BTI effect is not sensitive to circuit activity
  - A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  - Vfinal is a function of timing slack
  - Assume timing slack = 0
[Figure: Vfinal for different gate types, annotated 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  - Estimate aging at each time step
  - Update circuit timing and Vdd
  - Repeat until t = tfinal
  - Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
[Figure: Vdd vs. time; Vstep = AVS voltage step, Vfinal = converged voltage]
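The reference flow above can be mimicked with a toy time-stepping loop. Everything below is an assumption for illustration: an alpha-power-law path delay stands in for STA on a derated library, a voltage-accelerated tⁿ power law stands in for the BTI model, and the constants are made up; only Vinit = 0.90V and Vmax = 1.05V are taken from the benchmark setup in this talk. The sketch keeps the aging voltage and the timing voltage equal to the current Vdd at every step, which is the consistency the flow aims for:

```python
# Toy aging/AVS models: every constant and functional form here is an assumption.
VTH0 = 0.30    # fresh |Vth| (V)
A_EXP = 1.3    # alpha-power-law exponent for gate delay
N_EXP = 0.2    # BTI time exponent: dVth grows as t**N_EXP
K_BTI = 0.02   # BTI prefactor (V per year**N_EXP at Vdd = 1 V)
M_EXP = 3.0    # voltage acceleration: prefactor scales as Vdd**M_EXP


def path_delay(vdd, dvth):
    """Alpha-power-law critical-path delay in arbitrary units."""
    return vdd / (vdd - VTH0 - dvth) ** A_EXP


def reference_signoff(v_init, v_step, v_max, t_final=10.0, dt=0.25):
    """Step through the lifetime with aging and timing both evaluated at the current
    Vdd. Returns Vfinal, or None if Vdd would have to exceed v_max."""
    target = path_delay(v_init, 0.0)  # circuit meets timing exactly at t = 0 and Vinit
    vdd, dvth, t = v_init, 0.0, 0.0
    while t < t_final:
        # Estimate aging over this step at the current Vdd (incremental power law).
        dvth += K_BTI * vdd ** M_EXP * ((t + dt) ** N_EXP - t ** N_EXP)
        t += dt
        # Update timing and Vdd: raise Vdd by v_step until the aged path meets timing.
        while path_delay(vdd, dvth) > target:
            vdd += v_step
            if vdd > v_max + 1e-9:
                return None  # modify the circuit and start over
    return vdd


v_final = reference_signoff(v_init=0.90, v_step=0.01, v_max=1.05)
print(v_final)
```

In the real flow each pass through the inner loop corresponds to new STA runs and library characterization, which is why this approach is accurate but very slow.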
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1       2       3       4        5      6        7
Vlib (V)   Vinit   Vinit   Vmax    Vinit    N/A    Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax    Vfinal   N/A    Vheur1   Vheur2
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff:
  - Derated library characterization is related to BTI + AVS
  - AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  - The circuit is affected by library characterization
[Figure: loop linking the circuit, Vfinal, Vlib and VBTI, and the derated libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON/OFF periods [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS'14]
Observation 1
• BTI is a "front-loaded" phenomenon [Chan11]
  - 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  - The gap between Vdd and Vfinal shrinks rapidly
  - ≈70% of the Vdd increment occurs in the 1st year (the remaining 30% over 9 years)
[Figure: Vdd vs. time, converging to Vfinal]
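A quick way to see what "front-loaded" implies, assuming the BTI shift follows a power law ΔVth ∝ tⁿ (a common modeling form, assumed here rather than taken from [Chan11]): requiring half of the 10-year shift to occur in the first year pins the exponent at n = log(0.5)/log(0.1) ≈ 0.3. The helper name `aged_fraction` is hypothetical.

```python
import math


def aged_fraction(t_years, t_life_years, n):
    """Fraction of the end-of-life BTI shift reached at time t, assuming dVth ∝ t**n."""
    return (t_years / t_life_years) ** n


# Exponent for which half of a 10-year shift happens in year 1: (1/10)**n = 1/2.
n_half = math.log(0.5) / math.log(0.1)
print(round(n_half, 3))  # -> 0.301

for year in (1, 2, 5, 10):
    print(year, round(aged_fraction(year, 10, n_half), 2))
```

The time exponent of real BTI models depends on technology and stress conditions; the value here is only what the two stated data points imply under the power-law assumption.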
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
• 200mV voltage compensation: 30% MTTF penalty
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year (0-12) for AVS schedules S1-S5, and MTTF (years) of DMA 3 under S1-S5]
12 years MTTF penalty
What is "Signoff"?
• Foundation of the contract between design house and foundry
  - "Chip should work": a stack of models, margins and analyses
  - Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd, operating voltage]
Problem: margins = pessimism, overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions:
  - Δdjv is linear with respect to the variation sources
  - Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist and 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
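Given the path delays from the 28 extraction and STA runs, the per-σ sensitivities follow directly from the formula above. This sketch assumes the delays have already been collected into arrays (reading them from the extraction and STA tools is not shown); the function name `delay_sensitivities` and the delay values are hypothetical:

```python
import numpy as np


def delay_sensitivities(d_typ, d_corner_runs):
    """Per-sigma delay sensitivity of each path to each variation source.

    d_typ:         (n_paths,) path delays from the Ytyp extraction + STA run
    d_corner_runs: (n_paths, n_sources) path delays from the run where only
                   source v is skewed to +3 sigma (itf Yv)
    Returns dd with dd[j, v] = (d_j(Yv) - d_j(Ytyp)) / 3.
    """
    d_typ = np.asarray(d_typ, dtype=float)
    d_runs = np.asarray(d_corner_runs, dtype=float)
    return (d_runs - d_typ[:, None]) / 3.0


# Hypothetical delays (ns) for 2 paths and 3 sources; a 9-layer stack would have 27 columns.
dd = delay_sensitivities([1.00, 2.00], [[1.03, 1.00, 0.97],
                                        [2.06, 2.03, 2.00]])
print(np.round(dd, 3))
```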
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• Σ = λλᵀ (λ is obtained by Cholesky decomposition)
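Under the linear Gaussian model of the previous slide, one standard way to use λ is σj = ‖λᵀΔdj‖ = sqrt(Δdjᵀ Σ Δdj). The sketch below computes σj and then a scaling factor αj = 3σj / Δdj(corner), where Δdj(corner) = dj(Ycorner) - dj(Ytyp) at the path's dominant BEOL corner. This definition of α is inferred from the axes of "Opportunities for Tightened BEOL Corners" (3σj/d(Ytyp) vs. Δdj(Yrcw)/dj(Ytyp)); the function names and numbers are hypothetical:

```python
import numpy as np


def path_sigma(dd, corr):
    """Delay sigma of each path under d_j = d_j(Ytyp) + sum_v dd[j, v] * z_v.

    dd:   (n_paths, n_sources) per-sigma sensitivities
    corr: (n_sources, n_sources) correlation matrix Sigma of the unit-variance z_v
    """
    lam = np.linalg.cholesky(corr)           # Sigma = lam @ lam.T
    return np.linalg.norm(dd @ lam, axis=1)  # ||lam.T dd_j|| = sqrt(dd_j^T Sigma dd_j)


def alpha(dd, corr, delta_corner):
    """alpha_j = 3 sigma_j / (d_j(Ycorner) - d_j(Ytyp)) at the path's dominant corner."""
    return 3.0 * path_sigma(dd, corr) / np.asarray(delta_corner, dtype=float)


corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
dd = np.array([[0.01, 0.01]])  # hypothetical sensitivities (ns per sigma)
# A homogeneous corner puts both sources at +3 sigma: shift = 3 * (0.01 + 0.01) = 0.06 ns.
print(alpha(dd, corr, [0.06]))  # about 0.866 = sqrt(3) / 2
```

Here α < 1, so the conventional corner overstates this path's 3σ delay shift by about 1.155×, which is the kind of pessimism tightened BEOL corners try to recover.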
[Equation: throughput as a function of the error rate ER [Kahng10], the number of recovery cycles r and the clock period T]
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off the cost of resilience vs. datapath optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for MUL and EXU; minimum achievable energy marked]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• If one mode is outside the design cone of the other: failed design or overdesign
  - If mode A has positive timing slack with respect to mode B, mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  - The modes are in each other's design cone
[Figure: design cone of mode A in the voltage-frequency plane, with modes B and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
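The dominance test can be illustrated for a single critical path, assuming an alpha-power-law (Sakurai-Newton) delay shape with made-up parameters (Vth = 0.3V, exponent 1.3); real design cones come from implemented circuits, so this is only a sketch. The path is calibrated to zero slack at the signoff mode, and the sign of its slack at the other mode decides which mode dominates; zero slack means the two modes lie on each other's cone boundary (equivalent dominance). The nominal mode uses fnom = 800MHz and Vnom = 0.8V from this talk; the other modes and all function names are hypothetical:

```python
def path_delay(vdd, vth=0.30, a=1.3):
    """Alpha-power-law delay shape (arbitrary units); vth and a are assumptions."""
    return vdd / (vdd - vth) ** a


def slack_at(signoff_mode, check_mode):
    """Slack (s) at check_mode of a path signed off with zero slack at signoff_mode.

    Modes are (vdd in volts, frequency in Hz).
    """
    v_s, f_s = signoff_mode
    v_c, f_c = check_mode
    scale = (1.0 / f_s) / path_delay(v_s)  # path exactly meets f_s at v_s
    return 1.0 / f_c - scale * path_delay(v_c)


def dominance(mode_a, mode_b, rel_tol=1e-6):
    """Classify two modes by the slack that a mode-A signoff leaves at mode B."""
    s = slack_at(mode_a, mode_b)
    if abs(s) <= rel_tol * min(1.0 / mode_a[1], 1.0 / mode_b[1]):
        return "equivalent"
    return "A dominates B" if s > 0 else "B dominates A"


nom = (0.8, 800e6)
f_eq = nom[1] * path_delay(0.8) / path_delay(1.0)  # cone-boundary frequency at 1.0 V
print(dominance(nom, (1.0, f_eq)))    # -> equivalent
print(dominance(nom, (1.0, 900e6)))   # -> A dominates B
print(dominance(nom, (1.0, 1.1e9)))   # -> B dominates A
```

A negative slack for the A-signoff at mode B always implies a positive slack for the B-signoff at mode A, so a single slack sign is enough to classify the pair.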
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough: scalable
[Flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)]
[Figure: power estimation of adaptive search for design AES, f = 700MHz: power (mW) vs. signoff voltage (V). Ovals indicate sample points; "1st"/"2nd" = power from the power models at the first/second iteration; "real" = power from real implemented circuits]
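The adaptive search can be sketched as a loop that fits a power model to the sampled implementations, implements the design at the model's estimated optimum, and refits. The quadratic model form, the stand-in function `implement_and_measure` (which replaces an SP&R run) and all numbers are assumptions; the talk does not specify the form of its power models:

```python
import numpy as np


def implement_and_measure(v):
    """Hypothetical stand-in for one SP&R run at signoff voltage v; returns power in mW."""
    x = v - 1.02
    return 15.0 + 40.0 * x ** 2 + 2.0 * x ** 3


def adaptive_search(v_lo, v_hi, n_init=3, n_iter=3):
    """Fit a quadratic power model to the samples, implement at its minimum, refit."""
    vs = [float(v) for v in np.linspace(v_lo, v_hi, n_init)]
    ps = [implement_and_measure(v) for v in vs]
    for _ in range(n_iter):
        c2, c1, _c0 = np.polyfit(vs, ps, 2)  # power model P(v) ~ c2 v^2 + c1 v + c0
        if c2 > 0:
            v_est = -c1 / (2.0 * c2)          # minimum of the fitted model
        else:
            v_est = vs[int(np.argmin(ps))]    # model not convex: reuse the best sample
        v_est = float(np.clip(v_est, v_lo, v_hi))
        vs.append(v_est)                      # the new sample refines the model
        ps.append(implement_and_measure(v_est))
    best = int(np.argmin(ps))
    return vs[best], ps[best], len(vs)


v_best, p_best, n_runs = adaptive_search(0.9, 1.2)
print(round(v_best, 3), round(p_best, 3), n_runs)
```

Each call to `implement_and_measure` stands for one SP&R run, so the cost here is a fixed 3 + 3 = 6 runs regardless of how finely the voltage range could be gridded.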
Classes of Closed-Loop AVS
• Closed-loop AVS monitors: design-dependent replica, in-situ monitor, generic monitor
• The critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• A generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  - Series resistance
  - Cell types (INV, NAND, NOR)
• Proposed circuit:
  - Different cell types cover different process corners
  - The series resistance of each stage is tuned to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low series resistance]
Benefit of Resilience Cost Reduction
• Reference flows:
  - Pure-margin (PM): conventional methodology with only margin insertion
  - Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at large, medium and small margin, split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of the clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs, thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on a sensitivity function, then replacement of error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save the current solution]
Intuition on Delay Variability Across Cw, RCw
[Figure: α vs. Δdelay (vs. typ) at C-worst, [d(Ycw) – d(Ytyp)]/d(Ytyp), and α vs. Δdelay (vs. typ) at RC-worst, [d(Yrcw) – d(Ytyp)]/d(Ytyp); paths are split into those dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and those dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate their delay variation
• But these paths often have smaller α values at the other corner
  - Paths whose delay variation is underestimated by the C-worst corner are dominated by the RC-worst corner, where α < 1.0 and their delay variation is covered
• Paths are more sensitive either to R or to C
• Using only RC-worst or only C-worst will underestimate delay variation
• Both RC-worst and C-worst corners are needed to cover process variation
• In the following, α is defined at the dominant corner
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
Find Paths for Which TBCs Can Be Used
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
GTBC = set of paths that can be safely signed off using TBC = (paths with Δdcw > Acw) OR (paths with Δdrcw > Arcw)
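The GTBC membership rule above translates directly into a filter over per-path delays from the three BEOL corners. A minimal sketch; the `PathDelays` record, the `split_paths` helper, the threshold values and the path data are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PathDelays:
    name: str
    d_typ: float  # path delay at Ytyp (ns)
    d_cw: float   # path delay at Ycw (ns)
    d_rcw: float  # path delay at Yrcw (ns)


def split_paths(paths, a_cw, a_rcw):
    """Split paths into (G_TBC, G_CBC); thresholds a_cw and a_rcw are in % of d(Ytyp)."""
    g_tbc, g_cbc = [], []
    for p in paths:
        dd_cw = 100.0 * (p.d_cw - p.d_typ) / p.d_typ
        dd_rcw = 100.0 * (p.d_rcw - p.d_typ) / p.d_typ
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(p.name)  # large corner shift -> small alpha -> TBC is safe
        else:
            g_cbc.append(p.name)  # keep conventional BEOL corners
    return g_tbc, g_cbc


paths = [PathDelays("p1", 1.00, 1.06, 1.02),
         PathDelays("p2", 1.00, 1.02, 1.03),
         PathDelays("p3", 2.00, 2.04, 2.12)]
print(split_paths(paths, a_cw=4.0, a_rcw=4.0))  # -> (['p1', 'p3'], ['p2'])
```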
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  - Plot α vs. Δdelay; find Arcw and Acw for a given α
  - Add a +1% margin on Arcw and Acw to account for sampling error
• Smaller α means larger thresholds (Arcw and Acw), hence fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw]
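The threshold extraction can be sketched as follows, assuming a set of representative paths with measured (Δdelay, α) pairs at one corner: take the largest Δdelay among paths whose α exceeds the target, so that every path above that value meets the target, then add the +1% guard. The function name `threshold_for_alpha` and the sample data are hypothetical:

```python
def threshold_for_alpha(samples, alpha_target, margin_pct=1.0):
    """Delta-delay threshold (in %) above which every sampled path has
    alpha <= alpha_target, plus a guard margin for sampling error.

    samples: (delta_pct, alpha) pairs for representative paths at one BEOL corner.
    """
    violators = [d for d, a in samples if a > alpha_target]
    base = max(violators) if violators else 0.0  # no violators: every path qualifies
    return base + margin_pct


# Hypothetical representative paths: (Δdelay at C-worst in %, alpha)
cw_samples = [(1.0, 0.9), (2.5, 0.7), (3.0, 0.55), (4.0, 0.45), (6.0, 0.3)]
print(threshold_for_alpha(cw_samples, 0.6))  # -> 3.5
print(threshold_for_alpha(cw_samples, 0.5))  # -> 4.0
```

The smaller target α yields the larger threshold and therefore fewer paths in GTBC.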
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns, respectively
• Timing violations are reduced by 24% to 100%
• TBC-0.6 gives more benefits
  - Tradeoff between the reduced margin and the number of paths that can use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
                Razor         Razor-Lite    TIMBER
Power penalty   30% [Das08]   ~0% [Kim13]   100% [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on a sensitivity function, then replacement of error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → OR-tree insertion; if energy < min energy, save the current solution]
Benefit of Low-Cost Resilience
• Reference flows:
  - Pure-margin (PM): conventional method with only margin insertion
  - Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• The proposed method (CO) achieves up to 21% energy reduction compared to the reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at large, medium and small margin, split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for the SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 17% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt, for MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, hence lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor tracks circuit performance and drives the voltage regulator that sets the circuit's Vdd; circuit frequency and Vdd vs. time, without and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1, BTI degradation and AVS (VBTI, |Vt|, Vfinal); Step 2, derated library (Vlib); Step 3, circuit implementation and signoff (circuit)]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib depend on aging during AVS
bull Aging and Vfinal are unknowns before circuit implementation
Vlib
VBTI Derated library
|Vt| Circuit implementation and
signoff
circuitBTI degradation and AVS
Vfinal
Step 1 Step 2 Step 3
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal Vlib VBTI
bull What is the design overhead when timing libraries are not properly characterized
bull Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads Joint work with Wei-Ting Jonas Chan Tuck-Boon Chan Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Large lifetime energy overhead
bull May fail to meet timing if desired supply voltage gt Vmax
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst. Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst.]
• Some paths have α > 1.0, i.e., a CBC can underestimate their delay variations
• But these paths have larger delays at the other corner: the C-worst corner underestimates their delay variations, but they are dominated by the RC-worst corner, where α < 1.0 and their delay variations are covered
• Paths are more sensitive to either R or C
• Using only RC-worst or only C-worst will underestimate delay variations
• Both RC- and C-worst corners are needed to cover process variations
• In the following discussions, α is defined at the dominant corner
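The per-path quantities on this slide can be computed from three STA runs plus the statistical σ of the path delay. A minimal sketch, assuming α is the ratio of the statistical 3σ delay variation to the corner Δdelay, both relative to d(Ytyp) (consistent with "α > 1.0: a CBC can underestimate delay variations"); the function names and numbers are hypothetical.

```python
def rel_delta(d_corner, d_typ):
    """Relative delay change at a BEOL corner: [d(Y) - d(Ytyp)] / d(Ytyp)."""
    return (d_corner - d_typ) / d_typ

def dominant_corner_alpha(d_typ, d_cw, d_rcw, sigma):
    """Return (dominant corner, alpha) for one path.

    The dominant corner is the one with the larger relative delay change; alpha is
    the statistical 3-sigma delay variation divided by the delay change there.
    """
    delta_cw = rel_delta(d_cw, d_typ)
    delta_rcw = rel_delta(d_rcw, d_typ)
    corner, delta = ("Cw", delta_cw) if delta_cw > delta_rcw else ("RCw", delta_rcw)
    if delta <= 0.0:
        raise ValueError("path delay does not increase at either corner")
    return corner, (3.0 * sigma / d_typ) / delta

if __name__ == "__main__":
    # Hypothetical path: 1.00 ns typical, 1.03 ns at Cw, 1.05 ns at RCw, sigma = 10 ps.
    print(dominant_corner_alpha(1.00, 1.03, 1.05, 0.010))  # ('RCw', ~0.6)
```

Evaluating α only at the dominant corner mirrors the last bullet: a path whose α exceeds 1.0 at its non-dominant corner is still covered by the other one.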
Non-Homogeneous Corner
• Each layer can be skewed differently
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges:
• Many feasible combinations
• A corner can only cover certain paths
• How to choose the best combinations
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: Δdj(Yrcw)/dj(Ytyp) × 100 vs. 3σj/d(Ytyp) × 100, per path]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
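One way to read "scale BEOL variation in itf with α" is to shrink each skewed parameter's deviation from typical by α. A minimal sketch under that assumption; the dictionary layout and values are hypothetical and are not itf syntax. Under the linear-sensitivity assumption used later for statistical timing, this shrinks the corner's Δdelay by roughly α.

```python
def tighten_corner(typ, corner, alpha):
    """Shrink each parameter's deviation from typical by the factor alpha (0 < alpha <= 1).

    typ and corner map (layer, parameter) -> value. The skew direction of every
    parameter is kept; only its magnitude changes (alpha = 0.5 halves it).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return {key: typ[key] + alpha * (corner[key] - typ[key]) for key in typ}

if __name__ == "__main__":
    # Hypothetical M2 values in um; at RC-worst, W, T and H are all at their minimum.
    typ = {("M2", "W"): 0.070, ("M2", "T"): 0.140, ("M2", "H"): 0.100}
    rcw = {("M2", "W"): 0.064, ("M2", "T"): 0.128, ("M2", "H"): 0.090}
    print(tighten_corner(typ, rcw, 0.5))  # each value moves halfway back toward typical
```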
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• The structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard-cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC. GTBC: timing analysis using TBC; while violations ≠ 0, ECO using TBC. GCBC: timing analysis using CBC; while violations ≠ 0, ECO using CBC. Done when both groups have zero violations.]
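The classification step can be sketched with the threshold rule given on the "Find Paths for Which TBCs Can Be Used" slide: a path goes to GTBC if its Δdelay at C-worst exceeds Acw or its Δdelay at RC-worst exceeds Arcw. A minimal sketch; the `PathDelays` record and the sample numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PathDelays:
    name: str
    d_typ: float  # path delay at Ytyp
    d_cw: float   # path delay at Ycw
    d_rcw: float  # path delay at Yrcw

def split_paths(paths, a_cw, a_rcw):
    """Split paths into (G_TBC, G_CBC) using relative delay-change thresholds."""
    g_tbc, g_cbc = [], []
    for p in paths:
        d_cw = (p.d_cw - p.d_typ) / p.d_typ
        d_rcw = (p.d_rcw - p.d_typ) / p.d_typ
        # Large delay change at either corner -> small alpha -> safe to use TBC.
        (g_tbc if (d_cw > a_cw or d_rcw > a_rcw) else g_cbc).append(p.name)
    return g_tbc, g_cbc

if __name__ == "__main__":
    paths = [
        PathDelays("p1", 1.00, 1.06, 1.02),
        PathDelays("p2", 1.00, 1.01, 1.02),
        PathDelays("p3", 1.00, 1.00, 1.05),
    ]
    print(split_paths(paths, a_cw=0.04, a_rcw=0.04))  # (['p1', 'p3'], ['p2'])
```

Each group then runs through its own STA/ECO loop, TBC for GTBC and CBC for GCBC, as in the flow above.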
Experiment Setup
Clock period: LEON3MP 1.8 ns, NETCARD 2.0 ns, SUPERBLUE12 3.1 ns
Heuristics 1
• Model BTI degradation with Vfinal throughout the lifetime
• Aging under a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time; NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented yet)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on:
• Gates along the critical path
• Timing slack at t = 0
• Circuit activity is not an issue
• Because the BTI effect is not sensitive to circuit activity
• A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
[Figure: Vfinal for different gate types; annotation: 10mV]
• Heuristic 2: use the average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup-time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit frequencies (GHz): c5315 1.38, c7552 1.25, AES 0.89, MPEG2 1.05
Supply voltages: Vmax = 1.05V; Vinit = 0.90V; Vheur1 = 0.97V (DC), 0.95V (AC); Vheur2 = 0.95V (DC), 0.93V (AC)
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib and Vdd consistent throughout the circuit lifetime
• Signoff flow:
• Estimate aging at each time step
• Update circuit timing and Vdd
• Repeat until t = tfinal
• Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
(Vstep: AVS voltage step; Vfinal: converged voltage)
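The reference flow is essentially a time-stepped simulation of AVS. The toy sketch below uses loudly hypothetical models: an alpha-power-law-style path delay, a BTI shift that grows as a power law in time and exponentially with stress voltage, and an equivalent-stress-time update when Vdd changes. None of the constants come from the talk, and a real flow would rerun STA with a re-characterized library at every step instead of calling `path_delay`.

```python
import math

# Hypothetical model constants: none of these come from the talk.
BTI_C, BTI_N, BTI_G = 0.03, 0.3, 3.0   # dVth = C * exp(G*(V-1)) * t**N, t in years
DELAY_K, DELAY_A = 1.0, 1.3            # delay = K * V / (V - Vth)**A

def path_delay(vdd, vth):
    """Toy alpha-power-law critical-path delay (arbitrary units)."""
    return DELAY_K * vdd / (vdd - vth) ** DELAY_A

def bti_shift(vdd, t):
    """Toy BTI threshold-voltage shift after t years of stress at vdd."""
    return BTI_C * math.exp(BTI_G * (vdd - 1.0)) * t ** BTI_N

def equivalent_time(shift, vdd):
    """Stress time at vdd that would produce the already accumulated shift."""
    return (shift / (BTI_C * math.exp(BTI_G * (vdd - 1.0)))) ** (1.0 / BTI_N)

def reference_signoff(vinit, vmax, vstep, clock_period, vth0, t_final=10.0, dt=0.25):
    """Time-stepped AVS simulation; returns Vfinal, or None if it would exceed vmax."""
    vdd, shift, t = vinit, 0.0, 0.0
    while t < t_final:
        # Estimate aging over one time step at the current supply voltage.
        t_eq = equivalent_time(shift, vdd) if shift > 0.0 else 0.0
        shift = bti_shift(vdd, t_eq + dt)
        t += dt
        # Update timing and Vdd: AVS raises Vdd in Vstep increments until timing is met.
        while path_delay(vdd, vth0 + shift) > clock_period:
            vdd = round(vdd + vstep, 6)
            if vdd > vmax:
                return None  # modify the circuit and start over
    return vdd

if __name__ == "__main__":
    print(reference_signoff(vinit=0.90, vmax=1.10, vstep=0.01, clock_period=1.75, vth0=0.30))
```

The nested loop makes the cost visible: every time step needs at least one timing evaluation per voltage step, which is why the slide calls this flow very slow.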
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
Case:      1      2      3     4       5    6       7
Vlib (V):  Vinit  Vinit  Vmax  Vinit   N/A  Vheur1  Vheur2
VBTI (V):  Vinit  Vmax   Vmax  Vfinal  N/A  Vheur1  Vheur2
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
• Derated library characterization is related to BTI + AVS
• AVS is affected by circuit implementation
• Timing constraints, critical paths, etc.
• The circuit is affected by library characterization
[Figure: loop linking Circuit → Vfinal → Vlib, VBTI → Derated Libraries → Circuit]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]
Observation 1 [Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
• The gap between Vdd and Vfinal shrinks rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in the 1st year (remaining 30% over 9 years)]
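One way to read "50% of BTI aging within the 1st year of a 10-year lifetime" is through a power-law time dependence ΔVth(t) ∝ tⁿ. This form is commonly used for BTI but is an assumption here, not a model stated on the slide. The fraction of end-of-life aging reached at time t is then (t/10)ⁿ, so the 50% figure pins down n = ln 2 / ln 10 ≈ 0.30.

```python
import math

def exponent_from_fraction(frac, t_early=1.0, t_life=10.0):
    """Solve (t_early / t_life) ** n == frac for the power-law time exponent n."""
    return math.log(frac) / math.log(t_early / t_life)

def aged_fraction(t, n, t_life=10.0):
    """Fraction of the end-of-life BTI shift reached at time t if dVth ~ t**n."""
    return (t / t_life) ** n

if __name__ == "__main__":
    n = exponent_from_fraction(0.5)          # "50% of aging in year 1 of 10"
    print(round(n, 3))                       # 0.301
    print(round(aged_fraction(2.0, n), 3))   # share of aging reached after 2 years
```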
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: Vdd (0.90V to 1.04V) vs. year (0 to 12) for AVS schedules S1 to S5; MTTF (years) for schedules S1 to S5 (design DMA)]
[Figure annotation: 12 years MTTF penalty]
What is "Signoff"
• Foundation of the contract between design house and foundry
• "Chip should work": a stack of models, margins and analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd; axis: voltage; operating voltage]
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
• Assumptions:
• Δdjv is linear with respect to the variation sources
• Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
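Under the stated linearity assumption, the 28 STA results turn into a per-path sensitivity vector, and the path delay becomes a linear function of the 27 standardized variation sources. A minimal sketch, assuming each itf Yv skews only source zv by +3σ (which is why the difference is divided by 3); the data layout is hypothetical.

```python
def sensitivities(d_typ, d_skewed):
    """Per-sigma sensitivities of one path: [d(Yv) - d(Ytyp)] / 3 for each source v.

    d_skewed[v] is the path delay from the STA run with itf Yv, in which only
    source v is skewed by +3 sigma (assumed), so dividing by 3 gives delay per sigma.
    """
    return [(d_v - d_typ) / 3.0 for d_v in d_skewed]

def linear_delay(d_typ, sens, z):
    """Linear path-delay model: d_typ + sum_v sens[v] * z[v], with z in sigma units."""
    if len(sens) != len(z):
        raise ValueError("need one value per variation source")
    return d_typ + sum(s * zv for s, zv in zip(sens, z))

if __name__ == "__main__":
    # Hypothetical path delays (ps) from the Ytyp run and three variation-source runs.
    s = sensitivities(100.0, [103.0, 100.0, 98.5])
    print(s)                                   # [1.0, 0.0, -0.5]
    print(linear_delay(100.0, s, [1.0, 2.0, -2.0]))
```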
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• Σ = λλ^T (λ is obtained by Cholesky decomposition)
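The Cholesky factor turns correlated sources into independent ones: if z = λw with w independent standard normal, then Δd = sᵀz = (λᵀs)ᵀw, so σ_j = ‖λᵀs_j‖ where s_j holds path j's sensitivities. A minimal pure-Python sketch of that calculation; the function names are illustrative.

```python
import math

def cholesky(a):
    """Lower-triangular L with a == L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def path_sigma(sens, corr):
    """Standard deviation of a linear path delay: ||lambda^T s|| with corr = lambda lambda^T."""
    lam = cholesky(corr)
    n = len(sens)
    # u = lambda^T s: the delay expressed in independent standard-normal factors.
    u = [sum(lam[i][k] * sens[i] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in u))

if __name__ == "__main__":
    corr = [[1.0, 0.5], [0.5, 1.0]]       # two sources with correlation 0.5
    print(path_sigma([1.0, 1.0], corr))   # sqrt(1 + 1 + 2 * 0.5) = sqrt(3)
```

The same σ_j follows directly from sᵀΣs; the factored form is what makes Monte Carlo sampling with independent normals possible.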
[Equation: throughput as a function of clock period T, error rate ER and number of recovery cycles r [Kahng10]]
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacing a Razor FF with a normal FF
• Trade off cost of resilience vs. datapath optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs: resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt; left: MUL (0.84V to 1.00V), right: EXU (0.84V to 1.02V); minimum achievable energy marked]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
• The modes are in each other's design cones
[Figure: voltage vs. frequency showing the design cone of mode A and modes A, B, C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough: scalable
[Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)]
[Figure: power estimation of adaptive search: power (mW, 14 to 20) vs. signoff voltage (0.9V to 1.2V); design AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from real implemented circuits
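A toy version of the sample-and-refine loop, assuming (our assumption, not the published method) that power vs. signoff voltage is well approximated near its minimum by a quadratic fitted through the three best samples, which is successive parabolic interpolation. `measure` stands in for an SP&R run that reports power and is hypothetical.

```python
def parabola_vertex(p1, p2, p3):
    """Vertex x of the parabola through three (x, y) points; None if it opens downward."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3 * x3 * (y1 - y2) + x2 * x2 * (y3 - y1) + x1 * x1 * (y2 - y3)) / denom
    return None if a <= 0.0 else -b / (2.0 * a)

def adaptive_search(measure, v_samples, iterations=3):
    """Sample-and-refine search for the signoff voltage with the lowest power.

    measure(v) stands in for implementing the design at signoff voltage v and
    reporting its power (hypothetical); v_samples are three distinct start points.
    """
    pts = [(v, measure(v)) for v in v_samples]
    for _ in range(iterations):
        best3 = sorted(pts, key=lambda p: p[1])[:3]
        v_new = parabola_vertex(*best3)  # minimum of the current quadratic power model
        if v_new is None or any(abs(v_new - v) < 1e-6 for v, _ in pts):
            break  # model is not convex, or it points at an already sampled voltage
        pts.append((v_new, measure(v_new)))  # one more real sample refines the model
    return min(pts, key=lambda p: p[1])

if __name__ == "__main__":
    # Hypothetical power curve (mW) with its minimum at 1.05 V.
    print(adaptive_search(lambda v: 10.0 * (v - 1.05) ** 2 + 15.0, [0.9, 1.0, 1.2]))
```

Each iteration costs one implementation run, so the total number of runs stays a small constant, in the spirit of the slide's scalability claim.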
Classes of Closed-Loop AVS
• Design-dependent replica: the critical path may be difficult to identify (IP from 3rd party)
• In-situ monitor: calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
• Series resistance
• Cell types (INV, NAND, NOR)
• Proposed circuit:
• Different cell types cover different process corners
• Tune the series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows:
• Pure-margin (PM): conventional methodology with only margin insertion
• Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO at large, medium and small margin, split into energy without resilience, energy penalty of additional circuits and energy penalty of throughput degradation; left: MUL, right: EXU]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt; left: MUL, right: EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs); loop of SEOpt (margin insertion on K paths based on a sensitivity function; replace error-tolerant FFs with normal FFs) and SkewOpt (activity-aware clock skew optimization); save the current solution whenever energy < min energy]
Toward Holistic Modeling, Margining and Tolerance of IC Variability
Intuition on Delay Variability Across Cw, RCw
[Figure: α vs. Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst. Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst.]
• Some paths have α > 1.0, i.e., a CBC can underestimate their delay variations
• But these paths often have smaller α values at the other corner: the C-worst corner underestimates their delay variations, but they are dominated by the RC-worst corner, where α < 1.0 and their delay variations are covered
• Paths are more sensitive to either R or C
• Using only RC-worst or only C-worst will underestimate delay variations
• Both RC- and C-worst corners are needed to cover process variations
In the following, α is defined at the dominant corner
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
• E.g., there are paths with αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
GTBC = set of paths that can be safely signed off using TBC = (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw]
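A minimal sketch of the threshold step for one corner: given (Δdelay %, α) pairs for representative paths, take the smallest threshold such that every sample path above it has α at or below the target, then add the +1% sampling-error margin. The selection rule is our reading of "find Arcw and Acw for a given α", and the sample data are hypothetical.

```python
def find_threshold(samples, alpha_target, margin=1.0):
    """Delta-delay threshold (%) above which all samples satisfy alpha <= alpha_target.

    samples: list of (delta_pct, alpha) for representative paths at one corner.
    Returns the threshold plus the sampling-error margin (percentage points).
    """
    # Every sample that violates the target must fall at or below the threshold.
    violators = [d for d, a in samples if a > alpha_target]
    base = max(violators) if violators else 0.0
    return base + margin

if __name__ == "__main__":
    # Hypothetical representative paths at RC-worst: (delta delay %, alpha).
    samples = [(1.0, 0.9), (2.5, 0.7), (4.0, 0.45), (6.0, 0.3)]
    for target in (0.5, 0.6, 0.8):
        print(target, find_threshold(samples, target))
```

A smaller α target admits more violators, so the threshold rises and fewer paths land in GTBC, matching the last bullet of the slide.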
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations are reduced by 24% to 100%
• TBC-0.6 gives more benefits
• Tradeoff between reduced margin and the number of paths that use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
Power penalty: Razor 30% [Das08]; Razor-Lite ~0% [Kim13]; TIMBER 100% [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion; save the current solution whenever energy < min energy]
Benefit of Low-Cost Resilience
• Reference flows:
• Pure-margin (PM): conventional method with only margin insertion
• Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• The proposed method (CO) achieves up to 21% energy reduction compared to the reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO at large, medium and small margin, split into energy without resilience, energy penalty of additional circuits and energy penalty of throughput degradation; left: MUL, right: EXU]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 17% energy reduction compared to pure-margin designs: resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt; left: MUL (0.86V to 1.02V), right: EXU (0.70V to 0.80V); minimum achievable energy marked]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor senses circuit performance and drives the voltage regulator that sets the circuit's Vdd; circuit frequency vs. time with and without AVS relative to the target; Vdd vs. time]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: BTI aging at VBTI gives |ΔVt|; Step 2: derated library characterized at Vlib; Step 3: circuit implementation and signoff; BTI degradation and AVS on the resulting circuit determine Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: BTI aging at VBTI gives |ΔVt|; Step 2: derated library characterized at Vlib; Step 3: circuit implementation and signoff; BTI degradation and AVS on the resulting circuit determine Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan and Siddhartha Nath
Power vs Area Across Different Signoffs
[Figure annotation: "knee" point for balanced area and power tradeoff]
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Also Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
• Improve performance, power or area
• Reduce overdesign
[Figure: Vdd vs. time alternating between nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD; power of circuits with different overdrive modes]
• Different overdrive modes: 26% power range
• Fixed fOD: still 14% power range
fnom = 800MHz, Vnom = 0.8V
Also Tunable Monitors → Less Margin
• Default config:
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
• Aggressive config: Vmin_est < Vmin_chip; some chips will fail
• Optimized config:
• Increase the number of high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability:
• Compensate for the difference between model and silicon
• Recover margin when variation is reduced due to an improved process
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
• In the manufacturing process, during operation, across the lifetime
• The benefit of the "next node" is increasingly hard to find
• An entire node is a "20/20/20" value proposition
• 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to the properties of the taped-out IC by the models, margins and tolerances used in signoff
• Some takeaways from this talk:
• Substantial benefit from tightening BEOL corners (= signoff)
• "Minimum cost of resilience" is a rich optimization challenge
• Chicken-egg loops in signoff definition can be broken
• Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW, 1200 to 1700) and PG power (mW, 0.30 to 0.36) across implementations 1 to 9, ranging from highest to least invested guardband; annotation: 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: per-layer capacitance distributions with 3σ corners for layers M1 and M2, combined into a homogeneous Cw corner for the M1 + M2 interconnect stack, showing the resulting pessimism. Example: worst-case capacitance corner.]
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff:
  • Derated library characterization depends on BTI + AVS
  • AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  • The circuit is affected by library characterization
[Figure: loop among Circuit, Vfinal, Vlib/VBTI and Derated Libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
  • ≈70% of the Vdd increment occurs in the first year (the remaining 30% over 9 years)
[Figure annotation: Vfinal]
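To make the front-loading concrete, assume (this is not stated on the slide) that BTI follows a power law, |ΔVth|(t) = A·t^n. The “50% of aging in year 1 of a 10-year lifetime” observation then fixes the exponent n:

```python
import math


def implied_time_exponent(fraction_at_t1: float, t1: float, t_life: float) -> float:
    """Exponent n of an assumed power law |dVth|(t) = A * t**n such that
    |dVth|(t1) / |dVth|(t_life) == fraction_at_t1."""
    return math.log(fraction_at_t1) / math.log(t1 / t_life)


def aging_fraction(t: float, t_life: float, n: float) -> float:
    """Fraction of end-of-life BTI aging reached at time t under the same law."""
    return (t / t_life) ** n


n = implied_time_exponent(0.5, 1.0, 10.0)   # 50% of aging in year 1 of 10
print(round(n, 3))  # 0.301
```

The ≈70% Vdd figure above also reflects AVS feedback (higher Vdd accelerates aging), so it does not follow from this exponent alone.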
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate for aging
  • Consumes more power
  • May fail to meet timing if desired supply voltage > Vmax
• Assume no EM fix at signoff
  • BTI degradation is checked at each step and MTTF is updated as
[Figure annotations: 30% MTTF penalty; 200 mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year for AVS schedules S1-S5; MTTF (years) of DMA for S1-S5; annotation: 12 years MTTF penalty]
What Is “Signoff”?
• Foundation of the contract between design house and foundry
  • “Chip should work”: a stack of models, margins and analyses
  • Function, timing, signal integrity, power integrity, …
• Problem: margins = pessimism → overdesign, schedule delay
[Figure: “margin stack” for voltage signoff; layers: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd; axis: voltage (operating voltage)]
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv: Δdjv
• Assumptions:
  • Δdjv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
  • 28 itf files (27 variation sources + Ytyp)
  • Routed netlist → RC extraction → STA → Δdjv

$$\Delta d_{j,v} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

Note: path delay includes gate and wire delays.
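A minimal sketch of the sensitivity computation, assuming the 28 STA runs have already produced path delays at Ytyp and at each single-source corner Y_v (the array layout and the +3σ skew convention per itf file are assumptions):

```python
import numpy as np


def delay_sensitivities(d_typ, d_skewed):
    """Per-sigma path-delay sensitivities, Δd[j, v] = (d_j(Y_v) - d_j(Y_typ)) / 3.

    d_typ:    shape (n_paths,), path delays from STA with the typical itf (Y_typ)
    d_skewed: shape (n_paths, n_sources), column v holds path delays from STA
              with itf Y_v, in which only source z_v is skewed by +3 sigma
    """
    d_typ = np.asarray(d_typ, dtype=float)
    d_skewed = np.asarray(d_skewed, dtype=float)
    return (d_skewed - d_typ[:, None]) / 3.0


# two paths, three variation sources (toy numbers, ns)
sens = delay_sensitivities([1.00, 2.00],
                           [[1.03, 0.97, 1.00],
                            [2.06, 2.00, 2.03]])
```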
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• Σ = λλᵀ (λ is obtained by Cholesky decomposition)
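Under the linear model, the standard identity Var(Δd_jᵀz) = Δd_jᵀΣΔd_j gives each path's delay sigma, and the Cholesky factor λ turns it into a vector norm. A sketch with NumPy (the function name is illustrative):

```python
import numpy as np


def path_delay_sigma(sens, corr):
    """Standard deviation of each path delay under the linear model
    d_j = d_j(Y_typ) + sum_v Δd[j, v] * z_v, with z ~ N(0, Σ).

    sens: (n_paths, n_sources) per-sigma sensitivities Δd[j, v]
    corr: (n_sources, n_sources) correlation matrix Σ
    """
    lam = np.linalg.cholesky(np.asarray(corr, dtype=float))  # Σ = λ λᵀ, λ lower-triangular
    # z = λ u with independent u ~ N(0, I), so d_j - d_j(Y_typ) = (Δd_jᵀ λ) u
    # and σ_j = ||λᵀ Δd_j||
    return np.linalg.norm(np.asarray(sens, dtype=float) @ lam, axis=1)
```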
[Equation: throughput as a function of clock period T, error rate ER [Kahng10] and number of recovery cycles r]
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints
  • Allows replacement of a Razor FF with a normal FF
• Trade off cost of resilience vs. datapath optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other → failed design or overdesign
  • Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cones
[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B and C; negative slacks = failed design, positive slacks = overdesign]
• Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
• Guideline: search for signoff modes within the design cone to reduce overdesign
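A toy check of design-cone membership and dominance. It deliberately reduces a design to a single critical path whose delay follows an alpha-power law in Vdd with one fixed threshold voltage; `VTH`, `ALPHA` and the function names are hypothetical, and real design cones depend on the implemented Vt mix:

```python
from dataclasses import dataclass

VTH, ALPHA = 0.35, 1.3  # hypothetical alpha-power-law parameters (V, unitless)


@dataclass(frozen=True)
class Mode:
    vdd: float   # supply voltage (V)
    freq: float  # clock frequency (any consistent unit)


def rel_delay(vdd: float) -> float:
    """Critical-path delay up to a constant factor (alpha-power law)."""
    return vdd / (vdd - VTH) ** ALPHA


def in_design_cone(target: Mode, signoff: Mode, rtol: float = 1e-9) -> bool:
    """True if a circuit signed off exactly at `signoff` (zero slack there)
    has non-negative slack when operated at `target`."""
    delay_at_target = (1.0 / signoff.freq) * rel_delay(target.vdd) / rel_delay(signoff.vdd)
    return delay_at_target <= (1.0 / target.freq) * (1.0 + rtol)


def dominates(a: Mode, b: Mode) -> bool:
    """Signoff at a leaves positive slack at b, but not the other way round."""
    return in_design_cone(b, a) and not in_design_cone(a, b)


def equivalent(a: Mode, b: Mode) -> bool:
    """Equivalent dominance: each mode lies in the other's design cone."""
    return in_design_cone(b, a) and in_design_cone(a, b)
```

For example, `Mode(1.0, 1.0)` dominates `Mode(1.0, 0.8)`: signing off at the faster mode overdesigns for the slower one.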
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough → scalable
[Figure: global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)]
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); curves: 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iteration
• real: power from actually implemented circuits
• Design: AES, f = 700 MHz
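The sample-and-refine idea can be sketched with a quadratic surrogate for power versus signoff voltage. The quadratic model form and the `run_spr` callback (standing in for one SP&R run) are assumptions for illustration, not the method's actual power model:

```python
from typing import Callable, Tuple

import numpy as np


def adaptive_search(run_spr: Callable[[float], float], v_lo: float, v_hi: float,
                    n_init: int = 3, n_iter: int = 2) -> Tuple[float, float]:
    """Surrogate-model search for the lowest-power signoff voltage.

    run_spr(v) stands in for one synthesis/place-and-route run signed off at
    voltage v and returns its power (hypothetical callback). A quadratic P(v)
    model is fit to all samples, its minimizer becomes the next sample, and
    the model is refit with the new sample.
    """
    vs = list(np.linspace(v_lo, v_hi, n_init))
    ps = [run_spr(v) for v in vs]
    for _ in range(n_iter):
        a, b, _c = np.polyfit(vs, ps, 2)          # P(v) ≈ a v² + b v + c
        if a > 0:
            v_next = float(np.clip(-b / (2 * a), v_lo, v_hi))
        else:                                     # model not convex: reuse best sample
            v_next = float(vs[int(np.argmin(ps))])
        vs.append(v_next)
        ps.append(run_spr(v_next))
    best = int(np.argmin(ps))
    return float(vs[best]), float(ps[best])
```

Only `n_init + n_iter` implementation runs are spent, which mirrors the slide's point that a small constant number of runs suffices.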
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
[Figure: closed-loop AVS monitor classes: design-dependent replica, in-situ monitor, generic monitor]
• Generic monitor: does not capture design-specific performance variation
• This work: tunable monitor for closed-loop AVS
  • Can be applied as a generic monitor
  • Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit:
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: RO stages with 1-bit control pins selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO for large, medium and small margin; panels MUL and EXU; bars: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: flow: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]
Toward Holistic Modeling, Margining and Tolerance of IC Variability
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  • E.g., there are paths with αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
• Gtbc = set of paths that can be safely signed off using TBC = (paths with Δdcw > Acw) OR (paths with Δdrcw > Arcw)
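Classifying paths into Gtbc and Gcbc from the three corner delays is a threshold test. A sketch, assuming the thresholds are expressed in percent of d(Ytyp); `PathDelays` and `split_groups` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PathDelays:
    name: str
    d_typ: float  # path delay at Y_typ
    d_cw: float   # path delay at the C-worst corner Y_cw
    d_rcw: float  # path delay at the RC-worst corner Y_rcw


def split_groups(paths: Iterable[PathDelays], a_cw: float,
                 a_rcw: float) -> Tuple[List[str], List[str]]:
    """Return (G_tbc, G_cbc); thresholds a_cw and a_rcw are in % of d(Y_typ)."""
    g_tbc, g_cbc = [], []
    for p in paths:
        dd_cw = 100.0 * (p.d_cw - p.d_typ) / p.d_typ
        dd_rcw = 100.0 * (p.d_rcw - p.d_typ) / p.d_typ
        # large relative delay variation at either corner -> small α -> safe for TBC
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(p.name)
        else:
            g_cbc.append(p.name)
    return g_tbc, g_cbc
```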
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay; find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and Δd at RC-worst corner (%), with thresholds Arcw and Acw]
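One way to read “find Arcw and Acw for a given α” is: choose the smallest threshold such that no representative path above it has α larger than the target, then add the +1% sampling margin. A sketch under that interpretation (run once with C-worst data for Acw and once with RC-worst data for Arcw; the function name is hypothetical):

```python
from typing import Sequence


def extract_threshold(delta_pct: Sequence[float], alpha: Sequence[float],
                      alpha_target: float, margin_pct: float = 1.0) -> float:
    """Threshold A (in % of d(Y_typ)) such that every representative path
    with Δd > A has α <= alpha_target, plus a margin for sampling error.

    delta_pct[i] is path i's Δd at one corner; alpha[i] is its scaling factor.
    """
    # any path whose α exceeds the target must stay at or below the threshold
    too_large = [d for d, a in zip(delta_pct, alpha) if a > alpha_target]
    base = max(too_large) if too_large else 0.0  # no violator: conservative 0%
    return base + margin_pct
```

A smaller target α leaves more paths above the target, which raises the threshold; this matches the slide's “smaller α → larger thresholds → fewer paths in GTBC”.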
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100 ps and 53 ns, respectively
• Timing violations are reduced by 24% to 100%
• TBC-0.6 gives more benefit
  • Tradeoff between reduced margin and the number of paths that can use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD at CBC, TBC-0.5, TBC-0.6 and TBC-0.7]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
Power penalty: Razor 30% [Das08]; Razor-Lite ~0% [Kim13]; TIMBER 100% [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: flow: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution; OR-tree insertion]
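A greedy sketch of the loop in this flow. It is hypothetical throughout: the design state is only the set of endpoints switched to normal FFs, `sensitivity` and `energy` are caller-supplied stand-ins for the talk's sensitivity function and for SkewOpt plus energy evaluation, and OR-tree insertion is not modeled:

```python
from typing import Callable, FrozenSet, List, Tuple


def combined_optimization(endpoints: List[str],
                          sensitivity: Callable[[str], float],
                          energy: Callable[[FrozenSet[str]], float],
                          k: int) -> Tuple[FrozenSet[str], float]:
    """Greedy SEOpt-style loop (hypothetical sketch).

    The state is the set of endpoints converted to normal FFs; all other
    endpoints keep error-tolerant FFs. `sensitivity` ranks endpoints
    (higher = insert margin first) and `energy` evaluates a state.
    """
    normal: FrozenSet[str] = frozenset()          # initial: all FFs error-tolerant
    best, best_e = normal, energy(normal)
    queue = sorted(endpoints, key=sensitivity, reverse=True)
    while queue:
        batch, queue = queue[:k], queue[k:]       # margin insertion on K paths
        normal = normal | frozenset(batch)        # replace their FFs with normal FFs
        e = energy(normal)                        # SkewOpt would run before this check
        if e < best_e:                            # energy < min energy: save solution
            best, best_e = normal, e
    return best, best_e
```

Keeping the best state seen, rather than the last one, reflects the flow's “save current solution” step: converting too many endpoints eventually costs more datapath margin than the resilience it saves.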
Benefit of Low-Cost Resilience
• Reference flows:
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO for large, medium and small margin; panels MUL and EXU; bars: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
  • AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS with circuit, on-chip aging monitor and voltage regulator; circuit performance/frequency and Vdd vs. time, without AVS and with AVS, relative to target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: three-step flow. Step 1: VBTI, |ΔVt|; Step 2: Vlib, derated library; Step 3: circuit implementation and signoff; BTI degradation and AVS determine Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: VBTI, |ΔVt|; Step 2: Vlib, derated library; Step 3: circuit implementation and signoff; BTI degradation and AVS determine Vfinal]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan and Siddhartha Nath
Power vs Area Across Different Signoffs
• “Knee” point for balanced area and power tradeoff
• Optimistic signoff corner: AVS increases supply voltage aggressively to compensate for aging
  • Large lifetime energy overhead
  • May fail to meet timing if desired supply voltage > Vmax
Also Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time for nominal (NOM) and overdrive (OD) modes with durations tnom and tOD; power of circuits with different overdrive modes]
• Different overdrive modes: 26% power range
• Fixed fOD: still 14% power range
fnom = 800 MHz, Vnom = 0.8 V
Also Tunable Monitors → Less Margin
• Default config:
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13 mV margin
• Aggressive config: Vmin_est < Vmin_chip → some chips will fail
• Optimized config:
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Benefits of tunability:
  • Compensate for the difference between model and silicon
  • Recover margin when variation is reduced due to an improved process
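Selecting the optimized configuration can be sketched as a search over the 2^n control words of an n-stage tunable RO (the design slide shows three 1-bit control pins). The knob direction (more high-resistance passgates lowers Vmin_est) follows the description above; the per-chip `vmin_est` model and the linear 5 mV/stage example are hypothetical:

```python
from itertools import product
from typing import Callable, Tuple


def best_config(vmin_est: Callable[[Tuple[int, ...]], float], vmin_chip: float,
                n_bits: int = 3) -> Tuple[int, ...]:
    """Choose the RO control word with the smallest Vmin_est that still
    satisfies Vmin_est >= Vmin_chip (never an aggressive config).

    bits[i] = 1 selects the high-resistance passgate in stage i. vmin_est is
    a hypothetical per-chip model or measurement of the monitor's Vmin.
    """
    candidates = []
    for bits in product((0, 1), repeat=n_bits):
        est = vmin_est(bits)
        if est >= vmin_chip:                     # keep only safe configurations
            candidates.append((est - vmin_chip, bits))
    if not candidates:
        raise ValueError("no configuration guards Vmin_chip")
    return min(candidates)[1]                    # smallest remaining margin
```

With `vmin_est = lambda bits: 0.713 - 0.005 * sum(bits)` and `vmin_chip = 0.700`, the all-low default keeps a 13 mV margin, three high-resistance stages would be aggressive, and the search picks a word with two high-resistance stages.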
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, across the lifetime
• Benefit of the “next node” is increasingly hard to find
  • An entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to taped-out IC properties by the models, margins and tolerances used in signoff
• Some takeaways from this talk:
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) vs. implementation (1-9), from highest invested guardband to least invested guardband; annotation: 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; the C-3σ corners of layers M1 and M2 combine into a homogeneous Cw corner for the M1+M2 interconnect stack, beyond the stack's 3σ (pessimism)]
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
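A back-of-the-envelope estimate of that pessimism, assuming n layers contribute equally to capacitance, each with unit sigma and a uniform pairwise correlation γ (a simplification of the module-based correlation matrix):

```python
import math


def homogeneous_ratio(n_layers: int, gamma: float) -> float:
    """Statistical 3σ of the summed capacitance divided by the homogeneous
    corner value (every layer at its own 3σ), for n equal-weight, unit-sigma
    layers with uniform pairwise correlation gamma. Values < 1 quantify the
    pessimism of the homogeneous corner."""
    variance = n_layers + n_layers * (n_layers - 1) * gamma  # Var(z_1 + ... + z_n)
    return math.sqrt(variance) / n_layers                    # (3·σ_sum) / (3·n)


print(round(homogeneous_ratio(2, 0.5), 3))  # 0.866
```

For γ < 1 the ratio falls as layers are added, which is the trend stated above; with γ = 1 the homogeneous corner is exact.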
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources
Σ:
              M1         M2         M3         M4
          ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1  ΔW     1  0  0    γ  0  0    γ  0  0    0  0  0
    ΔT     0  1  0    0  γ  0    0  γ  0    0  0  0
    ΔH     0  0  1    0  0  γ    0  0  γ    0  0  0
M2  ΔW     γ  0  0    1  0  0    γ  0  0    0  0  0
    ΔT     0  γ  0    0  1  0    0  γ  0    0  0  0
    ΔH     0  0  γ    0  0  1    0  0  γ    0  0  0
M3  ΔW     γ  0  0    γ  0  0    1  0  0    0  0  0
    ΔT     0  γ  0    0  γ  0    0  1  0    0  0  0
    ΔH     0  0  γ    0  0  γ    0  0  1    0  0  0
M4  ΔW     0  0  0    0  0  0    0  0  0    1  0  0
    ΔT     0  0  0    0  0  0    0  0  0    0  1  0
    ΔH     0  0  0    0  0  0    0  0  0    0  0  1
• Variation sources with the same variation type in the same process module are correlated with γ = 0.5
• Variation sources in different process modules are independent
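A sketch that builds Σ from a layer-to-module map; for a 9-layer stack it yields the 27 × 27 matrix used in the statistical timing analysis. The function name and the `module_of_layer` encoding are illustrative:

```python
from typing import Dict, Sequence

import numpy as np


def beol_correlation(module_of_layer: Dict[str, int], gamma: float,
                     types: Sequence[str] = ("W", "T", "H")) -> np.ndarray:
    """Correlation matrix of per-layer BEOL variation sources.

    Sources are ordered layer by layer (ΔW, ΔT, ΔH for each layer). Two
    distinct sources are correlated with coefficient gamma iff they have the
    same variation type and their layers share a process module; otherwise
    they are independent.
    """
    sources = [(layer, t) for layer in module_of_layer for t in types]
    corr = np.eye(len(sources))
    for i, (li, ti) in enumerate(sources):
        for j, (lj, tj) in enumerate(sources):
            if i != j and ti == tj and module_of_layer[li] == module_of_layer[lj]:
                corr[i, j] = gamma
    return corr


# the 4-layer example above: M1-M3 share a process module, M4 is separate
sigma = beol_correlation({"M1": 0, "M2": 0, "M3": 0, "M4": 1}, gamma=0.5)
```

The result can be passed directly to `np.linalg.cholesky` to obtain λ.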
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]
• Variations in different layers are not fully correlated
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)]/d(Ytyp), and vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)]/d(Ytyp); paths dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and paths dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0, i.e., a CBC can underestimate their delay variations
  • But these paths have larger delays at the other corner
  • The C-worst corner underestimates their delay variations, but these paths are dominated by the RC-worst corner, where α < 1.0 and delay variations are covered
• Paths are more sensitive to either R or C
  • Using only RC-worst or only C-worst corners will underestimate delay variations
  • Both RC- and C-worst corners are needed to cover process variations
  • In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: 3σj/d(Ytyp) × 100 vs. Δdj(Yrcw)/dj(Ytyp) × 100]
• Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
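Assuming α_j is the ratio of the two plotted quantities, α_j = 3σ_j / Δd_j at the dominant corner, and that delay is linear in the BEOL variation sources, the scaling factor and the delay at a tightened corner can be sketched as follows (function names are hypothetical):

```python
def scaling_factor(three_sigma: float, d_typ: float, d_cw: float, d_rcw: float) -> float:
    """α_j = 3σ_j / Δd_j at the dominant BEOL corner (assumed definition)."""
    dominant = max(d_cw - d_typ, d_rcw - d_typ)
    return three_sigma / dominant


def tightened_corner_delay(d_typ: float, d_corner: float, alpha: float) -> float:
    """Delay at a tightened corner whose itf variation is scaled by α,
    assuming path delay is linear in the BEOL variation sources."""
    return d_typ + alpha * (d_corner - d_typ)
```

A path with α_j ≤ 0.5 keeps its statistical 3σ delay inside a corner scaled by 0.5, which is why such paths are the candidates for TBC signoff.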
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Figure: flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → GTBC / GCBC; timing analysis using TBC or CBC, with ECO using TBC or CBC until violation = 0 → done]
Experiment Setup
Design              LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   1.8       2.0       3.1
• Model BTI degradation with Vfinal throughout the lifetime
  • Aging with a flat Vfinal ≈ aging with an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time for NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at large, medium, and small margins; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]
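The tradeoff behind the "minimum achievable energy" point can be illustrated with a toy model. This hedged sketch rests on hypothetical assumptions: quadratic switching energy, an exponential timing-error-rate model, and a fixed recovery cost of `r` cycles per error. None of the constants come from the talk.

```python
import math


def energy_per_op(v, v_nom=1.0, r=10, er_nom=1e-6, k=60.0):
    """Relative energy per useful operation at supply v (toy model).

    Hypothetical assumptions, for illustration only:
      * switching energy per cycle scales as (v / v_nom) ** 2;
      * the timing-error rate grows exponentially as v drops below v_nom;
      * each error costs r extra recovery cycles at the same per-cycle energy.
    """
    error_rate = min(1.0, er_nom * math.exp(k * (v_nom - v)))
    return (v / v_nom) ** 2 * (1.0 + error_rate * r)


voltages = [0.80 + 0.005 * i for i in range(41)]   # 0.80 V ... 1.00 V
best_v = min(voltages, key=energy_per_op)          # minimum achievable energy
print(f"best Vdd ~ {best_v:.3f} V: energy {energy_per_op(best_v):.3f} "
      f"vs {energy_per_op(1.0):.3f} at nominal")
```

Lowering Vdd first saves energy quadratically. Below some point, the recovery cost of timing errors dominates, so the minimum lies strictly inside the voltage range.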
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow: Initial placement (all FFs = error-tolerant FFs) → Energy < min energy? → Save current solution
• SEOpt: margin insertion on K paths based on a sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
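The loop structure of this flow can be sketched as a higher-order function. The callables `seopt`, `skewopt`, and `energy` are hypothetical stand-ins for the real optimizations. Stopping at the first non-improving iteration is an assumed reading of the "Energy < min energy" decision.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")  # hypothetical design-state type


def overall_optimization(initial: S,
                         seopt: Callable[[S], S],
                         skewopt: Callable[[S], S],
                         energy: Callable[[S], float],
                         max_iters: int = 20) -> Tuple[S, float]:
    """Alternate SEOpt and SkewOpt while total energy keeps improving.

    initial: placement with all FFs error-tolerant.
    seopt:   margin insertion on K sensitive paths, then swap the covered
             error-tolerant FFs for normal FFs (hypothetical callable).
    skewopt: activity-aware clock skew optimization (hypothetical callable).
    """
    best, min_energy = initial, energy(initial)
    current = initial
    for _ in range(max_iters):
        current = skewopt(seopt(current))
        e = energy(current)
        if e < min_energy:              # "Energy < min energy?" -> save solution
            best, min_energy = current, e
        else:                           # assumed stopping rule: no improvement
            break
    return best, min_energy


# Toy usage: the state is the number of FFs converted to normal FFs, and the
# made-up energy curve is lowest when three FFs are converted.
result = overall_optimization(0, lambda k: k + 1, lambda k: k,
                              lambda k: (k - 3) ** 2 + 10.0)
print(result)  # (3, 10.0)
```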
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
• E.g., there are paths with αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
Gtbc = set of paths that can be safely signed off using TBC = (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
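The grouping rule for Gtbc is a simple threshold test. The sketch below assumes Δd values are given in percent relative to d(Ytyp). The class name, function name, and threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathDelta:
    """Relative delay shifts of one timing path (percent of d(Ytyp))."""
    name: str
    dd_cw: float    # [d(Ycw)  - d(Ytyp)] / d(Ytyp) * 100
    dd_rcw: float   # [d(Yrcw) - d(Ytyp)] / d(Ytyp) * 100


def split_paths(paths, a_cw, a_rcw):
    """Return (G_TBC, G_CBC) path names using the Gtbc threshold rule."""
    g_tbc, g_cbc = [], []
    for p in paths:
        if p.dd_cw > a_cw or p.dd_rcw > a_rcw:
            g_tbc.append(p.name)    # sign off with tightened BEOL corners
        else:
            g_cbc.append(p.name)    # keep conventional BEOL corners
    return g_tbc, g_cbc


paths = [PathDelta("p1", 6.0, 2.0), PathDelta("p2", 1.0, 1.5), PathDelta("p3", 2.0, 7.0)]
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=5.0)   # hypothetical thresholds
print(g_tbc, g_cbc)   # ['p1', 'p3'] ['p2']
```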
Determining α, Arcw, and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add a +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw]
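One way to turn "plot α vs. Δdelay, find the threshold for a given α" into code is sketched below. It assumes the threshold is the largest Δd of any representative path whose α exceeds the target, plus the +1% margin. The helper name and sample data are hypothetical. The same routine would run once with Δd at C-worst (for Acw) and once with Δd at RC-worst (for Arcw).

```python
def threshold_for_alpha(samples, alpha_target, margin=1.0):
    """Delay-shift threshold (%) above which representative paths have
    alpha <= alpha_target, plus a margin for sampling error.

    samples: iterable of (delta_d_percent, alpha) for representative paths.
    """
    offending = [dd for dd, a in samples if a > alpha_target]
    base = max(offending, default=0.0)   # every path with a larger shift is OK
    return base + margin


reps = [(2.0, 0.8), (3.5, 0.7), (5.0, 0.4), (8.0, 0.3)]   # hypothetical (Δd %, α)
print(threshold_for_alpha(reps, 0.6))    # 4.5
print(threshold_for_alpha(reps, 0.75))   # 3.0
```

As on the slide, a smaller target α yields a larger threshold, so fewer paths qualify for GTBC.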
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6 gives more benefits
  • Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
How to Minimize Cost of Resilience
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
Power penalty (%): Razor 30 [Das08]; Razor-Lite ~0 [Kim13]; TIMBER 100 [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow: Initial placement (all FFs = error-tolerant FFs) → Energy < min energy? → Save current solution
• SEOpt: margin insertion on K paths based on a sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• OR-tree insertion
Benefit of Low-Cost Resilience
• Reference flows:
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to the reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at large, medium, and small margins; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs → resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt on MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS block diagram (circuit, on-chip aging monitor, circuit performance, voltage regulator, Vdd); circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
Step 1: BTI degradation and AVS (VBTI, |Vt|, Vfinal)
Step 2: Vlib → derated library
Step 3: circuit implementation and signoff → circuit
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
Step 1: BTI degradation and AVS (VBTI, |Vt|, Vfinal) → Step 2: Vlib → derated library → Step 3: circuit implementation and signoff → circuit
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design/lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs Area Across Different Signoffs
[Figure: power vs. area across different signoff corners; “knee” point for balanced area and power tradeoff]
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Also Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area/power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time for nominal (NOM) and overdrive (OD) operation, with durations tnom and tOD; power of circuits with different overdrive modes, fnom = 800MHz, Vnom = 0.8V]
• Different overdrive modes: 26% power range
• With fOD fixed: still 14% power range
Also: Tunable Monitors → Less Margin
• Default config:
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Optimized config:
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Aggressive config: Vmin_est < Vmin_chip → some chips will fail
Also: Tunable Monitors → Less Margin
• Default config:
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Optimized config:
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Aggressive config: Vmin_est < Vmin_chip → some chips will fail
• Benefits of tunability:
  • Compensate for difference between model and silicon
  • Recover margin when variation is reduced due to improved process
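Choosing a monitor configuration amounts to picking the setting whose estimated Vmin is closest to the chip's Vmin without going below it. The sketch assumes calibration provides a table of Vmin_est per control-pin setting. The bit encoding and all numbers are hypothetical.

```python
def pick_monitor_config(vmin_est: dict, vmin_chip: float):
    """Choose the RO configuration whose estimated Vmin is closest to, but not
    below, the chip's measured Vmin. Returns (config, remaining margin in V).

    vmin_est maps a control-pin setting (hypothetical encoding: bit i = 1
    means stage i uses high series resistance) to the Vmin it reports.
    """
    safe = {cfg: v for cfg, v in vmin_est.items() if v >= vmin_chip}
    if not safe:
        raise ValueError("all configurations are aggressive: Vmin_est < Vmin_chip")
    best = min(safe, key=safe.get)
    return best, safe[best] - vmin_chip


# Hypothetical calibration data for a 3-bit monitor (volts).
vmin_est = {0b000: 0.813, 0b001: 0.808, 0b011: 0.803, 0b111: 0.797}
config, margin = pick_monitor_config(vmin_est, vmin_chip=0.800)
print(bin(config), round(margin * 1000, 1), "mV")
```

With these made-up numbers, the default setting 0b000 keeps a 13mV guardband. Setting 0b111 is the aggressive configuration that would let some chips fail. Setting 0b011 leaves about 3mV.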
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, and across lifetime
• The benefit of the “next node” is increasingly hard to find
  • An entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk:
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1-9, ranging from highest to least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layers M1 and M2 are each set to their C-3σ corner; the resulting homogeneous Cw corner of the M1+M2 stack lies beyond the stack's own 3σ point (pessimism)]
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layers M1 and M2 are each set to their C-3σ corner; the resulting homogeneous Cw corner of the M1+M2 stack lies beyond the stack's own 3σ point (pessimism)]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers.
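A quick calculation shows why the pessimism grows with the number of layers. Assume, hypothetically, that each of n layers adds an equal-σ Gaussian term to the stack capacitance, with the same correlation γ between every pair of layers. The homogeneous corner pushes every layer to +3σ, a total shift of 3nσ. The stack's true 3σ shift is only 3σ·sqrt(n + n(n−1)γ).

```python
import math


def homogeneous_pessimism(n_layers: int, gamma: float) -> float:
    """Homogeneous-corner shift divided by the stack's true 3-sigma shift.

    Assumes each layer adds an equal-sigma Gaussian term to the stack
    capacitance and every pair of layers has correlation gamma.
    """
    corner_shift = 3.0 * n_layers                          # every layer at +3 sigma
    stack_var = n_layers + n_layers * (n_layers - 1) * gamma
    true_shift = 3.0 * math.sqrt(stack_var)                # +3 sigma of the sum
    return corner_shift / true_shift


for n in (1, 2, 4, 9):
    print(n, round(homogeneous_pessimism(n, 0.5), 3))
```

With γ = 1 the ratio is exactly 1 for any number of layers. With γ < 1 it grows with the layer count, reaching about 1.34 for nine layers at γ = 0.5 in this model.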
Correlation Matrix
• Let Σ be the correlation matrix for variation sources:

            M1          M2          M3          M4
            ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW      1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT      0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH      0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW      γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT      0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH      0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW      γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT      0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH      0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW      0  0  0     0  0  0     0  0  0     1  0  0
    ΔT      0  0  0     0  0  0     0  0  0     0  1  0
    ΔH      0  0  0     0  0  0     0  0  0     0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
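The structure of Σ follows a simple rule, which the sketch below encodes. The process-module labels, the function name, and the γ value are illustrative assumptions. The Cholesky call doubles as a positive-definiteness check.

```python
import numpy as np

VARIATION_TYPES = ("dW", "dT", "dH")   # ΔW, ΔT, ΔH of each metal layer


def beol_correlation(modules, gamma=0.5):
    """Correlation matrix over (layer, variation type) sources.

    modules[i] is a (hypothetical) process-module label for metal layer i.
    Off-diagonal entries are gamma iff both sources have the same variation
    type and their layers share a process module; otherwise they are 0.
    """
    k = len(VARIATION_TYPES)
    n = k * len(modules)
    sigma = np.eye(n)
    for i in range(n):
        for j in range(n):
            layer_i, type_i = divmod(i, k)
            layer_j, type_j = divmod(j, k)
            if i != j and type_i == type_j and modules[layer_i] == modules[layer_j]:
                sigma[i, j] = gamma
    return sigma


# M1-M3 share one process module and M4 is in another, as in the matrix above.
sigma = beol_correlation(["A", "A", "A", "B"], gamma=0.5)
lam = np.linalg.cholesky(sigma)   # raises LinAlgError if sigma is not positive definite
print(sigma.shape, np.allclose(lam @ lam.T, sigma))
```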
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 at 60%]
• Variations in different layers are not fully correlated
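The metric on the x-axis, the largest share of a path's wirelength on any single layer, can be computed as below. The per-layer wirelengths are hypothetical.

```python
def max_layer_ratio(wirelength_by_layer: dict) -> float:
    """Largest fraction of a path's total wirelength routed on any one layer."""
    total = sum(wirelength_by_layer.values())
    return max(wirelength_by_layer.values()) / total


def fraction_below(paths, limit=0.60):
    """Fraction of paths whose max single-layer ratio is below `limit`."""
    return sum(max_layer_ratio(p) < limit for p in paths) / len(paths)


# Hypothetical per-layer wirelengths (um) for four critical paths.
paths = [
    {"M2": 30, "M3": 40, "M4": 30},            # max ratio 0.40
    {"M2": 70, "M3": 30},                      # 0.70
    {"M1": 10, "M2": 20, "M3": 20, "M4": 50},  # 0.50
    {"M3": 55, "M5": 45},                      # 0.55
]
print(fraction_below(paths))  # 0.75
```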
Delay Variation
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
[Figure: α vs. Δdelay at RC-worst = [d(Yrcw) – d(Ytyp)]/d(Ytyp); paths dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
Delay Variation
[Figure: α vs. Δdelay at C-worst = [d(Ycw) – d(Ytyp)]/d(Ytyp) and vs. Δdelay at RC-worst = [d(Yrcw) – d(Ytyp)]/d(Ytyp); paths dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
  • Using RC-worst or C-worst only will underestimate delay variations
  • Need both RC- and C-worst corners to cover process variations
  • In the following discussions, α is defined at the dominant corner
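A path's dominant corner and its α can be computed as follows. This sketch assumes, hypothetically, that α is the ratio of the path's statistical 3σ delay variation to its delay shift at the dominant corner, with both expressed in percent of d(Ytyp). Ties are assigned to C-worst arbitrarily.

```python
def dominant_corner_alpha(dd_cw: float, dd_rcw: float, three_sigma: float):
    """Return (dominant corner, alpha at that corner) for one path.

    dd_cw, dd_rcw: delay shift at C-worst / RC-worst vs. typical (% of d(Ytyp)).
    three_sigma:   the path's statistical 3-sigma delay variation (same units).
    alpha > 1 would mean the dominant corner under-covers the variation.
    """
    if dd_cw >= dd_rcw:                  # tie goes to C-worst (arbitrary choice)
        return "C-worst", three_sigma / dd_cw
    return "RC-worst", three_sigma / dd_rcw


print(dominant_corner_alpha(2.0, 5.0, 3.0))   # ('RC-worst', 0.6)
print(dominant_corner_alpha(4.0, 1.0, 2.0))   # ('C-worst', 0.5)
```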
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: per-path 3σj/d(Ytyp) × 100 vs. Δdj(Yrcw)/dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation to preserve parametric yield
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard-cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) per metal layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow: Routed design → Timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC
  • GTBC: timing analysis using TBC → if violations remain, ECO using TBC
  • GCBC: timing analysis using CBC → if violations remain, ECO using CBC
  • Done when violation = 0
Experiment Setup
Design              LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   18        20        31
• Model BTI degradation with Vfinal throughout lifetime
  • Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI aging; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure: Vfinal across gate types; annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit    Frequency (GHz)
C5315      1.38
c7552      1.25
AES        0.89
MPEG2      1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
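The reference flow is essentially a time-stepped loop. The sketch below assumes two hypothetical callables. `aging_step` stands in for BTI aging estimation. `slack` stands in for library re-characterization plus STA at the current aging state. The toy models in the example are made up.

```python
def reference_signoff(v_init, v_max, v_step, t_final, dt, aging_step, slack):
    """Time-stepped reference signoff loop with AVS (sketch).

    aging_step(v, t, dt) -> increase in |dVth| over [t, t + dt] at supply v
    slack(v, dvth)       -> worst slack after re-characterization and STA
    Both callables are hypothetical. Returns (Vfinal or None, history).
    """
    v, t, dvth = v_init, 0.0, 0.0
    history = []
    while t < t_final - 1e-12:
        dvth += aging_step(v, t, dt)        # estimate aging at this time step
        t += dt
        while slack(v, dvth) < 0:           # update timing; AVS raises Vdd
            v = round(v + v_step, 6)        # by one Vstep at a time
            if v > v_max:
                return None, history        # modify circuit and start over
        history.append((t, v))
    return v, history                       # converged Vfinal


def toy_aging(v, t, dt):
    """Made-up power-law aging (|dVth| = 0.03 V * t**(1/6)); ignores v."""
    return 0.03 * ((t + dt) ** (1 / 6) - t ** (1 / 6))


def toy_slack(v, dvth):
    """Made-up slack: each volt of |dVth| needs one extra volt of Vdd."""
    return (v - 0.90) - dvth


v_final, hist = reference_signoff(0.90, 1.05, 0.01, 10.0, 1.0, toy_aging, toy_slack)
print(v_final, hist[:3])
```

Every time step triggers fresh timing analysis, which is why this flow has no modeling overhead yet is very slow.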
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization with seven setups:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff:
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, Vlib and VBTI, and derated libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF stress [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
  • ≈70% of the Vdd increment happens in the 1st year (remaining 30% over 9 years)
[Figure: Vdd vs. time converging to Vfinal]
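Front-loaded aging is what a sublinear power law in time produces. As a common BTI modeling choice, not anything stated on the slide, assume |ΔVth| ∝ t^n. Then the fraction of 10-year aging reached after one year is (1/10)^n.

```python
import math


def aging_fraction(t: float, t_life: float, n: float) -> float:
    """Fraction of end-of-life |dVth| reached at time t if |dVth| grows as t**n."""
    return (t / t_life) ** n


# Exponent n for which exactly half of 10-year aging occurs in year 1.
n_half = math.log(0.5) / math.log(1 / 10)    # about 0.30
# A commonly quoted reaction-diffusion exponent of 1/6, for comparison.
print(round(n_half, 3), round(aging_fraction(1, 10, 1 / 6), 3))
```

Any exponent well below 1 makes aging front-loaded. Under this assumed form, the slide's "50% in year 1" corresponds to n ≈ 0.3.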
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year (0-12) for S1-S5; MTTF (year) of DMA under S1-S5; annotation: 12 years MTTF penalty]
What is “Signoff”?
• Foundation of the contract between design house and foundry
  • “Chip should work”: a stack of models, margins, and analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd, operating voltage]
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{j,v} = \dfrac{d_j(Y_v) - d_j(Y_{\mathrm{typ}})}{3}$
• Assumptions:
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
  • 28 itf files (27 variation sources + Ytyp) → RC extraction on the routed netlist → STA → Δdjv
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (λ is obtained by Cholesky decomposition)
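The two Statistical Timing Analysis steps can be combined into a short NumPy sketch. It assumes the reading Δd_{j,v} = [d_j(Y_v) − d_j(Y_typ)]/3, i.e., each itf file skews one source by 3σ. It also assumes the linear model d_j = d_j(Y_typ) + Σ_v Δd_{j,v}·z_v with z ~ N(0, Σ). Function names and delay values are hypothetical.

```python
import numpy as np


def sensitivities(d_skewed, d_typ):
    """Per-source sensitivities dd[v] = [d_j(Y_v) - d_j(Y_typ)] / 3.

    d_skewed[v]: delay of path j extracted with the itf file for source v
                 (assumed to skew that one source by 3 sigma);
    d_typ:       delay of path j at Ytyp.
    """
    return (np.asarray(d_skewed, dtype=float) - d_typ) / 3.0


def path_sigma(sens, corr):
    """1-sigma delay variation of path j under d_j = d_typ + sens . z, z ~ N(0, corr)."""
    lam = np.linalg.cholesky(corr)            # corr = lam @ lam.T
    # z = lam @ u with u ~ N(0, I), so sens . z = (lam.T @ sens) . u
    return float(np.linalg.norm(lam.T @ sens))


corr = np.array([[1.0, 0.5],                 # two same-type sources in one
                 [0.5, 1.0]])                # process module, gamma = 0.5
sens = sensitivities([103.0, 103.0], 100.0)  # hypothetical delays (ps) -> [1, 1]
sigma = path_sigma(sens, corr)               # sqrt(1 + 1 + 2 * 0.5) = sqrt(3)
print(round(sigma, 4), round(3 * sigma, 4))  # 1-sigma and 3-sigma variation (ps)
```

The Cholesky route gives the same result as the quadratic form sqrt(sensᵀ Σ sens). It also maps directly onto Monte Carlo sampling, because z = λu turns independent normals into correlated ones.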
Throughput depends on the clock period T, the error rate ER [Kahng10], and the number of recovery cycles r
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints → allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
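A greedy, per-endpoint version of the SEOpt tradeoff is sketched below. The cost callables are hypothetical. Real fanin cones overlap, so treating endpoints independently is a simplification rather than the published algorithm.

```python
def selective_endpoint_plan(endpoints, razor_cost, tighten_cost):
    """Greedy per-endpoint choice between keeping an error-tolerant (Razor) FF
    and tightening the fanin cone so that a normal FF can be used.

    razor_cost(ep), tighten_cost(ep): hypothetical cost estimates (e.g. energy).
    Endpoints are treated independently, which ignores shared fanin logic.
    """
    plan, total = {}, 0.0
    for ep in endpoints:
        keep, tighten = razor_cost(ep), tighten_cost(ep)
        if tighten < keep:
            plan[ep], total = "normal FF", total + tighten
        else:
            plan[ep], total = "Razor FF", total + keep
    return plan, total


razor = {"e1": 3.0, "e2": 3.0, "e3": 3.0}     # made-up costs
tighten = {"e1": 1.0, "e2": 5.0, "e3": 2.5}
plan, total = selective_endpoint_plan(razor, razor.get, tighten.get)
print(plan, total)
```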
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs → resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
Toward Holistic Modeling Margining and Tolerance of IC Variabi
IC Variability
Challenge Value of Technology
Solutions Modeling Margining Tolerance
Outline
BEOL Corner Optimization
Proposed Timing Signoff Flow
Conventional BEOL Corners
Statistical RC Model
Pessimism of Conventional BEOL Corners (CBC)
Intuition on Delay Variability Across Cw RCw
Intuition on Delay Variability Across Cw RCw (2)
Scaling Factor α and Delay Variation
Find Paths for Which TBCs Can Be Used
Determining α Arcw and Acw
Benefits of Tightened BEOL Corners
Outline (2)
How to Minimize Cost of Resilience
Tradeoff Resilience Cost vs Datapath Cost
Selective-Endpoint Optimization (SEOpt)
Clock Skew Optimization (SkewOpt)
Overall Optimization Flow
Benefit of Low-Cost Resilience
Increased Benefit of Resilience with AVS
Outline (3)
Breaking Chicken-Egg Loops Less Margin
Derated Library Characterization and AVS
Library Characterization for AVS
Power vs Area Across Different Signoffs
Heuristics 1
Vfinal Estimation
Observation and Heuristic 2
Proposed Library Characterization Flow
Power vs Area for All Designs
Also Multi-Mode Signoff Choices Matter
Also Tunable Monitors Less Margin
Also Tunable Monitors Less Margin (2)
Outline (4)
Conclusions
Thank You
Backup
Power Penalty to Fix EM with AVS
Homogeneous Corners
Homogeneous Corners (2)
Correlation Matrix
Wiring Structure in Timing-Critical Paths (2)
Delay Variation
Delay Variation (2)
Non-Homogeneous Corner
Opportunities for Tightened BEOL Corners
Wiring Structure in Timing-Critical Paths (2)
Proposed Timing Signoff Flow (2)
Experiment Setup
Further Analysis
Scaling Factor Results
Benefits of Tightened BEOL Corners (1)
Heuristics 1 (2)
Vfinal Estimation (2)
Observation and Heuristic 2 (2)
Technology and Benchmark Circuits
A Reference Signoff Flow
Experiment Setup (2)
ldquoChicken and Eggrdquo Loop
Bias Temperature Instability (BTI)
Observation 1
Results for DC Scenario
Problem Signoff Corner Definition
AVS Signoff Corner Selection
AVS Impact on EM Lifetime
EM Impact on AVS Scheduling
What is ldquoSignoffrdquo
Statistical Timing Analysis (1)
Statistical Timing Analysis (2)
Resilient Designs
Resilience Cost Reduction Problem
Selective-Endpoint Optimization
Process-Aware Vdd Scaling (PVS)
Challenge Variability
Energy Reduction in AVS Context
Our Concept Mode Dominance
Our Method Global Optimization
Classes of Closed-Loop AVS
Design of RO with Tunable Vmin
Benefit of Resilience Cost Reduction
Increased Benefit of Resilience With AVS
Overall Optimization Flow (2)
15ISVLSI-2014 invited talk 140710
Determining α Arcw and Acw
Δd at C-worst corner ()Δd at RC-worst corner ()
bull Assumption critical paths in different designs have similar trends
bull Extract Arcw and Acw from a set of representative paths
bull Plot α vs Δdelay find Arcw and Acw for a given α
bull Add +1 margin on Arcw and Acw to account for sampling error
bull Smaller α larger thresholds (Arcw and Acw) fewer paths in GTBC
Δd at C-worst corner ()
Arcw Acw
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
bull WNS and TNS are reduced by up to 100ps and 53nsbull Timing violations reduced by
24 to 100
bull TBC-06 more benefits bull Tradeoff between reduced margin
vs paths which use TBC
Correlation factor γ = 05
LEON SUPERBLUE12 NETCARD
-018-016-014-012
-01-008-006-004-002
0
CBC TBC-05 TBC-06 TBC-07
WN
S (n
s)
LEON SUPERBLUE12 NETCARD
-90-80-70-60-50-40-30-20-10
0
CBC TBC-05 TBC-06 TBC-07
TNS
(ns)
LEON SUPERBLUE12 NETCARD0
200400600800
1000120014001600
CBC TBC-05 TBC-06 TBC-07
Tim
ing
viol
ation
s
17ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                 Razor         Razor-Lite    TIMBER
Power penalty    30% [Das08]   ~0% [Kim13]   100% [Choudhury09]

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart steps: initial placement (all FFs = error-tolerant FFs); if energy < min energy, save current solution; SEOpt: margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion]
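The iterate-and-save structure of the flow can be sketched as a greedy loop that keeps a solution only while energy improves. Here `seopt` and `skewopt` are hypothetical placeholders for the SEOpt and SkewOpt steps, and `Solution` is an illustrative record; the real steps perform margin insertion, FF replacement and clock skew optimization on a placed design.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Solution:
    """Hypothetical summary of one design state."""
    n_error_tolerant_ffs: int
    energy: float

Step = Callable[[Solution], Solution]

def optimize(initial: Solution, seopt: Step, skewopt: Step,
             max_iters: int = 100) -> Solution:
    """Alternate SEOpt and SkewOpt; keep the best solution while energy drops."""
    best = initial  # initial placement: all FFs are error-tolerant
    for _ in range(max_iters):
        candidate = skewopt(seopt(best))
        if candidate.energy < best.energy:
            best = candidate  # "energy < min energy": save current solution
        else:
            break  # no further improvement: return the saved solution
    return best

# Toy usage: each SEOpt step converts one error-tolerant FF to a normal FF,
# and energy is a made-up convex function of the error-tolerant FF count.
def toy_energy(n: int) -> float:
    return (n - 3) ** 2 + 20.0

result = optimize(Solution(10, toy_energy(10)),
                  seopt=lambda s: Solution(s.n_error_tolerant_ffs - 1,
                                           toy_energy(s.n_error_tolerant_ffs - 1)),
                  skewopt=lambda s: s)
print(result)  # Solution(n_error_tolerant_ffs=3, energy=20.0)
```

The greedy stopping rule is a simplification; a real flow might continue past a local plateau or explore several values of K.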
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at large, medium and small margins, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm.
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy.
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt designs (MUL and EXU), with the minimum achievable energy marked. Technology: foundry 28nm]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor senses circuit performance and drives the voltage regulator that sets Vdd; circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1, BTI degradation and AVS (VBTI, |Vt|, Vfinal); Step 2, derated library (Vlib); Step 3, circuit implementation and signoff]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1, BTI degradation and AVS (VBTI, |Vt|, Vfinal); Step 2, derated library (Vlib); Step 3, circuit implementation and signoff]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan and Siddhartha Nath
Power vs Area Across Different Signoffs
• "Knee" point for balanced area and power tradeoff
• Optimistic signoff corner
  • AVS increases supply voltage aggressively to compensate aging
  • Large lifetime energy overhead
  • May fail to meet timing if desired supply voltage > Vmax

Also Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time for nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD; power of circuits with different overdrive modes]
• Different overdrive modes: 26% power range
• With fOD fixed: still a 14% power range
fnom = 800MHz, Vnom = 0.8V
Also Tunable Monitors: Less Margin
• Aggressive config: Vmin_est < Vmin_chip; some chips will fail
• Default config
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Optimized config
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Benefits of tunability
  • Compensate for the difference between model and silicon
  • Recover margin when variation is reduced due to improved process
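To illustrate how tunability recovers margin, the sketch below picks, among candidate passgate configurations, the one with the smallest average guardband that still never underestimates any chip's Vmin. All voltages and configuration names are hypothetical; on silicon, Vmin_est would come from the monitor and Vmin_chip from chip testing, and the mean-guardband criterion is an assumption.

```python
from typing import Dict, List, Optional

def pick_config(vmin_est: Dict[str, List[float]],
                vmin_chip: List[float]) -> Optional[str]:
    """Return the safe config with the smallest mean guardband (volts), or None."""
    best, best_margin = None, float("inf")
    for cfg, est in vmin_est.items():
        margins = [e - c for e, c in zip(est, vmin_chip)]
        if min(margins) < 0:  # Vmin_est < Vmin_chip on some chip: that chip fails
            continue
        mean_margin = sum(margins) / len(margins)
        if mean_margin < best_margin:
            best, best_margin = cfg, mean_margin
    return best

vmin_chip = [0.700, 0.720, 0.710]
vmin_est = {
    "default": [0.715, 0.735, 0.725],     # low-R passgates, worst-case guardband
    "aggressive": [0.695, 0.725, 0.705],  # underestimates two chips
    "optimized": [0.703, 0.724, 0.713],   # more high-R passgates, Vmin_est ≈ Vmin_chip
}
print(pick_config(vmin_est, vmin_chip))  # optimized
```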
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
Thank You

Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, ranging from highest to least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between guardband invested in signoff and power
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner
[Figure: C distributions of layers M1 and M2, each with its 3σ corner; the homogeneous Cw corner of a stack with M1 and M2 puts both layers at their 3σ corners, which is pessimistic compared with the 3σ point of the stack]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers.
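A worked example of why the pessimism grows with layer count, under simplifying assumptions not stated on the slide: the stack capacitance is the sum of n equal per-layer terms, each with standard deviation σ, and every pair of layers has correlation ρ. A homogeneous corner shifts every layer by 3σ, i.e., 3nσ in total, while the true 3σ of the sum is 3σ·sqrt(n + n(n−1)ρ).

```python
import math

def homogeneous_pessimism(n_layers: int, rho: float) -> float:
    """Homogeneous-corner shift divided by the true 3-sigma shift of the stack."""
    corner_shift = 3.0 * n_layers                           # every layer at +3 sigma
    var_ratio = n_layers + n_layers * (n_layers - 1) * rho  # Var(sum) / sigma^2
    return corner_shift / (3.0 * math.sqrt(var_ratio))

for n in (1, 2, 4, 9):
    print(n, round(homogeneous_pessimism(n, 0.5), 3))
```

With ρ = 0.5 the ratio is 1.0, 1.155, 1.265 and 1.342 for 1, 2, 4 and 9 layers. With ρ = 1 it stays at 1.0 (no pessimism), and with ρ = 0 it grows as sqrt(n).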
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
              M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1   ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
     ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
     ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2   ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
     ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
     ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3   ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
     ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
     ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4   ΔW    0  0  0     0  0  0     0  0  0     1  0  0
     ΔT    0  0  0     0  0  0     0  0  0     0  1  0
     ΔH    0  0  0     0  0  0     0  0  0     0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
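The structure of Σ follows a simple rule, sketched below: two sources are correlated (by γ) if and only if they have the same variation type and their layers belong to the same process module. The module assignment passed in is a hypothetical input; the example above has M1 to M3 in one module and M4 in another, and a 9-layer stack yields the 27 × 27 case.

```python
from typing import List

TYPES = ("dW", "dT", "dH")  # width, thickness and dielectric-height variation

def correlation_matrix(modules: List[int], gamma: float) -> List[List[float]]:
    """Build Σ for 3 variation sources per metal layer.

    modules[i] is the process-module id of layer i; source index = 3*i + type.
    """
    n = len(modules) * len(TYPES)
    sigma = [[0.0] * n for _ in range(n)]
    for u in range(n):
        layer_u, type_u = divmod(u, len(TYPES))
        for v in range(n):
            layer_v, type_v = divmod(v, len(TYPES))
            if u == v:
                sigma[u][v] = 1.0
            elif type_u == type_v and modules[layer_u] == modules[layer_v]:
                sigma[u][v] = gamma
    return sigma

# M1-M3 share a process module, M4 is in another (as in the 12 x 12 example).
sigma = correlation_matrix([0, 0, 0, 1], gamma=0.5)
print(sigma[0])  # [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```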
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%), reaching 0.92 at 60%]
• Variations in different layers are not fully correlated
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)]/d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)]/d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  • But these paths have larger delays at the other corner
• The C-worst corner underestimates delay variations of these paths, but they are dominated by the RC-worst corner
• For α < 1.0, delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
  • Using RC-worst or C-worst only will underestimate delay variations
  • Need both RC- and C-worst corners to cover process variations
  • In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj/d(Ytyp) × 100 (%) vs. Δdj(Yrcw)/dj(Ytyp) × 100 (%) for critical paths]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
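Reading the axes of the figure, α for a path j compares its statistical delay variation with its variation at the conventional corner, α_j = 3σ_j / Δd_j(Yrcw); that definition is taken as an assumption here. Under the linear-delay assumption used for statistical timing, a corner whose BEOL variation is scaled by α shifts the path delay by about α·Δd_j, so the tightened corner still covers the path when α_j ≤ α. The function names are illustrative.

```python
def scaling_factor(three_sigma: float, delta_corner: float) -> float:
    """alpha_j = 3*sigma_j / Δd_j(corner), both in the same units."""
    return three_sigma / delta_corner

def covered_by_tbc(three_sigma: float, delta_corner: float,
                   alpha_tbc: float) -> bool:
    """True if a corner scaled by alpha_tbc still covers the path's 3-sigma variation.

    Assumes delay is linear in the BEOL variation, so the tightened corner moves
    the path delay by alpha_tbc * delta_corner.
    """
    return alpha_tbc * delta_corner >= three_sigma

# Hypothetical path: Δd(Yrcw) = 10 ps over typical, statistical 3σ = 4 ps.
print(scaling_factor(4.0, 10.0))       # 0.4
print(covered_by_tbc(4.0, 10.0, 0.5))  # True
```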
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Only the paths in GTBC are fixed using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC; GTBC: timing analysis and ECO using TBC until violations = 0; GCBC: timing analysis and ECO using CBC until violations = 0 → done]
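The grouping step can be sketched as follows, using the Δdelay definition [d(Y) − d(Ytyp)]/d(Ytyp) from the delay-variation analysis and comparing each path against the threshold of its dominant corner (the one with the larger Δdelay). The parameters a_rcw and a_cw stand for Arcw and Acw; the data structure and names are hypothetical, and in practice the delays would come from STA at Ytyp, Ycw and Yrcw.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class PathDelays:
    name: str
    d_typ: float  # delay at Ytyp
    d_cw: float   # delay at Ycw (C-worst)
    d_rcw: float  # delay at Yrcw (RC-worst)

def classify(paths: List[PathDelays], a_rcw: float, a_cw: float) -> Dict[str, List[str]]:
    """Split paths into GTBC (may use tightened corners) and GCBC (conventional)."""
    groups: Dict[str, List[str]] = {"GTBC": [], "GCBC": []}
    for p in paths:
        dd_cw = 100.0 * (p.d_cw - p.d_typ) / p.d_typ    # Δdelay at C-worst (%)
        dd_rcw = 100.0 * (p.d_rcw - p.d_typ) / p.d_typ  # Δdelay at RC-worst (%)
        # Compare against the threshold of the dominant corner.
        if dd_rcw >= dd_cw:
            eligible = dd_rcw > a_rcw
        else:
            eligible = dd_cw > a_cw
        groups["GTBC" if eligible else "GCBC"].append(p.name)
    return groups

paths = [PathDelays("a", 1.00, 1.02, 1.08),
         PathDelays("b", 1.00, 1.03, 1.01),
         PathDelays("c", 2.00, 2.12, 2.02)]
print(classify(paths, a_rcw=5.0, a_cw=4.0))  # {'GTBC': ['a', 'c'], 'GCBC': ['b']}
```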
Experiment Setup
Design              LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   18        20        31
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging), which depends on
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
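A toy version of the reference flow, reduced to one representative gate instead of full STA. Everything model-specific here is a hypothetical assumption chosen only to make the loop concrete: BTI aging follows a power law ΔVth = A(Vdd)·tⁿ with a made-up voltage acceleration, stress under a changing Vdd is accumulated with the equivalent-time method, and gate delay follows the alpha-power law.

```python
import math
from typing import Optional

# All model constants below are hypothetical, chosen only to make the sketch run.
VTH0 = 0.30    # fresh |Vth| (V)
N_EXP = 0.2    # BTI power-law time exponent
ALPHA = 1.3    # alpha-power-law exponent for gate delay

def aging_prefactor(vdd: float) -> float:
    """Hypothetical voltage acceleration of BTI: ΔVth = A(Vdd) * t**N_EXP (t in years)."""
    return 0.02 * math.exp(3.0 * (vdd - 0.9))

def gate_delay(vdd: float, vth: float) -> float:
    """Alpha-power-law delay of a representative gate (arbitrary units)."""
    return vdd / (vdd - vth) ** ALPHA

def reference_avs_flow(v_init: float, v_max: float, v_step: float,
                       years: float, steps_per_year: int = 12) -> Optional[float]:
    """Step through lifetime, update aging and Vdd; return Vfinal, or None if Vfinal > v_max."""
    target = gate_delay(v_init, VTH0)  # timing met with zero slack at t = 0
    vdd, dvth = v_init, 0.0
    dt = 1.0 / steps_per_year
    for _ in range(int(years * steps_per_year)):
        # Estimate aging over this step at the current Vdd (equivalent-time method).
        a = aging_prefactor(vdd)
        t_eq = (dvth / a) ** (1.0 / N_EXP)
        dvth = a * (t_eq + dt) ** N_EXP
        # Update timing and Vdd: raise Vdd in Vstep increments until timing is met.
        while gate_delay(vdd, VTH0 + dvth) > target:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None  # modify circuit and start over
    return vdd

print(reference_avs_flow(v_init=0.90, v_max=1.05, v_step=0.01, years=10))
```

Even in this toy, each time step needs a timing evaluation at the updated Vdd and aged Vth, which is why the real flow, with full STA and re-derated libraries at every step, is very slow.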
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  • The circuit is affected by library characterization
[Figure: loop between circuit, Vfinal and derated libraries (Vlib, VBTI)]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time, alternating ON and OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS'14]
Observation 1 [Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
  • The gap between Vdd and Vfinal reduces rapidly
  • ≈70% of the Vdd increment occurs in 1 year (the remaining 30% over 9 years)
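Observation 1 can be checked with a power-law aging model, |ΔVth| ∝ tⁿ. The model form and the exponent are assumptions here: n is simply solved from the slide's own figure of 50% of aging in year 1 of a 10-year lifetime.

```python
import math

def fraction_by(t: float, lifetime: float, n: float) -> float:
    """Fraction of end-of-life |ΔVth| reached at time t, for |ΔVth| ∝ t**n."""
    return (t / lifetime) ** n

def exponent_for(fraction: float, t: float, lifetime: float) -> float:
    """Power-law exponent n that gives `fraction` of total aging by time t."""
    return math.log(fraction) / math.log(t / lifetime)

n = exponent_for(0.5, t=1.0, lifetime=10.0)
print(round(n, 3))                          # 0.301
print(round(fraction_by(2.0, 10.0, n), 3))  # 0.616
```

Under this model about 62% of end-of-life aging is already reached after 2 years, which illustrates why most of the AVS voltage increment happens early in life.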
Results for DC Scenario
• Optimistic signoff corner
  • AVS increases supply voltage aggressively to compensate aging
  • Consumes more power
  • May fail to meet timing if desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated
[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year (0 to 12) for AVS schedules S1 to S5, and MTTF (years) of S1 to S5 (design DMA); annotation: 12 years MTTF penalty]
What Is "Signoff"?
• Foundation of the contract between design house and foundry
• "Chip should work": a stack of models, margins and analyses
• Function, timing, signal integrity, power integrity, …
Problem: margins = pessimism → overdesign, schedule delay
[Figure: "margin stack" for voltage signoff, going from nominal Vdd through static IR drop, power grid IR gradient, dynamic IR, and HCI and NBTI margins to the signoff Vdd, compared with the operating voltage]
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv: Δdj,v
• Assumptions
  • Δdj,v is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdj,v using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdj,v]
$\Delta d_{j,v} = \dfrac{d_j(Y_v) - d_j(Y_{typ})}{3}$
Note: path delay includes gate and wire delays
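The sensitivity formula amounts to one subtraction per variation source. In the sketch below, `d_skewed[v]` is a hypothetical list holding the STA delay of path j obtained with the itf file for source v; the division by 3 converts a 3σ skew into a per-σ sensitivity, assuming each Y_v skews its source by 3σ.

```python
from typing import List

def sensitivities(d_typ: float, d_skewed: List[float]) -> List[float]:
    """Δd_{j,v} = [d_j(Y_v) - d_j(Y_typ)] / 3 for each variation source v."""
    return [(d_v - d_typ) / 3.0 for d_v in d_skewed]

# Hypothetical path delays (ps): typical corner, then three skewed itf files.
print(sensitivities(100.0, [103.0, 100.0, 97.0]))  # [1.0, 0.0, -1.0]
```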
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)
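With Σ = λλᵀ, the correlated sources can be written as z = λx with independent standard-normal x, so a path whose delay is linear in z has σ_j = ||λᵀ Δd_j||, which equals sqrt(Δd_jᵀ Σ Δd_j). The pure-Python sketch below keeps the example self-contained; the function names are hypothetical, and a production flow would use a numerical library.

```python
import math
from typing import List

Matrix = List[List[float]]

def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def path_sigma(sens: List[float], corr: Matrix) -> float:
    """Standard deviation of a path delay that is linear in correlated sources.

    sens: per-sigma sensitivities Δd_{j,v}; corr: correlation matrix Σ.
    """
    lam = cholesky(corr)
    n = len(sens)
    # Coefficients of the independent sources x: (λ^T Δd_j)_k.
    coeffs = [sum(lam[v][k] * sens[v] for v in range(n)) for k in range(n)]
    return math.sqrt(sum(c * c for c in coeffs))

print(path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]))  # ≈ 1.732, i.e. sqrt(3)
```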
Resilient designs: throughput is modeled as a function of the clock period T, the error rate ER [Kahng10] and the number of recovery cycles r
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off cost of resilience vs. datapath optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy.
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs (MUL and EXU), with the minimum achievable energy marked]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency, showing the design cone of mode A and modes B and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign.
Guideline: search for signoff modes within the design cone to reduce overdesign.
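One way to operationalize the dominance test, assuming cross-mode worst slacks are already available (the worst slack at mode B of a circuit signed off at mode A, and vice versa) and treating slacks within a small tolerance of zero as "inside the design cone". The tolerance value and function name are hypothetical.

```python
def classify_modes(slack_a_at_b: float, slack_b_at_a: float,
                   tol: float = 0.005) -> str:
    """Classify two signoff modes from cross-mode worst slacks (ns)."""
    if abs(slack_a_at_b) <= tol and abs(slack_b_at_a) <= tol:
        return "equivalent dominance"  # each mode is in the other's design cone
    if slack_a_at_b > tol:
        return "A dominates B"         # circuit signed off at A is overdesigned at B
    if slack_b_at_a > tol:
        return "B dominates A"
    return "neither"                   # both cross slacks negative

print(classify_modes(0.002, -0.003))  # equivalent dominance
print(classify_modes(0.08, -0.10))    # A dominates B
```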
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models → adaptive search
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), design AES at f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iteration
• real: power from real implemented circuits
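The sample-model-refine loop can be illustrated with successive parabolic interpolation, swapped in here as a generic stand-in for the authors' power-model refinement: fit a quadratic to the three best samples, resample at its minimum, and stop when the model's minimum has already been sampled. `power_of` is a hypothetical function standing in for one SP&R run.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (signoff voltage, power)

def parabola_vertex(pts: List[Point]) -> float:
    """Abscissa of the vertex of the parabola through three points."""
    (x0, y0), (x1, y1), (x2, y2) = pts
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    return x1 - 0.5 * num / den

def adaptive_search(power_of: Callable[[float], float], v_samples: List[float],
                    iters: int = 5, tol: float = 1e-6) -> Point:
    """Sample, fit a quadratic power model to the 3 best samples, resample at its minimum."""
    pts = [(v, power_of(v)) for v in v_samples]  # each call stands in for one SP&R run
    for _ in range(iters):
        best3 = sorted(sorted(pts, key=lambda p: p[1])[:3])
        try:
            v_new = parabola_vertex(best3)
        except ZeroDivisionError:
            break  # collinear samples: the quadratic model is degenerate
        if any(abs(v_new - v) < tol for v, _ in pts):
            break  # model minimum already sampled: converged
        pts.append((v_new, power_of(v_new)))
    return min(pts, key=lambda p: p[1])

# Hypothetical power (mW) vs. signoff voltage (V) with a minimum at 1.05 V.
best_v, best_p = adaptive_search(lambda v: 10.0 * (v - 1.05) ** 2 + 1.4, [0.9, 1.0, 1.2])
print(round(best_v, 3), round(best_p, 3))  # 1.05 1.4
```

Because each model evaluation is cheap and each sample is expensive, the number of real implementation runs stays small, which is the scalability argument on the slide.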
Classes of Closed-Loop AVS
• Closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at large, medium and small margins, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context.
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs (MUL and EXU), with the minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart steps: initial placement (all FFs = error-tolerant FFs); if energy < min energy, save current solution; SEOpt: margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization]
Toward Holistic Modeling, Margining and Tolerance of IC Variability
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
bull WNS and TNS are reduced by up to 100ps and 53nsbull Timing violations reduced by
24 to 100
bull TBC-06 more benefits bull Tradeoff between reduced margin
vs paths which use TBC
Correlation factor γ = 05
LEON SUPERBLUE12 NETCARD
-018-016-014-012
-01-008-006-004-002
0
CBC TBC-05 TBC-06 TBC-07
WN
S (n
s)
LEON SUPERBLUE12 NETCARD
-90-80-70-60-50-40-30-20-10
0
CBC TBC-05 TBC-06 TBC-07
TNS
(ns)
LEON SUPERBLUE12 NETCARD0
200400600800
1000120014001600
CBC TBC-05 TBC-06 TBC-07
Tim
ing
viol
ation
s
17ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience bull Additional circuits area and power penaltiesbull Recovery from errors throughput degradationbull Large hold margin short-path padding costbull Want benefits (eg energy) to maximally outweigh costs
Razor Razor-Lite TIMBER
Razor Razor-Lite TIMBER
Power penalty 30 [Das08] ~0 [Kim13] 100 [Choudhury09]
Overall Optimization Flowbull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity aware clock skew optimization
SkewOpt
OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resiliencebull Reference flows
bull Pure-margin (PM) conventional method w only margin insertionbull Brute-force (BF) use error-tolerant FFs for timing-critical endpoints
bull Proposed method (CO) achieves up to 21 energy reduction compared to reference methods
bull Resilience benefits increase with larger process variation
PM BF CO PM BF CO PM BF CO27
29
31
33
35
37
En
erg
y (
mJ
)
PM BF CO PM BF CO PM BF CO22
26
30
34
38Energy penalty of throughput degradation
Energy penalty of additional circuits
Energy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin 1σ2σ3σ for SS corner Technology foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVSbull Adaptive voltage scaling allows a lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 17 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
086 09 094 098 10225
30
35
40
45
50pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
070 072 074 076 078 08024
26
28
30
32
34
36 pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
Technology foundry 28nm
25ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops Less Marginbull Example Interaction between reliability margin and AVS designs
bull Bias temperature instability (BTI) aging higher |ΔVth| lower fmax
bull AVS can be used to compensate for performance degradation
Circuit
Closed-loop AVS
On-chip aging
monitor
Circuit performanc
e
Voltage regulato
r
Circuit frequency
Vdd
time
time
Without AVSWith AVS
target
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib selection should consider BTI + AVS interaction
bull Aging and Vfinal are unknowns before circuit implementation
BTI degradation and AVS
Vfinal
VBTI |Vt|
Step 1
Vlib
Derated library
Step 2
Circuit implementation and
signoff
circuit
Step 3
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib depend on aging during AVS
bull Aging and Vfinal are unknowns before circuit implementation
Vlib
VBTI Derated library
|Vt| Circuit implementation and
signoff
circuitBTI degradation and AVS
Vfinal
Step 1 Step 2 Step 3
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal Vlib VBTI
bull What is the design overhead when timing libraries are not properly characterized
bull Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads Joint work with Wei-Ting Jonas Chan Tuck-Boon Chan Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Large lifetime energy overhead
bull May fail to meet timing if desired supply voltage gt Vmax
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. The capacitance distributions of layers M1 and M2, each skewed to its own 3σ point, are combined into a homogeneous Cw corner for an interconnect stack with M1 and M2, which lies beyond the stack's 3σ point (pessimism)]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
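The growth of pessimism with layer count can be checked with a toy calculation. This sketch assumes (our simplification, not the talk's model) a stack capacitance that is the sum of n equal per-layer contributions, each with the same σ and a common pairwise correlation ρ. A homogeneous corner skews every layer by 3σ at once, a total shift of 3nσ, while the statistical 3σ point of the sum sits at 3σ·√(n + n(n−1)ρ); their ratio measures the pessimism.

```python
import math


def homogeneous_pessimism(n_layers: int, rho: float) -> float:
    """Ratio of the homogeneous-corner shift to the true 3-sigma shift.

    Toy model (assumption): stack capacitance is the sum of n_layers equal
    per-layer contributions with identical sigma and pairwise correlation rho.
    Both shifts below are expressed in units of 3*sigma.
    """
    homogeneous_shift = float(n_layers)  # every layer skewed by +3 sigma together
    statistical_shift = math.sqrt(n_layers + n_layers * (n_layers - 1) * rho)
    return homogeneous_shift / statistical_shift


for rho in (1.0, 0.5, 0.0):
    ratios = [round(homogeneous_pessimism(n, rho), 2) for n in (1, 2, 4, 9)]
    print(f"rho={rho}: pessimism for 1, 2, 4, 9 layers = {ratios}")
```

With ρ = 1 the ratio stays at 1 for any number of layers; for ρ < 1 it grows with n (for ρ = 0 it reaches √n), which is the trend stated above.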
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources. For a four-layer example (M1, M2, M3, M4), with the sources of each layer ordered (ΔW, ΔT, ΔH) and I₃ the 3 × 3 identity:
$$
\Sigma =
\begin{bmatrix}
I_3 & \gamma I_3 & \gamma I_3 & 0 \\
\gamma I_3 & I_3 & \gamma I_3 & 0 \\
\gamma I_3 & \gamma I_3 & I_3 & 0 \\
0 & 0 & 0 & I_3
\end{bmatrix}
$$
• Correlation between variation sources of the same variation type in the same process module (here M1-M3): γ = 0.5
• Variation sources in different process modules (here M4 vs. M1-M3) are independent
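The block structure of Σ follows from two facts per source: its variation type and the process module of its layer. The sketch below is a minimal illustration assuming sources are ordered (ΔW, ΔT, ΔH) within each layer; the module labels "A" and "B" are hypothetical. It also computes the Cholesky factor λ (with Σ = λλᵀ) that the statistical timing analysis relies on.

```python
import numpy as np


def beol_correlation(modules, gamma=0.5, n_types=3):
    """Correlation matrix for per-layer BEOL variation sources.

    modules: process-module label for each metal layer (hypothetical labels).
    Each layer contributes n_types sources (ΔW, ΔT, ΔH). Two distinct sources
    are correlated with coefficient gamma iff they have the same type and
    their layers belong to the same process module; otherwise independent.
    """
    n = len(modules) * n_types
    sigma = np.eye(n)
    for u in range(n):
        for v in range(n):
            layer_u, type_u = divmod(u, n_types)
            layer_v, type_v = divmod(v, n_types)
            if u != v and type_u == type_v and modules[layer_u] == modules[layer_v]:
                sigma[u, v] = gamma
    return sigma


# Slide example: M1-M3 share one process module ("A"), M4 is in another ("B").
S = beol_correlation(["A", "A", "A", "B"])
L = np.linalg.cholesky(S)          # lower-triangular λ with S = λ λᵀ
print(S.shape, np.allclose(L @ L.T, S))
```

The Cholesky call only succeeds because Σ is positive definite; with γ = 0.5 each 3-layer block has eigenvalues 2, 0.5, and 0.5.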
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability 0.92 at 60%]
• Variations in different layers are not fully correlated
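The 92% statistic is a cumulative fraction, over paths, of the largest per-layer share of wirelength. A minimal sketch of that metric; the per-layer wirelengths (µm) for the three paths are hypothetical.

```python
from typing import Dict, List


def max_layer_ratio(wirelength_by_layer: Dict[str, float]) -> float:
    """Largest fraction of a path's total wirelength that sits on one layer."""
    total = sum(wirelength_by_layer.values())
    return max(wirelength_by_layer.values()) / total


def fraction_below(paths: List[Dict[str, float]], threshold: float = 0.60) -> float:
    """Fraction of paths whose max per-layer wirelength ratio is below threshold."""
    return sum(max_layer_ratio(p) < threshold for p in paths) / len(paths)


# Hypothetical per-layer wirelengths (um) for three critical paths.
paths = [
    {"M2": 120.0, "M3": 150.0, "M4": 90.0, "M5": 40.0},  # spread over layers
    {"M2": 300.0, "M3": 40.0},                           # mostly on M2
    {"M3": 100.0, "M4": 100.0, "M6": 100.0},             # evenly spread
]
print([round(max_layer_ratio(p), 3) for p in paths], fraction_below(paths))
```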
Delay Variation
• Some paths have α > 1.0: a CBC can underestimate their delay variations
• But these paths have larger delays at the other corner
[Figure: α vs. Δdelay at the C-worst and RC-worst corners, with regions “dominated by C-worst” (Δdelay at C-worst > Δdelay at RC-worst) and “dominated by RC-worst” (Δdelay at RC-worst > Δdelay at C-worst)]
The C-worst corner underestimates the delay variations of some paths, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)
Delay Variation
Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp)
• Paths are more sensitive to either R or C
• Using only the RC-worst or only the C-worst corner will underestimate delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussion, α is defined at the dominant corner
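The dominant-corner bookkeeping can be written down directly. This sketch assumes α is the ratio of a path's statistical 3σ delay shift to its delay shift at the dominant conventional corner (our reading of the axes on the “Opportunities for Tightened BEOL Corners” slide); the `PathDelays` record and the example delays (ps) are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PathDelays:
    d_typ: float   # path delay at Ytyp
    d_cw: float    # path delay at Ycw (C-worst)
    d_rcw: float   # path delay at Yrcw (RC-worst)
    sigma: float   # 1-sigma delay variation from statistical timing analysis


def relative_delta(d_corner: float, d_typ: float) -> float:
    """Δdelay at a corner: [d(Ycorner) - d(Ytyp)] / d(Ytyp)."""
    return (d_corner - d_typ) / d_typ


def dominant_corner_and_alpha(p: PathDelays) -> Tuple[str, float]:
    """Pick the corner with the larger Δdelay and return alpha there.

    alpha = 3*sigma / (d_dominant - d_typ): alpha < 1 means the dominant
    conventional corner over-covers the path's statistical 3-sigma shift.
    """
    if relative_delta(p.d_cw, p.d_typ) > relative_delta(p.d_rcw, p.d_typ):
        corner, d_dom = "C-worst", p.d_cw
    else:
        corner, d_dom = "RC-worst", p.d_rcw
    return corner, 3.0 * p.sigma / (d_dom - p.d_typ)


print(dominant_corner_and_alpha(PathDelays(100.0, 104.0, 110.0, 1.5)))  # R-sensitive path
print(dominant_corner_and_alpha(PathDelays(100.0, 108.0, 103.0, 2.0)))  # C-sensitive path
```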
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the .itf with α = 0.5
[Figure: 3σ_j/d_j(Ytyp) × 100 vs. Δd_j(Yrcw)/d_j(Ytyp) × 100 for each path j]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
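One plausible reading of “scale BEOL variation in the .itf with α = 0.5” is to multiply each parameter's deviation from typical by α. The sketch assumes that reading; the parameter names and nm values are hypothetical and not taken from any real .itf file.

```python
from typing import Dict


def tightened_corner(typ: Dict[str, float], conventional: Dict[str, float],
                     alpha: float) -> Dict[str, float]:
    """Scale every parameter's deviation from typical by alpha.

    alpha = 1 reproduces the conventional corner; alpha = 0 collapses to typical.
    """
    return {name: typ[name] + alpha * (conventional[name] - typ[name]) for name in typ}


# Hypothetical single-layer values (nm): C-worst skews W and T up and H down.
typ = {"W": 50.0, "T": 100.0, "H": 80.0}
cw = {"W": 56.0, "T": 110.0, "H": 72.0}
print(tightened_corner(typ, cw, 0.5))  # each value halfway between typical and C-worst
```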
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• This structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard-cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) per metal layer for the critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of the critical paths
• Put path j in group G_TBC if its Δdelay is larger than a threshold
• Fix only the paths in G_TBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into G_TBC and G_CBC
• G_TBC: timing analysis using TBC; ECO using TBC until violations = 0
• G_CBC: timing analysis using CBC; ECO using CBC until violations = 0
• Done when both groups have zero violations
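The path classification step can be sketched as follows. Δdelay is taken at the dominant corner (the larger of the C-worst and RC-worst relative shifts), which is an assumption on our part; the delays (ps) and the 5% threshold are hypothetical, since the talk does not give a threshold value here.

```python
from typing import Dict, List, Tuple


def classify_paths(delays: Dict[str, Tuple[float, float, float]],
                   threshold: float) -> Tuple[List[str], List[str]]:
    """Split critical paths into G_TBC and G_CBC.

    delays maps a path name to (d_typ, d_cw, d_rcw). Paths whose Δdelay at
    the dominant corner exceeds the threshold go to G_TBC (signed off with
    tightened corners); the rest stay in G_CBC (conventional corners).
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in delays.items():
        delta = max(d_cw - d_typ, d_rcw - d_typ) / d_typ
        (g_tbc if delta > threshold else g_cbc).append(name)
    return g_tbc, g_cbc


# Hypothetical delays in ps and a hypothetical 5% threshold.
delays = {
    "p1": (500.0, 520.0, 540.0),
    "p2": (500.0, 505.0, 510.0),
    "p3": (400.0, 430.0, 415.0),
}
print(classify_paths(delays, threshold=0.05))
```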
Experiment Setup
Design              LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   1.8       2.0       3.1
Heuristic 1
• Model BTI degradation with Vfinal throughout the lifetime
  • Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time for NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented yet)
• Vfinal = Vdd at end of life (to compensate for BTI aging); it depends on
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate type
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure: Vfinal for different gate types; 10mV annotation]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltage   Value
Vmax             1.05V
Vinit            0.90V
Vheur1 (DC)      0.97V
Vheur1 (AC)      0.95V
Vheur2 (DC)      0.95V
Vheur2 (AC)      0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
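The reference flow is essentially a time-stepping loop. The sketch below reproduces only that loop structure. The aging model (a power law in time with exponential voltage acceleration, combined across voltage changes by the equivalent-time method), the delay model, and every parameter value are hypothetical stand-ins for the STA runs and library characterizations the slide refers to. Raising an exception stands in for “modify the circuit and start over.”

```python
import math


def age_step(dvth: float, v: float, dt: float,
             a: float = 0.002, beta: float = 3.0, n: float = 0.2) -> float:
    """Advance |ΔVth| (V) by dt years of stress at supply v (toy model).

    At constant voltage |ΔVth| = a * exp(beta * v) * t**n. Voltage changes are
    handled with the equivalent-time method: find the stress time that yields
    the current |ΔVth| at voltage v, then continue stressing from there.
    """
    k = a * math.exp(beta * v)
    t_eq = (dvth / k) ** (1.0 / n)
    return k * (t_eq + dt) ** n


def path_delay(v: float, dvth: float, vth0: float = 0.3, k: float = 0.57) -> float:
    """Toy critical-path delay, normalized to the clock period."""
    overdrive = v - vth0 - dvth
    return math.inf if overdrive <= 0 else k / overdrive


def reference_signoff(v_init: float = 0.90, v_step: float = 0.01, v_max: float = 1.05,
                      t_final: float = 10.0, dt: float = 0.5, clock_period: float = 1.0):
    """Time-stepping reference flow: age, re-time, adjust Vdd, repeat."""
    v, dvth, t, history = v_init, 0.0, 0.0, []
    while t < t_final:
        dvth = age_step(dvth, v, dt)               # estimate aging over this step
        t += dt
        while path_delay(v, dvth) > clock_period:  # update timing; AVS raises Vdd
            v = round(v + v_step, 6)
            if v > v_max:
                raise RuntimeError("Vfinal > Vmax: modify the circuit and start over")
        history.append((t, v))
    return v, history                              # v is the converged Vfinal


v_final, history = reference_signoff()
print(v_final, history[:4])
```

Each pass of the inner loop corresponds to one AVS step of size Vstep; in a real flow every delay evaluation would be a full STA run on re-characterized libraries, which is why the slide calls this flow very slow.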
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by the circuit implementation: timing constraints, critical paths, etc.
  • The circuit is affected by library characterization
[Figure: loop among the circuit, Vfinal, Vlib/VBTI, and the derated libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
• Device aging (|ΔVth|) accumulates over time
[Figure: |Vgs| vs. time over alternating ON/OFF phases [VattikondaWC06]; accumulated |ΔVth| vs. time [TCAS’14]]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of the circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in the 1st year (the remaining 30% over 9 years)]
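The front-loading claim can be related to the exponent of a power-law aging model. Assuming (this is not stated in the talk) |ΔVth| ∝ tⁿ at constant stress, the share of 10-year aging reached after year 1 is 0.1ⁿ, so the 50% figure corresponds to n = log 0.5 / log 0.1 ≈ 0.30; the commonly quoted reaction-diffusion exponent n = 1/6 would put about 68% of the aging in the first year.

```python
import math


def fraction_of_lifetime_aging(t_years: float, n: float,
                               lifetime_years: float = 10.0) -> float:
    """Share of end-of-life |ΔVth| reached after t_years if |ΔVth| ∝ t**n."""
    return (t_years / lifetime_years) ** n


def exponent_for_fraction(fraction: float, t_years: float = 1.0,
                          lifetime_years: float = 10.0) -> float:
    """Power-law exponent n for which `fraction` of the aging occurs by t_years."""
    return math.log(fraction) / math.log(t_years / lifetime_years)


n_half = exponent_for_fraction(0.5)
print(round(n_half, 3))                                  # exponent matching "50% in year 1"
print(round(fraction_of_lifetime_aging(1.0, 1 / 6), 3))  # first-year share for n = 1/6
```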
Results for DC Scenario
Optimistic signoff corner
• AVS increases the supply voltage aggressively to compensate for aging
  • Consumes more power
  • May fail to meet timing if the desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
30% MTTF penalty
200mV voltage compensation
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year (0-12) for AVS schedules S1-S5, and MTTF (years) of design DMA under S1-S5]
12 years MTTF penalty
What Is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, and analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff, relating nominal Vdd, operating voltage, and signoff Vdd through margins for static IR drop, power grid IR gradient, dynamic IR, HCI, and NBTI]
Problem: margins = pessimism, overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity Δd_jv of path p_j to variation source z_v
• Assumptions
  • The delay shift is linear in each variation source
  • Variation sources are normally distributed
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 .itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → Δd_jv
$$
\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}
$$
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• Σ = λλᵀ (λ is obtained by Cholesky decomposition)
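Under the linear model, the sensitivities Δd_jv and Σ are all that is needed to obtain each path's delay sigma. The sketch below assumes each Y_v is a single-source 3σ corner (so dividing by 3 gives a 1σ sensitivity) and that z ~ N(0, Σ) with unit-variance sources; writing z = λx with x ~ N(0, I) gives σ_j = ‖λᵀΔd_j‖, which equals √(Δd_jᵀ Σ Δd_j). The three-source numbers are hypothetical.

```python
import numpy as np


def sensitivities(d_typ, d_corners):
    """Δd_jv = [d_j(Y_v) - d_j(Y_typ)] / 3 for each single-source 3-sigma corner Y_v."""
    return (np.asarray(d_corners, dtype=float) - d_typ) / 3.0


def path_sigma(delta_d, corr):
    """1-sigma delay variation of one path under the linear model.

    delta_d: per-source 1-sigma sensitivities Δd_jv (length V).
    corr:    V x V correlation matrix Σ of the unit-variance sources z.
    With Σ = λλᵀ and z = λx, x ~ N(0, I), the delay shift is (λᵀΔd_j)·x.
    """
    lam = np.linalg.cholesky(corr)
    return float(np.linalg.norm(lam.T @ np.asarray(delta_d, dtype=float)))


# Hypothetical path: typical delay 100 ps, three variation sources.
d = sensitivities(100.0, [103.0, 106.0, 100.0])
corr = np.array([[1.0, 0.5, 0.0],
                 [0.5, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
print(d, path_sigma(d, corr))
```

Going through λ rather than Σ directly is convenient when the same factor is reused for many paths or for Monte Carlo sampling of z.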
[Equation: throughput as a function of clock period T, error rate ER [Kahng10], and number of recovery cycles r]
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off the cost of resilience vs. datapath optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
  • Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • The modes are in each other's design cones
[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
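The dominance relation can be phrased in terms of cross-mode slack. This sketch treats a mode as lying in another mode's design cone when the cross-mode worst slack is within a tolerance of zero, since the slide classifies both positive slack (overdesign) and negative slack (failed design) as outside the cone. The 5ps tolerance and the slack values (ns) are hypothetical choices for illustration.

```python
def relation(cross_slack: float, tol: float = 0.005) -> str:
    """Classify one mode against a circuit signed off at another mode.

    cross_slack: worst slack (ns) at the other mode of the circuit signed
    off at this mode. Positive beyond tol means this mode dominates the
    other (overdesign); negative beyond tol means the design fails there.
    """
    if cross_slack > tol:
        return "dominates (overdesign)"
    if cross_slack < -tol:
        return "fails"
    return "within design cone"


def equivalent_dominance(slack_a_at_b: float, slack_b_at_a: float,
                         tol: float = 0.005) -> bool:
    """True when each mode lies in the other's design cone (neither dominates)."""
    return (relation(slack_a_at_b, tol) == "within design cone"
            and relation(slack_b_at_a, tol) == "within design cone")


print(relation(0.12), relation(-0.2), equivalent_dominance(0.002, -0.003))
```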
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoids circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); design AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from real implemented circuits
Classes of Closed-Loop AVS
• Classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from a 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with the safety margin
[Figure: energy (mJ) of PM, BF, and CO at large, medium, and small margins for MUL and EXU, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of the clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save the current solution]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

Error-tolerant FF   Power penalty
Razor               30% [Das08]
Razor-Lite          ~0% [Kim13]
TIMBER              100% [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion; if energy < min energy, save the current solution]
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• The proposed method (CO) achieves up to 21% energy reduction compared to the reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, and CO at large, medium, and small margins for MUL and EXU, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for the SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for the performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor measures circuit performance and a voltage regulator sets the circuit's Vdd; circuit frequency and Vdd vs. time without and with AVS, relative to the target]
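The closed-loop AVS in the figure can be sketched as a simple monitor-driven controller. The `toy_monitor` model (normalized frequency that degrades as BTI raises |ΔVth| following a power law in time), the 0.95 target, and the voltage values are hypothetical; a real controller would read the on-chip aging monitor and command the voltage regulator.

```python
from typing import Callable, List


def closed_loop_avs(monitor: Callable[[float, float], float], target: float,
                    v_init: float, v_step: float, v_max: float,
                    times: List[float]) -> List[float]:
    """Return the Vdd applied at each time in `times`.

    At each time point the controller reads the monitor and raises Vdd in
    v_step increments until the monitored performance meets the target or
    the next step would exceed v_max. Vdd is never lowered in this sketch.
    """
    v, schedule = v_init, []
    for t in times:
        while monitor(v, t) < target and round(v + v_step, 6) <= v_max:
            v = round(v + v_step, 6)
        schedule.append(v)
    return schedule


def toy_monitor(v: float, t_years: float) -> float:
    """Toy normalized frequency; |ΔVth| is assumed to reach 45mV at 10 years."""
    dvth = 0.045 * (t_years / 10.0) ** 0.2
    return (v - 0.3 - dvth) / 0.6


schedule = closed_loop_avs(toy_monitor, target=0.95, v_init=0.90, v_step=0.01,
                           v_max=1.05, times=[float(t) for t in range(11)])
print(schedule)
```

In this toy setting Vdd rises from 0.90V to 0.92V over ten years, with the first step arriving in year 2; an actual schedule depends on the monitor and the aging behavior of the real design.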
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: BTI degradation (|Vt|) at VBTI; Step 2: derated library at Vlib; Step 3: circuit implementation and signoff; BTI degradation and AVS determine Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: BTI degradation (|Vt|) at VBTI; Step 2: derated library at Vlib; Step 3: circuit implementation and signoff; BTI degradation and AVS determine Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design/lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs Area Across Different Signoffs
[Figure: power vs. area across different signoff corners; “knee” point for a balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases the supply voltage aggressively to compensate for aging
  • Large lifetime energy overhead
  • May fail to meet timing if the desired supply voltage > Vmax
Also: Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time for nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD; power of circuits with different overdrive modes: 26% power range across overdrive modes, still a 14% power range with fOD fixed; fnom = 800MHz, Vnom = 0.8V]
Also: Tunable Monitors, Less Margin
• Default config
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Aggressive config: Vmin_est < Vmin_chip; some chips will fail
• Optimized config
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Benefits of tunability
  • Compensate for the difference between model and silicon
  • Recover margin when variation is reduced due to an improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
  • ≈70% of the Vdd increment occurs in the 1st year (the remaining 30% over the following 9 years)
Figure: Vdd approaching Vfinal over the lifetime
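The “50% in the first year” figure is what a power law in stress time predicts. This sketch assumes |ΔVth| ∝ t^n, a common empirical BTI model that the slide does not state. Under that assumption, the share of 10-year aging reached after one year is (1/10)^n. An exponent n ≈ 0.3 gives about 50%, and smaller exponents make aging even more front-loaded.

```python
def aging_fraction(t, t_life, n):
    """Fraction of end-of-life |dVth| reached at time t, assuming |dVth| is proportional to t**n."""
    return (t / t_life) ** n


for n in (0.16, 0.25, 0.30):
    share = aging_fraction(1.0, 10.0, n)
    print(f"n = {n:.2f}: {share:.0%} of 10-year aging occurs in year 1")
```

This models |ΔVth| only. The ≈70% figure on the slide refers to the Vdd increment, which also depends on how delay responds to ΔVth.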
Results for DC Scenario
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as …
200mV voltage compensation; 30% MTTF penalty
EM Impact on AVS Scheduling
Figure: VDD (0.90-1.04V) vs. year (0-12) for AVS schedules S1-S5, and MTTF (years) of DMA for S1-S5
12 years MTTF penalty
What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins and analyses
• Function, timing, signal integrity, power integrity, …
Figure: “margin stack” for voltage signoff, from nominal Vdd through static IR drop, power grid IR gradient, dynamic IR, HCI and NBTI margins down to the signoff Vdd (operating voltage)
Problem: margins = pessimism → overdesign, schedule delay
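A small calculation makes the margin stack concrete. The margin names and values in this sketch are hypothetical. The root-sum-square (RSS) option is a swapped-in comparison that holds only if the contributors were independent Gaussian 3σ quantities. The slide itself describes only the linear stack, in which every worst case is assumed to coincide.

```python
import math


def signoff_vdd(v_nom, margins, combine="linear"):
    """Signoff Vdd after subtracting voltage margins (values in volts, hypothetical).

    combine="linear": classic margin stack, all worst cases assumed to coincide
    combine="rss": root-sum-square, valid only for independent contributors
    """
    if combine == "linear":
        total = sum(margins.values())
    elif combine == "rss":
        total = math.sqrt(sum(m * m for m in margins.values()))
    else:
        raise ValueError(f"unknown combine mode: {combine}")
    return v_nom - total


# Hypothetical margins (V); the gap between the two results is the stacking pessimism
margins = {"static_ir": 0.03, "dynamic_ir": 0.04}
print(signoff_vdd(0.9, margins), signoff_vdd(0.9, margins, combine="rss"))
```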
Statistical Timing Analysis (1)
• Δdjv: delay sensitivity of path pj to variation source zv
• Assumptions:
  • Δdjv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: routed netlist and 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv
$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
Note: path delay includes gate and wire delays
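Once the 28 STA results are available, extracting the sensitivities is simple arithmetic. This sketch makes two assumptions. First, each Y_v is the itf with only source z_v skewed by +3σ, which is what the division by 3 implies. Second, the container layout (a dict keyed by source index) is hypothetical. The second helper evaluates the first-order delay model that the linearity assumption implies.

```python
def sensitivities(d_typ, d_corner):
    """Per-sigma delay sensitivities of one path.

    d_typ: d_j(Y_typ), path delay with the typical itf
    d_corner: {v: d_j(Y_v)}, Y_v assumed to skew only source z_v by +3 sigma
    Returns {v: Delta d_jv} with Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3.
    """
    return {v: (d - d_typ) / 3.0 for v, d in d_corner.items()}


def linear_delay(d_typ, dd, z):
    """First-order path delay for standardized source values z = {v: z_v} (in sigmas)."""
    return d_typ + sum(dd[v] * z.get(v, 0.0) for v in dd)
```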
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^T$ (λ is obtained by Cholesky decomposition)
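Under the linear, Gaussian assumptions of the sensitivity model, the standard deviation of path j follows from its sensitivity vector Δd_j and Σ: σ_j = sqrt(Δd_jᵀ Σ Δd_j) = ‖λᵀ Δd_j‖. The slide names only the Cholesky factorization. The variance formula is the standard consequence of those assumptions rather than an equation quoted from the talk. The sketch below is dependency-free.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with sigma = L L^T (sigma must be symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(L[i][m] * L[k][m] for m in range(k))
            if i == k:
                L[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][k] = (sigma[i][k] - s) / L[k][k]
    return L


def path_sigma(dd, sigma):
    """Std. dev. of path delay: ||lambda^T dd||, which equals sqrt(dd^T Sigma dd)."""
    L = cholesky(sigma)
    n = len(dd)
    y = [sum(L[i][k] * dd[i] for i in range(n)) for k in range(n)]  # y = L^T dd
    return math.sqrt(sum(v * v for v in y))
```

Factoring Σ once and reusing λ for every path avoids a full quadratic form per path. It also fails loudly (via `math.sqrt` of a negative number) if Σ is not positive definite.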
Throughput as a function of the error rate ER [Kahng10], the number of recovery cycles r and the clock period T
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off the cost of resilience vs. datapath optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs, for MUL and EXU; minimum achievable energy marked
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• If one mode is outside the design cone of the other: failed design or overdesign
• If mode A has positive timing slacks with respect to mode B, mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • The modes are in each other’s design cones
Figure: design cone of mode A in the voltage-frequency plane, with modes B and C outside it (negative slacks = failed design; positive slacks = overdesign)
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoids circuit implementation at each mode
• A small constant number of runs is enough: scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)
Figure: power estimation of adaptive search: power (mW) vs. signoff voltage (V); design AES, f = 700MHz
• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iterations
• real: power from real implemented circuits
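The sample, model and refine loop can be sketched as follows. Two parts are assumptions for illustration, not the models used in the work: the quadratic power model, and the `implement` callback, which stands in for one SP&R run plus power analysis at a given signoff voltage. The fit uses plain normal equations so the block has no dependencies.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit y ~ c0 + c1*x + c2*x**2; returns (c0, c1, c2)."""
    s = [sum(x ** k for x in xs) for k in range(5)]                 # sums of x^0..x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]   # normal equations
    for col in range(3):                                            # Gaussian elimination
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                                             # back substitution
        coef[r] = (m[r][3] - sum(m[r][k] * coef[k] for k in range(r + 1, 3))) / m[r][r]
    return tuple(coef)


def adaptive_search(implement, v_lo, v_hi, init_samples, iters=2):
    """Sample, fit a power model, implement at its optimum, refine, repeat.

    implement(v): hypothetical stand-in for one SP&R run signed off at voltage v,
    returning power. The quadratic model is an assumption.
    """
    vs = list(init_samples)
    ps = [implement(v) for v in vs]
    for _ in range(iters):
        c0, c1, c2 = fit_quadratic(vs, ps)

        def model(v):
            return c0 + c1 * v + c2 * v * v

        if c2 > 0:
            v_best = min(max(-c1 / (2 * c2), v_lo), v_hi)   # vertex, clipped to range
        else:
            v_best = min((v_lo, v_hi), key=model)            # concave/flat: cheaper end
        vs.append(v_best)
        ps.append(implement(v_best))                         # new sample refines model
    i = min(range(len(ps)), key=ps.__getitem__)
    return vs[i], ps[i]
```

Each call to `implement` is the expensive step. Keeping the loop to a small, fixed number of iterations is what makes the approach scalable, as the slide notes.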
Classes of Closed-Loop AVS
Closed-loop AVS monitors: design-dependent replica, in-situ monitor, generic monitor
• Design-dependent replica and in-situ monitors:
  • The critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor:
  • Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit:
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
Figure: energy (mJ) of PM, BF and CO at large, medium and small margins for MUL and EXU, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation
Small/medium/large margin: safety margin = 5/10/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the AVS context
Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs, for MUL and EXU; minimum achievable energy marked
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow blocks:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on a sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy: save the current solution
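The iterative structure of this flow can be sketched with the optimization steps passed in as callbacks. `seopt`, `skewopt` and `energy` are hypothetical placeholders for the real SEOpt, SkewOpt and energy evaluation (including throughput penalties). Stopping at the first non-improving iteration is an assumed convergence rule, not one stated on the slide.

```python
def optimize(design, seopt, skewopt, energy, max_iters=20):
    """Sketch of the SEOpt/SkewOpt loop (all callbacks are hypothetical).

    design: initial placement with all FFs as error-tolerant FFs
    seopt(d): margin insertion on K paths + swap error-tolerant FFs for normal FFs
    skewopt(d): activity-aware clock skew optimization
    energy(d): evaluated energy of a design
    """
    best, best_e = design, energy(design)
    for _ in range(max_iters):
        design = skewopt(seopt(design))
        e = energy(design)
        if e < best_e:              # "Energy < min energy": save current solution
            best, best_e = design, e
        else:                       # assumed stopping rule: no further improvement
            break
    return best, best_e
```

In a toy run, the design can be just the number of remaining error-tolerant FFs, so each SEOpt step removes one, and energy can be any function with a minimum. The loop then returns the count with the lowest energy.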
Toward Holistic Modeling Margining and Tolerance of IC Variabi
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
Power penalty of error-tolerant FFs: Razor 30% [Das08], Razor-Lite ~0% [Kim13], TIMBER 100% [Choudhury09]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow blocks:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on a sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• OR-tree insertion
• If energy < min energy: save the current solution
Benefit of Low-Cost Resilience
• Reference flows:
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• The proposed method (CO) achieves up to 21% energy reduction compared to the reference methods
• Resilience benefits increase with larger process variation
Figure: energy (mJ) of PM, BF and CO at large, medium and small margins for MUL and EXU, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation
Small/medium/large margin: 1σ/2σ/3σ for the SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 17% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt designs, for MUL and EXU; minimum achievable energy marked
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
Figure: closed-loop AVS with the circuit, an on-chip aging monitor of circuit performance and a voltage regulator; circuit frequency and Vdd vs. time, without and with AVS, relative to the target
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
Figure: three-step flow (Step 1, Step 2, Step 3) linking VBTI and |Vt|, Vlib and the derated library, circuit implementation and signoff, and BTI degradation and AVS of the circuit, which determine Vfinal
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
Figure: three-step flow: Vlib, VBTI and |Vt| → derated library → circuit implementation and signoff → BTI degradation and AVS → Vfinal
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan and Siddhartha Nath
Power vs Area Across Different Signoffs
Figure: “knee” point for a balanced area and power tradeoff
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if the desired supply voltage > Vmax

Also Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
Figure: Vdd vs. time for the NOM, OD and OD/NOM modes (tnom, tOD)
Figure: power of circuits with different overdrive modes (fnom = 800MHz, Vnom = 0.8V): different overdrive modes span a 26% power range; with fOD fixed, there is still a 14% power range
Also Tunable Monitors: Less Margin
• Default config:
  • Low-resistance passgates
  • Guardband for worst case: Vmin_est > Vmin_chip
  • 13mV margin
• Aggressive config: Vmin_est < Vmin_chip, so some chips will fail
• Optimized config:
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
Also Tunable Monitors: Less Margin (2)
• Aggressive config: Vmin_est < Vmin_chip, so some chips will fail
• Default config:
  • Low-resistance passgates
  • Guardband for worst case: Vmin_est > Vmin_chip
  • 13mV margin
• Optimized config:
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Benefits of tunability:
  • Compensate for differences between model and silicon
  • Recover margin when variation is reduced due to an improved process
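Choosing among monitor configurations amounts to a constrained minimization. First, discard any configuration that underestimates Vmin on some chip (the “aggressive” case). Then, among the rest, keep the one with the smallest guardband. The data layout and the worst-case-margin objective are assumptions for illustration. The per-chip voltages in the test are made up, apart from reproducing the 13mV default margin.

```python
def pick_config(configs, vmin_chip):
    """Choose the monitor configuration with the smallest non-negative guardband.

    configs: {name: [Vmin_est per characterized chip]} (hypothetical data)
    vmin_chip: [true Vmin per chip], same order
    A config is safe only if Vmin_est >= Vmin_chip on every chip.
    """
    best_name, best_margin = None, float("inf")
    for name, est in configs.items():
        margins = [e - c for e, c in zip(est, vmin_chip)]
        if min(margins) < 0:        # underestimates Vmin on some chip: chips would fail
            continue
        worst = max(margins)        # guardband paid on the most over-margined chip
        if worst < best_margin:
            best_name, best_margin = name, worst
    return best_name, best_margin
```

Using the worst-case margin as the objective is conservative. A mean-margin objective would be an equally plausible choice if the goal were average energy rather than a per-chip bound.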
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, across the lifetime
• The benefit of the “next node” is increasingly hard to find
  • An entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins and tolerances used in signoff
• Some takeaways from this talk:
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
Figure: core power (mW) and PG power (mW) for implementations 1-9, spanning highest to least invested guardband; 14% power penalty
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Figure: example worst-case capacitance corner: the C distributions of layers M1 and M2 are each taken at their 3σ corner, and combining them gives the homogeneous Cw corner of the M1+M2 interconnect stack, which lies beyond the stack’s own 3σ point (pessimism)
Homogeneous Corners (2)
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Figure: example worst-case capacitance corner: per-layer 3σ corners of M1 and M2 combine into a homogeneous Cw corner for the M1+M2 stack that exceeds the stack’s 3σ point (pessimism)
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
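A quick calculation shows why the pessimism grows with layer count. The sketch makes two simplifying assumptions for illustration. The stack quantity (e.g., a path’s total wire capacitance) is the sum of per-layer Gaussian contributions. Every pair of layers shares a common correlation rho. A homogeneous corner adds the per-layer 3σ shifts, whereas the true 3σ of the sum adds them in quadrature, weighted by the correlation.

```python
import math


def homogeneous_pessimism(sigmas, rho):
    """Excess of a homogeneous 3-sigma corner over the true 3 sigma of a layer sum.

    sigmas: per-layer std. dev. of the summed quantity (e.g., capacitance)
    rho: assumed common correlation between any two layers (0 <= rho <= 1)
    """
    corner = 3 * sum(sigmas)                          # every layer skewed to 3 sigma at once
    n = len(sigmas)
    var = sum(s * s for s in sigmas)
    var += 2 * rho * sum(sigmas[i] * sigmas[j] for i in range(n) for j in range(i + 1, n))
    return corner - 3 * math.sqrt(var)                # minus the stack's real 3 sigma
```

With rho = 1 the pessimism vanishes. With rho < 1 it grows with every added layer, for example from 3(2 - √2) for two independent unit-σ layers to 6 for four.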
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources (example with layers M1-M4)

             M1           M2           M3           M4
          ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1  ΔW    1  0  0      γ  0  0      γ  0  0      0  0  0
    ΔT    0  1  0      0  γ  0      0  γ  0      0  0  0
    ΔH    0  0  1      0  0  γ      0  0  γ      0  0  0
M2  ΔW    γ  0  0      1  0  0      γ  0  0      0  0  0
    ΔT    0  γ  0      0  1  0      0  γ  0      0  0  0
    ΔH    0  0  γ      0  0  1      0  0  γ      0  0  0
M3  ΔW    γ  0  0      γ  0  0      1  0  0      0  0  0
    ΔT    0  γ  0      0  γ  0      0  1  0      0  0  0
    ΔH    0  0  γ      0  0  γ      0  0  1      0  0  0
M4  ΔW    0  0  0      0  0  0      0  0  0      1  0  0
    ΔT    0  0  0      0  0  0      0  0  0      0  1  0
    ΔH    0  0  0      0  0  0      0  0  0      0  0  1

• Correlation between variation sources of the same variation type in the same process module: γ = 0.5
• Variation sources in different process modules are independent
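The correlation rule can be turned into code that rebuilds Σ for any stack. The module labels passed in are hypothetical names. In the 4-layer example, M1-M3 share one process module and M4 is in another. γ = 0.5 is taken from the slide. Sources are ordered layer by layer as (ΔW, ΔT, ΔH), matching the rows and columns of the matrix.

```python
def correlation_matrix(modules, gamma=0.5, types=("dW", "dT", "dH")):
    """Correlation matrix of BEOL variation sources.

    modules: process-module label per metal layer, e.g. ["A", "A", "A", "B"] for M1..M4
    Two distinct sources are correlated (value gamma) iff they have the same variation
    type and their layers are in the same process module; otherwise they are independent.
    """
    sources = [(layer, t) for layer in range(len(modules)) for t in types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for u, (lu, tu) in enumerate(sources):
        for v, (lv, tv) in enumerate(sources):
            if u == v:
                sigma[u][v] = 1.0
            elif tu == tv and modules[lu] == modules[lv]:
                sigma[u][v] = gamma
    return sigma
```

For a 9-layer stack, the same function yields the 27 × 27 matrix used in the statistical timing analysis. Which layers share a module must come from the process.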
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
Figure: cumulative probability vs. maximum wirelength ratio across all layers (%), reaching 0.92 at 60%
• Variations in different layers are not fully correlated
Delay Variation
Figure: α vs. Δdelay at C-worst = [d(Ycw) − d(Ytyp)]/d(Ytyp), and α vs. Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)]/d(Ytyp); paths are split into those dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and those dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)
• Some paths have α > 1.0, i.e., a CBC can underestimate their delay variations
  • But these paths have larger delays at the other corner
  • Example: the C-worst corner underestimates the delay variations of some paths, but these paths are dominated by the RC-worst corner, where α < 1.0 and their delay variations are covered
• Paths are more sensitive either to R or to C
  • Using only RC-worst or only C-worst will underestimate delay variations
  • Both RC- and C-worst corners are needed to cover process variations
  • In the following discussion, α is defined at the dominant corner
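The “dominant corner” rule can be written as a small helper. Here α is read as the ratio of a path’s statistical 3σ delay variation to its delay shift at the conventional corner. That reading matches the axes of the tightened-corner opportunity plot, but it is an interpretation, not a definition quoted from the slides. The delay values in the test are made up.

```python
def alpha_at_dominant_corner(d_typ, d_cw, d_rcw, sigma3):
    """Return (dominant corner, alpha) for one path.

    Assumed reading: alpha = (statistical 3-sigma delay variation) /
    (delay shift at the dominant conventional corner).
    alpha < 1: the corner covers the variation; alpha > 1: it would underestimate it.
    """
    shift_cw = d_cw - d_typ        # delay shift at C-worst
    shift_rcw = d_rcw - d_typ      # delay shift at RC-worst
    if shift_rcw >= shift_cw:
        return "RC-worst", sigma3 / shift_rcw
    return "C-worst", sigma3 / shift_cw
```

The second test case mirrors the slide’s point. Measured at C-worst, the path would have α = 6/2 = 3 > 1. Its dominant corner, however, gives α = 0.75 < 1, so the variation is still covered.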
Non-Homogeneous Corner
• Each layer can have differently skewed variations
Figure: interconnect stack with M1 and M2; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp
• Less pessimism with non-homogeneous corners
• Challenges:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
Figure: Δdj(Yrcw)/dj(Ytyp) × 100 vs. 3σj/d(Ytyp) × 100
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
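One way to read “scale BEOL variation in the itf with α = 0.5” is to move each BEOL parameter only a fraction α of the way from its typical value toward its conventional-corner value. The sketch assumes that reading and uses hypothetical parameter names, not real itf keywords. Under the linear delay model, the delay shift at such a tightened corner is α times the conventional shift. That gives a simple per-path coverage test when a path’s α is defined as in the delay-variation discussion.

```python
def tightened_corner(typ, cbc, alpha):
    """Scale each BEOL parameter's skew from typical by alpha.

    typ, cbc: {parameter: value} at the typical and conventional corners (hypothetical keys)
    """
    return {k: typ[k] + alpha * (cbc[k] - typ[k]) for k in typ}


def covered_by_tbc(path_alpha, tbc_alpha):
    """Linear model: TBC shift = tbc_alpha * CBC shift, and the path's 3-sigma variation
    = path_alpha * CBC shift, so the tightened corner is safe iff path_alpha <= tbc_alpha."""
    return path_alpha <= tbc_alpha
```

The coverage test is exactly where the yield challenge above bites. Any path with α above the tightening factor would be under-margined if it were signed off at the tightened corner.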
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• The structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
Figure: wirelength ratio (%) per layer
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split into GTBC and GCBC
• GTBC: timing analysis using TBC; ECO using TBC until violation = 0
• GCBC: timing analysis using CBC; ECO using CBC until violation = 0
• Done when both groups have no violations
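The grouping step of this flow can be sketched as follows. The threshold value and the path data format are hypothetical. Δdelay is normalized by the typical delay, following the [d(Y) − d(Ytyp)]/d(Ytyp) definition in the delay-variation plots, and is taken at whichever conventional corner shifts the path more.

```python
def classify_paths(paths, threshold):
    """Split critical paths into G_TBC and G_CBC (threshold is a hypothetical input).

    paths: {name: (d_typ, d_cw, d_rcw)}, path delays at the three extracted BEOL corners
    A path goes to G_TBC when its normalized delay shift at the dominant corner exceeds
    the threshold; all other paths stay in G_CBC (conventional corners).
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta = max(d_cw, d_rcw) / d_typ - 1.0      # normalized Δdelay at dominant corner
        (g_tbc if delta > threshold else g_cbc).append(name)
    return g_tbc, g_cbc
```

A large corner shift relative to the path’s statistical spread means small α. Those are therefore the paths where the conventional corner is most pessimistic and where tightened corners are safe.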
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
• Model BTI degradation with Vfinal throughout the lifetime
  • Aging at a flat Vfinal ≈ aging at an adaptive Vdd
  • But slightly pessimistic
Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging), which depends on:
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
Figure: Vfinal for different gate types (annotated: 10mV)
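Heuristic 2 reduces to averaging. The sketch also reports the spread across gate types, which Observation 2 expects to be small. How the per-gate Vfinal values are obtained (other than assuming timing slack = 0) is not specified on the slide. The dictionary layout and the values in the test are hypothetical.

```python
from statistics import mean


def heuristic2_vfinal(vfinal_by_gate):
    """Return (average Vfinal, spread) over gate types.

    vfinal_by_gate: {gate type: Vfinal (V)}, each estimated at timing slack = 0
    (hypothetical input format).
    """
    values = list(vfinal_by_gate.values())
    return mean(values), max(values) - min(values)
```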
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Overall Optimization Flowbull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity aware clock skew optimization
SkewOpt
OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resiliencebull Reference flows
bull Pure-margin (PM) conventional method w only margin insertionbull Brute-force (BF) use error-tolerant FFs for timing-critical endpoints
bull Proposed method (CO) achieves up to 21 energy reduction compared to reference methods
bull Resilience benefits increase with larger process variation
PM BF CO PM BF CO PM BF CO27
29
31
33
35
37
En
erg
y (
mJ
)
PM BF CO PM BF CO PM BF CO22
26
30
34
38Energy penalty of throughput degradation
Energy penalty of additional circuits
Energy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin 1σ2σ3σ for SS corner Technology foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVSbull Adaptive voltage scaling allows a lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 17 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
086 09 094 098 10225
30
35
40
45
50pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
070 072 074 076 078 08024
26
28
30
32
34
36 pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
Technology foundry 28nm
25ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops Less Marginbull Example Interaction between reliability margin and AVS designs
bull Bias temperature instability (BTI) aging higher |ΔVth| lower fmax
bull AVS can be used to compensate for performance degradation
Circuit
Closed-loop AVS
On-chip aging
monitor
Circuit performanc
e
Voltage regulato
r
Circuit frequency
Vdd
time
time
Without AVSWith AVS
target
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib selection should consider BTI + AVS interaction
bull Aging and Vfinal are unknowns before circuit implementation
BTI degradation and AVS
Vfinal
VBTI |Vt|
Step 1
Vlib
Derated library
Step 2
Circuit implementation and
signoff
circuit
Step 3
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib depend on aging during AVS
bull Aging and Vfinal are unknowns before circuit implementation
Vlib
VBTI Derated library
|Vt| Circuit implementation and
signoff
circuitBTI degradation and AVS
Vfinal
Step 1 Step 2 Step 3
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal Vlib VBTI
bull What is the design overhead when timing libraries are not properly characterized
bull Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads Joint work with Wei-Ting Jonas Chan Tuck-Boon Chan Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Large lifetime energy overhead
bull May fail to meet timing if desired supply voltage gt Vmax
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  - In the manufacturing process, during operation, and across the lifetime
• The benefit of the "next node" is increasingly hard to find
  - The entire node is a "20/20/20" value proposition
  - 5–10% in PPA metrics is now substantial at the leading edge
• Variability is connected to the properties of the IC at tapeout by the models, margins, and tolerances used in signoff
• Some takeaways from this talk:
  - Substantial benefit from tightening BEOL corners (= signoff)
  - "Minimum cost of resilience" is a rich optimization challenge
  - Chicken-egg loops in signoff definition can be broken
  - Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1–9, annotated from highest to least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff involving the guardband invested in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner: per-layer C distributions for M1 and M2, the homogeneous Cw corner of an interconnect stack with M1 and M2, and its pessimism relative to the stack's 3σ]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers.
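To make the layer-count effect concrete, here is a minimal sketch under a simplified assumption that is not on the slide. Each of n layers contributes an equal, normally distributed capacitance variation, and every pair of layers has the same correlation rho. A homogeneous corner pushes every layer to +3σ. The true 3σ of the sum grows more slowly unless rho = 1. The function name `homogeneous_pessimism` is illustrative.

```python
import math

def homogeneous_pessimism(n_layers, rho):
    """Ratio of the homogeneous-corner shift to the true 3-sigma shift.

    Assumes n_layers equal, normally distributed per-layer variations
    (unit sigma each) with uniform pairwise correlation rho in [0, 1].
    """
    corner_shift = 3.0 * n_layers                        # every layer at +3 sigma
    true_sigma = math.sqrt(n_layers + n_layers * (n_layers - 1) * rho)
    return corner_shift / (3.0 * true_sigma)

for n in (1, 2, 4, 9):
    print(n, round(homogeneous_pessimism(n, 0.5), 3))
# 1 1.0
# 2 1.155
# 4 1.265
# 9 1.342
```

The ratio reduces to sqrt(n / (1 + (n - 1) * rho)). It equals 1 only for fully correlated layers and grows toward sqrt(n) as correlation vanishes.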
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources:

             M1              M2              M3              M4
             ΔW   ΔT   ΔH    ΔW   ΔT   ΔH    ΔW   ΔT   ΔH    ΔW   ΔT   ΔH
M1   ΔW      1    0    0     γ    0    0     γ    0    0     0    0    0
     ΔT      0    1    0     0    γ    0     0    γ    0     0    0    0
     ΔH      0    0    1     0    0    γ     0    0    γ     0    0    0
M2   ΔW      γ    0    0     1    0    0     γ    0    0     0    0    0
     ΔT      0    γ    0     0    1    0     0    γ    0     0    0    0
     ΔH      0    0    γ     0    0    1     0    0    γ     0    0    0
M3   ΔW      γ    0    0     γ    0    0     1    0    0     0    0    0
     ΔT      0    γ    0     0    γ    0     0    1    0     0    0    0
     ΔH      0    0    γ     0    0    γ     0    0    1     0    0    0
M4   ΔW      0    0    0     0    0    0     0    0    0     1    0    0
     ΔT      0    0    0     0    0    0     0    0    0     0    1    0
     ΔH      0    0    0     0    0    0     0    0    0     0    0    1

• Correlation between variation sources of the same variation type in the same process module: γ = 0.5
• Variation sources in different process modules are independent
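The block structure of Σ can be generated from a per-layer process-module assignment. This is a minimal sketch assuming NumPy. The function `beol_correlation` and the module-ID list are illustrative and not from the talk. For the 9-layer, 27-source stack, the layer-to-module assignment would have to be supplied because it is not given here.

```python
import numpy as np

TYPES = ("dW", "dT", "dH")  # per-layer variation types, in matrix order

def beol_correlation(modules, gamma=0.5):
    """Correlation matrix Sigma for per-layer BEOL variation sources.

    modules: one process-module ID per metal layer, bottom to top
             (e.g. [0, 0, 0, 1]: M1-M3 share a module, M4 is separate).
    Sources u and v get correlation gamma only if they have the same
    variation type and their layers are in the same process module.
    """
    k = len(TYPES)
    n = len(modules) * k
    sigma = np.eye(n)
    for u in range(n):
        layer_u, type_u = divmod(u, k)
        for v in range(n):
            layer_v, type_v = divmod(v, k)
            if u != v and type_u == type_v and modules[layer_u] == modules[layer_v]:
                sigma[u, v] = gamma
    return sigma

# M1-M3 in one process module and M4 in another, as in the matrix above.
sigma = beol_correlation([0, 0, 0, 1])
print(sigma.shape)               # (12, 12)
print(sigma[0, 3], sigma[0, 9])  # 0.5 0.0  (M1-dW vs M2-dW, M1-dW vs M4-dW)
```

With γ = 0.5, each same-type block within a module is positive definite, so the result can be fed directly to a Cholesky factorization.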
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. maximum wirelength ratio across all layers (%); marked point at 60%, 0.92]
• Variations in different layers are not fully correlated
Delay Variation
Δdelay at C-worst: $[d(Y_{cw}) - d(Y_{typ})]/d(Y_{typ})$
Δdelay at RC-worst: $[d(Y_{rcw}) - d(Y_{typ})]/d(Y_{typ})$
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
[Figure: α of timing-critical paths at the C-worst and RC-worst corners]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  - But these paths have larger delays at the other corner
  - Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner, where α < 1.0 and the delay variations are covered
• Paths are more sensitive to either R or C
• Using only RC-worst or only C-worst will underestimate delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussions, α is defined at the dominant corner
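The dominance rule above compares the two relative delay shifts. This is a minimal sketch of that comparison. The function names and the path delays are hypothetical, and ties are assigned to RC-worst as an arbitrary choice.

```python
def relative_delta(d_corner, d_typ):
    """Relative delay shift [d(Y) - d(Ytyp)] / d(Ytyp) at one BEOL corner."""
    return (d_corner - d_typ) / d_typ

def dominant_corner(d_typ, d_cw, d_rcw):
    """Return the dominant corner name and its relative delay shift.

    A path is dominated by C-worst if its shift at C-worst exceeds its
    shift at RC-worst; otherwise (including ties) by RC-worst.
    """
    delta_cw = relative_delta(d_cw, d_typ)
    delta_rcw = relative_delta(d_rcw, d_typ)
    if delta_cw > delta_rcw:
        return "C-worst", delta_cw
    return "RC-worst", delta_rcw

# Hypothetical path delays in ns: (Ytyp, Ycw, Yrcw)
paths = {"p1": (1.00, 1.06, 1.03), "p2": (2.00, 2.02, 2.10)}
for name, (d_typ, d_cw, d_rcw) in paths.items():
    corner, delta = dominant_corner(d_typ, d_cw, d_rcw)
    print(name, corner, round(100 * delta, 1))
# p1 C-worst 6.0
# p2 RC-worst 5.0
```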
Non-Homogeneous Corner
• Each layer can have a differently skewed variation
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges:
  - Many feasible combinations
  - A corner can only cover certain paths
  - How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: $3\sigma_j / d_j(Y_{typ}) \times 100$ plotted against $\Delta d_j(Y_{rcw}) / d_j(Y_{typ}) \times 100$ for timing-critical paths]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
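The axes above suggest reading α as the ratio of a path's statistical 3σ delay spread to its delay shift at the (dominant) conventional corner. That reading is an assumption, not a stated definition. This is a minimal sketch of that ratio and of the coverage check it implies for a tightened corner. It also assumes, consistent with the linear-sensitivity model, that scaling the itf variation by a factor scales the corner's delay shift by the same factor. Function names and numbers are hypothetical.

```python
def alpha(sigma_j, delta_dom):
    """alpha_j = 3*sigma_j / delta_dom, with delta_dom the path's delay shift
    at its dominant conventional corner (same time unit as sigma_j).
    ASSUMPTION: inferred from the plot axes 3*sigma_j/d(Ytyp) vs. delta_d_j/d(Ytyp).
    """
    return 3.0 * sigma_j / delta_dom

def covered_by_scaled_corner(sigma_j, delta_dom, scale):
    """ASSUMPTION: with path delay linear in the BEOL sources, scaling the itf
    variation by `scale` scales the corner's delay shift by `scale`, so the
    tightened corner still covers the path's 3-sigma variation iff
    alpha_j <= scale."""
    return alpha(sigma_j, delta_dom) <= scale

# Hypothetical paths: (sigma_j, delay shift at dominant corner), both in ns.
for sigma_j, delta_dom in [(0.010, 0.080), (0.020, 0.050)]:
    a = alpha(sigma_j, delta_dom)
    print(round(a, 3), covered_by_scaled_corner(sigma_j, delta_dom, 0.5))
```

The first path (α ≈ 0.375) is still covered by a corner scaled to 0.5. The second path (α ≈ 1.2) is not covered even by the unscaled corner. This is the underestimation risk that the challenge above refers to.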
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• The structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard-cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate the Δdelay of critical paths
• Put path j in group G_TBC if its Δdelay is larger than a threshold
• Fix only the paths in G_TBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into G_TBC and G_CBC
• G_TBC: timing analysis using TBC; ECO using TBC until violations = 0
• G_CBC: timing analysis using CBC; ECO using CBC until violations = 0
• Done when both groups have zero violations
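This is a minimal sketch of the path-grouping step. It assumes that Δdelay means the relative shift at the dominant corner (the larger of the C-worst and RC-worst shifts over Ytyp). The slide does not specify the threshold value, so the 0.05 used here is hypothetical. The `PathDelays` record and `classify_paths` are likewise illustrative names.

```python
from dataclasses import dataclass

@dataclass
class PathDelays:
    name: str
    d_typ: float  # path delay with Ytyp parasitics
    d_cw: float   # path delay with Ycw parasitics
    d_rcw: float  # path delay with Yrcw parasitics

def classify_paths(paths, threshold):
    """Split critical paths into (G_TBC, G_CBC).

    Delta delay is taken at the dominant corner, relative to Ytyp; paths whose
    Delta delay exceeds `threshold` go to G_TBC, all others to G_CBC.
    """
    g_tbc, g_cbc = [], []
    for p in paths:
        delta = max(p.d_cw, p.d_rcw) / p.d_typ - 1.0
        (g_tbc if delta > threshold else g_cbc).append(p.name)
    return g_tbc, g_cbc

paths = [PathDelays("p1", 1.00, 1.08, 1.06), PathDelays("p2", 1.00, 1.02, 1.03)]
print(classify_paths(paths, threshold=0.05))  # (['p1'], ['p2'])
```

Each group would then go through its own analyze-and-ECO loop, with TBC for G_TBC and CBC for G_CBC.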
Experiment Setup
Design              LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   1.8       2.0       3.1
• Model BTI degradation with Vfinal throughout the lifetime
  - Aging under a flat Vfinal ≈ aging under an adaptive Vdd
  - But slightly pessimistic
• VBTI = Vlib ≈ Vfinal
[Figure: Vdd vs. time, with NBTI and PBTI aging]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  - Depends on the gates along the critical path and the timing slack at t = 0
• Circuit activity is not an issue
  - Because the BTI effect is not sensitive to circuit activity
  - A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate type
• Heuristic 2: use the average Vfinal of different gate types
  - Vfinal is a function of timing slack
  - Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, VLIB, and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  - Estimate aging at each time step
  - Update circuit timing and Vdd
  - Repeat until t = tfinal
  - Modify the circuit and start over if Vfinal > the maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
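This is a toy sketch of the reference flow's loop structure: age, re-time, raise Vdd in steps, and abort if Vmax is exceeded. The real flow re-runs STA and library characterization at every step. This sketch instead substitutes two closed-form models, both of which are assumptions chosen only for illustration: an alpha-power-law path delay and a power-law BTI shift. The functions `path_delay`, `bti_shift`, and `reference_flow`, and all constants, are hypothetical.

```python
VTH0 = 0.30    # fresh threshold voltage (V); hypothetical
ALPHA = 1.3    # alpha-power-law exponent; hypothetical

def path_delay(v, dvth):
    """Normalized alpha-power-law delay of a critical path (hypothetical model)."""
    return v / (v - (VTH0 + dvth)) ** ALPHA

def bti_shift(v, t_years):
    """Hypothetical power-law BTI model: |dVth| = 0.03 * v * t**0.2.
    Stress at the current Vdd is applied over the whole elapsed time, which
    slightly overestimates aging (a flat-voltage approximation)."""
    return 0.03 * v * t_years ** 0.2

def reference_flow(v_init, v_max, v_step, t_final, t_step):
    """Return Vfinal, or None if AVS would need Vdd > v_max (modify the circuit)."""
    d_target = path_delay(v_init, 0.0)  # timing is just met at t = 0
    v, t = v_init, 0.0
    while t < t_final:
        t += t_step
        dvth = bti_shift(v, t)                 # estimate aging at this time step
        while path_delay(v, dvth) > d_target:  # AVS: raise Vdd until timing is met
            v += v_step
            if v > v_max:
                return None                    # exceeds Vmax: start over
            dvth = bti_shift(v, t)             # aging depends on the new Vdd
    return v

print(reference_flow(0.90, 1.05, 0.01, t_final=10.0, t_step=0.5))
print(reference_flow(0.90, 0.95, 0.01, t_final=10.0, t_step=0.5))  # None
```

With these made-up constants, Vdd converges to about 1.0V after 10 years. Lowering the cap to 0.95V triggers the "modify circuit and start over" exit.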
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  - Derated library characterization is related to BTI + AVS
  - AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  - The circuit is affected by library characterization
[Figure: loop connecting the circuit, Vfinal, Vlib/VBTI, and the derated libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| and |ΔVth| vs. time over alternating ON/OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS'14]
Observation 1
• BTI is a "front-loaded" phenomenon [Chan11]
  - 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  - The gap between Vdd and Vfinal shrinks rapidly
  - ≈70% of the Vdd increment occurs in the 1st year (the remaining 30% over 9 years)
[Figure: Vdd vs. time, converging to Vfinal]
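The 50%-in-year-one figure can be related to a time exponent if BTI is assumed to follow the power law ΔVth ∝ t^n. This model is commonly used but is an assumption here, since the slide does not state it. Under that assumption, the observation pins down n:

```latex
\frac{\Delta V_{th}(1\,\mathrm{yr})}{\Delta V_{th}(10\,\mathrm{yr})}
  = \left(\frac{1}{10}\right)^{n} = 0.5
  \quad\Longrightarrow\quad
  n = \frac{\ln 0.5}{\ln 0.1} \approx 0.30
```

Any exponent well below 1 produces this front-loading: the smaller n is, the larger the share of lifetime aging that lands in the first year.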
Results for DC Scenario
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
Annotations: 30% MTTF penalty; 200mV voltage compensation
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1–S5; MTTF (years) under S1–S5 (DMA)]
12 years MTTF penalty
What Is "Signoff"?
• Foundation of the contract between design house and foundry
• "Chip should work": a stack of models, margins, and analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff: static IR drop, power grid IR gradient, dynamic IR, HCI, and NBTI margins stacked between nominal Vdd and signoff Vdd, compared with the operating voltage]
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path $p_j$ to variation source $z_v$:
$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
• Assumptions:
  - $\Delta d_{jv}$ is linear with respect to the variation sources
  - Variation sources are normally distributed
• Obtain $\Delta d_{jv}$ using 28 runs of RC extraction and static timing analysis (STA)
  - Flow: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → $\Delta d_{jv}$
Note: path delay includes gate and wire delays.
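This is a minimal sketch, assuming NumPy, of turning the 28 STA results into the sensitivity matrix. Here Y_v is taken to be the itf in which only source z_v is skewed. The division by 3 presumes that this skew is 3σ, which the formula implies but the slide does not state. The function name and the delay values are hypothetical.

```python
import numpy as np

def sensitivities(d_typ, d_corner):
    """Per-sigma delay sensitivities: delta_d[j, v] = (d_j(Y_v) - d_j(Y_typ)) / 3.

    d_typ:    (n_paths,) path delays from the Ytyp extraction + STA run
    d_corner: (n_paths, n_sources) path delays from the run with itf Y_v,
              one column per variation source (27 columns for a 9-layer stack)
    """
    d_typ = np.asarray(d_typ, dtype=float)
    d_corner = np.asarray(d_corner, dtype=float)
    return (d_corner - d_typ[:, None]) / 3.0

# Hypothetical delays (ns) for 2 paths and 3 variation sources.
sens = sensitivities([1.00, 2.00], [[1.03, 1.00, 0.97],
                                    [2.06, 2.00, 2.00]])
print(np.round(sens, 3))
```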
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^T$ (λ is obtained by Cholesky decomposition)
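Under the linear, Gaussian assumptions of part (1), the Cholesky factor turns correlated sources into independent ones. With z = λx and x ~ N(0, I), the delay shift of path j is Δd_jᵀλx, so σ_j = ‖λᵀΔd_j‖. This is a minimal NumPy sketch of that standard propagation. The function name and numbers are illustrative, and the code is not presented as the talk's implementation.

```python
import numpy as np

def path_sigma(sens, corr):
    """Standard deviation of each path delay under the linear model.

    sens: (n_paths, n_sources) per-sigma sensitivities delta_d[j, v]
    corr: (n_sources, n_sources) correlation matrix Sigma of the sources
    With Sigma = L @ L.T and z = L @ x, x ~ N(0, I), the delay shift of
    path j is sens[j] @ L @ x, so sigma_j = ||L.T @ sens[j]||.
    """
    lam = np.linalg.cholesky(corr)            # lower-triangular factor
    return np.linalg.norm(sens @ lam, axis=1)

corr = np.array([[1.0, 0.5], [0.5, 1.0]])
sens = np.array([[0.01, 0.02], [0.03, 0.00]])
sig = path_sigma(sens, corr)
# Cross-check against the quadratic form sqrt(s^T Sigma s).
print(np.allclose(sig, np.sqrt(np.einsum("ij,jk,ik->i", sens, corr, sens))))  # True
```

The 3σ_j obtained this way is the quantity compared against corner-based delay shifts when judging corner pessimism.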
Throughput model [Kahng10]: throughput as a function of the clock period T, the error rate ER, and the number of recovery cycles r
Selective-Endpoint Optimization
• Optimize the fanin cone w/ tighter constraints, which allows replacement of a Razor FF w/ a normal FF
• Trade off the cost of resilience vs. data path optimization
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion; if energy < min energy, save current solution]
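This is a minimal skeleton of the iterate-and-keep-best structure of the flow above. SEOpt and SkewOpt are passed in as callbacks. The stopping rule (stop at the first non-improving iteration) is an assumption because the slide does not state one, and OR-tree insertion is not modeled. The toy design, `toy_seopt`, `toy_skewopt`, and `toy_energy` are hypothetical stand-ins used only to exercise the loop.

```python
def iterate_keep_best(design, seopt, skewopt, energy, max_iters=20):
    """Alternate SEOpt and SkewOpt, saving the solution whenever energy drops.

    ASSUMPTION: the loop stops at the first non-improving iteration.
    """
    best, best_energy = design, energy(design)
    current = design
    for _ in range(max_iters):
        current = skewopt(seopt(current))
        e = energy(current)
        if e < best_energy:          # "Energy < min energy": save current solution
            best, best_energy = current, e
        else:
            break
    return best, best_energy

# Hypothetical toy design: each SEOpt pass converts 2 error-tolerant FFs to
# normal FFs at the cost of one more unit of inserted margin.
def toy_seopt(d):
    return {"razor_ffs": max(d["razor_ffs"] - 2, 0), "margin": d["margin"] + 1}

def toy_skewopt(d):
    return dict(d)  # placeholder for activity-aware clock skew optimization

def toy_energy(d):
    return 1.0 * d["razor_ffs"] + 0.3 * d["margin"] ** 2

print(iterate_keep_best({"razor_ffs": 10, "margin": 0},
                        toy_seopt, toy_skewopt, toy_energy))
```

In the toy model, converting FFs pays off until the quadratic margin cost overtakes the saving. The loop therefore keeps the design with 4 error-tolerant FFs and 3 units of margin.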
Benefit of Low-Cost Resilience
• Reference flows:
  - Pure-margin (PM): conventional method w/ only margin insertion
  - Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• The proposed method (CO) achieves up to 21% energy reduction compared to the reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at large, medium, and small margins, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin = 1σ/2σ/3σ for the SS corner; technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt designs of MUL and EXU, marking the minimum achievable energy]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor tracks circuit performance and drives a voltage regulator that sets the circuit's Vdd; circuit frequency and Vdd vs. time without and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• The selection of VBTI and Vlib should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: VBTI → |Vt|; Step 2: Vlib → derated library; Step 3: circuit implementation and signoff → circuit; BTI degradation and AVS → Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: VBTI → |Vt|; Step 2: Vlib → derated library; Step 3: circuit implementation and signoff → circuit; BTI degradation and AVS → Vfinal]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, and VBTI
bull What is the design overhead when timing libraries are not properly characterized
bull Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads Joint work with Wei-Ting Jonas Chan Tuck-Boon Chan Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Large lifetime energy overhead
bull May fail to meet timing if desired supply voltage gt Vmax
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Overall Optimization Flowbull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity aware clock skew optimization
SkewOpt
OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resiliencebull Reference flows
bull Pure-margin (PM) conventional method w only margin insertionbull Brute-force (BF) use error-tolerant FFs for timing-critical endpoints
bull Proposed method (CO) achieves up to 21 energy reduction compared to reference methods
bull Resilience benefits increase with larger process variation
PM BF CO PM BF CO PM BF CO27
29
31
33
35
37
En
erg
y (
mJ
)
PM BF CO PM BF CO PM BF CO22
26
30
34
38Energy penalty of throughput degradation
Energy penalty of additional circuits
Energy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin 1σ2σ3σ for SS corner Technology foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVSbull Adaptive voltage scaling allows a lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 17 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
086 09 094 098 10225
30
35
40
45
50pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
070 072 074 076 078 08024
26
28
30
32
34
36 pure-marginbrute-forceCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
Technology foundry 28nm
25ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops Less Marginbull Example Interaction between reliability margin and AVS designs
bull Bias temperature instability (BTI) aging higher |ΔVth| lower fmax
bull AVS can be used to compensate for performance degradation
Circuit
Closed-loop AVS
On-chip aging
monitor
Circuit performanc
e
Voltage regulato
r
Circuit frequency
Vdd
time
time
Without AVSWith AVS
target
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib selection should consider BTI + AVS interaction
bull Aging and Vfinal are unknowns before circuit implementation
BTI degradation and AVS
Vfinal
VBTI |Vt|
Step 1
Vlib
Derated library
Step 2
Circuit implementation and
signoff
circuit
Step 3
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib depend on aging during AVS
bull Aging and Vfinal are unknowns before circuit implementation
Vlib
VBTI Derated library
|Vt| Circuit implementation and
signoff
circuitBTI degradation and AVS
Vfinal
Step 1 Step 2 Step 3
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal Vlib VBTI
bull What is the design overhead when timing libraries are not properly characterized
bull Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads Joint work with Wei-Ting Jonas Chan Tuck-Boon Chan Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
[Figure: power vs. area across signoff corners, with a “knee” point for a balanced area/power tradeoff]
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if the desired supply voltage > Vmax
Also: Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  – Improve performance, power or area
  – Reduce overdesign
[Figure: Vdd vs. time for signoff modes NOM, OD and OD+NOM (tnom, tOD)]
[Figure: power of circuits with different overdrive modes (fnom = 800MHz, Vnom = 0.8V): different overdrive modes span a 26% power range; with fOD fixed, there is still a 14% power range]
Also: Tunable Monitors → Less Margin
• Default config:
  – Low-resistance passgates
  – Guardband for the worst case
  – Vmin_est > Vmin_chip
  – 13mV margin
• Optimized config: use more high-resistance passgates, so that Vmin_est ≈ Vmin_chip
• Aggressive config: Vmin_est < Vmin_chip, so some chips will fail
Benefits of tunability:
• Compensate for the difference between model and silicon
• Recover margin when variation is reduced due to an improved process
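A minimal sketch of how a tuned configuration could be chosen, assuming a hypothetical table `vmin_est_mv` that maps each 3-bit passgate setting to the Vmin (in mV) the monitor would report. The 13 mV default offset echoes the slide, but the 5 mV-per-stage response and the 700 mV chip Vmin are made-up numbers, not measurements. The rule keeps the default's safety (Vmin_est ≥ Vmin_chip) while shrinking the margin, which is the "optimized config" idea; settings that would report below the chip's Vmin correspond to the aggressive config and are rejected.

```python
from itertools import product


def select_config(vmin_est_mv, vmin_chip_mv):
    """Return (config, margin_mv) with the smallest non-negative margin.

    vmin_est_mv maps a config (tuple of 3 control bits, 1 = high-resistance
    passgate) to the Vmin in mV that the tuned monitor would report.
    Configs with Vmin_est < Vmin_chip are rejected because some chips would
    fail; returns None if no config is safe.
    """
    safe = [(cfg, est - vmin_chip_mv) for cfg, est in vmin_est_mv.items()
            if est >= vmin_chip_mv]
    if not safe:
        return None
    return min(safe, key=lambda item: item[1])


# Hypothetical monitor response: the all-low-resistance default reports
# 13 mV above the chip, and each high-resistance stage lowers it by 5 mV.
VMIN_CHIP_MV = 700
TABLE = {bits: VMIN_CHIP_MV + 13 - 5 * sum(bits)
         for bits in product((0, 1), repeat=3)}

print(select_config(TABLE, VMIN_CHIP_MV))
```

With this toy table, two high-resistance stages leave a 3 mV margin, while three stages would report below the chip's Vmin and are discarded.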
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  – In the manufacturing process, during operation, and across lifetime
• Benefit of the “next node” is increasingly hard to find
  – An entire node is a “20/20/20” value proposition
  – 5–10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins and tolerances used in signoff
• Some takeaways from this talk
  – Substantial benefit from tightening BEOL corners (= signoff)
  – “Minimum cost of resilience” is a rich optimization challenge
  – Chicken-egg loops in signoff definition can be broken
  – Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1–9, ranging from the highest to the least invested guardband; annotated 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Per-layer C distributions of M1 and M2 with their 3σ corners; the homogeneous Cw corner of the M1 + M2 interconnect stack places every layer at its 3σ corner, which is pessimistic relative to the 3σ of the stack]
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
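The pessimism can be quantified with a short calculation. Assuming (for illustration) that the stack capacitance is the sum of per-layer contributions that are normally distributed with standard deviations σ1, σ2, … and a common pairwise correlation ρ, the homogeneous corner shifts the stack by 3σ1 + 3σ2 + …, while the true 3σ of the sum is 3·sqrt(Σσi² + 2ρ·Σ_{i<k} σiσk). The sketch computes their ratio, which is exactly 1 only for ρ = 1.

```python
import math


def homogeneous_pessimism(sigmas, rho):
    """Ratio of the homogeneous-corner shift to the true 3-sigma of the stack.

    Assumes per-layer contributions are normal with standard deviations
    `sigmas` and a common pairwise correlation `rho`.
    """
    homogeneous = 3 * sum(sigmas)          # every layer at its own 3-sigma corner
    var = sum(s * s for s in sigmas)
    for i in range(len(sigmas)):
        for k in range(i + 1, len(sigmas)):
            var += 2 * rho * sigmas[i] * sigmas[k]
    statistical = 3 * math.sqrt(var)       # 3-sigma of the summed capacitance
    return homogeneous / statistical


for n_layers in (1, 2, 4, 8):
    print(n_layers, round(homogeneous_pessimism([1.0] * n_layers, rho=0.5), 3))
```

For equal per-layer σ and ρ = 0.5 the ratio is sqrt(2n/(n+1)), about 1.155 for two layers and 1.333 for eight, which matches the trend stated above.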
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources:

Σ =
              M1          M2          M3          M4
            ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW       1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT       0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH       0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW       γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT       0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH       0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW       γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT       0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH       0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW       0  0  0     0  0  0     0  0  0     1  0  0
    ΔT       0  0  0     0  0  0     0  0  0     0  1  0
    ΔH       0  0  0     0  0  0     0  0  0     0  0  1

• Variation sources of the same variation type in the same process module are correlated: γ = 0.5
• Variation sources in different process modules are independent
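A sketch that builds Σ for an arbitrary stack by following the rule on this slide: sources of the same type (ΔW, ΔT or ΔH) on layers of the same process module are correlated with γ, and everything else is independent. The `layer_modules` encoding of process modules is an assumption for illustration; γ defaults to 0.5, the value given above.

```python
VARIATION_TYPES = ("dW", "dT", "dH")


def correlation_matrix(layer_modules, gamma=0.5):
    """Correlation matrix of BEOL variation sources.

    layer_modules: process-module id of each metal layer, e.g. [0, 0, 0, 1]
    for M1-M3 in one module and M4 in another. Sources are ordered layer by
    layer as (dW, dT, dH).
    """
    n_types = len(VARIATION_TYPES)
    size = n_types * len(layer_modules)
    sigma = [[0.0] * size for _ in range(size)]
    for u in range(size):
        layer_u, type_u = divmod(u, n_types)
        for v in range(size):
            layer_v, type_v = divmod(v, n_types)
            if u == v:
                sigma[u][v] = 1.0
            elif (type_u == type_v
                  and layer_modules[layer_u] == layer_modules[layer_v]):
                sigma[u][v] = gamma        # same type, same process module
    return sigma


sigma = correlation_matrix([0, 0, 0, 1])
print(sigma[0])   # row for the M1 dW source
```

Calling `correlation_matrix` with nine layers yields the 27 × 27 matrix used for a 9-layer stack.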
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 at 60%]
• Variations in different layers are not fully correlated
[Figure: Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths dominated by C-worst have Δdelay at C-worst > Δdelay at RC-worst; paths dominated by RC-worst have Δdelay at RC-worst > Δdelay at C-worst]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  – But these paths have larger delays at the other corner
• For such paths, the C-worst corner underestimates delay variations, but the paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Paths are more sensitive to either R or C
• Using only RC-worst or only C-worst will underestimate delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussions, α is defined at the dominant corner
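A minimal sketch of the per-path quantities used here. It assumes (the slides do not give a formula) that α for path j is the ratio of its statistical 3σ delay variation to its delay shift at the dominant conventional corner, which matches the axes 3σj/d(Ytyp) and Δdj(Yrcw)/d(Ytyp) on the tightened-corner slide; under this reading α < 1 means the conventional corner over-covers the path. Delays are in arbitrary consistent units, and the example path is hypothetical.

```python
def dominant_corner(d_typ, d_cw, d_rcw):
    """Corner (C-worst or RC-worst) with the larger delay increase over typical."""
    return "RC-worst" if d_rcw - d_typ >= d_cw - d_typ else "C-worst"


def scaling_factor(d_typ, d_cw, d_rcw, three_sigma):
    """alpha = statistical 3-sigma delay variation / shift at the dominant corner.

    Assumed definition. Under a linear model, a corner whose variation is
    scaled by any factor >= alpha still covers this path.
    """
    shift = max(d_cw, d_rcw) - d_typ
    if shift <= 0:
        raise ValueError("dominant corner must be slower than typical")
    return three_sigma / shift


# Hypothetical path: delays (ps) at Ytyp, Ycw, Yrcw and its statistical 3-sigma.
path = dict(d_typ=500.0, d_cw=530.0, d_rcw=520.0)
print(dominant_corner(**path), scaling_factor(**path, three_sigma=12.0))
```

This path is C-worst-dominated with α = 0.4, so in this toy example the conventional corner is pessimistic for it.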
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges:
  – Many feasible combinations
  – A corner can only cover certain paths
  – How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: scatter of 3σj / dj(Ytyp) × 100 against Δdj(Yrcw) / dj(Ytyp) × 100]
• Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of the critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC
• GTBC: timing analysis using TBC; ECO using TBC until violations = 0
• GCBC: timing analysis using CBC; ECO using CBC until violations = 0
• Done when both groups have zero violations
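The classification step can be sketched as below. The threshold value and the input dictionary are hypothetical, since the talk does not state the threshold. Each path's Δdelay is taken at its dominant corner, [d(Y) − d(Ytyp)] / d(Ytyp), and the paths in GTBC would then be closed with tightened corners while those in GCBC keep the conventional ones.

```python
def classify_paths(paths, threshold):
    """Split critical paths into G_TBC and G_CBC.

    paths: name -> (d_typ, d_cw, d_rcw), path delays from timing analysis at
    the typical, C-worst and RC-worst BEOL corners. A path goes to G_TBC
    when its normalized Delta delay at the dominant corner exceeds
    `threshold`; all other paths stay in G_CBC.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta = (max(d_cw, d_rcw) - d_typ) / d_typ
        (g_tbc if delta > threshold else g_cbc).append(name)
    return g_tbc, g_cbc


# Hypothetical delays in ps and a hypothetical 5% threshold.
PATHS = {"p1": (1000.0, 1080.0, 1100.0),
         "p2": (1000.0, 1010.0, 1020.0),
         "p3": (800.0, 860.0, 830.0)}
print(classify_paths(PATHS, threshold=0.05))
```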
Experiment Setup
Design               LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)    1.8       2.0       3.1
Heuristic 1
• Model BTI degradation with Vfinal throughout the lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  – But slightly pessimistic
[Figure: Vdd vs. time (flat Vfinal vs. adaptive Vdd) for NBTI and PBTI]
• VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented yet)
• Vfinal = Vdd at end of life (to compensate for BTI aging), determined by:
  – The gates along the critical path
  – The timing slack at t = 0
• Circuit activity is not an issue
  – Because the BTI effect is not sensitive to circuit activity
  – A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  – Vfinal is a function of timing slack
  – Assume timing slack = 0
[Figure: Vfinal of different gate types; annotation: 10mV]
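A sketch of Heuristic 2 under assumed device models, neither of which comes from the talk: an alpha-power-law gate delay d = Vdd / (Vdd − Vth)^α, and a hypothetical end-of-life BTI shift proportional to Vdd. For each gate type, zero slack at t = 0 means the aged gate at Vfinal must be no slower than the fresh gate at Vinit; Vfinal is found by bisection and then averaged over gate types. The per-gate parameters in `GATE_TYPES` are invented for illustration.

```python
def gate_delay(vdd, vth, alpha):
    """Assumed alpha-power-law gate delay (arbitrary units)."""
    return vdd / (vdd - vth) ** alpha


def vfinal_zero_slack(vth0, dvth_per_volt, v_init, alpha, v_max=1.5):
    """Lowest Vdd at which the aged gate is as fast as the fresh gate at v_init.

    Zero slack at t = 0 is assumed, and the end-of-life BTI shift uses the
    hypothetical linear model dVth = dvth_per_volt * Vdd.
    """
    target = gate_delay(v_init, vth0, alpha)

    def aged_delay(v):
        return gate_delay(v, vth0 + dvth_per_volt * v, alpha)

    if aged_delay(v_max) > target:
        raise ValueError("no Vdd up to v_max compensates the aging")
    lo, hi = v_init, v_max            # aged_delay(lo) > target >= aged_delay(hi)
    for _ in range(60):               # bisection on the monotone aged delay
        mid = (lo + hi) / 2
        if aged_delay(mid) > target:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical per-gate-type parameters: (fresh Vth in V, alpha exponent).
GATE_TYPES = {"INV": (0.30, 1.30), "NAND2": (0.31, 1.35), "NOR2": (0.29, 1.25)}


def heuristic2_vfinal(gate_types, dvth_per_volt=0.03, v_init=0.90):
    """Heuristic 2: average the zero-slack Vfinal over gate types."""
    values = [vfinal_zero_slack(vth0, dvth_per_volt, v_init, a)
              for vth0, a in gate_types.values()]
    return sum(values) / len(values)


print(round(heuristic2_vfinal(GATE_TYPES), 3))
```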
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and slow PMOS
• BTI degradation = DC, AC

Circuit    Frequency (GHz)
c5315      1.38
c7552      1.25
AES        0.89
MPEG2      1.05

Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  – Estimate aging at each time step
  – Update circuit timing and Vdd
  – Repeat until t = tfinal
  – Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
[Figure: Vdd vs. time; Vstep = AVS voltage step, Vfinal = converged voltage]
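The reference flow can be sketched as a time-stepped simulation. Every model in it is an assumption: an alpha-power-law delay, and a BTI shift ΔVth = a0·exp(β·V)·t^n whose history under a changing Vdd is carried forward with the equivalent-time method; time is in years and all constants are illustrative. At each step the circuit ages at the current Vdd, then the closed-loop AVS raises Vdd in Vstep increments until the t = 0 timing is restored; exceeding Vmax corresponds to "modify the circuit and start over".

```python
import math


def reference_signoff(vth0=0.30, v_init=0.90, v_max=1.05, v_step=0.01,
                      t_final=10.0, dt=0.25, a0=0.005, beta=2.0, n=1 / 6,
                      alpha=1.3):
    """Return (Vfinal, trace), where trace lists (time, Vdd) after each step."""

    def delay(v, vth):                      # assumed alpha-power-law delay
        return v / (v - vth) ** alpha

    target = delay(v_init, vth0)            # timing is just met at t = 0
    v, dvth, t = v_init, 0.0, 0.0
    trace = [(t, v)]
    while t < t_final - 1e-9:
        amp = a0 * math.exp(beta * v)       # BTI prefactor at the current Vdd
        t_eq = (dvth / amp) ** (1 / n)      # equivalent stress time at this Vdd
        dvth = amp * (t_eq + dt) ** n       # aging after one more time step
        t += dt
        while delay(v, vth0 + dvth) > target:   # AVS: raise Vdd by Vstep
            v += v_step
            if v > v_max + 1e-9:
                raise RuntimeError("Vfinal > Vmax: modify circuit and start over")
        trace.append((t, v))
    return v, trace


v_final, trace = reference_signoff()
print(round(v_final, 2), len(trace))
```

In the real flow each delay evaluation would be a derated-library characterization plus an STA run, which is why the slide calls this flow very slow.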
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  – Derated library characterization is related to BTI + AVS
  – AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  – The circuit is affected by library characterization
[Figure: loop of circuit → Vfinal → Vlib, VBTI → derated libraries → circuit]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI affects PMOS; PBTI affects NMOS
[Figure: |Vgs| vs. time over alternating ON/OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of the circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
• The gap between Vdd and Vfinal shrinks rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in the 1st year (the remaining 30% over 9 years)]
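The front-loaded behavior follows from the common power-law approximation ΔVth ∝ t^n, which is an assumption here rather than the model used in [Chan11]. After the first year of a 10-year lifetime, the fraction of end-of-life aging is 10^(−n). An exponent of about 0.3 reproduces the 50% figure on this slide, and the reaction-diffusion exponent n = 1/6 gives about 68%.

```python
def fraction_of_aging(t, lifetime, n):
    """Fraction of the end-of-life BTI shift reached at time t if dVth ~ t**n."""
    return (t / lifetime) ** n


for n in (1 / 6, 0.3):
    share = fraction_of_aging(1.0, 10.0, n)
    print(f"n = {n:.3f}: {share:.2f} of 10-year aging after year 1")
```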
Results for DC Scenario
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year (0–12) for AVS schedules S1–S5, and MTTF (years) of DMA under S1–S5; annotation: 12 years MTTF penalty]
What is “Signoff”?
• Foundation of the contract between design house and foundry
  – “Chip should work”: a stack of models, margins and analyses
  – Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd and operating voltage]
• Problem: margins = pessimism, which means overdesign and schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  Δdjv = [dj(Yv) − dj(Ytyp)] / 3
• Assumptions:
  – Δdjv is linear with respect to the variation sources
  – Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
  – 28 itf files (27 variation sources + Ytyp)
  – Routed netlist → RC extraction → STA → Δdjv
• Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• Σ = λλᵀ (λ is obtained by Cholesky decomposition)
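Under the assumptions above (linear sensitivities, normally distributed unit-variance sources with correlation matrix Σ), the standard deviation of path j follows from σj² = Δdjᵀ Σ Δdj. The sketch computes Δdjv from corner delays, evaluates σj in closed form, and cross-checks it by Monte Carlo, drawing correlated sources as z = λx with λ from a hand-written Cholesky decomposition (Σ = λλᵀ). The three-source example is hypothetical; the setup in the talk would have 27 sources.

```python
import math
import random


def sensitivities(d_corner, d_typ):
    """Delta d_jv = [d_j(Y_v) - d_j(Y_typ)] / 3 for each variation source v."""
    return [(d - d_typ) / 3.0 for d in d_corner]


def path_sigma(dd, corr):
    """sigma_j = sqrt(dd^T * Sigma * dd) for a linear delay model."""
    n = len(dd)
    return math.sqrt(sum(dd[u] * corr[u][v] * dd[v]
                         for u in range(n) for v in range(n)))


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def monte_carlo_sigma(dd, corr, samples=20000, seed=1):
    """Estimate sigma_j by sampling correlated sources z = L x, x ~ N(0, I)."""
    rng = random.Random(seed)
    low = cholesky(corr)
    n = len(dd)
    total = total_sq = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [sum(low[i][k] * x[k] for k in range(i + 1)) for i in range(n)]
        delta = sum(dd[i] * z[i] for i in range(n))
        total += delta
        total_sq += delta * delta
    mean = total / samples
    return math.sqrt(total_sq / samples - mean * mean)


# Hypothetical example: 3 sources, the first two correlated with gamma = 0.5.
CORR = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
DD = sensitivities([106.0, 103.0, 101.5], d_typ=100.0)
print(path_sigma(DD, CORR), monte_carlo_sigma(DD, CORR))
```

The closed form avoids sampling entirely; the Monte Carlo check only illustrates how λ turns independent normals into sources with the prescribed correlation.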
Throughput model [Kahng10]: throughput depends on the clock period T, the error rate ER and the number of recovery cycles r
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints, which allows replacing a Razor FF with a normal FF
• Trade off the cost of resilience vs. datapath optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]
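The voltage/energy tradeoff can be illustrated with a toy model whose parts are all assumptions: energy per cycle ∝ V², each timing error costs r recovery cycles (so the average cycle count per operation is 1 + r·ER), and the error rate grows exponentially as V drops. The "pure-margin" baseline is simplified to the lowest voltage that is effectively error-free; unlike the comparison in the slides, the toy model ignores the energy of the error-tolerant circuitry itself.

```python
import math


def error_rate(v, v_crit=0.90, scale=0.02):
    """Hypothetical timing-error rate that rises exponentially as Vdd drops."""
    return min(1.0, 1e-3 * math.exp((v_crit - v) / scale))


def energy_per_op(v, recovery_cycles=5, c_eff=1.0):
    """Assumed energy model: C*V^2 per cycle times (1 + r * ER) cycles per op."""
    return c_eff * v * v * (1 + recovery_cycles * error_rate(v))


def voltage_grid(v_lo=0.80, v_hi=1.10, step=0.005):
    count = int(round((v_hi - v_lo) / step)) + 1
    return [round(v_lo + i * step, 3) for i in range(count)]


def min_energy_voltage(grid):
    """Resilient design: the supply voltage with the lowest energy per op."""
    return min(grid, key=energy_per_op)


def pure_margin_voltage(grid, er_limit=1e-6):
    """Simplified pure-margin baseline: lowest effectively error-free voltage."""
    return min(v for v in grid if error_rate(v) <= er_limit)


grid = voltage_grid()
v_res, v_pm = min_energy_voltage(grid), pure_margin_voltage(grid)
print(v_res, v_pm, round(1 - energy_per_op(v_res) / energy_per_op(v_pm), 3))
```

The minimum-energy point sits where the quadratic voltage savings are balanced by the growing recovery penalty, which is the tradeoff described on the slide.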
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly via threshold voltages)
• If one mode is outside the design cone of the other, the result is a failed design or overdesign
  – Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: neither mode is dominated by the other
  – The modes are in each other’s design cones
[Figure: voltage vs. frequency, showing the design cone of mode A and modes B and C; negative slacks = failed design, positive slacks = overdesign]
• Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
• Guideline: search for signoff modes within the design cone to reduce overdesign
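A simplified sketch of the dominance check. It assumes that the V–f curve of a circuit that just meets mode A scales with an alpha-power-law fmax ∝ (V − Vth)^α / V. This collapses the design cone to a single curve (the real cone spans the Vth choices), so it only illustrates how the sign of the slack at mode B classifies the pair. Vth, α and the tolerance are illustrative values.

```python
def fmax(v, vth=0.35, alpha=1.3):
    """Relative maximum frequency from an assumed alpha-power law."""
    return (v - vth) ** alpha / v


def compare_modes(mode_a, mode_b, tol=0.01):
    """Classify mode B against a circuit that exactly meets mode A.

    mode_a, mode_b: (voltage in V, frequency in MHz). The circuit signed off
    at mode A is assumed to reach f_A * fmax(V_B) / fmax(V_A) at voltage V_B.
    """
    (v_a, f_a), (v_b, f_b) = mode_a, mode_b
    achievable = f_a * fmax(v_b) / fmax(v_a)
    slack = (achievable - f_b) / f_b        # relative timing slack at mode B
    if abs(slack) <= tol:
        return "equivalent dominance"
    if slack > 0:
        return "A dominates B"
    return "B outside the design cone of A"


print(compare_modes((0.8, 800.0), (1.0, 1000.0)))
print(compare_modes((0.8, 800.0), (1.0, 1100.0)))
```

With these parameters, a circuit that just meets 800MHz at 0.8V reaches about 1032MHz at 1.0V, so a 1000MHz mode is dominated while an 1100MHz mode falls outside the cone.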
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough → scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes; adaptive search: sample (SP&R) → refine power models
[Figure: power estimation of adaptive search for design AES, f = 700MHz: power (mW) vs. signoff voltage (V) from the 1st- and 2nd-iteration power models and from real implementations]
• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iterations
• real: power from real implemented circuits
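A sketch of the sample-and-refine loop. It assumes, since the talk does not say, that the power model is a quadratic in signoff voltage fitted by least squares. The hypothetical `measure` callback stands in for one SP&R run at a given signoff voltage; each iteration re-fits the model and samples at its predicted optimum, so only a small constant number of implementations is needed.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    for col in range(3):                     # Gauss-Jordan with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def adaptive_search(measure, v_lo, v_hi, iterations=2):
    """Sample, fit a power model, re-sample at its optimum; returns (v, samples)."""
    xs = [v_lo, (v_lo + v_hi) / 2, v_hi]
    ys = [measure(v) for v in xs]
    v_best = xs[0]
    for _ in range(iterations):
        a, b, c = fit_quadratic(xs, ys)
        if a > 0:
            v_best = min(max(-b / (2 * a), v_lo), v_hi)   # vertex, clamped
        else:
            v_best = min((v_lo, v_hi), key=lambda v: a * v * v + b * v + c)
        xs.append(v_best)
        ys.append(measure(v_best))               # one more SP&R run
    return v_best, len(xs)


print(adaptive_search(lambda v: 1.5 + 4 * (v - 1.02) ** 2, 0.9, 1.2))
```

With a toy `measure` that is itself quadratic, the first fit is already exact and the search settles at 1.02 after five samples; real power curves would need the refinement iterations.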
Classes of Closed-Loop AVS
• Classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from a 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  – Series resistance
  – Cell types (INV, NAND, NOR)
• Proposed circuit:
  – Different cell types cover different process corners
  – Tune the series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows:
  – Pure-margin (PM): conventional methodology with only margin insertion
  – Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with the safety margin
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at large, medium and small margin, split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of the clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save the current solution; iterate]
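The loop structure can be sketched as follows. The cost model is a hypothetical simplification: each endpoint has one energy cost if it keeps an error-tolerant FF and another if margin is inserted in its fanin cone so a normal FF can be used. SEOpt converts the K endpoints with the largest savings per iteration, `skew_opt` is a placeholder hook standing in for SkewOpt, and the best solution seen is saved, mirroring the "energy < min energy → save" step.

```python
def optimize(endpoints, k=2, max_iters=10, skew_opt=lambda sol: sol):
    """Iterate SEOpt (+ a SkewOpt hook) and keep the lowest-energy solution.

    endpoints: name -> (resilient_cost, margin_cost), hypothetical energy of
    keeping an error-tolerant FF vs. inserting margin and using a normal FF.
    A solution is the frozenset of endpoints converted to normal FFs.
    """
    def energy(sol):
        return sum(m if name in sol else r for name, (r, m) in endpoints.items())

    current = frozenset()                  # initial: all FFs error-tolerant
    best, best_energy = current, energy(current)
    for _ in range(max_iters):
        # SEOpt: margin insertion on the K endpoints with the largest saving
        gains = sorted(((r - m, name) for name, (r, m) in endpoints.items()
                        if name not in current), reverse=True)
        picked = [name for gain, name in gains[:k] if gain > 0]
        if not picked:
            break
        current = skew_opt(current | frozenset(picked))   # SkewOpt placeholder
        if energy(current) < best_energy:                  # energy < min energy
            best, best_energy = current, energy(current)
    return best, best_energy


EPS = {"e1": (5.0, 2.0), "e2": (3.0, 2.5), "e3": (1.0, 4.0), "e4": (2.0, 1.0)}
print(optimize(EPS))
```

In this toy instance, endpoint e3 keeps its error-tolerant FF because margin insertion would cost more than resilience there, which is the per-endpoint tradeoff the flow optimizes.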
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization; OR-tree insertion; if energy < min energy, save the current solution]
Benefit of Low-Cost Resilience
• Reference flows:
  – Pure-margin (PM): conventional method with only margin insertion
  – Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• The proposed method (CO) achieves up to 21% energy reduction compared to the reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at large, medium and small margin, split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin = 1σ/2σ/3σ for the SS corner
Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 17% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt; panels MUL (0.86–1.02V) and EXU (0.70–0.80V); minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops Less Marginbull Example Interaction between reliability margin and AVS designs
bull Bias temperature instability (BTI) aging higher |ΔVth| lower fmax
bull AVS can be used to compensate for performance degradation
Circuit
Closed-loop AVS
On-chip aging
monitor
Circuit performanc
e
Voltage regulato
r
Circuit frequency
Vdd
time
time
Without AVSWith AVS
target
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib selection should consider BTI + AVS interaction
bull Aging and Vfinal are unknowns before circuit implementation
BTI degradation and AVS
Vfinal
VBTI |Vt|
Step 1
Vlib
Derated library
Step 2
Circuit implementation and
signoff
circuit
Step 3
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib depend on aging during AVS
bull Aging and Vfinal are unknowns before circuit implementation
Vlib
VBTI Derated library
|Vt| Circuit implementation and
signoff
circuitBTI degradation and AVS
Vfinal
Step 1 Step 2 Step 3
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal Vlib VBTI
bull What is the design overhead when timing libraries are not properly characterized
bull Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads Joint work with Wei-Ting Jonas Chan Tuck-Boon Chan Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Large lifetime energy overhead
bull May fail to meet timing if desired supply voltage gt Vmax
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not yet been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
• Vfinal depends on:
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
[Figure: Vstep = AVS voltage step; Vfinal = converged voltage]
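The reference flow's loop can be sketched under deliberately hypothetical models. The sketch uses an alpha-power-law path delay and a power-law BTI model with exponential voltage acceleration. It also uses the equivalent-stress-time method to accumulate aging while Vdd changes. None of the parameter values come from the talk. The point is only the loop structure: age, re-check timing, raise Vdd in AVS steps, and fail if Vdd exceeds Vmax.

```python
import math

# Hypothetical model parameters (illustrative only, not from the talk)
VTH0 = 0.30    # fresh threshold voltage (V)
A_EXP = 1.3    # alpha-power-law exponent
N_TIME = 0.3   # BTI time exponent
K0 = 0.015     # |ΔVth| (V) after 1 year of stress at V_INIT
GAMMA = 5.0    # BTI voltage acceleration (1/V)
V_INIT = 0.90  # initial supply voltage (V)


def path_delay(vdd, dvth):
    """Alpha-power-law delay (arbitrary units) with a BTI-shifted Vth."""
    return vdd / (vdd - VTH0 - dvth) ** A_EXP


def bti_prefactor(vdd):
    """|ΔVth| after 1 year of stress at vdd: K0 * exp(GAMMA * (vdd - V_INIT))."""
    return K0 * math.exp(GAMMA * (vdd - V_INIT))


def reference_signoff(t_final=10.0, t_step=0.5, v_step=0.01, v_max=1.05):
    """Step through the lifetime, aging the path and raising Vdd (AVS).

    Returns the Vdd schedule [(t, vdd), ...], or None if the required Vdd
    exceeds v_max (in the real flow: modify the circuit and start over).
    """
    target = path_delay(V_INIT, 0.0)  # zero timing slack at t = 0
    t, vdd, dvth = 0.0, V_INIT, 0.0
    schedule = [(t, vdd)]
    while t < t_final:
        # Estimate aging over this step: find the equivalent stress time that
        # gives the current |ΔVth| at the present Vdd, then advance it.
        k = bti_prefactor(vdd)
        t_eq = (dvth / k) ** (1.0 / N_TIME)
        dvth = k * (t_eq + t_step) ** N_TIME
        t += t_step
        # Update timing and Vdd: raise Vdd in AVS steps until timing is met
        while path_delay(vdd, dvth) > target:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None
        schedule.append((t, vdd))
    return schedule


schedule = reference_signoff()
print("Vfinal =", schedule[-1][1], "V")
```

Even this toy version needs one timing evaluation per AVS step per time step. In the real flow, each evaluation is an STA run on freshly characterized libraries, which is why the slide calls the flow very slow.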
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff:
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  • The circuit is affected by library characterization
[Figure: loop among Circuit, Vfinal, Vlib/VBTI, and Derated Libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
• The gap between Vdd and Vfinal shrinks rapidly
[Figure: ≈70% of the Vdd increment occurs in the 1st year (remaining 30% over 9 years); Vfinal]
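A quick consistency check is possible if |ΔVth| follows a power law in stress time, |ΔVth| ∝ t^n. This is a common BTI model form, assumed here rather than taken from [Chan11]. Under it, the fraction of 10-year aging reached after 1 year is (1/10)^n, and "50% in the first year" corresponds to n ≈ 0.3:

```python
import math


def aging_fraction(t, t_life, n):
    """Fraction of end-of-life |ΔVth| reached at time t, if |ΔVth| ∝ t**n."""
    return (t / t_life) ** n


def exponent_for_fraction(frac, t, t_life):
    """Time exponent n for which aging_fraction(t, t_life, n) == frac."""
    return math.log(frac) / math.log(t / t_life)


n = exponent_for_fraction(0.5, t=1.0, t_life=10.0)
print(round(n, 3))                             # 0.301
print(round(aging_fraction(1.0, 10.0, n), 3))  # 0.5
```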
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1 to S5; MTTF (year) of DMA 3 under S1 to S5]
12 years MTTF penalty
What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, and analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI; signoff Vdd vs. operating voltage]
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path p_j to variation source z_v: Δd_{j,v}
• Assumptions:
  • Path delay varies linearly with the variation sources (Δd_{j,v} is a constant sensitivity)
  • Variation sources are normally distributed
• Obtain Δd_{j,v} using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δd_{j,v}]
$$\Delta d_{j,v} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
Note: path delay includes gate and wire delays
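Once the 28 STA results are available, sensitivity extraction is simple arithmetic. This sketch assumes each itf file Y_v skews only source z_v by +3σ, which is why the difference is divided by 3. It uses made-up delays for a path with three sources instead of 27. The function linear_delay then evaluates the first-order delay model at a sample of the sources, expressed in σ units.

```python
def sensitivities(d_typ, d_skewed):
    """Per-sigma sensitivities Δd_{j,v} = (d_j(Y_v) - d_j(Y_typ)) / 3.

    d_typ:    STA delay of path j with the typical itf (Y_typ)
    d_skewed: STA delays of path j with itf Y_v, v = 1..V, where Y_v
              (by assumption) skews only source z_v to +3 sigma
    """
    return [(d_v - d_typ) / 3.0 for d_v in d_skewed]


def linear_delay(d_typ, sens, z):
    """First-order path delay d_j(z) = d_j(Y_typ) + sum_v Δd_{j,v} * z_v,
    with each z_v expressed in units of its own sigma."""
    return d_typ + sum(s * zv for s, zv in zip(sens, z))


# Hypothetical delays (ns) for one path and three variation sources
sens = sensitivities(1.00, [1.03, 0.97, 1.00])
print(sens)                                         # approximately [0.01, -0.01, 0.0]
print(linear_delay(1.00, sens, [1.0, -1.0, 2.0]))  # approximately 1.02
```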
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• Σ = λλ^T (λ is obtained by Cholesky decomposition)
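Below is a minimal pure-Python sketch of the Cholesky step and one typical use of λ. Multiplying independent standard-normal draws x by λ yields correlated sources z = λx with covariance Σ. The Monte Carlo sampling is an illustration only, not necessarily how the talk's flow applies λ.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular λ with Σ = λ λ^T (Σ symmetric positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][i] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam


def sample_sources(lam, rng):
    """One draw of correlated sources z = λ x, with x ~ N(0, I)."""
    x = [rng.gauss(0.0, 1.0) for _ in lam]
    return [sum(row[k] * x[k] for k in range(len(x))) for row in lam]


# Two sources of the same type in the same process module (γ = 0.5)
lam = cholesky([[1.0, 0.5], [0.5, 1.0]])
rng = random.Random(0)
draws = [sample_sources(lam, rng) for _ in range(20000)]
cov = sum(z0 * z1 for z0, z1 in draws) / len(draws)
print(lam)            # [[1.0, 0.0], [0.5, 0.866...]]
print(round(cov, 2))  # close to 0.5
```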
[Equation: throughput in terms of error rate ER [Kahng10], number of recovery cycles r, and clock period T]
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off the cost of resilience vs. datapath optimization

Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly by threshold voltages)
• If one mode is outside the design cone of the other: failed design or overdesign
  • If mode A has positive timing slacks with respect to mode B, mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • The modes are in each other’s design cones
[Figure: voltage and frequency axes; modes A, B, C; design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
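The dominance test can be illustrated with a toy design-cone model, in which everything model-specific is an assumption. The cone boundary is taken to follow alpha-power-law speed scaling with a fixed threshold voltage; VTH and A_EXP are made-up values. Under this model, a circuit signed off at mode (V_A, f_A) reaches f_A · speed(V)/speed(V_A) at any other voltage V.

```python
VTH = 0.35   # hypothetical threshold voltage (V)
A_EXP = 1.3  # hypothetical alpha-power-law exponent


def speed(v):
    """Relative gate speed under the alpha-power law, (V - Vth)^a / V."""
    return (v - VTH) ** A_EXP / v


def fmax_at(signoff, v):
    """Max frequency at voltage v for a circuit signed off at mode (V, f)."""
    v_s, f_s = signoff
    return f_s * speed(v) / speed(v_s)


def relative_slack(signoff, other):
    """> 0: signing off at `signoff` leaves positive slack at `other`
    (overdesign there); < 0: `other` is outside the design cone (failure)."""
    v_o, f_o = other
    return fmax_at(signoff, v_o) / f_o - 1.0


def relation(first, second, tol=1e-6):
    s = relative_slack(first, second)
    if abs(s) <= tol:
        return "equivalent dominance"
    return "first dominates second" if s > 0 else "second dominates first"


A = (0.90, 800e6)              # (V, Hz)
B = (0.90, 600e6)              # same voltage, lower frequency
C = (1.00, fmax_at(A, 1.00))   # on the boundary of A's design cone
print(relation(A, B))  # first dominates second
print(relation(A, C))  # equivalent dominance
```

In this model, signing off at both A and B adds nothing beyond signing off at A. A and C, by contrast, constrain the circuit equally, which matches the guideline of searching for signoff modes within the design cone.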
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough: scalable
[Flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search); global optimization flow]
[Figure: power (mW) vs. signoff voltage (V), power estimation of adaptive search; curves 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iterations
• real: power from actually implemented circuits
Design: AES, f = 700MHz
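The sample-model-refine loop can be sketched with successive parabolic interpolation. This stands in for the talk's power models, whose form the slide does not give. The callable measure is a hypothetical placeholder for one SP&R run that returns power at a given signoff voltage, and the power curve in the example is made up.

```python
def parabola_vertex(p1, p2, p3):
    """Abscissa of the vertex of the parabola through three (x, y) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    if den == 0:
        raise ZeroDivisionError("collinear points: no unique vertex")
    return x2 - 0.5 * num / den


def adaptive_search(measure, v_lo, v_hi, iterations=3):
    """Successive parabolic interpolation over the signoff voltage.

    measure(v) stands in for one SP&R run returning power at signoff
    voltage v; only a small, fixed number of runs is spent.
    """
    samples = {v: measure(v) for v in (v_lo, (v_lo + v_hi) / 2, v_hi)}
    for _ in range(iterations):
        # Fit a parabola (the "power model") through the three best samples
        pts = sorted(sorted(samples.items(), key=lambda kv: kv[1])[:3])
        try:
            v_new = parabola_vertex(*pts)
        except ZeroDivisionError:
            break
        v_new = min(max(v_new, v_lo), v_hi)  # stay inside the search range
        if v_new in samples:
            break
        samples[v_new] = measure(v_new)      # refine the model with a real sample
    return min(samples.items(), key=lambda kv: kv[1])


# Made-up power curve (mW) with its minimum at a 1.0V signoff voltage
best_v, best_p = adaptive_search(lambda v: 1.5 + 2.0 * (v - 1.0) ** 2, 0.9, 1.2)
print(round(best_v, 3), round(best_p, 3))  # 1.0 1.5
```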
Classes of Closed-Loop AVS
[Figure: closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor]
• Design-dependent replica and in-situ monitor:
  • The critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor:
  • Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit:
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at large, medium, and small margins, for MUL and EXU; bars split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: Initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution]
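The flow's outer loop can be sketched as below, with several hypothetical pieces. The seopt, skewopt, and energy callables are placeholders for the real steps. The toy "design" is just the number of endpoints converted from error-tolerant to normal FFs, with made-up per-endpoint margin costs ordered cheapest first, as a sensitivity ordering would give. Stopping at the first non-improving iteration is an assumed termination rule, not one stated on the slide.

```python
def overall_flow(initial, seopt, skewopt, energy, max_iters=20):
    """Alternate SEOpt and SkewOpt; save a solution whenever energy improves."""
    best, best_energy = initial, energy(initial)
    current = initial
    for _ in range(max_iters):
        current = skewopt(seopt(current))
        e = energy(current)
        if e < best_energy:  # "Energy < min energy": save current solution
            best, best_energy = current, e
        else:                # assumed stopping rule
            break
    return best, best_energy


# Toy model: state k = number of endpoints converted to normal FFs
MARGIN_COST = [0.2, 0.4, 0.9, 1.5, 2.0]  # hypothetical datapath cost per endpoint
FF_COST = 1.0                            # hypothetical cost of one error-tolerant FF


def toy_energy(k):
    return sum(MARGIN_COST[:k]) + (len(MARGIN_COST) - k) * FF_COST


def toy_seopt(k, K=1):
    # Margin insertion on the next K most favorable paths, then FF replacement
    return min(k + K, len(MARGIN_COST))


def toy_skewopt(k):
    return k  # placeholder: clock skew optimization leaves the toy state unchanged


print(overall_flow(0, toy_seopt, toy_skewopt, toy_energy))  # about (3, 3.5)
```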
Benefit of Low-Cost Resilience
• Reference flows:
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• The proposed method (CO) achieves up to 21% energy reduction compared to the reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, and CO at large, medium, and small margins, for MUL and EXU; bars split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 17% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS with circuit, on-chip aging monitor, and voltage regulator; circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: VBTI → |Vt|; Step 2: Vlib → derated library; Step 3: circuit implementation and signoff → circuit; BTI degradation and AVS on the circuit determine Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: VBTI → |Vt|; Step 2: Vlib → derated library; Step 3: circuit implementation and signoff → circuit; BTI degradation and AVS on the circuit determine Vfinal]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design/lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, and Siddhartha Nath
Power vs Area Across Different Signoffs
• “Knee” point for a balanced area and power tradeoff
• Optimistic signoff corner:
  • AVS increases supply voltage aggressively to compensate aging
  • Large lifetime energy overhead
  • May fail to meet timing if the desired supply voltage > Vmax

Also Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time for nominal (NOM) and overdrive (OD) modes, with tnom and tOD; power of circuits with different overdrive modes]
• Different overdrive modes: 26% power range
• Fixed fOD: still a 14% power range
fnom = 800MHz, Vnom = 0.8V
Also Tunable Monitors: Less Margin
• Default config:
  • Low-resistance passgates
  • Guardband for the worst case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Optimized config:
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Also Tunable Monitors: Less Margin
• Aggressive config: Vmin_est < Vmin_chip; some chips will fail
• Default config:
  • Low-resistance passgates
  • Guardband for the worst case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Optimized config:
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Benefits of tunability:
  • Compensate for differences between model and silicon
  • Recover margin when variation is reduced due to an improved process
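Choosing a monitor configuration can be framed as picking the setting whose Vmin estimate is closest to, but not below, the chip's actual Vmin. The per-setting estimates below come from a made-up linear model: each high-resistance passgate lowers Vmin_est by 4mV from a 0.800V default. Vmin_chip is assumed known from characterization. Neither reflects data from the talk.

```python
from itertools import product


def toy_estimates(v_default=0.800, step=0.004, n_bits=3):
    """Hypothetical Vmin_est (V) for every setting of n_bits control pins,
    where bit = 1 selects the high-resistance passgate in that stage."""
    return {bits: v_default - step * sum(bits)
            for bits in product((0, 1), repeat=n_bits)}


def choose_config(vmin_est, vmin_chip):
    """Least-guardband setting that still satisfies Vmin_est >= Vmin_chip.

    Returns None if every setting is aggressive (some chips would fail).
    """
    safe = {cfg: v for cfg, v in vmin_est.items() if v >= vmin_chip}
    if not safe:
        return None
    return min(safe, key=safe.get)


est = toy_estimates()
cfg = choose_config(est, vmin_chip=0.787)  # default (0, 0, 0) leaves a 13mV margin
print(cfg, round((est[cfg] - 0.787) * 1000, 1), "mV margin left")
```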
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, and across the lifetime
• The benefit of the “next node” is increasingly hard to find
  • An entire node is a “20/20/20” value proposition
  • 5 to 10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk:
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, annotated from highest invested guardband to least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• There is a tradeoff in how much guardband to invest in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: per-layer C distributions for M1 and M2, each marked at C-3σ; the homogeneous Cw corner of the M1 + M2 stack lies beyond the stack's 3σ point (pessimism)]
Example: worst-case capacitance corner
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: per-layer C distributions for M1 and M2, each marked at C-3σ; the homogeneous Cw corner of the M1 + M2 stack lies beyond the stack's 3σ point (pessimism)]
Example: worst-case capacitance corner
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
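The growth of pessimism with layer count can be quantified in a simplified setting. Assume each of n layers contributes the same per-σ delay sensitivity s, and every pair of layers has correlation ρ. A homogeneous corner then shifts delay by 3ns, while the statistical 3σ shift is 3s·sqrt(n + n(n−1)ρ). Both the equal sensitivities and ρ = 0.5 are arbitrary illustrative choices.

```python
import math


def homogeneous_pessimism(n_layers, rho, s=1.0):
    """Homogeneous-corner delay shift divided by the statistical 3-sigma shift.

    Assumes each layer contributes the same per-sigma delay sensitivity s and
    every pair of layers has correlation rho.
    """
    corner_shift = 3.0 * n_layers * s  # every layer pushed to 3 sigma
    variance = (n_layers + n_layers * (n_layers - 1) * rho) * s ** 2
    return corner_shift / (3.0 * math.sqrt(variance))


for n in (1, 2, 4, 9):
    print(n, round(homogeneous_pessimism(n, rho=0.5), 3))
# 1 1.0 / 2 1.155 / 4 1.265 / 9 1.342
```

With ρ = 1 the ratio stays at 1 for any layer count; with ρ = 0 it grows as sqrt(n).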
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources

Σ =         M1           M2           M3           M4
            ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1  ΔW      1  0  0      γ  0  0      γ  0  0      0  0  0
    ΔT      0  1  0      0  γ  0      0  γ  0      0  0  0
    ΔH      0  0  1      0  0  γ      0  0  γ      0  0  0
M2  ΔW      γ  0  0      1  0  0      γ  0  0      0  0  0
    ΔT      0  γ  0      0  1  0      0  γ  0      0  0  0
    ΔH      0  0  γ      0  0  1      0  0  γ      0  0  0
M3  ΔW      γ  0  0      γ  0  0      1  0  0      0  0  0
    ΔT      0  γ  0      0  γ  0      0  1  0      0  0  0
    ΔH      0  0  γ      0  0  γ      0  0  1      0  0  0
M4  ΔW      0  0  0      0  0  0      0  0  0      1  0  0
    ΔT      0  0  0      0  0  0      0  0  0      0  1  0
    ΔH      0  0  0      0  0  0      0  0  0      0  0  1

• Correlation for variation sources of the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
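The matrix above can be generated from a layer-to-process-module map, and then used to compute a path's statistical delay spread. This sketch assumes the linear Gaussian model from the statistical timing analysis slides, so σ_j = sqrt(sᵀΣs) for per-σ sensitivities s. It compares 3σ_j against an idealized homogeneous corner that pushes every source 3σ in its delay-increasing direction, a shift of 3Σ|s_v|. The sensitivity values are made up.

```python
import math

TYPES = ("W", "T", "H")  # ΔW, ΔT, ΔH


def correlation_matrix(modules, gamma=0.5):
    """Build Σ over sources (layer, type) from a layer -> process-module map.

    Two distinct sources are correlated with coefficient gamma iff they have
    the same variation type and their layers belong to the same process module.
    """
    sources = [(layer, t) for layer in modules for t in TYPES]
    sigma = [[0.0] * len(sources) for _ in sources]
    for a, (la, ta) in enumerate(sources):
        for b, (lb, tb) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0
            elif ta == tb and modules[la] == modules[lb]:
                sigma[a][b] = gamma
    return sources, sigma


def path_sigma(sens, sigma):
    """sigma_j = sqrt(s^T Σ s) for per-sigma sensitivities s."""
    n = len(sens)
    return math.sqrt(sum(sens[a] * sigma[a][b] * sens[b]
                         for a in range(n) for b in range(n)))


def alpha_vs_homogeneous(sens, sigma):
    """3*sigma_j divided by an idealized homogeneous-corner shift 3*sum|s_v|."""
    return path_sigma(sens, sigma) / sum(abs(s) for s in sens)


# The 4-layer example above: M1 to M3 share a process module, M4 does not
modules = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
sources, sigma = correlation_matrix(modules)
same_module = [0.0] * 12
same_module[0] = same_module[3] = 0.01  # ΔW of M1 and M2
diff_module = [0.0] * 12
diff_module[0] = diff_module[9] = 0.01  # ΔW of M1 and M4
print(round(alpha_vs_homogeneous(same_module, sigma), 3))  # 0.866
print(round(alpha_vs_homogeneous(diff_module, sigma), 3))  # 0.707
```

A path whose wiring spans independent process modules sees a smaller ratio, and therefore more corner pessimism, than one confined to a single module.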
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 at 60%]
• Variations in different layers are not fully correlated
Delay Variation
• Some paths have α > 1.0: a CBC can underestimate their delay variations
• But these paths have larger delays at the other corner
[Figure: α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)]/d(Ytyp); regions dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• The C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)]/d(Ytyp), and α vs. Δdelay at RC-worst]
• Paths are more sensitive to either R or C
• Using only the RC-worst or only the C-worst corner will underestimate delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussion, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
Figure: energy (mJ) of PM, BF, and CO at large, medium, and small margin for MUL and EXU, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation
Small/medium/large margin: safety margin = 5/10/15% of the clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU, marking the minimum achievable energy
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on a sensitivity function; replace error-tolerant FFs with normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save the current solution; iterate
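A toy version of the iterative loop is sketched below. Everything in it is an assumption for illustration, not the SEOpt/SkewOpt algorithms themselves: each timing-critical endpoint carries a hypothetical energy cost for keeping its error-tolerant FF and a hypothetical cost for inserting margin on its fanin path instead, the sensitivity function is taken to be the energy saved by that swap, and the activity-aware clock skew step is omitted.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str
    et_ff_energy: float   # hypothetical energy cost of keeping an error-tolerant FF
    margin_energy: float  # hypothetical energy cost of margin insertion on its fanin path

def total_energy(base, endpoints, hardened):
    """Toy energy model: base energy plus a per-endpoint resilience or margin cost."""
    return base + sum(e.margin_energy if e.name in hardened else e.et_ff_energy
                      for e in endpoints)

def optimize(base, endpoints, k=2, max_iters=10):
    hardened = set()                          # initial placement: all FFs error-tolerant
    best = total_energy(base, endpoints, hardened)
    best_solution = set(hardened)
    for _ in range(max_iters):
        # Assumed sensitivity: energy saved by swapping the ET FF for a normal FF.
        candidates = sorted((e for e in endpoints if e.name not in hardened),
                            key=lambda e: e.et_ff_energy - e.margin_energy,
                            reverse=True)[:k]
        if not candidates:
            break
        hardened |= {e.name for e in candidates}  # margin on K paths + FF replacement
        energy = total_energy(base, endpoints, hardened)
        if energy < best:                     # "energy < min energy": save current solution
            best, best_solution = energy, set(hardened)
    return best, best_solution

eps = [Endpoint("a", 5.0, 1.0), Endpoint("b", 4.0, 3.0),
       Endpoint("c", 2.0, 6.0), Endpoint("d", 3.0, 3.5)]
print(optimize(100.0, eps, k=2))
```

Because the best solution is saved rather than the last one, later iterations that harden endpoints with negative sensitivity (here "c" and "d") cannot make the returned result worse.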
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs: resilience benefits increase in the context of an AVS strategy
Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt on MUL and EXU, marking the minimum achievable energy
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
Figure: closed-loop AVS (circuit, on-chip aging monitor, voltage regulator); circuit frequency and Vdd over time, without AVS vs. with AVS, relative to the target
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
Flow: Step 1: BTI aging (|Vt| shift) estimated at VBTI → Step 2: derated library characterized at Vlib → Step 3: circuit implementation and signoff → BTI degradation and AVS determine Vfinal
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
Flow: Step 1: BTI aging (|Vt| shift) estimated at VBTI → Step 2: derated library characterized at Vlib → Step 3: circuit implementation and signoff → BTI degradation and AVS determine Vfinal
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs Area Across Different Signoffs
Figure: “knee” point for a balanced area and power tradeoff
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if the desired supply voltage > Vmax
Also Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes to improve performance, power, or area and to reduce overdesign
Figure: Vdd over time in nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD
Figure: power of circuits with different overdrive modes (fnom = 800MHz, Vnom = 0.8V): different overdrive modes give a 26% power range; with fOD fixed, there is still a 14% power range
Also Tunable Monitors: Less Margin
• Default config:
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Aggressive config: Vmin_est < Vmin_chip; some chips will fail
• Optimized config:
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Benefits of tunability:
  • Compensate for the difference between model and silicon
  • Recover margin when variation is reduced due to an improved process
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, and across lifetime
• The benefit of the “next node” is increasingly hard to find
  • An entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to the properties of the taped-out IC by the models, margins, and tolerances used in signoff
• Some takeaways from this talk:
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
Figure: core power (mW) and PG power (mW) for implementations 1 to 9, ranging from the highest to the least invested guardband; 14% power penalty
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• There is a tradeoff with the guardband invested in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
Figure (example worst-case capacitance corner): the 3σ capacitance corners of layers M1 and M2 are combined into a homogeneous Cw corner for an interconnect stack with M1 and M2, which lies beyond the stack's own 3σ point (pessimism)
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
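A short calculation shows why pessimism grows with the number of layers. Assume a simplified stack of n layers, each adding a capacitance variation with the same standard deviation σ, and a common pairwise correlation ρ between layers. The homogeneous corner shifts every layer by 3σ, a total of 3nσ, while the true 3σ shift of the summed variation is 3σ·√(n + n(n − 1)ρ). Their ratio equals 1 only when ρ = 1 and increases with n whenever ρ < 1:

```python
import math

def homogeneous_pessimism(n_layers, rho):
    """Ratio of the homogeneous-corner shift (sum of per-layer 3-sigma shifts) to the
    true 3-sigma shift of the summed variation, for n equal layers with common
    pairwise correlation rho (simplified, assumed model)."""
    corner_shift = 3.0 * n_layers                        # in units of per-layer sigma
    var_sum = n_layers + n_layers * (n_layers - 1) * rho
    true_shift = 3.0 * math.sqrt(var_sum)
    return corner_shift / true_shift

for n in (1, 2, 4, 9):
    print(n, round(homogeneous_pessimism(n, 0.5), 3))
```

With ρ = 0.5, the ratio grows from 1.0 for a single layer to roughly 1.34 for nine layers.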
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources:
        M1         M2         M3         M4
        ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1 ΔW   1  0  0    γ  0  0    γ  0  0    0  0  0
   ΔT   0  1  0    0  γ  0    0  γ  0    0  0  0
   ΔH   0  0  1    0  0  γ    0  0  γ    0  0  0
M2 ΔW   γ  0  0    1  0  0    γ  0  0    0  0  0
   ΔT   0  γ  0    0  1  0    0  γ  0    0  0  0
   ΔH   0  0  γ    0  0  1    0  0  γ    0  0  0
M3 ΔW   γ  0  0    γ  0  0    1  0  0    0  0  0
   ΔT   0  γ  0    0  γ  0    0  1  0    0  0  0
   ΔH   0  0  γ    0  0  γ    0  0  1    0  0  0
M4 ΔW   0  0  0    0  0  0    0  0  0    1  0  0
   ΔT   0  0  0    0  0  0    0  0  0    0  1  0
   ΔH   0  0  0    0  0  0    0  0  0    0  0  1
• Correlation between variation sources of the same variation type in the same process module: γ = 0.5
• Variation sources in different process modules are independent
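The matrix can also be generated programmatically. This sketch assumes, as in the table, three variation types per layer (ΔW, ΔT, ΔH), correlation γ = 0.5 between same-type sources of layers in the same process module, and independence otherwise; the module assignment `[0, 0, 0, 1]` encodes M1 to M3 in one module and M4 in another, and the function name is illustrative. A successful Cholesky factorization confirms that Σ is positive definite, which the statistical analysis needs for Σ = λλᵀ.

```python
import numpy as np

TYPES = ("dW", "dT", "dH")  # variation types per layer, in table order

def beol_correlation(modules, gamma=0.5):
    """Correlation matrix for 3 variation sources per layer.
    modules[i] is the process-module id of layer i (e.g. [0, 0, 0, 1] for M1-M4)."""
    k = len(TYPES)
    sigma = np.eye(k * len(modules))
    for i, mi in enumerate(modules):
        for j, mj in enumerate(modules):
            if i != j and mi == mj:
                for t in range(k):
                    # Same variation type, same process module: correlated by gamma.
                    sigma[k * i + t, k * j + t] = gamma
    return sigma

Sigma = beol_correlation([0, 0, 0, 1])
lam = np.linalg.cholesky(Sigma)  # lower-triangular lambda with Sigma = lam @ lam.T
```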
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
Figure: cumulative probability vs. maximum wirelength ratio across all layers (%); cumulative probability 0.92 at 60%
• Variations in different layers are not fully correlated
Delay Variation
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
Figure: paths dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and paths dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst), where Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp) and Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp). The C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner, where α < 1.0 and their delay variations are covered.
Delay Variation
Figure: α vs. Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)
• Paths are more sensitive to either R or C
• Using only the RC-worst or only the C-worst corner will underestimate delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have differently skewed variations
Figure: non-homogeneous corner for an interconnect stack with M1 and M2: M1 = Cw (3σ), M2 = Ctyp
• Less pessimism with non-homogeneous corners
• Challenges:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
Figure: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
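One reading of “scale the BEOL variation in the itf with α = 0.5” is that each corner parameter is pulled toward its typical value by the factor α, so the tightened corner represents α times the conventional excursion. The sketch applies that interpretation to one layer's width, thickness, and height; the parameter names and values are hypothetical, with skew directions following the conventional C-worst definition (W and T at max, H at min). A real flow would edit the corresponding itf entries instead.

```python
def tighten_corner(typ, corner, alpha=0.5):
    """Scale each parameter's excursion from its typical value by alpha (0 < alpha <= 1)."""
    return {k: typ[k] + alpha * (corner[k] - typ[k]) for k in typ}

# Hypothetical M2 geometry (nm): typical values vs. a conventional C-worst (Ycw) corner.
m2_typ = {"W": 50.0, "T": 90.0, "H": 80.0}
m2_cw = {"W": 56.0, "T": 99.0, "H": 72.0}
m2_tbc = tighten_corner(m2_typ, m2_cw, alpha=0.5)
print(m2_tbc)
```

Setting α = 1 reproduces the conventional corner, so the same routine covers both CBC and TBC generation.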
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard-cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
Figure: wirelength ratio (%) per layer for critical paths
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC → GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0 → done
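The grouping step can be sketched as follows, assuming per-path delays from the three extractions (Ytyp, Ycw, Yrcw) are already available from STA. Δdelay is taken at the dominant corner, consistent with the backup slides; the threshold value is a hypothetical placeholder (its value is not given here), and the function names are illustrative. Paths above the threshold go to GTBC and are closed with tightened corners; the rest remain in GCBC.

```python
def delta_delay(d_typ, d_cw, d_rcw):
    """Relative delay shift at the dominant BEOL corner (C-worst or RC-worst)."""
    return max(d_cw - d_typ, d_rcw - d_typ) / d_typ

def classify_paths(paths, threshold):
    """paths: {name: (d_typ, d_cw, d_rcw)} in ns. Returns (G_tbc, G_cbc) name lists."""
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        group = g_tbc if delta_delay(d_typ, d_cw, d_rcw) > threshold else g_cbc
        group.append(name)
    return g_tbc, g_cbc

paths = {"p1": (1.00, 1.08, 1.12), "p2": (1.00, 1.02, 1.03), "p3": (0.80, 0.86, 0.83)}
g_tbc, g_cbc = classify_paths(paths, threshold=0.05)  # threshold is hypothetical
print(g_tbc, g_cbc)
```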
Experiment Setup
Design:             LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns):  1.8       2.0       3.1
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
  • Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
Figure: Vdd over time with NBTI and PBTI aging; VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
Figure: Vfinal across gate types (annotation: 10mV)
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup-time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
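The reference flow amounts to a time-stepped simulation of AVS and aging. The sketch below is a toy under explicit assumptions: BTI aging follows a power law in time whose rate grows exponentially with supply voltage, and path delay follows an alpha-power-law model; all constants (prefactor, exponents, initial Vth, time and voltage steps) are hypothetical. At each step, AVS raises Vdd in increments of Vstep until the aged circuit meets its original delay target, and the last Vdd is reported as Vfinal. The loop gives up if the required Vdd exceeds Vmax, which in the real flow means modifying the circuit and starting over.

```python
import math

def bti_shift(vdd, t0, t1, a=0.004, k=2.0, n=0.2):
    """Toy BTI model (assumed): |dVth| accrued between t0 and t1 (years) at supply vdd,
    following a*exp(k*vdd)*t**n, so aging is front-loaded and faster at higher Vdd."""
    return a * math.exp(k * vdd) * (t1 ** n - t0 ** n)

def delay(vdd, vth, alpha=1.3):
    """Alpha-power-law path delay in arbitrary units (assumed model)."""
    return vdd / (vdd - vth) ** alpha

def reference_flow(v_init=0.90, v_max=1.05, v_step=0.01, vth0=0.35,
                   t_final=10.0, dt=0.5):
    """Step through lifetime, updating aging, timing, and Vdd; return Vfinal or None."""
    target = delay(v_init, vth0)              # timing closed at t = 0 with Vdd = Vinit
    vdd, dvth, t = v_init, 0.0, 0.0
    while t < t_final:
        dvth += bti_shift(vdd, t, t + dt)     # estimate aging over this time step
        t += dt
        while delay(vdd, vth0 + dvth) > target:
            vdd += v_step                     # AVS raises Vdd by Vstep until timing is met
            if vdd > v_max:
                return None                   # modify the circuit and start over
    return round(vdd, 3)                      # Vfinal: the converged voltage

print(reference_flow())
```

Each time step here stands in for a full STA run (and, in the real flow, a library characterization), which is why this flow is accurate but slow.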
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff:
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  • The circuit is affected by library characterization
Figure: loop among the circuit, Vfinal, Vlib and VBTI, and the derated libraries
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
Figure: |Vgs| over time with alternating ON/OFF periods [VattikondaWC06]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
  • ≈70% of the Vdd increment occurs in 1 year (the remaining 30% over 9 years)
Figure: Vdd approaching Vfinal over the lifetime
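The front-loaded behavior follows directly if BTI aging is modeled as a power law in stress time, |ΔVth| ∝ tⁿ. That form and the exponent are assumptions here: n is chosen to reproduce the 50%-in-year-1 figure, not taken from the source. The fraction of 10-year aging reached after year 1 is then (1/10)ⁿ, so 50% corresponds to n = log 2 / log 10 ≈ 0.30:

```python
import math

def aging_fraction(t_years, lifetime_years=10.0, n=0.30):
    """Fraction of end-of-life |dVth| reached at t_years, assuming |dVth| ~ t**n."""
    return (t_years / lifetime_years) ** n

# Exponent for which exactly half of the 10-year aging occurs in year 1: (1/10)**n = 1/2.
n_half = math.log(2) / math.log(10)
print(round(n_half, 3), round(aging_fraction(1.0, n=n_half), 3))
```

Any exponent below 1 makes aging front-loaded; smaller exponents concentrate even more of it in the first year.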
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as
Figure annotations: 30% MTTF penalty; 200mV voltage compensation
EM Impact on AVS Scheduling
Figure: VDD (V) vs. year for AVS schedules S1 to S5, and MTTF (years) of design DMA under S1 to S5; annotated MTTF penalty: 12 years
What is “Signoff”?
• Foundation of the contract between design house and foundry
  • “Chip should work”: a stack of models, margins, and analyses
  • Function, timing, signal integrity, power integrity, …
Figure: “margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd, operating voltage
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions:
  • Δdjv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → Δdjv
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• Σ = λλᵀ (λ is obtained by Cholesky decomposition)
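With per-source sensitivities from the extraction/STA runs and the Cholesky factor λ, the statistical spread of a path delay follows from the linear model. The sketch below assumes each itf variation corner skews one source by +3σ, so that Δd_jv = [d_j(Y_v) − d_j(Y_typ)]/3, and computes σ_j = ‖λᵀΔd_j‖, which equals √(Δd_jᵀ Σ Δd_j). It then forms the scaling factor as the ratio of the statistical 3σ_j delay shift to the shift at a conventional corner, which is one reading of how α is used in the backup slides; the function names and example numbers are hypothetical.

```python
import numpy as np

def path_sigma(d_corners, d_typ, lam):
    """d_corners[v]: path delay with variation source v skewed by +3 sigma (assumed).
    lam: Cholesky factor of the correlation matrix (Sigma = lam @ lam.T)."""
    sens = (np.asarray(d_corners, dtype=float) - d_typ) / 3.0  # delay change per 1 sigma
    return float(np.linalg.norm(lam.T @ sens))                 # sqrt(sens^T Sigma sens)

def scaling_factor(sigma_j, d_conv_corner, d_typ):
    """alpha = statistical 3-sigma delay shift / delay shift at a conventional corner."""
    return 3.0 * sigma_j / (d_conv_corner - d_typ)

# Tiny example: 3 variation sources, the first two correlated with gamma = 0.5.
Sigma = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]])
lam = np.linalg.cholesky(Sigma)
sig = path_sigma([1.03, 1.03, 1.03], 1.00, lam)
alpha = scaling_factor(sig, d_conv_corner=1.09, d_typ=1.00)
print(round(sig, 4), round(alpha, 3))
```

Here the conventional corner skews all three sources by 3σ at once (a 0.09 shift), while the statistical 3σ shift is only 0.06, giving α ≈ 0.67 < 1: the conventional corner is pessimistic for this path.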
Throughput model [Kahng10]: throughput of a resilient design depends on the error rate (ER), the clock period (T), and the number of recovery cycles (r) per error; error-free cycles contribute (1 − ER)/T, while an erroneous cycle incurs a recovery penalty proportional to r × T.
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off cost of resilience vs. data path optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs: resilience benefits increase in the context of an AVS strategy
Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU, marking the minimum achievable energy
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• If one mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cones
Figure: frequency vs. voltage; modes A, B, and C relative to the design cone of mode A (negative slacks = failed design; positive slacks = overdesign)
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
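The dominance test can be illustrated with a toy timing model. Assume (hypothetically) that critical-path delay scales with supply voltage by the alpha-power law, g(V) = V/(V − Vth)^a, and that a design signed off at mode A just meets 1/f_A at V_A. Its slack at another mode B is then 1/f_B − (1/f_A)·g(V_B)/g(V_A). Positive slack means A dominates B; equivalent dominance means neither mode has positive slack with respect to the other. Vth, a, and the mode values are illustrative only.

```python
def g(v, vth=0.35, a=1.3):
    """Alpha-power-law delay scaling with supply voltage (assumed model)."""
    return v / (v - vth) ** a

def slack_at(signoff_mode, other_mode):
    """Slack (ns) at other_mode of a design whose critical path exactly meets timing
    at signoff_mode. Modes are (supply voltage in V, frequency in GHz)."""
    (v_a, f_a), (v_b, f_b) = signoff_mode, other_mode
    delay_b = (1.0 / f_a) * g(v_b) / g(v_a)   # critical-path delay rescaled to V_B
    return 1.0 / f_b - delay_b

def dominates(mode_a, mode_b, tol=1e-9):
    """Mode A dominates mode B if signing off at A leaves positive slack at B."""
    return slack_at(mode_a, mode_b) > tol

A = (0.8, 0.8)                      # 0.8 V, 800 MHz
B = (1.0, 0.8)                      # same frequency at a higher voltage
C = (1.0, 0.8 * g(0.8) / g(1.0))    # frequency chosen so that A and C are equivalent
print(dominates(A, B), dominates(B, A), dominates(A, C), dominates(C, A))
```

Signing off at both A and B is redundant in this toy model, since A already implies B; A and C, in contrast, lie on each other's design-cone boundary.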
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
Toward Holistic Modeling Margining and Tolerance of IC Variabi
IC Variability
Challenge Value of Technology
Solutions Modeling Margining Tolerance
Outline
BEOL Corner Optimization
Proposed Timing Signoff Flow
Conventional BEOL Corners
Statistical RC Model
Pessimism of Conventional BEOL Corners (CBC)
Intuition on Delay Variability Across Cw RCw
Intuition on Delay Variability Across Cw RCw (2)
Scaling Factor α and Delay Variation
Find Paths for Which TBCs Can Be Used
Determining α Arcw and Acw
Benefits of Tightened BEOL Corners
Outline (2)
How to Minimize Cost of Resilience
Tradeoff Resilience Cost vs Datapath Cost
Selective-Endpoint Optimization (SEOpt)
Clock Skew Optimization (SkewOpt)
Overall Optimization Flow
Benefit of Low-Cost Resilience
Increased Benefit of Resilience with AVS
Outline (3)
Breaking Chicken-Egg Loops Less Margin
Derated Library Characterization and AVS
Library Characterization for AVS
Power vs Area Across Different Signoffs
Heuristics 1
Vfinal Estimation
Observation and Heuristic 2
Proposed Library Characterization Flow
Power vs Area for All Designs
Also Multi-Mode Signoff Choices Matter
Also Tunable Monitors Less Margin
Also Tunable Monitors Less Margin (2)
Outline (4)
Conclusions
Thank You
Backup
Power Penalty to Fix EM with AVS
Homogeneous Corners
Homogeneous Corners (2)
Correlation Matrix
Wiring Structure in Timing-Critical Paths (2)
Delay Variation
Delay Variation (2)
Non-Homogeneous Corner
Opportunities for Tightened BEOL Corners
Wiring Structure in Timing-Critical Paths (2)
Proposed Timing Signoff Flow (2)
Experiment Setup
Further Analysis
Scaling Factor Results
Benefits of Tightened BEOL Corners (1)
Heuristics 1 (2)
Vfinal Estimation (2)
Observation and Heuristic 2 (2)
Technology and Benchmark Circuits
A Reference Signoff Flow
Experiment Setup (2)
ldquoChicken and Eggrdquo Loop
Bias Temperature Instability (BTI)
Observation 1
Results for DC Scenario
Problem Signoff Corner Definition
AVS Signoff Corner Selection
AVS Impact on EM Lifetime
EM Impact on AVS Scheduling
What is ldquoSignoffrdquo
Statistical Timing Analysis (1)
Statistical Timing Analysis (2)
Resilient Designs
Resilience Cost Reduction Problem
Selective-Endpoint Optimization
Process-Aware Vdd Scaling (PVS)
Challenge Variability
Energy Reduction in AVS Context
Our Concept Mode Dominance
Our Method Global Optimization
Classes of Closed-Loop AVS
Design of RO with Tunable Vmin
Benefit of Resilience Cost Reduction
Increased Benefit of Resilience With AVS
Overall Optimization Flow (2)
25ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops Less Marginbull Example Interaction between reliability margin and AVS designs
bull Bias temperature instability (BTI) aging higher |ΔVth| lower fmax
bull AVS can be used to compensate for performance degradation
Circuit
Closed-loop AVS
On-chip aging
monitor
Circuit performanc
e
Voltage regulato
r
Circuit frequency
Vdd
time
time
Without AVSWith AVS
target
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib selection should consider BTI + AVS interaction
bull Aging and Vfinal are unknowns before circuit implementation
BTI degradation and AVS
Vfinal
VBTI |Vt|
Step 1
Vlib
Derated library
Step 2
Circuit implementation and
signoff
circuit
Step 3
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib depend on aging during AVS
bull Aging and Vfinal are unknowns before circuit implementation
Vlib
VBTI Derated library
|Vt| Circuit implementation and
signoff
circuitBTI degradation and AVS
Vfinal
Step 1 Step 2 Step 3
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal Vlib VBTI
bull What is the design overhead when timing libraries are not properly characterized
bull Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads Joint work with Wei-Ting Jonas Chan Tuck-Boon Chan Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Large lifetime energy overhead
bull May fail to meet timing if desired supply voltage gt Vmax
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance (Cw) corner. Layers M1 and M2 each have a capacitance (C) distribution with a 3σ point; for the interconnect stack with M1 and M2, the homogeneous Cw corner lies beyond the stack's own 3σ point, and the gap is labeled "Pessimism"]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
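The growth of this pessimism with layer count can be made concrete with a simple model. This is a minimal sketch, assuming every layer contributes equally to the stack quantity (for example total capacitance), every layer has the same σ, and every pair of layers has the same correlation ρ. The homogeneous corner puts all n layers at +3σ simultaneously (a shift of 3nσ), while the true 3σ of the sum is 3σ·sqrt(n + n(n-1)ρ). The ratio equals 1 for ρ = 1 and tends to 1/sqrt(ρ) as n grows.

```python
import math


def homogeneous_pessimism(n_layers, rho):
    """Ratio of the homogeneous-corner shift to the statistical 3-sigma shift.

    Assumptions (illustrative only): equal per-layer weights, equal per-layer
    sigma (normalized to 1), and the same correlation rho between every pair
    of layers.
    """
    corner_shift = 3.0 * n_layers                      # all layers at +3 sigma
    variance = n_layers + n_layers * (n_layers - 1) * rho
    statistical_shift = 3.0 * math.sqrt(variance)      # 3 sigma of the sum
    return corner_shift / statistical_shift


for rho in (1.0, 0.5, 0.0):
    print(rho, [round(homogeneous_pessimism(n, rho), 2) for n in (1, 2, 4, 9)])
```

With ρ = 1 every row stays at 1.0 (no pessimism); with ρ < 1 the ratio rises monotonically with the number of layers, matching the statement on the slide.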
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources
Σ =
           M1         M2         M3         M4
        ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1 ΔW    1  0  0    γ  0  0    γ  0  0    0  0  0
   ΔT    0  1  0    0  γ  0    0  γ  0    0  0  0
   ΔH    0  0  1    0  0  γ    0  0  γ    0  0  0
M2 ΔW    γ  0  0    1  0  0    γ  0  0    0  0  0
   ΔT    0  γ  0    0  1  0    0  γ  0    0  0  0
   ΔH    0  0  γ    0  0  1    0  0  γ    0  0  0
M3 ΔW    γ  0  0    γ  0  0    1  0  0    0  0  0
   ΔT    0  γ  0    0  γ  0    0  1  0    0  0  0
   ΔH    0  0  γ    0  0  γ    0  0  1    0  0  0
M4 ΔW    0  0  0    0  0  0    0  0  0    1  0  0
   ΔT    0  0  0    0  0  0    0  0  0    0  1  0
   ΔH    0  0  0    0  0  0    0  0  0    0  0  1
Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
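The rule behind this matrix can be sketched in a few lines: two sources are correlated with coefficient γ if and only if they have the same variation type and their layers share a process module. The module labels below ("A" for M1-M3, "B" for M4) are hypothetical names chosen to reproduce the 12 × 12 example; source ordering follows the matrix (layer first, then ΔW, ΔT, ΔH).

```python
TYPES = ("dW", "dT", "dH")  # width, thickness, dielectric-height variation


def correlation_matrix(layer_modules, gamma=0.5):
    """Build the correlation matrix Sigma for per-layer BEOL variation sources.

    layer_modules maps each metal layer (in stack order) to a process-module
    label.  Sources z_u, z_v get correlation gamma iff they have the same
    variation type and their layers share a module; the diagonal is 1 and all
    other entries are 0.
    """
    sources = [(layer, t) for layer in layer_modules for t in TYPES]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for u, (lu, tu) in enumerate(sources):
        for v, (lv, tv) in enumerate(sources):
            if u == v:
                sigma[u][v] = 1.0
            elif tu == tv and layer_modules[lu] == layer_modules[lv]:
                sigma[u][v] = gamma
    return sources, sigma


# Hypothetical module labels reproducing the example: M1-M3 share a module.
modules = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
sources, sigma = correlation_matrix(modules)
for (layer, t), row in zip(sources, sigma):
    print(f"{layer} {t}", " ".join(f"{x:.1f}" for x in row))
```

For the 9-layer stack with 27 sources, the same function yields the 27 × 27 matrix once each layer is assigned to its process module.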
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability versus max wirelength ratio across all layers (%); marked point: 60% ratio at cumulative probability 0.92]
• Variations in different layers are not fully correlated
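The statistic on this slide can be sketched as follows: for each path, take the largest share of its wirelength on any one layer, then evaluate the empirical cumulative distribution at 60%. The per-layer wirelengths below are hypothetical; with real data they would come from the extracted critical paths.

```python
def max_layer_ratio(wirelength_by_layer):
    """Largest share of a path's total wirelength that sits on any single layer."""
    total = sum(wirelength_by_layer.values())
    if total <= 0:
        raise ValueError("path has no wirelength")
    return max(wirelength_by_layer.values()) / total


def fraction_below(paths, threshold=0.60):
    """Empirical CDF of the max-layer ratio, evaluated at `threshold`."""
    ratios = [max_layer_ratio(p) for p in paths]
    return sum(r < threshold for r in ratios) / len(ratios)


# Hypothetical per-layer wirelengths (um) of four critical paths.
paths = [
    {"M2": 30, "M3": 40, "M4": 30},
    {"M2": 10, "M5": 50, "M6": 40},
    {"M3": 70, "M4": 30},
    {"M2": 25, "M3": 25, "M4": 25, "M5": 25},
]
print(fraction_below(paths))
```

A low max-layer ratio means a path's wire delay averages over several layers whose variations are only partly correlated, which is why a homogeneous corner overstates its delay shift.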
Delay Variation
• Some paths have α > 1.0: a CBC can underestimate their delay variations
• But these paths have larger delays at the other corner
  • The C-worst corner underestimates their delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: their delay variations are covered by the RC-worst corner
[Figure: α of each path versus Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp) and versus Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp). A path is dominated by C-worst when Δdelay at C-worst > Δdelay at RC-worst, and dominated by RC-worst when Δdelay at RC-worst > Δdelay at C-worst]
• Paths are more sensitive to either R or C
• Using only RC-worst or only C-worst will underestimate delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussion, α is defined at the dominant corner
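Finding the dominant corner and the corresponding α for one path can be sketched as below. The exact definition of α is given elsewhere in the talk; this sketch assumes α_j = (3σ_j / d_j(Ytyp)) / (Δdelay at the dominant corner), which is consistent with the axes used for the α plots (3σ_j/d(Ytyp) versus Δd_j(Yrcw)/d_j(Ytyp)) but is an assumption here. All delays are hypothetical.

```python
def dominant_corner(d_typ, d_cw, d_rcw):
    """Return (corner name, relative delay shift) for the larger corner shift."""
    delta_cw = (d_cw - d_typ) / d_typ
    delta_rcw = (d_rcw - d_typ) / d_typ
    return ("RC-worst", delta_rcw) if delta_rcw >= delta_cw else ("C-worst", delta_cw)


def alpha_at_dominant(d_typ, d_cw, d_rcw, three_sigma):
    """Assumed definition: alpha = (3*sigma_j / d_typ) / shift at the dominant corner.

    alpha < 1: the dominant corner already covers the path's statistical
    3-sigma delay variation; alpha > 1: the corner underestimates it.
    """
    corner, delta = dominant_corner(d_typ, d_cw, d_rcw)
    if delta <= 0:
        raise ValueError("neither corner slows the path down; alpha undefined")
    return corner, (three_sigma / d_typ) / delta


# Hypothetical paths, delays in ps.
print(alpha_at_dominant(100.0, 104.0, 108.0, 3.0))  # wire-R sensitive path
print(alpha_at_dominant(100.0, 106.0, 103.0, 9.0))  # wire-C sensitive path
```

Evaluating α at the dominant corner, rather than at a fixed one, is what keeps R-sensitive paths from being judged against C-worst and vice versa.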
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 at Cw (3σ) and M2 at Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: per-path Δd_j(Yrcw)/d_j(Ytyp) × 100 versus 3σ_j/d_j(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
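One way to read "scale BEOL variation in the itf with α" is to move each skewed parameter of a conventional corner back toward its typical value by the factor α. This is a minimal sketch of that interpretation only; the parameter names (W, T, H) and the nm values for M2 are hypothetical, and the direction of skew (all minima at RC-worst) follows the conventional-corner table earlier in the talk.

```python
def tighten_corner(typ, corner, alpha):
    """Scale each parameter's deviation from typical by alpha.

    typ, corner: dicts with the same keys (e.g. width W, thickness T, and
    dielectric height H of one layer).  alpha = 1 keeps the conventional
    corner; alpha = 0 collapses it to typical.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha should lie in [0, 1] for a tightened corner")
    return {k: typ[k] + alpha * (corner[k] - typ[k]) for k in typ}


# Hypothetical M2 values in nm; the RC-worst corner skews W, T, H to their minima.
m2_typ = {"W": 40.0, "T": 80.0, "H": 90.0}
m2_rcw = {"W": 36.0, "T": 72.0, "H": 85.0}
print(tighten_corner(m2_typ, m2_rcw, alpha=0.5))
```

Because the tightened corner is only safe for paths whose α is small enough, it has to be paired with a path-selection step rather than applied to the whole design.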
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer for the critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of the critical paths
• Put path j in the group G_TBC if its Δdelay is larger than a threshold
• Fix only the paths in G_TBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into G_TBC and G_CBC; G_TBC: timing analysis using TBC, ECO using TBC until violation = 0; G_CBC: timing analysis using CBC, ECO using CBC until violation = 0; done]
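The grouping step of this flow can be sketched as follows. The metric is the relative Δdelay at the worse of the two conventional corners, matching the Δdelay definitions used for the delay-variation plots; the threshold value is a hypothetical parameter that the slides do not specify, and the path delays are made up.

```python
def classify_paths(paths, threshold):
    """Split critical paths into G_TBC and G_CBC.

    paths: mapping path name -> (d_typ, d_cw, d_rcw), delays from RC
    extraction + STA at Ytyp, Ycw and Yrcw.  A path goes to G_TBC when its
    relative delay shift at the dominant corner exceeds `threshold`
    (a hypothetical, user-chosen value); all other paths stay in G_CBC.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta = max(d_cw - d_typ, d_rcw - d_typ) / d_typ
        (g_tbc if delta > threshold else g_cbc).append(name)
    return g_tbc, g_cbc


# Hypothetical delays (ps) for two critical paths.
paths = {"p1": (100.0, 103.0, 110.0), "p2": (100.0, 102.0, 103.0)}
print(classify_paths(paths, threshold=0.05))
```

Each group is then closed separately: G_TBC paths with ECOs against the tightened corners, G_CBC paths against the conventional ones.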
Experiment Setup
• Designs and clock periods: LEON3MP 1.8 ns, NETCARD 2.0 ns, SUPERBLUE12 3.1 ns
Heuristics 1
• Model BTI degradation with Vfinal throughout the lifetime
  • Aging under a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd versus time for NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit frequencies (GHz): c5315 1.38, c7552 1.25, AES 0.89, MPEG2 1.05
Supply voltages:
Vmax = 1.05 V
Vinit = 0.90 V
Vheur1 (DC) = 0.97 V
Vheur1 (AC) = 0.95 V
Vheur2 (DC) = 0.95 V
Vheur2 (AC) = 0.93 V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA and library characterization runs
[Figure: Vstep = AVS voltage step; Vfinal = converged voltage]
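A toy version of this reference flow shows its structure: age the circuit for one time step at the current Vdd, re-time it, and raise Vdd in Vstep increments until timing is met, failing if Vdd exceeds Vmax. This is a sketch under loudly stated assumptions. An alpha-power-law delay model and a power-law BTI model with exponential voltage acceleration stand in for the real STA runs and derated-library characterizations, and every constant is hypothetical.

```python
import math

# Hypothetical model constants (not from any library or from the talk).
VTH0 = 0.30            # fresh |Vth| (V)
ALPHA = 1.3            # alpha-power-law exponent
N_TIME = 0.3           # BTI time exponent
K0, BETA = 0.005, 4.0  # BTI prefactor (V / year**N_TIME), voltage acceleration (1/V)


def bti_rate(vdd):
    """Voltage-dependent BTI prefactor: higher Vdd ages the devices faster."""
    return K0 * math.exp(BETA * (vdd - 0.9))


def path_delay(vdd, dvth, v_ref=0.9):
    """Alpha-power-law path delay, normalized to 1.0 for a fresh path at v_ref."""
    def raw(v, vth):
        return v / (v - vth) ** ALPHA
    return raw(vdd, VTH0 + dvth) / raw(v_ref, VTH0)


def reference_signoff(clock, v_init, v_max, v_step=0.01, t_final=10.0, dt=0.5):
    """Iterate aging -> timing -> Vdd update until t_final; None if Vdd > v_max."""
    vdd, dvth, t = v_init, 0.0, 0.0
    while t < t_final:
        # Age for dt at the current Vdd; the equivalent-time step preserves the
        # damage accumulated so far when the stress voltage changes.
        k = bti_rate(vdd)
        t_eq = (dvth / k) ** (1.0 / N_TIME) if dvth > 0 else 0.0
        dvth = k * (t_eq + dt) ** N_TIME
        t += dt
        # AVS: raise Vdd in v_step increments until the aged path meets timing.
        while path_delay(vdd, dvth) > clock:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None  # Vfinal > Vmax: modify the circuit and start over
    return vdd  # converged Vfinal


print(reference_signoff(clock=1.0, v_init=0.90, v_max=1.05))
```

In the real flow, each pass of the outer loop would require a new STA run (and possibly a new library characterization), which is why the reference flow is accurate but slow.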
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  • The circuit is affected by library characterization
[Figure: loop of Circuit, Vfinal, Vlib and VBTI, and Derated Libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
• Device aging (|ΔVth|) accumulates over time
[Figure: |Vgs| versus time with alternating ON/OFF phases [VattikondaWC06]; accumulated |ΔVth| over time [TCAS'14]]
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal reduces rapidly
  • ≈70% of the Vdd increment happens in 1 year (the remaining 30% over 9 years)
[Figure: Vdd over the lifetime approaching Vfinal]
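The front-loading follows directly from a power-law time dependence of the threshold-voltage shift, ΔVth ∝ t^n with n < 1. This is a minimal sketch, assuming n = 0.3; the exponent is technology dependent and was chosen here only because it reproduces the roughly 50%-in-the-first-year figure for a 10-year lifetime.

```python
def aging_fraction(t, lifetime, n):
    """Fraction of the end-of-life BTI shift reached at time t, if dVth ∝ t**n."""
    if not 0 <= t <= lifetime:
        raise ValueError("t must lie within the lifetime")
    return (t / lifetime) ** n


# Assumed time exponent n = 0.3 and a 10-year lifetime.
for year in (1, 2, 5, 10):
    print(year, round(aging_fraction(year, 10, 0.3), 2))
```

Because (1/10)^0.3 ≈ 0.50, half of the total shift occurs in year 1, so an AVS loop must make most of its voltage increase early, consistent with the ≈70% Vdd increment in the first year.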
Results for DC Scenario
Optimistic signoff corner
• AVS increases the supply voltage aggressively to compensate aging
  • Consumes more power
  • May fail to meet timing if the desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure annotations: 30% MTTF penalty; 200 mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD (V, 0.90-1.04) versus year (0-12) for AVS schedules S1-S5, and MTTF (years) of DMA 3 under S1-S5; annotation: 12 years MTTF penalty]
What is "Signoff"?
• Foundation of the contract between design house and foundry
  • "Chip should work": a stack of models, margins, and analyses
  • Function, timing, signal integrity, power integrity, …
• Problem: margins = pessimism, overdesign, schedule delay
[Figure: "margin stack" for voltage signoff, with nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, and NBTI stacked between the operating voltage and the signoff Vdd]
Statistical Timing Analysis (1)
• Delay sensitivity of path p_j to variation source z_v: $\Delta d_{j,v} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • $\Delta d_{j,v}$ is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain $\Delta d_{j,v}$ using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → $\Delta d_{j,v}$]
Note: path delay includes gate and wire delays
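The sensitivity formula and the linearity assumption can be sketched for one path as follows. The sketch assumes, as the division by 3 suggests, that the itf Y_v skews source z_v alone by 3σ; it uses three sources instead of 27, and the delays are hypothetical.

```python
def sensitivities(d_typ, d_skewed):
    """Per-sigma delay sensitivities of one path: (d_j(Y_v) - d_j(Y_typ)) / 3.

    d_typ    -- path delay from the run with the typical itf (Y_typ)
    d_skewed -- path delays from the runs in which only source z_v is
                skewed by 3 sigma (Y_v), one entry per variation source
    """
    return [(d_v - d_typ) / 3.0 for d_v in d_skewed]


def linear_delay(d_typ, sens, z):
    """First-order path delay under the linearity assumption; z_v in units of sigma."""
    return d_typ + sum(s * zv for s, zv in zip(sens, z))


# Hypothetical delays (ps): 1 typical run + 3 single-source skewed runs.
sens = sensitivities(100.0, [103.0, 100.0, 106.0])
print(sens, linear_delay(100.0, sens, [1.0, 0.0, 1.0]))
```

With 27 sources this needs exactly the 28 extraction + STA runs listed on the slide: one typical run plus one run per source.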
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^T$ (λ is obtained by Cholesky decomposition)
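The role of the Cholesky factor can be sketched as follows. Writing the correlated sources as z = λx with independent standard normals x turns a linear path delay d = Σ_v Δd_{j,v} z_v into d = (λ^T Δd_j)·x, so the path's standard deviation is ||λ^T Δd_j||, equal to sqrt(Δd_j^T Σ Δd_j). The pure-Python Cholesky routine and the two-source example (γ = 0.5, 1 ps per σ) are illustrative choices, not the talk's implementation.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def path_sigma(sens, corr):
    """Std. dev. of a path delay d = sum_v sens[v] * z_v with corr(z) = Sigma.

    With z = lambda x (x independent standard normals), d = (lambda^T sens) . x,
    so sigma_d = ||lambda^T sens|| = sqrt(sens^T Sigma sens).
    """
    lam = cholesky(corr)
    n = len(sens)
    proj = [sum(lam[i][k] * sens[i] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(p * p for p in proj))


# Two same-type sources in one process module (gamma = 0.5), 1 ps per sigma each.
corr = [[1.0, 0.5], [0.5, 1.0]]
sigma_d = path_sigma([1.0, 1.0], corr)
print(round(sigma_d, 3), round(3 * sigma_d, 3))  # 1-sigma and 3-sigma spread (ps)
```

The same factor λ can also be used to draw correlated Monte Carlo samples of the 27 sources from independent normals.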
• Throughput of a resilient design is modeled as a function of the clock period T, the error rate ER [Kahng10], and the number of recovery cycles r
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off the cost of resilience versus datapath optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty versus reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin, and CombOpt designs; panels MUL (0.84-1.00 V) and EXU (0.84-1.02 V); minimum achievable energy marked]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
  • Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: design cone of mode A in the voltage-frequency plane, with modes B and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
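The dominance definitions can be illustrated with a deliberately simple, one-parameter model. This is a sketch under stated assumptions: circuit speed is taken to follow the alpha-power law with a hypothetical threshold voltage and exponent, so a circuit signed off at (V_A, f_A) can run at (V, f) whenever f/speed(V) ≤ f_A/speed(V_A). Under this simplification the design cone collapses to a single curve and dominance reduces to comparing f/speed(V). Real design cones come from implemented circuits, so this only illustrates the definitions.

```python
VTH = 0.30    # hypothetical threshold voltage (V)
ALPHA = 1.3   # hypothetical alpha-power-law exponent


def speed(v):
    """Relative circuit speed (1/delay) under the alpha-power law."""
    return (v - VTH) ** ALPHA / v


def required_speed(mode):
    """Speed a circuit needs to reach frequency f at Vdd = v (arbitrary units)."""
    v, f = mode
    return f / speed(v)


def relation(a, b, rel_tol=1e-6):
    """Compare signoff modes a and b, each given as (Vdd in V, frequency in MHz)."""
    ra, rb = required_speed(a), required_speed(b)
    if abs(ra - rb) <= rel_tol * max(ra, rb):
        return "equivalent"      # each mode lies in the other's design cone
    return "A dominates B" if ra > rb else "B dominates A"


nom = (0.8, 800.0)                              # fnom = 800 MHz at Vnom = 0.8 V
low_f = (0.8, 700.0)                            # same voltage, lower frequency
od = (0.9, 800.0 * speed(0.9) / speed(0.8))     # overdrive mode on nom's cone edge
print(relation(nom, low_f), "|", relation(nom, od))
```

Signing off at both `nom` and `low_f` would be redundant (nom alone covers low_f), whereas `nom` and `od` constrain the design equally, which is the equivalent-dominance case the guideline asks for.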
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoids circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]
[Figure: power estimation of adaptive search; power (mW) versus signoff voltage (V, 0.9-1.2) for "1st", "2nd", and "real"; design AES, f = 700 MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from real implemented circuits
Classes of Closed-Loop AVS
Closed-loop AVS uses a design-dependent replica, an in-situ monitor, or a generic monitor
• The critical path may be difficult to identify (IP from a 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitors do not capture design-specific performance variation
This work: a tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for large, medium, and small margins, for MUL and EXU; bars split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of the clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty versus reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin, and CombOpt designs; panels MUL (0.84-1.00 V) and EXU (0.84-1.02 V); minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save the current solution]
Toward Holistic Modeling, Margining and Tolerance of IC Variability
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
  • AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS with the circuit, an on-chip aging monitor, and a voltage regulator; circuit frequency and Vdd versus time, without and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI and |Vt|; Step 2: Vlib and the derated library; Step 3: circuit implementation and signoff, producing the circuit; BTI degradation and AVS determine Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI and |Vt|; Step 2: Vlib and the derated library; Step 3: circuit implementation and signoff, producing the circuit; BTI degradation and AVS determine Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs Area Across Different Signoffs
• "Knee" point for a balanced area and power tradeoff
Optimistic signoff corner
• AVS increases the supply voltage aggressively to compensate aging
  • Large lifetime energy overhead
  • May fail to meet timing if the desired supply voltage > Vmax
Also Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd versus time alternating between nominal (NOM, tnom) and overdrive (OD, tOD) modes; power of circuits with different overdrive modes: 26% power range across different overdrive modes, still a 14% power range with fOD fixed; fnom = 800 MHz, Vnom = 0.8 V]
Also Tunable Monitors: Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13 mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off the cost of resilience vs. datapath optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for MUL (0.84 to 1.00 V) and EXU (0.84 to 1.02 V), comparing brute-force, pure-margin, and CombOpt; minimum achievable energy marked]
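The voltage/error tradeoff can be illustrated with a toy model. Every number in this sketch is hypothetical: energy per operation scales as V², the timing-error rate ER(V) is assumed to grow exponentially below an assumed critical voltage, and each error is assumed to cost r recovery cycles of extra energy. It is not the CombOpt method, only the shape of the tradeoff it exploits:

```python
import math
from typing import Iterable, Tuple


def error_rate(v: float, v_crit: float = 0.92, slope: float = 0.02,
               er_crit: float = 1e-3) -> float:
    """Hypothetical timing-error rate: grows exponentially as Vdd drops below v_crit."""
    return min(1.0, er_crit * math.exp((v_crit - v) / slope))


def energy(v: float, r: int, v_nom: float = 1.0) -> float:
    """Normalized energy per operation: (V / V_nom)^2 times a recovery overhead,
    assuming each timing error costs r extra cycles."""
    return (v / v_nom) ** 2 * (1.0 + r * error_rate(v))


def best_voltage(grid: Iterable[float], r: int) -> Tuple[float, float]:
    """Sweep candidate supply voltages; return (V, energy) with the lowest energy."""
    return min(((v, energy(v, r)) for v in grid), key=lambda p: p[1])


grid = [round(0.84 + 0.01 * i, 2) for i in range(17)]   # 0.84 V ... 1.00 V
print(best_voltage(grid, r=0))    # no error penalty: the lowest voltage wins
print(best_voltage(grid, r=10))   # costly recovery: an intermediate voltage wins
```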
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slack with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • The modes are in each other's design cones
[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B, and C; negative slack = failed design, positive slack = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign.
Guideline: search for signoff modes within the design cone to reduce overdesign.
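The dominance test can be illustrated with a one-knob delay model. A minimal sketch, assuming every critical path scales with Vdd according to the alpha-power law (Vt = 0.35 V and exponent a = 1.3 are illustrative values, and this exponent is unrelated to the BEOL scaling factor α used elsewhere in the talk). Real design cones arise from many paths with different voltage sensitivities, so this only captures the sign logic:

```python
from typing import NamedTuple


class Mode(NamedTuple):
    vdd: float       # supply voltage (V)
    freq_ghz: float  # clock frequency (GHz)


def gate_delay(vdd: float, vt: float = 0.35, a: float = 1.3) -> float:
    """Alpha-power-law delay in arbitrary units; vt and exponent a are assumed values."""
    return vdd / (vdd - vt) ** a


def slack_ns(signed_at: Mode, check_at: Mode) -> float:
    """Slack at check_at of a design closed with zero slack at signed_at.

    Assumes the whole critical path scales with Vdd like gate_delay().
    """
    path_ns = (1.0 / signed_at.freq_ghz) * gate_delay(check_at.vdd) / gate_delay(signed_at.vdd)
    return 1.0 / check_at.freq_ghz - path_ns


def relation(a: Mode, b: Mode, tol_ns: float = 1e-3) -> str:
    """'dominates' if signing off at a leaves positive slack at b, 'dominated' if the
    a-signed design fails at b (so b dominates a), else 'equivalent'."""
    s = slack_ns(a, b)
    if abs(s) <= tol_ns:
        return "equivalent"
    return "dominates" if s > 0 else "dominated"


a = Mode(0.9, 0.5)
print(relation(a, Mode(0.9, 0.4)))  # dominates: same Vdd, slower clock
print(relation(a, Mode(1.0, 0.5 * gate_delay(0.9) / gate_delay(1.0))))  # equivalent
```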
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough: scalable
[Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) again and refine power models]
[Figure: power estimation of adaptive search: power (mW) vs. signoff voltage (V, 0.9 to 1.2) from the 1st- and 2nd-iteration power models and from real implemented circuits; design AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• Real: power from real implemented circuits
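The sample/model/refine loop resembles successive parabolic interpolation. A minimal one-dimensional sketch over signoff voltage, substituting a quadratic power model for the talk's power models (an assumption) and a cheap analytic function for each SP&R run (hypothetical). A real multi-mode search is multi-dimensional, and this sketch does not check that the fitted parabola is convex:

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (signoff voltage, power)


def parabola_vertex(p0: Point, p1: Point, p2: Point) -> Optional[float]:
    """x of the vertex of the parabola through three points (None if collinear)."""
    (a, fa), (b, fb), (c, fc) = p0, p1, p2
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    if den == 0.0:
        return None
    return b - 0.5 * num / den


def adaptive_search(power_of: Callable[[float], float], v_init: List[float],
                    max_iters: int = 5) -> Point:
    """Refit a quadratic power model around the best samples and sample its minimum.

    power_of(v) stands in for one SP&R run at signoff voltage v (hypothetical).
    """
    samples = [(v, power_of(v)) for v in v_init]
    for _ in range(max_iters):
        # three lowest-power samples, ordered by voltage
        best3 = sorted(sorted(samples, key=lambda s: s[1])[:3])
        v_new = parabola_vertex(*best3)
        if v_new is None or any(abs(v_new - v) < 1e-9 for v, _ in samples):
            break  # model points at an already-sampled voltage: stop
        samples.append((v_new, power_of(v_new)))
    return min(samples, key=lambda s: s[1])


# Toy "implementation" whose power is minimized at 1.05 V
print(adaptive_search(lambda v: (v - 1.05) ** 2 + 1.5, [0.9, 1.0, 1.2]))
```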
Classes of Closed-Loop AVS
[Diagram: closed-loop AVS monitors: design-dependent replica, in-situ monitor, generic monitor]
• Design-dependent replica and in-situ monitor:
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit:
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low series resistance]
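How the per-stage control bits reconfigure the ring oscillator can be sketched with a first-order delay model. All parameters below are hypothetical, the RC term is a crude R·C estimate, NAND/NOR stages are assumed to be wired as inverters, and the sketch does not model the Vmin tuning itself, only how the 1-bit resistance selections change the oscillation frequency:

```python
from typing import Sequence

# Hypothetical per-stage parameters (not from the talk)
CELL_DELAY_PS = {"INV": 12.0, "NAND": 18.0, "NOR": 22.0}  # intrinsic cell delay
R_OHM = {0: 200.0, 1: 2000.0}  # control bit: 0 -> low, 1 -> high series resistance
C_LOAD_FF = 2.0                # load capacitance per stage (fF)


def stage_delay_ps(cell: str, ctrl_bit: int) -> float:
    """Cell delay plus the RC delay of the selected series resistance (ohm * fF = 1e-3 ps)."""
    return CELL_DELAY_PS[cell] + R_OHM[ctrl_bit] * C_LOAD_FF * 1e-3


def ro_frequency_ghz(cells: Sequence[str], ctrl: Sequence[int]) -> float:
    """Oscillation frequency of a ring of inverting stages: f = 1 / (2 * sum of stage delays)."""
    if len(cells) != len(ctrl) or len(cells) % 2 == 0:
        raise ValueError("need one control bit per stage and an odd number of stages")
    period_ps = 2.0 * sum(stage_delay_ps(c, b) for c, b in zip(cells, ctrl))
    return 1e3 / period_ps  # 1 / ps = 1000 GHz


cells = ["INV", "NAND", "NOR"]
print(ro_frequency_ghz(cells, [0, 0, 0]), ro_frequency_ghz(cells, [1, 1, 1]))
```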
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at large, medium, and small margins, split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs, reducing power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin, and CombOpt; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution; repeat]
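The iterate-and-keep-best structure of this flow can be sketched as a small driver loop. The three callables are hypothetical hooks standing in for SEOpt, SkewOpt, and energy evaluation, and stopping at the first non-improving iteration is an assumed termination rule (the flowchart's exit condition is not recoverable here):

```python
from typing import Callable, Tuple, TypeVar

D = TypeVar("D")  # design state (hypothetical representation)


def optimize(design: D,
             seopt: Callable[[D, int], D],
             skewopt: Callable[[D], D],
             energy: Callable[[D], float],
             k: int,
             max_iters: int = 10) -> Tuple[D, float]:
    """Alternate SEOpt and SkewOpt, keeping the lowest-energy solution seen.

    seopt(d, k): margin insertion on K paths, then error-tolerant -> normal FF swaps
    skewopt(d): activity-aware clock skew optimization
    energy(d):  energy estimate of design d
    """
    best, best_e = design, energy(design)
    current = design
    for _ in range(max_iters):
        current = skewopt(seopt(current, k))
        e = energy(current)
        if e < best_e:
            best, best_e = current, e  # "save current solution"
        else:
            break  # assumed stopping rule: no further energy reduction
    return best, best_e


# Toy: the design is just the count of error-tolerant FFs; energy is lowest at 4
print(optimize(10, lambda d, k: max(0, d - k), lambda d: d,
               lambda d: (d - 4) ** 2 + 5, k=2))  # (4, 5)
```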
Toward Holistic Modeling, Margining and Tolerance of IC Variability
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Flow: Step 1: VBTI → BTI-induced |Vt| shift; Step 2: Vlib → derated library; Step 3: circuit implementation and signoff → circuit; BTI degradation and AVS then determine Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Flow: Step 1: VBTI → |Vt|; Step 2: Vlib → derated library; Step 3: circuit implementation and signoff → circuit; BTI degradation and AVS → Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, and VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, and Siddhartha Nath
Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Also: Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time for nominal (NOM), overdrive (OD), and combined OD/NOM operation, with durations tnom and tOD]
[Figure: power of circuits with different overdrive modes; fnom = 800MHz, Vnom = 0.8V]
• Different overdrive modes: 26% power range
• With fOD fixed: still 14% power range
Also: Tunable Monitors, Less Margin
Default config:
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config:
• Increase the number of high-resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Default config:
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config:
• Increase the number of high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability:
• Compensate for the difference between model and silicon
• Recover margin when variation is reduced due to an improved process
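Choosing among the default, optimized, and aggressive configurations can be framed as a small selection problem. A minimal sketch, assuming per-chip Vmin estimates for each configuration and measured per-chip Vmin values are available; all numbers below are hypothetical and only mirror the three cases on the slide:

```python
from typing import Dict, List, Optional


def pick_config(vmin_est: Dict[str, List[float]],
                vmin_chip: List[float]) -> Optional[str]:
    """Return the safe monitor configuration with the smallest worst-case guardband.

    vmin_est[cfg][i] -- Vmin predicted by the monitor in configuration cfg for chip i
    vmin_chip[i]     -- measured Vmin of chip i
    A configuration is safe if it never underestimates Vmin on any chip.
    """
    best: Optional[str] = None
    best_guardband = float("inf")
    for cfg, est in vmin_est.items():
        margins = [e - c for e, c in zip(est, vmin_chip)]
        if min(margins) < 0.0:
            continue  # "aggressive": some chips would fail
        if max(margins) < best_guardband:
            best, best_guardband = cfg, max(margins)
    return best


chips = [0.800, 0.820]                      # hypothetical measured Vmin (V)
configs = {"default":    [0.813, 0.833],    # ~13 mV guardband
           "optimized":  [0.802, 0.821],
           "aggressive": [0.795, 0.815]}
print(pick_config(configs, chips))          # optimized
```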
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, and across the lifetime
• The benefit of the "next node" is increasingly hard to find
  • An entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to taped-out IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk:
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, ranging between the highest and the least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner: per-layer C distributions for M1 and M2 (−3σ to 3σ); the homogeneous Cw corner of the M1 + M2 stack places both layers at 3σ, which is pessimistic relative to the stack's own 3σ]
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner: per-layer C distributions for M1 and M2 (−3σ to 3σ); the homogeneous Cw corner of the M1 + M2 stack places both layers at 3σ, which is pessimistic relative to the stack's own 3σ]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers.
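The pessimism claim can be checked with a simplified stack model. Assuming, for illustration only, that a path's capacitance is the sum of per-layer contributions with standard deviations σ_i and a common pairwise correlation ρ, the homogeneous corner moves the total by 3·Σσ_i, while the true 3σ shift is 3·sqrt(Σσ_i² + 2ρ·Σ_{i<j} σ_iσ_j). The ratio equals 1 only when ρ = 1 and grows with the number of layers otherwise:

```python
import math
from typing import Sequence


def homogeneous_shift(sigmas: Sequence[float], k: float = 3.0) -> float:
    """Shift of the stack total when every layer sits at +k sigma at once."""
    return k * sum(sigmas)


def statistical_shift(sigmas: Sequence[float], rho: float, k: float = 3.0) -> float:
    """True k-sigma shift of the sum, with equal pairwise correlation rho between layers."""
    n = len(sigmas)
    var = sum(s * s for s in sigmas)
    var += sum(2.0 * rho * sigmas[i] * sigmas[j] for i in range(n) for j in range(i + 1, n))
    return k * math.sqrt(var)


def pessimism(sigmas: Sequence[float], rho: float) -> float:
    """Ratio > 1 means the homogeneous corner overstates the stack's k-sigma shift."""
    return homogeneous_shift(sigmas) / statistical_shift(sigmas, rho)


for layers in (1, 2, 4, 9):
    print(layers, round(pessimism([1.0] * layers, rho=0.0), 3))  # 1.0, 1.414, 2.0, 3.0
```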
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources (shown for M1 to M4):

            M1          M2          M3          M4
            ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW      1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT      0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH      0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW      γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT      0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH      0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW      γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT      0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH      0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW      0  0  0     0  0  0     0  0  0     1  0  0
    ΔT      0  0  0     0  0  0     0  0  0     0  1  0
    ΔH      0  0  0     0  0  0     0  0  0     0  0  1

• Correlation between variation sources of the same variation type in the same process module: γ = 0.5
• Variation sources in different process modules are independent (here M1 to M3 share a module; M4 is in another)
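The correlation rule is regular enough to generate Σ for any stack. A minimal sketch, assuming sources are ordered (ΔW, ΔT, ΔH) layer by layer as in the matrix, and using hypothetical module labels ("A", "B") to say which layers share a process module:

```python
from typing import List, Sequence


def correlation_matrix(modules: Sequence[str], gamma: float = 0.5) -> List[List[float]]:
    """Σ for three sources per layer, ordered (ΔW, ΔT, ΔH) layer by layer.

    modules[l] names the process module of layer l. Two different sources are
    correlated (gamma) only if they have the same type and share a process module.
    """
    n = 3 * len(modules)
    sigma = [[0.0] * n for _ in range(n)]
    for u in range(n):
        layer_u, type_u = divmod(u, 3)
        for v in range(n):
            layer_v, type_v = divmod(v, 3)
            if u == v:
                sigma[u][v] = 1.0
            elif type_u == type_v and modules[layer_u] == modules[layer_v]:
                sigma[u][v] = gamma
    return sigma


# M1 to M3 share a process module, M4 does not (as in the matrix on the slide)
sigma = correlation_matrix(["A", "A", "A", "B"])
print(sigma[0])  # M1 ΔW row: [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```

For the 9-layer stack, passing nine module labels yields the 27 × 27 Σ used by the statistical timing model.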
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 at 60%]
• Variations in different layers are not fully correlated
Delay Variation
• Some paths have α > 1.0: a CBC can underestimate their delay variations
• But these paths have larger delays at the other corner
[Figure: α vs. Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)]/d(Ytyp); paths dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) vs. paths dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
Delay Variation
[Figure: α vs. Δdelay at C-worst = [d(Ycw) − d(Ytyp)]/d(Ytyp) and vs. Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)]/d(Ytyp); paths dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
• But these paths have larger delays at the other corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to either R or C
• Using only RC-worst or only C-worst will underestimate delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussions, α is defined at the dominant corner
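The per-path bookkeeping behind "α at the dominant corner" can be sketched as follows. The definition α = 3σ_j / [d_j(Y_dom) − d_j(Y_typ)] is an assumption, inferred from the α > 1 / α < 1 coverage statements and the 3σ_j vs. Δd_j(Yrcw) comparison in these slides. σ_j would come from the statistical timing model, and the example delays are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PathDelays:
    d_typ: float   # d_j(Y_typ)
    d_cw: float    # d_j(Y_cw)
    d_rcw: float   # d_j(Y_rcw)
    sigma: float   # statistical sigma_j of the path delay


def dominant_corner(p: PathDelays) -> str:
    """Corner with the larger Δdelay = [d(Y) - d(Y_typ)] / d(Y_typ)."""
    d_cw = (p.d_cw - p.d_typ) / p.d_typ
    d_rcw = (p.d_rcw - p.d_typ) / p.d_typ
    return "Cw" if d_cw > d_rcw else "RCw"


def alpha(p: PathDelays) -> float:
    """Assumed definition: alpha = 3 sigma_j / [d_j(Y_dom) - d_j(Y_typ)].

    alpha > 1: the corner underestimates this path's variation;
    alpha < 1: the corner covers it. Requires d_j(Y_dom) > d_j(Y_typ).
    """
    d_dom = p.d_cw if dominant_corner(p) == "Cw" else p.d_rcw
    return 3.0 * p.sigma / (d_dom - p.d_typ)


p = PathDelays(d_typ=100.0, d_cw=104.0, d_rcw=110.0, sigma=1.5)  # ps, hypothetical
print(dominant_corner(p), alpha(p))  # RCw 0.45
```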
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: per-path 3σ_j/d_j(Ytyp) × 100 plotted against Δd_j(Yrcw)/d_j(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
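"Scale BEOL variation in itf with α" amounts to moving each corner parameter a fraction α of the way from typical toward the conventional corner, so α = 0.5 puts a ±3σ corner at ±1.5σ. A minimal sketch over plain dictionaries; the parameter keys and values are hypothetical (the RC-worst values follow the min/min/min ΔW, ΔT, ΔH skew of the Yrcw corner), and real itf file syntax is not modeled:

```python
from typing import Dict


def tightened_corner(typ: Dict[str, float], corner: Dict[str, float],
                     alpha: float) -> Dict[str, float]:
    """Scale each BEOL parameter's deviation from typical by alpha.

    typ / corner map parameter names (hypothetical keys such as "M2.W") to values
    taken from the typical and conventional-corner itf files; alpha = 1 reproduces
    the conventional corner and alpha = 0 the typical one.
    """
    return {name: typ[name] + alpha * (corner[name] - typ[name]) for name in typ}


typ = {"M2.W": 0.050, "M2.T": 0.100, "M2.H": 0.080}  # hypothetical (um)
rcw = {"M2.W": 0.044, "M2.T": 0.091, "M2.H": 0.072}  # hypothetical RC-worst values
print(tightened_corner(typ, rcw, alpha=0.5))
```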
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8×)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) by layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC; GTBC: timing analysis using TBC and ECO using TBC until violations = 0; GCBC: timing analysis using CBC and ECO using CBC until violations = 0 → done]
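The classification step of this flow can be sketched in a few lines. The sketch assumes Δdelay is measured at whichever of C-worst and RC-worst shifts the path more (the dominant corner), normalized by the typical delay; the threshold is left as a plain parameter because its derivation is not part of this section, and the example delays are hypothetical:

```python
from typing import Dict, List, Tuple


def classify_paths(d_typ: Dict[str, float], d_cw: Dict[str, float],
                   d_rcw: Dict[str, float], threshold: float) -> Tuple[List[str], List[str]]:
    """Split critical paths into (G_TBC, G_CBC).

    Paths whose normalized Δdelay at the dominant corner exceeds the threshold go
    to G_TBC (closed with tightened corners); the rest stay in G_CBC.
    """
    g_tbc: List[str] = []
    g_cbc: List[str] = []
    for path, typ in d_typ.items():
        delta = max(d_cw[path] - typ, d_rcw[path] - typ) / typ
        (g_tbc if delta > threshold else g_cbc).append(path)
    return g_tbc, g_cbc


d_typ = {"p1": 100.0, "p2": 100.0}
d_cw = {"p1": 104.0, "p2": 101.0}
d_rcw = {"p1": 110.0, "p2": 102.0}
print(classify_paths(d_typ, d_cw, d_rcw, threshold=0.05))  # (['p1'], ['p2'])
```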
Experiment Setup
Designs: LEON3MP, NETCARD, SUPERBLUE12
Clock period (ns): 1.8, 2.0, 3.1
Heuristics 1
• Model BTI degradation with Vfinal throughout the lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time for NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging), determined by:
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure: Vfinal for different gate types; 10mV annotation]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit frequencies (GHz): c5315 1.38, c7552 1.25, AES 0.89, MPEG2 1.05
Supply voltages:
• Vmax = 1.05V
• Vinit = 0.90V
• Vheur1 (DC) = 0.97V, Vheur1 (AC) = 0.95V
• Vheur2 (DC) = 0.95V, Vheur2 (AC) = 0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
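The time-stepped structure of this reference flow can be sketched with pluggable models. Both models are hypothetical callables: `delay_at(vdd, dvth)` stands in for an STA run on an aged library, and `age(dvth, vdd, dt)` for a BTI aging estimate; the toy models in the usage line (delay ∝ 1/(Vdd − Vt0 − ΔVth), 10 mV of aging per year) are illustrative only:

```python
from typing import Callable, List, Optional, Tuple


def reference_signoff(delay_at: Callable[[float, float], float],
                      age: Callable[[float, float, float], float],
                      t_clk: float, v_init: float, v_max: float, v_step: float,
                      times: List[float]) -> Optional[Tuple[float, List[float]]]:
    """Time-stepped AVS loop of the reference flow (all models are hypothetical).

    delay_at(vdd, dvth)  -- critical-path delay at supply vdd with threshold shift dvth
    age(dvth, vdd, dt)   -- threshold shift after dt more years of stress at vdd
    Returns (Vfinal, Vdd per time step), or None if Vdd would exceed v_max
    (the flow's "modify circuit and start over" case).
    """
    vdd, dvth, t_prev, trace = v_init, 0.0, 0.0, []
    for t in times:
        dvth = age(dvth, vdd, t - t_prev)   # estimate aging over this step
        t_prev = t
        while delay_at(vdd, dvth) > t_clk:  # update timing and raise Vdd by Vstep
            vdd += v_step
            if vdd > v_max:
                return None
        trace.append(vdd)
    return vdd, trace


result = reference_signoff(lambda v, dv: 1.0 / (v - 0.3 - dv),
                           lambda dv, v, dt: dv + 0.01 * dt,
                           t_clk=1.7, v_init=0.9, v_max=1.05, v_step=0.01,
                           times=[float(y) for y in range(1, 11)])
print(result)
```

Each loop iteration corresponds to an STA run (and, in the real flow, a re-characterized library), which is why the slide calls this flow very slow.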
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff:
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  • The circuit is affected by library characterization
[Figure: loop of derated libraries (Vlib, VBTI) → circuit → Vfinal → derated libraries]
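One naive way to close this loop is to iterate it to a fixed point: pick Vlib = VBTI, implement, measure Vfinal, and repeat until Vfinal stops moving. Each iteration costs a library characterization plus a full implementation, which is the expense the Vfinal heuristics in this talk try to avoid. A minimal sketch; `vfinal_of` is a hypothetical stand-in for one full pass, and the usage line uses a toy contraction:

```python
from typing import Callable


def iterate_signoff_voltage(vfinal_of: Callable[[float], float], v0: float,
                            tol: float = 1e-3, max_iters: int = 20) -> float:
    """Fixed-point iteration V -> Vfinal(V) with Vlib = VBTI = V.

    vfinal_of(v) is a hypothetical stand-in for one full pass: characterize a
    derated library at Vlib = VBTI = v, implement and sign off the circuit, then
    simulate BTI + AVS to get Vfinal.
    """
    v = v0
    for _ in range(max_iters):
        v_next = vfinal_of(v)
        if abs(v_next - v) < tol:
            return v_next
        v = v_next
    raise RuntimeError("loop did not converge within max_iters")


# Toy contraction with fixed point 1.0 V
print(iterate_signoff_voltage(lambda v: 0.5 + 0.5 * v, v0=0.9))
```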
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON and OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS'14]
Observation 1 [Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in the first year (remaining 30% over 9 years)]
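The "front-loaded" behavior follows from any sublinear aging law. Assuming the common power-law form |ΔVth| ∝ tⁿ (an assumption, since this section does not state the aging model), the "50% of aging in year 1 of 10" figure pins the exponent at n = log 0.5 / log 0.1 ≈ 0.30:

```python
import math


def exponent_from_fraction(frac: float, t_frac: float, t_life: float) -> float:
    """Exponent n such that (t_frac / t_life) ** n == frac."""
    return math.log(frac) / math.log(t_frac / t_life)


def aging_fraction(t: float, t_life: float, n: float) -> float:
    """Fraction of end-of-life |ΔVth| reached at time t, assuming |ΔVth| ∝ t ** n."""
    return (t / t_life) ** n


n = exponent_from_fraction(0.5, 1.0, 10.0)  # 50% of aging in year 1 of 10
print(round(n, 3))                          # 0.301
print([round(aging_fraction(t, 10.0, n), 2) for t in (1, 2, 5, 10)])
```

Note that this describes |ΔVth|; how much of the Vdd increment follows in each year also depends on the circuit's delay sensitivity to Vth and Vdd.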
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
30% MTTF penalty
200mV voltage compensation
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib depend on aging during AVS
bull Aging and Vfinal are unknowns before circuit implementation
Vlib
VBTI Derated library
|Vt| Circuit implementation and
signoff
circuitBTI degradation and AVS
Vfinal
Step 1 Step 2 Step 3
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal Vlib VBTI
bull What is the design overhead when timing libraries are not properly characterized
bull Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads Joint work with Wei-Ting Jonas Chan Tuck-Boon Chan Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Large lifetime energy overhead
bull May fail to meet timing if desired supply voltage gt Vmax
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9, spanning highest to least invested guardband; annotation: 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Per-layer C distributions for M1 and M2 with their 3σ points; for the interconnect stack with M1 and M2, the homogeneous Cw corner (M1 C and M2 C both at 3σ) lies beyond the stack's 3σ point, and the gap is labeled “Pessimism”]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
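A minimal sketch of the claim above, assuming each layer's contribution to total stack capacitance is a normal random variable with standard deviation σi, that the contributions add linearly, and that every pair of layers has the same correlation ρ (all simplifying assumptions for illustration, not the talk's model). The homogeneous corner puts every layer at +3σi at once; the ratio compares that shift with the true 3σ of the sum.

```python
import math
from typing import Sequence


def homogeneous_pessimism(sigmas: Sequence[float], rho: float) -> float:
    """Ratio of the homogeneous-corner shift to the true 3-sigma shift of the stack total.

    sigmas: per-layer standard deviations of each layer's contribution to total C.
    rho: assumed uniform correlation between any two distinct layers.
    """
    # Homogeneous corner: every layer is skewed to +3 sigma at the same time.
    corner_shift = 3.0 * sum(sigmas)

    # Variance of the summed contribution: sum_i sum_j rho_ij * s_i * s_j.
    variance = 0.0
    for i, s_i in enumerate(sigmas):
        for j, s_j in enumerate(sigmas):
            variance += s_i * s_j * (1.0 if i == j else rho)
    statistical_shift = 3.0 * math.sqrt(variance)

    return corner_shift / statistical_shift


if __name__ == "__main__":
    for n_layers in (1, 2, 4, 9):
        ratio = homogeneous_pessimism([1.0] * n_layers, rho=0.5)
        print(f"{n_layers} layers, rho=0.5: corner / true 3-sigma = {ratio:.3f}")
```

With equal σi the ratio reduces to sqrt(L / (1 + ρ(L − 1))) for L layers: it equals 1 only when L = 1 or ρ = 1, and otherwise grows with the layer count.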
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources. In the four-layer example, M1-M3 share one process module and M4 belongs to another; rows and columns are ordered (ΔW, ΔT, ΔH) within each layer, and I₃ is the 3 × 3 identity:
$$\Sigma = \begin{pmatrix} I_3 & \gamma I_3 & \gamma I_3 & 0 \\ \gamma I_3 & I_3 & \gamma I_3 & 0 \\ \gamma I_3 & \gamma I_3 & I_3 & 0 \\ 0 & 0 & 0 & I_3 \end{pmatrix}$$
• Correlation between variation sources of the same variation type in the same process module: γ = 0.5
• Variation sources in different process modules are independent
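The block structure above can be generated from a list of process modules. A minimal sketch, assuming the grouping of layers into modules is supplied by the user (the module list and the `dW`/`dT`/`dH` labels are hypothetical inputs for illustration); it applies the rule on the slide: 1 on the diagonal, γ for the same variation type within the same process module, and 0 otherwise.

```python
from typing import List, Sequence, Tuple

VARIATION_TYPES = ("dW", "dT", "dH")  # stand-ins for ΔW, ΔT, ΔH


def build_correlation_matrix(
    modules: Sequence[Sequence[str]], gamma: float = 0.5
) -> Tuple[List[Tuple[str, str]], List[List[float]]]:
    """Return (source labels, Σ) for the variation sources of a metal stack.

    modules: layers grouped by process module, e.g. [["M1", "M2", "M3"], ["M4"]].
    Sources are ordered layer by layer, (dW, dT, dH) within each layer.
    """
    sources = []  # (layer, variation type, module index)
    for module_idx, layers in enumerate(modules):
        for layer in layers:
            for vtype in VARIATION_TYPES:
                sources.append((layer, vtype, module_idx))

    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for u, (_, type_u, mod_u) in enumerate(sources):
        for v, (_, type_v, mod_v) in enumerate(sources):
            if u == v:
                sigma[u][v] = 1.0
            elif type_u == type_v and mod_u == mod_v:
                sigma[u][v] = gamma  # same type, same process module
            # different type or different module: independent (stays 0.0)
    labels = [(layer, vtype) for layer, vtype, _ in sources]
    return labels, sigma


if __name__ == "__main__":
    labels, sigma = build_correlation_matrix([["M1", "M2", "M3"], ["M4"]])
    for label, row in zip(labels, sigma):
        print(f"{label[0]:>3} {label[1]}: " + " ".join(f"{x:.1f}" for x in row))
```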
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 60%, 0.92]
• Variations in different layers are not fully correlated
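The statistic above can be computed per path from its per-layer wirelength. A minimal sketch, assuming the per-layer wirelengths of each critical path are already available as a dictionary (the input format and the example numbers are hypothetical); the 0.6 limit is the 60% threshold quoted on the slide.

```python
from typing import Mapping, Sequence


def max_layer_ratio(wirelength_by_layer: Mapping[str, float]) -> float:
    """Largest share of a path's total wirelength that sits on any single layer."""
    total = sum(wirelength_by_layer.values())
    return max(wirelength_by_layer.values()) / total


def fraction_spread_across_layers(
    paths: Sequence[Mapping[str, float]], limit: float = 0.6
) -> float:
    """Fraction of paths whose max single-layer wirelength ratio is below `limit`."""
    ratios = [max_layer_ratio(p) for p in paths]
    return sum(1 for r in ratios if r < limit) / len(ratios)


if __name__ == "__main__":
    # Hypothetical per-layer wirelengths (um) for three critical paths.
    paths = [
        {"M2": 40.0, "M3": 35.0, "M4": 25.0},
        {"M2": 80.0, "M3": 20.0},
        {"M3": 30.0, "M5": 30.0, "M6": 40.0},
    ]
    print(fraction_spread_across_layers(paths))
```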
• Some paths have α > 1.0: a CBC can underestimate their delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)
Delay Variation
[Figure: α vs. Δdelay, one panel per corner]
Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp)
• Paths are more sensitive to either R or C
• Using only RC-worst or only C-worst will underestimate delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussions, α is defined at the dominant corner
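A minimal sketch of the dominant-corner classification above for a single path, assuming its delays at Ytyp, Ycw, and Yrcw are already available from STA (function names and example delays are hypothetical). Ties go to RC-worst, which is an arbitrary choice.

```python
from typing import Tuple


def delta_delay(d_corner: float, d_typ: float) -> float:
    """Relative delay shift at a BEOL corner: [d(Y) - d(Ytyp)] / d(Ytyp)."""
    return (d_corner - d_typ) / d_typ


def dominant_corner(d_typ: float, d_cw: float, d_rcw: float) -> Tuple[str, float]:
    """Return the dominant corner of a path and its Δdelay there."""
    dd_cw = delta_delay(d_cw, d_typ)
    dd_rcw = delta_delay(d_rcw, d_typ)
    if dd_cw > dd_rcw:
        return "C-worst", dd_cw
    return "RC-worst", dd_rcw


if __name__ == "__main__":
    # Hypothetical path delays (ps) at Ytyp, Ycw, Yrcw.
    print(dominant_corner(500.0, 540.0, 520.0))  # C-sensitive path
    print(dominant_corner(500.0, 510.0, 535.0))  # R-sensitive path
```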
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2 (M1 C, M2 C, 3σ); non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: 3σj/dj(Ytyp) × 100 vs. Δdj(Yrcw)/dj(Ytyp) × 100 for each path]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
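One reading of the axes above is that α is the per-path ratio αj = 3σj / Δdj, where σj is the path's statistical delay sigma and Δdj = dj(Ydom) − dj(Ytyp) is its shift at the dominant corner. The sketch below computes αj and checks it against a tightened corner, assuming (as a simplification) that scaling the itf variation by a factor s shrinks the corner shift to roughly s·Δdj. The function names and coverage labels are illustrative, not the talk's code.

```python
def alpha(sigma_j: float, d_typ: float, d_dominant: float) -> float:
    """alpha_j = 3*sigma_j / (d_j(Y_dominant) - d_j(Y_typ)), all in the same delay units."""
    return 3.0 * sigma_j / (d_dominant - d_typ)


def corner_coverage(alpha_j: float, tbc_scale: float = 0.5) -> str:
    """Which corner covers the path's 3-sigma delay variation.

    Assumes the tightened corner shifts the delay by about tbc_scale * Δd_j,
    i.e. delay varies linearly with the BEOL variation.
    """
    if alpha_j <= tbc_scale:
        return "covered by TBC"          # 3σ <= tbc_scale * Δd
    if alpha_j <= 1.0:
        return "covered by CBC only"     # tbc_scale * Δd < 3σ <= Δd
    return "underestimated by CBC"       # 3σ > Δd


if __name__ == "__main__":
    # Hypothetical path: d(Ytyp) = 500 ps, d(Yrcw) = 540 ps, sigma = 5 ps.
    a = alpha(5.0, 500.0, 540.0)
    print(a, corner_coverage(a))
```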
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violations = 0. GCBC: timing analysis using CBC, ECO using CBC until violations = 0. → done]
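The classification step of this flow can be sketched as below, assuming per-path delays at Ytyp, Ycw, and Yrcw are available and taking Δdelay at the dominant (larger-shift) corner. The threshold value is not given in the talk, so it is left as a parameter; the tuple format and example delays are hypothetical.

```python
from typing import Iterable, List, Tuple

# (path name, d(Ytyp), d(Ycw), d(Yrcw)), all in the same delay units
PathDelays = Tuple[str, float, float, float]


def partition_paths(
    paths: Iterable[PathDelays], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split critical paths into G_TBC (signed off with TBC) and G_CBC (kept on CBC)."""
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        # Δdelay at the dominant corner, relative to the typical corner.
        dd = max(d_cw - d_typ, d_rcw - d_typ) / d_typ
        (g_tbc if dd > threshold else g_cbc).append(name)
    return g_tbc, g_cbc


if __name__ == "__main__":
    paths = [("p1", 500.0, 545.0, 530.0), ("p2", 500.0, 510.0, 512.0)]
    print(partition_paths(paths, threshold=0.05))
```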
Experiment Setup
• Clock period (ns): LEON3MP 1.8, NETCARD 2.0, SUPERBLUE12 3.1
• Model BTI degradation with Vfinal throughout lifetime
  • Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit    Frequency (GHz)
C5315      1.38
c7552      1.25
AES        0.89
MPEG2      1.05

Supply voltages
Vmax           1.05V
Vinit          0.90V
Vheur1 (DC)    0.97V
Vheur1 (AC)    0.95V
Vheur2 (DC)    0.95V
Vheur2 (AC)    0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
[Figure legend: Vstep = AVS voltage step; Vfinal = converged voltage]
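A toy sketch of this reference flow, assuming made-up models: an alpha-power-law-style path delay normalized so the fresh path has zero slack at Vinit, and a power-law BTI shift with exponential voltage acceleration, carried across voltage changes with an equivalent-stress-time approach. Every constant is hypothetical and chosen only to make the loop runnable; `v_step` plays the role of Vstep and the returned value is Vfinal. A real flow would re-characterize the library and re-run STA at each step instead of calling these toy functions.

```python
import math
from typing import Optional

# Hypothetical model constants, for illustration only.
VTH0 = 0.35    # fresh threshold voltage (V)
ALPHA = 1.3    # alpha-power-law exponent
A_BTI = 0.004  # BTI shift after 1 year at V_REF (V)
N_BTI = 0.2    # BTI time exponent
GAMMA = 4.0    # BTI voltage acceleration (1/V)
V_REF = 0.90   # reference stress voltage (V)


def path_delay(vdd: float, dvth: float, v_init: float) -> float:
    """Critical-path delay normalized so the fresh path at v_init has delay 1.0 (zero slack)."""
    raw = vdd / (vdd - VTH0 - dvth) ** ALPHA
    ref = v_init / (v_init - VTH0) ** ALPHA
    return raw / ref


def age(dvth: float, vdd: float, dt: float) -> float:
    """Advance the BTI shift by dt years of stress at vdd (equivalent-stress-time method)."""
    rate = A_BTI * math.exp(GAMMA * (vdd - V_REF))
    t_equiv = (dvth / rate) ** (1.0 / N_BTI)  # stress time at vdd that yields the current shift
    return rate * (t_equiv + dt) ** N_BTI


def reference_signoff(
    v_init: float = 0.90,
    v_max: float = 1.05,
    v_step: float = 0.01,
    t_final: float = 10.0,
    dt: float = 0.5,
) -> Optional[float]:
    """Return Vfinal, or None if Vdd would exceed v_max (circuit must be modified)."""
    vdd, dvth, t = v_init, 0.0, 0.0
    while t < t_final:
        dvth = age(dvth, vdd, dt)  # estimate aging over this time step
        t += dt
        # Update circuit timing; AVS raises Vdd in v_step increments until slack >= 0.
        while path_delay(vdd, dvth, v_init) > 1.0:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None
    return vdd


if __name__ == "__main__":
    print(reference_signoff())
```

The slowness noted on the slide shows up here as the nested loop: every time step needs a fresh aging estimate and a timing check, and in a real flow each of those is a library characterization and an STA run.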
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among Circuit, Vfinal, Vlib/VBTI, and Derated Libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in year 1 (remaining 30% over 9 years)]
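The front-loading can be illustrated with a power-law aging model ΔVth(t) = A·t^n, a common empirical form for BTI that is used here only as an assumption. Under that model the fraction of the 10-year shift reached after year 1 is (1/10)^n, so the 50% figure on this slide corresponds to n ≈ 0.3; that exponent is inferred from the figure, not taken from the talk.

```python
import math


def aging_fraction(t_years: float, lifetime_years: float, n: float) -> float:
    """Fraction of end-of-life BTI shift reached at t_years, assuming ΔVth ∝ t**n."""
    return (t_years / lifetime_years) ** n


def exponent_for_fraction(fraction: float, t_years: float, lifetime_years: float) -> float:
    """Power-law exponent n for which `fraction` of the aging occurs by t_years."""
    return math.log(fraction) / math.log(t_years / lifetime_years)


if __name__ == "__main__":
    n = exponent_for_fraction(0.5, 1.0, 10.0)  # 50% of 10-year aging within year 1
    print(f"implied exponent n = {n:.3f}")
    for year in (1, 2, 5, 10):
        print(year, round(aging_fraction(year, 10.0, n), 3))
```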
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
30% MTTF penalty
200mV voltage compensation
EM Impact on AVS Scheduling
[Figure: VDD (0.90-1.04V) vs. year (0-12) for AVS schedules S1-S5; MTTF (years) of DMA 3 under S1-S5]
12 years MTTF penalty
What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: operating voltage between nominal Vdd and signoff Vdd, with margins for static IR drop, power grid IR gradient, dynamic IR, HCI, and NBTI]
• Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
• Assumptions
  • Δdjv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]
• Note: path delay includes gate and wire delays
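A minimal sketch of turning the 28 extraction and STA runs into the sensitivity matrix Δdjv. The `run_sta` callback, assumed to return the critical-path delays for a given itf, is hypothetical and stands in for the real RC extraction and STA tools; each skewed itf Yv is assumed to move source zv by 3σ, which is what the division by 3 reflects.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for RC extraction + STA: itf name -> delays d_j of the critical paths.
RunSta = Callable[[str], Sequence[float]]


def delay_sensitivities(
    run_sta: RunSta, typ_itf: str, skewed_itfs: Sequence[str]
) -> List[List[float]]:
    """Return dd[j][v] = (d_j(Y_v) - d_j(Y_typ)) / 3 for every path j and source v."""
    d_typ = run_sta(typ_itf)                          # 1 run at Ytyp
    d_skewed = [run_sta(itf) for itf in skewed_itfs]  # 1 run per variation source
    return [
        [(d_skewed[v][j] - d_typ[j]) / 3.0 for v in range(len(skewed_itfs))]
        for j in range(len(d_typ))
    ]


if __name__ == "__main__":
    # Toy table of path delays (ps) for two paths and two variation sources.
    table = {"typ": [500.0, 800.0], "z1": [506.0, 800.0], "z2": [500.0, 812.0]}
    print(delay_sensitivities(table.__getitem__, "typ", ["z1", "z2"]))
```

For the 9-layer stack, `skewed_itfs` would hold 27 entries, giving the 28 runs listed on the slide.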
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)
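Under the linear-delay and normal-source assumptions of the previous slide, the delay sigma of path j follows from σj² = Δdjᵀ Σ Δdj = ‖λᵀ Δdj‖², with Δdj the vector of sensitivities Δdjv. A minimal sketch with a hand-written Cholesky factorization (pure Python, to stay self-contained); the function names and example numbers are illustrative. The same factor λ can also map independent standard normals ξ to correlated sources z = λξ if Monte Carlo sampling is wanted.

```python
import math
from typing import List, Sequence


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular lam with a = lam @ lam^T (a must be symmetric positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lam[i][j] = math.sqrt(d)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(dd_j: Sequence[float], corr: Sequence[Sequence[float]]) -> float:
    """sigma_j = ||lam^T dd_j||, i.e. sqrt(dd_j^T Σ dd_j), with Σ = lam lam^T."""
    lam = cholesky(corr)
    n = len(dd_j)
    # y = lam^T dd_j, so y[k] = sum_i lam[i][k] * dd_j[i]
    y = [sum(lam[i][k] * dd_j[i] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(v * v for v in y))


if __name__ == "__main__":
    gamma = 0.5
    corr = [[1.0, gamma], [gamma, 1.0]]  # e.g. ΔW of two layers in the same process module
    print(path_sigma([2.0, 4.0], corr))
```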
[Equation: throughput in terms of clock period T, error rate ER [Kahng10], and number of recovery cycles r]
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Selection of signoff modes affects area, power
• ASP-DAC 2013: Optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time for nominal (NOM) and overdrive (OD) signoff modes]
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup-time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Figure legend: Vstep = AVS voltage step; Vfinal = converged voltage
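A toy version of this reference flow is sketched below in Python. Every numeric model here is a hypothetical assumption, not data from the talk: gate delay follows an alpha-power-law model, the BTI threshold-voltage shift follows a power law in time whose magnitude grows with the stress voltage, and aging at each step is (pessimistically) evaluated as if the current Vdd had been applied since t = 0. Vinit, Vmax, and Vstep reuse the slide names; `delay`, `bti_dvth`, and `reference_signoff` are illustrative.

```python
import math


def delay(vdd: float, dvth: float, vth0: float = 0.35, alpha: float = 1.3) -> float:
    """Alpha-power-law gate delay (arbitrary units); a larger Vth shift slows the gate."""
    return vdd / (vdd - (vth0 + dvth)) ** alpha


def bti_dvth(t_years: float, vdd: float, a: float = 0.02, n: float = 0.3,
             gamma: float = 3.0) -> float:
    """Toy BTI Vth shift: power law in time, growing exponentially with stress voltage."""
    return a * t_years ** n * math.exp(gamma * (vdd - 0.9))


def reference_signoff(t_final=10.0, t_step=0.5, v_init=0.90, v_step=0.01, v_max=1.05):
    """Step through the lifetime, raising Vdd by v_step until the aged delay meets the
    fresh delay at v_init. Returns [(t, Vdd), ...], or None if Vdd would exceed v_max
    (the circuit must then be modified and the flow restarted)."""
    d_target = delay(v_init, 0.0)  # clock constraint: zero slack at t = 0
    vdd, t = v_init, 0.0
    schedule = [(t, vdd)]
    while t < t_final:
        t = min(t + t_step, t_final)
        # "STA" with the aged delay model; AVS raises Vdd one step at a time
        while delay(vdd, bti_dvth(t, vdd)) > d_target:
            vdd = round(vdd + v_step, 4)
            if vdd > v_max:
                return None
        schedule.append((t, vdd))
    return schedule


if __name__ == "__main__":
    sched = reference_signoff()
    print("Vfinal =", sched[-1][1])
```

The nested loop makes the cost visible: every time step can require several timing evaluations, which is why the slide calls this flow accurate but very slow.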
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff:
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by the circuit implementation (timing constraints, critical paths, etc.)
  • The circuit is affected by library characterization
Figure: Circuit → Vfinal → derated libraries (Vlib, VBTI) → Circuit
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
Figure: |Vgs| vs. time with alternating ON/OFF (stress/relaxation) phases [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the first year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
Figure: Vdd approaching Vfinal over the lifetime; ≈70% of the Vdd increment occurs in the first year (the remaining 30% over the next 9 years)
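The front-loaded behavior follows from the sublinear time dependence of BTI. As a rough check, assume a power-law model |ΔVth(t)| ∝ t^n (an assumption for illustration; the talk does not state its model). The fraction of the 10-year shift reached after one year is then (1/10)^n, so “50% within the first year” corresponds to n = log 2 / log 10 ≈ 0.30. For comparison, an exponent of 1/6 (a value often associated with reaction-diffusion NBTI models) would put about 68% of the shift in the first year. The helper names below are hypothetical.

```python
import math


def fraction_reached(t: float, t_total: float, n: float) -> float:
    """Fraction of the end-of-life BTI shift reached at time t, assuming dVth ∝ t**n."""
    return (t / t_total) ** n


def implied_exponent(fraction: float, t: float, t_total: float) -> float:
    """Time exponent n for which `fraction` of the total shift is reached at time t."""
    return math.log(fraction) / math.log(t / t_total)


if __name__ == "__main__":
    n = implied_exponent(0.5, t=1.0, t_total=10.0)
    print(f"n = {n:.3f}")                               # n = 0.301
    print(f"{fraction_reached(1.0, 10.0, 1 / 6):.2f}")  # 0.68
```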
Results for DC Scenario
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
Figure annotations: 30% MTTF penalty; 200mV voltage compensation
EM Impact on AVS Scheduling
Figure: Vdd (0.90V to 1.04V) vs. year (0 to 12) for AVS schedules S1-S5; MTTF (years) of DMA for S1-S5
Figure annotation: 12 years MTTF penalty
What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, and analyses
  • Function, timing, signal integrity, power integrity, …
Figure: “margin stack” for voltage signoff (operating voltage axis): margins for static IR drop, power grid IR gradient, dynamic IR, HCI, and NBTI are stacked between the nominal Vdd and the signoff Vdd
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv
• Assumptions:
  • Δdjv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow (figure): routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv
$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• Σ = λλ^T (λ is obtained by Cholesky decomposition)
Throughput model: depends on the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T
Selective-Endpoint Optimization
• Optimize the fan-in cone with tighter constraints: allows replacement of Razor FFs with normal FFs
• Trade off the cost of resilience vs. data-path optimization
Also: Multi-Mode Signoff Choices Matter
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes to improve performance, power, or area and to reduce overdesign
Figure: Vdd vs. time for nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD
Figure: power of circuits with different overdrive modes (fnom = 800MHz, Vnom = 0.8V)
• Different overdrive modes: 26% power range
• Fixed fOD: still a 14% power range
Also: Tunable Monitors → Less Margin
• Aggressive config: Vmin_est < Vmin_chip; some chips will fail
• Default config:
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Optimized config:
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
Benefits of tunability:
• Compensate for differences between model and silicon
• Recover margin when variation is reduced due to an improved process
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, and across the lifetime
• The benefit of the “next node” is increasingly hard to find
  • An entire node is a “20-20-20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk:
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-and-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
Figure: core power (mW) and PG power (mW) for implementations 1-9
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff and power
Figure annotations: highest invested guardband; least invested guardband; 14% power penalty
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
Figure (example: worst-case capacitance corner): per-layer capacitance distributions for M1 and M2 (−3σ to +3σ); the homogeneous Cw corner for the interconnect stack with M1 and M2 places both M1 C and M2 C at +3σ, which introduces pessimism
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
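A back-of-the-envelope model shows why. Assume (hypothetically) a path with the same capacitance sensitivity s on each of N layers and a uniform correlation γ between any two layers; this simplifies the module-based correlation used in this deck. The homogeneous corner shifts the delay by 3sN, while the statistical 3σ shift is 3s·sqrt(N + N(N−1)γ). Their ratio equals 1 only when γ = 1 and otherwise shrinks as N grows. The function name is illustrative.

```python
import math


def homogeneous_alpha(n_layers: int, gamma: float) -> float:
    """Statistical 3-sigma delay shift divided by the homogeneous-corner shift,
    for equal per-layer sensitivity and uniform inter-layer correlation gamma."""
    n = n_layers
    sigma_units = math.sqrt(n + n * (n - 1) * gamma)  # path sigma in units of s
    return (3.0 * sigma_units) / (3.0 * n)           # corner skews every layer by +3 sigma


if __name__ == "__main__":
    # The ratio drops below 1 and keeps falling as layers are added (gamma < 1).
    for n in (1, 2, 4, 9):
        print(n, round(homogeneous_alpha(n, gamma=0.5), 3))
```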
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources:
Σ =
              M1            M2            M3            M4
              ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH
M1  ΔW        1   0   0     γ   0   0     γ   0   0     0   0   0
    ΔT        0   1   0     0   γ   0     0   γ   0     0   0   0
    ΔH        0   0   1     0   0   γ     0   0   γ     0   0   0
M2  ΔW        γ   0   0     1   0   0     γ   0   0     0   0   0
    ΔT        0   γ   0     0   1   0     0   γ   0     0   0   0
    ΔH        0   0   γ     0   0   1     0   0   γ     0   0   0
M3  ΔW        γ   0   0     γ   0   0     1   0   0     0   0   0
    ΔT        0   γ   0     0   γ   0     0   1   0     0   0   0
    ΔH        0   0   γ     0   0   γ     0   0   1     0   0   0
M4  ΔW        0   0   0     0   0   0     0   0   0     1   0   0
    ΔT        0   0   0     0   0   0     0   0   0     0   1   0
    ΔH        0   0   0     0   0   0     0   0   0     0   0   1
• Correlation between variation sources of the same variation type in the same process module: γ = 0.5
• Variation sources in different process modules are independent
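The block structure of Σ can be generated mechanically from this rule. The Python/NumPy sketch below builds Σ for any grouping of layers into process modules and checks that it admits the Cholesky factorization used in the statistical timing analysis. The example grouping mirrors the four-layer matrix above (M1 to M3 in one module, M4 in another); the nine-layer grouping in the usage note is a hypothetical assumption, and the function name and source labels are illustrative.

```python
import numpy as np

TYPES = ("dW", "dT", "dH")  # per-layer variation sources: width, thickness, height


def build_corr(modules: list[list[str]], gamma: float = 0.5) -> tuple[list[str], np.ndarray]:
    """Correlation matrix for (layer, type) sources.

    Two distinct sources are correlated by gamma iff they have the same variation
    type and their layers belong to the same process module; otherwise independent.
    """
    names: list[str] = []
    keys: list[tuple[int, str]] = []
    for module_id, layers in enumerate(modules):
        for layer in layers:
            for vtype in TYPES:
                names.append(f"{layer}.{vtype}")
                keys.append((module_id, vtype))
    n = len(keys)
    corr = np.eye(n)
    for u in range(n):
        for v in range(n):
            if u != v and keys[u] == keys[v]:
                corr[u, v] = gamma
    return names, corr


if __name__ == "__main__":
    names, sigma = build_corr([["M1", "M2", "M3"], ["M4"]])
    lam = np.linalg.cholesky(sigma)  # raises LinAlgError if sigma is not positive definite
    print(sigma.shape, np.allclose(lam @ lam.T, sigma))
```

For a 9-layer stack, a grouping such as `[["M1","M2","M3"], ["M4","M5","M6"], ["M7","M8","M9"]]` yields the 27 × 27 matrix mentioned earlier.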
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
Figure: cumulative probability vs. maximum wirelength ratio across all layers (%); marked point at 60%, 0.92
• Variations in different layers are not fully correlated
Delay Variation
Figure: Δdelay at C-worst, [d(Ycw) − d(Ytyp)]/d(Ytyp), vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)]/d(Ytyp), with α shown for each region
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Some paths have α > 1.0, i.e., a CBC can underestimate their delay variations
  • But these paths have larger delays at the other corner: the C-worst corner underestimates their delay variations, but they are dominated by the RC-worst corner, where α < 1.0 and their delay variations are covered
• Paths are more sensitive to either R or C
  • Using only RC-worst or only C-worst will underestimate delay variations
  • Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussion, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have a differently skewed variation
Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp
• Less pessimism with non-homogeneous corners
• Challenge:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
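To see why one non-homogeneous corner covers only some paths, consider a hypothetical two-layer case. Each candidate corner sets each layer's capacitance to either Ctyp (0σ) or Cw (+3σ); under the linear model, a path is covered if the corner's delay shift is at least its statistical 3σ shift. The sensitivities and correlation below are made-up numbers, and `covers` is an illustrative helper, not a method from the talk. With three skew levels per layer (e.g., Cb, Ctyp, Cw), a 9-layer stack already has 3^9 = 19,683 candidate corners.

```python
import itertools
import math

import numpy as np


def covers(sens: np.ndarray, corr: np.ndarray, skews: tuple[float, ...]) -> bool:
    """True if a corner skewing layer l by skews[l] sigmas shifts the path delay by at
    least the path's statistical 3-sigma shift (linear, Gaussian model)."""
    corner_shift = float(np.dot(sens, skews))
    sigma = math.sqrt(float(sens @ corr @ sens))
    return corner_shift >= 3.0 * sigma


if __name__ == "__main__":
    corr = np.array([[1.0, 0.5], [0.5, 1.0]])  # M1/M2 capacitance correlation
    paths = {
        "mostly M1": np.array([1.0, 0.0]),
        "mostly M2": np.array([0.0, 1.0]),
        "M1 + M2": np.array([1.0, 1.0]),
    }
    for skews in itertools.product((0.0, 3.0), repeat=2):  # Ctyp or Cw on each layer
        covered = [name for name, s in paths.items() if covers(s, corr, skews)]
        print(skews, covered)
```

Only the homogeneous (3σ, 3σ) corner covers every path here, while (3σ, 0) covers exactly the M1-dominated path without extra pessimism for it.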
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
Figure axes: Δdj(Yrcw)/dj(Ytyp) × 100 and 3σj/dj(Ytyp) × 100 (%)
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
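If path delay is linear in the BEOL variation (the assumption used in the statistical timing analysis), scaling the itf variation by a factor α_TBC scales the corner-based delay shift by the same factor. A path whose statistical ratio αj = 3σj / [dj(Ycorner) − dj(Ytyp)] does not exceed α_TBC therefore stays covered at the tightened corner. The Python sketch below encodes that check; `tbc_delay`, `safe_at_tbc`, and the numbers are hypothetical illustrations, not the procedure from the talk.

```python
def tbc_delay(d_typ: float, d_cbc: float, alpha_tbc: float) -> float:
    """Path delay at a tightened corner whose BEOL variation is alpha_tbc times the
    conventional corner's, assuming delay is linear in the BEOL variation."""
    return d_typ + alpha_tbc * (d_cbc - d_typ)


def safe_at_tbc(d_typ: float, d_cbc: float, sigma: float, alpha_tbc: float = 0.5) -> bool:
    """True if the tightened corner still covers the path's statistical 3-sigma delay,
    i.e. 3*sigma / (d_cbc - d_typ) <= alpha_tbc."""
    return tbc_delay(d_typ, d_cbc, alpha_tbc) >= d_typ + 3.0 * sigma


if __name__ == "__main__":
    # Hypothetical path: 100 ps typical, 110 ps at the conventional RC-worst corner.
    print(safe_at_tbc(100.0, 110.0, sigma=1.2))  # True: 3*sigma = 3.6 ps <= 5 ps
    print(safe_at_tbc(100.0, 110.0, sigma=2.0))  # False: 3*sigma = 6.0 ps > 5 ps
```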
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) vs. implementation (1-9); annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Per-layer capacitance distributions for M1 and M2 (−3σ to 3σ) and the homogeneous Cw corner of an interconnect stack with M1 and M2, marked with its pessimism relative to the stack's 3σ]
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
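The following sketch shows why pessimism grows with layer count. It rests on hypothetical assumptions:

- The path's wire capacitance is split equally over n layers.
- Each layer has the same σ.
- Every pair of layers shares one common correlation ρ.

A homogeneous corner skews every layer by 3σ at once, which shifts the stack by 3nσ. The 3σ point of the summed capacitance, however, is only 3σ·√(n + n(n−1)ρ). The equal-split model and the function name are illustrative assumptions, not the method used in the talk.

```python
import math


def homogeneous_pessimism(n_layers: int, rho: float) -> float:
    """Ratio of a homogeneous corner's shift to the true 3-sigma shift of a stack.

    Hypothetical model: the path's wire capacitance is split equally over
    n_layers, each contribution has unit sigma, and every pair of layers
    has the same correlation rho.
    """
    if n_layers < 1:
        raise ValueError("need at least one layer")
    corner_shift = 3.0 * n_layers                          # every layer skewed to +3 sigma at once
    var_sum = n_layers + n_layers * (n_layers - 1) * rho   # variance of the summed contributions
    true_3sigma = 3.0 * math.sqrt(var_sum)
    return corner_shift / true_3sigma


for n in (1, 2, 4, 9):
    print(n, round(homogeneous_pessimism(n, 0.5), 3))
```

With ρ = 0.5 the ratio rises from 1.0 for one layer to about 1.155, 1.265 and 1.342 for 2, 4 and 9 layers. With ρ = 1 (fully correlated layers) the ratio stays at 1.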
Correlation Matrix
• Let Σ be the correlation matrix for variation sources:

Σ =
             M1         M2         M3         M4
          ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
  M1  ΔW   1  0  0    γ  0  0    γ  0  0    0  0  0
      ΔT   0  1  0    0  γ  0    0  γ  0    0  0  0
      ΔH   0  0  1    0  0  γ    0  0  γ    0  0  0
  M2  ΔW   γ  0  0    1  0  0    γ  0  0    0  0  0
      ΔT   0  γ  0    0  1  0    0  γ  0    0  0  0
      ΔH   0  0  γ    0  0  1    0  0  γ    0  0  0
  M3  ΔW   γ  0  0    γ  0  0    1  0  0    0  0  0
      ΔT   0  γ  0    0  γ  0    0  1  0    0  0  0
      ΔH   0  0  γ    0  0  γ    0  0  1    0  0  0
  M4  ΔW   0  0  0    0  0  0    0  0  0    1  0  0
      ΔT   0  0  0    0  0  0    0  0  0    0  1  0
      ΔH   0  0  0    0  0  0    0  0  0    0  0  1

• Correlation between variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
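The block structure of Σ follows directly from the rule above. This is a minimal sketch, assuming NumPy and the slide's grouping (M1 to M3 in one process module, M4 in another). The helper name `build_correlation` and the "dW/dT/dH" labels are hypothetical. The sketch also checks that Σ factors as λλᵀ using `numpy.linalg.cholesky`, which returns the lower-triangular factor.

```python
import numpy as np

TYPES = ("dW", "dT", "dH")  # width, thickness and dielectric-height variation per layer


def build_correlation(modules, gamma=0.5):
    """Correlation matrix of per-layer BEOL variation sources.

    modules lists layers grouped by process module, e.g. [["M1", "M2", "M3"], ["M4"]].
    Two distinct sources are correlated (gamma) only if they have the same
    variation type and their layers belong to the same process module.
    """
    labels = [(m, layer, t) for m, group in enumerate(modules)
              for layer in group for t in TYPES]
    sigma = np.eye(len(labels))
    for i, (mi, _, ti) in enumerate(labels):
        for j, (mj, _, tj) in enumerate(labels):
            if i != j and mi == mj and ti == tj:
                sigma[i, j] = gamma
    return sigma, [(layer, t) for _, layer, t in labels]


sigma, names = build_correlation([["M1", "M2", "M3"], ["M4"]])
lam = np.linalg.cholesky(sigma)  # lower-triangular factor with sigma = lam @ lam.T
print(names[:4])
print(np.allclose(lam @ lam.T, sigma))
```

The same helper covers the 9-layer, 27-source stack once the layers are grouped by process module. That grouping is not given on this slide.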
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 60%, 0.92]
• Variations in different layers are not fully correlated
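The statistic on this slide is the share of paths whose largest single-layer wirelength fraction is under 60%. It can be computed as below. The per-layer wirelengths are made-up placeholders, not data from the NETCARD testcase, and the function names are hypothetical.

```python
def max_layer_ratio(layer_lengths):
    """Largest fraction of a path's total wirelength found on any single layer."""
    total = sum(layer_lengths.values())
    if total <= 0:
        raise ValueError("path has no wirelength")
    return max(layer_lengths.values()) / total


def fraction_below(paths, threshold=0.60):
    """Empirical CDF of the max single-layer ratio, evaluated just below threshold."""
    ratios = [max_layer_ratio(p) for p in paths]
    return sum(r < threshold for r in ratios) / len(ratios)


# Hypothetical per-layer wirelengths (um) of three critical paths
paths = [
    {"M2": 40.0, "M3": 35.0, "M4": 25.0},  # max ratio 0.40
    {"M2": 70.0, "M3": 30.0},              # max ratio 0.70
    {"M3": 20.0, "M5": 50.0, "M6": 30.0},  # max ratio 0.50
]
print(fraction_below(paths))
```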
Delay Variation
[Figure: α vs. Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp); paths split into those dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and those dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  • But these paths have larger delays at the other corner
  • Example: the C-worst corner underestimates the delay variations of some paths, but these paths are dominated by the RC-worst corner, where α < 1.0 and their delay variations are covered
• Paths are more sensitive either to R or to C
  • Using only the RC-worst or only the C-worst corner will underestimate delay variations
  • Both RC-worst and C-worst corners are needed to cover process variations
  • In the following discussion, α is defined at the dominant corner
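The two Δdelay definitions and the dominant-corner rule can be written out as below. The sketch makes two assumptions:

- α is the ratio of a path's statistical 3σ delay variation to its delay shift at the corner. This matches how the α lines on the tightened-corner slide relate 3σj / dj(Ytyp) to Δdj(Yrcw) / dj(Ytyp).
- The 3σ value would come from a statistical analysis.

The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathDelays:
    d_typ: float   # path delay at Ytyp
    d_cw: float    # path delay at Ycw (C-worst)
    d_rcw: float   # path delay at Yrcw (RC-worst)
    sigma3: float  # statistical 3-sigma delay variation of the path


def rel_delta(d_corner, d_typ):
    """Relative delay shift [d(corner) - d(Ytyp)] / d(Ytyp)."""
    return (d_corner - d_typ) / d_typ


def dominant_alpha(p):
    """Return (dominant corner, alpha at that corner).

    Assumes alpha = 3*sigma / (d(corner) - d(Ytyp)); alpha > 1 means the
    corner's delay shift is smaller than the path's 3-sigma variation.
    """
    if rel_delta(p.d_cw, p.d_typ) > rel_delta(p.d_rcw, p.d_typ):
        corner, d_corner = "C-worst", p.d_cw
    else:
        corner, d_corner = "RC-worst", p.d_rcw
    return corner, p.sigma3 / (d_corner - p.d_typ)


# Hypothetical path that is more sensitive to R than to C
print(dominant_alpha(PathDelays(d_typ=100.0, d_cw=104.0, d_rcw=110.0, sigma3=5.0)))
```

For the example path, RC-worst dominates (a 10% shift vs. 4%) and α = 5 / 10 = 0.5. The conventional RC-worst corner therefore covers this path with margin to spare.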
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: per-path 3σj / dj(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]
• Challenge: how to avoid underestimating delay variation, to preserve parametric yield
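This sketch relies on the linear-sensitivity assumption used for statistical timing. Under that assumption, scaling every BEOL variation in the itf by a factor α_t scales each path's corner delay shift by α_t. A path then remains covered exactly when 3σj ≤ α_t·Δdj, i.e., when its own α ≤ α_t. The path data and function names below are hypothetical.

```python
def tightened_corner_delay(d_typ, d_cbc, alpha_t):
    """Delay at a tightened corner, assuming delay is linear in BEOL variation.

    Scaling every BEOL variation in the itf by alpha_t scales the
    conventional corner's delay shift (d_cbc - d_typ) by the same factor.
    """
    return d_typ + alpha_t * (d_cbc - d_typ)


def covered_by_tbc(d_typ, d_cbc, sigma3, alpha_t):
    """True if the tightened corner still bounds the path's 3-sigma delay."""
    return tightened_corner_delay(d_typ, d_cbc, alpha_t) >= d_typ + sigma3


# Hypothetical paths: (d_typ, delay at the dominant conventional corner, statistical 3-sigma)
paths = [(100.0, 110.0, 4.0), (100.0, 110.0, 6.0), (200.0, 230.0, 12.0)]
print([covered_by_tbc(*p, alpha_t=0.5) for p in paths])
```

The second path has α = 6 / 10 = 0.6 > 0.5. It is the kind of path that must stay on the conventional corners to preserve parametric yield.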
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard-cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) by layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart]
  1. Routed design.
  2. Timing analysis at BEOL corners Ytyp, Ycw, Yrcw.
  3. Split the paths into GTBC and GCBC.
  4. GTBC: timing analysis using TBC, then ECO using TBC, repeated until violation = 0.
  5. GCBC: timing analysis using CBC, then ECO using CBC, repeated until violation = 0.
  6. Done.
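The classification step can be sketched as follows. The threshold value and path data are hypothetical, since the slide does not say how the threshold is chosen. Each resulting group would then go through its own timing-analysis and ECO loop.

```python
def classify_paths(delta_delay, threshold):
    """Split critical paths into the groups G_TBC and G_CBC.

    delta_delay maps path name -> relative delay shift at its dominant BEOL
    corner, e.g. [d(Yrcw) - d(Ytyp)] / d(Ytyp). Paths whose shift exceeds
    threshold go to G_TBC (signed off with tightened corners); all others
    stay in G_CBC (signed off with conventional corners).
    """
    g_tbc = sorted(p for p, d in delta_delay.items() if d > threshold)
    g_cbc = sorted(p for p, d in delta_delay.items() if d <= threshold)
    return g_tbc, g_cbc


# Hypothetical relative delay shifts and threshold
shifts = {"p1": 0.12, "p2": 0.03, "p3": 0.08, "p4": 0.05}
print(classify_paths(shifts, threshold=0.06))
```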
Experiment Setup
Design              LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   1.8       2.0       3.1
• Model BTI degradation with Vfinal throughout the lifetime
  • Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, VLIB and Vdd consistent throughout the circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
[Figure legend: Vstep = AVS voltage step; Vfinal = converged voltage]
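The loop above can be sketched as follows. Every model here is a hypothetical placeholder, not taken from the talk:

- an alpha-power-law gate delay;
- a power-law BTI shift with voltage acceleration;
- a stress model that pessimistically assumes the current Vdd was applied for the whole elapsed time.

The constants and function names are likewise hypothetical.

```python
# Hypothetical model constants (illustrative only, not taken from the talk)
VTH0 = 0.35     # fresh threshold voltage (V)
A_POW = 1.3     # alpha-power-law exponent of the gate delay model
N_TIME = 0.3    # BTI power-law time exponent
K_BTI = 0.01    # BTI shift (V) after 1 year of stress at 1 V
GAMMA_V = 3.0   # voltage-acceleration exponent of BTI


def gate_delay(vdd, vth):
    """Alpha-power-law delay in arbitrary units."""
    return vdd / (vdd - vth) ** A_POW


def bti_shift(vdd, t_years):
    """|dVth| after t_years, pessimistically assuming stress at vdd the whole time."""
    return K_BTI * vdd ** GAMMA_V * t_years ** N_TIME


def reference_avs_signoff(v_init, v_max, v_step, t_final, dt):
    """Return the converged Vfinal, or None if it would exceed v_max."""
    target = gate_delay(v_init, VTH0)  # timing is just met at t = 0 with Vinit
    vdd, t = v_init, 0.0
    while t < t_final:
        t = min(t + dt, t_final)       # estimate aging at the next time step
        # Update Vdd in AVS steps until the aged delay meets the target again
        while gate_delay(vdd, VTH0 + bti_shift(vdd, t)) > target:
            vdd += v_step
            if vdd > v_max:
                return None            # Vfinal > max allowed: modify circuit, start over
    return vdd


print(reference_avs_signoff(v_init=0.90, v_max=1.05, v_step=0.01, t_final=10.0, dt=1.0))
```

With these made-up constants the loop settles near 0.94V, under the 1.05V cap. Lowering v_max to 0.91V makes it return None, which corresponds to the "modify the circuit and start over" branch. In a real flow each outer iteration stands for STA runs and library updates, which is why this reference flow is slow.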
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case        1      2      3      4       5    6       7
Vlib (V)    Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)    Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • The circuit is affected by library characterization
[Figure: loop among Circuit, Vfinal and Derated Libraries (Vlib, VBTI)]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS’14]]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in the 1st year (the remaining 30% over 9 years)]
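This worked calculation assumes the common power-law form |ΔVth(t)| ∝ tⁿ, which the slide does not state. Under that form, the fraction of end-of-life aging reached at time t is (t/T)ⁿ. "50% within year 1 of a 10-year lifetime" then implies n = ln 0.5 / ln 0.1 ≈ 0.30. The function names are hypothetical.

```python
import math


def aging_fraction(t, t_life, n):
    """Fraction of end-of-life BTI shift reached at time t, assuming dVth ~ t**n."""
    return (t / t_life) ** n


def exponent_for(fraction, t, t_life):
    """Time exponent n for which aging_fraction(t, t_life, n) == fraction."""
    return math.log(fraction) / math.log(t / t_life)


n = exponent_for(0.5, t=1.0, t_life=10.0)
print(round(n, 3))                                 # exponent implied by "50% in year 1 of 10"
print(round(aging_fraction(1.0, 10.0, 1 / 6), 3))  # for comparison: reaction-diffusion n = 1/6
```

This prints 0.301 and then 0.681. With the smaller exponent n = 1/6, roughly two thirds of the ten-year shift would already occur in the first year, so aging is even more front-loaded.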
Results for DC Scenario
• Optimistic signoff corner
  • AVS increases supply voltage aggressively to compensate aging
  • Consumes more power
  • May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year (0-12) for schedules S1-S5; MTTF (year) of DMA for S1-S5; annotation: 12 years MTTF penalty]
What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI; signoff Vdd vs. operating voltage]
• Problem: margins = pessimism, overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  Δdjv = [dj(Yv) − dj(Ytyp)] / 3
• Assumptions
  • Δdjv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Figure: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → Δdjv]
• Note: path delay includes gate and wire delays
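The sensitivity formula maps directly onto the 28 STA runs. This minimal sketch assumes each itf file Yv skews only its own source zv by +3σ, which is what the division by 3 suggests. The function name and delay values are hypothetical.

```python
def sensitivities(d_corner, d_typ):
    """Per-1-sigma delay sensitivities of one path: [d_j(Y_v) - d_j(Y_typ)] / 3.

    d_corner[v] is the path delay from STA after extraction with itf file Y_v,
    assumed to skew only variation source z_v by +3 sigma.
    """
    return [(d - d_typ) / 3.0 for d in d_corner]


# Hypothetical path delays (ps) from three of the 27 single-source runs
print(sensitivities([106.0, 101.5, 97.0], d_typ=100.0))
```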
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)
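The excerpt stops before showing how λ is used. One standard use relies on the linear, normally distributed model from the previous slide. It gives each path's delay standard deviation as σj = ‖λᵀΔdj‖, which equals √(ΔdjᵀΣΔdj). This is only a sketch of that use: NumPy, the function name and the example numbers are assumptions.

```python
import numpy as np


def path_sigma(sens, corr):
    """1-sigma delay variation of one path.

    sens[v] is the per-1-sigma sensitivity to source z_v. With linear
    sensitivities and jointly normal sources, Var = sens^T corr sens;
    writing corr = lam @ lam.T (Cholesky) gives Var = ||lam^T sens||^2.
    """
    lam = np.linalg.cholesky(corr)
    return float(np.linalg.norm(lam.T @ sens))


# Hypothetical example: two correlated sources (gamma = 0.5) and one independent source
corr = np.array([[1.0, 0.5, 0.0],
                 [0.5, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
sens = np.array([2.0, 2.0, 1.0])
print(3 * path_sigma(sens, corr))  # 3-sigma delay variation of the path
```

Positive correlation between the first two sources increases the spread: the variance is 13 here, versus 9 if all three sources were independent.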
[Slide fragment: throughput as a function of error rate ER [Kahng10], clock period T and recovery cycles r]
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of Razor FFs with normal FFs
• Trade off the cost of resilience vs. data path optimization
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time, alternating nominal (NOM) and overdrive (OD) modes]
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup-time violations
• Temperature = 125°C
• Process corner = slow NMOS and slow PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
(Vstep: AVS voltage step; Vfinal: converged voltage)
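The reference flow is a fixed-point iteration over the lifetime. The sketch below shows only the shape of that loop. It assumes hypothetical stand-ins for the real tools: an alpha-power-law path delay replaces STA with derated libraries, a power-law BTI shift evaluated at the current Vdd replaces aging analysis, and every parameter value (VTH0, ALPHA, K, BTI_C, BTI_N) is invented for illustration.

```python
from typing import List, Optional

# Hypothetical model parameters, chosen only so the example runs.
VTH0 = 0.30    # fresh threshold voltage (V)
ALPHA = 1.3    # alpha-power-law exponent
K = 0.5        # delay scale factor (ns * V**ALPHA)
BTI_C = 0.02   # BTI prefactor (|dVth| per volt of stress per year**BTI_N)
BTI_N = 0.2    # BTI time exponent


def path_delay(vdd: float, dvth: float) -> float:
    """Critical-path delay in ns (stand-in for STA with a derated library)."""
    return K / (vdd - VTH0 - dvth) ** ALPHA


def bti_shift(vdd: float, t_years: float) -> float:
    """|dVth| after t_years of stress, evaluated at the current Vdd."""
    return BTI_C * vdd * t_years ** BTI_N


def settle(vdd: float, t: float, v_max: float, v_step: float,
           clock_ns: float) -> Optional[float]:
    """Raise Vdd in v_step increments until timing is met at time t."""
    while path_delay(vdd, bti_shift(vdd, t)) > clock_ns:
        vdd = round(vdd + v_step, 6)
        if vdd > v_max:
            return None  # would exceed the maximum allowed voltage
    return vdd


def reference_signoff(v_init: float, v_max: float, v_step: float,
                      clock_ns: float, t_final: float,
                      dt: float) -> Optional[List[float]]:
    """Vdd trace over the lifetime (last entry = Vfinal), or None if the
    circuit must be modified and the flow restarted."""
    t = 0.0
    vdd = settle(v_init, t, v_max, v_step, clock_ns)
    if vdd is None:
        return None
    trace = [vdd]
    while t < t_final:
        t = min(t + dt, t_final)                         # next time step
        vdd = settle(vdd, t, v_max, v_step, clock_ns)    # aging -> timing -> Vdd
        if vdd is None:
            return None
        trace.append(vdd)
    return trace


trace = reference_signoff(0.90, 1.05, 0.01, clock_ns=1.0, t_final=10.0, dt=1.0)
print(trace)
```

Every time step repeats the aging, timing, and Vdd update. That repetition is why the real flow needs so many STA runs and library characterizations.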
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff:
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by the circuit implementation
    • Timing constraints, critical paths, etc.
  • The circuit is affected by library characterization
[Figure: loop among Circuit, Vfinal, Vlib/VBTI, and Derated Libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON/OFF periods [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS’14]]
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increase happens early in the lifetime
• The gap between Vdd and Vfinal shrinks rapidly
[Figure: ≈70% of the Vdd increment occurs in the 1st year (remaining 30% over 9 years); Vdd approaches Vfinal]
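The following back-of-the-envelope sketch puts a number on “front-loaded”. It assumes that |ΔVth| grows as a power law t^n in stress time. That is a common BTI approximation, not a model stated on this slide. The code solves for the exponent n that would place 50% of 10-year aging in the first year.

```python
import math


def aging_fraction(t: float, t_life: float, n: float) -> float:
    """Share of end-of-life |dVth| reached at time t if |dVth| ~ t**n."""
    return (t / t_life) ** n


def exponent_for(fraction: float, t: float, t_life: float) -> float:
    """Time exponent n for which `fraction` of aging is reached at time t."""
    return math.log(fraction) / math.log(t / t_life)


n = exponent_for(0.5, t=1.0, t_life=10.0)
print(round(n, 3))                              # 0.301
print(round(aging_fraction(5.0, 10.0, n), 2))   # 0.81
```

Under this assumed model, more than 80% of the end-of-life shift is already present at mid-life. This is consistent with the slide's point that most of the Vdd increment happens early.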
Results for DC Scenario
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
30% MTTF penalty
200mV voltage compensation
[Figure: VDD (V) vs. year (0-12) for AVS schedules S1-S5; MTTF (years) of design DMA 3 under schedules S1-S5]
EM Impact on AVS Scheduling
12 years MTTF penalty
What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, and analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, and NBTI build up to the signoff Vdd (axis: voltage; operating voltage marked)]
Problem: margins = pessimism, overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity Δd_{j,v} of path p_j to variation source z_v
• Assumptions:
  • Δd_{j,v} is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δd_{j,v} using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δd_{j,v}]
$\Delta d_{j,v} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
Note: path delay includes gate and wire delays
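The sensitivity formula above turns 28 STA results into a per-path sensitivity vector. Here is a small sketch of that bookkeeping. It assumes that each itf Y_v skews only z_v, by +3σ, which is why the difference is divided by 3. The dictionary input format is hypothetical.

```python
from typing import Dict, List


def delay_sensitivities(
    d_typ: Dict[str, float], d_skewed: List[Dict[str, float]]
) -> Dict[str, List[float]]:
    """Per-sigma sensitivity of each path j to each source z_v.

    d_typ[j] is path j's delay from the Ytyp run; d_skewed[v][j] is its
    delay from the run whose itf Y_v skews only z_v (assumed by +3 sigma),
    so the difference is divided by 3 to obtain a 1-sigma sensitivity.
    """
    return {
        path: [(run[path] - typ) / 3.0 for run in d_skewed]
        for path, typ in d_typ.items()
    }


# 1 typical run + 27 skewed runs = 28 STA runs; two skewed runs shown here.
sens = delay_sensitivities({"p1": 1.00}, [{"p1": 1.03}, {"p1": 0.97}])
print(sens)
```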
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (λ is obtained by Cholesky decomposition)
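The slide does not say exactly how λ is used afterward. Under the linear and Gaussian assumptions of the previous slide, one standard use is to write the correlated sources as z = λu, where the u are independent standard normals. The path-delay σ_j is then the norm of λᵀΔd_j. Below is a minimal pure-Python sketch of that computation; the function names are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(sigma: Matrix) -> Matrix:
    """Lower-triangular lam with sigma = lam @ lam.T (sigma must be SPD)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][i] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam


def path_sigma(sens: List[float], lam: Matrix) -> float:
    """sigma_j = ||lam.T @ sens||, i.e. sqrt(sens.T @ Sigma @ sens).

    With z = lam @ u and u ~ N(0, I), the delay shift sens.T @ z equals a sum
    of independent unit normals weighted by the entries of lam.T @ sens.
    """
    n = len(sens)
    weights = [sum(sens[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(w * w for w in weights))


sigma = [[1.0, 0.5], [0.5, 1.0]]   # two sources correlated with gamma = 0.5
lam = cholesky(sigma)
print(round(path_sigma([1.0, 1.0], lam), 3))   # 1.732
```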
[Equation: throughput as a function of error rate ER [Kahng10], clock period T, and number of recovery cycles r]
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacing a Razor FF with a normal FF
• Trade off cost of resilience vs. datapath optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Our Concept: Mode Dominance
• Design cone (of mode A): the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly by threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cones
[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B, C; negative slacks = failed design; positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
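The dominance rules can be read operationally. Evaluate the circuit signed off at one mode at the other mode, and then look at the sign of its worst slack. The helpers below are a hypothetical sketch of that reading. The slack inputs and the tolerance `tol` are assumptions, not part of the described method.

```python
def classify_mode(slack_ns: float, tol: float = 0.0) -> str:
    """Classify mode B against the design cone of mode A.

    slack_ns: worst timing slack at mode B of the circuit signed off at mode A.
    """
    if slack_ns < -tol:
        return "failed design"       # B lies outside A's design cone
    if slack_ns > tol:
        return "overdesign"          # A dominates B
    return "within tolerance"


def equivalent_dominance(slack_a_at_b: float, slack_b_at_a: float,
                         tol: float) -> bool:
    """True if neither mode dominates the other and neither fails."""
    return (classify_mode(slack_a_at_b, tol) == "within tolerance"
            and classify_mode(slack_b_at_a, tol) == "within tolerance")


print(classify_mode(0.10, tol=0.01))   # overdesign
```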
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough: scalable
[Flow (global optimization): sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]
[Figure: power estimation of adaptive search: power (mW) vs. signoff voltage (V) for 1st, 2nd, and real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from actually implemented circuits
Design: AES, f = 700MHz
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
[Figure: closed-loop AVS monitor classes: design-dependent replica, in-situ monitor, generic monitor]
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit:
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at large, medium, and small margins for MUL and EXU, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on a sensitivity function, replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution]
Toward Holistic Modeling, Margining and Tolerance of IC Variability
Also Tunable Monitors Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Default config:
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config:
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability:
• Compensate for the difference between model and silicon
• Recover margin when variation is reduced due to an improved process
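Choosing among monitor configurations means picking the least-guardbanded setting that never underestimates the chip's Vmin. The sketch below is hypothetical. It assumes a lookup table of Vmin_est for each 3-bit passgate setting and a known Vmin_chip. The table values in the example are invented so that the all-low-resistance default sits 13 mV above Vmin_chip.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Config = Tuple[int, ...]  # one bit per stage: 1 = high-resistance passgate


def pick_config(vmin_est: Dict[Config, float],
                vmin_chip: float) -> Optional[Config]:
    """Least-guardbanded configuration with Vmin_est >= Vmin_chip.

    Configurations with Vmin_est < Vmin_chip are rejected: that is the
    'aggressive' case in which some chips would fail.
    """
    safe = {c: v for c, v in vmin_est.items() if v >= vmin_chip}
    if not safe:
        return None
    return min(safe, key=lambda c: safe[c] - vmin_chip)


# Invented characterization table: default (0, 0, 0) is 13 mV above
# Vmin_chip, and each high-resistance stage lowers the estimate by 5 mV.
table = {c: round(0.713 - 0.005 * sum(c), 4) for c in product((0, 1), repeat=3)}
best = pick_config(table, vmin_chip=0.700)
print(best, table[best])   # (0, 1, 1) 0.703
```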
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, across the lifetime
• The benefit of the “next node” is increasingly hard to find
  • The entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk:
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9; annotations: highest invested guardband, least invested guardband]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
14% power penalty
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner: C distributions of layers M1 and M2, each skewed by 3σ, combined into a homogeneous Cw corner for the M1+M2 interconnect stack; the gap to the stack's own 3σ is marked as pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
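A small calculation checks the closing claim. The sketch assumes a simplified stack in which each of n layers contributes an equal, unit-σ term and every pair of layers has the same correlation ρ. Both are assumptions for illustration. The homogeneous corner shifts the total by 3nσ, while the true 3σ of the sum is 3σ·√(n + n(n−1)ρ).

```python
import math


def homogeneous_pessimism(n_layers: int, rho: float) -> float:
    """Homogeneous-corner shift divided by the true 3-sigma of the stack.

    Simplified model (assumption): n_layers equal, unit-sigma contributions
    with the same pairwise correlation rho. Skewing every layer to +3 sigma
    shifts the total by 3 * n_layers, whereas the total's standard deviation
    is sqrt(n + n(n-1) * rho).
    """
    corner_shift = 3.0 * n_layers
    true_3sigma = 3.0 * math.sqrt(n_layers + n_layers * (n_layers - 1) * rho)
    return corner_shift / true_3sigma


print([round(homogeneous_pessimism(n, 0.5), 3) for n in (1, 2, 4, 8)])
# [1.0, 1.155, 1.265, 1.333]
```

With ρ = 1, the ratio stays at 1 for any number of layers. For ρ < 1, it grows with the layer count, which matches the statement above.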
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
        M1           M2           M3           M4
        ΔW  ΔT  ΔH   ΔW  ΔT  ΔH   ΔW  ΔT  ΔH   ΔW  ΔT  ΔH
M1  ΔW  1   0   0    γ   0   0    γ   0   0    0   0   0
    ΔT  0   1   0    0   γ   0    0   γ   0    0   0   0
    ΔH  0   0   1    0   0   γ    0   0   γ    0   0   0
M2  ΔW  γ   0   0    1   0   0    γ   0   0    0   0   0
    ΔT  0   γ   0    0   1   0    0   γ   0    0   0   0
    ΔH  0   0   γ    0   0   1    0   0   γ    0   0   0
M3  ΔW  γ   0   0    γ   0   0    1   0   0    0   0   0
    ΔT  0   γ   0    0   γ   0    0   1   0    0   0   0
    ΔH  0   0   γ    0   0   γ    0   0   1    0   0   0
M4  ΔW  0   0   0    0   0   0    0   0   0    1   0   0
    ΔT  0   0   0    0   0   0    0   0   0    0   1   0
    ΔH  0   0   0    0   0   0    0   0   0    0   0   1

• Correlation between variation sources of the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
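The block structure of Σ follows directly from the two correlation rules. The short sketch below builds Σ from a per-layer process-module assignment. The module labels are hypothetical inputs, and γ defaults to the 0.5 stated above. With four layers where M1 through M3 share a module, it reproduces the 12 × 12 matrix shown. A 9-layer stack gives the 27 × 27 case.

```python
from typing import List, Sequence

Matrix = List[List[float]]
VARIATION_TYPES = ("dW", "dT", "dH")  # the delta-W, delta-T, delta-H of each layer


def correlation_matrix(layer_modules: Sequence[str],
                       gamma: float = 0.5) -> Matrix:
    """Correlation matrix over all (layer, variation type) sources.

    layer_modules[i] names the process module of metal layer i (hypothetical
    labels). Sources are ordered layer by layer as dW, dT, dH. Two distinct
    sources are correlated with coefficient gamma iff they have the same
    variation type and their layers share a process module; else independent.
    """
    sources = [(module, vtype) for module in layer_modules
               for vtype in VARIATION_TYPES]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for u, (mod_u, type_u) in enumerate(sources):
        for v, (mod_v, type_v) in enumerate(sources):
            if u == v:
                sigma[u][v] = 1.0
            elif type_u == type_v and mod_u == mod_v:
                sigma[u][v] = gamma
    return sigma


# M1-M3 in one process module, M4 in another, as on the slide.
sigma = correlation_matrix(["A", "A", "A", "B"])
print(len(sigma), sigma[0][3], sigma[0][9])  # 12 0.5 0.0
```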
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point at 60%, 0.92]
• Variations in different layers are not fully correlated
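The 92% statistic is a point on the CDF of each path's largest per-layer wirelength share. The following minimal sketch computes that metric. It assumes that per-path wirelength is available as a mapping from layer name to length, which is a hypothetical input format.

```python
from typing import Dict, List


def max_layer_ratio(wirelength_by_layer: Dict[str, float]) -> float:
    """Largest share (%) of a path's total wirelength routed on one layer."""
    total = sum(wirelength_by_layer.values())
    return 100.0 * max(wirelength_by_layer.values()) / total


def fraction_below(paths: List[Dict[str, float]], limit_pct: float) -> float:
    """Fraction of paths whose max_layer_ratio is below limit_pct."""
    hits = sum(1 for p in paths if max_layer_ratio(p) < limit_pct)
    return hits / len(paths)


paths = [
    {"M2": 30.0, "M3": 40.0, "M4": 30.0},  # max share 40%
    {"M2": 70.0, "M3": 30.0},              # max share 70%
]
print(fraction_below(paths, 60.0))  # 0.5
```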
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
[Axis: Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
Delay Variation
[Axis: Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp)]
• Each path is more sensitive to either R or C
• Using only RC-worst or only C-worst will underestimate delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussions, α is defined at the dominant corner
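The following minimal sketch puts the two corners together. It assumes that α_j is the ratio of a path's statistical 3σ_j to its Δdelay at the dominant corner. That is a reading of the plots, not a formula stated on the slides. It also assumes that scaling the itf variation by a factor scales corner Δdelay linearly. Under those assumptions, a tightened corner with scale α_TBC covers path j exactly when α_j ≤ α_TBC.

```python
def alpha_dominant(sigma_j: float, d_typ: float, d_cw: float,
                   d_rcw: float) -> float:
    """alpha_j = 3*sigma_j / delta-delay at the dominant corner (assumed)."""
    delta_dom = max(d_cw - d_typ, d_rcw - d_typ)
    return 3.0 * sigma_j / delta_dom


def tbc_safe(alpha_j: float, alpha_tbc: float) -> bool:
    """A corner whose BEOL variation is scaled by alpha_tbc still covers
    3*sigma_j if, with delta-delay scaling linearly, alpha_j <= alpha_tbc."""
    return alpha_j <= alpha_tbc


a1 = alpha_dominant(sigma_j=0.005, d_typ=1.00, d_cw=1.04, d_rcw=1.06)
a2 = alpha_dominant(sigma_j=0.024, d_typ=1.00, d_cw=1.04, d_rcw=1.06)
print(tbc_safe(a1, 0.5), tbc_safe(a2, 0.5))  # True False
```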
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf by α = 0.5
[Figure axes: Δd_j(Yrcw) / d_j(Ytyp) × 100 and 3σ_j / d(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard-cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
Toward Holistic Modeling Margining and Tolerance of IC Variabi
IC Variability
Challenge Value of Technology
Solutions Modeling Margining Tolerance
Outline
BEOL Corner Optimization
Proposed Timing Signoff Flow
Conventional BEOL Corners
Statistical RC Model
Pessimism of Conventional BEOL Corners (CBC)
Intuition on Delay Variability Across Cw RCw
Intuition on Delay Variability Across Cw RCw (2)
Scaling Factor α and Delay Variation
Find Paths for Which TBCs Can Be Used
Determining α Arcw and Acw
Benefits of Tightened BEOL Corners
Outline (2)
How to Minimize Cost of Resilience
Tradeoff Resilience Cost vs Datapath Cost
Selective-Endpoint Optimization (SEOpt)
Clock Skew Optimization (SkewOpt)
Overall Optimization Flow
Benefit of Low-Cost Resilience
Increased Benefit of Resilience with AVS
Outline (3)
Breaking Chicken-Egg Loops Less Margin
Derated Library Characterization and AVS
Library Characterization for AVS
Power vs Area Across Different Signoffs
Heuristics 1
Vfinal Estimation
Observation and Heuristic 2
Proposed Library Characterization Flow
Power vs Area for All Designs
Also Multi-Mode Signoff Choices Matter
Also Tunable Monitors Less Margin
Also Tunable Monitors Less Margin (2)
Outline (4)
Conclusions
Thank You
Backup
Power Penalty to Fix EM with AVS
Homogeneous Corners
Homogeneous Corners (2)
Correlation Matrix
Wiring Structure in Timing-Critical Paths (2)
Delay Variation
Delay Variation (2)
Non-Homogeneous Corner
Opportunities for Tightened BEOL Corners
Wiring Structure in Timing-Critical Paths (2)
Proposed Timing Signoff Flow (2)
Experiment Setup
Further Analysis
Scaling Factor Results
Benefits of Tightened BEOL Corners (1)
Heuristics 1 (2)
Vfinal Estimation (2)
Observation and Heuristic 2 (2)
Technology and Benchmark Circuits
A Reference Signoff Flow
Experiment Setup (2)
ldquoChicken and Eggrdquo Loop
Bias Temperature Instability (BTI)
Observation 1
Results for DC Scenario
Problem Signoff Corner Definition
AVS Signoff Corner Selection
AVS Impact on EM Lifetime
EM Impact on AVS Scheduling
What is ldquoSignoffrdquo
Statistical Timing Analysis (1)
Statistical Timing Analysis (2)
Resilient Designs
Resilience Cost Reduction Problem
Selective-Endpoint Optimization
Process-Aware Vdd Scaling (PVS)
Challenge Variability
Energy Reduction in AVS Context
Our Concept Mode Dominance
Our Method Global Optimization
Classes of Closed-Loop AVS
Design of RO with Tunable Vmin
Benefit of Resilience Cost Reduction
Increased Benefit of Resilience With AVS
Overall Optimization Flow (2)
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, and across the lifetime
• The benefit of the "next node" is increasingly hard to find
  • An entire node is a "20-20-20" value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and power-grid (PG) power (mW) for implementations 1-9, ranging from the highest to the least invested guardband; annotated: 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• There is a tradeoff with the guardband invested in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. The 3σ capacitance corners of layers M1 and M2 are combined into a homogeneous Cw corner for an interconnect stack with M1 and M2; the gap between this corner and the stack's own 3σ point is the pessimism.]
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
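A quick way to see why pessimism grows with the number of layers: if the stack capacitance is modeled as the sum of per-layer Gaussian capacitances (an assumption made here for illustration), the homogeneous corner adds every layer's 3σ shift, while the true 3σ of the sum is 3·sqrt(σᵀRσ) for the layer correlation matrix R. The helper names below are hypothetical.

```python
import math

def stack_sigma(sigmas, corr):
    """Standard deviation of the summed stack capacitance sum_i C_i."""
    n = len(sigmas)
    var = sum(sigmas[i] * sigmas[j] * corr[i][j] for i in range(n) for j in range(n))
    return math.sqrt(var)

def homogeneous_pessimism(sigmas, corr):
    """Homogeneous-corner shift (every layer at +3 sigma) over the stack's true 3-sigma shift."""
    return (3.0 * sum(sigmas)) / (3.0 * stack_sigma(sigmas, corr))

def uniform_corr(n, rho):
    """n x n correlation matrix with a common off-diagonal correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

# Equal per-layer sigmas, partially correlated layers (rho = 0.5 is an assumed value).
for n in (1, 2, 4, 8):
    print(n, round(homogeneous_pessimism([1.0] * n, uniform_corr(n, 0.5)), 3))
```

With full correlation the ratio is exactly 1 (no pessimism); with independent, equal layers it grows as sqrt(n).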
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources:

         |  M1          |  M2          |  M3          |  M4
         |  ΔW  ΔT  ΔH  |  ΔW  ΔT  ΔH  |  ΔW  ΔT  ΔH  |  ΔW  ΔT  ΔH
M1  ΔW   |   1   0   0  |   γ   0   0  |   γ   0   0  |   0   0   0
    ΔT   |   0   1   0  |   0   γ   0  |   0   γ   0  |   0   0   0
    ΔH   |   0   0   1  |   0   0   γ  |   0   0   γ  |   0   0   0
M2  ΔW   |   γ   0   0  |   1   0   0  |   γ   0   0  |   0   0   0
    ΔT   |   0   γ   0  |   0   1   0  |   0   γ   0  |   0   0   0
    ΔH   |   0   0   γ  |   0   0   1  |   0   0   γ  |   0   0   0
M3  ΔW   |   γ   0   0  |   γ   0   0  |   1   0   0  |   0   0   0
    ΔT   |   0   γ   0  |   0   γ   0  |   0   1   0  |   0   0   0
    ΔH   |   0   0   γ  |   0   0   γ  |   0   0   1  |   0   0   0
M4  ΔW   |   0   0   0  |   0   0   0  |   0   0   0  |   1   0   0
    ΔT   |   0   0   0  |   0   0   0  |   0   0   0  |   0   1   0
    ΔH   |   0   0   0  |   0   0   0  |   0   0   0  |   0   0   1

• Variation sources of the same variation type in the same process module are correlated with γ = 0.5 (here M1-M3 form one module)
• Variation sources in different process modules are independent
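The matrix above can be generated from a layer-to-module assignment and factored as Σ = λλᵀ by Cholesky decomposition, as used later for statistical timing. This sketch uses plain Python; the module labels and the ΔW/ΔT/ΔH source ordering are assumptions chosen to reproduce the 4-layer example, with γ = 0.5.

```python
import math

TYPES = ("dW", "dT", "dH")  # width, thickness, and dielectric-height variation

def build_correlation(layers, module_of, gamma=0.5):
    """Correlation over (layer, type) sources: 1 on the diagonal,
    gamma for the same type in the same process module, 0 otherwise."""
    sources = [(layer, t) for layer in layers for t in TYPES]
    n = len(sources)
    corr = [[0.0] * n for _ in range(n)]
    for u, (lu, tu) in enumerate(sources):
        for v, (lv, tv) in enumerate(sources):
            if u == v:
                corr[u][v] = 1.0
            elif tu == tv and module_of[lu] == module_of[lv]:
                corr[u][v] = gamma
    return sources, corr

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

# Hypothetical module assignment matching the table: M1-M3 share a module, M4 does not.
modules = {"M1": "lower", "M2": "lower", "M3": "lower", "M4": "upper"}
sources, corr = build_correlation(["M1", "M2", "M3", "M4"], modules)
lam = cholesky(corr)
print(corr[0])  # row for M1 dW
```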
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 of paths lie below 60%]
• Variations in different layers are not fully correlated
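The statistic on this slide can be computed per path as the largest share of its wirelength on any single layer. A minimal sketch with hypothetical per-path, per-layer wirelengths (not data from the talk):

```python
def max_layer_ratio(wirelength_by_layer):
    """Largest fraction of a path's total wirelength that sits on any single layer."""
    total = sum(wirelength_by_layer.values())
    return max(wirelength_by_layer.values()) / total

def fraction_below(paths, threshold=0.6):
    """Fraction of paths whose max single-layer wirelength ratio is below threshold."""
    hits = sum(1 for p in paths if max_layer_ratio(p) < threshold)
    return hits / len(paths)

# Hypothetical wirelengths (um) by layer for three critical paths.
paths = [
    {"M2": 40.0, "M3": 35.0, "M4": 25.0},
    {"M2": 10.0, "M3": 70.0, "M5": 20.0},
    {"M3": 30.0, "M4": 30.0, "M5": 40.0},
]
print(fraction_below(paths))
```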
Delay Variation
[Figure: α versus Δdelay at C-worst, [d(Ycw) − d(Ytyp)]/d(Ytyp), and versus Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)]/d(Ytyp); paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  • But these paths have larger delays at the other corner
  • The C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner
  • For α < 1.0, delay variations are covered by the RC-worst corner
• Paths are more sensitive either to R or to C
  • Using only RC-worst or only C-worst underestimates delay variations
  • Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussions, α is defined at the dominant corner
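A sketch of the dominant-corner bookkeeping, assuming α is the ratio of a path's statistical 3σ delay variation to its relative delay shift at the dominant conventional corner (consistent with the axes of the Opportunities slide, but an assumption here). The delays, σ, and helper names are hypothetical:

```python
def dominant_corner(d_typ, d_cw, d_rcw):
    """Return the corner with the larger relative delay shift, and that shift."""
    shift_cw = (d_cw - d_typ) / d_typ
    shift_rcw = (d_rcw - d_typ) / d_typ
    return ("C-worst", shift_cw) if shift_cw > shift_rcw else ("RC-worst", shift_rcw)

def alpha(d_typ, d_cw, d_rcw, sigma):
    """Assumed definition: relative statistical 3-sigma variation over the dominant corner's shift."""
    _, shift = dominant_corner(d_typ, d_cw, d_rcw)
    return (3.0 * sigma / d_typ) / shift

# Hypothetical path: 500 ps typical, 520 ps at C-worst, 540 ps at RC-worst, sigma = 6 ps.
print(dominant_corner(500.0, 520.0, 540.0), alpha(500.0, 520.0, 540.0, 6.0))
```

Here α < 1, so this path's statistical variation is covered by its dominant (RC-worst) corner with room to spare.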
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
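To illustrate why choosing non-homogeneous corners is hard, the brute-force sketch below enumerates every corner that puts each layer at either typical or 3σ-worst capacitance and checks which paths each corner covers, i.e. whether its delay shift reaches the path's statistical 3σ shift. The linear per-layer sensitivity model, the helper names, and the two paths are hypothetical; this is not the optimization used in the talk.

```python
import math
from itertools import product

def path_three_sigma(sens, corr):
    """Statistical 3-sigma delay shift for per-layer sensitivities (delay per 1 sigma)."""
    n = len(sens)
    var = sum(sens[i] * sens[j] * corr[i][j] for i in range(n) for j in range(n))
    return 3.0 * math.sqrt(var)

def corner_shift(sens, corner):
    """Delay shift when layer i is skewed by corner[i] sigmas (0 = typical, 3 = Cw)."""
    return sum(s * k for s, k in zip(sens, corner))

def covers(sens, corr, corner, eps=1e-12):
    """True if the corner's delay shift is at least the path's statistical 3-sigma shift."""
    return corner_shift(sens, corner) >= path_three_sigma(sens, corr) - eps

def candidate_corners(num_layers):
    """Every per-layer choice of typical (0) or 3-sigma worst (3): 2**num_layers corners."""
    return list(product((0, 3), repeat=num_layers))

# Two hypothetical paths on a 2-layer stack with independent layers.
identity = [[1.0, 0.0], [0.0, 1.0]]
paths = {"M1 only": [1.0, 0.0], "M1 + M2": [1.0, 1.0]}
for corner in candidate_corners(2):
    print(corner, [name for name, s in paths.items() if covers(s, identity, corner)])
```

The non-homogeneous corner (3, 0) covers the M1-only path exactly, but only the homogeneous corner covers the path spread over both layers, which is the tradeoff the slide describes.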
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: 3σ_j/d_j(Ytyp) × 100 versus Δd_j(Yrcw)/d_j(Ytyp) × 100]
• Challenge: how to avoid underestimating delay variation, to preserve parametric yield
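One reading of "scale the BEOL variation in the itf with α" is to scale each parameter's deviation from typical by α. The sketch below assumes that linear scaling and uses a plain dictionary as a hypothetical stand-in for one layer's itf entries (real itf files use a vendor text format); the RC-worst values follow the min/min/min ΔW/ΔT/ΔH convention of the CBC table.

```python
def tighten_corner(typical, corner, alpha):
    """Scale each BEOL parameter's deviation from typical by alpha (0 = typical, 1 = full CBC)."""
    return {k: typical[k] + alpha * (corner[k] - typical[k]) for k in typical}

# Hypothetical conductor/dielectric parameters for one layer, in nm.
typ = {"width": 50.0, "thickness": 90.0, "ild_height": 100.0}
rcw = {"width": 45.0, "thickness": 81.0, "ild_height": 90.0}  # RC-worst: W, T, H all at min
print(tighten_corner(typ, rcw, 0.5))
```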
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard-cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) by layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in group G_TBC if its Δdelay is larger than a threshold
• Only paths in G_TBC are fixed using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into G_TBC and G_CBC; G_TBC: timing analysis using TBC, ECO using TBC until violations = 0; G_CBC: timing analysis using CBC, ECO using CBC until violations = 0 → done]
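The grouping and signoff steps above can be sketched as follows. The helper names, delays, and threshold are hypothetical, and the TBC delay is approximated by scaling each path's corner shift by α rather than re-extracting RC with a tightened itf as the real flow would.

```python
def split_paths(paths, threshold):
    """Put a path in G_TBC when its relative delay shift at the dominant BEOL corner exceeds threshold."""
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta = max(d_cw, d_rcw) / d_typ - 1.0
        (g_tbc if delta > threshold else g_cbc).append(name)
    return g_tbc, g_cbc

def violating_paths(paths, g_tbc, clock, alpha):
    """Check G_TBC paths against a tightened corner and all other paths against the CBC.

    Assumes the TBC delay is approximately d_typ + alpha * (d_corner - d_typ).
    """
    bad = []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        d_corner = max(d_cw, d_rcw)
        scale = alpha if name in g_tbc else 1.0
        if d_typ + scale * (d_corner - d_typ) > clock:
            bad.append(name)
    return bad

# Hypothetical STA results in ps: (typical, C-worst, RC-worst).
paths = {"p1": (900.0, 960.0, 1000.0), "p2": (950.0, 970.0, 980.0)}
g_tbc, g_cbc = split_paths(paths, threshold=0.05)
print(g_tbc, g_cbc, violating_paths(paths, g_tbc, clock=980.0, alpha=0.5))
```

With α = 1 (conventional corners everywhere), p1 would still need an ECO; with the tightened corner it closes without one.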
Experiment Setup
Design             | LEON3MP | NETCARD | SUPERBLUE12
Clock period (ns)  | 1.8     | 2.0     | 3.1
Heuristics 1
• Model BTI degradation with Vfinal throughout the lifetime
  • Aging under a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI, with VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not yet been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Depends on the gates along the critical path
  • Depends on the timing slack at t = 0
• Circuit activity is not an issue
  • The BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup-time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit | Frequency (GHz)
c5315   | 1.38
c7552   | 1.25
AES     | 0.89
MPEG2   | 1.05

Supply voltages
Vmax        | 1.05 V
Vinit       | 0.90 V
Vheur1 (DC) | 0.97 V
Vheur1 (AC) | 0.95 V
Vheur2 (DC) | 0.95 V
Vheur2 (AC) | 0.93 V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > the maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
(Vstep: AVS voltage step; Vfinal: converged voltage)
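The reference flow is essentially a time-stepped simulation: age the circuit, re-time it, raise Vdd by Vstep until timing is met, and repeat until tfinal. This sketch collapses the circuit into a single path with an alpha-power-law delay and a power-law BTI model advanced by the equivalent-time method; all constants are hypothetical, and each inner-loop delay evaluation stands in for a full STA run, which is why the real flow is slow.

```python
import math

# Hypothetical device and aging constants (illustrative only, not from the talk).
VTH0 = 0.35    # fresh threshold voltage (V)
EXP_A = 1.3    # alpha-power-law exponent for gate delay
K_BTI = 0.002  # BTI prefactor (V per year**N_BTI, before voltage acceleration)
B_BTI = 2.0    # voltage acceleration of BTI (1/V)
N_BTI = 0.2    # BTI time exponent (power law, hence front-loaded)

def path_delay(vdd, dvth):
    """Alpha-power-law path delay in arbitrary units."""
    return vdd / (vdd - VTH0 - dvth) ** EXP_A

def bti_rate(vdd):
    return K_BTI * math.exp(B_BTI * vdd)

def reference_signoff(v_init, v_max, v_step, t_final, dt, clock):
    """Return the (time, Vdd) schedule, or None if Vdd would exceed v_max
    (the circuit must then be modified and the flow restarted)."""
    vdd, dvth, t = v_init, 0.0, 0.0
    schedule = [(t, vdd)]
    while t < t_final - 1e-9:
        # Equivalent stress time at the current Vdd that reproduces the aging so far.
        t_eq = (dvth / bti_rate(vdd)) ** (1.0 / N_BTI)
        dvth = bti_rate(vdd) * (t_eq + dt) ** N_BTI
        t += dt
        # "STA" at this time step: raise Vdd in AVS steps until the path meets the clock.
        while path_delay(vdd, dvth) > clock:
            vdd += v_step
            if vdd > v_max + 1e-9:
                return None
        schedule.append((t, vdd))
    return schedule

clock = path_delay(0.90, 0.0)  # zero slack for the fresh circuit at Vinit
sched = reference_signoff(0.90, 1.05, 0.01, 10.0, 0.25, clock)
print(sched[-1])
```

Because the BTI model is a power law in time, most of the Vdd increase in the returned schedule happens in the first year, matching the front-loading observation later in the deck.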
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case     | 1     | 2     | 3    | 4      | 5   | 6      | 7
Vlib (V) | Vinit | Vinit | Vmax | Vinit  | N/A | Vheur1 | Vheur2
VBTI (V) | Vinit | Vmax  | Vmax | Vfinal | N/A | Vheur1 | Vheur2
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by the circuit implementation: timing constraints, critical paths, etc.
  • The circuit is affected by library characterization
[Diagram: loop connecting the circuit, Vfinal, Vlib/VBTI, and the derated libraries]
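One generic way to close such a loop is fixed-point iteration: characterize libraries at a guessed voltage, implement, measure Vfinal, and feed it back. The sketch below uses a hypothetical toy function `toy_vfinal` in place of characterization, implementation, and AVS; every real iteration would be a full implementation run, which is what the Vfinal heuristics in this part of the talk avoid.

```python
def fixed_point(f, v0, tol=1e-6, max_iter=50):
    """Iterate v <- f(v) until successive estimates differ by less than tol."""
    v = v0
    for i in range(1, max_iter + 1):
        v_next = f(v)
        if abs(v_next - v) < tol:
            return v_next, i
        v = v_next
    raise RuntimeError("loop did not converge")

def toy_vfinal(v_lib):
    """Hypothetical stand-in: a more pessimistic (higher) Vlib yields more slack, so a lower Vfinal."""
    return 0.95 - 0.3 * (v_lib - 0.90)

v, iters = fixed_point(toy_vfinal, 0.90)
print(round(v, 4), iters)
```

Convergence here relies on the toy map being a contraction (slope magnitude 0.3); nothing guarantees that for a real implementation flow.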
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI affects PMOS; PBTI affects NMOS
[Figure: |Vgs| vs. time with alternating ON and OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS'14]
Observation 1 [Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal reduces rapidly
  • ≈70% of the Vdd increment occurs in 1 year (the remaining 30% over 9 years)
[Figure: Vdd vs. time approaching Vfinal]
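Front-loading follows from a power-law time dependence. Assuming |ΔVth| ∝ tⁿ under constant stress (the exponent values below are assumptions, not from the talk), the share of 10-year aging reached after one year is (1/10)ⁿ; n ≈ 0.3 gives about 50%.

```python
def fraction_in_first_years(n, years=1.0, lifetime=10.0):
    """Share of end-of-life BTI shift reached after `years`, for dVth proportional to t**n."""
    return (years / lifetime) ** n

for n in (0.16, 0.25, 0.30):
    print(n, round(fraction_in_first_years(n), 3))
```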
Results for DC Scenario
• Optimistic signoff corner: AVS increases the supply voltage aggressively to compensate for aging
  • Consumes more power
  • May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure annotations: 30% MTTF penalty; 200 mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: left, VDD (0.90-1.04 V) vs. year (0-12) for AVS schedules S1-S5; right, MTTF (years) of design DMA 3 under S1-S5; annotated: 12 years MTTF penalty]
What Is "Signoff"?
• Foundation of the contract between design house and foundry
• "Chip should work": a stack of models, margins, and analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff, relating the signoff Vdd and the operating voltage through nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, and NBTI margins]
• Problem: margins = pessimism, overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path p_j to variation source z_v:
  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • Path delay is linear with respect to the variation sources (sensitivity Δd_jv)
  • Variation sources are normally distributed
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist and 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δd_jv]
• Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• Σ = λλᵀ (λ is obtained by Cholesky decomposition)
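Under the linear, Gaussian assumptions above, a path's delay standard deviation is σ_j = sqrt(Δd_jᵀ Σ Δd_j), where Δd_j collects its per-σ sensitivities; writing Σ = λλᵀ gives the same value as ||λᵀ Δd_j||. This sketch assumes each skewed itf Y_v moves source v by 3σ (hence the division by 3) and uses three hypothetical sources instead of 27; the helper names are hypothetical.

```python
import math

def sensitivities(d_typ, d_skewed):
    """Per-sigma sensitivities: d_skewed[v] is the path delay with source v skewed by 3 sigma."""
    return [(d - d_typ) / 3.0 for d in d_skewed]

def path_sigma(sens, corr):
    """sigma_j = sqrt(s^T Sigma s) for a linear delay model with correlated Gaussian sources."""
    n = len(sens)
    return math.sqrt(sum(sens[u] * corr[u][v] * sens[v] for u in range(n) for v in range(n)))

# Hypothetical path with 3 variation sources; one extraction + STA run per skewed itf (ps).
d_typ = 400.0
d_skewed = [409.0, 406.0, 403.0]
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
s = sensitivities(d_typ, d_skewed)  # [3.0, 2.0, 1.0] ps per sigma
print(3.0 * path_sigma(s, corr))
```

Ignoring the correlation term (identity Σ) would understate σ_j here, since the first two sources are positively correlated.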
Throughput model [Kahng10]: throughput as a function of the error rate ER, the clock period T, and the number of recovery cycles r
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacing a Razor FF with a normal FF
• Trade off the cost of resilience vs. datapath optimization

Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs: resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt designs, for MUL and EXU; minimum achievable energy marked]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design / overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cones
[Figure: voltage vs. frequency plane showing the design cone of mode A and modes B and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign.
Guideline: search for signoff modes within the design cone to reduce overdesign.
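A toy version of the dominance test, assuming the design cone of a mode signed off at (V_A, f_A) is bounded by an alpha-power-law frequency-voltage curve through that point. The model, its threshold voltage, and the helper names are hypothetical:

```python
VTH, EXP_A = 0.35, 1.3  # hypothetical alpha-power-law parameters

def speed(v):
    """Relative maximum frequency at supply v (alpha-power law)."""
    return (v - VTH) ** EXP_A / v

def slack(signoff, mode):
    """Relative frequency slack at `mode` for a circuit signed off exactly at `signoff`.

    Positive: `mode` is inside the design cone with margin (overdesign);
    negative: timing fails at `mode`.
    """
    v_s, f_s = signoff
    v_m, f_m = mode
    f_max = f_s * speed(v_m) / speed(v_s)
    return (f_max - f_m) / f_m

def relation(a, b, tol=1e-9):
    """Classify two signoff modes, each (voltage, frequency).

    Under this single-curve model, slack(a, b) > 0 exactly when slack(b, a) < 0,
    so one check decides the relation.
    """
    s = slack(a, b)
    if abs(s) <= tol:
        return "equivalent dominance"
    return "A dominates B" if s > 0 else "B dominates A"

a = (0.9, 1.0)
f_eq = 1.0 * speed(1.0) / speed(0.9)  # frequency on A's cone boundary at 1.0 V
print(relation(a, (1.0, f_eq)), relation(a, (1.0, 0.8 * f_eq)))
```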
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoids circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)]
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), for design AES at f = 700 MHz. Ovals indicate sample points; "1st" and "2nd" are power from the power models at the first and second iterations; "real" is power from real implemented circuits.]
Classes of Closed-Loop AVS
[Diagram: closed-loop AVS monitors fall into three classes: design-dependent replica, in-situ monitor, and generic monitor]
• Critical path may be difficult to identify (IP from a 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low series resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU under large, medium, and small margins, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs, reducing power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt designs, for MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save the current solution; iterate]
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS monitors: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low series resistance]
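The effect of the series-resistance knob on Vmin can be illustrated with a toy model. A sketch under stated assumptions: each stage delay follows an alpha-power law plus a voltage-independent series-resistance term, and all constants (`VTH`, `CELL_K`, `RES_DELAY`) are hypothetical. Vmin is taken here as the lowest supply at which the RO still reaches a target frequency, and the control word is chosen by exhaustive search over the 1-bit pins.

```python
from itertools import product

VTH = 0.35                                      # hypothetical threshold voltage (V)
CELL_K = {"INV": 4.0, "NAND": 5.5, "NOR": 7.0}  # hypothetical cell drive factors (ps)
RES_DELAY = (1.0, 6.0)                          # hypothetical series-R delay (ps): low, high

def stage_delay_ps(cell, bit, v):
    # Alpha-power-law gate delay plus a voltage-independent series-resistance term.
    return CELL_K[cell] * v / (v - VTH) ** 1.3 + RES_DELAY[bit]

def ro_freq_ghz(cells, bits, v):
    # One oscillation period is two passes through the ring.
    period_ps = 2.0 * sum(stage_delay_ps(c, b, v) for c, b in zip(cells, bits))
    return 1e3 / period_ps

def ro_vmin(cells, bits, f_target_ghz, lo=0.4, hi=1.2, iters=50):
    """Lowest Vdd at which the RO still reaches f_target (bisection; f rises with V)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ro_freq_ghz(cells, bits, mid) >= f_target_ghz:
            hi = mid
        else:
            lo = mid
    return hi

def tune(cells, f_target_ghz, vmin_target):
    """Exhaustively choose the control word whose RO Vmin is closest to vmin_target."""
    return min(product((0, 1), repeat=len(cells)),
               key=lambda bits: abs(ro_vmin(cells, bits, f_target_ghz) - vmin_target))
```

Because the resistance term does not shrink as voltage drops, selecting high resistance raises the RO's Vmin in this model; the cell-type knob for process-corner coverage is not modeled.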
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at large, medium, and small margin, for MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs, hence reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution; iterate]
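The iteration above can be summarized as a loop that keeps the lowest-energy solution seen. A minimal sketch: `seopt`, `skewopt`, and `energy` are hypothetical callables standing in for the real optimizations and energy evaluation, and the fixed iteration budget `max_iters` is an assumption because the stopping rule is not stated on the slide.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")

def overall_flow(initial: S,
                 seopt: Callable[[S], S],
                 skewopt: Callable[[S], S],
                 energy: Callable[[S], float],
                 max_iters: int = 10) -> Tuple[S, float]:
    """Alternate SEOpt and SkewOpt, saving a solution whenever energy < min energy."""
    best, min_energy = initial, energy(initial)
    current = initial
    for _ in range(max_iters):
        current = skewopt(seopt(current))   # one SEOpt pass, then one SkewOpt pass
        e = energy(current)
        if e < min_energy:                  # "Energy < min energy: save current solution"
            best, min_energy = current, e
    return best, min_energy
```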
Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, annotated from highest invested guardband to least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; per-layer capacitance distributions for M1 and M2 with their 3σ corners; interconnect stack with M1 and M2 at the homogeneous Cw corner; pessimism]
Homogeneous Corners
[Figure: example worst-case capacitance corner; per-layer capacitance distributions for M1 and M2; interconnect stack with M1 and M2; homogeneous Cw corner vs. 3σ of the stack; pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
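The growth of pessimism with layer count can be quantified under a simple assumption: the wire delay is split evenly across `n_layers` layers and every pair of layers has the same correlation `rho`. The homogeneous corner skews every layer by 3σ at once, while the statistical 3σ of the sum grows only with the square root of the summed covariance. The helper name is hypothetical.

```python
import math

def homogeneous_pessimism(n_layers: int, rho: float) -> float:
    """Ratio of the homogeneous-corner delay shift to the statistical 3-sigma shift.

    Assumes the wire delay is split evenly over n_layers layers, each contributing
    one unit of delay per sigma, with the same correlation rho between every pair.
    """
    corner_shift = 3.0 * n_layers                           # all layers at 3 sigma together
    variance = n_layers + n_layers * (n_layers - 1) * rho   # variance of the sum
    return corner_shift / (3.0 * math.sqrt(variance))
```

With `rho = 0` and four layers the homogeneous corner is twice the statistical 3σ; with `rho = 1` (fully correlated layers) the ratio is 1 and there is no extra pessimism.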
Correlation Matrix
• Let Σ be the correlation matrix for variation sources (here M1 to M3 share a process module; M4 is in another)
        M1         M2         M3         M4
        ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1 ΔW   1  0  0    γ  0  0    γ  0  0    0  0  0
   ΔT   0  1  0    0  γ  0    0  γ  0    0  0  0
   ΔH   0  0  1    0  0  γ    0  0  γ    0  0  0
M2 ΔW   γ  0  0    1  0  0    γ  0  0    0  0  0
   ΔT   0  γ  0    0  1  0    0  γ  0    0  0  0
   ΔH   0  0  γ    0  0  1    0  0  γ    0  0  0
M3 ΔW   γ  0  0    γ  0  0    1  0  0    0  0  0
   ΔT   0  γ  0    0  γ  0    0  1  0    0  0  0
   ΔH   0  0  γ    0  0  γ    0  0  1    0  0  0
M4 ΔW   0  0  0    0  0  0    0  0  0    1  0  0
   ΔT   0  0  0    0  0  0    0  0  0    0  1  0
   ΔH   0  0  0    0  0  0    0  0  0    0  0  1
• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
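The matrix above can be generated from a layer-to-process-module assignment. A sketch assuming sources are ordered (ΔW, ΔT, ΔH) per layer and a single γ for all correlated pairs; `correlation_matrix` is a hypothetical helper, and the Cholesky factor λ used later in statistical timing analysis comes from `numpy.linalg.cholesky`.

```python
import numpy as np

def correlation_matrix(modules, gamma=0.5):
    """Correlation matrix of BEOL variation sources.

    modules[i] is the process-module id of metal layer i (M1, M2, ...).
    Sources are ordered per layer as (dW, dT, dH), so source u belongs to
    layer u // 3 and has variation type u % 3.
    """
    n = 3 * len(modules)
    sigma = np.eye(n)
    for u in range(n):
        for v in range(n):
            same_type = u % 3 == v % 3
            same_module = modules[u // 3] == modules[v // 3]
            if u != v and same_type and same_module:
                sigma[u, v] = gamma
    return sigma

sigma = correlation_matrix([0, 0, 0, 1])   # M1-M3 share a process module; M4 does not
lam = np.linalg.cholesky(sigma)            # lower-triangular factor: sigma = lam @ lam.T
```

For a 9-layer stack the same call with nine module ids yields the 27 x 27 matrix.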
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point at 60%, cumulative probability 0.92]
• Variations in different layers are not fully correlated
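The 92% statistic is an empirical CDF of each path's largest per-layer wirelength share. A sketch, assuming per-path wirelength is available as a hypothetical `{layer: length}` dictionary:

```python
def max_layer_ratio(wirelength_by_layer):
    """Largest fraction of a path's total wirelength routed on a single layer."""
    total = sum(wirelength_by_layer.values())
    return max(wirelength_by_layer.values()) / total

def fraction_below(paths, threshold=0.6):
    """Empirical CDF of the max-layer ratio, evaluated at `threshold`."""
    ratios = [max_layer_ratio(p) for p in paths]
    return sum(r < threshold for r in ratios) / len(ratios)
```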
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
[Figure annotations: α < 1.0: delay variations are covered by the RC-worst corner; the C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner; dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst; dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst; Δdelay at RC-worst = [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
Delay Variation
[Figure: α vs. Δdelay at C-worst = [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst = [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Each path is more sensitive to either R or C
• Using only RC-worst or only C-worst will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
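Dominant-corner selection and α can be computed per path as below. This sketch assumes α is the ratio of the statistical 3σ path-delay variation to the delay shift at the dominant conventional corner, both normalized by d(Ytyp); that reading matches the axes on these slides but is an assumption, as are the helper names.

```python
def dominant_corner(d_typ, d_cw, d_rcw):
    """Return (corner name, relative delay shift) for the corner with the larger shift."""
    delta_cw = (d_cw - d_typ) / d_typ
    delta_rcw = (d_rcw - d_typ) / d_typ
    return ("C-worst", delta_cw) if delta_cw > delta_rcw else ("RC-worst", delta_rcw)

def alpha(three_sigma, d_typ, d_cw, d_rcw):
    """Assumed definition: statistical 3-sigma variation / dominant-corner shift."""
    _, delta = dominant_corner(d_typ, d_cw, d_rcw)
    return (three_sigma / d_typ) / delta
```

Under this definition, α > 1.0 flags a path whose statistical variation exceeds what the dominant CBC covers.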
Non-Homogeneous Corner
• Each layer can have a differently skewed variation
[Figure: interconnect stack with M1 and M2; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5
[Figure: axes Δdj(Yrcw)/dj(Ytyp) × 100 and 3σj/dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
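A tightened corner can be derived by shrinking each BEOL parameter's skew from its typical value. A sketch assuming the skew scales linearly with α; the parameter names and nanometer values below are hypothetical, not foundry data.

```python
def tightened_corner(typ, conventional, alpha=0.5):
    """Move each BEOL parameter only a fraction alpha of the way from its
    typical value toward its conventional-corner value."""
    return {k: typ[k] + alpha * (conventional[k] - typ[k]) for k in typ}

# Hypothetical per-layer values (nm); RC-worst takes the minimum of W, T, H.
typ = {"W": 40.0, "T": 80.0, "H": 60.0}
rcw = {"W": 36.0, "T": 74.0, "H": 55.0}
tbc_rcw = tightened_corner(typ, rcw)   # {"W": 38.0, "T": 77.0, "H": 57.5}
```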
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer for critical paths]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if its Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; classify paths into GTBC and GCBC; GTBC: timing analysis using TBC and ECO using TBC until violation = 0; GCBC: timing analysis using CBC and ECO using CBC until violation = 0; done]
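The classification step of this flow can be sketched as below. Δdelay is taken at the dominant corner relative to d(Ytyp), which is an assumption; the `classify_paths` helper and its input format are hypothetical.

```python
def classify_paths(paths, threshold):
    """Split paths into G_TBC and G_CBC using the rule on this slide:
    a path goes to G_TBC if its delta-delay exceeds `threshold`.

    paths: dict name -> (d_typ, d_cw, d_rcw) path delays at the three corners.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta = max(d_cw, d_rcw) / d_typ - 1.0   # relative shift at the dominant corner
        (g_tbc if delta > threshold else g_cbc).append(name)
    return g_tbc, g_cbc
```

Paths in the first list would then go through ECO with TBC, the rest through ECO with CBC.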
Experiment Setup
Design              LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   1.8       2.0       3.1
• Model BTI degradation with Vfinal throughout lifetime
  • Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
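The reference flow is essentially a nested loop over time steps and voltage steps. A toy sketch: the alpha-power delay model, the BTI model (|ΔVth| proportional to V² · t^0.2), and all constants are hypothetical, and aging at each step is evaluated at the current Vdd only, ignoring voltage history.

```python
def reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01, t_final=10.0, dt=1.0):
    """Toy reference AVS signoff loop; returns the Vdd schedule [(t, Vdd), ...]."""
    vth0, a_pow = 0.30, 1.3                    # hypothetical device parameters

    def delay(v, dvth):                        # alpha-power-law path delay (normalized)
        return v / (v - vth0 - dvth) ** a_pow

    def dvth(t, v):                            # hypothetical BTI model: ~ V^2 * t^0.2
        return 0.01 * (v / v_init) ** 2 * t ** 0.2

    period = delay(v_init, 0.0)                # path signed off with zero slack at t = 0
    v, t, schedule = v_init, 0.0, [(0.0, v_init)]
    while t < t_final:
        t += dt                                # estimate aging at this time step
        while delay(v, dvth(t, v)) > period:   # update timing, then raise Vdd by Vstep
            v += v_step
            if v > v_max:
                raise RuntimeError("Vfinal > Vmax: modify circuit and start over")
        schedule.append((t, v))
    return schedule
```

The returned schedule is non-decreasing and its last entry plays the role of Vfinal; raising an error stands in for the real flow's "modify circuit and start over".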
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1       2       3      4       5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit   NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal  NA   Vheur1   Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among Circuit, Vfinal, and Derated Libraries (Vlib, VBTI)]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in year 1 (remaining 30% over 9 years)]
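The front-loading can be expressed with a power-law aging model |ΔVth| ∝ tⁿ, which is an assumption here. Under that model, the slide's “50% in year 1 of a 10-year lifetime” pins down n; the helper names are hypothetical.

```python
import math

def aged_fraction(t, n, lifetime=10.0):
    """Fraction of the end-of-life BTI shift reached at time t, assuming |dVth| ~ t**n."""
    return (t / lifetime) ** n

def exponent_for(fraction, t, lifetime=10.0):
    """Power-law exponent n for which `fraction` of end-of-life aging occurs by time t."""
    return math.log(fraction) / math.log(t / lifetime)

n = exponent_for(0.5, 1.0)   # "50% of BTI aging within the 1st year of 10"
```

This gives n ≈ 0.30, and under the same model about 81% of the end-of-life shift has accumulated by year 5.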
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
30% MTTF penalty
200mV voltage compensation
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year (0 to 12) for S1 to S5; MTTF (years) of DMA for S1 to S5]
12 years MTTF penalty
What is “Signoff”?
• Foundation of the contract between design house and foundry
  • “Chip should work”: stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd; operating voltage]
Problem: margins = pessimism, leading to overdesign and schedule delay
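The margin stack amounts to subtracting each worst-case allowance from nominal Vdd. A sketch assuming the margins simply add linearly; the margin values are hypothetical, not from the talk.

```python
def signoff_vdd(nominal_vdd, margins):
    """Signoff voltage = nominal Vdd minus the stacked voltage margins (V)."""
    return nominal_vdd - sum(margins.values())

# Hypothetical margin values (V), one per layer of the stack on this slide.
MARGINS = {"static IR drop": 0.02, "power grid IR gradient": 0.01,
           "dynamic IR": 0.03, "HCI": 0.01, "NBTI": 0.02}
```

With nominal Vdd = 0.90 V these hypothetical margins give a signoff Vdd of 0.81 V; every margin is paid in full even if the worst cases never coincide, which is the pessimism the slide points to.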
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv: Δdjv
• Assumptions
  • Path delay varies linearly with the variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp); RC extraction; STA; Δdjv]
Δdjv = [dj(Yv) - dj(Ytyp)] / 3
Note: path delay includes gate and wire delays
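The sensitivity extraction reduces to one subtraction per itf run. A sketch assuming each Yv skews only source v by 3σ (consistent with the division by 3 above) and that per-path delays from the 28 STA runs are already available; the function name is hypothetical.

```python
def sensitivities(d_typ, d_skewed, n_sigma=3.0):
    """Per-sigma delay sensitivity of one path to each variation source.

    d_typ: path delay dj(Ytyp) from the typical itf.
    d_skewed: delays dj(Yv), one per itf in which only source v is skewed
              by n_sigma (27 values for a 9-layer stack).
    """
    return [(d_v - d_typ) / n_sigma for d_v in d_skewed]
```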
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (Note: λ is obtained by Cholesky decomposition)
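With Σ = λλᵀ, unit-variance correlated sources can be written z = λx with x independent standard normals, so a path's delay standard deviation is the norm of λᵀ times its sensitivity vector. A sketch of that computation; `path_sigma` is a hypothetical helper.

```python
import numpy as np

def path_sigma(sens, sigma):
    """Standard deviation of path delay under the linear model
    dj = dj(Ytyp) + sum_v sens[v] * z_v, with unit-variance z correlated by sigma.

    With z = lam @ x, x ~ N(0, I) and sigma = lam @ lam.T (Cholesky), the delay
    variance is ||lam.T @ sens||^2, which equals sens.T @ sigma @ sens.
    """
    lam = np.linalg.cholesky(sigma)
    return float(np.linalg.norm(lam.T @ np.asarray(sens, dtype=float)))
```

Three times this value is the 3σj used on the α slides.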
[Equation: throughput in terms of error rate ER [Kahng10], recovery cycles r, and clock period T]
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. datapath optimization
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
Toward Holistic Modeling Margining and Tolerance of IC Variabi
IC Variability
Challenge Value of Technology
Solutions Modeling Margining Tolerance
Outline
BEOL Corner Optimization
Proposed Timing Signoff Flow
Conventional BEOL Corners
Statistical RC Model
Pessimism of Conventional BEOL Corners (CBC)
Intuition on Delay Variability Across Cw RCw
Intuition on Delay Variability Across Cw RCw (2)
Scaling Factor α and Delay Variation
Find Paths for Which TBCs Can Be Used
Determining α Arcw and Acw
Benefits of Tightened BEOL Corners
Outline (2)
How to Minimize Cost of Resilience
Tradeoff Resilience Cost vs Datapath Cost
Selective-Endpoint Optimization (SEOpt)
Clock Skew Optimization (SkewOpt)
Overall Optimization Flow
Benefit of Low-Cost Resilience
Increased Benefit of Resilience with AVS
Outline (3)
Breaking Chicken-Egg Loops Less Margin
Derated Library Characterization and AVS
Library Characterization for AVS
Power vs Area Across Different Signoffs
Heuristics 1
Vfinal Estimation
Observation and Heuristic 2
Proposed Library Characterization Flow
Power vs Area for All Designs
Also Multi-Mode Signoff Choices Matter
Also Tunable Monitors Less Margin
Also Tunable Monitors Less Margin (2)
Outline (4)
Conclusions
Thank You
Backup
Power Penalty to Fix EM with AVS
Homogeneous Corners
Homogeneous Corners (2)
Correlation Matrix
Wiring Structure in Timing-Critical Paths (2)
Delay Variation
Delay Variation (2)
Non-Homogeneous Corner
Opportunities for Tightened BEOL Corners
Wiring Structure in Timing-Critical Paths (2)
Proposed Timing Signoff Flow (2)
Experiment Setup
Further Analysis
Scaling Factor Results
Benefits of Tightened BEOL Corners (1)
Heuristics 1 (2)
Vfinal Estimation (2)
Observation and Heuristic 2 (2)
Technology and Benchmark Circuits
A Reference Signoff Flow
Experiment Setup (2)
ldquoChicken and Eggrdquo Loop
Bias Temperature Instability (BTI)
Observation 1
Results for DC Scenario
Problem Signoff Corner Definition
AVS Signoff Corner Selection
AVS Impact on EM Lifetime
EM Impact on AVS Scheduling
What is ldquoSignoffrdquo
Statistical Timing Analysis (1)
Statistical Timing Analysis (2)
Resilient Designs
Resilience Cost Reduction Problem
Selective-Endpoint Optimization
Process-Aware Vdd Scaling (PVS)
Challenge Variability
Energy Reduction in AVS Context
Our Concept Mode Dominance
Our Method Global Optimization
Classes of Closed-Loop AVS
Design of RO with Tunable Vmin
Benefit of Resilience Cost Reduction
Increased Benefit of Resilience With AVS
Overall Optimization Flow (2)
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the first year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
  • ≈70% of the Vdd increment occurs in the first year (the remaining 30% over the following 9 years)
[Figure: Vdd over the lifetime, converging to Vfinal]
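The front-loading can be checked with simple arithmetic if one assumes a power-law BTI model |ΔVth| ∝ t^n (a common model, assumed here; the slide does not state one). Requiring 50% of the 10-year shift at 1 year gives (1/10)^n = 0.5, so n = ln 0.5 / ln 0.1 ≈ 0.30. The function names below are illustrative.

```python
import math

LIFETIME_YEARS = 10.0   # total lifetime (from the slide)
FRACTION_AT_1Y = 0.5    # 50% of BTI aging within the first year (from the slide)

def power_law_exponent(t_frac: float, aging_frac: float) -> float:
    """Solve t_frac**n == aging_frac for n, assuming |dVth| ~ t**n."""
    return math.log(aging_frac) / math.log(t_frac)

def aging_fraction(t_years: float, n: float, lifetime: float = LIFETIME_YEARS) -> float:
    """Fraction of the end-of-life |dVth| already accumulated at time t."""
    return (t_years / lifetime) ** n

n = power_law_exponent(1.0 / LIFETIME_YEARS, FRACTION_AT_1Y)
print(f"n = {n:.3f}")   # n = 0.301
for t in (0.1, 1.0, 5.0):
    print(f"{t:>4} y: {aging_fraction(t, n):.2f}")
```

Under this assumed model, a quarter of the lifetime shift is already present after about five weeks, which is why the AVS voltage rises steeply early on.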
Results for DC Scenario
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate for aging
  • Consumes more power
  • May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
30% MTTF penalty
200 mV voltage compensation
EM Impact on AVS Scheduling
[Figure: Vdd (V) vs. year for AVS schedules S1 to S5, and the resulting MTTF (years) of design DMA under each schedule, annotated with the MTTF penalty]
What Is “Signoff”?
• Foundation of the contract between design house and foundry
  • “Chip should work”: a stack of models, margins, and analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI; signoff Vdd vs. operating voltage]
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
• Assumptions:
  • Δdjv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Figure: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^T$ (λ is obtained by Cholesky decomposition)
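A sketch of how the per-source sensitivities and Σ = λλ^T combine into a path-delay sigma. It uses the standard identity σj = ‖λ^T Δdj‖ = sqrt(Δdj^T Σ Δdj) for a delay that is linear in correlated normal sources; the function names and the two-source example are hypothetical, and this is not presented as the authors' exact formulation.

```python
import math
from typing import List

Matrix = List[List[float]]

def sensitivities(d_corner: List[float], d_typ: float) -> List[float]:
    """Delta d_jv = [d_j(Y_v) - d_j(Y_typ)] / 3, one entry per variation source v
    (Y_v: itf with only z_v skewed by 3 sigma)."""
    return [(dv - d_typ) / 3.0 for dv in d_corner]

def cholesky(sigma: Matrix) -> Matrix:
    """Lower-triangular lam with sigma == lam * lam^T (sigma must be SPD)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam

def path_sigma(dd: List[float], sigma: Matrix) -> float:
    """sigma_j = || lam^T dd ||: with z = lam u (u independent standard normals),
    the delay shift dd . z equals (lam^T dd) . u."""
    lam = cholesky(sigma)
    n = len(dd)
    coeff = [sum(lam[i][k] * dd[i] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(c * c for c in coeff))

dd = sensitivities([103.0, 106.0], d_typ=100.0)    # [1.0, 2.0]
print(path_sigma(dd, [[1.0, 0.5], [0.5, 1.0]]))    # sqrt(7), about 2.6458
```

Factoring Σ once and reusing λ for every path keeps the per-path cost to a matrix-vector product.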
Throughput is modeled as a function of the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r.
Selective-Endpoint Optimization
• Optimize the fan-in cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off the cost of resilience vs. datapath optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs: resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU, marking the minimum achievable energy]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • The modes are in each other’s design cones
[Figure: voltage vs. frequency plane showing the design cone of mode A and modes A, B, C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign.
Guideline: search for signoff modes within the design cone to reduce overdesign.
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoids circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Figure: global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)]
[Figure: power estimation of adaptive search: power (mW) vs. signoff voltage (V) for the 1st- and 2nd-iteration power models and the real implementation; design AES, f = 700 MHz]
• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iterations
• Real: power from real implemented circuits
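The adaptive search can be illustrated with a quadratic power model fitted to a few sampled signoff voltages. Assumptions: the quadratic model form, the least-squares fit, and the `implement` callable (a hypothetical stand-in for SP&R plus power analysis) are all illustrative; the talk does not specify its model form.

```python
from typing import Callable, List, Tuple

def fit_quadratic(pts: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Least-squares fit of p(v) = a*v^2 + b*v + c (assumed model form)."""
    s = [sum(v ** k for v, _ in pts) for k in range(5)]       # sum of v^k
    t = [sum(p * v ** k for v, p in pts) for k in range(3)]   # sum of p*v^k
    m = [[s[4], s[3], s[2], t[2]],                             # normal equations
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    for col in range(3):   # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c

def adaptive_search(implement: Callable[[float], float], samples: List[float],
                    lo: float, hi: float, iters: int = 2) -> float:
    """Sample, fit a power model, estimate the optimal signoff voltage,
    implement there, and refine the model with the new sample."""
    pts = [(v, implement(v)) for v in samples]
    v_opt = lo
    for _ in range(iters):
        a, b, _ = fit_quadratic(pts)
        # Vertex of the parabola if convex; otherwise the best sample so far.
        v_opt = -b / (2 * a) if a > 0 else min(pts, key=lambda q: q[1])[0]
        v_opt = min(max(v_opt, lo), hi)         # stay inside the allowed range
        pts.append((v_opt, implement(v_opt)))   # new sample refines the model
    return v_opt

toy_power = lambda v: 30.0 * (v - 1.05) ** 2 + 15.0   # hypothetical "real" power
print(adaptive_search(toy_power, [0.9, 1.0, 1.2], lo=0.9, hi=1.2))
```

Only `len(samples) + iters` implementations are needed, which mirrors the "small constant number of runs" claim under these assumptions.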
Classes of Closed-Loop AVS
Closed-loop AVS monitors: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit:
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low series resistance]
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at large, medium, and small margins, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of the clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU, marking the minimum achievable energy]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: flow: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on a sensitivity function; replace error-tolerant FFs with normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save the current solution]
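The SEOpt/SkewOpt iteration can be expressed as a generic loop. This is a sketch only: `seopt`, `skewopt`, and `energy` are hypothetical callables standing in for the tool steps, and stopping at the first non-improving iteration is an assumption about the flow's exit condition.

```python
from typing import Callable, Tuple, TypeVar

D = TypeVar("D")   # whatever represents a design (netlist + FF assignment)

def optimize(initial: D,
             seopt: Callable[[D], D],
             skewopt: Callable[[D], D],
             energy: Callable[[D], float],
             max_iters: int = 20) -> Tuple[D, float]:
    """Alternate SEOpt and SkewOpt, saving the lowest-energy solution;
    stop when an iteration does not beat the best energy so far."""
    best, best_e = initial, energy(initial)
    design = initial
    for _ in range(max_iters):
        design = skewopt(seopt(design))
        e = energy(design)
        if e < best_e:
            best, best_e = design, e   # "save current solution"
        else:
            break
    return best, best_e

# Toy usage (hypothetical): a design is just the number of error-tolerant FFs;
# each SEOpt pass converts two of them to normal FFs via margin insertion.
best, e = optimize(10, seopt=lambda n: max(n - 2, 0), skewopt=lambda n: n,
                   energy=lambda n: (n - 4) ** 2 + 50)
print(best, e)   # 4 50
```

The toy energy captures the tradeoff the flow targets: too many error-tolerant FFs cost resilience overhead, too few cost datapath margin.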
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, ranging from highest to least invested guardband; annotated 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner: per-layer capacitance distributions for M1 and M2 (from −3σ to 3σ); skewing both layers to 3σ gives the homogeneous Cw corner of the M1/M2 stack, which is pessimistic relative to the stack's own 3σ point]
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
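A back-of-the-envelope model shows why the pessimism grows with layer count. Assume (hypothetically) a path whose wire capacitance is split equally over n layers, each contributing unit σ, with pairwise inter-layer correlation ρ. A homogeneous corner shifts every layer by 3σ (total 3n), while the true 3σ of the sum is 3·sqrt(n + n(n−1)ρ):

```python
import math

def homogeneous_pessimism(n_layers: int, rho: float) -> float:
    """Ratio of the homogeneous-corner shift to the true 3-sigma shift for a
    path split equally over n_layers (unit sigma each, pairwise correlation rho).
    This equal-split model is an illustrative assumption."""
    corner_shift = 3.0 * n_layers   # every layer pushed to its own +3 sigma
    true_3sigma = 3.0 * math.sqrt(n_layers + n_layers * (n_layers - 1) * rho)
    return corner_shift / true_3sigma

for rho in (1.0, 0.5, 0.0):
    print(rho, [round(homogeneous_pessimism(n, rho), 2) for n in (1, 2, 4, 8)])
```

With ρ = 1 the ratio stays at 1 for any n; with ρ < 1 it grows with n (toward 1/sqrt(ρ) for large n, and as sqrt(n) when ρ = 0). Its reciprocal plays the role of an α-like scaling factor for such a path.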
Correlation Matrix
• Let Σ be the correlation matrix for the variation sources. For layers M1 to M4, with block rows/columns M1, M2, M3, M4 and sources ordered (ΔW, ΔT, ΔH) within each block (I3 is the 3 × 3 identity, 0 the 3 × 3 zero block):
$$
\Sigma = \begin{bmatrix}
I_3 & \gamma I_3 & \gamma I_3 & 0 \\
\gamma I_3 & I_3 & \gamma I_3 & 0 \\
\gamma I_3 & \gamma I_3 & I_3 & 0 \\
0 & 0 & 0 & I_3
\end{bmatrix}
$$
• Correlation between variation sources of the same variation type in the same process module: γ = 0.5
• Variation sources in different process modules are independent
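The block structure of Σ can be generated programmatically. A minimal sketch, assuming sources are ordered layer-major as (ΔW, ΔT, ΔH) and that M1 to M3 share a process module while M4 is in a different one, as in the matrix above; `correlation_matrix` and the module map are illustrative names, and module assignments for other stacks would be hypothetical.

```python
from typing import Dict, List

TYPES = ("dW", "dT", "dH")   # variation types per layer

def correlation_matrix(layers: List[str], module_of: Dict[str, int],
                       gamma: float = 0.5) -> List[List[float]]:
    """1 on the diagonal, gamma for the same variation type in the same
    process module, 0 otherwise (independent)."""
    idx = [(layer, t) for layer in layers for t in TYPES]
    n = len(idx)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (la, ta) in enumerate(idx):
        for b, (lb, tb) in enumerate(idx):
            if a == b:
                sigma[a][b] = 1.0
            elif ta == tb and module_of[la] == module_of[lb]:
                sigma[a][b] = gamma
    return sigma

layers = ["M1", "M2", "M3", "M4"]
modules = {"M1": 0, "M2": 0, "M3": 0, "M4": 1}   # M4 in a separate module
sigma = correlation_matrix(layers, modules)
for row in sigma:
    print(" ".join(f"{x:.1f}" for x in row))
```

For the 9-layer stack, the same function yields the 27 × 27 matrix once each layer's process module is known.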
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. maximum wirelength ratio across all layers (%), marked at (60%, 0.92)]
• Variations in different layers are not fully correlated
• Some paths have α > 1.0: a CBC can underestimate their delay variations
  • But these paths have larger delays at the other corner
[Figure: per-path Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)]/d(Ytyp); paths dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and paths dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• The C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
Delay Variation
[Figure: per-path α and Δdelay at the C-worst and RC-worst corners, with Δdelay = [d(Y) − d(Ytyp)]/d(Ytyp) for Y = Ycw or Yrcw]
• Paths are more sensitive either to R or to C
• Using only the RC-worst or only the C-worst corner underestimates delay variations
• Both RC-worst and C-worst corners are needed to cover process variations
• In the following discussion, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have differently skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenges:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variations in the itf with α = 0.5
[Figure: per path, 3σj/d(Ytyp) × 100 vs. Δdj(Yrcw)/dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, in order to preserve parametric yield
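One way to compute α per path, assuming α is the ratio of the statistical 3σj delay spread to the delay shift at the dominant conventional corner (consistent with the two axes above), and that scaling the itf variations by 0.5 scales the corner shift linearly. The `PathTiming` record and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PathTiming:
    """Hypothetical per-path record; delays and sigma in ps."""
    name: str
    d_typ: float   # delay at Ytyp
    d_cw: float    # delay at Ycw
    d_rcw: float   # delay at Yrcw
    sigma: float   # statistical 1-sigma delay spread

def alpha(p: PathTiming) -> float:
    """Assumed definition: 3*sigma / delay shift at the dominant CBC corner."""
    shift = max(p.d_cw, p.d_rcw) - p.d_typ
    return 3.0 * p.sigma / shift

def covered_by_tbc(p: PathTiming, scale: float = 0.5) -> bool:
    """Under linear scaling, a corner with itf variations scaled by `scale`
    still bounds the 3-sigma spread iff alpha <= scale."""
    return alpha(p) <= scale

p1 = PathTiming("p1", 1000.0, 1040.0, 1100.0, sigma=12.0)
p2 = PathTiming("p2", 800.0, 860.0, 830.0, sigma=15.0)
for p in (p1, p2):
    print(p.name, round(alpha(p), 2), covered_by_tbc(p))
# p1 0.36 True
# p2 0.75 False
```

Path p2 illustrates the yield challenge: signing it off at a corner tightened to 0.5 would underestimate its delay variation.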
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• This structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8×)
• Netlist: NETCARD, 1 mm², 570K standard-cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Figure: flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split into GTBC and GCBC; GTBC: timing analysis using TBC, ECO using TBC until violations = 0; GCBC: timing analysis using CBC, ECO using CBC until violations = 0 → done]
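The grouping and checking steps can be sketched as below. Assumptions: Δdelay is the relative shift [d(Y) − d(Ytyp)]/d(Ytyp) at the dominant of Ycw and Yrcw, the threshold and clock values are illustrative, and a tightened corner's delay is approximated by linearly scaling the conventional-corner shift by 0.5; in a real flow each check is an RC extraction plus STA run. All names are hypothetical.

```python
from typing import Dict, List, Tuple

# name -> (d_typ, d_cw, d_rcw) in ps; hypothetical values.
Paths = Dict[str, Tuple[float, float, float]]

def dominant_delta(d_typ: float, d_cw: float, d_rcw: float) -> float:
    """Relative delay shift at the dominant conventional corner."""
    return (max(d_cw, d_rcw) - d_typ) / d_typ

def split_paths(paths: Paths, threshold: float) -> Tuple[List[str], List[str]]:
    """G_TBC: delta-delay above the threshold; G_CBC: all other paths."""
    g_tbc = [n for n, d in paths.items() if dominant_delta(*d) > threshold]
    g_cbc = [n for n in paths if n not in g_tbc]
    return g_tbc, g_cbc

def violations(paths: Paths, group: List[str], clock: float, scale: float) -> List[str]:
    """Paths whose corner delay exceeds the clock. scale=1.0 models the CBC;
    scale<1.0 approximates a tightened corner by shrinking the shift."""
    bad = []
    for n in group:
        d_typ, d_cw, d_rcw = paths[n]
        if d_typ + scale * (max(d_cw, d_rcw) - d_typ) > clock:
            bad.append(n)
    return bad

paths = {"p1": (1000.0, 1040.0, 1100.0),
         "p2": (900.0, 930.0, 945.0),
         "p3": (1020.0, 1050.0, 1080.0)}
g_tbc, g_cbc = split_paths(paths, threshold=0.07)
print(g_tbc, g_cbc)                                       # ['p1'] ['p2', 'p3']
print(violations(paths, g_tbc, clock=1060.0, scale=0.5))  # []
print(violations(paths, g_cbc, clock=1060.0, scale=1.0))  # ['p3'] -> needs ECO
```

In this toy case, p1 would violate at the conventional corner (1100 ps) but passes its tightened-corner check, so only p3 needs an ECO.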
Experiment Setup
Designs and clock periods: LEON3MP 1.8 ns, NETCARD 2.0 ns, SUPERBLUE12 3.1 ns
• Model BTI degradation with Vfinal throughout the lifetime
  • Aging under a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging), which depends on:
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure: Vfinal across gate types, annotated 10 mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
Toward Holistic Modeling Margining and Tolerance of IC Variabi
IC Variability
Challenge Value of Technology
Solutions Modeling Margining Tolerance
Outline
BEOL Corner Optimization
Proposed Timing Signoff Flow
Conventional BEOL Corners
Statistical RC Model
Pessimism of Conventional BEOL Corners (CBC)
Intuition on Delay Variability Across Cw RCw
Intuition on Delay Variability Across Cw RCw (2)
Scaling Factor α and Delay Variation
Find Paths for Which TBCs Can Be Used
Determining α Arcw and Acw
Benefits of Tightened BEOL Corners
Outline (2)
How to Minimize Cost of Resilience
Tradeoff Resilience Cost vs Datapath Cost
Selective-Endpoint Optimization (SEOpt)
Clock Skew Optimization (SkewOpt)
Overall Optimization Flow
Benefit of Low-Cost Resilience
Increased Benefit of Resilience with AVS
Outline (3)
Breaking Chicken-Egg Loops Less Margin
Derated Library Characterization and AVS
Library Characterization for AVS
Power vs Area Across Different Signoffs
Heuristics 1
Vfinal Estimation
Observation and Heuristic 2
Proposed Library Characterization Flow
Power vs Area for All Designs
Also Multi-Mode Signoff Choices Matter
Also Tunable Monitors Less Margin
Also Tunable Monitors Less Margin (2)
Outline (4)
Conclusions
Thank You
Backup
Power Penalty to Fix EM with AVS
Homogeneous Corners
Homogeneous Corners (2)
Correlation Matrix
Wiring Structure in Timing-Critical Paths (2)
Delay Variation
Delay Variation (2)
Non-Homogeneous Corner
Opportunities for Tightened BEOL Corners
Wiring Structure in Timing-Critical Paths (2)
Proposed Timing Signoff Flow (2)
Experiment Setup
Further Analysis
Scaling Factor Results
Benefits of Tightened BEOL Corners (1)
Heuristics 1 (2)
Vfinal Estimation (2)
Observation and Heuristic 2 (2)
Technology and Benchmark Circuits
A Reference Signoff Flow
Experiment Setup (2)
ldquoChicken and Eggrdquo Loop
Bias Temperature Instability (BTI)
Observation 1
Results for DC Scenario
Problem Signoff Corner Definition
AVS Signoff Corner Selection
AVS Impact on EM Lifetime
EM Impact on AVS Scheduling
What is ldquoSignoffrdquo
Statistical Timing Analysis (1)
Statistical Timing Analysis (2)
Resilient Designs
Resilience Cost Reduction Problem
Selective-Endpoint Optimization
Process-Aware Vdd Scaling (PVS)
Challenge Variability
Energy Reduction in AVS Context
Our Concept Mode Dominance
Our Method Global Optimization
Classes of Closed-Loop AVS
Design of RO with Tunable Vmin
Benefit of Resilience Cost Reduction
Increased Benefit of Resilience With AVS
Overall Optimization Flow (2)
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
>
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of the critical paths
• Put path j in group GTBC if its Δdelay is larger than a threshold
• Use tightened BEOL corners only for the paths in GTBC
• Since tightened corners have smaller delay variations, timing closure is easier
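Flow:
Routed design
→ Timing analysis at BEOL corners Ytyp, Ycw, Yrcw; classify paths into GTBC and GCBC
→ GTBC: timing analysis using TBC; while violations ≠ 0, ECO using TBC
→ GCBC: timing analysis using CBC; while violations ≠ 0, ECO using CBC
→ Done when both groups have zero violations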
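The grouping step and the two signoff branches can be sketched as below. This is a simplified model, not the authors' implementation. It makes three assumptions:

- Δdelay is taken at the dominant worst corner.
- The threshold is a free parameter.
- The tightened-corner delay is approximated by scaling each path's typical-to-worst delay shift by a factor `tbc_scale`.

All path data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Path:
    name: str
    d_typ: float  # delay at Ytyp (ns)
    d_cw: float   # delay at Ycw (ns)
    d_rcw: float  # delay at Yrcw (ns)


def delta_delay(p: Path) -> float:
    """Relative delay shift at the dominant worst corner."""
    return (max(p.d_cw, p.d_rcw) - p.d_typ) / p.d_typ


def classify(paths: List[Path], threshold: float) -> Tuple[List[Path], List[Path]]:
    """Split into (G_TBC, G_CBC): paths with Δdelay above the threshold go to G_TBC."""
    g_tbc = [p for p in paths if delta_delay(p) > threshold]
    g_cbc = [p for p in paths if delta_delay(p) <= threshold]
    return g_tbc, g_cbc


def signoff_delay(p: Path, scale: float = 1.0) -> float:
    """Worst-corner delay; scale < 1 approximates a tightened corner (assumption)."""
    return p.d_typ + scale * (max(p.d_cw, p.d_rcw) - p.d_typ)


def violations(paths: List[Path], t_clk: float, threshold: float,
               tbc_scale: float) -> List[str]:
    """Paths needing ECO: G_TBC checked with TBC, G_CBC checked with CBC."""
    g_tbc, g_cbc = classify(paths, threshold)
    bad = [p.name for p in g_tbc if signoff_delay(p, tbc_scale) > t_clk]
    bad += [p.name for p in g_cbc if signoff_delay(p) > t_clk]
    return bad


paths = [Path("a", 1.80, 1.84, 2.04), Path("b", 1.95, 1.97, 1.96),
         Path("c", 1.97, 2.02, 2.00)]
print(violations(paths, t_clk=2.0, threshold=0.05, tbc_scale=0.5))  # -> ['c']
```

Path "a" would fail at the conventional corner (2.04 ns against a 2.0 ns clock). Because its large Δdelay places it in G_TBC, it passes under the assumed tightened corner, so only "c" is sent to ECO.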
Experiment Setup
Design              LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   1.8       2.0       3.1
• Model BTI degradation with Vfinal throughout the lifetime
• Aging under a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented yet)
• Vfinal = Vdd at end of life (to compensate BTI aging)
• Vfinal depends on:
• Gates along the critical path
• Timing slack at t = 0
• Circuit activity is not an issue
• Because the BTI effect is not sensitive to circuit activity, a DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate type
• Heuristic 2: use the average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0
[Figure annotation: 10mV]
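Heuristic 2 reduces to a single averaged voltage per design. The minimal sketch below assumes that per-gate-type Vfinal estimates at zero timing slack are already available. The gate types and voltages are hypothetical, not results from the talk.

```python
from statistics import mean

# Hypothetical end-of-life Vfinal (V) per gate type, each estimated at timing slack = 0.
vfinal_by_gate = {"INV": 0.952, "NAND2": 0.948, "NOR2": 0.957, "DFF": 0.950}

v_heur2 = mean(vfinal_by_gate.values())  # Heuristic 2: one averaged Vfinal
spread_mv = 1000 * (max(vfinal_by_gate.values()) - min(vfinal_by_gate.values()))

print(f"Vheur2 = {v_heur2:.3f} V, spread across gate types = {spread_mv:.0f} mV")
```

The printed spread is a quick check of Observation 2 on whatever per-gate data is supplied.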
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit    Frequency (GHz)
c5315      1.38
c7552      1.25
AES        0.89
MPEG2      1.05
Supply voltages
Vmax           1.05 V
Vinit          0.90 V
Vheur1 (DC)    0.97 V
Vheur1 (AC)    0.95 V
Vheur2 (DC)    0.95 V
Vheur2 (AC)    0.93 V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow:
• Estimate aging at each time step
• Update circuit timing and Vdd
• Repeat until t = tfinal
• Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
[Figure legend: Vstep = AVS voltage step; Vfinal = converged voltage]
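The reference flow is a time-stepped loop, and the toy sketch below shows only its control structure. Everything else in it is a hypothetical placeholder, not the models or values used in the talk:

- the alpha-power-law delay model;
- the power-law BTI model with exponential voltage acceleration;
- every constant.

A real flow would re-run STA and library characterization where this code calls two small functions.

```python
import math
from typing import Optional


def path_delay(vdd: float, dvth: float, vth0: float = 0.35, a: float = 1.3) -> float:
    """Hypothetical alpha-power-law critical-path delay (arbitrary units)."""
    return vdd / (vdd - vth0 - dvth) ** a


def bti_increment(t0: float, t1: float, vdd: float, a0: float = 0.02,
                  n: float = 0.2, gamma: float = 4.0, v_ref: float = 0.9) -> float:
    """Hypothetical |ΔVth| growth (V) from year t0 to t1 at supply vdd (power law in t)."""
    return a0 * math.exp(gamma * (vdd - v_ref)) * (t1 ** n - t0 ** n)


def reference_signoff(t_clk: float, v_init: float = 0.90, v_max: float = 1.05,
                      v_step: float = 0.01, t_final: float = 10.0,
                      dt: float = 0.5) -> Optional[float]:
    """Return the converged Vfinal, or None if Vdd would exceed v_max.

    None stands for "modify the circuit and start over" in the reference flow.
    """
    vdd, dvth, t = v_init, 0.0, 0.0
    while t < t_final:
        dvth += bti_increment(t, t + dt, vdd)  # estimate aging for this time step
        t += dt
        while path_delay(vdd, dvth) > t_clk:  # update timing; AVS raises Vdd by Vstep
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None
    return vdd


print(reference_signoff(t_clk=2.0))
```

Each outer iteration is one time step. The inner loop plays the role of AVS, raising Vdd in Vstep increments until the aged critical path meets the clock period.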
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff:
• Derated library characterization is related to BTI + AVS
• AVS is affected by circuit implementation
• Timing constraints, critical paths, etc.
• Circuit is affected by library characterization
[Figure: loop among Circuit, Vfinal, Vlib/VBTI, and Derated Libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON/OFF phases] [VattikondaWC06]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the first year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
• The gap between Vdd and Vfinal shrinks rapidly
≈70% of the Vdd increment occurs in the first year (remaining 30% over 9 years)
[Figure: Vdd over lifetime approaching Vfinal]
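A power-law model, |ΔVth| ∝ t^n, is a common way to see why BTI aging is front-loaded. Under that form, the share of 10-year aging reached after one year is (1/10)^n. The exponents below are illustrative assumptions, not values from the talk. An exponent of about 0.3 gives roughly the 50% first-year share stated above.

```python
def aging_fraction(t_years: float, lifetime_years: float = 10.0, n: float = 0.3) -> float:
    """Share of end-of-life |ΔVth| reached after t_years, assuming |ΔVth| ∝ t**n."""
    return (t_years / lifetime_years) ** n


for n in (1 / 6, 0.25, 0.3):
    share = aging_fraction(1.0, n=n)
    print(f"n = {n:.3f}: {share:.0%} of 10-year aging happens in year 1")
```

A smaller exponent front-loads aging even more strongly. This is why most of the AVS voltage increase occurs early in the lifetime.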
Results for DC Scenario
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
30% MTTF penalty
200mV voltage compensation
EM Impact on AVS Scheduling
[Figure: Vdd vs. year for AVS schedules S1 to S5; MTTF (year) for DMA 3 under S1 to S5]
12 years MTTF penalty
What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, and analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: from nominal Vdd down through static IR drop, power grid IR gradient, dynamic IR, HCI, and NBTI margins to the signoff Vdd (axis: operating voltage)]
Problem: margins = pessimism → overdesign, schedule delay
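The margin stack can be read as simple bookkeeping. Each effect reserves part of the voltage budget below the nominal supply, and timing is signed off at what remains. The minimal sketch below assumes that aging effects (HCI, NBTI) are expressed as equivalent voltage margins. All millivolt values are hypothetical.

```python
NOMINAL_VDD_MV = 900  # hypothetical nominal supply (mV)

# Hypothetical margins (mV); aging effects are expressed as equivalent voltage loss.
margins_mv = {
    "static IR drop": 20,
    "power grid IR gradient": 10,
    "dynamic IR": 30,
    "HCI": 10,
    "NBTI": 25,
}

total_margin_mv = sum(margins_mv.values())
signoff_vdd_mv = NOMINAL_VDD_MV - total_margin_mv
print(f"signoff Vdd = {signoff_vdd_mv} mV; "
      f"{total_margin_mv / NOMINAL_VDD_MV:.1%} of nominal is reserved as margin")
```

Adding the margins linearly assumes every effect hits its worst case at the same time. This is where the stack turns into the pessimism the slide points to.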
Statistical Timing Analysis (1)
• Delay sensitivity $\Delta d_{j,v}$ of path $p_j$ to variation source $z_v$
• Assumptions:
• Path delay is linear with respect to the variation sources
• Variation sources are normally distributed
• Obtain $\Delta d_{j,v}$ using 28 runs of RC extraction and static timing analysis (STA)
[Figure: routed netlist and 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → $\Delta d_{j,v}$]
$\Delta d_{j,v} = [d_j(Y_v) - d_j(Y_{typ})]/3$
Note: path delay includes gate and wire delays
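Given the STA results, the sensitivities follow from one finite difference per variation source. The sketch assumes that each itf file Y_v skews only z_v, by +3σ. That is why the difference is divided by 3 to get a per-σ sensitivity. The array layout and delay values are hypothetical.

```python
import numpy as np


def sensitivities(d_typ: np.ndarray, d_skewed: np.ndarray) -> np.ndarray:
    """Per-σ sensitivities Δd[j, v] = (d_j(Y_v) - d_j(Y_typ)) / 3.

    d_typ:    shape (P,)   path delays from the Ytyp run
    d_skewed: shape (P, V) path delays from the V single-source runs (z_v at +3σ)
    """
    return (d_skewed - d_typ[:, None]) / 3.0


def linear_delay(d_typ: np.ndarray, dd: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Linear model: d_j(z) = d_j(Y_typ) + sum_v Δd[j, v] * z_v, with z in σ units."""
    return d_typ + dd @ z


# Hypothetical data: 2 paths, 3 variation sources (a 9-layer stack would have V = 27).
d_typ = np.array([1.00, 2.00])
d_skewed = np.array([[1.03, 1.00, 0.97],
                     [2.06, 2.03, 2.00]])
dd = sensitivities(d_typ, d_skewed)
print(dd)
print(linear_delay(d_typ, dd, np.array([1.0, 0.0, 0.0])))
```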
Statistical Timing Analysis (2)
• Σ is the correlation matrix of the variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^T$ (λ is obtained by Cholesky decomposition)
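Because Σ = λλᵀ, correlated sources can be written as z = λu, where u is a vector of independent standard normals. Under the linear delay model, path j therefore has standard deviation σj = ‖λᵀΔdj‖, which equals √(ΔdjᵀΣΔdj). The sketch assumes the correlation rule of the statistical RC model: two sources are correlated only if they have the same type (ΔW, ΔT or ΔH) and their layers share a process module. The module grouping and ρ = 0.8 are hypothetical.

```python
import numpy as np


def correlation_matrix(modules, rho):
    """Σ for 3 sources (ΔW, ΔT, ΔH) per layer; source index = 3 * layer + type.

    z_u and z_v get correlation rho only if they have the same type and their
    layers belong to the same process module (modules[l] is a hypothetical label).
    """
    n = 3 * len(modules)
    sigma = np.eye(n)
    for u in range(n):
        for v in range(n):
            layer_u, type_u = divmod(u, 3)
            layer_v, type_v = divmod(v, 3)
            if u != v and type_u == type_v and modules[layer_u] == modules[layer_v]:
                sigma[u, v] = rho
    return sigma


def path_sigma(dd_j, sigma):
    """σ_j = ||λ^T Δd_j||, with lower-triangular λ from Σ = λ λ^T (Cholesky)."""
    lam = np.linalg.cholesky(sigma)
    return float(np.linalg.norm(lam.T @ np.asarray(dd_j, dtype=float)))


# Hypothetical 9-layer stack grouped into process modules; rho is also assumed.
modules = ["M1", "Mx", "Mx", "Mx", "Mx", "My", "My", "Mz", "Mz"]
sigma = correlation_matrix(modules, rho=0.8)
dd_j = np.full(27, 0.01)  # hypothetical per-σ sensitivities (ns)
print(sigma.shape, 3 * path_sigma(dd_j, sigma))
```

Working through the Cholesky factor rather than Σ directly matches the slide's formulation. The factor also yields correlated samples as z = λu if a Monte Carlo cross-check is wanted.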
Throughput model [Kahng10]: throughput as a function of the error rate ER, the clock period T, and the number of recovery cycles r
Selective-Endpoint Optimization
• Optimize the fan-in cone with tighter constraints: allows replacement of Razor FFs with normal FFs
• Trade off the cost of resilience vs. data path optimization
Delay Variation
[Figure: per-path Δdelay at RC-worst vs. Δdelay at C-worst, with α marked on both axes]
Δdelay at C-worst = $[d(Y_{cw}) - d(Y_{typ})]/d(Y_{typ})$
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst and at RC-worst]
Δdelay at C-worst: $[d(Y_{cw}) - d(Y_{typ})]/d(Y_{typ})$
Δdelay at RC-worst: $[d(Y_{rcw}) - d(Y_{typ})]/d(Y_{typ})$
• Some paths have α > 1.0, i.e., a CBC can underestimate their delay variation
• But these paths have larger delays at the other corner:
  - The C-worst corner underestimates their delay variation, but these paths are dominated by the RC-worst corner
  - α < 1.0: their delay variation is covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive either to R or to C
• Using only the RC-worst or only the C-worst corner underestimates delay variation
• Both RC-worst and C-worst corners are needed to cover process variation
• In the following discussion, α is defined at the dominant corner
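The corner comparison above can be written as a small helper. Here α is assumed to be the ratio of the statistical 3σ_j to the delay shift at the dominant corner; this matches the axes of the tightened-corner plot (Δd_j(Y_rcw)/d_j(Y_typ) vs. 3σ_j/d_j(Y_typ)) but is not spelled out on this slide. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathDelays:
    d_typ: float   # d(Y_typ), delay at the typical corner
    d_cw: float    # d(Y_cw), delay at the C-worst corner
    d_rcw: float   # d(Y_rcw), delay at the RC-worst corner
    sigma: float   # σ_j from statistical timing analysis


def rel_delta(d_corner: float, d_typ: float) -> float:
    """Δdelay at a corner: [d(Y_corner) - d(Y_typ)] / d(Y_typ)."""
    return (d_corner - d_typ) / d_typ


def dominant_corner(p: PathDelays) -> str:
    """The corner with the larger Δdelay (ties go to RC-worst)."""
    if rel_delta(p.d_cw, p.d_typ) > rel_delta(p.d_rcw, p.d_typ):
        return "C-worst"
    return "RC-worst"


def alpha(p: PathDelays) -> float:
    """ASSUMED definition: α = 3σ_j / [d(Y_dom) - d(Y_typ)] at the dominant corner.

    α <= 1: the dominant conventional corner covers the 3σ variation.
    α > 1: the conventional corner underestimates it.
    """
    d_dom = p.d_cw if dominant_corner(p) == "C-worst" else p.d_rcw
    return 3.0 * p.sigma / (d_dom - p.d_typ)
```

For example, a hypothetical path with d(Ytyp) = 1.0, d(Ycw) = 1.05, d(Yrcw) = 1.10 and σ = 0.02 is RC-worst dominated with α = 0.6, so the RC-worst corner covers it with room to spare.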
Non-Homogeneous Corner
• Each layer can have a differently skewed variation
[Figure: interconnect stack with M1 and M2; capacitance C of each layer, 3σ]
Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp
• Less pessimism with non-homogeneous corners
• Challenges:
  - Many feasible combinations
  - A corner can only cover certain paths
  - How to choose the best combination?
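To see how quickly the number of feasible combinations grows, the sketch below enumerates per-layer corner assignments. It assumes, hypothetically, that each layer independently takes one of the five conventional corners from the BEOL corner table (Ytyp, Ycb, Ycw, Yrcb, Yrcw); a non-homogeneous corner that also skews ΔW, ΔT, and ΔH separately would make the space larger still.

```python
from itertools import product
from typing import Dict, Iterator, Sequence

# Conventional corners from the BEOL corner table (hypothetical per-layer choice set).
CORNERS = ("typ", "cb", "cw", "rcb", "rcw")


def non_homogeneous_corners(layers: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Yield every assignment of one corner per layer."""
    for combo in product(CORNERS, repeat=len(layers)):
        yield dict(zip(layers, combo))


def count_corners(num_layers: int) -> int:
    """Number of assignments; only len(CORNERS) of them are homogeneous."""
    return len(CORNERS) ** num_layers


if __name__ == "__main__":
    two_layer = list(non_homogeneous_corners(["M1", "M2"]))
    print(len(two_layer))      # 25 assignments for M1 and M2
    print(count_corners(9))    # 9-layer stack
```

The slide's example corner (M1 = Cw, M2 = Ctyp) is the assignment {"M1": "cw", "M2": "typ"}.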
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: $\Delta d_j(Y_{rcw})/d_j(Y_{typ}) \times 100$ vs. $3\sigma_j/d_j(Y_{typ}) \times 100$]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
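One plausible reading of "scale the BEOL variation in the itf with α" is to shrink each parameter's deviation from typical by the factor α. The sketch assumes that linear scaling and uses hypothetical per-layer W/T/H values; α = 1 reproduces the conventional corner and α = 0 the typical one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerParams:
    """HYPOTHETICAL per-layer BEOL parameters (nm): width W, thickness T, dielectric height H."""
    W: float
    T: float
    H: float


def tightened_corner(typ: LayerParams, corner: LayerParams, alpha: float) -> LayerParams:
    """Scale each parameter's deviation from typical by alpha (ASSUMED linear scaling)."""
    def scale(p_typ: float, p_cbc: float) -> float:
        return p_typ + alpha * (p_cbc - p_typ)
    return LayerParams(scale(typ.W, corner.W),
                       scale(typ.T, corner.T),
                       scale(typ.H, corner.H))


if __name__ == "__main__":
    typ = LayerParams(W=40.0, T=80.0, H=70.0)
    rcw = LayerParams(W=36.0, T=72.0, H=63.0)   # RC-worst: W, T, H all at min
    print(tightened_corner(typ, rcw, 0.5))
```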
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• This structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8×)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Critical paths extracted from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate the Δdelay of critical paths
• Put path j in group Gtbc if its Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. → done]
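The classification step of the proposed flow might look like the following. It assumes Δdelay is measured at the dominant corner (the larger of the C-worst and RC-worst delay shifts, normalized to d(Ytyp)); the threshold value and path names are hypothetical. In the flow, the paths in Gtbc would then go through STA and ECO with TBC, and the rest with CBC.

```python
from typing import Dict, List, Tuple


def classify_paths(delays: Dict[str, Tuple[float, float, float]],
                   threshold: float) -> Tuple[List[str], List[str]]:
    """Split critical paths into (G_tbc, G_cbc).

    delays maps a path name to (d(Y_typ), d(Y_cw), d(Y_rcw)).
    Δdelay is ASSUMED to be taken at the dominant corner, i.e. the larger of
    the C-worst and RC-worst delay shifts, normalized to d(Y_typ).
    """
    g_tbc: List[str] = []
    g_cbc: List[str] = []
    for name, (d_typ, d_cw, d_rcw) in delays.items():
        delta = max(d_cw - d_typ, d_rcw - d_typ) / d_typ
        # Paths above the threshold are signed off with tightened corners (TBC);
        # all others keep the conventional corners (CBC).
        (g_tbc if delta > threshold else g_cbc).append(name)
    return g_tbc, g_cbc


if __name__ == "__main__":
    paths = {"p1": (1.00, 1.02, 1.12), "p2": (1.00, 1.03, 1.04)}  # hypothetical delays (ns)
    print(classify_paths(paths, threshold=0.08))
```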
Experiment Setup
Design: LEON3MP | NETCARD | SUPERBLUE12
Clock period (ns): 1.8 | 2.0 | 3.1
• Model BTI degradation with Vfinal throughout the lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  - but slightly pessimistic
[Figure: Vdd vs. time, NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented yet)
• Vfinal = Vdd at end of life (raised to compensate BTI aging)
• Vfinal depends on:
  - the gates along the critical path
  - the timing slack at t = 0
• Circuit activity is not an issue:
  - the BTI effect is not sensitive to circuit activity
  - a DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate type
• Heuristic 2: use the average Vfinal over different gate types
  - Vfinal is a function of timing slack
  - Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit: c5315 | c7552 | AES | MPEG2
Frequency (GHz): 1.38 | 1.25 | 0.89 | 1.05
Supply voltages: Vmax = 1.05V; Vinit = 0.90V; Vheur1 (DC) = 0.97V; Vheur1 (AC) = 0.95V; Vheur2 (DC) = 0.95V; Vheur2 (AC) = 0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  - Estimate aging at each time step
  - Update circuit timing and Vdd
  - Repeat until t = tfinal
  - Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
[Figure legend: Vstep = AVS voltage step; Vfinal = converged voltage]
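The time-stepping loop of the reference flow can be sketched with toy models. Every constant is a hypothetical placeholder: a single alpha-power-law delay stands in for the critical path, BTI follows a power law in time and voltage (advanced with the equivalent-time method whenever Vdd changes), and the circuit is assumed to start with zero slack at t = 0. In the real flow each step is a round of STA and library updates, which is why the slide calls this approach very slow.

```python
from typing import Optional

# ---- Toy models; every constant here is a HYPOTHETICAL placeholder ----
VTH0 = 0.30     # fresh threshold voltage (V)
ALPHA_P = 1.3   # alpha-power-law exponent
A_BTI = 0.025   # BTI prefactor (V)
GAMMA = 3.0     # voltage-acceleration exponent of BTI
N_T = 0.2       # time exponent of BTI (t in years)


def path_delay(vdd: float, dvth: float) -> float:
    """Alpha-power-law delay (arbitrary units) with an aged threshold voltage."""
    return vdd / (vdd - VTH0 - dvth) ** ALPHA_P


def advance_aging(dvth: float, vdd: float, dt: float) -> float:
    """Advance |ΔVth| by dt years at supply vdd using the equivalent-time method."""
    k = A_BTI * vdd ** GAMMA
    t_eq = (dvth / k) ** (1.0 / N_T)   # stress time at vdd that yields the current |ΔVth|
    return k * (t_eq + dt) ** N_T


def reference_signoff(v_init: float = 0.90, v_max: float = 1.05, v_step: float = 0.01,
                      t_final: float = 10.0, dt: float = 0.5) -> Optional[float]:
    """Return Vfinal, or None when Vfinal would exceed v_max (circuit must be modified)."""
    t_clk = path_delay(v_init, 0.0)    # zero timing slack at t = 0 (assumption)
    vdd, dvth, t = v_init, 0.0, 0.0
    while t < t_final:
        dvth = advance_aging(dvth, vdd, dt)   # estimate aging over this time step
        t += dt
        while path_delay(vdd, dvth) > t_clk:  # AVS: raise Vdd by Vstep until timing is met
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None
    return vdd


if __name__ == "__main__":
    print(reference_signoff())             # converged Vfinal under the toy models
    print(reference_signoff(v_max=0.92))   # None: Vfinal exceeds the allowed maximum
```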
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization with seven setups:
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case: 1 | 2 | 3 | 4 | 5 | 6 | 7
Vlib (V): Vinit | Vinit | Vmax | Vinit | N/A | Vheur1 | Vheur2
VBTI (V): Vinit | Vmax | Vmax | Vfinal | N/A | Vheur1 | Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff:
  - Derated library characterization is related to BTI + AVS
  - AVS is affected by the circuit implementation (timing constraints, critical paths, etc.)
  - The circuit is affected by library characterization
[Figure: loop among Circuit, Vfinal, Vlib/VBTI, and Derated Libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
• The gap between Vdd and Vfinal shrinks rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in the 1st year (the remaining 30% over 9 years)]
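A common way to see why BTI is front-loaded is the power-law time dependence |ΔVth| ∝ t^n under constant stress. Under that assumption (a generic model, not one taken from the talk), the fraction of the end-of-life shift reached after t years is (t/t_life)^n. An exponent of about 0.30 reproduces the "50% within the first year of 10" statement, and n = 1/6, the exponent of the reaction-diffusion NBTI model, gives roughly 68%.

```python
import math

# ASSUMED constant-stress power law: |ΔVth|(t) ∝ t**n.


def aging_fraction(t_years: float, lifetime_years: float, n: float) -> float:
    """Fraction of the end-of-life BTI shift already reached at t_years."""
    return (t_years / lifetime_years) ** n


# Exponent for which half of the 10-year shift occurs in the first year: (1/10)**n = 1/2.
N_HALF = math.log(2.0) / math.log(10.0)

if __name__ == "__main__":
    print(round(N_HALF, 3))                                # about 0.301
    print(round(aging_fraction(1.0, 10.0, 1.0 / 6.0), 3))  # n = 1/6
```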
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
30% MTTF penalty; 200mV voltage compensation
EM Impact on AVS Scheduling
[Figure: VDD (0.90 to 1.04 V) vs. year (0 to 12) for AVS schedules S1 to S5; MTTF (years, 78 to 81) of DMA under S1 to S5]
1.2 years MTTF penalty
What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, and analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI; signoff Vdd vs. operating voltage]
Problem: margins = pessimism → overdesign, schedule delay
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average energy reduction of 18% compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, for MUL and EXU; minimum achievable energy marked]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• If one mode is outside the design cone of the other, the result is a failed design or overdesign
  - If mode A has positive timing slacks with respect to mode B, mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  - The modes are in each other’s design cones
[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B, C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
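The dominance definitions can be stated as predicates on cross-mode slacks. In this sketch slack[(X, Y)] is assumed to be the worst timing slack at mode Y of a circuit signed off at mode X, and a tolerance eps treats near-zero slack as non-dominating. Mode names and slack values are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Slacks = Dict[Tuple[str, str], float]


def dominates(slack_xy: float, eps: float = 0.0) -> bool:
    """Mode X dominates mode Y if a circuit signed off at X has positive slack at Y."""
    return slack_xy > eps


def dominated_pairs(slack: Slacks, eps: float = 0.0) -> List[Tuple[str, str]]:
    """All ordered pairs (X, Y) of signoff modes in which X dominates Y."""
    modes = sorted({m for pair in slack for m in pair})
    return [(x, y) for x, y in permutations(modes, 2)
            if (x, y) in slack and dominates(slack[(x, y)], eps)]


def equivalent_dominance(slack: Slacks, eps: float = 0.0) -> bool:
    """True when no signoff mode is dominated by another (no overdesign from the mode set)."""
    return not dominated_pairs(slack, eps)


if __name__ == "__main__":
    s = {("A", "B"): 0.12, ("B", "A"): -0.08}   # hypothetical slacks (ns)
    print(dominated_pairs(s))                   # A dominates B
```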
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough: scalable
[Flowchart: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search); global optimization flow]
[Figure: power estimation of adaptive search: power (mW) vs. signoff voltage (V), curves "1st", "2nd", and "real"; design AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from actually implemented circuits
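The sample-and-refine idea can be illustrated with a one-dimensional surrogate: fit a quadratic power model to the lowest-power samples, implement the design at the model's minimum, and repeat. This is a generic successive-quadratic search written for illustration; the talk does not specify its power-model form, and `measure` is a hypothetical stand-in for an SP&R run that returns power at a given signoff voltage.

```python
from typing import Callable, Dict, List, Tuple


def fit_quadratic(pts: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Coefficients (a, b, c) of the parabola a*v**2 + b*v + c through three points."""
    (x0, y0), (x1, y1), (x2, y2) = pts
    f01 = (y1 - y0) / (x1 - x0)          # first divided differences
    f12 = (y2 - y1) / (x2 - x1)
    a = (f12 - f01) / (x2 - x0)          # second divided difference
    b = f01 - a * (x0 + x1)
    c = y0 - a * x0 * x0 - b * x0
    return a, b, c


def adaptive_search(measure: Callable[[float], float], v_lo: float, v_hi: float,
                    n_iter: int = 3, tol: float = 1e-6) -> Tuple[float, float]:
    """Return (signoff voltage, power) of the best sample found.

    measure is a HYPOTHETICAL stand-in for implementing the design (SP&R)
    at a signoff voltage and reporting its power.
    """
    samples: Dict[float, float] = {v: measure(v) for v in (v_lo, (v_lo + v_hi) / 2, v_hi)}
    for _ in range(n_iter):
        # Refine the power model using the three lowest-power samples so far.
        pts = sorted(samples.items(), key=lambda kv: kv[1])[:3]
        a, b, _ = fit_quadratic(pts)
        if a <= 0:
            break                                    # model is not convex; stop refining
        v_star = min(max(-b / (2 * a), v_lo), v_hi)  # model minimum, clamped to range
        if any(abs(v_star - v) < tol for v in samples):
            break                                    # minimum already sampled
        samples[v_star] = measure(v_star)            # one more implementation run
    v_best = min(samples, key=samples.get)
    return v_best, samples[v_best]
```

With a hypothetical power curve such as P(v) = 15 + 20(v - 1.02)² mW, the search lands on v ≈ 1.02 V after a single model-guided sample.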
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
[Diagram: closed-loop AVS → design-dependent replica, in-situ monitor, generic monitor]
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  - series resistance
  - cell types (INV, NAND, NOR)
• Proposed circuit:
  - different cell types cover different process corners
  - tune the series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows:
  - Pure-margin (PM): conventional methodology with margin insertion only
  - Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at large, medium, and small margins; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs, i.e., reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, for MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on a sensitivity function, replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution; iterate]
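The iteration between SEOpt and SkewOpt can be summarized as a loop that keeps the lowest-energy solution found. The callables are hypothetical stand-ins for the actual optimizations, and stopping at the first non-improving iteration is an assumption, since the flowchart only shows the "energy < min energy → save current solution" check.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")  # design state (hypothetical)


def combined_optimization(initial: S,
                          seopt: Callable[[S], S],
                          skewopt: Callable[[S], S],
                          energy: Callable[[S], float],
                          max_iter: int = 10) -> Tuple[S, float]:
    """Alternate SEOpt and SkewOpt, keeping the lowest-energy solution.

    initial -- placement with all FFs error-tolerant
    seopt   -- HYPOTHETICAL: margin insertion on K paths + FF replacement
    skewopt -- HYPOTHETICAL: activity-aware clock skew optimization
    energy  -- HYPOTHETICAL: energy of a design state
    """
    best, best_e = initial, energy(initial)
    current = initial
    for _ in range(max_iter):
        current = skewopt(seopt(current))
        e = energy(current)
        if e < best_e:               # energy < min energy: save current solution
            best, best_e = current, e
        else:
            break                    # ASSUMED stopping rule: no further improvement
    return best, best_e
```

As a toy usage, a state can be the number of remaining error-tolerant FFs, with SEOpt removing one per iteration and energy following a convex curve; the loop then stops at the curve's minimum.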
Toward Holistic Modeling Margining and Tolerance of IC Variabi
47ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
Toward Holistic Modeling Margining and Tolerance of IC Variabi
IC Variability
Challenge Value of Technology
Solutions Modeling Margining Tolerance
Outline
BEOL Corner Optimization
Proposed Timing Signoff Flow
Conventional BEOL Corners
Statistical RC Model
Pessimism of Conventional BEOL Corners (CBC)
Intuition on Delay Variability Across Cw RCw
Intuition on Delay Variability Across Cw RCw (2)
Scaling Factor α and Delay Variation
Find Paths for Which TBCs Can Be Used
Determining α Arcw and Acw
Benefits of Tightened BEOL Corners
Outline (2)
How to Minimize Cost of Resilience
Tradeoff Resilience Cost vs Datapath Cost
Selective-Endpoint Optimization (SEOpt)
Clock Skew Optimization (SkewOpt)
Overall Optimization Flow
Benefit of Low-Cost Resilience
Increased Benefit of Resilience with AVS
Outline (3)
Breaking Chicken-Egg Loops Less Margin
Derated Library Characterization and AVS
Library Characterization for AVS
Power vs Area Across Different Signoffs
Heuristics 1
Vfinal Estimation
Observation and Heuristic 2
Proposed Library Characterization Flow
Power vs Area for All Designs
Also Multi-Mode Signoff Choices Matter
Also Tunable Monitors Less Margin
Also Tunable Monitors Less Margin (2)
Outline (4)
Conclusions
Thank You
Backup
Power Penalty to Fix EM with AVS
Homogeneous Corners
Homogeneous Corners (2)
Correlation Matrix
Wiring Structure in Timing-Critical Paths (2)
Delay Variation
Delay Variation (2)
Non-Homogeneous Corner
Opportunities for Tightened BEOL Corners
Wiring Structure in Timing-Critical Paths (2)
Proposed Timing Signoff Flow (2)
Experiment Setup
Further Analysis
Scaling Factor Results
Benefits of Tightened BEOL Corners (1)
Heuristics 1 (2)
Vfinal Estimation (2)
Observation and Heuristic 2 (2)
Technology and Benchmark Circuits
A Reference Signoff Flow
Experiment Setup (2)
ldquoChicken and Eggrdquo Loop
Bias Temperature Instability (BTI)
Observation 1
Results for DC Scenario
Problem Signoff Corner Definition
AVS Signoff Corner Selection
AVS Impact on EM Lifetime
EM Impact on AVS Scheduling
What is ldquoSignoffrdquo
Statistical Timing Analysis (1)
Statistical Timing Analysis (2)
Resilient Designs
Resilience Cost Reduction Problem
Selective-Endpoint Optimization
Process-Aware Vdd Scaling (PVS)
Challenge Variability
Energy Reduction in AVS Context
Our Concept Mode Dominance
Our Method Global Optimization
Classes of Closed-Loop AVS
Design of RO with Tunable Vmin
Benefit of Resilience Cost Reduction
Increased Benefit of Resilience With AVS
Overall Optimization Flow (2)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate the Δdelay of critical paths
• Put path j in the group Gtbc if its Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: Routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC. GTBC: timing analysis using TBC; if violations remain, ECO using TBC. GCBC: timing analysis using CBC; if violations remain, ECO using CBC. Done when violation = 0 in both groups.]
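The classification step of the flow can be sketched as follows. The threshold value is an assumption for illustration. Measuring Δdelay at whichever of C-worst or RC-worst is larger is also an assumption.

```python
def classify(paths, threshold):
    """Split paths into (g_tbc, g_cbc) by relative delay change at the worse BEOL corner.

    paths: mapping path name -> (d_typ, d_cw, d_rcw) STA delays in ns.
    threshold: hypothetical cutoff on [d(Y) - d(Ytyp)] / d(Ytyp).
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta = max(d_cw, d_rcw) / d_typ - 1.0
        (g_tbc if delta > threshold else g_cbc).append(name)
    return g_tbc, g_cbc


paths = {"p1": (1.00, 1.03, 1.08), "p2": (1.00, 1.01, 1.02), "p3": (2.00, 2.15, 2.10)}
g_tbc, g_cbc = classify(paths, threshold=0.05)
print(g_tbc, g_cbc)  # ['p1', 'p3'] ['p2']
```

Paths in `g_tbc` would then go through timing analysis and ECO at the tightened corners. The remaining paths would stay at the conventional corners.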
Experiment Setup
Testcase            LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)   1.8       2.0       3.1
• Model BTI degradation with Vfinal throughout the lifetime
  • Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time for NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented yet)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along the critical path
  • Timing slack at t = 0
  • Circuit activity is not an issue, because the BTI effect is not sensitive to circuit activity; a DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate type
• Heuristic 2: use the average Vfinal over different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages
Vmax           1.05 V
Vinit          0.90 V
Vheur1 (DC)    0.97 V
Vheur1 (AC)    0.95 V
Vheur2 (DC)    0.95 V
Vheur2 (AC)    0.93 V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
[Figure legend: Vstep = AVS voltage step; Vfinal = converged voltage]
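The reference flow is a time-stepped loop, sketched below with a deliberately toy model. The BTI increment is a power law in time, accelerated exponentially by Vdd. The BTI increment, the delay model and every parameter value are hypothetical and were chosen only so that the loop runs. Only Vinit = 0.90 V and Vmax = 1.05 V come from the supply-voltage table. Returning `None` stands for "modify the circuit and start over".

```python
import math


def bti_increment(t0, t1, vdd, k=0.05, n=0.3, beta=2.0, v0=0.90):
    """Hypothetical BTI slowdown accrued between t0 and t1 (years) at supply vdd.

    Power law in time (front-loaded for n < 1), exponentially accelerated by vdd.
    """
    return k * math.exp(beta * (vdd - v0)) * (t1 ** n - t0 ** n)


def path_delay(vdd, degradation, d0=1.0, v0=0.90, gamma=1.5):
    """Hypothetical critical-path delay (ns): slower with aging, faster at higher vdd."""
    return d0 * (1.0 + degradation) * (v0 / vdd) ** gamma


def reference_signoff(t_clk=1.0, v_init=0.90, v_max=1.05, v_step=0.01,
                      t_final=10.0, dt=0.5):
    """Step through the lifetime; raise vdd by v_step whenever the aged path misses t_clk.

    Returns (v_final, schedule), or None if vdd would exceed v_max.
    """
    vdd, degradation, t = v_init, 0.0, 0.0
    schedule = [(t, vdd)]
    while t < t_final:
        t_next = min(t + dt, t_final)
        degradation += bti_increment(t, t_next, vdd)  # estimate aging at current vdd
        t = t_next
        while path_delay(vdd, degradation) > t_clk:   # update timing and vdd (AVS)
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None                           # Vfinal > Vmax: redesign
        schedule.append((t, vdd))
    return vdd, schedule


result = reference_signoff()
if result is None:
    print("Vfinal > Vmax: modify the circuit and start over")
else:
    v_final, schedule = result
    print(f"Vfinal = {v_final:.2f} V after {len(schedule) - 1} steps")
```

Each outer iteration stands in for a full STA run, and in a real flow often also for a library recharacterization. This is why the reference flow is accurate but slow.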
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among Circuit, Vfinal, Vlib/VBTI and Derated Libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1
• BTI is a “front-loaded” phenomenon [Chan11]
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
  • The gap between Vdd and Vfinal shrinks rapidly
  • ≈70% of the Vdd increment occurs in the 1st year (the remaining 30% over 9 years)
[Figure: Vdd vs. time, with Vfinal marked]
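Observation 1 is consistent with the power-law time dependence commonly used to model BTI, |ΔVth| ∝ t^n with n < 1. The sketch assumes that model, and the exponent values are assumptions, not values from the talk. It computes what fraction of 10-year aging is reached in year 1. With n = 0.3 the result is close to the 50% quoted above.

```python
def fraction_of_aging_by(t: float, lifetime: float, n: float) -> float:
    """Fraction of the end-of-life BTI shift reached at time t, assuming |dVth| ∝ t**n."""
    return (t / lifetime) ** n


for n in (1 / 6, 0.25, 0.3):
    share = fraction_of_aging_by(1.0, 10.0, n)
    print(f"n = {n:.3f}: {share:.0%} of 10-year aging occurs in year 1")
# prints 68%, 56% and 50% for the three exponents
```

Any exponent below 1 front-loads the degradation. This is why most of the AVS voltage increment is spent early in the lifetime.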
Results for DC Scenario
Optimistic signoff corner:
• AVS increases the supply voltage aggressively to compensate aging
  • Consumes more power
  • May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
30% MTTF penalty
200 mV voltage compensation
[Figure: VDD (0.90–1.04 V) vs. year (0–12) for S1–S5; MTTF (years, 7.8–8.1) of DMA 3 for S1–S5]
EM Impact on AVS Scheduling
12 years MTTF penalty
What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins and analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff, with components nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI and NBTI; labels: voltage, signoff Vdd, operating voltage]
• Problem: margins = pessimism, leading to overdesign and schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity Δdjv of path pj to variation source zv
• Assumptions:
  • Δdjv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → Δdjv]
$$\Delta d_{j,v} = \frac{d_j(Y_v) - d_j(Y_{\mathrm{typ}})}{3}$$
Note: path delay includes gate and wire delays
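The sensitivity extraction reduces to one subtraction per run. The sketch assumes that each of the 27 non-typical itf files skews exactly one source zv by 3σ, which is what the division by 3 implies. It uses a tiny hypothetical case with 2 paths and 3 sources.

```python
import numpy as np


def sensitivities(d_typ: np.ndarray, d_skewed: np.ndarray) -> np.ndarray:
    """Per-sigma delay sensitivities: delta_d[j, v] = (d_j(Y_v) - d_j(Y_typ)) / 3.

    d_typ:    shape (P,)   path delays from the Ytyp run
    d_skewed: shape (P, V) path delays from run v, where only source z_v is skewed by 3 sigma
    """
    return (d_skewed - d_typ[:, None]) / 3.0


# Hypothetical STA results for 2 paths and 3 variation sources (the talk uses 27).
d_typ = np.array([1.00, 2.00])
d_skewed = np.array([[1.03, 1.00, 1.06],
                     [2.03, 2.09, 2.00]])
print(sensitivities(d_typ, d_skewed))
```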
Statistical Timing Analysis (2)
• Σ is the correlation matrix of the variation sources (e.g., 27 × 27)
• Σ = λλ^T (λ is obtained by Cholesky decomposition)
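Under the linear, Gaussian assumptions of the statistical model, the correlated sources can be written z = λu with independent standard normals u. The sensitivities and Σ = λλ^T then give each path's delay standard deviation σj = ‖λ^T Δdj‖. Below is a minimal NumPy sketch with hypothetical numbers. If α is taken as 3σj divided by the path's Δdelay at the dominant corner (an assumption), this σj supplies the numerator.

```python
import numpy as np


def path_sigma(delta_d: np.ndarray, corr: np.ndarray) -> np.ndarray:
    """Standard deviation of each path delay under the linear model.

    d_j = d_j(Ytyp) + sum_v delta_d[j, v] * z_v, with z ~ N(0, corr).
    Writing corr = lam @ lam.T (Cholesky), z = lam @ u for independent
    standard normals u, so sigma_j = ||lam.T @ delta_d[j]||.
    """
    lam = np.linalg.cholesky(corr)  # lower-triangular, corr = lam @ lam.T
    return np.linalg.norm(delta_d @ lam, axis=1)


# Hypothetical: one path, two variation sources of the same type with correlation 0.5.
delta_d = np.array([[0.01, 0.02]])
corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
sigma = path_sigma(delta_d, corr)
print(3 * sigma)  # 3-sigma delay variation of the path (ns)
```

Going through λ rather than forming Δdj^T Σ Δdj directly is equivalent here. The factorized form is the one that also supports Monte Carlo sampling through z = λu.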
Throughput depends on the error rate ER [Kahng10], the number of recovery cycles r and the clock period T.
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of Razor FFs with normal FFs
• Trade off the cost of resilience against data path optimization
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
[Equation: throughput model [Kahng10] in terms of error rate (ER), number of recovery cycles (r), and clock period (T)]
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of Razor FFs with normal FFs
• Trade off the cost of resilience vs. datapath optimization
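The tradeoff above can be sketched as a per-endpoint decision. Keep the error-tolerant (Razor) FF when its overhead is lower than the estimated cost of tightening the endpoint's fanin cone. Otherwise, optimize the cone and use a normal FF. Both cost figures are hypothetical inputs in this sketch. In a real flow they would come from the design tools, and the talk's method is iterative rather than this one-shot rule.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str
    razor_cost: float      # hypothetical overhead of keeping a Razor FF
    fanin_opt_cost: float  # hypothetical cost of tightening the fanin cone instead

def select_endpoints(endpoints):
    """Split endpoints into (optimize fanin + normal FF, keep Razor FF)."""
    optimize, keep_razor = [], []
    for ep in endpoints:
        target = optimize if ep.fanin_opt_cost < ep.razor_cost else keep_razor
        target.append(ep.name)
    return optimize, keep_razor

eps = [Endpoint("a", 1.0, 0.4), Endpoint("b", 1.0, 2.5), Endpoint("c", 0.8, 0.7)]
print(select_endpoints(eps))
```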
Heuristics 1
• Model BTI degradation with Vfinal throughout the lifetime
  • Aging under a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
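The claim that a flat Vfinal is slightly pessimistic can be checked with a toy aging model. The assumptions are not from the talk:

- |ΔVth| = A·exp(γV)·tⁿ under constant stress, with made-up A, γ, and n.
- A piecewise-constant AVS schedule, aged step by step with the equivalent-time method.

Because Vdd never exceeds Vfinal, the adaptive schedule accumulates less aging than holding Vfinal for the whole lifetime.

```python
import math

A, GAMMA, N = 0.002, 3.0, 0.2  # hypothetical aging-model parameters

def dvth_flat(v, t_years):
    """|dVth| (V) after t_years of constant stress at supply v."""
    return A * math.exp(GAMMA * v) * t_years ** N

def dvth_schedule(schedule):
    """|dVth| after a piecewise-constant schedule [(duration_years, v), ...].

    Equivalent-time method: at each step, find the stress time at the new
    voltage that would produce the aging accumulated so far, then continue.
    """
    dvth = 0.0
    for dt, v in schedule:
        k = A * math.exp(GAMMA * v)
        t_eq = (dvth / k) ** (1.0 / N)
        dvth = k * (t_eq + dt) ** N
    return dvth

avs = [(1.0, 0.90), (3.0, 0.94), (6.0, 0.97)]  # hypothetical AVS steps, Vfinal = 0.97 V
adaptive, flat = dvth_schedule(avs), dvth_flat(0.97, 10.0)
print(f"adaptive Vdd: {adaptive * 1e3:.1f} mV, flat Vfinal: {flat * 1e3:.1f} mV")
```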
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented yet)
• Vfinal = Vdd at end of life (to compensate for BTI aging); it depends on:
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate type
• Heuristic 2: use the average Vfinal over different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
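Heuristic 2 collapses per-gate-type estimates into a single design-level Vfinal. The following is a minimal sketch. The per-gate Vfinal values are hypothetical and are assumed to have been estimated at zero initial slack. The spread across gate types is also reported, since Observation 2 relies on it being small.

```python
from statistics import mean

# Hypothetical Vfinal (V) per gate type, each estimated at zero initial slack
VFINAL_BY_GATE = {"INV": 0.952, "NAND2": 0.948, "NOR2": 0.957}

def heuristic2_vfinal(vfinal_by_gate):
    """Heuristic 2: a single Vfinal for the design, averaged over gate types."""
    return mean(vfinal_by_gate.values())

def spread_mv(vfinal_by_gate):
    """Max-min spread of Vfinal across gate types, in mV."""
    vals = list(vfinal_by_gate.values())
    return (max(vals) - min(vals)) * 1e3

print(f"Vheur2 ~ {heuristic2_vfinal(VFINAL_BY_GATE):.3f} V, "
      f"spread = {spread_mv(VFINAL_BY_GATE):.0f} mV")
```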
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05 V
Vinit         0.90 V
Vheur1 (DC)   0.97 V
Vheur1 (AC)   0.95 V
Vheur2 (DC)   0.95 V
Vheur2 (AC)   0.93 V
A Reference Signoff Flow
• Basic idea: keep VBTI, VLIB, and Vdd consistent throughout the circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
(Vstep: AVS voltage step; Vfinal: converged voltage)
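The reference flow can be sketched as a time-stepped loop. Every model here is an assumption made for illustration:

- an alpha-power-law path delay;
- the power-law BTI model with equivalent-time aging;
- a clock period chosen so that slack is zero at t = 0;
- Vinit = 0.90 V and Vmax = 1.05 V, taken from the benchmark setup.

Each call to `path_delay` stands in for a full STA run with a freshly characterized library, which is what makes the real flow slow.

```python
import math

# Hypothetical models (not from the talk)
VTH0, ALPHA = 0.40, 1.3        # alpha-power-law delay parameters
A, GAMMA, N = 0.002, 3.0, 0.2  # |dVth| = A * exp(GAMMA * V) * t**N

def path_delay(v, dvth):
    """Relative critical-path delay at supply v with threshold shift dvth."""
    return v / (v - VTH0 - dvth) ** ALPHA

def age_step(dvth, v, dt):
    """Advance |dVth| by dt years at supply v (equivalent-time method)."""
    k = A * math.exp(GAMMA * v)
    t_eq = (dvth / k) ** (1.0 / N)
    return k * (t_eq + dt) ** N

def reference_signoff(v_init, v_max, v_step, t_final, dt):
    """Return Vfinal, or None if the circuit must be modified (Vfinal > v_max)."""
    clock = path_delay(v_init, 0.0)  # zero timing slack at t = 0, Vdd = v_init
    steps, dvth, t = 0, 0.0, 0.0
    while t < t_final - 1e-12:
        v = v_init + steps * v_step
        dvth = age_step(dvth, v, dt)  # estimate aging over this time step
        t += dt
        # Update timing and Vdd: AVS raises the supply one step at a time
        while path_delay(v_init + steps * v_step, dvth) > clock:
            steps += 1
            if v_init + steps * v_step > v_max + 1e-12:
                return None  # exceeds the allowed voltage: modify circuit, start over
    return round(v_init + steps * v_step, 6)

print(reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01, t_final=10.0, dt=0.5))
```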
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff:
  • Derated library characterization depends on BTI + AVS
  • AVS is affected by the circuit implementation (timing constraints, critical paths, etc.)
  • The circuit is affected by library characterization
[Figure: loop Circuit → Vfinal → Vlib, VBTI → Derated libraries → Circuit]
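One straightforward but expensive way to close this loop is fixed-point iteration. Characterize at a guessed voltage, implement the circuit, and measure Vfinal. Then re-characterize at that Vfinal, and repeat until the value stops changing. The sketch below uses a made-up linear `one_pass` function as a stand-in for a full characterize, implement, and AVS pass. Each real pass would involve library characterization and place-and-route. Heuristics 1 and 2 in this section aim to estimate Vfinal up front instead of iterating like this.

```python
def fixed_point(update, x0, tol=1e-6, max_iter=50):
    """Iterate x <- update(x) until successive values differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = update(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("chicken-and-egg loop did not converge")

def one_pass(v_char):
    """Hypothetical stand-in: characterize derated libraries at Vlib = VBTI = v_char,
    implement the circuit with them, run AVS, and return the resulting Vfinal."""
    return 0.93 + 0.2 * (v_char - 0.90)

vfinal = fixed_point(one_pass, x0=0.90)
print(f"converged Vfinal = {vfinal:.4f} V")
```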
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| and |ΔVth| vs. time over alternating ON/OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, with the minimum achievable energy marked; panels MUL (0.84–1.00 V) and EXU (0.84–1.02 V)]
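The voltage/energy tradeoff can be sketched with a toy model. None of the following assumptions come from the talk:

- Energy per operation scales as V².
- Each timing error costs r recovery cycles.
- The error rate grows exponentially as V drops.

Sweeping the supply and taking the minimum corresponds to the "minimum achievable energy" point marked on the plots.

```python
import math

R_RECOVERY = 10  # hypothetical recovery cycles per timing error

def error_rate(v):
    """Hypothetical per-cycle timing-error rate, growing as the supply is lowered."""
    return min(1.0, 1e-3 * math.exp((0.92 - v) / 0.02))

def energy(v, r=R_RECOVERY):
    """Relative energy: dynamic energy ~ V^2, inflated by error-recovery cycles."""
    return v ** 2 * (1 + r * error_rate(v))

def min_energy_voltage(voltages):
    """Supply voltage in the sweep with the lowest modeled energy."""
    return min(voltages, key=energy)

sweep = [round(0.84 + 0.01 * i, 2) for i in range(17)]  # 0.84 ... 1.00 V
v_best = min_energy_voltage(sweep)
print(v_best, round(energy(v_best), 4), round(energy(1.00), 4))
```

Lowering V saves energy quadratically until recovery cycles start to dominate. A more error-tolerant design moves that crossover to a lower voltage.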
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
  • The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other → failed design or overdesign
  • Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • The modes are in each other’s design cones
[Figure: design cone of mode A in the voltage–frequency plane, with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
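Mode dominance can be illustrated by checking the slack of a design signed off at mode A when it is evaluated at mode B. The model is an assumption: the design cone collapses to a single alpha-power-law speed-vs-voltage curve with hypothetical Vth and α. In reality, the cone comes from implementation choices such as threshold-voltage assignment. Under this simplification, any two modes are either equivalent or one dominates the other.

```python
VTH, ALPHA = 0.35, 1.3  # hypothetical alpha-power-law parameters

def speed(v):
    """Relative max frequency of a fixed implementation at supply v."""
    return (v - VTH) ** ALPHA / v

def fmax_from_signoff(v, mode):
    """Max frequency at supply v for a circuit signed off exactly at mode = (V, f)."""
    v_ref, f_ref = mode
    return f_ref * speed(v) / speed(v_ref)

def relation(mode_a, mode_b, rel_tol=1e-6):
    """Classify two (voltage, frequency) modes under this single-curve cone model."""
    v_b, f_b = mode_b
    slack = fmax_from_signoff(v_b, mode_a) - f_b  # frequency slack of A's design at B
    if abs(slack) <= rel_tol * f_b:
        return "equivalent"
    return "A dominates B" if slack > 0 else "B dominates A"

A_MODE = (1.0, 1000.0)  # hypothetical signoff mode: 1.0 V, 1000 MHz
print(relation(A_MODE, (0.8, 600.0)), relation(A_MODE, (0.8, 900.0)))
```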
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough → scalable
[Flow: global optimization: sample (SP&R) → construct power models → estimate optimal signoff modes; adaptive search: sample (SP&R) → refine power models]
[Figure: power estimation of adaptive search: power (mW) vs. signoff voltage (0.9–1.2 V), showing 1st, 2nd, and real power; design AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iterations
• real: power from actually implemented circuits
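The adaptive search loop can be sketched as follows. The sketch assumes a quadratic model of power vs. signoff voltage; the talk does not specify the model form. The made-up `implement_and_measure` function stands in for one synthesis/place-and-route (SP&R) run. Each iteration fits the model to all samples, implements once at the model's predicted optimum, and refits.

```python
import numpy as np

def implement_and_measure(v):
    """Hypothetical stand-in for one SP&R run at signoff voltage v; returns power (mW)."""
    u = v - 1.02
    return 1.5 + 2.0 * u ** 2 + 0.5 * u ** 3

def adaptive_search(v_lo, v_hi, initial, iterations=2):
    """Fit a quadratic power model, sample at its minimum, refit; return the best sample."""
    vs = list(initial)
    ps = [implement_and_measure(v) for v in vs]
    for _ in range(iterations):
        a2, a1, _a0 = np.polyfit(vs, ps, 2)  # P(v) ~ a2*v**2 + a1*v + a0
        if a2 <= 0:
            break  # model has no interior minimum; stop refining
        v_next = float(np.clip(-a1 / (2 * a2), v_lo, v_hi))
        vs.append(v_next)
        ps.append(implement_and_measure(v_next))
    best = int(np.argmin(ps))
    return vs[best], ps[best], len(vs)

v_best, p_best, runs = adaptive_search(0.9, 1.2, initial=[0.9, 1.05, 1.2])
print(f"best signoff voltage ~ {v_best:.3f} V, {p_best:.3f} mW after {runs} SP&R runs")
```

The number of implementation runs stays fixed (initial samples plus one per iteration), however finely the voltage range is searched.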
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from a 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
[Figure: classes of closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor]
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit:
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low series resistance]
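The following toy model shows how per-stage control bits shift the ring oscillator's Vmin. Here Vmin is taken to be the lowest supply at which the RO reaches a target frequency. Everything numeric is hypothetical: the per-cell delays, the resistance factors, and the alpha-power-law voltage scaling. Process corners are ignored entirely. Tuning is modeled as picking the control word whose RO Vmin best matches the design's Vmin.

```python
from itertools import product

VTH, ALPHA = 0.35, 1.3                                    # hypothetical parameters
BASE_DELAY_PS = {"INV": 10.0, "NAND": 14.0, "NOR": 18.0}  # hypothetical delays at 1.0 V
R_FACTOR = {0: 1.0, 1: 1.5}  # control bit: 0 = low, 1 = high series resistance

def stage_delay_ps(cell, bit, v):
    """Delay of one RO stage at supply v, scaled from its 1.0 V value."""
    scale = (v / (v - VTH) ** ALPHA) / (1.0 / (1.0 - VTH) ** ALPHA)
    return BASE_DELAY_PS[cell] * R_FACTOR[bit] * scale

def ro_freq_ghz(cells, bits, v):
    """RO frequency: one period is two passes around the loop of inverting stages."""
    period_ps = 2.0 * sum(stage_delay_ps(c, b, v) for c, b in zip(cells, bits))
    return 1000.0 / period_ps

def ro_vmin(cells, bits, f_target_ghz, lo=0.5, hi=1.2, iters=50):
    """Lowest supply (V) at which the RO reaches f_target, by bisection; None if unreachable."""
    if ro_freq_ghz(cells, bits, hi) < f_target_ghz:
        return None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ro_freq_ghz(cells, bits, mid) >= f_target_ghz:
            hi = mid
        else:
            lo = mid
    return hi

def tune(cells, f_target_ghz, vmin_design):
    """Pick the control word whose RO Vmin is closest to the design's Vmin."""
    candidates = []
    for bits in product((0, 1), repeat=len(cells)):
        vmin = ro_vmin(cells, bits, f_target_ghz)
        if vmin is not None:
            candidates.append((abs(vmin - vmin_design), bits, vmin))
    _, bits, vmin = min(candidates)
    return bits, vmin

CELLS = ("INV", "NAND", "NOR")  # three inverting stages: odd loop, so it oscillates
bits, vmin = tune(CELLS, f_target_ghz=8.0, vmin_design=0.95)
print(bits, round(vmin, 3))
```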
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at large, medium, and small margins for MUL and EXU, split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, with the minimum achievable energy marked; panels MUL (0.84–1.00 V) and EXU (0.84–1.02 V)]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on a sensitivity function; replace error-tolerant FFs with normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution]
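The loop can be sketched as alternating SEOpt and SkewOpt passes, saving the design whenever energy drops below the best seen so far. The stopping rule is an assumption (stop at the first pass that does not improve energy). The toy stand-ins are hypothetical:

- A "design" is simply the number of endpoints converted from error-tolerant FFs to normal FFs.
- SkewOpt is a no-op placeholder.

```python
def overall_flow(design, seopt, skewopt, energy, k, max_iter=20):
    """Alternate SEOpt and SkewOpt; keep the lowest-energy design found."""
    best, best_energy = design, energy(design)
    for _ in range(max_iter):
        design = skewopt(seopt(design, k))
        e = energy(design)
        if e < best_energy:
            best, best_energy = design, e  # energy < min energy: save current solution
        else:
            break  # assumed stopping rule: no further improvement
    return best, best_energy

N_ENDPOINTS = 32  # hypothetical number of timing-critical endpoints

def toy_seopt(n_converted, k):
    """Margin insertion on K more paths, converting K more FFs to normal FFs."""
    return min(N_ENDPOINTS, n_converted + k)

def toy_skewopt(n_converted):
    """Placeholder for activity-aware clock skew optimization (no effect here)."""
    return n_converted

def toy_energy(n_converted):
    """Hypothetical energy vs. number of converted endpoints (convex)."""
    return 100.0 - 3.0 * n_converted + 0.1 * n_converted * n_converted

print(overall_flow(0, toy_seopt, toy_skewopt, toy_energy, k=4))
```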
Toward Holistic Modeling, Margining and Tolerance of IC Variabi
IC Variability
Challenge: Value of Technology
Solutions: Modeling, Margining, Tolerance
Outline
BEOL Corner Optimization
Proposed Timing Signoff Flow
Conventional BEOL Corners
Statistical RC Model
Pessimism of Conventional BEOL Corners (CBC)
Intuition on Delay Variability Across Cw, RCw
Intuition on Delay Variability Across Cw, RCw (2)
Scaling Factor α and Delay Variation
Find Paths for Which TBCs Can Be Used
Determining α, Arcw and Acw
Benefits of Tightened BEOL Corners
Outline (2)
How to Minimize Cost of Resilience
Tradeoff: Resilience Cost vs. Datapath Cost
Selective-Endpoint Optimization (SEOpt)
Clock Skew Optimization (SkewOpt)
Overall Optimization Flow
Benefit of Low-Cost Resilience
Increased Benefit of Resilience with AVS
Outline (3)
Breaking Chicken-Egg Loops: Less Margin
Derated Library Characterization and AVS
Library Characterization for AVS
Power vs. Area Across Different Signoffs
Heuristics 1
Vfinal Estimation
Observation and Heuristic 2
Proposed Library Characterization Flow
Power vs. Area for All Designs
Also: Multi-Mode Signoff Choices Matter
Also: Tunable Monitors, Less Margin
Also: Tunable Monitors, Less Margin (2)
Outline (4)
Conclusions
Thank You
Backup
Power Penalty to Fix EM with AVS
Homogeneous Corners
Homogeneous Corners (2)
Correlation Matrix
Wiring Structure in Timing-Critical Paths (2)
Delay Variation
Delay Variation (2)
Non-Homogeneous Corner
Opportunities for Tightened BEOL Corners
Wiring Structure in Timing-Critical Paths (2)
Proposed Timing Signoff Flow (2)
Experiment Setup
Further Analysis
Scaling Factor Results
Benefits of Tightened BEOL Corners (1)
Heuristics 1 (2)
Vfinal Estimation (2)
Observation and Heuristic 2 (2)
Technology and Benchmark Circuits
A Reference Signoff Flow
Experiment Setup (2)
“Chicken and Egg” Loop
Bias Temperature Instability (BTI)
Observation 1
Results for DC Scenario
Problem: Signoff Corner Definition
AVS Signoff Corner Selection
AVS Impact on EM Lifetime
EM Impact on AVS Scheduling
What Is “Signoff”?
Statistical Timing Analysis (1)
Statistical Timing Analysis (2)
Resilient Designs
Resilience Cost Reduction Problem
Selective-Endpoint Optimization
Process-Aware Vdd Scaling (PVS)
Challenge: Variability
Energy Reduction in AVS Context
Our Concept: Mode Dominance
Our Method: Global Optimization
Classes of Closed-Loop AVS
Design of RO with Tunable Vmin
Benefit of Resilience Cost Reduction
Increased Benefit of Resilience With AVS
Overall Optimization Flow (2)
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: Energy (mJ) of PM, BF, and CO for designs MUL and EXU at large, medium, and small margin; stacked bars: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs, and hence reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: Energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, designs MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution]
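The iteration can be sketched as a loop over the two optimization steps. `seopt`, `skewopt`, and `energy` are hypothetical callbacks, and stopping at the first non-improving iteration is an assumption because the flowchart's exit condition is not recoverable from the slide.

```python
def overall_optimization(initial_design, seopt, skewopt, energy, max_iter=10):
    """Alternate SEOpt and SkewOpt, keeping the lowest-energy solution.

    initial_design: placement with every FF error-tolerant.
    seopt(d):   hypothetical step that inserts margin on K paths chosen by a
                sensitivity function and swaps error-tolerant FFs for normal FFs.
    skewopt(d): hypothetical activity-aware clock skew optimization.
    energy(d):  hypothetical energy evaluation of a design.
    """
    best, best_energy = initial_design, energy(initial_design)
    design = initial_design
    for _ in range(max_iter):
        design = skewopt(seopt(design))
        e = energy(design)
        if e < best_energy:        # "Energy < min energy": save solution
            best, best_energy = design, e
        else:
            break                  # assumed exit: no further improvement
    return best, best_energy


# Toy model (hypothetical): a design is just its count of error-tolerant FFs.
result = overall_optimization(6, lambda d: d - 1, lambda d: d,
                              lambda d: (d - 3) ** 2 + 10)
```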
Toward Holistic Modeling, Margining and Tolerance of IC Variability
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • The BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05 V
Vinit         0.90 V
Vheur1 (DC)   0.97 V
Vheur1 (AC)   0.95 V
Vheur2 (DC)   0.95 V
Vheur2 (AC)   0.93 V
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib, and Vdd throughout the circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
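A minimal sketch of this time-stepped reference flow is given below. `path_delay(v, t)` is a hypothetical callback standing in for re-characterizing the derated library and re-running STA at each step. For brevity the toy delay model makes aging depend on time only; in the real flow aging also depends on the Vdd history, which is why each step needs fresh STA and library runs.

```python
def reference_avs_signoff(path_delay, clock_period, v_init, v_step, v_max,
                          t_final, t_step):
    """Step through the lifetime, raising Vdd by v_step until timing is met.

    path_delay(v, t) is a hypothetical callback returning the critical-path
    delay at supply v (integer mV) after t years of aging.
    Returns (v_final, ok); ok is False when meeting timing would need more
    than v_max, i.e. the circuit must be modified and the flow restarted.
    """
    v, t = v_init, 0
    while t <= t_final:
        # AVS at time t: raise Vdd in v_step increments until timing is met.
        while path_delay(v, t) > clock_period:
            if v + v_step > v_max:
                return v, False
            v += v_step
        t += t_step
    return v, True


# Toy delay (ps): 2%/year aging, delay inversely proportional to Vdd.
v_final, ok = reference_avs_signoff(lambda v, t: (1000 + 20 * t) * 900 / v,
                                    1000, 900, 10, 1100, 3, 1)
```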
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: Loop: circuit → Vfinal → derated libraries (Vlib, VBTI) → circuit]
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]
• Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
• The gap between Vdd and Vfinal shrinks rapidly
[Figure: Vdd over lifetime approaching Vfinal; ≈70% of the Vdd increment occurs in the 1st year (remaining 30% over 9 years)]
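A quick way to see why "front-loaded" follows from aging kinetics: if one assumes the common long-term power-law form |ΔVth| ∝ t^n (an assumption; the slide does not state the model), the fraction of 10-year aging reached after one year is (1/10)^n. Reaching 50% at one year then corresponds to n = log 0.5 / log 0.1 ≈ 0.30.

```python
import math


def aging_fraction(t, t_life, n):
    """Fraction of end-of-life |dVth| reached at time t, assuming |dVth| ~ t**n."""
    return (t / t_life) ** n


def implied_exponent(fraction, t, t_life):
    """Power-law exponent n for which aging_fraction(t, t_life, n) == fraction."""
    return math.log(fraction) / math.log(t / t_life)


n = implied_exponent(0.5, 1, 10)   # exponent implied by "50% in year 1 of 10"
```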
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
  • Consumes more power
  • May fail to meet timing if desired
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
[Figure annotations: 30% MTTF penalty; 200 mV voltage compensation]
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year for AVS schedules S1 to S5; MTTF (years) of design DMA under S1 to S5]
12 years MTTF penalty
What Is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, and analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “Margin stack” for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power-grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd; operating voltage]
Problem: margins = pessimism → overdesign, schedule delay
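To make the margin stack concrete, the sketch below subtracts each worst-case guardband from the nominal supply to obtain the signoff Vdd. All guardband values are hypothetical and chosen only for illustration.

```python
def signoff_vdd_mv(nominal_mv, margins_mv):
    """Stack worst-case voltage guardbands below the nominal supply (all mV).

    margins_mv maps each effect in the margin stack (static IR drop,
    power-grid IR gradient, dynamic IR, HCI, NBTI) to a guardband.
    """
    return nominal_mv - sum(margins_mv.values())


# Hypothetical guardbands, for illustration only.
margins = {"static_ir": 20, "ir_gradient": 10, "dynamic_ir": 30,
           "hci": 10, "nbti": 30}
print(signoff_vdd_mv(900, margins))  # 800
```

Every guardband is applied at its worst case simultaneously, so the signoff Vdd drifts far below the voltage the chip usually sees; that accumulated pessimism is the overdesign the slide points to.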
Statistical Timing Analysis (1)
• Delay sensitivity of path $p_j$ to variation source $z_v$:
  $\Delta d_{j,v} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • $\Delta d_{j,v}$ is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain $\Delta d_{j,v}$ using 28 runs of RC extraction and static timing analysis (STA)
[Figure: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → $\Delta d_{j,v}$]
Note: path delay includes gate and wire delays
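The sensitivity formula can be applied to a whole batch of STA results at once. This sketch assumes, as the division by 3 implies, that itf file $Y_v$ skews source $z_v$ to its 3σ value; the array layout is hypothetical.

```python
import numpy as np


def delay_sensitivities(d_typ, d_corner):
    """Per-sigma delay sensitivities from corner STA runs.

    d_typ:    shape (P,)   path delays from the Ytyp extraction + STA run.
    d_corner: shape (P, V) path delays with variation source v skewed to its
              3-sigma value in the v-th itf file (V = 27 in the talk).
    Returns shape (P, V): delta_d[j, v] = (d_j(Y_v) - d_j(Y_typ)) / 3.
    """
    d_typ = np.asarray(d_typ, dtype=float)
    d_corner = np.asarray(d_corner, dtype=float)
    return (d_corner - d_typ[:, None]) / 3.0
```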
Statistical Timing Analysis (2)
• $\Sigma$ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^{T}$ ($\lambda$ is obtained by Cholesky decomposition)
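Under the linear model $d_j = d_{j,0} + \sum_v \Delta d_{j,v} z_v$ with unit-variance normal sources, the path delay variance is $\Delta d_j^{T} \Sigma \Delta d_j = \lVert \lambda^{T} \Delta d_j \rVert^2$, and correlated samples are $z = \lambda x$ with $x \sim N(0, I)$. The sketch below shows both uses of the Cholesky factor; the function names are illustrative, not from the talk.

```python
import numpy as np


def path_delay_sigma(delta_d, corr):
    """Std. dev. of each path delay, d_j = d_j0 + sum_v delta_d[j, v] * z_v.

    delta_d: (P, V) per-sigma sensitivities.
    corr:    (V, V) correlation matrix of unit-variance normal sources z.
    With corr = L @ L.T (Cholesky), var_j = ||L.T @ delta_d[j]||**2.
    """
    L = np.linalg.cholesky(np.asarray(corr, dtype=float))  # corr = L @ L.T
    return np.linalg.norm(np.asarray(delta_d, dtype=float) @ L, axis=1)


def sample_sources(corr, n, seed=0):
    """Draw n correlated source vectors z = L @ x with x ~ N(0, I); shape (n, V)."""
    corr = np.asarray(corr, dtype=float)
    L = np.linalg.cholesky(corr)
    x = np.random.default_rng(seed).standard_normal((corr.shape[0], n))
    return (L @ x).T
```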
$\mathrm{Throughput} = \frac{1 - ER}{T} + \frac{ER}{r \times T}$
where ER = error rate [Kahng10], T = clock period, r = recovery cycles
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off cost of resilience vs. data path optimization
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: Energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, designs MUL and EXU; minimum achievable energy marked]
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
  • Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: Design cone of mode A in the voltage-frequency plane, with modes A, B, C; negative slacks = failed design; positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
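The slack-based reading of the design cone can be sketched as a small classifier. Treating "in each other's design cone" as both cross-mode slacks lying within a tolerance of zero is an interpretive assumption, and `tol` is a hypothetical parameter.

```python
def mode_position(slack_at_b, tol=0.0):
    """Where mode B sits relative to the design cone of mode A.

    slack_at_b: worst timing slack at mode B of a circuit signed off at mode A.
    tol: hypothetical tolerance for treating a slack as zero.
    """
    if slack_at_b < -tol:
        return "failed design"      # negative slack: B outside A's cone
    if slack_at_b > tol:
        return "overdesign"         # positive slack: A dominates B
    return "on cone boundary"


def equivalent_dominance(slack_a_at_b, slack_b_at_a, tol):
    """True when neither mode dominates the other (both cross slacks ~ 0)."""
    return abs(slack_a_at_b) <= tol and abs(slack_b_at_a) <= tol
```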
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
Toward Holistic Modeling Margining and Tolerance of IC Variabi
IC Variability
Challenge Value of Technology
Solutions Modeling Margining Tolerance
Outline
BEOL Corner Optimization
Proposed Timing Signoff Flow
Conventional BEOL Corners
Statistical RC Model
Pessimism of Conventional BEOL Corners (CBC)
Intuition on Delay Variability Across Cw RCw
Intuition on Delay Variability Across Cw RCw (2)
Scaling Factor α and Delay Variation
Find Paths for Which TBCs Can Be Used
Determining α Arcw and Acw
Benefits of Tightened BEOL Corners
Outline (2)
How to Minimize Cost of Resilience
Tradeoff Resilience Cost vs Datapath Cost
Selective-Endpoint Optimization (SEOpt)
Clock Skew Optimization (SkewOpt)
Overall Optimization Flow
Benefit of Low-Cost Resilience
Increased Benefit of Resilience with AVS
Outline (3)
Breaking Chicken-Egg Loops Less Margin
Derated Library Characterization and AVS
Library Characterization for AVS
Power vs Area Across Different Signoffs
Heuristics 1
Vfinal Estimation
Observation and Heuristic 2
Proposed Library Characterization Flow
Power vs Area for All Designs
Also Multi-Mode Signoff Choices Matter
Also Tunable Monitors Less Margin
Also Tunable Monitors Less Margin (2)
Outline (4)
Conclusions
Thank You
Backup
Power Penalty to Fix EM with AVS
Homogeneous Corners
Homogeneous Corners (2)
Correlation Matrix
Wiring Structure in Timing-Critical Paths (2)
Delay Variation
Delay Variation (2)
Non-Homogeneous Corner
Opportunities for Tightened BEOL Corners
Wiring Structure in Timing-Critical Paths (2)
Proposed Timing Signoff Flow (2)
Experiment Setup
Further Analysis
Scaling Factor Results
Benefits of Tightened BEOL Corners (1)
Heuristics 1 (2)
Vfinal Estimation (2)
Observation and Heuristic 2 (2)
Technology and Benchmark Circuits
A Reference Signoff Flow
Experiment Setup (2)
ldquoChicken and Eggrdquo Loop
Bias Temperature Instability (BTI)
Observation 1
Results for DC Scenario
Problem Signoff Corner Definition
AVS Signoff Corner Selection
AVS Impact on EM Lifetime
EM Impact on AVS Scheduling
What is ldquoSignoffrdquo
Statistical Timing Analysis (1)
Statistical Timing Analysis (2)
Resilient Designs
Resilience Cost Reduction Problem
Selective-Endpoint Optimization
Process-Aware Vdd Scaling (PVS)
Challenge Variability
Energy Reduction in AVS Context
Our Concept Mode Dominance
Our Method Global Optimization
Classes of Closed-Loop AVS
Design of RO with Tunable Vmin
Benefit of Resilience Cost Reduction
Increased Benefit of Resilience With AVS
Overall Optimization Flow (2)
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
Toward Holistic Modeling Margining and Tolerance of IC Variabi
IC Variability
Challenge Value of Technology
Solutions Modeling Margining Tolerance
Outline
BEOL Corner Optimization
Proposed Timing Signoff Flow
Conventional BEOL Corners
Statistical RC Model
Pessimism of Conventional BEOL Corners (CBC)
Intuition on Delay Variability Across Cw RCw
Intuition on Delay Variability Across Cw RCw (2)
Scaling Factor α and Delay Variation
Find Paths for Which TBCs Can Be Used
Determining α Arcw and Acw
Benefits of Tightened BEOL Corners
Outline (2)
How to Minimize Cost of Resilience
Tradeoff Resilience Cost vs Datapath Cost
Selective-Endpoint Optimization (SEOpt)
Clock Skew Optimization (SkewOpt)
Overall Optimization Flow
Benefit of Low-Cost Resilience
Increased Benefit of Resilience with AVS
Outline (3)
Breaking Chicken-Egg Loops Less Margin
Derated Library Characterization and AVS
Library Characterization for AVS
Power vs Area Across Different Signoffs
Heuristics 1
Vfinal Estimation
Observation and Heuristic 2
Proposed Library Characterization Flow
Power vs Area for All Designs
Also Multi-Mode Signoff Choices Matter
Also Tunable Monitors Less Margin
Also Tunable Monitors Less Margin (2)
Outline (4)
Conclusions
Thank You
Backup
Power Penalty to Fix EM with AVS
Homogeneous Corners
Homogeneous Corners (2)
Correlation Matrix
Wiring Structure in Timing-Critical Paths (2)
Delay Variation
Delay Variation (2)
Non-Homogeneous Corner
Opportunities for Tightened BEOL Corners
Wiring Structure in Timing-Critical Paths (2)
Proposed Timing Signoff Flow (2)
Experiment Setup
Further Analysis
Scaling Factor Results
Benefits of Tightened BEOL Corners (1)
Heuristics 1 (2)
Vfinal Estimation (2)
Observation and Heuristic 2 (2)
Technology and Benchmark Circuits
A Reference Signoff Flow
Experiment Setup (2)
ldquoChicken and Eggrdquo Loop
Bias Temperature Instability (BTI)
Observation 1
Results for DC Scenario
Problem Signoff Corner Definition
AVS Signoff Corner Selection
AVS Impact on EM Lifetime
EM Impact on AVS Scheduling
What is ldquoSignoffrdquo
Statistical Timing Analysis (1)
Statistical Timing Analysis (2)
Resilient Designs
Resilience Cost Reduction Problem
Selective-Endpoint Optimization
Process-Aware Vdd Scaling (PVS)
Challenge Variability
Energy Reduction in AVS Context
Our Concept Mode Dominance
Our Method Global Optimization
Classes of Closed-Loop AVS
Design of RO with Tunable Vmin
Benefit of Resilience Cost Reduction
Increased Benefit of Resilience With AVS
Overall Optimization Flow (2)
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
>
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
[Equation: throughput as a function of error rate ER [Kahng10], number of recovery cycles r, and clock period T]
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints: allows replacing a Razor FF with a normal FF
• Trade off cost of resilience vs. datapath optimization
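The tradeoff can be illustrated one endpoint at a time. Converting an endpoint to a normal FF costs datapath optimization in its fanin cone but saves the cost of the error-tolerant FF. The greedy sketch below is a deliberate simplification, not the SEOpt algorithm itself. The cost numbers and the assumption that fanin cones are independent are both hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Endpoint:
    name: str
    resilient_ff_cost: float  # hypothetical cost of keeping an error-tolerant (Razor) FF
    datapath_fix_cost: float  # hypothetical cost of tightening and optimizing the fanin cone


def select_endpoints(endpoints: List[Endpoint]) -> Tuple[List[str], List[str], float]:
    """Per endpoint, pick the cheaper option: keep the Razor FF or fix the datapath.

    Simplification: fanin cones are treated as independent, so overlapping
    cones (shared gates) are not accounted for.
    """
    keep_razor: List[str] = []
    to_normal_ff: List[str] = []
    total_cost = 0.0
    for ep in endpoints:
        if ep.datapath_fix_cost < ep.resilient_ff_cost:
            to_normal_ff.append(ep.name)
            total_cost += ep.datapath_fix_cost
        else:
            keep_razor.append(ep.name)
            total_cost += ep.resilient_ff_cost
    return keep_razor, to_normal_ff, total_cost


keep, convert, cost = select_endpoints([Endpoint("a", 2.0, 1.0), Endpoint("b", 1.0, 3.0)])
print(keep, convert, cost)
```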
Energy Reduction in AVS Context
• Adaptive voltage scaling (AVS) allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• The proposed method achieves an average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt designs, for MUL and EXU; minimum achievable energy marked]
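A toy energy model illustrates the voltage tradeoff. Lowering Vdd cuts energy roughly as V². Below some point, however, the timing-error rate (and with it the recovery overhead) grows quickly, so energy has an interior minimum. Every functional form and constant below is a hypothetical assumption chosen for illustration, not data from the MUL or EXU experiments.

```python
import math
from typing import Iterable, Tuple


def error_rate(v: float, er_at_084v: float = 0.05, slope_v: float = 0.03) -> float:
    """Hypothetical timing-error rate that grows exponentially as Vdd drops."""
    return er_at_084v * math.exp(-(v - 0.84) / slope_v)


def energy_mj(v: float, e_nominal_mj: float = 50.0, v_nominal: float = 1.0,
              recovery_cycles: int = 10) -> float:
    """Dynamic energy ~ V^2, inflated by the extra recovery cycles spent on errors.

    Assumption: each timing error costs `recovery_cycles` extra cycles, so the
    average number of cycles per operation is 1 + recovery_cycles * error_rate.
    """
    return e_nominal_mj * (v / v_nominal) ** 2 * (1.0 + recovery_cycles * error_rate(v))


def min_energy_point(voltages: Iterable[float]) -> Tuple[float, float]:
    """(voltage, energy) pair with the lowest energy over a voltage sweep."""
    return min(((v, energy_mj(v)) for v in voltages), key=lambda p: p[1])


sweep = [round(0.84 + 0.01 * i, 2) for i in range(17)]  # 0.84 V to 1.00 V
v_best, e_best = min_energy_point(sweep)
print(f"minimum energy {e_best:.2f} mJ at {v_best:.2f} V")
```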
Our Concept: Mode Dominance
• Design cone (of mode A): the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cones
[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
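A single critical path whose delay scales with Vdd makes the dominance definitions concrete. The sketch below swaps in the alpha-power law for that scaling, with hypothetical Vth and α. Under this model:

* a mode lies in A's design cone when a design signed off at A has non-negative slack there;
* A dominates B when that slack is strictly positive;
* equivalent dominance corresponds to zero slack in both directions.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    vdd: float       # supply voltage (V)
    freq_ghz: float  # clock frequency (GHz)


def delay_factor(vdd: float, vth: float = 0.35, alpha: float = 1.3) -> float:
    """Alpha-power-law delay scaling Vdd / (Vdd - Vth)**alpha (assumed model)."""
    if vdd <= vth:
        raise ValueError("Vdd must exceed Vth")
    return vdd / (vdd - vth) ** alpha


def slack_ns(signoff: Mode, check: Mode) -> float:
    """Slack at mode `check` for a design whose critical path exactly meets `signoff`."""
    crit_delay_ns = (1.0 / signoff.freq_ghz) * delay_factor(check.vdd) / delay_factor(signoff.vdd)
    return 1.0 / check.freq_ghz - crit_delay_ns


def relation(a: Mode, b: Mode, tol: float = 1e-9) -> str:
    """Dominance between modes a and b under the single-critical-path model."""
    s_ab, s_ba = slack_ns(a, b), slack_ns(b, a)
    if abs(s_ab) <= tol and abs(s_ba) <= tol:
        return "equivalent dominance"
    return "a dominates b" if s_ab > 0 else "b dominates a"


A = Mode(vdd=1.0, freq_ghz=1.0)
B = Mode(vdd=0.8, freq_ghz=0.5)
print(relation(A, B))  # signoff at A leaves positive slack at B
```

Signing off at both A and B would add no protection beyond signing off at A. This is the overdesign the guideline warns about.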
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small, constant number of runs is enough → scalable
[Figure: global optimization flow. Sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models, repeated as an adaptive search]
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V)]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from actually implemented circuits
Design: AES, f = 700 MHz
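The sample-and-refine loop can be sketched as successive parabolic interpolation over a single signoff voltage:

1. Fit a quadratic power model through the three lowest-power samples.
2. Jump to the model's minimum and implement once more there.
3. Repeat until the predicted minimum has already been sampled.

Several parts of this sketch are assumptions, not the talk's actual power models: the quadratic model, the stopping rule, and the `run_spr` callable. `run_spr` is a hypothetical stand-in for one synthesis, place-and-route, and power-analysis run.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]  # (signoff voltage in V, power in mW)


def parabola_vertex(p0: Sample, p1: Sample, p2: Sample) -> float:
    """Voltage at the vertex of the parabola through three (voltage, power) samples."""
    (a, fa), (b, fb), (c, fc) = p0, p1, p2
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    if den == 0:
        raise ValueError("samples are collinear; cannot fit a parabola")
    return b - 0.5 * num / den


def adaptive_search(run_spr: Callable[[float], float], initial: Sequence[float],
                    iterations: int = 5, lo: float = 0.9, hi: float = 1.2) -> Sample:
    """Iteratively fit a quadratic power model and sample at its predicted minimum."""
    if len(set(initial)) < 3:
        raise ValueError("need at least three distinct initial voltages")
    samples: List[Sample] = [(v, run_spr(v)) for v in initial]
    for _ in range(iterations):
        # Three lowest-power samples so far, ordered by voltage.
        best3 = sorted(sorted(samples, key=lambda s: s[1])[:3])
        v_next = min(max(parabola_vertex(*best3), lo), hi)  # stay in the allowed range
        if any(abs(v_next - v) < 1e-6 for v, _ in samples):
            break  # predicted optimum already sampled: treat as converged
        samples.append((v_next, run_spr(v_next)))
    return min(samples, key=lambda s: s[1])


def toy_spr(v: float) -> float:
    """Toy power curve (mW), standing in for a real SP&R plus power-analysis run."""
    return 15.0 + 40.0 * (v - 1.05) ** 2


v_opt, p_opt = adaptive_search(toy_spr, initial=[0.9, 1.0, 1.2])
print(f"best signoff voltage {v_opt:.3f} V at {p_opt:.2f} mW")
```

Each loop iteration costs one implementation run. This matches the slide's point that a small, constant number of runs is enough.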
Classes of Closed-Loop AVS
• Critical paths may be difficult to identify (e.g., IP from a 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
[Figure: classes of closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor]
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit:
  • Different cell types cover different process corners
  • Series resistance of each stage is tuned to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low series resistance]
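With one control bit per stage, an N-stage oscillator has 2ᴺ configurations, so tuning reduces to a small search. The sketch picks the control word whose predicted Vmin best matches a target, such as a design's Vmin. The additive per-stage model, the bit-to-resistance mapping, and all numbers are hypothetical stand-ins for characterization data.

```python
from itertools import product
from typing import Sequence, Tuple


def predicted_vmin(bits: Sequence[int], base_v: float, deltas: Sequence[float]) -> float:
    """Hypothetical additive model: a stage whose bit is 1 adds its delta to Vmin."""
    return base_v + sum(d for b, d in zip(bits, deltas) if b)


def choose_control_word(target_vmin: float, base_v: float,
                        deltas: Sequence[float]) -> Tuple[int, ...]:
    """Search all 2**N control words for the predicted Vmin closest to the target."""
    return min(product((0, 1), repeat=len(deltas)),
               key=lambda bits: abs(predicted_vmin(bits, base_v, deltas) - target_vmin))


# Three stages, one control bit each; all numbers are hypothetical.
word = choose_control_word(target_vmin=0.67, base_v=0.60, deltas=[0.02, 0.03, 0.05])
print(word)
```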
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• The proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO under large, medium, and small margins, for MUL and EXU; bars split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt designs, for MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on a sensitivity function, then replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution]
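The loop can be sketched generically, with SEOpt, SkewOpt, and the energy evaluation passed in as callables. Several details are assumptions for illustration: the parameter names, the toy state (a count of remaining error-tolerant FFs), and the stop-when-no-improvement rule. The flowchart itself does not fix the termination condition.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def overall_optimization(initial: S,
                         se_opt: Callable[[S], S],
                         skew_opt: Callable[[S], S],
                         energy: Callable[[S], float],
                         max_iters: int = 20) -> Tuple[S, float]:
    """Alternate SEOpt and SkewOpt, keeping the lowest-energy solution seen."""
    best, min_energy = initial, energy(initial)
    current = initial
    for _ in range(max_iters):
        current = skew_opt(se_opt(current))  # SEOpt, then SkewOpt
        e = energy(current)
        if e < min_energy:
            best, min_energy = current, e    # save current solution
        else:
            break                            # assumed rule: stop when energy stops improving
    return best, min_energy


# Toy state: number of error-tolerant FFs left; each SEOpt pass converts K = 2.
best_n, best_e = overall_optimization(
    initial=10,
    se_opt=lambda n: max(n - 2, 0),
    skew_opt=lambda n: n,                     # identity stand-in for SkewOpt
    energy=lambda n: 30.0 + (n - 4) ** 2,     # hypothetical convex energy curve
)
print(best_n, best_e)
```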
Toward Holistic Modeling, Margining and Tolerance of IC Variability
IC Variability
Challenge Value of Technology
Solutions Modeling Margining Tolerance
Outline
BEOL Corner Optimization
Proposed Timing Signoff Flow
Conventional BEOL Corners
Statistical RC Model
Pessimism of Conventional BEOL Corners (CBC)
Intuition on Delay Variability Across Cw RCw
Intuition on Delay Variability Across Cw RCw (2)
Scaling Factor α and Delay Variation
Find Paths for Which TBCs Can Be Used
Determining α Arcw and Acw
Benefits of Tightened BEOL Corners
Outline (2)
How to Minimize Cost of Resilience
Tradeoff Resilience Cost vs Datapath Cost
Selective-Endpoint Optimization (SEOpt)
Clock Skew Optimization (SkewOpt)
Overall Optimization Flow
Benefit of Low-Cost Resilience
Increased Benefit of Resilience with AVS
Outline (3)
Breaking Chicken-Egg Loops Less Margin
Derated Library Characterization and AVS
Library Characterization for AVS
Power vs Area Across Different Signoffs
Heuristics 1
Vfinal Estimation
Observation and Heuristic 2
Proposed Library Characterization Flow
Power vs Area for All Designs
Also Multi-Mode Signoff Choices Matter
Also Tunable Monitors Less Margin
Also Tunable Monitors Less Margin (2)
Outline (4)
Conclusions
Thank You
Backup
Power Penalty to Fix EM with AVS
Homogeneous Corners
Homogeneous Corners (2)
Correlation Matrix
Wiring Structure in Timing-Critical Paths (2)
Delay Variation
Delay Variation (2)
Non-Homogeneous Corner
Opportunities for Tightened BEOL Corners
Wiring Structure in Timing-Critical Paths (2)
Proposed Timing Signoff Flow (2)
Experiment Setup
Further Analysis
Scaling Factor Results
Benefits of Tightened BEOL Corners (1)
Heuristics 1 (2)
Vfinal Estimation (2)
Observation and Heuristic 2 (2)
Technology and Benchmark Circuits
A Reference Signoff Flow
Experiment Setup (2)
“Chicken and Egg” Loop
Bias Temperature Instability (BTI)
Observation 1
Results for DC Scenario
Problem Signoff Corner Definition
AVS Signoff Corner Selection
AVS Impact on EM Lifetime
EM Impact on AVS Scheduling
What is “Signoff”
Statistical Timing Analysis (1)
Statistical Timing Analysis (2)
Resilient Designs
Resilience Cost Reduction Problem
Selective-Endpoint Optimization
Process-Aware Vdd Scaling (PVS)
Challenge Variability
Energy Reduction in AVS Context
Our Concept Mode Dominance
Our Method Global Optimization
Classes of Closed-Loop AVS
Design of RO with Tunable Vmin
Benefit of Resilience Cost Reduction
Increased Benefit of Resilience With AVS
Overall Optimization Flow (2)
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
>
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
Toward Holistic Modeling Margining and Tolerance of IC Variabi
What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: a stack of models, margins, and analyses
• Function, timing, signal integrity, power integrity, …
Problem: margins = pessimism → overdesign, schedule delay
[Figure: “margin stack” for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage]
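The margin stack can be read as simple arithmetic. This sketch assumes each guardband listed in the figure is subtracted from nominal Vdd to reach the signoff Vdd; the millivolt values are hypothetical and chosen only to show how quickly stacked margins add up to pessimism.

```python
def signoff_vdd_mv(nominal_mv: int, margins_mv: dict) -> int:
    """Signoff Vdd after subtracting every guardband in the margin stack (all values in mV)."""
    return nominal_mv - sum(margins_mv.values())


# Hypothetical guardbands, for illustration only (not data from the talk).
margins = {
    "static IR drop": 20,
    "power grid IR gradient": 10,
    "dynamic IR": 30,
    "HCI/NBTI": 40,
}
vdd = signoff_vdd_mv(900, margins)
print(vdd)  # 800
print(f"{sum(margins.values()) / 900:.1%} of nominal Vdd spent on margins")  # 11.1% of nominal Vdd spent on margins
```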
Statistical Timing Analysis (1)
• Delay sensitivity of path p_j to variation source z_v:
  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • Δd_jv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)
Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δd_jv
Note: path delay includes gate and wire delays
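A small sketch of the sensitivity formula applied to the 28 STA results. It assumes each itf Y_v skews only z_v by +3σ (hence the division by 3) and that the per-path delays have already been read out of STA; the function name and data layout are hypothetical, and three sources stand in for the 27.

```python
from typing import List


def delay_sensitivities(
    d_typ: List[float], d_corner: List[List[float]], k_sigma: float = 3.0
) -> List[List[float]]:
    """Delta d_jv = [d_j(Y_v) - d_j(Y_typ)] / k_sigma for every path j and source v.

    d_typ[j]       : delay of path p_j timed with the typical itf (Y_typ)
    d_corner[j][v] : delay of path p_j timed with itf Y_v (only z_v skewed)
    k_sigma        : assumed skew of z_v in Y_v, in sigmas (3 in the formula)
    """
    return [[(dv - dt) / k_sigma for dv in row] for dt, row in zip(d_typ, d_corner)]


# Two paths, three variation sources (hypothetical delays in ps).
sens = delay_sensitivities([100.0, 200.0], [[103.0, 100.0, 97.0], [206.0, 200.0, 194.0]])
print(sens)  # [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0]]
```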
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
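One way to use λ, sketched under the linear, normal model of part (1): writing the correlated sources as z = λx with x independent standard normals gives a path-delay standard deviation of sqrt(Δd_jᵀ Σ Δd_j) = ‖λᵀ Δd_j‖. The pure-Python Cholesky below assumes Σ is strictly positive definite (a singular Σ, e.g. perfectly correlated layers, would need a different factorization); `path_sigma` is a hypothetical helper name.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular lam with a = lam @ lam^T (a must be symmetric positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("correlation matrix is not positive definite")
                lam[i][i] = math.sqrt(d)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(sens: List[float], corr: Matrix) -> float:
    """Std. dev. of path delay, sqrt(sens^T Sigma sens), computed as ||lam^T sens||."""
    lam = cholesky(corr)
    n = len(sens)
    # (lam^T sens)_k: sensitivity of the path delay to the k-th independent N(0, 1) source.
    y = [sum(lam[i][k] * sens[i] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(v * v for v in y))


corr = [[1.0, 0.5], [0.5, 1.0]]
print(round(path_sigma([1.0, 1.0], corr), 4))  # 1.7321
```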
Throughput [Kahng10] in terms of error rate ER, number of recovery cycles r, and clock period T
Selective-Endpoint Optimization
• Optimize fan-in cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. datapath optimization
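A deliberately simplified sketch of the per-endpoint trade-off: keep the Razor (error-tolerant) FF unless tightening the fan-in cone is cheaper. The costs are hypothetical inputs, and the sketch treats endpoints independently, whereas real fan-in cones overlap, so the actual SEOpt selection is likely more involved.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Endpoint:
    name: str
    razor_cost: float    # hypothetical energy overhead of keeping an error-tolerant FF
    tighten_cost: float  # hypothetical energy overhead of optimizing the fan-in cone


def select_endpoints(endpoints: List[Endpoint]) -> Tuple[List[str], float]:
    """Return (endpoints converted to normal FFs, total resilience + datapath cost)."""
    converted: List[str] = []
    total = 0.0
    for ep in endpoints:
        if ep.tighten_cost < ep.razor_cost:
            converted.append(ep.name)  # tighter constraints let a normal FF be used
            total += ep.tighten_cost
        else:
            total += ep.razor_cost  # keep the Razor FF
    return converted, total


eps = [Endpoint("a", 2.0, 1.0), Endpoint("b", 1.0, 3.0)]
print(select_endpoints(eps))  # (['a'], 2.0)
```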
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
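The trade-off above amounts to sweeping supply voltage and keeping the point of minimum total energy. In this sketch both energy models are hypothetical: base energy scales roughly as V², and a timing-error (throughput) penalty appears once V drops below an assumed 0.92 V.

```python
from typing import Callable, Iterable, Optional, Tuple


def min_energy_voltage(
    voltages_mv: Iterable[int],
    base_energy: Callable[[float], float],
    error_penalty: Callable[[float], float],
) -> Tuple[int, float]:
    """Return (Vdd in mV, energy) minimizing base energy + timing-error penalty."""
    best: Optional[Tuple[int, float]] = None
    for mv in voltages_mv:
        v = mv / 1000.0
        e = base_energy(v) + error_penalty(v)
        if best is None or e < best[1]:
            best = (mv, e)
    if best is None:
        raise ValueError("empty voltage sweep")
    return best


# Hypothetical models (mJ): not the MUL/EXU data from the talk.
best_mv, best_e = min_energy_voltage(
    range(840, 1001, 20),
    base_energy=lambda v: 50.0 * v * v,
    error_penalty=lambda v: 200.0 * max(0.0, 0.92 - v),
)
print(best_mv)  # 920
```

Lowering Vdd below the optimum saves less base energy than the error penalty adds, which is why the curves in the figure have an interior minimum.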
Our Concept: Mode Dominance
• Design cone (of mode A): the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B, C; negative slacks = failed design; positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
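The dominance definitions can be turned into a small classifier over cross-mode slacks. This is one reading of the slide, not the authors' procedure: a mode pair is treated as equivalent when both cross slacks are within a hypothetical tolerance `tol` of zero (each mode on the other's cone boundary), and the slack inputs are assumed to come from timing each mode's signoff result at the other mode.

```python
def classify_modes(slack_a_at_b: float, slack_b_at_a: float, tol: float = 0.0) -> str:
    """Classify signoff modes A and B from cross-mode worst slacks.

    slack_a_at_b: worst slack at mode B of a circuit signed off at mode A
    slack_b_at_a: worst slack at mode A of a circuit signed off at mode B
    tol: slack band treated as zero (hypothetical knob)
    """
    if abs(slack_a_at_b) <= tol and abs(slack_b_at_a) <= tol:
        return "equivalent"  # each mode lies on the other's design cone
    if slack_a_at_b > tol:
        return "A dominates B"  # a design signed off at A is overdesigned at B
    if slack_b_at_a > tol:
        return "B dominates A"
    return "undetermined"  # cross slacks do not fit either pattern


print(classify_modes(0.05, -0.05, tol=0.01))  # A dominates B
```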
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough → scalable
Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); curves: 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from real implemented circuits
Design: AES, f = 700 MHz
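A sketch of the sample → model → estimate → refine loop, using successive parabolic interpolation as a swapped-in power model (the talk does not say which model form it fits). `measure` is a hypothetical stand-in for one SP&R run that returns power at a given signoff voltage; each refinement costs one extra run.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]  # (signoff voltage in V, power in mW)


def parabola_vertex(p1: Sample, p2: Sample, p3: Sample) -> float:
    """Voltage at the minimum of the parabola through three (v, power) samples."""
    (x1, f1), (x2, f2), (x3, f3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (f2 - f3) - (x2 - x3) ** 2 * (f2 - f1)
    den = (x2 - x1) * (f2 - f3) - (x2 - x3) * (f2 - f1)
    if den == 0.0:
        raise ZeroDivisionError("samples are collinear; no parabola vertex")
    return x2 - 0.5 * num / den


def adaptive_search(
    measure: Callable[[float], float],
    v_init: Sequence[float],
    v_lo: float,
    v_hi: float,
    max_rounds: int = 5,
) -> Tuple[float, float, int]:
    """Return (best signoff voltage, its power, number of implementation runs)."""
    samples: List[Sample] = [(v, measure(v)) for v in v_init]  # initial SP&R runs
    for _ in range(max_rounds):
        # Refit the model on the three lowest-power samples, ordered by voltage.
        pts = sorted(sorted(samples, key=lambda s: s[1])[:3])
        try:
            v_new = parabola_vertex(*pts)
        except ZeroDivisionError:
            break
        v_new = min(max(v_new, v_lo), v_hi)  # stay inside the allowed signoff range
        if any(abs(v_new - v) < 1e-9 for v, _ in samples):
            break  # model predicts an already-sampled point: converged
        samples.append((v_new, measure(v_new)))  # one more SP&R run to refine
    v_best, p_best = min(samples, key=lambda s: s[1])
    return v_best, p_best, len(samples)


# Hypothetical "real" power curve, used only to exercise the search.
v, p, runs = adaptive_search(lambda x: 3.0 * (x - 1.05) ** 2 + 15.0, [0.9, 1.0, 1.2], 0.9, 1.2)
print(round(v, 3), round(p, 3))  # 1.05 15.0
```

On a truly quadratic curve the first refinement lands on the minimum; on real designs several rounds would be expected, which mirrors the 1st/2nd model curves in the figure.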
Classes of Closed-Loop AVS
• Closed-loop AVS monitors: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low series resistance]
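With three 1-bit control pins there are 2³ = 8 resistance settings. This sketch picks the setting whose predicted Vmin is closest to a target, assuming a hypothetical linear model in which setting bit i shifts Vmin by `step_mv[i]`. It models only the resistance knob, not the cell-type choice; real per-setting Vmin values would come from characterization.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def tune_ro(
    target_vmin_mv: int, base_vmin_mv: int, step_mv: Sequence[int]
) -> Tuple[Tuple[int, ...], int]:
    """Return (control code, predicted Vmin in mV) closest to the target Vmin."""
    best_code: Optional[Tuple[int, ...]] = None
    best_vmin = 0
    for code in product((0, 1), repeat=len(step_mv)):  # every high/low combination
        vmin = base_vmin_mv + sum(b * s for b, s in zip(code, step_mv))
        if best_code is None or abs(vmin - target_vmin_mv) < abs(best_vmin - target_vmin_mv):
            best_code, best_vmin = code, vmin
    assert best_code is not None
    return best_code, best_vmin


print(tune_ro(652, 600, [10, 20, 40]))  # ((1, 0, 1), 650)
```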
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at large, medium, and small margin, for MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of clock period
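The margin definitions and the stacked energy bars reduce to simple arithmetic. This sketch uses a hypothetical 1000 ps clock period and made-up energies; `energy_reduction` shows how a relative saving such as the reported "up to 20%" would be computed, not the talk's data.

```python
def safety_margins_ps(clock_period_ps: int) -> dict:
    """Small/medium/large safety margin = 5/10/15% of the clock period (integer ps, floored)."""
    return {name: clock_period_ps * pct // 100 for name, pct in (("small", 5), ("medium", 10), ("large", 15))}


def total_energy(base: float, extra_circuits: float, throughput_penalty: float) -> float:
    """Stacked bar: energy w/o resilience + penalty of additional circuits + penalty of throughput degradation."""
    return base + extra_circuits + throughput_penalty


def energy_reduction(e_proposed: float, e_reference: float) -> float:
    """Fractional energy saving of the proposed flow relative to a reference flow."""
    return 1.0 - e_proposed / e_reference


print(safety_margins_ps(1000))  # {'small': 50, 'medium': 100, 'large': 150}
```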
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]