Analyzing Circuit-aware Microarchitectural Reliability
Taniya Siddiqua, Paul Lee
[email protected], [email protected]
University of Virginia, Charlottesville
Motivation
[Figure: hard errors (EM, TC, SM, TDDB, NBTI) and transient faults; axis labels: transistor size, time]
Problem Description
- Architects focus on this problem at architecture-level granularity
- The focus is on architectural structures, e.g. caches and ALUs
- Reliability predictions are circuit-agnostic
- There is a potential gap between architecture-level and circuit-level reliability estimation
Problem Description
We:
- Show that circuit-level granularity affects architecture-level reliability simulations
- Examine two hard errors, NBTI (Negative Bias Temperature Instability) and TDDB (Time-Dependent Dielectric Breakdown), at the architecture and circuit levels on an ALU
- Determine the effect of technology scaling on NBTI and TDDB in the ALU down to 22 nm
- Propose an NBTI-aware ALU design that uses both architecture-level and circuit-level optimizations
NBTI – A quick guide
- Key reliability issue for P-channel MOS (PMOS) devices
- Affects MOS devices stressed with negative gate voltages
- Manifests as an increase in threshold voltage and a decrease in drain current
- Consequently the circuit slows down and can violate timing constraints
- Good news: recovery starts as soon as the stress is removed
Architecture-level Reliability Simulation
We simulate a 2-wide issue core with 2 INT ALUs:
- SimpleScalar 3.0 models processor behavior
- Wattch and HotSpot simulate power and temperature behavior, respectively
- We estimate the lifetime of the 1st INT ALU
- ALU lifetimes are projected based on MTTF for NBTI
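[Equations: (1) threshold-voltage shift under stress, ΔV_ts, in terms of q, t_ox, K, C_ox, (V_gs - V_t), E_0, E_a/(kT), T_0 and t_stress^0.25; (2) threshold-voltage shift after recovery, ΔV_t, in terms of ΔV_ts, t_ox, t_e, E_a/(kT), T_0, t_stress and t_rec; (3) MTTF_NBTI in terms of 1/V_gs and exp(E_a,NBTI/(kT))]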
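The MTTF-based lifetime projection can be illustrated with a minimal sketch. It assumes the commonly used form MTTF ∝ (1/V_gs)^γ · exp(E_a/(kT)); the exponent `gamma`, the activation energy `e_a_ev`, and the function names are hypothetical placeholders, not values or names taken from these slides. Because the proportionality constant is unknown, only ratios between operating points are meaningful.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def relative_mttf(v_gs, temp_k, gamma=1.0, e_a_ev=0.5):
    """Unnormalized NBTI MTTF under the ASSUMED form (1/V_gs)^gamma * exp(E_a/(kT)).

    gamma and e_a_ev are illustrative placeholders, not fitted parameters.
    """
    return (1.0 / v_gs) ** gamma * math.exp(e_a_ev / (BOLTZMANN_EV * temp_k))


def mttf_ratio(v_a, temp_a, v_b, temp_b, **params):
    """MTTF at operating point A divided by MTTF at operating point B."""
    return relative_mttf(v_a, temp_a, **params) / relative_mttf(v_b, temp_b, **params)


if __name__ == "__main__":
    # Same gate voltage, cooler vs. hotter ALU: the cooler one lasts longer.
    print(mttf_ratio(1.1, 340.0, 1.1, 360.0))
```

In this form a cooler or lower-voltage ALU always projects a longer lifetime, which is why the average ALU temperature from HotSpot is the key input.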
Circuit-level Reliability Simulation
We:
- Use a Kogge-Stone adder circuit for the ALU
- Feed the average temperature of the 1st ALU from the architecture-level reliability simulation into the Cadence framework
- Calculate stress and recovery times from the utilization pattern obtained from the architecture-level reliability simulation
- Define lifetime as the time at which circuit delay has increased by 25% of the original delay
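The last two steps can be sketched as follows. This is a simplified illustration, assuming that a busy cycle counts as stress and an idle cycle as recovery (a real analysis would look at the input value on each PMOS gate). The function names and the sample delay data are hypothetical; only the 25% threshold comes from the slide.

```python
def stress_recovery_times(busy_trace, cycle_time_s):
    """Split a per-cycle utilization trace into total stress and recovery time.

    ASSUMPTION: busy cycle = stress, idle cycle = recovery.
    """
    stress_cycles = sum(1 for busy in busy_trace if busy)
    recovery_cycles = len(busy_trace) - stress_cycles
    return stress_cycles * cycle_time_s, recovery_cycles * cycle_time_s


def lifetime_years(delay_increase_by_year, threshold=0.25):
    """Return the first sampled year at which the delay increase reaches the threshold.

    delay_increase_by_year: iterable of (years, fractional delay increase).
    Returns None if the threshold is never reached in the samples.
    """
    for years, increase in sorted(delay_increase_by_year):
        if increase >= threshold:
            return years
    return None


if __name__ == "__main__":
    print(stress_recovery_times([True, False, True, True], 0.5))
    # Hypothetical delay samples, for illustration only.
    print(lifetime_years([(1, 0.10), (5, 0.20), (7, 0.25), (10, 0.30)]))
```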
Comparison of Approaches
                               NBTI                TDDB
Architecture-level Simulation  Lifetime: 5.7 Yrs   ?
Circuit-level Simulation       Lifetime: 7 Yrs     ?
Scaling Effect
We:
- Show the scaling effect for 65 nm, 45 nm, 32 nm and 22 nm
- Show the NBTI-induced increase in output delay for each technology node after 7 yrs: 65 nm (25%), 45 nm (27%), 32 nm (31%), 22 nm (46%)
- Conclude that an NBTI-aware ALU design is required
NBTI-aware ALU Design
We:
- Determine that 50% of the operands in the SPEC2000 INT benchmarks are of 16-bit size
- Partition the 64-bit ALU into four 8-bit and two 16-bit independent blocks to support 8-, 16-, 32- and 64-bit operations
- Aim to use idle time and narrow-width operands to increase the recovery time of PMOS devices
- Use power gating
- Use a round-robin mechanism so that all ALU blocks experience equal recovery time
- After 7 yrs the delay increase is only 10%, a 60% improvement over the non-NBTI-aware ALU (10% vs. 25%)
- Tradeoff!!
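The round-robin idea can be sketched in a few lines. The slides do not say how operations of each width map onto the six blocks, so the `BLOCK_GROUPS` mapping below is a hypothetical choice, as are the names `operand_width` and `RoundRobinAllocator`. Blocks 0 to 3 stand for the 8-bit blocks and blocks 4 and 5 for the 16-bit blocks; any block not selected for an operation is treated as power-gated and recovering.

```python
from itertools import cycle

# HYPOTHETICAL mapping from operation width to candidate block groups.
# Blocks 0-3 are the 8-bit blocks, blocks 4-5 are the 16-bit blocks.
BLOCK_GROUPS = {
    8: [(0,), (1,), (2,), (3,)],
    16: [(4,), (5,), (0, 1), (2, 3)],
    32: [(4, 5), (0, 1, 2, 3)],
    64: [(0, 1, 2, 3, 4, 5)],
}


def operand_width(value):
    """Smallest supported width (8/16/32/64) that holds an unsigned operand."""
    bits = max(value.bit_length(), 1)
    for width in (8, 16, 32, 64):
        if bits <= width:
            return width
    raise ValueError("operand wider than 64 bits")


class RoundRobinAllocator:
    """Rotate through the candidate block groups of each width."""

    def __init__(self, groups=BLOCK_GROUPS):
        self._next = {width: cycle(cands) for width, cands in groups.items()}
        self._all = sorted({b for cands in groups.values() for g in cands for b in g})

    def allocate(self, width):
        """Return (active blocks, power-gated blocks) for one operation."""
        active = next(self._next[width])
        gated = tuple(b for b in self._all if b not in active)
        return active, gated


if __name__ == "__main__":
    alloc = RoundRobinAllocator()
    for value in (0x7F, 0x12, 0x1234, 0xDEADBEEF):
        print(hex(value), alloc.allocate(operand_width(value)))
```

Rotating per width only approximates equal recovery time across blocks; a real design would also have to weigh the power-gating wake-up cost, which is the tradeoff the slide alludes to.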
TDDB – A quick guide
- The gate dielectric wears down over time due to the electric field; failure occurs when a short forms through the gate oxide
- Ultra-thin gate oxide breakdown is highly dependent on temperature, and also dependent on Vgs
[Figure: "The Split", Vgs less than 0.55 V vs. greater than 0.55 V; slice labels 55% and 45%]
Circuit-level Reliability Simulation
We:
- Use Pin to sample the inputs used when running gzip, and derive an input pattern from those samples
- Use the Cadence Spectre simulator
- Use a Kogge-Stone adder circuit for the ALU
- Feed the average temperature of the 1st ALU from the architecture-level reliability simulation into the Cadence framework
- Extract Vgs from every device in the Kogge-Stone adder
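Deriving an input pattern from sampled operands can be sketched as a per-bit count of how often each bit position is high. This is only an illustration: the function name `bit_high_counts` and the default 32-bit width are assumptions, and the sample values are made up rather than taken from the gzip run.

```python
def bit_high_counts(samples, width=32):
    """Count, for each bit position, how many sampled operands have that bit set.

    samples: iterable of non-negative integers (sampled operand values).
    width:   number of bit positions to track (ASSUMED, not from the slides).
    """
    counts = [0] * width
    for value in samples:
        for bit in range(width):
            counts[bit] += (value >> bit) & 1
    return counts


if __name__ == "__main__":
    # Made-up samples for illustration only.
    print(bit_high_counts([0b101, 0b001, 0b100], width=4))
```

A bit position that is high far more often than low (or the reverse) points to devices that spend most of their time in one state, which is what the circuit simulation needs in order to apply a representative input pattern.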
[Figure: Input Distribution. Series: inputs1, inputs2; x-axis: bit number (0 to 30); y-axis: # of times high (0 to 3,000,000)]
[Figure: 65nm Vgs Distribution. x-axis: Vgs; y-axis: number of devices]
Comparison of Approaches
                               NBTI                TDDB
Architecture-level Simulation  Lifetime: 5.7 Yrs   Lifetime: 5.09 Yrs
Circuit-level Simulation       Lifetime: 7 Yrs     Lifetime: 5.09 Yrs
Scaling Effect
We measured Vgs at each technology node; the effect of temperature still needs to be investigated.
                65 nm        45 nm        32 nm        22 nm
Vgs < Vdd/2
  Min Vgs       0 V          0 V          0 V          0 V
  Max Vgs       0.255177 V   0.248489 V   0.24522 V    0.255226 V
  Mean Vgs      0.032351 V   0.033159 V   0.034129 V   0.035485 V
  StdDev Vgs    0.077101 V   0.077045 V   0.077484 V   0.076488 V
Vgs > Vdd/2
  Min Vgs       1.099435 V   0.999175 V   0.89365 V    0.789839 V
  Max Vgs       1.1 V        1 V          0.9 V        0.8 V
  Mean Vgs      1.099834 V   0.99764 V    0.899567 V   0.797425 V
  StdDev Vgs    0.000117 V   0.000181 V   0.000335 V   0.001789 V
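Statistics like those in the table could be computed from per-device Vgs values with a short script. The sketch below is an assumption about the procedure, not the authors' tooling: the name `vgs_summary` is hypothetical, devices exactly at Vdd/2 are placed in the upper group, and the population standard deviation is used because the slides do not say which variant was reported.

```python
import statistics


def vgs_summary(vgs_values, vdd):
    """Summarize per-device Vgs, split into Vgs < Vdd/2 and Vgs >= Vdd/2."""
    half = vdd / 2.0
    low = [v for v in vgs_values if v < half]
    high = [v for v in vgs_values if v >= half]

    def stats(group):
        if not group:
            return None
        return {
            "min": min(group),
            "max": max(group),
            "mean": statistics.fmean(group),
            # ASSUMPTION: population standard deviation.
            "stdev": statistics.pstdev(group),
        }

    return {"low": stats(low), "high": stats(high)}


if __name__ == "__main__":
    # Made-up Vgs values for a 1.1 V supply, for illustration only.
    print(vgs_summary([0.0, 0.2, 1.0, 1.1], vdd=1.1))
```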
[Figure: 45nm Vgs Distribution. x-axis: Vgs; y-axis: number of devices]
[Figure: 32nm Vgs Distribution. x-axis: Vgs; y-axis: number of devices]
[Figure: 22nm Vgs Distribution. x-axis: Vgs; y-axis: number of devices]
Conclusion
- For some problems, like TDDB, the gap between architecture-level and circuit-level simulation is almost nonexistent
- For other problems, like NBTI, the gap is significant, and combining both approaches can yield better designs
Thank you
Questions?