Self-calibrating Online Wearout Detection
Authors: Jason Blome, Shuguang Feng, Shantanu Gupta, Scott Mahlke. MICRO-40, December 3, 2007.
![Page 1: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/1.jpg)
University of Michigan, Electrical Engineering and Computer Science
Self-calibrating Online Wearout Detection
Authors: Jason Blome, Shuguang Feng, Shantanu Gupta, Scott Mahlke
MICRO-40, December 3, 2007
![Page 2: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/2.jpg)
Motivation
“Designing Reliable Systems from Unreliable Components…”
- Shekhar Borkar (Intel)
[Srinivasan, DSN’04] [Borkar, MICRO’05]
• More failures to come
• Failures will be wearout induced
![Page 3: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/3.jpg)
Current Approaches
Traditional
• Design margins
• Burn-in
Detection: based on replication of computation
• TMR (Tandem/HP NonStop servers)
• DIVA (Bower, MICRO’05)
Prediction: utilizes precise analytical models and/or sensors
• Canary circuits (Sentinel Silicon, RidgeTop)
• RAMP (Srinivasan, UIUC/IBM)
Slide callouts on these approaches: Costly, Static, Impractical
![Page 4: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/4.jpg)
Wearout Mechanisms
Many failure mechanisms have been shown to be progressive
• Hot carrier injection (HCI)
• Electromigration (EM)
• Oxide Breakdown (OBD)
• Negative Bias Temperature Instability (NBTI)
[Figure: MOSFET cross-section (G, S, D, B terminals; N+ regions; P-well; oxide) annotated with gate currents Igs, Igd, Igb, Igcs, Igcd]
![Page 5: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/5.jpg)
Objective
Propose a failure prediction technique that exploits the progressive nature of wearout
Monitor impact on path delays
Prediction
• Monitors evolution of wearout
• Proactive
• Enables failure avoidance/mitigation
• Continuous feedback
• False negatives and positives
Detection
• Identifies existing fault
• Reactive
• Enables failure recovery
• End-of-life feedback
• False negatives
![Page 6: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/6.jpg)
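Oxide Breakdown (OBD)
Accumulation of defects leads to a conductive path
Percolation Model [Stathis, JAP’06]
[Figure: MOSFET cross-section (G, S, D, B terminals; N+ regions; P-well; oxide) showing defect accumulation in the gate oxide and the resulting change in oxide current, ΔIoxide]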
![Page 7: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/7.jpg)
OBD HSPICE Model
Post-breakdown leakage modeling [Rodriguez, Stathis, Linder, IRPS ’03]
$$I_{gd} = K \cdot I_{gd}^{0}$$
$$I_{gs} = K \cdot I_{gs}^{0}$$
$I_{gcs}$, $I_{gcd}$, and $I_{gb}$ remain unchanged [BSIM4.6.0, ’06]
[Figure: MOSFET cross-sections (G, S, D, B terminals; N+ regions; P-well) annotated with gate currents Igs, Igd, Igb, Igcs, Igcd]
![Page 8: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/8.jpg)
Characterization Testbench
90nm standard cell library
[Figure: testbench schematic with labels: DC, BUFX4, Gate (UUT), FO4; measured delays t_cell and t_circuit]
![Page 9: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/9.jpg)
Impact on Propagation Delay
![Page 10: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/10.jpg)
Delay Profiling Unit (DPU)
[Figure: DPU diagram with labels: uArch Module, input signal, Latency Sampling; sampled bit values (0s and 1s) shown at the sampling stage]
![Page 11: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/11.jpg)
TRIX Analysis
Magnitude of divergence between TRIX_global and TRIX_local reflects the amount of degradation
![Page 12: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/12.jpg)
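TRIX Analysis Details
Exponential Moving Average (EMA)
$$\mathrm{EMA}(t) = \mathrm{EMA}_{t-1} + \alpha \, (\mathrm{price}_t - \mathrm{EMA}_{t-1})$$
where $\alpha$ is defined by the window size
Triple-smoothed Exponential Moving Average
$$\mathrm{EMA1}(t) = \mathrm{EMA1}_{t-1} + \alpha \, (\mathrm{price}_t - \mathrm{EMA1}_{t-1})$$
$$\mathrm{EMA2}(t) = \mathrm{EMA2}_{t-1} + \alpha \, (\mathrm{EMA1}_t - \mathrm{EMA2}_{t-1})$$
$$\mathrm{EMA3}(t) = \mathrm{EMA3}_{t-1} + \alpha \, (\mathrm{EMA2}_t - \mathrm{EMA3}_{t-1})$$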
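The EMA recurrences on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' hardware: the relation alpha = 2 / (window + 1) is the common convention and is assumed here (the slide only says that alpha is defined by the window size), each average is seeded with its first input, and the function names are made up for the example. Following the slide, the TRIX value is taken to be the third, triple-smoothed EMA.

```python
from typing import Iterable, List


def alpha_from_window(window: int) -> float:
    """Smoothing factor for a window size.

    Assumption: alpha = 2 / (window + 1), a common convention; the slide
    does not give the exact relation.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return 2.0 / (window + 1)


def ema(samples: Iterable[float], alpha: float) -> List[float]:
    """EMA(t) = EMA(t-1) + alpha * (sample(t) - EMA(t-1)).

    Assumption: the average is seeded with the first sample.
    """
    out: List[float] = []
    prev = None
    for x in samples:
        prev = x if prev is None else prev + alpha * (x - prev)
        out.append(prev)
    return out


def trix(samples: Iterable[float], window: int) -> List[float]:
    """Triple-smoothed EMA: EMA1 of the samples, EMA2 of EMA1, EMA3 of EMA2."""
    a = alpha_from_window(window)
    return ema(ema(ema(samples, a), a), a)
```

Calling `trix` with a small window follows recent latency samples closely, while a large window changes slowly and behaves like a long-term baseline.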
![Page 13: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/13.jpg)
Noisy Latency Profile
[Figure: line chart of Raw Latency Profile, Trix Profile (local), and Trix Profile (global). Y-axis: Percent Nominal Delay (%), 94 to 110. X-axis: Increasing Age]
![Page 14: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/14.jpg)
DPU with TRIX Hardware
[Figure: DPU diagram with labels: input signal, Latency Sampling, TRIXl Calculation, TRIXg Calculation, Prediction; sampled bit values (0s and 1s) shown at the sampling stage]
![Page 15: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/15.jpg)
Wearout Detection Unit (WDU)
[Figure: WDU diagram with labels: Latency Sampling, TRIXl Calculation, TRIXg Calculation, a "+" node, Prediction]
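The WDU structure (latency sampling feeding a local and a global TRIX calculation, followed by a prediction stage) can be mimicked in software as a streaming detector. The sketch below is only an illustration: the class names, the seeding of each stage with the first sample, and the fixed `threshold` on the divergence are assumptions made for the example, not details taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TripleEMA:
    """Streaming triple-smoothed EMA, one update per latency sample."""
    alpha: float
    stages: List[float] = field(default_factory=list)

    def update(self, sample: float) -> float:
        if not self.stages:
            # Assumption: seed all three stages with the first sample.
            self.stages = [sample] * 3
            return sample
        x = sample
        for i in range(3):
            # Each stage smooths the output of the previous stage.
            self.stages[i] += self.alpha * (x - self.stages[i])
            x = self.stages[i]
        return x


@dataclass
class WearoutDetector:
    local: TripleEMA    # larger alpha: follows recent latency
    global_: TripleEMA  # smaller alpha: long-term baseline
    threshold: float    # hypothetical divergence limit, in latency units

    def observe(self, latency: float) -> bool:
        """Feed one latency sample; return True when wearout is predicted."""
        trix_l = self.local.update(latency)
        trix_g = self.global_.update(latency)
        # The magnitude of the divergence is compared with the threshold.
        return abs(trix_l - trix_g) > self.threshold
```

Because the baseline comes from the monitored signal itself rather than from a fixed nominal delay, no per-chip calibration constant is needed in this sketch; only the threshold and the two smoothing factors are tuned.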
![Page 16: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/16.jpg)
Evaluation Framework
[Figure: evaluation flow diagram with the following blocks]
• OR1200 Verilog
• 90nm Library
• Synthesis and Place and Route
• Fully Synthesized, P&R, OR1200 Core
• MediaBench Suite
• Timing, Power, and Temperature Simulations
• OBD Wearout Model
• HSPICE Simulations
• Monte Carlo Simulator
• Gate-level Processor Simulator
• Workload Simulator
• Wearout Simulator
![Page 17: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/17.jpg)
WDU Accuracy
[Figure: bar chart by module (ALU, Register File, LSU, Next PC). Y-axis: Percentage (%), 0 to 120. Series: Life Expended, Signals Flagged]
![Page 18: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/18.jpg)
WDU Overhead
[Figure: chart of overhead versus # Signals Monitored (1, 2, 4, 8). Y-axis: Percentage Overhead (%), 0 to 50. Series: Area-Hybrid, Area-Hardware, Power-Hybrid, Power-Hardware]
![Page 19: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/19.jpg)
WDU Overhead
[Figure: chart of overhead versus # Signals Monitored (1, 2, 4, 8). Y-axis: Percentage Overhead (%), 0 to 3. Series: Area-Hybrid, Area-Hardware, Power-Hybrid, Power-Hardware]
![Page 20: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/20.jpg)
Long-term Vision
Introspective Reliability Management (IRM)
• Intelligent reliability management directed by on-chip sensor feedback
Prospective sensors
• Delay (WDU)
• Leakage/Vt
• Temperature
![Page 21: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/21.jpg)
Introspective Reliability Management
[Figure: IRM block diagram with the following labels]
• Sensors: WDU (five instances), producing Raw Sensor Data
• Filtering and Analysis, producing a Filtered Data Stream
• Aggregate Analysis / Runtime Analysis, producing Processed Data
• Virtualization Layer: Reliability Assessment
• OS: Scheduled Jobs, IRM Policy
• Actions: Job Assignment, Thread Migration, Reconfiguration, Power/CLK Gating, DVFS Configuration (DVFS Settings)
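As a rough illustration of how an IRM policy might turn processed sensor data into one of the actions named on the slide, the sketch below maps the fraction of WDUs that flag wearout on each core to an action. Everything specific here is hypothetical: the function and enum names, the per-core fraction as the input, and the 0.25 / 0.5 / 0.75 thresholds are assumptions for the example and do not come from the slides.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    NO_ACTION = "no action"
    DVFS = "DVFS configuration"
    THREAD_MIGRATION = "thread migration"
    POWER_CLK_GATING = "power/CLK gating"


def choose_actions(flagged: Dict[str, float]) -> Dict[str, Action]:
    """Pick one action per core from the fraction of its WDUs flagging wearout.

    Hypothetical thresholds: below 0.25 do nothing, below 0.5 adjust DVFS,
    below 0.75 migrate threads away, otherwise gate the core.
    """
    actions: Dict[str, Action] = {}
    for core, fraction in flagged.items():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction out of range for {core}: {fraction}")
        if fraction < 0.25:
            actions[core] = Action.NO_ACTION
        elif fraction < 0.5:
            actions[core] = Action.DVFS
        elif fraction < 0.75:
            actions[core] = Action.THREAD_MIGRATION
        else:
            actions[core] = Action.POWER_CLK_GATING
    return actions
```

A real policy would also weigh the scheduled jobs and the reliability assessment shown in the diagram; the sketch only shows the shape of the sensor-to-action mapping.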
![Page 22: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/22.jpg)
Conclusions
Many progressive wearout phenomena impact device-level performance.
It’s possible to characterize this impact and anticipate failures.
WDU performance
• Failure predicted within 20% of end of life (tunable)
• Area overhead < 3% (hybrid)
Low-level sensors can be used to enable intelligent reliability management.
![Page 23: Self-calibrating Online Wearout Detection Authors: Jason Blome Shuguang Feng](https://reader036.vdocuments.mx/reader036/viewer/2022062315/568159f3550346895dc73ca5/html5/thumbnails/23.jpg)
Questions?