Posted on 21-Dec-2015
University of Michigan Electrical Engineering and Computer Science
Cost-Efficient Soft Error Protection for Embedded Microprocessors
Jason Blome¹, Shuguang Feng¹, Shantanu Gupta¹, Scott Mahlke¹, Daryl Bradley²
¹University of Michigan
²ARM, Ltd.
The Soft Error Problem
[Figure: a transient fault at a flip-flop (D, CLK, Q) changes a stored 0 into a 1, becoming a soft error]
Fault Masking
• Logical: faulted value does not affect logical operation of the circuit
• Latching-Window: the fault pulse does not reach a state element within the latching window
• Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit
• Architectural/Software: incorrect state is overwritten before it is read
[Figure: latching window spanning t_setup before and t_hold after the CLK edge; register file example (entries 0 to 5, with decoder) running mov r5, 8; mov r2, 4; add r6, r2, r5]
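As an illustration of three of the masking categories above, the following Python sketch models logical masking at a single AND gate, latching-window masking as an interval-overlap test, and architectural masking as a scan for a write before a read, using the slide's instruction sequence. Electrical masking is not modeled. The function names, the (dest, sources) instruction encoding, and the timing numbers are hypothetical choices for illustration, not part of the original analysis.

```python
from typing import List, Tuple

# Hypothetical instruction encoding: (dest_reg, source_regs).
Instr = Tuple[int, Tuple[int, ...]]


def logically_masked(a: int, b: int) -> bool:
    """Logical masking at a 2-input AND gate: flipping input `a`
    leaves the output unchanged whenever the other input `b` is 0."""
    return (a & b) == ((a ^ 1) & b)


def latching_window_masked(pulse_start: float, pulse_width: float,
                           clk_edge: float, t_setup: float, t_hold: float) -> bool:
    """Latching-window masking: the glitch misses the interval
    [clk_edge - t_setup, clk_edge + t_hold] in which the flip-flop samples."""
    window_start = clk_edge - t_setup
    window_end = clk_edge + t_hold
    pulse_end = pulse_start + pulse_width
    return pulse_end < window_start or pulse_start > window_end


def architecturally_masked(program: List[Instr], faulted_reg: int, fault_index: int) -> bool:
    """Architectural masking: a register corrupted just before instruction
    `fault_index` is overwritten before any later instruction reads it."""
    for dest, srcs in program[fault_index:]:
        if faulted_reg in srcs:
            return False          # corrupted value is consumed
        if dest == faulted_reg:
            return True           # corrupted value is overwritten first
    return True                   # never read within this program (assumption)


# The slide's sequence: mov r5, 8 ; mov r2, 4 ; add r6, r2, r5
SLIDE_PROGRAM: List[Instr] = [(5, ()), (2, ()), (6, (2, 5))]

print(logically_masked(1, 0))                             # True
print(latching_window_masked(5.0, 1.0, 10.0, 0.5, 0.3))   # True
print(architecturally_masked(SLIDE_PROGRAM, 5, 0))        # True: r5 rewritten by mov r5, 8
print(architecturally_masked(SLIDE_PROGRAM, 2, 2))        # False: r2 read by add
```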
Soft Error Rate Trends
[Figure: soft error rate trends (Shivakumar 2002)]
Soft Error Rate Contributions (Mitra 2005)
• Sequential elements: 49%
• Unprotected SRAMs: 40%
• Static combinational logic: 11%
Increasing contribution of faults in combinational logic to the overall soft error rate
Outline
• Soft error analysis setup
• Summary of fault analysis results
• Fault tolerance techniques
► Register value cache
► Strategic deployment of fault detectors
• Conclusion
Fault Analysis Framework
[Figure: fault injection/error analysis framework. A testbench runs each benchmark on a reference design and a test design of the ARM926EJ-S core, with a fault injection scheduler, error checking and logging, and report generation. Core blocks shown: register bank, data interface, instruction address logic, data address logic, multiply, ALU, shift, instruction decode, instruction fetch, instruction and data caches, MMUs, mux array, and write buffer/bus interface.]
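The framework compares a test design, into which faults are injected, against a fault-free reference design. The Python sketch below shows that lockstep compare loop on a deliberately tiny, hypothetical two-stage core (a pipeline latch feeding a register file). The core model, the `step` and `inject_and_compare` names, and the single-bit-flip fault model are assumptions for illustration, not the gate-level ARM926EJ-S setup used in the study. It reports the first cycle at which microarchitectural state (the latch) and architectural state (the register file) diverge, mirroring the two error sites in the results.

```python
import copy
from typing import Dict, List, Optional, Tuple

# Hypothetical toy core. Each instruction is (dest, src_a, src_b): dest <- src_a + src_b.
Instr = Tuple[int, int, int]


def step(state: Dict, program: List[Instr], cycle: int) -> Dict:
    """Advance the toy core one cycle: write back the latched result
    (architectural update), then execute the next instruction into the
    pipeline latch (microarchitectural state)."""
    new = copy.deepcopy(state)
    if new["latch_dest"] is not None:
        new["regs"][new["latch_dest"]] = new["latch_val"]
    if cycle < len(program):
        dest, a, b = program[cycle]
        new["latch_dest"] = dest
        new["latch_val"] = (new["regs"][a] + new["regs"][b]) & 0xFFFFFFFF
    else:
        new["latch_dest"], new["latch_val"] = None, 0
    return new


def inject_and_compare(program: List[Instr], init: Dict,
                       inject_cycle: int, bit: int) -> Dict[str, Optional[int]]:
    """Run reference and test copies in lockstep, flip one bit of the test
    copy's latch at `inject_cycle`, and report the first cycle at which
    microarchitectural and architectural state diverge (None if never)."""
    ref, test = copy.deepcopy(init), copy.deepcopy(init)
    first: Dict[str, Optional[int]] = {"uarch": None, "arch": None}
    for cycle in range(len(program) + 2):
        if cycle == inject_cycle:
            test["latch_val"] ^= 1 << bit          # single-bit transient fault
        latch_ref = (ref["latch_dest"], ref["latch_val"])
        latch_test = (test["latch_dest"], test["latch_val"])
        if first["uarch"] is None and latch_test != latch_ref:
            first["uarch"] = cycle
        if first["arch"] is None and test["regs"] != ref["regs"]:
            first["arch"] = cycle
        ref = step(ref, program, cycle)
        test = step(test, program, cycle)
    return first


PROGRAM: List[Instr] = [(2, 0, 1), (3, 2, 2)]      # r2 <- r0 + r1 ; r3 <- r2 + r2
INIT = {"regs": [1, 2, 0, 0], "latch_dest": None, "latch_val": 0}

print(inject_and_compare(PROGRAM, INIT, inject_cycle=0, bit=0))  # latch never written back: masked
print(inject_and_compare(PROGRAM, INIT, inject_cycle=1, bit=0))  # latch written back: architectural error
```

In the first run the flipped latch holds no pending result, so only microarchitectural state is ever wrong; in the second the corrupted sum is written to r2 one cycle later.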
Observed Error Rates
Faults Occurring in Registers
Error Site                  Error Rate
Microarchitectural State    94%
Architectural State         7%

Faults Occurring in Combinational Logic
Error Site                  Error Rate
Microarchitectural State    16%
Architectural State         4%

• At the software interface (architectural state), the two error rates are within 3 percentage points of each other (7% vs. 4%)
Impact of Fault Injection
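[Figure: number of errors (0 to 50) versus cycle after fault injection (0 to 20), with four series: combinational logic faults causing microarchitectural state errors, combinational logic faults causing architectural state errors, sequential state faults causing microarchitectural state errors, and sequential state faults causing architectural state errors]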
Targeting the Faults that Count
• ARM926EJ-S register file consumes 8.7% of total core area
► Responsible for 57.4% of architectural errors
• Register file area dominated by combinational logic
► ECC: at what cost, and how effective against faults in that logic?
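A quick calculation using only the two percentages above shows how concentrated the register file's contribution is. The sketch assumes, as a baseline, that architectural errors would otherwise be spread uniformly over core area; that baseline is an illustrative assumption, not a claim from the slides.

```python
# Percentages from the slide (ARM926EJ-S).
rf_area_share = 0.087          # register file share of core area
rf_arch_error_share = 0.574    # register file share of architectural errors

# Architectural errors per unit area, relative to a uniform spread over the core.
rf_density = rf_arch_error_share / rf_area_share
rest_density = (1 - rf_arch_error_share) / (1 - rf_area_share)

print(f"register file: {rf_density:.1f}x, rest of core: {rest_density:.2f}x")
# register file: 6.6x, rest of core: 0.47x
```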
The Register Value Cache
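[Figure: register value cache placed beside the register file (entries 0 to 5, with decoder). Read/write addresses and data go to both; comparators (CMP) check cached values against the register file's read result, and a mismatch triggers Stall/Check CRC.]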
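The Register Value Cache
[Figure: register value cache organization, shown for the read, write, and check operations. The read/write address is matched against an index array with valid bits; a value array holds previous read values and write data together with their CRCs; a comparator (CMP) on the read data raises an error on a mismatch.]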
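The read, write, and check operations can be sketched behaviorally in Python. This is a sketch under stated assumptions, not the authors' design: the cache size (4 entries), FIFO replacement, CRC-32 via `zlib.crc32`, filling on both writes and read misses, and the rule for resolving a mismatch (trust the cached copy if its CRC still matches, otherwise trust the register file) are all illustrative choices. The usage lines replay the example slide with a hypothetical decoder fault that turns reads of r2 and r5 into reads of r1 and r4.

```python
import zlib
from collections import OrderedDict


def crc(value: int) -> int:
    """CRC-32 of a 32-bit register value (the CRC width is an assumption)."""
    return zlib.crc32((value & 0xFFFFFFFF).to_bytes(4, "little"))


class RegisterValueCache:
    """Hypothetical sketch: a small FIFO cache of recent register values,
    each stored with its CRC, used to cross-check register file reads."""

    def __init__(self, entries: int = 4) -> None:
        self.entries = entries
        self.lines: OrderedDict = OrderedDict()   # reg -> (value, crc)
        self.stalls = 0

    def fill(self, reg: int, value: int) -> None:
        """Write operation: record the value and its CRC, evicting the oldest entry."""
        self.lines[reg] = (value, crc(value))
        self.lines.move_to_end(reg)
        if len(self.lines) > self.entries:
            self.lines.popitem(last=False)

    def check_read(self, reg: int, rf_value: int) -> int:
        """Read operation: compare the register file output with the cached copy.
        On a mismatch, stall and use the CRC to decide which copy to trust."""
        if reg not in self.lines:
            self.fill(reg, rf_value)              # miss: nothing to compare against
            return rf_value
        cached, stored_crc = self.lines[reg]
        if cached == rf_value:
            return rf_value
        self.stalls += 1                          # check operation
        if crc(cached) == stored_crc:
            return cached                         # cached copy intact: RF read was wrong
        self.fill(reg, rf_value)                  # cached copy corrupted: trust the RF
        return rf_value


rf = [0] * 8
rvc = RegisterValueCache()
for reg, val in [(5, 8), (2, 4)]:                 # mov r5, 8 ; mov r2, 4
    rf[reg] = val
    rvc.fill(reg, val)

faulty_decode = {2: 1, 5: 4}                      # hypothetical decoder fault
a = rvc.check_read(2, rf[faulty_decode[2]])
b = rvc.check_read(5, rf[faulty_decode[5]])
print(a + b, rvc.stalls)                          # add r3, r2, r5 -> 12 2
```

In this sketch the cache is looked up by the intended register number rather than through the register file's decoder, which is what lets a decode fault show up as a mismatch.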
Example
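[Figure: worked example with the register file (entries 0 to 5, with decoder) and the register cache. Instruction sequence: mov r5, 8; mov r2, 4; add r3, r2, r5, with a faulted version of the add shown as add r3, r1, r4. The register cache holds the values 8 and 4 with their CRCs, and a mismatch triggers Check CRC.]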
What About the Rest?
• Leverage fault fanout to place detectors at likely targets
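The slide does not spell out the placement algorithm. One plausible reading of "leverage fault fanout" is sketched below in Python: treat the netlist as a directed graph, count how many combinational gates can propagate a fault to each flip-flop, and place detectors on the most-targeted flip-flops within a budget. The netlist format, the `place_detectors` name, and the ranking rule are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Set


def reachable_flops(netlist: Dict[str, List[str]], flops: Set[str], start: str) -> Set[str]:
    """State elements a fault at gate `start` can propagate to
    (depth-first search through combinational fanout, stopping at flip-flops)."""
    seen, stack, hit = {start}, [start], set()
    while stack:
        node = stack.pop()
        for succ in netlist.get(node, []):
            if succ in flops:
                hit.add(succ)
            elif succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return hit


def place_detectors(netlist: Dict[str, List[str]], flops: Set[str], budget: int) -> List[str]:
    """Rank flip-flops by how many combinational fault sites fan out to them
    and return the top `budget` as detector locations."""
    counts: Counter = Counter()
    for gate in netlist:
        if gate not in flops:
            counts.update(reachable_flops(netlist, flops, gate))
    return [ff for ff, _ in counts.most_common(budget)]


# Hypothetical netlist: gate -> fanout list.
NETLIST = {"g1": ["g2", "g3"], "g2": ["ffA"], "g3": ["ffA", "ffB"], "g4": ["ffC"]}
FLOPS = {"ffA", "ffB", "ffC"}
print(place_detectors(NETLIST, FLOPS, budget=2))   # ['ffA', 'ffB']
```

A fuller version could weight each fault site by its probability of escaping logical, electrical, and latching-window masking (for example, estimated from fault injection) rather than counting reachability alone.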
Transient Fault Detector
[Figure: transient fault detector. The main flip-flop (D, CLK, Q) is paired with a shadow latch fed through a delay; a mismatch between the two raises Error.]
A Self-Tuning DVS Processor Using Delay-Error Detection and Correction: S. Das 2006
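The detector's behavior can be approximated with a simple timing model in Python. This is a behavioral sketch, not a circuit model: it assumes the shadow latch samples D a fixed delay after the main flip-flop and that Error is raised when the two sampled values differ. The glitch shape, the names, and the timing values are hypothetical.

```python
from typing import Callable, Tuple

Signal = Callable[[float], int]


def glitchy_signal(base: int, glitch_start: float, glitch_width: float) -> Signal:
    """Hypothetical D input: holds `base` except during a transient glitch."""
    def d(t: float) -> int:
        in_glitch = glitch_start <= t < glitch_start + glitch_width
        return base ^ 1 if in_glitch else base
    return d


def sample_with_shadow(d: Signal, clk_edge: float, delay: float) -> Tuple[int, bool]:
    """Main flip-flop samples D at the clock edge; the shadow latch samples
    it `delay` later. Error is raised when the two values disagree."""
    main = d(clk_edge)
    shadow = d(clk_edge + delay)
    return main, main != shadow


CLK, DELAY = 10.0, 0.5
print(sample_with_shadow(glitchy_signal(0, 9.9, 0.2), CLK, DELAY))   # (1, True): caught
print(sample_with_shadow(glitchy_signal(0, 9.8, 1.0), CLK, DELAY))   # (1, False): missed
print(sample_with_shadow(glitchy_signal(0, 5.0, 0.2), CLK, DELAY))   # (0, False): no fault
```

In this model a glitch that outlasts the delay corrupts both samples and goes undetected, so the delay bounds the widest transient the detector can catch; a glitch that hits only the shadow sample raises Error even though the main flip-flop captured the correct value.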
Glitch Detector Coverage
[Figure: two plots of fault coverage, one against percent power overhead and one against percent area overhead]
Combined Technique Coverage
[Figure: two plots of fault coverage, one against percent power overhead and one against percent area overhead]
Conclusion
• Circuit-level soft error analysis offers significant insight
• Faults in combinational logic do not require structural duplication
► Coverage versus cost tradeoffs are available
► Significant benefits from compromise designs
• 85% fault coverage for only 5.5% area overhead
► 2-3x increase in MTTF