ELEC 7770 Advanced VLSI Design, Spring 2014
Soft Errors and Fault-Tolerant Design

Vishwani D. Agrawal, James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr14

Spring 2014, Apr 11 . . . ELEC 7770: Advanced VLSI Design (Agrawal)
Soft Errors

- Soft errors are errors caused by the operating environment.
- They are not due to a permanent hardware fault.
- Soft errors are intermittent or random, which makes testing for them unreliable.
- One way to deal with soft errors is to make hardware robust:
  - Capable of detecting soft errors
  - Capable of correcting soft errors
  - Both measures are probabilistic
Some Early References

- J. von Neumann, “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components,” pp. 329-378, 1959, in A. H. Taub, editor, John von Neumann: Collected Works, Volume V: Design of Computers, Theory of Automata and Numerical Analysis, Oxford University Press, 1963.
- M. A. Breuer, “Testing for Intermittent Faults in Digital Circuits,” IEEE Trans. Computers, vol. C-22, no. 3, pp. 241-246, March 1973.
- T. C. May and M. H. Woods, “Alpha-Particle-Induced Soft Errors in Dynamic Memories,” IEEE Trans. Electron Devices, vol. ED-26, no. 1, pp. 2-9, 1979.
Causes of Soft Errors

- Interconnect coupling (crosstalk).
- Power supply noise: IR-drop, power droop, ground bounce.
- Ignition noise.
- Electromagnetic pulse (EMP).
- Effects generally attributed to alpha-particles:
  - Charged particles: electrons, protons, ions.
  - Radiation (photons): X-rays, gamma-rays, ultra-violet light.
Sources of Alpha-Particles

- Radioactive contamination in VLSI packaging material.
- Ionosphere, magnetosphere and solar radiation.
- Other electromagnetic radiation.
Alpha-Particle

- Helium nucleus: two protons and two neutrons, mass = $6.65 \times 10^{-27}$ kg, charge = +2e (e = $1.6 \times 10^{-19}$ C).
- Rest-mass energy ($mc^2$) = 3.73 GeV
Soft Error Rate (SER)

- Failures in time (FIT): One FIT is 1 error per billion hours of operation.
- Alternative unit is mean time between failures (MTBF) or mean time to failure (MTTF).
- 1 year MTBF = $10^9/(365 \times 24)$ = 114,155 FIT
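The FIT and MTBF units above are reciprocal views of the same rate. The following is a minimal sketch of the conversion; the function names are illustrative and not part of the lecture, and a year is taken as 365 days as on the slide.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, leap years ignored as on the slide


def mtbf_hours_to_fit(mtbf_hours: float) -> float:
    """Convert an MTBF in hours to a rate in FIT (errors per 1e9 hours)."""
    return 1e9 / mtbf_hours


def fit_to_mtbf_hours(fit: float) -> float:
    """Convert a rate in FIT back to an MTBF in hours."""
    return 1e9 / fit


one_year_fit = mtbf_hours_to_fit(HOURS_PER_YEAR)
print(f"1-year MTBF = {one_year_fit:,.0f} FIT")
```

Because the conversion is a simple reciprocal, a part rated at 1,000 FIT has an MTBF of one million hours.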
Particle Strike

[Figure: an ion or charged particle strikes an n region in a p-substrate, leaving + and - charges along its path.]
Induced Current

[Figure: induced current versus time.]

$I(t) = I_0 \left(e^{-t/a} - e^{-t/b}\right), \quad a \gg b$
Voltage Induced at a Node

$V = Q/C$

where $Q = \int I(t)\,dt$ and C = node capacitance.

Smaller node capacitance will result in larger voltage swing.
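As a worked illustration of V = Q/C, the sketch below integrates the double-exponential current pulse I(t) = I0(e^(-t/a) - e^(-t/b)). Integrating from 0 to infinity gives the closed form Q = I0(a - b). The numeric values of I0, a, b and C are assumptions chosen only for illustration; they do not come from the lecture.

```python
import math


def induced_current(t, i0, a, b):
    """Double-exponential current pulse I(t) = I0 * (exp(-t/a) - exp(-t/b))."""
    return i0 * (math.exp(-t / a) - math.exp(-t / b))


def collected_charge(i0, a, b):
    """Closed-form integral of I(t) over [0, infinity): Q = I0 * (a - b)."""
    return i0 * (a - b)


def numeric_charge(i0, a, b, steps=20_000):
    """Trapezoidal integration of I(t) over [0, 40a] as a cross-check."""
    dt = 40 * a / steps
    total = 0.0
    for k in range(steps):
        left = induced_current(k * dt, i0, a, b)
        right = induced_current((k + 1) * dt, i0, a, b)
        total += 0.5 * (left + right) * dt
    return total


def node_voltage(q, c):
    """Voltage change produced by charge q on node capacitance c."""
    return q / c


# Assumed illustrative values: 100 uA peak scale, a = 200 ps, b = 50 ps.
I0, A, B = 100e-6, 200e-12, 50e-12
q = collected_charge(I0, A, B)
for c in (10e-15, 50e-15):  # assumed node capacitances of 10 fF and 50 fF
    print(f"C = {c * 1e15:.0f} fF -> V = {node_voltage(q, c):.2f} V")
```

With the same collected charge, the 10 fF node sees five times the voltage change of the 50 fF node, which is the slide's point about small node capacitance.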
Effect on Digital Circuit

[Figure: clocked (CK) digital circuit with IN, combinational logic and OUT; charged particles strike the circuit at two places.]
An SRAM Cell

[Figure: SRAM cell with supply VDD, word line WL, two bit lines BL, and two storage nodes labeled bit holding 0 and 1.]
SRAM Cell Struck by Alpha-Particle: Single-Event Upset (SEU)

[Figure: the SRAM cell (VDD, WL, two bit lines BL, two storage nodes labeled bit) struck by charged particles; the stored values change 0→1 and 1→0.]
A Resistor Hardened SRAM Cell

[Figure: resistor hardened SRAM cell with supply VDD, word line WL, two bit lines BL, and two storage nodes labeled bit holding 0 and 1.]
D-Latch

[Figure: D-latch with input D and CK = 0; the two outputs labeled Q hold 1 and 0.]
SEU in D-Latch

[Figure: D-latch with input D and CK = 0 struck by charged particles; the two outputs labeled Q change 1→0 and 0→1.]
Single Event Transients in Combinational Logic

[Figure: combinational logic between clocked (CK) elements, annotated with signal values 1, 1, 0, 1, 0, 1, and struck by charged particles.]
Effects of Transients

- Error correcting effects
  - Transient pulse is filtered by gate inertia
  - Transient is blocked by an unsensitized path
  - Transient is blocked by an inactive clock
- Error enhancing effects
  - Large number of gates can produce multiple pulses
  - Fanouts can multiply error pulses
Typical Soft Error Distribution

[Figure: typical soft error distribution.] Source:
S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.
Soft Error Simulation

- F. Wang and V. D. Agrawal, “Soft Error Rate with Inertial and Logical Masking,” Proc. 22nd International Conference on Quality VLSI Design, January 2009, pp. 459-464.
- F. Wang and V. D. Agrawal, “Soft Error Rate Determination for Nanoscale Sequential Logic,” Proc. 11th International Symposium on Quality Electronic Design (ISQED), March 2010, pp. 225-230.
SEUs in FPGA

- Parts that can be affected
  - Look-up table (LUT)
  - Configuration memory cell
  - Flip-flop
  - Block RAM
- F. L. Kastensmidt, L. Carro and R. Reis, Fault-Tolerant Techniques for SRAM-Based FPGAs, Springer, 2006.
LUT

[Figure: 4-input LUT with inputs F1, F2, F3, F4 and output out; its 16 memory cells hold 1 0 1 1 0 1 1 0 0 0 0 0 1 1 1 0.]
SEU in LUT

[Figure: 4-input LUT with inputs F1, F2, F3, F4 and output out, struck by a charged particle; its 16 memory cells now hold 1 0 1 0 0 1 1 0 0 0 0 0 1 1 1 0, with the fourth cell changed from 1 to 0.]
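An SEU in a LUT memory cell changes the implemented logic function for exactly one input combination. The sketch below models this with the 16 cell values shown in the figure. The mapping from F1..F4 to a cell index (F1 as the most significant address bit) is an assumption, since the transcript does not preserve the multiplexer wiring.

```python
# Memory-cell contents of the LUT before the upset, in the order listed in the figure.
LUT_CELLS = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0]


def lut_read(cells, f1, f2, f3, f4):
    """Return the LUT output; F1 is assumed to be the most significant address bit."""
    index = (f1 << 3) | (f2 << 2) | (f3 << 1) | f4
    return cells[index]


def inject_seu(cells, index):
    """Return a copy of the cell list with the bit at `index` flipped."""
    upset = list(cells)
    upset[index] ^= 1
    return upset


# Flip the fourth cell (index 3), as in the figure.
upset_cells = inject_seu(LUT_CELLS, 3)
mismatches = [addr for addr in range(16) if LUT_CELLS[addr] != upset_cells[addr]]
print("input combinations with a wrong output:", mismatches)
```

Unlike a transient in user logic, this error persists until the configuration memory is rewritten, because the LUT contents define the function itself.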
Four Types of SEU in FPGA

[Figure: FPGA logic with a LUT (inputs F1, F2, F3, F4), a flip-flop (FF), configuration memory cells (M) and a block RAM; four upset locations are marked Type 1, Type 2, Type 3 and Type 4.]
SEU Detection Methods

- Hardware redundancy
- Time redundancy
- Error detection codes (EDC)
- Self-checker techniques
SEU Mitigation Techniques

- Triple modular redundancy (TMR)
- Multiple redundancy with voting
- Error detection and correction codes (EDAC)
- Hardened memory cells
- FPGA-specific methods
  - Reconfiguration
  - Partial configuration
  - Rerouting design
Hardware Redundancy for Detection

[Figure: the inputs feed the combinational logic and a duplicated copy of it; besides the output, an error signal is produced, where logic 1 indicates error.]

- Hardware overhead is high (~100%).
- Performance penalty is negligible.
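Duplication with comparison can be sketched in a few lines. The combinational function `logic` and the `flip_copy1` fault-injection flag below are hypothetical, introduced only to illustrate the idea; the comparison is modeled as an XOR of the two copies' outputs.

```python
def logic(a, b, c):
    """Hypothetical combinational function, used only for illustration."""
    return (a & b) | c


def duplicated_with_compare(a, b, c, flip_copy1=False):
    """Evaluate two copies of the logic and flag any disagreement.

    flip_copy1 models a soft error that inverts the output of the first copy.
    Returns (output, error), where error == 1 indicates a mismatch.
    """
    out1 = logic(a, b, c) ^ int(flip_copy1)
    out2 = logic(a, b, c)
    error = out1 ^ out2
    return out1, error


print(duplicated_with_compare(1, 1, 0))                   # fault-free
print(duplicated_with_compare(1, 1, 0, flip_copy1=True))  # upset in copy 1
```

The scheme only detects: a mismatch does not say which copy is wrong, and an identical error in both copies goes unnoticed.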
Time Redundancy for Detection

[Figure: combinational logic with inputs and output; two D flip-flops clocked at CK and CK + d; logic 1 indicates error.]

- Hardware overhead is low.
- Performance penalty (~ d) = maximum detectable pulse width.
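The relation between the delay d and the detectable pulse width can be illustrated with a small timing model. This is a sketch under simplifying assumptions: the logic output is an ideal binary signal, a transient inverts it for `pulse_width` time units starting at `pulse_start`, and the two flip-flops sample instantaneously at CK and CK + d. All names and numbers are illustrative.

```python
def signal_at(t, correct, pulse_start, pulse_width):
    """Logic output at time t: the correct value, inverted while a transient is present."""
    in_pulse = pulse_start <= t < pulse_start + pulse_width
    return correct ^ int(in_pulse)


def time_redundant_sample(correct, t_ck, d, pulse_start, pulse_width):
    """Sample the output at t_ck and t_ck + d; error = 1 when the samples differ."""
    first = signal_at(t_ck, correct, pulse_start, pulse_width)
    second = signal_at(t_ck + d, correct, pulse_start, pulse_width)
    return first, second, first ^ second


# Clock edge at t = 10, second sample delayed by d = 2 (arbitrary time units).
print(time_redundant_sample(1, 10, 2, pulse_start=9.5, pulse_width=1))  # narrow pulse
print(time_redundant_sample(1, 10, 2, pulse_start=9.5, pulse_width=3))  # pulse wider than d
```

In this model the narrow pulse corrupts only the first sample and is flagged, while the pulse wider than d corrupts both samples identically and escapes detection.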
Repeat on Error Detection

[Figure: combinational logic with inputs; two D flip-flops clocked at CK and CK + d; a C-element (C) producing the output; logic 1 indicates error.]

Operation: If error is detected, then output retains its previous value. Repeating the computation can produce correct result.
Muller C-Element

[Figure: C-element symbol with inputs A, B and output, and an implementation built around an SR latch (S, R, Q).]

| A | B | output |
|---|---|--------|
| 0 | 0 | 0 |
| 0 | 1 | Old output |
| 1 | 0 | Old output |
| 1 | 1 | 1 |
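The truth table above describes a state-holding element: the output follows the inputs when they agree and keeps its old value when they disagree. The following is a minimal behavioral sketch; the class name and the choice of initial state are assumptions.

```python
class CElement:
    """Behavioral model of a (non-inverting) Muller C-element."""

    def __init__(self, initial=0):
        self.output = initial  # assumed power-up state

    def update(self, a, b):
        """Apply inputs A and B and return the resulting output."""
        if a == b:
            self.output = a  # inputs agree: output follows them
        # inputs disagree: output keeps its previous value
        return self.output


c = CElement(initial=0)
for a, b in [(1, 1), (0, 1), (1, 0), (0, 0), (1, 0)]:
    print(a, b, "->", c.update(a, b))
```

This holding behavior is what lets a C-element block a single corrupted input: as long as the other input still carries the correct value, the output does not change.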
Dynamic CMOS C-Element

[Figure: C-element symbol and its dynamic CMOS circuit with inputs A, B and output.]

| A | B | output |
|---|---|--------|
| 0 | 0 | 1 |
| 0 | 1 | Old output |
| 1 | 0 | Old output |
| 1 | 1 | 0 |
Pseudostatic CMOS C-Element

[Figure: C-element symbol and its pseudostatic CMOS circuit with inputs A, B, output, and a weak keeper.]

| A | B | output |
|---|---|--------|
| 0 | 0 | 1 |
| 0 | 1 | Old output |
| 1 | 0 | Old output |
| 1 | 1 | 0 |
Built-In Soft Error Resilience (BISER)

[Figure: data from combinational logic feeds a flip-flop and a duplicate flip-flop, both driven by Clock; their outputs are the inputs A and B of a C-element with a weak keeper, which produces the output.]

| A | B | output |
|---|---|--------|
| 0 | 0 | 1 |
| 0 | 1 | Old output |
| 1 | 0 | Old output |
| 1 | 1 | 0 |
BISER

- Assumptions:
  - Most soft errors in combinational logic are eliminated by inertial or logic masking.
  - Soft error pulse generated in flip-flop is much shorter than clock period.
  - Probability of either a master or slave latch being struck by soft error exactly at clock edge is small.
- Flip-flop is duplicated and outputs fed to C-element.
- Twenty times reduction of soft error observed.
- Ref.: S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.
Triple Modular Redundancy (TMR)

[Figure: the inputs feed three copies of the combinational logic (copy 1, copy 2, copy 3); a majority voter produces the output.]
TMR Error Reduction

- Voter input error probability = E, assumed independent for each input.
- Output error probability,

$$
\begin{aligned}
e &= \mathrm{Prob}(\text{two errors or three errors}) \\
  &= \binom{3}{2} E^2 (1 - E) + \binom{3}{3} E^3 \\
  &= 3E^2 - 3E^3 + E^3 = 3E^2 - 2E^3
\end{aligned}
$$

- For very small E, $E^3 \ll E^2$, so $e \approx 3E^2$
TMR Error Probability

| Input error probability, E | Output error probability, e |
|----------------------------|-----------------------------|
| 0.0 | 0.0 |
| 0.001 | 0.000002998 |
| 0.01 | 0.000298 |
| 0.1 | 0.028 |
| 0.2 | 0.104 |
| 0.3 | 0.216 |
| 0.4 | 0.352 |
| 0.5 | 0.5 |
| 0.6 | 0.648 |
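The output error probability e = 3E² - 2E³ can be evaluated directly for the tabulated input probabilities. The sketch below also cross-checks the closed form against the binomial sum over two or three erroneous voter inputs; the function names are illustrative.

```python
from math import comb


def tmr_output_error(e_in):
    """Closed form e = 3E^2 - 2E^3 for independent voter-input errors."""
    return 3 * e_in**2 - 2 * e_in**3


def tmr_output_error_binomial(e_in):
    """Same quantity as the probability of exactly two or three erroneous inputs."""
    return sum(comb(3, k) * e_in**k * (1 - e_in) ** (3 - k) for k in (2, 3))


for e_in in (0.0, 0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6):
    print(f"E = {e_in:<6} e = {tmr_output_error(e_in):.9f}")
```

The values show that TMR helps only when E < 0.5: at E = 0.5 the output is no better than a single module, and above that the voter makes things worse.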
Majority Voter Circuit

[Figure: majority voter block with inputs A, B, C and output, and its gate-level circuit.]

| A | B | C | output |
|---|---|---|--------|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |
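The truth table above corresponds to the Boolean function AB + BC + AC. The sketch below is one way to express it; the gate structure on the slide is not recoverable from the transcript, so the AND-OR form is an assumption.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority: 1 when at least two inputs are 1 (AB + BC + AC)."""
    return (a & b) | (b & c) | (a & c)


# Enumerate the truth table in the order ABC = 000, 001, ..., 111.
outputs = "".join(str(majority(a, b, c)) for a, b, c in product((0, 1), repeat=3))
print(outputs)
```

Listing the outputs in address order gives the bit pattern that would be stored in a lookup-table implementation of the voter.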
Alternative Implementations of Voter

[Figure: two voter implementations with inputs A, B, C and an output each: a LUT holding 00010111, and a transistor-level circuit powered from VDD.]
Triple Modular Redundancy (TMR)

[Figure: combinational logic with inputs; four D flip-flops clocked at CK, CK + d, CK + 2d and CK + 3d; a majority voter; output.]
TMR for Memory Cells

[Figure: combinational logic with inputs; three D flip-flops clocked by CK; a majority voter producing the output.]

Problems:
1. Accumulation of errors in flip-flops.
2. Voter is not protected.
FF Refresh and TMR for Memory Cells

[Figure: three D flip-flops clocked by CK, four majority voters, signals r1, r2 and r3, and the output.]
Reliability Analysis

- Determine how long a system will work without failure.
- Find:
  - Mean time to failure (MTTF) or mean time between failures (MTBF)
  - Mean time to repair (MTTR)
  - FIT rate
Reliability Function

- Reliability function of a system, R(t) = Probability of survival at time t
- Determined from failure rates of components, λ(t) = Number of failures per unit time
- Generally varies with time.
Failure Rate, λ(t)

[Figure: failure rate λ(t) in failures per second (log scale, $10^{-12}$ to $10^{0}$) versus time t, with three regions: infant mortality, constant failure rate (useful life) where λ(t) = λ, and wearout or aging.]
Deriving R(t)

- R(t) is the probability of no error in interval [0, t].
- Divide the interval into a large number (n) of subintervals of duration t/n. Let x be the probability of error in one subinterval.
- Assume that duration t/n is so small that either no error occurs or at most one error can occur. Then, average errors in a subinterval = 0·(1 – x) + 1·x = x = λt/n.
- Probability of no error in interval [0, t] is,

$$
R(t) = (1 - x)^n = \left(1 - \frac{\lambda t}{n}\right)^n \to \exp(-\lambda t) \quad \text{as } n \to \infty,
$$

  using the standard limit $\lim_{n \to \infty} (1 - a/n)^n = e^{-a}$.
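The limiting step in the derivation can be checked numerically. The sketch below evaluates (1 - λt/n)^n for growing n and compares it with exp(-λt); the choice λt = 1 is an arbitrary illustrative value.

```python
import math


def reliability_discrete(lam, t, n):
    """Probability of no error when [0, t] is split into n subintervals."""
    return (1 - lam * t / n) ** n


lam, t = 1.0, 1.0  # assumed values, so that lam * t = 1
for n in (10, 100, 1_000, 1_000_000):
    approx = reliability_discrete(lam, t, n)
    print(f"n = {n:>9,}: {approx:.6f}  (exp(-lam*t) = {math.exp(-lam * t):.6f})")
```

The discrete product approaches the exponential from below as the subintervals shrink, which is why R(t) = exp(-λt) is used for a constant failure rate.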
R(t) and MTBF

$R(t) = e^{-\lambda t}$

Failure rate, λ = failures per unit time

Number of failures in time T = λT

$\mathrm{MTBF} = T/(\lambda T) = 1/\lambda = \int_0^\infty R(t)\,dt$

$R(t) = \exp(-t/\mathrm{MTBF})$

For t = MTBF, $R(\mathrm{MTBF}) = e^{-1} = 0.368$
Reliability and MTBF

[Figure: reliability R(t) (0.0 to 1.0) versus time t (MTBF, 2 MTBF, 3 MTBF); at t = MTBF, R(t) = 1/e = 0.368.]
Example: First Generation Computer

- 10,000 electron tubes.
- Average burn out rate: 5 tubes per 100,000 hours.
- MTBF = 100,000/5 = 20,000 hours = 2.3 years, i.e., 37% chance of survival beyond 2.3 years.
- Time for 95% chance of survival: R(t) = exp(– t/MTBF) = 0.95, or t = 1.4 months
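The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below assumes a 365-day year and equal-length months (one twelfth of a year); those conventions and the helper names are assumptions, not part of the lecture.

```python
import math

HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # assumed equal-length months


def reliability(t, mtbf):
    """R(t) = exp(-t / MTBF) for a constant failure rate."""
    return math.exp(-t / mtbf)


def time_for_reliability(target, mtbf):
    """Solve exp(-t / MTBF) = target for t."""
    return -mtbf * math.log(target)


mtbf_hours = 100_000 / 5  # 5 failures per 100,000 hours
mtbf_years = mtbf_hours / HOURS_PER_YEAR
t95_hours = time_for_reliability(0.95, mtbf_hours)
t95_months = t95_hours / HOURS_PER_MONTH

print(f"MTBF = {mtbf_hours:,.0f} hours = {mtbf_years:.1f} years")
print(f"R(MTBF) = {reliability(mtbf_hours, mtbf_hours):.3f}")
print(f"95% survival up to about {t95_hours:,.0f} hours = {t95_months:.1f} months")
```

The gap between a 2.3-year MTBF and a 1.4-month high-confidence window shows why MTBF alone overstates how long a system can be trusted.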
![Page 50: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/50.jpg)
Reliability of TMR
Let $R = e^{-\lambda t}$ be the reliability of one module.
$R(\mathrm{TMR})$ = Prob(all three modules correct) + Prob(exactly two modules correct)
$= R^3 + 3R^2(1 - R)$
$= 3R^2 - 2R^3$
$= 3e^{-2\lambda t} - 2e^{-3\lambda t}$
![Page 51: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/51.jpg)
MTBF of TMR
$R(\mathrm{TMR}) = 3e^{-2\lambda t} - 2e^{-3\lambda t}$
$\mathrm{MTBF} = \int_0^\infty R(\mathrm{TMR})\,dt = 5/(6\lambda)$
This is less than the MTBF = $1/\lambda$ for a single system!
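A lower MTBF does not make TMR worse for short missions. The sketch below compares the two reliability curves and locates the point where they cross. Setting $3R^2 - 2R^3 = R$ gives $R = 0.5$, i.e. $t = \ln 2 / \lambda$. The failure rate used here is a hypothetical value chosen only for illustration.

```python
import math


def r_single(t, lam):
    """Reliability of one module with constant failure rate lam."""
    return math.exp(-lam * t)


def r_tmr(t, lam):
    """Reliability of a TMR system: at least two of three modules correct."""
    r = r_single(t, lam)
    return 3 * r**2 - 2 * r**3


lam = 1.0 / 20_000                        # hypothetical failure rate, per hour
mtbf_single = 1.0 / lam
mtbf_tmr = 3 / (2 * lam) - 2 / (3 * lam)  # term-by-term integral, equals 5/(6*lam)
crossover = math.log(2) / lam             # time at which R = 0.5

for fraction in (0.1, 0.5, 1.0, 2.0):
    t = fraction * mtbf_single
    print(f"t = {fraction:3.1f} MTBF: single = {r_single(t, lam):.3f}, "
          f"TMR = {r_tmr(t, lam):.3f}")

print(f"MTBF ratio TMR/single = {mtbf_tmr / mtbf_single:.4f}")
print(f"TMR is more reliable than one module for t < {crossover:.0f} hours")
```

TMR therefore pays off when the mission duration is short compared with the module MTBF (below about 0.69 MTBF).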
![Page 52: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/52.jpg)
MTBF of TMR
[Figure: reliability R(t), from 0.0 to 1.0, plotted against time t for a single module and for TMR, with the mission duration marked on the time axis.]
![Page 53: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/53.jpg)
Error Detection Code
Errors: Bits can flip due to noise in circuits and in communication.
Extra bits are used for error detection.
Example: a parity bit in ASCII code. The parity bit is the leading bit, followed by the 7-bit ASCII code.
Even parity code for A: 01000001 (even number of 1s)
Odd parity code for A: 11000001 (odd number of 1s)
A single-bit error in the 7-bit code of “A” changes the symbol, e.g., to “E” (1000101) or to “@” (1000000). In the 8-bit code the error is detected because it changes the specified parity.
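A minimal sketch of the parity scheme, assuming bit strings are written most-significant bit first with the parity bit in front, as in the example for “A”. The function names are illustrative only.

```python
def parity_bit(bits7, even=True):
    """Return the parity bit that gives the 8-bit word the requested parity."""
    odd_ones = bits7.count("1") % 2 == 1
    if even:
        return "1" if odd_ones else "0"
    return "0" if odd_ones else "1"


def encode(bits7, even=True):
    """Prefix the 7-bit code with its parity bit."""
    return parity_bit(bits7, even) + bits7


def parity_ok(word8, even=True):
    """Check that the 8-bit word has the specified parity."""
    ones_are_even = word8.count("1") % 2 == 0
    return ones_are_even if even else not ones_are_even


code_a = format(ord("A"), "07b")           # 7-bit ASCII for "A"
even_word = encode(code_a, even=True)
odd_word = encode(code_a, even=False)
print(code_a, even_word, odd_word)

# Flip one data bit so that "A" turns into "E"; the parity check fails.
corrupted = even_word[:5] + "1" + even_word[6:]
print(corrupted, parity_ok(corrupted, even=True))
```

A second bit flip would restore the parity, so this code detects any odd number of bit errors but misses an even number.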
![Page 54: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/54.jpg)
Richard W. Hamming (1915-1998)
Error-correcting codes (ECC).
Also known for:
Hamming distance, HD = number of bits in which two binary vectors differ. Example: HD(1101, 1010) = 3
Hamming Medal, 1988
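The Hamming distance is simple to compute. The sketch below treats the binary vectors as equal-length strings, which is an assumption made for readability:

```python
def hamming_distance(a, b):
    """Count the bit positions in which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("1101", "1010"))
```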
![Page 55: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/55.jpg)
The Idea of Hamming Code
Code space contains $2^N$ possible N-bit code words.
[Figure: code word 1010 (“A”) and four code words at HD = 1 from it, each reached by a 1-bit error in “A”: 1110 (“E”), 1011 (“B”), 1000 (“8”) and 0010 (“2”).]
Error not correctable. Reason: No redundancy.
Hamming’s idea: Increase HD between valid code words.
![Page 56: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/56.jpg)
Hamming’s Distance ≥ 3 Code
[Figure: valid code words 1010010 (“A”), 0010101 (“2”), 0011110 (“3”), 1000111 (“8”), 1011001 (“B”) and 1110100 (“E”), with distances between words labeled from HD = 1 to HD = 4. A 1-bit error in “A” produces the invalid word 0010010 (“?”), which is at HD = 1 from “A”.]
Shortest-distance decoding eliminates the error.
![Page 57: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/57.jpg)
Minimum Distance-3 Hamming Code

| Symbol | Original code | Odd-parity code | ECC, HD ≥ 3 |
|--------|---------------|-----------------|-------------|
| 0 | 0000 | 10000 | 0000000 |
| 1 | 0001 | 00001 | 0001011 |
| 2 | 0010 | 00010 | 0010101 |
| 3 | 0011 | 10011 | 0011110 |
| 4 | 0100 | 00100 | 0100110 |
| 5 | 0101 | 10101 | 0101101 |
| 6 | 0110 | 10110 | 0110011 |
| 7 | 0111 | 00111 | 0111000 |
| 8 | 1000 | 01000 | 1000111 |
| 9 | 1001 | 11001 | 1001100 |
| A | 1010 | 11010 | 1010010 |
| B | 1011 | 01011 | 1011001 |
| C | 1100 | 11100 | 1100001 |
| D | 1101 | 01101 | 1101010 |
| E | 1110 | 01110 | 1110100 |
| F | 1111 | 11111 | 1111111 |

Original code: Symbol “0” with a single-bit error will be interpreted as “1”, “2”, “4” or “8”.
Reason: Hamming distance between codes is 1. A code with any bit error will map onto another valid code.
Remedy: Design codes with HD ≥ 2. Example: Parity code. Single-bit error detected but not correctable.
Remedy: Design codes with HD ≥ 3. For single-bit error correction, decode as the valid code at HD = 1.
For detection or correction of more error bits, design a code with HD ≥ 4.
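The ECC column of the table can be checked mechanically. The sketch below copies the sixteen 7-bit code words from the table, confirms that the minimum pairwise distance is 3, and decodes a received word by picking the nearest valid code word. The function names are illustrative, and the exhaustive search is used for clarity rather than efficiency.

```python
from itertools import combinations

# ECC column of the table (symbol -> 7-bit code word).
ECC = {
    "0": "0000000", "1": "0001011", "2": "0010101", "3": "0011110",
    "4": "0100110", "5": "0101101", "6": "0110011", "7": "0111000",
    "8": "1000111", "9": "1001100", "A": "1010010", "B": "1011001",
    "C": "1100001", "D": "1101010", "E": "1110100", "F": "1111111",
}


def hd(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def decode(word):
    """Shortest-distance decoding: return the symbol of the nearest code word."""
    return min(ECC, key=lambda symbol: hd(ECC[symbol], word))


min_distance = min(hd(a, b) for a, b in combinations(ECC.values(), 2))
print("minimum distance:", min_distance)

received = "0010010"   # "A" (1010010) with its first bit flipped
print("decoded symbol:", decode(received))
```

With a minimum distance of 3, a word with one flipped bit is at HD = 1 from the original code word and at HD ≥ 2 from every other one, so the nearest-word rule always recovers the symbol.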
![Page 58: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/58.jpg)
A Book on Coding Theory
R. W. Hamming, Coding and Information Theory, Englewood Cliffs, New Jersey: Prentice-Hall, 1980.
![Page 59: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/59.jpg)
Byzantine Empire, 527-565
[Figure: Emperor Justinian and General Belisarius.]
![Page 60: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/60.jpg)
Byzantine General’s Problem
In a war a general needs to communicate an attack (a) or retreat (r) order to subordinates in the field.
For success a perfect agreement is necessary.
Byzantine Fault:
Subordinates can be unreliable or malicious.
Communication (messengers) can be unreliable or malicious.
![Page 61: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/61.jpg)
Example 1: Single Fault
General: D; Subordinates: A, B and C
[Figure: D sends the order r to A, B and C. One of the three messages is corrupted from r to a (r→a).]
![Page 62: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/62.jpg)
Example 1: Majority Agreement
General: D; Subordinates: A, B and C
[Figure: D sends the order r to A, B and C, with one message corrupted (r→a). The subordinates then exchange the orders they received: two messages a and four messages r.]
Decisions: Retreat, Retreat, Retreat.
![Page 63: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/63.jpg)
Example 2: Two Faults
General: D; Subordinates: A, B and C
[Figure: D sends the order a to A, B and C.]
![Page 64: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/64.jpg)
Example 2: Byzantine Failure
General: D; Subordinates: A, B and C
[Figure: D sends the order a to A, B and C. In the exchange among the subordinates, four messages are r and two are a.]
Decisions: Retreat, Attack, Attack (no agreement).
![Page 65: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/65.jpg)
Byzantine Resilient System
A system that can function correctly in the presence of Byzantine faults.
Byzantine protocol for an n-node system:
Any node can initiate a message broadcast.
Each node rebroadcasts the received message to all nodes it has not heard from.
After communications end, nodes take a majority decision.
Ref.: L. Lamport, R. Shostak and M. Pease, “The Byzantine Generals Problem,” ACM Trans. Prog. Lang. Syst., vol. 4, no. 3, pp. 382-401, July 1982.
![Page 66: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/66.jpg)
Byzantine Resilience Conditions
In order to tolerate t failures:
The system must have at least 3t + 1 nodes.
There must be at least 2t + 1 disjoint communication paths between nodes.
A node must exchange messages at least t + 1 times.
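The three conditions translate directly into a small sizing helper. This is a sketch with made-up function names that simply evaluates the bounds listed above:

```python
def byzantine_requirements(t):
    """Minimum resources needed to tolerate t Byzantine failures."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return {
        "nodes": 3 * t + 1,            # at least 3t + 1 nodes
        "disjoint_paths": 2 * t + 1,   # at least 2t + 1 disjoint paths
        "exchange_rounds": t + 1,      # at least t + 1 message exchanges
    }


def max_tolerable_failures(n):
    """Largest t that satisfies n >= 3t + 1 for an n-node system."""
    return (n - 1) // 3


for t in (1, 2, 3):
    print(t, byzantine_requirements(t))

print("four nodes tolerate", max_tolerable_failures(4), "failure")
```

A four-node system satisfies the bound for t = 1 only, so it can mask a single Byzantine fault but not two.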
![Page 67: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/67.jpg)
Four-Core Processor System
[Figure: four processor cores A, B, C and D.]
![Page 68: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/68.jpg)
Example 1: C Initiates Message m, Sends n to A and m to B and D

| Processor | First round | Second round | Decoded message |
|-----------|-------------|--------------|-----------------|
| A | n | m, m | m |
| B | m | m, n | m |
| D | m | m, n | m |

![Page 69: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/69.jpg)
Example 2: C Initiates Message m, B Sends p to A and D

| Processor | First round | Second round | Decoded message |
|-----------|-------------|--------------|-----------------|
| A | m | m, p | m |
| B | m | m, m | m |
| D | m | m, p | m |

![Page 70: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/70.jpg)
Example 3: C Initiates Message m, A and B Generate Faulty Message q

| Processor | First round | Second round | Decoded message |
|-----------|-------------|--------------|-----------------|
| A | m | m, q | m |
| B | m | m, q | m |
| D | m | q, q | q |

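The three four-core tables can be reproduced with a small simulation of the two-round exchange. This is a simplified sketch, not the full protocol of Lamport et al. It assumes C is always the initiator, that each receiver relays one value to the other two receivers in the second round, and that each receiver decodes by strict majority over the three values it holds. The names `run_rounds` and `faulty_relay` are hypothetical.

```python
from collections import Counter

RECEIVERS = ["A", "B", "D"]   # C is the initiator in all three examples


def majority(values):
    """Return the value held by a strict majority, or None if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None


def run_rounds(sent_by_c, faulty_relay=None):
    """Simulate the first round (C to each receiver) and the second round.

    sent_by_c:    value C sends to each receiver in the first round.
    faulty_relay: maps a faulty receiver to the value it relays in the
                  second round instead of the value it actually received.
    """
    faulty_relay = faulty_relay or {}
    decoded = {}
    for node in RECEIVERS:
        relayed = [faulty_relay.get(peer, sent_by_c[peer])
                   for peer in RECEIVERS if peer != node]
        decoded[node] = majority([sent_by_c[node]] + relayed)
    return decoded


all_m = {"A": "m", "B": "m", "D": "m"}

faulty_initiator = run_rounds({"A": "n", "B": "m", "D": "m"})
one_faulty_relay = run_rounds(all_m, faulty_relay={"B": "p"})
two_faulty_relays = run_rounds(all_m, faulty_relay={"A": "q", "B": "q"})

print(faulty_initiator)    # {'A': 'm', 'B': 'm', 'D': 'm'}
print(one_faulty_relay)    # {'A': 'm', 'B': 'm', 'D': 'm'}
print(two_faulty_relays)   # {'A': 'm', 'B': 'm', 'D': 'q'}
```

With one fault, every receiver decodes m. With two faulty nodes out of four, D decodes the faulty value q, which is consistent with the 3t + 1 node requirement: four nodes tolerate only t = 1.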
![Page 71: ELEC 7770 Advanced VLSI Design Spring 2014 Soft Errors and Fault-Tolerant Design](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814fd1550346895dbd925b/html5/thumbnails/71.jpg)
References
L. Lamport, R. Shostak and M. Pease, “The Byzantine Generals Problem,” ACM Trans. Prog. Lang. Syst., vol. 4, no. 3, pp. 382-401, July 1982.
D. K. Pradhan, Fault-Tolerant Computer System Design, Upper Saddle River, New Jersey: Prentice Hall PTR, 1996.
P. K. Lala, Self-Checking and Fault-Tolerant Digital Design, San Francisco: Morgan Kaufmann, 2001.