Lehrstuhl Technische Informatik - Computer Engineering
Brandenburgische Technische Universität Cottbus
Transient and Permanent Faults in Nanoelectronic ICs: Compensation and Repair
Problems, Solutions, Limitations
H. T. Vierhaus, BTU Cottbus
Computer Engineering
Outline
1. Introduction: Nanostructure Problems
2. Transient Faults
3. Repair of Permanent Faults
4. Bus Structures and NoCs
5. Diagnostic Test
6. A Lot of Things to do ...
1. Introduction
A bunch of new problems from nanostructures ...
Nanoelectronic Problems
Lithography:
The wavelength used to „map“ structural information from masks to wafers is larger (by a factor of about 2 to more than 4) than the minimum structural features (193 nm versus 90 / 65 / 45 nm).
Adaptation of layouts for correction of mapping faults
Parameter variations:
The number of atoms in MOS transistor channels becomes so small that statistical variations of doping densities have an impact on device parameters such as threshold voltages.
Doping Fluctuations in MOS Transistors
[Figure: two MOS transistor cross-sections (p-substrate, n regions, poly-Si gate) with different numbers and positions of doping atoms]
Density and distribution of doping atoms cause shifts in transistor threshold voltages!
Nanostructure Problems
Individual device characteristics such as Vth depend more strongly on statistical variations of underlying physical features such as doping profiles.
A significant share of basic devices will be „out of spec“ and must be replaced by backup elements after production to improve yield.
As smaller features mean higher stress (field strength, current density), early failures „in the field“ also become more likely and must be compensated.
Recognizing and compensating transient errors „in time“ is becoming a must, e.g. because charged particles can discharge circuit nodes.
Key Technologies
Fault tolerant computing
An old technology that is already heavily used in everyday computing (e.g. memory interfaces with ECC check and correction).
Is required to handle intermittent and transient fault effects, e.g. induced by radiation.
Can handle only a limited number of permanent faults!
Built-in self test (BIST) and self-repair (BISR)
Is required to handle permanent faults by self-repair using redundant elements. State-of-the-art for memories, not for logic.
Can handle multiple faults (sequentially) until the redundancy resources are exhausted.
Algorithms that are fully or partially „fault hard“
Most DSP algorithms show an inherent „stability“ and work even under fault conditions with reduced precision. The effect can be „HW-enhanced“.
System-on-a-Chip (SoC)
[Figure: SoC with a RISC core and DSPs, each with local memory, functional units FU 1 to FU 3, local buses, a bus coupler, and a global bus]
SoCs are heterogeneous systems that require test & repair strategies for:
- logic (also in processors)
- memory blocks
- interconnects
- analog and D/A components
Fault Tolerant Computing
[Figure: a fault event is handled at one of three levels]
- Software-based fault detection & compensation: works only for transient faults!
- HW logic & RT-level detection & compensation: typically works for transient and permanent faults!
- Transistor- and switch-level compensation: typically works for specific types of transient faults only!
Scope labels in the figure: specific, very specific, universal
2. Transient Fault Effects
Storage Nodes and Particles
[Figure: charge Q in fC (logarithmic axis, 1 to 100) versus technology (0.35, 0.25, 0.18, 0.09), with the alpha-particle level marked]
A 1 MeV alpha particle generates 42 fC of charge!
Contribution to Soft-Error Rates
Static combinational logic: 11 %
Sequential elements (FFs, Latches): 49 %
Unprotected SRAM: 40 %
Source: S. Mitra, N. Seifert, M. Zhang, Q. Shi, K. S. Kim, „Robust System Design with Built-In Soft Error Resilience“, IEEE Computer, Vol. 38, No. 2, Feb. 2005, pp. 43-52
Spikes and Clock Rates in Logic
Source: pulse of 100 ps
[Figure: two timing diagrams of the pulse against the clock, one where charge / status restoration is possible and one where it is impossible]
Fault probability in digital logic is roughly proportional to clock frequency!
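The proportionality can be made plausible with a very simple model. The sketch below assumes (this is not from the slide) that a glitch arrives at a uniformly random time and is captured whenever a sampling clock edge falls inside it; the function name is hypothetical.

```python
def capture_probability(pulse_width_s: float, clock_hz: float) -> float:
    """Chance that a glitch overlaps a sampling clock edge.

    Assumed model: uniformly random arrival time, one sampling edge per
    clock period, no electrical or logical masking.
    """
    return min(1.0, pulse_width_s * clock_hz)


PULSE = 100e-12  # 100 ps pulse, as on the slide

for f_clk in (100e6, 1e9, 5e9):
    p = capture_probability(PULSE, f_clk)
    print(f"{f_clk / 1e9:4.1f} GHz: capture probability {p:.1%}")
```

In this model, doubling the clock frequency doubles the capture probability until the pulse is as long as a full clock period.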
Logic Structures and Fault Events
[Figure: combinational logic between input FFs and output FFs, hit by particle radiation]
Flip-flops need fault tolerance / fault hardening in the first place; logic close to the outputs comes next.
Muller-C-Element
[Figure: C-element built from three AND gates (&) and one OR gate]
If both inputs are equal, the value is stored.
If the inputs are different, the previous value is kept.
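A minimal behavioral sketch of the two rules above. The class and function names are hypothetical; the second function is the standard majority form out = a·b + a·prev + b·prev, which matches a structure of three AND gates and one OR gate.

```python
from itertools import product


class MullerC:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def step(self, a: int, b: int) -> int:
        # Equal inputs: the value is stored. Different inputs: hold.
        if a == b:
            self.out = a
        return self.out


def majority_form(a: int, b: int, prev: int) -> int:
    """Gate-level view: three AND terms feeding one OR."""
    return (a & b) | (a & prev) | (b & prev)


# Both descriptions agree for every input / state combination.
for a, b, prev in product((0, 1), repeat=3):
    assert MullerC(prev).step(a, b) == majority_form(a, b, prev)
```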
Fault-Tolerant Latch Design
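[Figure: the input in feeds Latch 1 and Latch 2 in parallel (outputs outl1 and outl2), which drive a Muller C-element producing out; further signals: CL, clock. Timing diagram v(t): while the clock is high, outl1 = in and outl2 = in; otherwise outl1 and outl2 are latched]
If clock is high: out = in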
Fault Handling
Muller C-element:
If both inputs are equal: out = outl1 = outl2
If both inputs are not equal: out = previous value (of outl1, outl2)
Under local fault conditions on the latch outputs (one of the 2 latches false), the C-element preserves the output condition from the „charge“ phase of the latch.
[Figure: Latch 1 and Latch 2 (outputs outl1, outl2) between in and the Muller C-element driving out]
Essentially 3 latches!
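The fault handling can be illustrated with a small behavioral simulation. This is only a sketch: the class names and the `upset` fault-injection hook are hypothetical, and timing and electrical effects are ignored.

```python
class Latch:
    """Level-sensitive latch: transparent while the clock is high."""

    def __init__(self):
        self.q = 0

    def update(self, d: int, clock: int) -> int:
        if clock:
            self.q = d
        return self.q


class HardenedLatch:
    """Two parallel latches followed by a Muller C-element."""

    def __init__(self):
        self.l1, self.l2 = Latch(), Latch()
        self.out = 0

    def _c_element(self) -> int:
        # Output follows the latches only if they agree, else it holds.
        if self.l1.q == self.l2.q:
            self.out = self.l1.q
        return self.out

    def update(self, d: int, clock: int) -> int:
        self.l1.update(d, clock)
        self.l2.update(d, clock)
        return self._c_element()

    def upset(self, which: int) -> int:
        """Flip the stored bit of latch 1 or 2 (hypothetical fault model)."""
        latch = self.l1 if which == 1 else self.l2
        latch.q ^= 1
        return self._c_element()


cell = HardenedLatch()
cell.update(1, clock=1)           # load a 1
cell.update(0, clock=0)           # clock low: value is held
print(cell.upset(1))              # one latch flipped, output still 1
```

A second upset in the other latch before the next load makes both latches agree on the wrong value, so the scheme covers single upsets only.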
Intel's Scan Path Element
[Figure: scan cell built from latches LA, LB, PH1, and PH2 plus an OR gate; signals D, CLK, SI, SO, SCA, SCB, Capture, update, Q]
Intel's Scan Path Element plus Fault Compensation
[Figure: the same scan cell (latches LA, LB, PH1, PH2; signals D, CLK, SI, SO, SCA, SCB, Capture, update, Q) extended by an AND gate (&), a C-element with keeper latch, and a Test signal]
TMR-Latch / Flip-Flop
[Figure: the input in feeds FF1, FF2, and FF3 on a common clock; an XOR gate generates cout, which controls the output MUX]
Out = L1out with cout = 1
Out = L2out with cout = 0
Can compensate static or dynamic faults in latches / FFs!
Works with latches or flip-flops.
FF1 is untestable (active redundancy).
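The XOR / MUX structure can be checked against a classic majority voter. The sketch below assumes that the XOR compares FF2 and FF3, which the slide text does not state explicitly; the function names are hypothetical.

```python
from itertools import product


def xor_mux_voter(ff1: int, ff2: int, ff3: int) -> int:
    """Output selection as on the slide: cout = 1 selects FF1, cout = 0 FF2."""
    cout = ff2 ^ ff3  # assumption: the XOR compares FF2 and FF3
    return ff1 if cout else ff2


def majority(a: int, b: int, c: int) -> int:
    """Reference 2-out-of-3 voter."""
    return (a & b) | (a & c) | (b & c)


# The two functions agree for all eight combinations of stored values.
for bits in product((0, 1), repeat=3):
    assert xor_mux_voter(*bits) == majority(*bits)
```

Under this assumption FF1 reaches the output only when FF2 and FF3 disagree, which never happens in a fault-free circuit. That is consistent with the remark that FF1 is untestable.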
TMR-Scan-Element
[Figure: TMR scan element with flip-flops ff1, ff2, ff3, multiplexers Mux1, Mux2, Mux3, an XOR gate, and an AND gate (&); signals clock, Scanin, D, SC, TMR, ic, out. An accompanying signal table lists, for functional mode and for static and dynamic scan test, the settings of SC, TMR, and ic and whether out follows ff1 or ff2]
TMR Scan-Element
Fault tolerant in functional mode
Fault tolerant in scan-mode
Optional support of test strategies that require a specific sequence of 2 input bits!
Fault tolerant Latches and FFs
Element                      No. of   Contr.         Fault tol.    Fault tol.   Fault tol.    2-pat.
                             trans.   signals        funct. dyn.   scan dyn.    FFs static    scan test
Latch with C-elem. [9]       20       0              yes           -            no            -
Scan path cell [9]           34       4 (2 clocks)   no            no           no            no
Scan-path cell + C-el. [9]   48       5 (2 clocks)   yes           no           no            no
TMR scan path elem.          66       2 (1 clock)    yes           yes          yes           yes
TMR-latch                    24       0              yes           -            yes           -
Fault Compensation in Combinational Logic
[Figure: combinational logic behind the input FFs under particle radiation, with DMC elements at the outputs]
Fault Compensation in Combinational Logic
[Figure: waveforms V(t) of a fault-free signal, a signal with a glitch, and a signal with a delayed glitch, with the phases „MC capture“ and „MC no capture / hold“ marked; annotations: latch close, time left to capture!]
3. Repair of Permanent Faults
Compensation of transient faults is not enough.
Some technologies for transient compensation can handle permanent faults, too, but not in the long run and not in the presence of additional transient faults!
Memory Test & Repair
[Figure: memory array with lines and columns, line address, read / write lines, and a spare column]
Memory Test & Repair (2)
[Figure: the same memory array (lines, columns, line address, read / write lines, spare column) extended by a memory BIST controller]
... is already state-of-the-art!
Logic Self Repair
[Figure: relation between]
- Size of replaced blocks (granularity)
- Repair procedure overhead
- Functioning elements lost
Granularity of Replacement
[Figure: scale of replacement granularity in transistors from 10^0 to 10^6 (transistor, gate, macro, FPGA block, cores, CPU), shown against the expected fault density (1 out of ..)]
Block-level replacement (e.g. FPGAs)
Core replacement (e.g. CPU)
Hardly explored (logic)
Levels of Repair
Transistors - Switch Level
Replace transistors or transistor groups
Losses by reconfiguration (switched-off „good“ devices): Potentially small (20 - 50 %) for transistor faults
Overhead for test and diagnosis: Very high
Gate Level
Replace gates or logic cells
Losses by reconfiguration: Medium (60 to 90 %) for single transistor faults
Overhead for test and diagnosis: Medium to high
Macro-Block Level
Replace functional macros (ALU, FPU, CPU)
Losses by reconfiguration: High, 99 % or more
Overhead for test and diagnosis: Low
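The trend in the reconfiguration losses follows from simple counting. The sketch below uses an assumed model, not the author's estimate: when a block of N transistors is switched off because of one faulty transistor, N - 1 good devices are lost. The block sizes are hypothetical, and switch overhead is ignored, so the slide's figures are not reproduced exactly.

```python
def reconfiguration_loss(block_size: int, faulty: int = 1) -> float:
    """Fraction of good transistors switched off with a replaced block."""
    if block_size < 1 or not 0 <= faulty <= block_size:
        raise ValueError("need 0 <= faulty <= block_size and block_size >= 1")
    return (block_size - faulty) / block_size


# Hypothetical block sizes for the three repair levels.
for name, size in (("transistor pair", 2), ("gate", 4), ("macro", 100_000)):
    print(f"{name:16s} {reconfiguration_loss(size):.3%}")
```

The finer the granularity, the fewer functioning devices are discarded, at the price of the higher diagnosis overhead listed above.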
Replacement in Regular Structures (e.g. for DSP)
[Figure: recursive digital filter built from delay elements Z^-1, multipliers with coefficients c0 ... cM and d1 ... dN, and adders; input x(n) with delayed samples x(n-1) ... x(n-M), output y(n) with delayed samples y(n-1) ... y(n-N). A faulty element is marked and substituted by macro-replacement using spare elements (Z^-1, adder)]
Parallel Backup Transistors
[Figure: basic two-input gate (in1, in2, out, VDD, GND) next to the same gate with redundant transistors added in parallel]
Redundancy by „Active“ Parallel Transistors
Active redundancy is not testable. Therefore there is no way to monitor the status of „available“ redundancy in a logic circuit.
Parallel transistors cannot compensate a fault of the „stuck-on“ type (transistor always conducting).
Faulty „backup“-transistors may produce additional faults that cannot be corrected!
Adding redundancy is not enough, fault isolation is a real problem!
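These limits can be reproduced with a switch-level toy model. The sketch below treats a transistor as an ideal switch and models a main transistor with one parallel backup; the names and the fault model are illustrative assumptions.

```python
from enum import Enum


class State(Enum):
    OK = "ok"
    STUCK_OFF = "stuck-off"   # never conducts
    STUCK_ON = "stuck-on"     # always conducts


def conducts(state: State, gate_on: bool) -> bool:
    """Ideal-switch model of a single transistor."""
    if state is State.STUCK_ON:
        return True
    if state is State.STUCK_OFF:
        return False
    return gate_on


def parallel_pair(main: State, backup: State, gate_on: bool) -> bool:
    """Main and backup transistor in parallel: either path conducts."""
    return conducts(main, gate_on) or conducts(backup, gate_on)


def behaves_correctly(main: State, backup: State) -> bool:
    """True if the pair follows the gate signal in both input states."""
    return all(parallel_pair(main, backup, g) == g for g in (False, True))


for main in State:
    print(f"main {main.value:9s} -> correct: {behaves_correctly(main, State.OK)}")
```

A stuck-off main transistor is masked by the backup, a stuck-on one is not, and a stuck-on backup breaks a pair whose main transistor is healthy.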
Configuration and Fault Isolation
[Figure: gate (in1, in2, out) with backup transistors and configuration switches towards VDD and GND, controlled by An and Ap; a stuck-on fault is marked]
The Gate-Short-Problem
[Figure: a driver feeding Load1 and Load2; one load input has a gate short]
GND shorts at gate inputs affect the whole fan-in network and make redundancy useless!!
Gate Turn-off
[Figure: gate (in1, in2, out) with backup transistors, configuration switches (An, Ap) towards VDD and GND, and additional input shut-off switches driven by gate_control]
Schematic Layout with VDD/GND Switches
[Figure: schematic layouts of a gate with parallel redundancy and of a gate with parallel redundancy and fault isolation; redundant stripes of p / n diffusion, pass transistors for gate separation, control signals An and Ap, inputs in1 and in2, output out, VDD and GND rails; layout layers: n-diff., p-diff., poly-Si, metal 1, metal 2, contact, via]
Transistor-Level Overhead
Redundancy                         parallel       VDD / GND     separate gate
                                   transistors    switches      poly lines
Overhead (cells only, estimates)   30-40 %        60-80 %       100-150 %
Stuck-off coverage                 yes            yes           yes
Stuck-on coverage                  no             yes           yes
Gate shorts coverage               no             no            yes
Control lines                      none           one wire      mult. wires
Duplicate Standard Cells
[Figure: Gate 1 and Gate 2 with inputs in1, in2 and output out; a VDD switch with switch control supplies VDD1 or VDD2; both gates are connected to GND]
Again: Fault Isolation
[Figure: the same duplicate cells (Gate 1, Gate 2, VDD switch with switch control, VDD1, VDD2, GND), with the fault cases output VDD / GND short and gate input short marked]
Administrated Duplicate Cells
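[Figure: Gate 1 and Gate 2 with separate supplies VDD1 / VDD2 and GND1 / GND2, power switches and GND switches controlled by Act 1 and Act 2, gate inputs and outputs, and a gate short; annotated with the logic values (0, 1, X) of the control signals]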
Features
- Use „normal“ cell designs
- Four states of operation:
  Config. 1: Gate 1 active, Gate 2 isolated
  Config. 2: Gate 2 active, Gate 1 isolated
  Config. 3: Both gates active, operating in parallel
  Config. 4: Both gates isolated from VDD / GND
- Operations like „high / low power“ possible.
- Cells can be put to temporary „sleep“ for stress relief.
- Permanent repair functions.
- Active cell output is connected only to „floating“ outputs of the other cell.
- If twin tubs are used and cell-internal tubs are also disconnected, gate input / GND shorts are prevented.
Bistable Switching Cell
[Figure: Gate 1 and Gate 2 between VDD and GND, switched by a single control signal Act with complementary switch states (0 / 1), including output separation]
Cell Duplication and Power Switch
Possible for all types of cells (also flip-flops).
Granularity of partitioning for replacements (single gates, blocks) can be selected on demand.
Can be combined favorably with dynamic circuit optimization.
Good coverage potential for transistor faults.
Significant overhead (above 100 %), but most likely below Triple Modular Redundancy (TMR).
Redundancy may become exhausted and then requires a further level of redundancy!
Gate - Replacement
[Figure: standard cells (gates) with a gate fault; a backup cell is inserted as replacement cell]
Regular Logic Wiring
[Figure: logic gates connected via feed / drive lines to a config block, with links to a backup cell and to the next cells]
Faults on Irregular Interconnects
[Figure: routing tree from signal source S to four nodes C, with a single fault (line break)]
Redundant Wiring
[Figure: routing tree with loops from signal source S to four nodes C, with a single fault (line break)]
Extra wire ... plus double vias!
Problem: classic delay calculation works well on trees only!
4. Bus Structures and „Networks on Chip“ (NoCs)
Technology forecasts predict that nano-wires may become the most vulnerable and unreliable circuit elements ...
Buses versus NoCs
[Figure: irregular bus structure (SoC) with five bus masters versus regular network structure (NoC) with nine NoC nodes]
Faults on Bus Structures
[Figure: bus masters BM1 to BM6 on a shared bus; a local defect affects the total network]
Bus Fault Conditions
A single permanent fault on a bus may affect the bus as a whole.
Fault detection and compensation by methods developed for transient faults (Hamming code, ECC checks) can handle static faults, but are relatively expensive.
Capabilities of handling transient faults on top of permanent faults are limited.
Technology forecasts predict a reliability problem with interconnects (nano-wires) in nano-technologies.
Bus Segmentation
[Figure: bus masters BM1 to BM6 on a bus divided into segments by segment couplers (SC)]
Structure the bus into segments that can be repaired individually!
The Switching Problem
[Figure: n lines switched onto n + k lines, including backup lines]
n     k    p    switches   contr. states
8     1    1    16         9
16    1    1    32         33
32    2    2    128        65
Faults and Repair Actions
1. Line break: Section of a line is interrupted
use spare wire!
2. Line short to GND: Section of a line is connected to GND
use spare wire!
3. Dynamic coupling between adjacent lines:
a. Re-allocate lines in bundle
b. Insert grounded line for decoupling
4. Bridge between lines:
a. Feed both lines with same signal
b. Make one line „floating“
Single Line Replacement
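[Figure: signal lines s0 ... s4 of a bundle of k lines (0 ... k-1) mapped onto wires b0, b1, b2, ... including one backup wire; a fault on one wire is marked]
Overhead: 2k switches, (k+1) logic states for 1 backup line
2pk switches, p (k+1) logic states for p backup lines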
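The overhead formulas can be evaluated directly. The helper below is a hypothetical convenience function that only restates the two expressions given above.

```python
def replacement_overhead(k: int, p: int = 1) -> tuple[int, int]:
    """Return (switches, logic_states) for k signal lines and p backup lines.

    Uses the expressions from the slide: 2pk switches, p(k + 1) states.
    """
    if k < 1 or p < 1:
        raise ValueError("k and p must be positive")
    return 2 * p * k, p * (k + 1)


for k, p in ((8, 1), (8, 2), (16, 1)):
    switches, states = replacement_overhead(k, p)
    print(f"k={k:2d} p={p}: {switches:3d} switches, {states:2d} logic states")
```

For k = 8 and p = 1 this gives 16 switches and 9 logic states.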
Inserting Lines for Decoupling
[Figure: the same bundle (signals s0 ... s4, wires b0, b1, b2, backup) with a coupling fault between adjacent wires]
Multiple line insertion for de-coupling requires multiple shifts of lines, multiple switches and states!
Repair Mechanisms
Buses with „extra“ backup lines that need a specific configuration for repair generate high cost in terms of switches and administration due to the many „logic states“ of the bus section.
Such repair schemes are not suited to re-organize neighborhood relations on buses for de-coupling of lines.
Try to cover all relevant fault conditions by a small set of states using permutation of lines!
Reconfiguration for De-Coupling
[Figure: bundle of lines s0 ... s5 between two segment couplers (SC), before and after reconfiguration]
... can help to minimize dynamic coupling faults!
2-way switches may be used!
Characteristics of 6 / 8 Wire Bundles
Given a bundle of 6 or 8 bus lines:
Are there any permutations that create all-new neighbors for every single line in order to eliminate coupling faults?
6 lines:
NNP6:  0-2  1-4  2-0  3-5  4-1  5-3
8 lines:
NNP81: 0-2  1-6  2-0  3-5  4-7  5-3  6-1  7-4
NNP82: 0-3  1-5  2-7  3-0  4-6  5-1  6-4  7-2
NNP83: 0-5  1-7  2-4  3-6  4-2  5-0  6-3  7-1
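The new-neighborhood property of these permutations can be checked mechanically. The sketch below assumes that an entry "i - j" moves line i to wire position j and that lines i and i+1 are neighbors in the original bundle; the function names are hypothetical.

```python
# Permutations from the slide: index = line, value = new wire position.
NNP = {
    "NNP6": [2, 4, 0, 5, 1, 3],
    "NNP81": [2, 6, 0, 5, 7, 3, 1, 4],
    "NNP82": [3, 5, 7, 0, 6, 1, 4, 2],
    "NNP83": [5, 7, 4, 6, 2, 0, 3, 1],
}


def wire_order(perm):
    """Return which line sits on each wire position after permutation."""
    order = [None] * len(perm)
    for line, position in enumerate(perm):
        order[position] = line
    return order


def all_new_neighbors(perm):
    """True if no two originally adjacent lines are adjacent afterwards."""
    order = wire_order(perm)
    return all(abs(a - b) != 1 for a, b in zip(order, order[1:]))


for name, perm in NNP.items():
    print(name, wire_order(perm), all_new_neighbors(perm))
```

All four permutations pass the check, while the identity mapping fails it.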
6 Wires: Permutations and Replacement
[Figure: table of the permutations NNP, PW2, and PW3 applied to a 6-wire bundle. For each input wire it shows the mapping through the 1st, 2nd, and 3rd switching column, the lines by which a replacement is possible (with 2 switching columns), the line selected for backup, and the selected backup lines]
Administration:
4 logic states for 2 switching columns (2 extra wires)
6 logic states for 3 switching columns (1 extra wire)
Selection of Permutations
All single faults must be repairable by selecting a minimum set of permutations.
Those lines that can act as replacement for most of the others are selected as „backup lines“.
No permutation used for repair may map a functional line to a faulty line.
Permutation also re-arranges non-faulty functional lines.
Permutations for 8-Wire-Bundles
New-neighborhood permutations:
NNP1: 0-2  1-6  2-0  3-5  4-7  5-3  6-1  7-4
NNP2: 0-3  1-5  2-7  3-0  4-6  5-1  6-4  7-2
NNP3: 0-5  1-7  2-4  3-6  4-2  5-0  6-3  7-1
Pair-wise symmetrical permutations:
PW1:  0-1  1-0  2-3  3-2  4-5  5-4  6-7  7-6
PW2:  0-6  1-3  2-4  3-1  4-2  5-7  6-0  7-5
PW3:  0-4  1-7  2-5  3-6  4-0  5-2  6-3  7-1
8 Wires: Permutations and Replacement
[Figure: table of the permutations NNP1, NNP2, and PW3 applied to the bits of an 8-wire bundle, with the possible replacement lines for each bit and the selected backup wires]
2 lines selected for backup!
8 Wires: Permutations and Replacement
[Figure: the same table of permutations NNP1, NNP2, and PW3 for the bits of the 8-wire bundle, with a larger set of backup wires marked]
4 lines selected for backup!
Overhead / Coverage for 6-Line-Bundle
Spare lines / switches   0 / 12   1 / 36   2 / 24
Single line fault        -        +        +
Dyn. coupl. faults       +        +        +
Double line faults       -        -        50 %
Overhead / Coverage for 8-Line-Bundle
Spare lines (out of 8) / switches   0 / 16   1 / 48   2 / 32   3 / 32   4 / 32
Single line fault                   -        +        +        +        +
Dyn. coupl. fault                   +        +        ++       ++       ++
Double line faults                  -        -        20 %     30 %     100 %
Note: The number of switches is reduced by a factor of 2 if full 2-way switches with 2 inputs / 2 outputs are used!
Results
Bus segments can favorably be organized into bundles of 8 lines for reconfiguration. Wider bundles require even more columns of switches.
In a bundle of 8 lines, all single faults can be repaired either by one backup line and 3 columns of switches (6 logic states) or by two backup lines and 2 columns (4 logic states).
Two columns with 4 states also allow for two alternative modes of changing neighborhood relations for de-coupling. This also covers a fraction of double-line faults.
Full coverage of double-line faults requires 4 backup lines and 2 columns of switches, or 2 backup lines and 4 columns.
Administration Scheme
[Figure: two segment couplers (SC) connected by lines 0 to 7; each has in / out ports, switch columns C1 and C2 (sides A and B), and config logic that decodes the config bits; the switch settings on both sides are matching]
Processor-Based Bus Test
[Figure: a test processor and three bus masters on the bus; the data lines end in a bus reflector controlled by clock, reflector select, and invert control]
Test and Fault Diagnosis
[Figure: test processor with a segment status list, attached to a segmented bus with bus masters (BM) and segment couplers (SC)]
Upcoming: Test Procedure & Fault Management
Test processor can „reset“ control of bus sections.
Test processor runs diagnostic test to identify faulty lines.
In case of faults, „trial and error“ test to identify faulty line segment(s).
Test processor keeps „fault list“ for redundancy management & supervision.
Summary
A simple scheme of re-arranging bus sections for repair of permanent faults.
Simple control scheme based on few logic states.
The number and the electrical effect of switches in complex bus systems may still cause problems.
Modular approach based on bundles of lines is scalable to cover wider buses. Should work well with NoCs.
Compatibility with regular schemes for bus test based on a dedicated test processor device.
5. Diagnostic Tests
Fault diagnosis by diagnostic (self-) test is possibly the real bottleneck in logic BISR!
Fault Diagnosis
Memory cells are easy to diagnose in case of faults affecting single cells. BIST is possible.
Diagnostic tests of buses that have to discover a single faulty line are straightforward. They can easily find which wires are affected, but not where the fault is.
Detecting a faulty gate or even transistor in a logic block is a much more challenging problem. Diagnosis must be compatible with methods of test response compaction used in scan testing.
Intelligent encoding for test responses! ... such as done by U. Potsdam and Infineon!
Combinational Logic Fault Diagnosis
[Figure: combinational logic between input FFs and output FFs]
Faults can occur within specific gates, on interconnects, or in a „distributed“ manner. Identifying a specific faulty gate or line is difficult at best and sometimes close to impossible by logic testing.
Logic Test
[Figure: combinational logic block; an input vector is applied to the (pseudo-) inputs and an output vector is observed at the (pseudo-) outputs]
Scan Path Technology
[Figure: the same combinational logic block with input vector and output vector; the flip-flops (ff) at the (pseudo-) inputs and (pseudo-) outputs are chained into a scan path from scan-in to scan-out]
Scan-based Logic Test
[Figure: compacted / encoded test information is expanded by a de-compactor and applied to the combinational logic blocks (CL); a test response compactor with coding feeds the diagnosis]
Fault Diagnosis on Compacted Output Data
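[Figure: a scan input generator (de-compactor) drives the scan chains; their outputs pass AND gates (&) connected to a d-value storage (d0 ... d6) and enter a MISR, whose content is compared with a reference MISR]
*patented, U. Potsdam and Infineon Technologies AG
scan clock
MISR clock: k * scan-clock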
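The general idea of signature compaction with selective masking can be sketched as follows. This is a generic illustration and not the patented scheme: the register width of 7 (matching d0 ... d6), the feedback taps, the reading of the d-values as an enable mask, and all function names are assumptions.

```python
def misr_step(state: int, inputs: int, taps=(6, 5), width: int = 7) -> int:
    """One clock of a multiple-input signature register (MISR)."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    state = ((state << 1) | feedback) & ((1 << width) - 1)
    return state ^ inputs


def signature(responses, mask: int = 0b1111111) -> int:
    """Compact a sequence of scan-out vectors; mask gates the chains."""
    state = 0
    for r in responses:
        state = misr_step(state, r & mask)
    return state


def failing_chains(good, observed, width: int = 7):
    """Enable one chain at a time and compare with the reference signature."""
    return [i for i in range(width)
            if signature(observed, 1 << i) != signature(good, 1 << i)]


good = [0b1010101, 0b0011001, 0b1110000]
observed = [good[0], good[1] ^ 0b0001000, good[2]]  # one wrong bit, chain 3

print(signature(good) != signature(observed))  # True
print(failing_chains(good, observed))          # [3]
```

Comparing only the full signature tells that a fault exists. Repeating the compaction with single chains enabled narrows down where the wrong response bit came from, at the cost of additional test runs.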
6. A Lot of Work to Do
Logic fault diagnosis
Efficient logic self repair
Redundancy supervision and management
Resource management under fault conditions
Repair functions for interconnects
Overall system-level fault management