EE 108A Lecture 13 (c) 2005 W. J. Dally 11/9/2005
EE108A
Lecture 13: Metastability and Synchronization Failure
(or When Good Flip-Flops go Bad)
What happens when we violate setup and hold time constraints?
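[Figure: D flip-flop with input d, output q, and clock clk; timing diagram of d, clk, and q showing setup time ts, hold time th, and clock-to-Q delay tdCQ]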
Look at structure of CMOS latch
• Storage loop gets initialized with an ‘analog’ value
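[Figure: CMOS latch schematic with input d, clock clk, and storage nodes V1 and V2, with ΔV measured between them; plot of initial ΔV (axis from +2 to -2) versus data transition time (axis from -ts to th)]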
Storage loop has a metastable state between 0 and 1
[Figure: storage loop with nodes V1 and V2 and ΔV measured between them; plot of V2 versus V1 showing two stable points with a metastable point between them]
Dynamics of V
[Figure: plot of V2 versus V1 with two stable points and a metastable point between them; storage loop with nodes V1 and V2 and ΔV measured between them]
Dynamics of V
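[Figure: plot of ΔV versus t, with the time axis marked at 1τ and 2τ and the voltage axis marked at 1, e^-1, -e^-1, and -1; storage loop with nodes V1 and V2 and ΔV measured between them]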
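The exponential behavior in this plot can be sketched numerically. This is a minimal sketch, assuming the normalized model ΔV(t) = ΔV(0)·exp(t/τ), which is the form implied by the summary relation t = -τ log(ΔV0), with voltages scaled so that |ΔV| = 1 counts as resolved. The function names are illustrative and do not come from the lecture.

```python
import math

def delta_v(dv0, t, tau):
    """Voltage difference after time t, starting from dv0 (normalized units)."""
    return dv0 * math.exp(t / tau)

def resolve_time(dv0, tau):
    """Time for |delta_v| to grow from |dv0| to 1 (the normalized full swing)."""
    return -tau * math.log(abs(dv0))

tau = 100e-12  # 100 ps, the value used in the lecture's worked example
for dv0 in (math.exp(-1), 1e-3, 1e-6):
    # Report the resolution time in units of tau.
    print(f"dv0 = {dv0:.3g}: resolves in {resolve_time(dv0, tau) / tau:.1f} tau")
```

Each factor of 1000 reduction in the initial voltage adds only about 7τ to the resolution time, which is why waiting a few nanoseconds is so effective.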
Metastability Demonstration Circuit
[Figure: schematic of the demonstration circuit. Blocks: relaxation oscillator, voltage-controlled delay (increasing VC increases delay), buffer, and RS FFs under test. Control input VC; outputs Q1, Q1N, Q2, Q2N. Component values shown: 3.3n, 1K, 1u, 100K.]

No   Part  Vdd  GND
U1   4093  14   7
U2   4007  14   7
U3   4007  14   7
U4   4011  14   7
U5   4007  14   7
U6   4007  14   7
Metastability Demonstration Circuit - Implementation
Metastable state of FF1 – 4007 Nand RS Latch
Over time the waveform fills in
A Brute-Force Synchronizer
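[Figure: two flip-flops in series, both clocked by Clk. Asynchronous input A drives FF1, whose output AW drives FF2, whose output is the synchronized signal AS. Timing diagram of Clk, A, AW, and AS.]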
What if AW is still in a metastable state when FF2 is clocked?
[Figure: timing diagram of Clk, A, AW, and AS]
Calculating Synchronization Failure (The Big Picture)
P(failure) = P(enter metastable state) x P(still in metastable state after tw)
Probability of Entering a Metastable State
• FF1 may enter the metastable state if the input signal transitions during the setup+hold window of the flip flop
$$P_E = \frac{t_s + t_h}{t_{cy}} = f_{cy}\,(t_s + t_h)$$
[Figure: Clk waveform showing the ts+th window within one clock period tcy]
• Probability of a given transition being in the setup+hold window is the fraction of time that is setup+hold window
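The "fraction of time" argument can be checked with a small Monte Carlo experiment. This is a sketch, assuming input transitions arrive uniformly at random relative to the clock edge (a reasonable model for a truly asynchronous input, though the slide does not state it). The function name, trial count, and seed are illustrative choices.

```python
import random

def estimate_p_enter(t_s, t_h, t_cy, trials=200_000, seed=0):
    """Fraction of uniformly random transition times that land in the setup+hold window."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Transition time relative to the clock edge, spanning exactly one cycle,
        # so the keepout window is the interval [-t_s, +t_h].
        t = rng.uniform(-t_s, t_cy - t_s)
        if -t_s <= t <= t_h:
            hits += 1
    return hits / trials

t_s = t_h = 0.1  # ns
t_cy = 2.0       # ns
print("analytic :", (t_s + t_h) / t_cy)
print("estimated:", estimate_p_enter(t_s, t_h, t_cy))
```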
Probability of Staying in the Metastable State
• Still in metastable state if initial voltage difference was too small to be exponentially amplified during wait time
• Probability of starting with this voltage is proportion of total voltage range that is ‘too small’
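$$\Delta V_S = \Delta V_F \exp\left(-\frac{t_w}{\tau_S}\right)$$
$$P_S = \exp\left(-\frac{t_w}{\tau_S}\right)$$
(τS is the time constant of the storage loop, written τ on the other slides.)
[Figure: ΔV growing exponentially from ΔVS to ΔVF = 1 during the wait time tw]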
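The step from the voltage threshold to the probability can be written out as follows. This is a sketch of the reasoning, assuming the initial voltage difference is uniformly distributed over a normalized range of magnitude ΔV_F = 1; the uniform distribution is an assumption that matches the "proportion of total voltage range" bullet above.

```latex
% Largest initial difference that has NOT yet reached \Delta V_F after waiting t_w:
\Delta V_S \exp\left(\frac{t_w}{\tau_S}\right) = \Delta V_F
\quad\Longrightarrow\quad
\Delta V_S = \Delta V_F \exp\left(-\frac{t_w}{\tau_S}\right)

% With |\Delta V(0)| uniform on [0, \Delta V_F] and \Delta V_F = 1:
P_S = P\bigl(|\Delta V(0)| < \Delta V_S\bigr)
    = \frac{\Delta V_S}{\Delta V_F}
    = \exp\left(-\frac{t_w}{\tau_S}\right)
```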
Failure Probability and Error Rate
$$P_F = P_E P_S = (t_s + t_h)\, f_{cy} \exp\left(-\frac{t_w}{\tau}\right)$$
$$f_F = f_E P_F = (t_s + t_h)\, f_E f_{cy} \exp\left(-\frac{t_w}{\tau}\right)$$
• Each event can potentially fail.
• Failure rate = event rate x failure probability
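[Figures: ΔV growing from ΔVS to ΔVF = 1 during the wait time tw; Clk waveform showing the ts+th window within one clock period tcy]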
Example
• ts = th = tdCQ = τ = 100ps
• tcy = 2ns
• must sample a fE = 1MHz asynchronous signal
• PE = (.1+.1)/2 = 0.1
• tw = tcy - ts - tdCQ = 1.8ns
• PS = exp(-1.8/.1) = exp(-18) = 1.5 x 10^-8
• PF = PSPE = 1.5 x 10^-9
• fF = fEPF = 1.5 x 10^-3 Hz
• 1 failure every 656 seconds ~ every 11 minutes
• This is not adequate. How do we improve it?
• How do we get failure rate to one every 10 years ~ 3 x 10^8 s (fF < 3 x 10^-9 Hz)?
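The arithmetic in this example can be reproduced with a short script. This is a sketch, assuming the wait time is tw = tcy - ts - tdCQ, which is consistent with the 1.8 ns used in the exponent above. The function names are illustrative, and the second function simply inverts the failure-rate formula to find the wait time that meets a target rate.

```python
import math

def failure_rate(t_s, t_h, t_cy, tau, t_w, f_event):
    """Synchronization failures per second: f_F = f_E * P_E * P_S."""
    p_enter = (t_s + t_h) / t_cy       # P_E: chance of hitting the keepout window
    p_stay = math.exp(-t_w / tau)      # P_S: chance of still being metastable after t_w
    return f_event * p_enter * p_stay

def wait_time_for(target_rate, t_s, t_h, t_cy, tau, f_event):
    """Wait time t_w at which the failure rate equals target_rate."""
    p_enter = (t_s + t_h) / t_cy
    return -tau * math.log(target_rate / (f_event * p_enter))

ps = 1e-12
t_s = t_h = t_dcq = tau = 100 * ps
t_cy = 2000 * ps
f_event = 1e6                          # 1 MHz asynchronous input
t_w = t_cy - t_s - t_dcq               # assumed wait time (1.8 ns)

f_f = failure_rate(t_s, t_h, t_cy, tau, t_w, f_event)
print(f"failure rate: {f_f:.2e} per second, one every {1 / f_f:.0f} s")

needed = wait_time_for(3e-9, t_s, t_h, t_cy, tau, f_event)
print(f"t_w for about one failure per 10 years: {needed / ps:.0f} ps")
```

Because the failure rate falls exponentially with tw, the wait time needed for the ten-year target is less than twice the 1.8 ns available in a single cycle.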
How much difference does one FF make?
• Previous example: 2 FF brute-force synchronizer
– 1 failure every 11 minutes (fF = 1.5 x 10^-3 Hz)
• Add a third FF:
– ts = th = tdCQ = τ = 100ps (same)
– tcy = 2ns (same)
– must sample a fE = 1MHz asynchronous signal (same)
– PE = (.1+.1)/2 = 0.1 (same)
– PS = exp(-3.6/.1) = exp(-36) = 2.3 x 10^-16
– PF = PSPE = 2.3 x 10^-17
– fF = fEPF = 2.3 x 10^-11 Hz, (much) less than one failure every 10 years!
• Exponentials grow quickly. Adding one flip flop took us from 11 minutes to about 1,400 years.
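To see how quickly the mean time between failures grows with synchronizer depth, the comparison can be extended to any number of flip-flops. This is a sketch, assuming each additional flip-flop adds one more wait interval of tcy - ts - tdCQ = 1.8 ns, which matches the 3.6 ns used for three flip-flops above. The function name and the 365.25-day year are illustrative choices.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def mean_time_between_failures(n_ffs, t_s, t_h, t_dcq, t_cy, tau, f_event):
    """MTBF in seconds for an n-flip-flop brute-force synchronizer (n >= 2)."""
    # Assumed: every flip-flop after the first contributes one wait interval.
    t_w = (n_ffs - 1) * (t_cy - t_s - t_dcq)
    p_fail = (t_s + t_h) / t_cy * math.exp(-t_w / tau)
    return 1.0 / (f_event * p_fail)

for n in (2, 3, 4):
    mtbf = mean_time_between_failures(n, 0.1e-9, 0.1e-9, 0.1e-9, 2e-9, 0.1e-9, 1e6)
    print(f"{n} FFs: {mtbf:.3g} s = {mtbf / SECONDS_PER_YEAR:.3g} years")
```

Each extra stage multiplies the MTBF by exp(18), roughly 6.6 x 10^7, at the cost of one more cycle of latency.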
Synchronizing multi-bit signals
Consider a 4-bit counter running on clk1. You need the value of this counter sampled by clk2. Will the following circuit work? (assume tw >> τ)
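[Figure: 4-bit counter clocked by clk1 produces cnt, which passes through two 4-bit flip-flop stages clocked by clk2: the first produces cnt_w, the second produces cnt_s]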
This happens, for example, in a FIFO where the head and tail pointers are in different clock domains.
Multi-bit signals (2)
When synchronizing a multi-bit signal, each changing bit is independently synchronized
Consider what happens on the 0111 to 1000 transition. All bits are changing. Each can independently fall either way.
How do you fix this?
[Figure: 4-bit counter clocked by clk1 produces cnt, which passes through two 4-bit flip-flop stages clocked by clk2: the first produces cnt_w, the second produces cnt_s]
Each bit can fail either way!
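To make the hazard concrete, the sketch below enumerates every value the synchronizer could output when each changing bit independently resolves to either its old or its new value. It models only the logical outcome, not the timing, and the function name is illustrative.

```python
from itertools import product

def possible_samples(old, new, width=4):
    """All values a synchronizer could output if every changing bit
    independently resolves to either its old or its new value."""
    changing = [i for i in range(width) if (old >> i) & 1 != (new >> i) & 1]
    results = set()
    for choice in product((0, 1), repeat=len(changing)):
        value = old
        for bit, take_new in zip(changing, choice):
            if take_new:
                value ^= 1 << bit  # flip this bit to its new value
        results.add(value)
    return sorted(results)

samples = possible_samples(0b0111, 0b1000)
print(len(samples), "possible values:")
print([format(v, "04b") for v in samples])
```

On the 0111 to 1000 transition all four bits change, so the sampled value can be any 4-bit code, including values the counter is nowhere near.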
Warning: The Surgeon General has determined that passing binary-coded and one-hot signals through a brute-force synchronizer can be hazardous to your circuits.
Gray code: only one bit changes each time
# xxxx
# 0000
# 0001
# 0011
# 0010
# 0110
# 0111
# 0101
# 0100
# 1100
# 1101
# 1111
# 1110
# 1010
# 1011
# 1001
# 1000
# 0000
# 0001
# 0011
# 0010
• How does this help?
• Remember each bit can fail either way.
• If we only change 1 bit each time, then what’s the worst that can happen?
– 0111 => cycle 1: 0101, cycle 2: 0101 (no failure)
– 0111 => cycle 1: 0111, cycle 2: 0101 (failure)
• On the second cycle we will have had even more time for our input to stabilize, so we should be fine. By using Gray code, the worst that happens is we see the transition 1 cycle later.
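The Gray sequence listed on this slide can be generated and checked in a few lines. This is a sketch using one common construction, the reflected binary code g = n XOR (n >> 1); the lecture does not say how its counter produces the sequence, so treat the formula as an assumption.

```python
def binary_to_gray(n):
    """Reflected binary Gray code of n (one common construction)."""
    return n ^ (n >> 1)

codes = [binary_to_gray(i) for i in range(16)]

# Check that consecutive codes, including the wrap from the last code back
# to the first, differ in exactly one bit.
for a, b in zip(codes, codes[1:] + codes[:1]):
    assert bin(a ^ b).count("1") == 1

print(" ".join(format(c, "04b") for c in codes))
```

Because only one bit changes per count, a sample taken during a transition can only be the old value or the new value, never an unrelated code.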
Why do we care?
• Most designs have multiple clock domains
– E.g., your PCI bus interface runs at 66MHz, but your image compression engine might run at 200MHz
– You need to get data from the PCI bus to the image compression engine
• Example: DVD driver System-on-chip (SoC):
Tsai, C., et al., “A CMOS SoC for 56/32/56/16 Combo Driver Applications”, ISSC 2004
Metastability and Synchronization Failure Summary
• Clocking a flip-flop during the “keepout” interval may leave the storage node in an “illegal state”
• Some “illegal states” are metastable
• Time to decay to a legal state depends on log of initial voltage
$$t = -\tau \log(\Delta V_0)$$
• Probability of entering metastable state is probability of hitting “keepout” interval.
$$P_E = \frac{t_s + t_h}{t_{cy}}$$
• Probability of staying in metastable state after wait time tw is probability that initial voltage was too small to decay within tw
$$P_S = \exp\left(-\frac{t_w}{\tau}\right)$$
• Brute-force synchronizer – sample signal and wait for metastable states to decay.
• Don’t use on multi-bit signals unless they are Gray coded
The end of ee108a…
• Any questions?