A Pausible Bisynchronous FIFO for GALS Systems
Ben Keller*†, Ma@hew FojCk*, Brucek Khailany* *NVIDIA CorporaCon †University of California, Berkeley
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 1
2015 Symposium on Asynchronous Circuits and Systems
1 5
6
E 4
3 8
7
2
The Nightmare of Global Clocking • Large chips have only a
handful of clock domains • Each clock domain is many
mm2, even in ≤28nm – Made up of many par$$ons
• Tapeout signoff currently requires distribuCng balanced synchronous clocks to every flip-‐flop in a clock domain…
• …then closing SETUP and HOLD on all paths in the clock domain – Across dozens of PVT corners! 25mm
25mm
ParCCons
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 2
Clock domain
Globally Asynchronous, Locally Synchronous!
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 3
From this…
Globally Asynchronous, Locally Synchronous!
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 4
…to this.
Partition A Partition B
Internal Logic
Local DVCO
Internal Logic
Local DVCO
Local Clock Generator
Local Clock Generator
Partition A Partition B
Internal Logic
Local DVCO
Internal Logic
Local DVCO
Local Clock Generator
Local Clock Generator
Towards Fine-‐Grained GALS
• No global clock! Local ring oscillator generates clock in each parCCon
• No global Cming closure • Reduce Cming margin
– Tracks local PVT variaCon
– May improve tracking of high-‐frequency noise
• But there’s a catch… Clock Domain Crossing
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 5
Textbook Approach: Bisynchronous FIFO
FIFO Memory (Dual Port RAM)
Write Pointer Logic
tx_clk
tx_clk
Read Pointer Logic
rx_clk
Write Pointer
Read Pointer
Valid
Full
Empty
Ready “Brute Force” Synchronizers
Data In Data Out
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 6
RX Interface TX Interface
The Brute Force Synchronizer
• Add latency unCl metastability is most likely resolved
• +3 cycles of latency at every parCCon boundary!
• Need to do be@er…
MTBF∝ et
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 7
Pausible Clocking Basics
Clock
R2 OK Phase
Metastability happens when Data_Sync transitions near the
clock edge.
Data_Sync
Sync_OK
If data arrives when clock is high, let it
through.
Delay Phase G1
G2
R1
R2
Data_Sync Sync_OK
Clock Generator
Mutual exclusion circuit only allows G1
OR G2 to go high (never both).
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 8
Pausible Clocking Basics
Clock
R2 OK Phase
Metastability happens when Data_Sync transitions near the
clock edge.
Data_Sync
Sync_OK
If data arrives when clock is low, delay it until after the
clock edge.
Delay Phase G1
G2
R1
R2
Data_Sync Sync_OK
Clock Generator
Mutual exclusion circuit only allows G1
OR G2 to go high (never both).
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 9
Metastability in Pausible Clocks
If the metastability lasts long enough, the next clock edge is delayed until it resolves safely.
G1
G2
R1
R2
Data_Sync Sync_OK
Clock Generator
Clock
R2
OK Phase
Data_Sync
Mutex Output metastability
Delay Phase
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 10
Clock Pauses Are Rare
1E+01
1E+02
1E+03
1E+04
1E+05
1E-‐07 1E-‐06 1E-‐05 1E-‐04 1E-‐03 1E-‐02 1E-‐01 1E+00 1E+01 1E+02 1E+03
Mutex Delay (p
s)
Difference in Arrival Times (ps)
96% of synchronizations
have no metastability
99.99% of metastability events resolve
in <100ps
Average impact on cycle time:
<0.1%
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 11
SynchronizaCon With Pausible Clock Generators
“Pausible Clock Generator”
“Demystifying Data Driven and Pausible Clocking Schemes,” Mullins07
Clock Generator
Synchronized signal arrives
after (an average of) just 1 clock
cycle!
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 12
Mutex and pausible clock
safely synchronize the signal.
Pausible Clocks: Our ContribuCon • Prior work in pausible clocks:
– Proposed pausible clocking as a technique for low-‐latency asynchronous interface crossings (Yun96)
– Integrated pausible clocks with asynchronous FIFOs to make GALS wrappers (Moore02, Fan09, others)
• We propose a circuit that: – Pairs pausible clocking techniques with standard two-‐ported synchronous FIFOs
– Uses a novel flow-‐control scheme that enables low-‐latency asynchronous boundary crossings
– Integrates easily with standard toolflows
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 13
A Pausible Bisynchronous FIFO
FIFO Memory (Dual Port RAM)
Write Pointer Logic
RX Interface TX Interface
tx_clk
tx_clk
Read Pointer Logic
rx_clk
Write Pointer
Read Pointer
Valid
Full
Empty
Ready
Data In Data Out
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 14
A Pausible Bisynchronous FIFO
FIFO Memory (Dual Port RAM)
Write Pointer Logic
RX Interface TX Interface
tx_clk
tx_clk
Read Pointer Logic
rx_clk
Write Pointer
Read Pointer
Valid
Full
Empty
Ready
Data In Data Out
Pausible Clock
Generator
Pausible Clock
Generator
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 15
A Pausible Bisynchronous FIFO
Pointers are synchronized via two-phase increment and acknowledge
signals. Pausible Clock
Generator
Pausible Clock
Generator
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 16
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 17
Two-phase signals
1. Valid data is written to the FIFO.
2. Write pointer increment is toggled.
3. The write pointer increment is synchronized.
5. Read pointer increment is toggled.
6. After the RX clock, write pointer acknowledge is toggled.
7. Write pointer acknowledge is synchronized. The increment signal can now safely be reused.
4. Valid is asserted and data is read out of the FIFO.
0
1
2
3
4
5
6
7
8
0.5 1 2 4
Latency (RX Cycles)
TX Clock Period / RX Clock Period
Brute Force Synchronizer
Pausible Synchronizer
SimulaCon Results: Brute Force Synchronizer
Typical latency: 4 cycles
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 18
Typical latency: 1.25 cycles
Pausible Clocking Timing Constraints
Minimum clock period:
t fb
tr2
tg2
T2 ≥ tr2 + tg2 + t fb
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 19
Maximum insertion delay: Tins ≤ T − t fb − tg2
tins
Maximum same-cycle work:
TCL ≤ t fb + tg2 +Tins
RX Pointer Logic
Pausible Clocking LimitaCons: Where to Put the Mutexes?
Clock Generator
Pausible Interface
Pausible Interface
Pausible Interface
Partition
Delay due to
physical distance
Pausible Interface
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 20
Pausible Clocking LimitaCons: Where to Put the Mutexes?
OpAon 1: Mutexes at the interface – Reduces maximum clock rate
OpAon 2: Mutexes at the center – Increases latency + Improves maximum clock rate + Can bake the mutex into a black box
with the clock generator
Added Delay
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 21
Playing Nice with the Tools • Most of the
design is synchronous and works fine with synthesis tools
• FIFO is a standard dual-‐ported memory
• A single custom cell is needed
• Custom Cming constraints at clock domain crossings
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 22
Synchronizer Comparison
Design Average Latency (cycles) Area (um2) Power (mW) Energy
(fJ/bit)
Synchronous Interface 1 4968 4.08 25.5
Brute Force Synchronizer ~4 5005 6.03 37.7
Pausible Synchronizer ~1.5 4808 5.41 33.8
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 23
Summary
• Pausible clocks have key advantages over standard synchronizers, including low latency and zero probability of failure
• We designed a pausible bisynchronous FIFO that pairs pausible clocking techniques with standard two-‐ported synchronous FIFOs
• Fast asynchronous interfaces that work well with standard toolflows are a key enabling technology of fine-‐grained GALS design
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 24
References R. Mullins and S. Moore, “DemysCfying Data-‐Driven and
Pausible Clocking Schemes,” in Proc. IEEE Symposium on Asynchronous Cir-‐ cuits and Systems, 2007, pp. 175–185.
K. Yun and R. Donohue, “Pausible clocking: a first step toward heterogeneous systems,” in Proc. IEEE Interna$onal Conference on Computer Design, 1996, pp. 118–123.
S. Moore et al., “Point to point GALS interconnect,” in Proc. IEEE Symposium on Asynchronous Circuits and Systems, 2002, pp. 69–75.
X. Fan et al., “Analysis and opCmizaCon of pausible clocking based GALS design,” IEEE Interna$onal Conference Computer Design, 2009.
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 25
QuesCons?
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 26
Backup Slides
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 27
A Pausible Bisynchronous FIFO
MUTEX
MUTEX
MUTEX
MUTEX
r1
r2
g1
g2
MUTEX
MUTEX
MUTEX
MUTEX
C
LAT
C
LAT
tx_clk_root
tx_clk
tx_clk
valid
ready
valid
ready
data_outdata_in
r1
r2
g1
rptr_inc
rptrwptr
wptr_ack_sync
wptr_ack wptr_ack
rptr_ack_sync
rx_clk_root
wptr_inc
wptr_inc_
sync
rx_clk
To TX Side
Dual-Port FIFO
From RX Side
Write Pointer Logic
Read Pointer Logic
g2
LAT LAT
t_pa
use_
star
t
t_pause_start r_pause_startr_
paus
e_st
art
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 28
Mutex Circuit
• Only G1 or G2 can be high at once (never both)
• If R1 and R2 transiCon high simultaneously, metastability can result
• The “metastability filter” keeps both outputs low unCl metastability resolves
SR Latch Metastability Filter
R1 R2 G1 G2
0 0 0 0
1 0 1 0
0 1 0 1
1 1 R1 first -‐> G1 high R2 first -‐> G2 high
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 29
SimulaCon Setup
• Implemented interface in Verilog • Allows direct comparison to brute-‐force synchronizers via drop-‐in replacement
Dummy Compute
(TX) Interface
Dummy Compute
(RX)
TX Clock Domain RX Clock Domain
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 30
Minimum Clock Period
Clock
R2 OK Phase Delay Phase
Request arrives here
Minimum clock period:
t fb
tr2
tg2T2 ≥ tr2 + tg2 + t fb
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 31
InserCon Delay
Clock_root
R2
OK Phase Delay Phase
Clock tins
Clock_root tins
Requests can arrive near the clock edge!
LAT
r2
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 32
Maximum insertion delay: Tins ≤ T − t fb − tg2
CombinaConal Logic Delay TCL
t fb
tr2
tg2
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 33
Maximum same-cycle work:
TCL ≤ t fb + tg2 +Tins
Metastability Margin tm
Minimum clock period:
t fb
tr2
tg2
T2 ≥ tr2 + tg2 + t fb
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 34
Metastability margin: tm = T 2− (tr2 + tg2 + t fb )
tm
Average Latency
Average latency: TL = T +Tins − tr2
May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 35