Post on 20-Dec-2015
Asynchronous Interface Specification, Analysis and Synthesis
M. Kishinevsky
Intel Corporation
J. Cortadella
Technical University of Catalonia
Steps in Design Flow
Specification
Synthesis
– Next-state functions
– State encoding
– Decomposition and technology mapping
Timing optimization
Verification
[Figure: timing diagram of the signals x, y, z with the transitions x+, x-, y+, y-, z+, z-]
Signal Transition Graph (STG)
[Figure: timing diagram and STG over the transitions x+, x-, y+, y-, z+, z-]
[Figure: state graph with states coded xyz. x+: 000 → 100; y+: 100 → 110, 101 → 111, 001 → 011; z+: 100 → 101, 110 → 111; x-: 101 → 001, 111 → 011; z-: 011 → 010; y-: 010 → 000]
Next-state functions
$x = \bar{z}\,(x + \bar{y})$
$y = z + x$
$z = x + \bar{y}\,z$
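The next-state functions can be checked against the state graph. The sketch below is a minimal illustration, not the authors' tool: it assumes the state graph edges as read off the xyz figure and assumes the three functions read x = z'(x + y'), y = z + x, z = x + y'z (with ' for complement). The names `EDGES`, `next_state_table` and `equations` are hypothetical.

```python
# State graph of the x, y, z example; each state is the code "xyz".
# Edges (assumed from the figure): (source state, event) -> target state.
EDGES = {
    ("000", "x+"): "100",
    ("100", "y+"): "110",
    ("100", "z+"): "101",
    ("110", "z+"): "111",
    ("101", "y+"): "111",
    ("101", "x-"): "001",
    ("111", "x-"): "011",
    ("001", "y+"): "011",
    ("011", "z-"): "010",
    ("010", "y-"): "000",
}
SIGNALS = "xyz"


def next_state_table(edges):
    """Map each reachable state to the implied next value of every signal."""
    states = {s for s, _ in edges} | set(edges.values())
    table = {}
    for s in sorted(states):
        nxt = list(s)  # a stable signal keeps its current value
        for (src, event), _ in edges.items():
            if src == s:  # an enabled event flips its signal
                nxt[SIGNALS.index(event[0])] = "1" if event[1] == "+" else "0"
        table[s] = "".join(nxt)
    return table


def equations(state):
    """Evaluate the assumed next-state functions in one state."""
    x, y, z = (int(c) for c in state)
    nx = (1 - z) & (x | (1 - y))   # x = z'(x + y')
    ny = z | x                     # y = z + x
    nz = x | ((1 - y) & z)         # z = x + y'z
    return f"{nx}{ny}{nz}"


TABLE = next_state_table(EDGES)
MISMATCHES = [s for s in TABLE if equations(s) != TABLE[s]]
```

An empty `MISMATCHES` list means the functions agree with the state graph in every reachable state; unreachable codes are don't-cares and are not checked.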
VME bus
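[Figure: a VME bus controller connects the bus to a device and controls a data transceiver. Signals on the bus side: DSr, DSw, DTACK. Signals on the device side: LDS, LDTACK. Signal to the data transceiver: D.]
Read Cycle
[Figure: timing diagram of the read cycle for DSr, LDS, LDTACK, D and DTACK]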
STG for the READ cycle
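[Figure: STG of the read cycle. Main sequence: DSr+ → LDS+ → LDTACK+ → D+ → DTACK+ → DSr- → D-. After D-, the branches DTACK- → DSr+ and LDS- → LDTACK- → LDS+ run concurrently.]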
Choice: Read and Write cycles
[Figure: STG with a choice between two cycles.
Read branch: DSr+, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, LDS-, LDTACK-, DTACK-.
Write branch: DSw+, D+, LDS+, LDTACK+, D-, DTACK+, DSw-, LDS-, LDTACK-, DTACK-.]
Speed independence
Delay model:
– Unbounded gate delays
– Wire delays after a fork are less than gate delays
Conditions for implementability:
– Consistent and Complete State Coding
– Determinism
– Output persistency
– Commutativity
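Persistency and commutativity can be checked mechanically on a state graph. The sketch below is a minimal illustration under assumptions: it reuses the edges of the small xyz example as read off its figure, and it checks persistency for all events rather than outputs only (a stricter condition than output persistency). Determinism holds by construction, because the dictionary stores one target per (state, event) pair. All names are hypothetical.

```python
# Edges of the xyz state graph (assumed from the figure):
# (source state, event) -> target state, states coded "xyz".
EDGES = {
    ("000", "x+"): "100",
    ("100", "y+"): "110",
    ("100", "z+"): "101",
    ("110", "z+"): "111",
    ("101", "y+"): "111",
    ("101", "x-"): "001",
    ("111", "x-"): "011",
    ("001", "y+"): "011",
    ("011", "z-"): "010",
    ("010", "y-"): "000",
}


def enabled(state):
    """Events that can fire in the given state."""
    return {e for (s, e) in EDGES if s == state}


def persistency_violations():
    """Triples (state, fired, disabled): firing one event disables another."""
    bad = []
    for (s, fired), t in EDGES.items():
        for other in enabled(s) - {fired}:
            if other not in enabled(t):
                bad.append((s, fired, other))
    return bad


def commutativity_violations():
    """Triples (state, a, b) where firing a;b and b;a lead to different states."""
    bad = []
    for (s, a), t in EDGES.items():
        for b in enabled(s) - {a}:
            u = EDGES[(s, b)]
            after_ab, after_ba = EDGES.get((t, b)), EDGES.get((u, a))
            if after_ab is not None and after_ba is not None and after_ab != after_ba:
                bad.append((s, a, b))
    return bad
```

For this example both functions return an empty list: in state 100 the events y+ and z+ are concurrent, and in state 101 the events y+ and x- are concurrent, and in each case either order reaches the same state.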
State Graph (Read cycle)
[Figure: state graph of the read cycle, with arcs labeled DSr+, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-]
Binary encoding of signals
[Figure: the same state graph with binary codes attached to the states; codes shown include 10000, 10010, 10110, 01110, 01100, 00110 and a second 10110]
State vector: (DSr, DTACK, LDTACK, LDS, D)
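The binary codes can be generated by playing the token game on the STG. The sketch below is a minimal illustration under assumptions: the transcript lists only the transition names, so the arcs in `ARCS` and the initial tokens follow the usual reading of the read-cycle figure (DSr+ → LDS+ → LDTACK+ → D+ → DTACK+ → DSr- → D-, then DTACK- → DSr+ concurrently with LDS- → LDTACK- → LDS+). The STG is treated as a safe marked graph, with one token place per arc; all names are hypothetical.

```python
from collections import deque

# Assumed arcs of the read-cycle STG: transition -> successor transitions.
ARCS = {
    "DSr+": ["LDS+"], "LDS+": ["LDTACK+"], "LDTACK+": ["D+"],
    "D+": ["DTACK+"], "DTACK+": ["DSr-"], "DSr-": ["D-"],
    "D-": ["DTACK-", "LDS-"], "DTACK-": ["DSr+"],
    "LDS-": ["LDTACK-"], "LDTACK-": ["LDS+"],
}
ORDER = ["DSr", "DTACK", "LDTACK", "LDS", "D"]  # state vector order
INITIAL_TOKENS = frozenset({("DTACK-", "DSr+"), ("LDTACK-", "LDS+")})


def preset(t):
    """Arcs that must all hold a token for transition t to be enabled."""
    return {(src, t) for src, dsts in ARCS.items() if t in dsts}


def fire(marking, code, t):
    """Move tokens across t and update the bit of the signal that switched."""
    new_marking = (marking - preset(t)) | {(t, d) for d in ARCS[t]}
    i = ORDER.index(t[:-1])
    bit = "1" if t.endswith("+") else "0"
    return frozenset(new_marking), code[:i] + bit + code[i + 1:]


def state_graph():
    """Breadth-first exploration from the all-zero initial state."""
    start = (INITIAL_TOKENS, "00000")
    seen, edges, queue = {start}, [], deque([start])
    while queue:
        marking, code = queue.popleft()
        for t in ARCS:
            if preset(t) <= marking:
                nxt = fire(marking, code, t)
                edges.append((code, t, nxt[1]))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen, edges


STATES, EDGES = state_graph()
CODES = {code for _, code in STATES}
```

Under these assumptions the exploration yields more states than distinct codes, because two different markings receive the code 10110.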
Excitation / Quiescent Regions
[Figure: state graph of the read cycle partitioned into the regions ER(LDS+), QR(LDS+), ER(LDS-) and QR(LDS-)]
Next-state function
Region      LDS → next(LDS)
ER(LDS+)    0 → 1
QR(LDS+)    1 → 1
ER(LDS-)    1 → 0
QR(LDS-)    0 → 0
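The regions and the implied next-state function of LDS can be computed from the reachable states. The sketch below is a minimal illustration under assumptions: the arcs and initial tokens of the read-cycle STG are the usual reading of the figure (the transcript lists only transition names), and a state belongs to ER(LDS+) or ER(LDS-) when that transition is enabled, otherwise to QR(LDS+) or QR(LDS-) according to the current value of LDS. All names are hypothetical.

```python
# Assumed arcs of the read-cycle STG (a safe marked graph).
ARCS = {
    "DSr+": ["LDS+"], "LDS+": ["LDTACK+"], "LDTACK+": ["D+"],
    "D+": ["DTACK+"], "DTACK+": ["DSr-"], "DSr-": ["D-"],
    "D-": ["DTACK-", "LDS-"], "DTACK-": ["DSr+"],
    "LDS-": ["LDTACK-"], "LDTACK-": ["LDS+"],
}
ORDER = ["DSr", "DTACK", "LDTACK", "LDS", "D"]
M0 = frozenset({("DTACK-", "DSr+"), ("LDTACK-", "LDS+")})


def reachable():
    """Return {(marking, code): set of enabled transitions}."""
    states, todo = {}, [(M0, "00000")]
    while todo:
        marking, code = todo.pop()
        if (marking, code) in states:
            continue
        states[(marking, code)] = set()
        for t, dsts in ARCS.items():
            pre = {(s, t) for s, ds in ARCS.items() if t in ds}
            if pre <= marking:
                states[(marking, code)].add(t)
                i = ORDER.index(t[:-1])
                bit = "1" if t.endswith("+") else "0"
                nxt = (marking - pre) | {(t, d) for d in dsts}
                todo.append((nxt, code[:i] + bit + code[i + 1:]))
    return states


def regions(signal):
    """Split the state codes into excitation and quiescent regions of a signal."""
    idx = ORDER.index(signal)
    out = {"ER+": set(), "ER-": set(), "QR+": set(), "QR-": set()}
    for (_, code), enabled in reachable().items():
        if signal + "+" in enabled:
            out["ER+"].add(code)
        elif signal + "-" in enabled:
            out["ER-"].add(code)
        elif code[idx] == "1":
            out["QR+"].add(code)
        else:
            out["QR-"].add(code)
    return out


R = regions("LDS")
ON = R["ER+"] | R["QR+"]    # codes where next(LDS) = 1
OFF = R["ER-"] | R["QR-"]   # codes where next(LDS) = 0
CONFLICT = ON & OFF         # codes that would need both values
```

Codes in neither `ON` nor `OFF` are unreachable and act as don't-cares in a Karnaugh map. A non-empty `CONFLICT` means the function of LDS is not well defined on the state codes alone, which is a state encoding conflict.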
Karnaugh map for LDS
[Figure: two 4x4 Karnaugh maps for LDS over the variables DTACK, DSr, D and LDTACK, one for LDS = 0 and one for LDS = 1. Reachable states carry 0 or 1, unreachable codes are don't-cares (-), and one cell is marked "?".]
State encoding conflicts
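[Figure: state graph of the read cycle with the next-state values of LDS (0 → 1, 1 → 1, 1 → 0, 0 → 0) and the transitions LDS+, LDS-, LDTACK+, LDTACK-. Two different states carry the same code 10110.]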
Concurrency reduction
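[Figure: state graph of the read cycle with the transitions LDS+, LDS-, DSr+ and the two states coded 10110 marked]
[Figure: STG of the read cycle (LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-, DSr+) after concurrency reduction]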
State encoding conflicts
[Figure: STG fragment with the transitions LDS-, LDTACK-, LDTACK+, LDS+; two markings have the same code 10110]
Signal Insertion
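[Figure: STG fragment with the transitions LDS-, LDTACK-, D-, DSr-, LDTACK+, LDS+ and the inserted transitions CSC+ and CSC-. The two conflicting states now have the distinct codes 101101 and 101100.]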
Decomposition
Hazards
Global acknowledgement
Generating candidates
Hazard-free signal insertion
– Event insertion
– Signal insertion
Hazards
[Figure: state sequence over (a, b, c, x): 1000, b+, 1100, a-, 0100, c+, 0110, next to a two-gate circuit with the signals a, b, c, z and x, annotated with the signal values in each state]
Global acknowledgement
[Figure: two gates, z with inputs a, b, c and y with inputs a, b, d, together with two STG fragments: d-, b+, d+, y+, a-, y-, c+, d- and c-, d+, z-, b-, z+, c+, a+, c-]
How about 2-input gates?
[Figure sequence: candidate splits of the 3-input gates for z and y into 2-input gates, pairing the inputs in different ways]
Strategy for correct logic decomposition
Each decomposition defines a new internal signal of the circuit
Method: Insert new internal signals such that
– After resynthesis, some large gates are decomposed
– The new specification is SI-implementable (hazard-free under unbounded gate delays)
[Figure: iterative decomposition flow with blocks labeled F, C, Sr and D. Step 1: Decomposition (Boolean relations, algebraic factorization). Step 2: Hazard-free? (Signal insertion), with NO and YES branches. The loop repeats until no more progress.]
Boolean decomposition
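[Figure: block F with inputs x1,…,xn and output f, and its decomposition into block H (inputs x1,…,xn, outputs h1,…,hm) feeding block G (output f)]
f = F(x1,…,xn)
f = G(H(x1,…,xn))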
Our problem: Given F and G, find H
[Figure: C element with inputs h1, h2 and output f]

state   f   next(f)   (h1,h2)
s1      0   0         (0,-) (-,0)
s2      0   1         (1,1)
s3      1   0         (0,0)
s4      1   1         (-,1) (1,-)
dc      -   -         (-,-)

This is a Boolean Relation
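The rows of this relation follow directly from the behaviour of the C element. The sketch below is a minimal illustration, assuming the standard C element (the output follows the inputs when they agree and holds its value otherwise); the names `c_element` and `relation` are hypothetical.

```python
from itertools import product


def c_element(h1, h2, f):
    """Next output of a C element: follows the inputs when they agree, else holds f."""
    return h1 if h1 == h2 else f


def relation():
    """For each (f, next f) pair, list the input pairs (h1, h2) that produce it."""
    rel = {}
    for f, h1, h2 in product((0, 1), repeat=3):
        rel.setdefault((f, c_element(h1, h2, f)), []).append((h1, h2))
    return rel


REL = relation()
```

Each key of `REL` corresponds to one of the rows s1 to s4. For example, the three pairs stored under (0, 0) are exactly the minterms covered by the cubes (0,-) and (-,0), so several choices of (h1, h2) are acceptable in such a state. That freedom is what makes this a relation rather than a function.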
[Figure sequence: an STG over the transitions a+, a-, c+, c-, d+, d-, y+, y-, and candidate implementations of y: a complex gate F over a, c, d with feedback from y, followed by decompositions built around an RS latch (inputs R, S) and a D latch with smaller gates]
Ad hoc solver for Boolean Relations
Existing solvers [Somenzi,Watanabe] aim at minimizing PLA size
Our approach:
– Targeted to 2-output functions
– Individual minimization of each function
– Branch-and-bound to eliminate incompatible solutions (heuristic pruning)
– Yields several solutions with similar cost
Event insertion (Vanbekbergen’92)
[Figure: insertion of event x into a state graph with events a, b and c; the regions ER(x) and SR(x) are marked]
Properties to preserve during insertion:
– trace equivalence
– speed-independence
  – output-persistency
  – commutativity
Signal insertion = insertion of a few events
Event insertion (Continued)
[Figure: state graph fragments with concurrent events a and b, before and after event insertion]
Event insertion: examples
[Figure: two insertions of event x into a state graph with concurrent events a and b. In the first, a is not persistent; in the second, a is persistent.]
Signal insertion for function F
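[Figure: state graph partitioned into the regions F=0 and F=1]
Insertion by input borders
[Figure: transitions F+ and F- inserted at the borders between the two regions]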
[Figure sequence: worked example of signal insertion. STG with the transitions x+, x-, y+, y-, z+, z-, w+, w-; its state graph with 12 states coded 1001, 1011, 1000, 1010, 0001, 0000, 0101, 0010, 0100, 0110, 0111, 0011; and a circuit built from two C elements over the signals x, y, z, w.]
[Figure: the state graph partitioned into the regions yz=1 and yz=0]
yz is delayed by the new signal!
[Figure: a new signal s is inserted with the transitions s+ and s-; the state graph is split into the regions s=1 and s=0, and the circuit is redrawn with the added signal s]
Technology mapping
BDD-based Boolean matching [Mailhot 93]
Handles sequential gates and combinational feedback
Merging small gates into larger gates introduces no new hazards
No guarantee of finding a correct mapping (some gates cannot be decomposed)
Timing optimization (I)
If exact timing bounds are unknown, use relative timing assumptions
Timing assumptions always reduce the set of states
– DC-set is larger
– No new logic dependencies
– Fewer state conflicts
– Simpler logic
READ control in 2-input gates
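[Figure: STG of the read cycle (LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-, DSr+) and its implementation in 2-input gates, with the signals DTACK, D, DSr, LDS, LDTACK and the internal signals csc and map]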
Adding timing assumptions (I)
[Figure: the read-cycle STG and the 2-input-gate circuit (signals DTACK, D, DSr, LDS, LDTACK, csc, map)]
LDTACK- before DSr+: Sep_max(LDTACK-, DSr+) < 0
Adding timing assumption (I)
[Figure: the read-cycle STG and the resulting circuit with the signals DTACK, D, DSr, LDS and LDTACK]
TIMING CONSTRAINT: Sep(LDTACK-, DSr+) < 0
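The effect of this assumption on the state space can be reproduced with a small token-game exploration. The sketch below is a minimal illustration under assumptions: the read-cycle arcs are the usual reading of the STG figure (the transcript lists only transition names), and the constraint "LDTACK- before DSr+" is modelled as an extra causal arc LDTACK- → DSr+ with an initial token. The check reports codes shared by more than one state, which is a unique-state-coding test and therefore stricter than CSC. All names are hypothetical.

```python
# Assumed arcs of the read-cycle STG (a safe marked graph).
BASE = {
    "DSr+": ["LDS+"], "LDS+": ["LDTACK+"], "LDTACK+": ["D+"],
    "D+": ["DTACK+"], "DTACK+": ["DSr-"], "DSr-": ["D-"],
    "D-": ["DTACK-", "LDS-"], "DTACK-": ["DSr+"],
    "LDS-": ["LDTACK-"], "LDTACK-": ["LDS+"],
}
BASE_TOKENS = {("DTACK-", "DSr+"), ("LDTACK-", "LDS+")}

# Timing assumption modelled as an extra arc LDTACK- -> DSr+ (an assumption).
TIMED = {**BASE, "LDTACK-": ["LDS+", "DSr+"]}
TIMED_TOKENS = BASE_TOKENS | {("LDTACK-", "DSr+")}

ORDER = ["DSr", "DTACK", "LDTACK", "LDS", "D"]  # state vector order


def reachable(arcs, tokens):
    """Explore the token game; return the set of (marking, code) states."""
    seen, todo = set(), [(frozenset(tokens), "00000")]
    while todo:
        marking, code = todo.pop()
        if (marking, code) in seen:
            continue
        seen.add((marking, code))
        for t, dsts in arcs.items():
            pre = {(s, t) for s, ds in arcs.items() if t in ds}
            if pre <= marking:
                i = ORDER.index(t[:-1])
                bit = "1" if t.endswith("+") else "0"
                nxt = (marking - pre) | {(t, d) for d in dsts}
                todo.append((nxt, code[:i] + bit + code[i + 1:]))
    return seen


def shared_codes(states):
    """Codes carried by more than one state."""
    codes = [code for _, code in states]
    return {c for c in codes if codes.count(c) > 1}


UNTIMED_STATES = reachable(BASE, BASE_TOKENS)
TIMED_STATES = reachable(TIMED, TIMED_TOKENS)
```

Under these assumptions the timed graph has fewer states and no shared code, so the state signal csc is no longer needed to separate the two states coded 10110.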
Timing optimization (II)
Lazy optimization:
– Idea: Increase concurrency for enabling, without increasing concurrency for firing
– Early enabling of a signal cannot produce new reachable states if some other enabled signal is faster
Adding timing assumptions (II)
[Figure: the read-cycle STG and the resulting circuit with the signals DTACK, D, DSr, LDS and LDTACK]
TIMING CONSTRAINT: Sep(LDTACK-, DSr+) < 0 and Sep(D-, LDS-) < 0
Sep_max(LDTACK-, DSr+) < 0
Sep_max(D-, LDS-) < 0
Summary
Asynchronous design is applicable to
– asynchronous interfaces
– high-performance computing
– low-power design
– low-emission design
Interest is growing among a few, but large, companies: Intel, Philips, Sun, Sharp, ARM, HP, Cogency
Summary
Asynchronous circuits are more difficult to design than synchronous ones
Clock distribution and on-die variations make synchronous design more difficult
CAD support is crucial
CAD tools have matured
Most steps of the design process covered by this tutorial are supported by the tool Petrify