Forbidden Transition Free Crosstalk Avoidance CODEC Design
Chunjie Duan, Mitsubishi Electric Research Labs, Cambridge, MA, USA
Chengyu Zhu, Polaris Microelectronic System, Shanghai, China
Sunil P. Khatri, Texas A&M University, College Station, TX, USA
Outline
• Background
• On-chip bus crosstalk classification
• Forbidden Transition Free (FTF) crosstalk avoidance code (CAC)
• CODEC design for FTF code
• Previous approaches (exponential growth)
• Our approach (quadratic growth)
• Experimental results and comparison
• Conclusions
On-chip Bus Interconnects
• In DSM processes C1 >> C2 and hence inter-wire crosstalk becomes dominant
• λ = C1 / C2 > 10 for Metal4 in a 0.1 µm CMOS process
As a consequence:
• Wire delay depends on the state of adjacent wires
• Interconnect delay >> gate delay
• Global interconnect becomes the performance bottleneck
[Figure: bus wire models with wires labeled a and v and capacitances C1 and C2]
Bus Classification
• A bus can be classified by the maximum value of the effective capacitance charged, over all its bits
  4C sequence: 101 → 010
  3C sequence: 101 → 011
  2C sequence: 100 → 011
  1C sequence: 001 → 111
  0C sequence: 000 → 111
• Delay impact of the different sequences confirmed by SPICE simulations (0.1 µm CMOS process)
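The classification can be sketched in a few lines of Python. This is only an illustration under an assumed model that the slide does not spell out: for each switching wire, every neighbor contributes 0 if it switches in the same direction, 1 if it stays quiet, and 2 if it switches in the opposite direction, and the bus class is the maximum over all wires. The function name `crosstalk_class` is hypothetical.

```python
def crosstalk_class(before: str, after: str) -> int:
    """Return the crosstalk class (0..4) of a bus transition.

    Assumed model: each neighbor of a switching wire contributes
    |delta_wire - delta_neighbor|, where delta is +1 (rising),
    -1 (falling) or 0 (quiet). Wires that do not switch are skipped.
    """
    deltas = [int(a) - int(b) for b, a in zip(before, after)]
    worst = 0
    for j, d in enumerate(deltas):
        if d == 0:
            continue  # a quiet wire charges no capacitance
        coupling = 0
        for k in (j - 1, j + 1):
            if 0 <= k < len(deltas):
                coupling += abs(d - deltas[k])
        worst = max(worst, coupling)
    return worst


for seq in [("101", "010"), ("101", "011"), ("100", "011"),
            ("001", "111"), ("000", "111")]:
    print(seq, "->", f"{crosstalk_class(*seq)}C")
```

Under this assumed model the five example sequences on the slide come out as 4C, 3C, 2C, 1C and 0C respectively.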
Crosstalk Avoidance Codes
• The strong dependence of delay on crosstalk class has motivated much work on crosstalk avoidance codes (CACs)
• CACs are a class of codes such that, when they are transmitted on the bus, certain undesired classes of crosstalk are avoided
• CACs can be categorized based on the crosstalk classes eliminated: 4C/3C/2C/1C-free codes
• CACs can also be categorized based on the memory requirement: memory-based / memoryless CACs
• CACs can be categorized based on the bus type: binary / multi-level buses
[Figure: transmitted sequence (n-bit) → Encoder → Driver → m-bit bus → Receiver → Decoder → recovered sequence]
Crosstalk Avoidance Codes
• Memoryless CACs
  • Earliest work by our group for 4C-free and 3C-free "forbidden pattern free" (FPF) codes in 2001
  • Forbidden transition free (FTF) codes by Victor et al. (2001)
• We focus on 3C-free FTF codes
• CODEC design for these and other codes was done in an ad-hoc manner
  • Worst-case area of the CODEC is exponential in bus width
• Key contribution: this paper reports a systematic 3C-free CODEC design approach based on the Fibonacci Numeral System (FNS)
  • Complexity grows quadratically with bus width
FTF CACs
• Forbidden transition: two adjacent bits transition in opposite directions, i.e., 01 ↔ 10
• An FTF code is a set of vectors such that transitions between codewords have no forbidden transitions, e.g., {00, 01, 11}, {000, 001, 100, 101, 111}
How to design FTF codes?
• All codewords that are compatible with a class-1 codeword form an FTF code with maximum cardinality
• A class-1 codeword is a vector with alternating '0's and '1's; 101010 and 010101 are the two 6-bit class-1 codewords
• In other words, we avoid '01' at the $d_{2j}d_{2j-1}$ (even) boundaries and avoid '10' at the $d_{2j+1}d_{2j}$ (odd) boundaries
• Hence, no forbidden transitions are possible
• There are two FTF codes with maximum cardinality, derived from the two possible class-1 codewords
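The FTF property can be checked mechanically. The following is a minimal sketch (the helper names `is_forbidden_transition` and `is_ftf_code` are hypothetical) that tests every pair of codewords for a 01 ↔ 10 swap on any two adjacent bit positions.

```python
from itertools import combinations


def is_forbidden_transition(a: str, b: str) -> bool:
    """True if some adjacent bit pair is 01 in one word and 10 in the other."""
    for j in range(len(a) - 1):
        if {a[j:j + 2], b[j:j + 2]} == {"01", "10"}:
            return True
    return False


def is_ftf_code(codewords) -> bool:
    """True if no transition between two codewords is forbidden."""
    return not any(is_forbidden_transition(a, b)
                   for a, b in combinations(list(codewords), 2))


print(is_ftf_code(["00", "01", "11"]))                    # 2-bit example from the slide
print(is_ftf_code(["000", "001", "100", "101", "111"]))   # 3-bit example from the slide
print(is_ftf_code(["00", "01", "10", "11"]))              # uncoded 2-bit bus
```

The two example codes from the slide pass the check, while the uncoded 2-bit bus fails because it contains both 01 and 10.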
Inductive FTF Code Generation
Generating the set of m-bit codewords $Q_m$ from the (m-1)-bit set $Q_{m-1}$
Suppose class-1 codeword = …101010101
$Q_2$ = {00, 01, 11}
For even m > 2, take each (m-1)-bit $v \in Q_{m-1}$:
  v = 0xxx ⇒ $Q_m = Q_m \cup \{$00xxx$\}$
  v = 1xxx ⇒ $Q_m = Q_m \cup \{$01xxx, 11xxx$\}$
For odd m > 2, take each (m-1)-bit $v \in Q_{m-1}$:
  v = 0xxx ⇒ $Q_m = Q_m \cup \{$10xxx, 00xxx$\}$
  v = 1xxx ⇒ $Q_m = Q_m \cup \{$11xxx$\}$
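The inductive rule can be written out as a short Python sketch. It assumes the class-1 codeword …10101 used on this slide and represents codewords as strings with the most significant bit first; the function name `ftf_codewords` is hypothetical.

```python
def ftf_codewords(m: int) -> list[str]:
    """Build Q_m from Q_2 = {00, 01, 11} by prepending one bit per step."""
    if m < 2:
        raise ValueError("the induction starts at m = 2")
    q = ["00", "01", "11"]
    for width in range(3, m + 1):
        nxt = []
        for v in q:
            if width % 2 == 0:
                # even width: 0xxx -> 00xxx ; 1xxx -> 01xxx, 11xxx
                prefixes = "0" if v[0] == "0" else "01"
            else:
                # odd width: 0xxx -> 00xxx, 10xxx ; 1xxx -> 11xxx
                prefixes = "01" if v[0] == "0" else "1"
            nxt.extend(p + v for p in prefixes)
        q = nxt
    return q


print(sorted(ftf_codewords(3)))
print([len(ftf_codewords(m)) for m in range(2, 8)])
```

The set sizes follow the Fibonacci recurrence, since each codeword contributes one or two successors depending only on its leading bit.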
FTF Cardinality, Area Overhead
• A difference equation can be derived from the inductive algorithm: $T(m) = T(m-1) + T(m-2)$, with initial conditions $T(2) = 3$, $T(3) = 5$
• Maximum cardinality of the FTF code is $T(m) = f_{m+2}$
• Define area overhead as the ratio of additional wires required in the coded bus to the uncoded bus size: $Ovh(n) = \dfrac{m - n}{n}$
• The minimum number of bits m required to code n-bit data satisfies $f_{m+2} \ge 2^n$
• It is well known that $f_m = \dfrac{\varphi^m - (-\varphi)^{-m}}{\sqrt{5}}$, where φ ≈ 1.618 is the golden ratio
• Therefore $n \le \log_2 f_{m+2} \approx (m+2)\log_2(1.618) - \log_2\sqrt{5} \approx 0.694\,(m+2) - 1.16$, or $m \ge 1.44 \cdot n$ (for large n)
• Overhead lower bound: $Ovh(n) \ge 44\%$
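The bound can be evaluated exactly for a given data width. The sketch below uses the condition $f_{m+2} \ge 2^n$ directly rather than the logarithmic approximation; the function names `fib`, `min_coded_width` and `overhead` are hypothetical.

```python
def fib(k: int) -> int:
    """k-th Fibonacci number with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def min_coded_width(n: int) -> int:
    """Smallest m such that f_{m+2} >= 2**n."""
    m = 1
    while fib(m + 2) < 2 ** n:
        m += 1
    return m


def overhead(n: int) -> float:
    """Area overhead (m - n) / n for the minimum coded width."""
    return (min_coded_width(n) - n) / n


for n in (8, 16, 32, 64):
    print(n, min_coded_width(n), f"{overhead(n):.1%}")
```

For example, `min_coded_width(32)` evaluates to 46, which is close to the 1.44·n estimate from the slide.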
Designing An Efficient CODEC
• We focus on 3C-free FTF CODEC designs
  • Most efficient, robust and popular codes
  • Existing solutions have some deficiencies
• Potential solutions:
  • Solution 1: Brute-force logic optimization
  • Solution 2: Bus partitioning
  • Solution 3: Fibonacci Numeral System based CODEC
Brute-force Logic Optimization
• Multi-level implementation based on random mapping
  • Too many permutations, more codewords than needed
• Relies purely on logic optimization
• CODEC size grows exponentially
• Not composable: designs are not extendable
• Does not work for large buses
* S.R. Sridhara et al., "Area and Energy-Efficient Crosstalk Avoidance Codes for On-Chip busses", ICCD, 2004
Bus Partitioning
• Small bus groups → small CODEC
• Exhaustively search for the optimal CODEC for small bus groups
• Forbidden transition across the group boundary
  • Group complement
  • Bit overlapping
• Area overhead goes up from 44% to 62% or more
[Figure: bus partitioned into groups b(1:4), b(5:8), b(9:12), b(13:16)]
Fibonacci Numeral System
Fibonacci sequence: $F = \{0, 1, 1, 2, 3, 5, 8, 13, 21, \dots\}$, with $f_0 = 0$, $f_1 = 1$, $f_m = f_{m-1} + f_{m-2}$
Useful properties:
• Golden ratio expression: $f_m = \dfrac{\varphi^m - (-\varphi)^{-m}}{\sqrt{5}}$
• So for large m: $\lim_{m \to \infty} \dfrac{f_{m+1}}{f_m} = \varphi$
• Summation identity: $\sum_{k=0}^{m} f_k = f_{m+2} - 1$
Fibonacci Numeral System (FNS)
• Use Fibonacci numbers as base: $v = \sum_{k=1}^{m} d_k f_k$, where $d_k \in \{0, 1\}$
• The Fibonacci numeral system is complete but ambiguous
• Range: $[0, f_{m+2} - 1]$
• A total of $f_{m+2}$ values can be represented by m-bit Fibonacci vectors
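Completeness and ambiguity are easy to see by enumeration. The sketch below evaluates every m-bit Fibonacci vector (written $d_m \dots d_1$, most significant digit first) and lists the vectors that share a value; the helper names `fib`, `fns_value` and `fns_vectors` are hypothetical.

```python
from itertools import product


def fib(k: int) -> int:
    """k-th Fibonacci number with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def fns_value(bits: str) -> int:
    """Value of d_m ... d_1 in the Fibonacci numeral system: sum of d_k * f_k."""
    m = len(bits)
    return sum(int(d) * fib(m - i) for i, d in enumerate(bits))


def fns_vectors(value: int, m: int) -> list[str]:
    """All m-bit Fibonacci vectors that represent `value`."""
    words = ("".join(p) for p in product("01", repeat=m))
    return [w for w in words if fns_value(w) == value]


m = 4
values = {fns_value("".join(p)) for p in product("01", repeat=m)}
print(sorted(values))        # every value in [0, f_{m+2} - 1] appears (complete)
print(fns_vectors(4, m))     # several vectors for one value (ambiguous)
```

With m = 4 the weights are 3, 2, 1, 1, so the 16 vectors cover only the 8 values 0 to 7, and a value such as 4 has more than one representation.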
FTF CODEC Design
Theorem: For a number v in the range $[0, f_{m+2})$, there exists at least one m-bit FTF vector $d_m d_{m-1} \dots d_2 d_1$ in the Fibonacci numeral system.
Proof:
• There exists at least one Fibonacci vector for v (completeness)
• If this vector is not FTF, an equivalent FTF vector can be generated by replacing the prohibited patterns at the boundaries
• v ∈ S01 can be replaced by v ∈ S00 or v ∈ S10
• v ∈ S10 can be replaced by v ∈ S01 or v ∈ S11
[Figure: number line marked 0, $f_{k-1}$, $f_k$, $f_{k+1}$, $2f_k$, $f_{k+2}$, showing the value ranges S00, S01, S10, S11]
Coding rules shown with the figure:
$d_k = 1$ if $v \ge f_k$, $0$ otherwise
$d_k = 1$ if $v \ge f_{k+1}$, $0$ otherwise
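The theorem can be checked by brute force for small bus widths. The sketch below assumes the FTF code built from the class-1 codeword …10101 (the one used for inductive generation), that is, '10' is excluded at the $d_{2j}d_{2j-1}$ boundaries and '01' at the $d_{2j+1}d_{2j}$ boundaries; the helper names are hypothetical.

```python
from itertools import product


def fib(k: int) -> int:
    """k-th Fibonacci number with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def fns_value(bits: str) -> int:
    """Value of d_m ... d_1: sum of d_k * f_k."""
    m = len(bits)
    return sum(int(d) * fib(m - i) for i, d in enumerate(bits))


def in_ftf_code(bits: str) -> bool:
    """Boundary rule assumed here: no '10' at d_k d_{k-1} for even k,
    no '01' at d_k d_{k-1} for odd k."""
    m = len(bits)
    for i in range(m - 1):
        k = m - i                      # index of the left digit of the pair
        pair = bits[i:i + 2]
        if (k % 2 == 0 and pair == "10") or (k % 2 == 1 and pair == "01"):
            return False
    return True


def ftf_values(m: int) -> list[int]:
    """Sorted values of all m-bit FTF vectors."""
    words = ("".join(p) for p in product("01", repeat=m))
    return sorted(fns_value(w) for w in words if in_ftf_code(w))


for m in range(2, 11):
    assert ftf_values(m) == list(range(fib(m + 2)))
print("every value in [0, f_{m+2}) has an FTF vector for m = 2..10")
```

For these widths each value in the range is hit exactly once, so under this assumed convention the mapping from values to FTF vectors is one-to-one.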
Encoding Algorithm
[Figure: encoder drawn as a chain of comparison stages producing $d_m, d_{m-1}, \dots, d_1$ with remainders $r_m, r_{m-1}, \dots, r_3$; decoder drawn as a sum of the terms $d_k f_k$ producing v]
Encoder:
• Consists of m-1 stages
• Each stage produces one coded bit
• Each stage outputs a remainder
• The remainder of one stage is the input of the following stage
Decoder:
• Implements $v = \sum_{k=1}^{m} d_k f_k$
• An m-input adder
• No multipliers needed
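A software model of the staged encoder and the adder-based decoder might look like the sketch below. The per-stage thresholds are an assumption inferred from the worked example in this deck (odd stages compare the remainder with $f_k$, even stages with $f_{k+1}$); the function names `ftf_encode` and `ftf_decode` are hypothetical.

```python
def fib(k: int) -> int:
    """k-th Fibonacci number with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def ftf_encode(v: int, m: int) -> str:
    """Encode v into an m-bit vector d_m ... d_1 using m-1 stages."""
    if not 0 <= v < fib(m + 2):
        raise ValueError("v must lie in [0, f_{m+2})")
    r = v
    digits = []
    for k in range(m, 1, -1):
        # assumed thresholds: f_k for odd stages, f_{k+1} for even stages
        threshold = fib(k) if k % 2 == 1 else fib(k + 1)
        d = 1 if r >= threshold else 0
        r -= d * fib(k)               # remainder passed to the next stage
        digits.append(str(d))
    digits.append(str(r))             # d_1 is the final remainder (0 or 1)
    return "".join(digits)


def ftf_decode(bits: str) -> int:
    """Decode by summing f_k over the digits that are 1 (no multipliers)."""
    m = len(bits)
    return sum(fib(m - i) for i, d in enumerate(bits) if d == "1")


code = ftf_encode(19, 7)
print(code, ftf_decode(code))
```

Each loop iteration corresponds to one stage: it emits one coded bit and hands its remainder to the next iteration, so the structure extends to wider buses by adding iterations.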
Encoding Example
Input: v = 19
Output: 7-bit FTF vector
⑦ v ≥ 13 → d7 = 1, r7 = v-13 = 6
⑥ r7 < 13 → d6 = 0, r6 = r7-0 = 6
⑤ r6 ≥ 5 → d5 = 1, r5 = r6-5 = 1
④ r5 < 5 → d4 = 0, r4 = r5-0 = 1
③ r4 < 2 → d3 = 0, r3 = r4-0 = 1
② r3 < 2 → d2 = 0, r2 = r3-0 = 1
① d1 = r2 = 1
Output: 1010001
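As a check on the example, decoding the 7-bit result with the weights $f_7, \dots, f_1 = 13, 8, 5, 3, 2, 1, 1$ should return the input value:

```latex
v = \sum_{k=1}^{7} d_k f_k
  = 1\cdot 13 + 0\cdot 8 + 1\cdot 5 + 0\cdot 3 + 0\cdot 2 + 0\cdot 1 + 1\cdot 1
  = 19
```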
Implementation
Multi-stage structure
• Systematic
• Extendable modular design
• Easily pipelined
Internal logic
• Even stage: 2 adders + 1 MUX
• Odd stage: 1 adder + 1 MUX
• Combining 2 stages: 2 adders + 1 MUX
[Figure: even stage with blocks CMP, SUB, SEL, constants $f_k$ and $f_{k+1}$, signals $r_{k+1}$, $d_k$, $r_k$]
[Figure: odd stage with blocks SUB, SEL, constant $f_k$, signals $r_{k+1}$, $d_k$, $r_k$]
[Figure: combined stage with blocks SUB, SUB, SEL, constants $f_k$ and $f_{k+1}$, signals $r_{k+1}$, $d_k$, $d_{k-1}$, $r_{k-1}$]
CODEC Gate Count & Speed
• Gate count grows quadratically with bus size, as opposed to exponentially for a brute-force design
  • Brute force: >20K @ 12 bit
  • FTF: 237 @ 12 bit, 2244 @ 32 bit
• Delay also grows quadratically
  • Pipelined design with special adder is estimated to reach 3 GHz
• Combined with bus partitioning, our approach will
  • Further reduce CODEC size
  • Also improve CODEC speed
  • Require a single ground wire between groups
[Figure: "Encoder Gate Count" chart; series: FPF gate count, FTF gate count; horizontal axis 1 to 29, vertical axis 0 to 3000]
Results – Speed Improvement
• Random sequence directly into bus buffer: 10 mm trace, 45x buffer, >1 ns delay variation
• Random sequence into an FTF encoder: 10 mm trace, 45x buffer, <500 ps delay variation
[Figure: "waveform w/ coding"; traces Vin1, Vseg1 to Vseg5]
[Figure: "waveform w/o encoder"; traces Vtx1 to Vtx5]
Results – Speed Improvement
• Without coding: edge jitter > 1000 ps
• With coding: edge jitter < 500 ps
[Figure: "Received data w/o coding"; traces Voo1 to Voo5]
[Figure: "Received data w/ coding"; traces rcv1 to rcv5]
Summary
• Showed that the Forbidden Transition Free code is an efficient CAC
• Showed that existing CODEC designs are not efficient
  • Exponential growth in area as bus size increases
• Proposed a mapping scheme based on the Fibonacci Numeral System
  • Designed efficient CODECs for the FTF code
  • A deterministic mapping
  • Area overhead performance reaches the asymptotic lower bound
  • Systematic implementation
• Implementation results confirm quadratic growth in both size and delay
Thank you!